Tika

Summary

Purpose
Detects and extracts metadata and text content from documents.
Homepage
http://tika.apache.org/
Source Code Repository
https://github.com/apache/tika
License
Apache License, Version 2.0
Debian Package  

Description

Java-based tool for detecting and extracting metadata and text content from documents.

User Experiences

e.g. links to AQuA/SCAPE/Hackathon issues that use the tool

News Feeds

Release Feed

Link to any RSS feed that is updated when new releases occur, if any, e.g:


Activity Feed

Link to any RSS feed that is updated when issue or code updates occur, if any, e.g:

Activity Streams

Hudson commented on TIKA-2470 - Another Illegal reflective Access -- more cleanup for Java 9 (https://issues.apache.org/jira/browse/TIKA-2470)
SUCCESS: Integrated in Jenkins build Tika-trunk #1371 (See https://builds.apache.org/job/Tika-trunk/1371/)
TIKA-2470 – modernize DocumentBuilderFactory security for Java 9 (tallison: https://github.com/apache/tika/commit/0e38f9419121f08117283e1876e8abd02b2ab52f)

Konstantin Gribov commented on TIKA-2470 - Another Illegal reflective Access -- more cleanup for Java 9
Tim Allison commented on TIKA-2470 - Another Illegal reflective Access -- more cleanup for Java 9
Tim Allison resolved TIKA-2470 - Another Illegal reflective Access -- more cleanup for Java 9 as 'Fixed'
Tim Allison updated the Description of TIKA-2470 - Another Illegal reflective Access -- more cleanup for Java 9
Tim Allison created TIKA-2470 - Another Illegal reflective Access -- more cleanup for Java 9

ASF GitHub Bot commented on TIKA-2400 - Standardizing current Object Recognition REST parsers (https://issues.apache.org/jira/browse/TIKA-2400)
ThejanW commented on a change in pull request #208: Fix for TIKA-2400 Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140421441

Searching for Tika on OPF Labs

Found 9 search result(s) for Tika.

Page: EVAL ARC2WARC-TOMAR with Tika (SCAPE)
Evaluator(s): Sven Schlarb <sven.schlarb@onb.ac.at>. Evaluation points: Assessment of measurable points. Metric ...
Mar 10, 2014
Page: SO11 The Tika characterisation Tool (SCAPE)
... versioning, so further work is needed to add versions to other mimetypes. Can Tika be extended to support regexp like Fido? Prototype of Tika (https://github.com/openplanets/tika) up and running that used regular expressions
Jun 14, 2012
Page: Tika Batch File Identification (SPRUCE)
... investigation and feedback to Apache Tika. Some files are only identified as application/octet-stream (Tika default). Needs further investigation and feedback to Tika. Some problems with character encoding of metadata returned by Tika causing issues when trying to load JSON output ...
Jun 13, 2012
Labels: spruce, spruce_glasgow, identification, solution
Page: Example - Working with Apache Tika (SCAPE)
... button) # Cloned the fork locally ($ git clone --recursive https://github.com/openplanets/tika) # cd into your local repository ($ cd tika) # Link with the upstream repository ($ git remote add upstream git://github.com/apache ...
Jan 24, 2012
Page: Extracting and aggregating metadata with Apache Tika (SPRUCE)
At the Glasgow Mashup, Peter May created a Python wrapper for Apache Tika. Carl Wilson extended this work, creating a Java utility class that wrapped Tika
Sep 28, 2012
Labels: spruce_london, solution, characterisation
Page: Parsing PST OST file using TIKA (Knowledge Base)
Title: Parsing PST OST file using TIKA. Detailed description: The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents ...
Jun 05, 2013
Labels: chapel_hill, solution, appraisal_assessment, characterisation
Page: Large scale document characterization and identification with Tika and DRIOID on SCAPE Azure platform (SCAPE)
... tools, user should SCAPE Azure platform. We measured the speed of the Apache Tika Content Analysis Toolkit and the DROID File Format Identification Tool when they were ...
Jul 15, 2014
Page: SO17 Web Archive Mime-Type detection workflow based on Droid and Apache Tika (SCAPE)
... Involved tools: unARC: A tool (by SB) to unpack ARC files. TIFOWA (using the TIKA API): TIFOWA (by ONB) is using the TIKA API for extracting metadata from the files contained in a folder structure ...
Mar 01, 2012
Labels: identification, solution
Page: PC.WP1 Tool tracker (SCAPE)
Tika package by BL; UNIX file package by ?; FITS package by ?; ffprobe package by SB ...
Nov 14, 2012

Labels: characterisation, java, identification, extraction, tool