Tika


Summary

Purpose
Detects and extracts metadata and text content from documents.
Homepage
http://tika.apache.org/
Source Code Repository
https://github.com/apache/tika
License
Apache License, Version 2.0
Debian Package  

Description

Java-based tool for detecting and extracting metadata and text content from documents.
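
As a rough illustration of metadata extraction, the sketch below drives the Tika command-line application from Python and normalises its JSON metadata output. It is a minimal sketch under stated assumptions, not a recommended integration: the jar location (`tika-app.jar`), the document name, and the helper names `tika_command`, `parse_metadata` and `extract_metadata` are hypothetical, and the output is assumed to be UTF-8. The `--json` option is taken from the tika-app command-line help; check it against the version you have installed.

```python
import json
import subprocess


def tika_command(jar_path, document, mode="--json"):
    """Build the argument list for one tika-app invocation."""
    return ["java", "-jar", jar_path, mode, document]


def parse_metadata(raw_json):
    """Turn Tika's JSON metadata output into a dict of lists."""
    record = json.loads(raw_json)
    # A key may carry a single value or a list of values;
    # normalise both shapes to a list so callers can treat them alike.
    return {key: value if isinstance(value, list) else [value]
            for key, value in record.items()}


def extract_metadata(jar_path, document):
    """Run tika-app on one document and return its metadata."""
    result = subprocess.run(tika_command(jar_path, document),
                            capture_output=True, check=True)
    # Assumption: the output is UTF-8. Undecodable bytes are replaced
    # rather than raising, so one bad field does not abort a batch.
    return parse_metadata(result.stdout.decode("utf-8", errors="replace"))


if __name__ == "__main__":
    # Hypothetical sample record, so the parsing step runs without Java.
    sample = '{"Content-Type": "application/pdf", "dc:creator": ["A", "B"]}'
    print(parse_metadata(sample))
```

Keeping the command construction and the JSON parsing in separate functions lets the parsing be exercised on saved output without a Java runtime.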

User Experiences

e.g. links to AQuA/SCAPE/Hackathon issues that use the tool

News Feeds

Release Feed

Link to any RSS feed that is updated when new releases occur, if any, e.g:


Activity Feed

Link to any RSS feed that is updated when issue or code updates occur, if any, e.g:

Activity Streams
ASF GitHub Bot commented on TIKA-2298 - To improve object recognition parser so that it may work without external RESTful service setup (https://issues.apache.org/jira/browse/TIKA-2298)

asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-304462311

@thammegowda @chrismattmann awaiting review

ASF GitHub Bot commented on TIKA-2262 - Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types (https://issues.apache.org/jira/browse/TIKA-2262)

chrismattmann commented on issue #180: Fix for TIKA-2262: Supporting Image-to-Text (Image Captioning) in Tika
URL: https://github.com/apache/tika/pull/180#issuecomment-304462151

going to try this out today @ThejanW !

ASF GitHub Bot commented on TIKA-2298 - To improve object recognition parser so that it may work without external RESTful service setup (https://issues.apache.org/jira/browse/TIKA-2298)

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-304462094

frickin' awesome! I'm going to test this

ASF GitHub Bot commented on TIKA-2298 - To improve object recognition parser so that it may work without external RESTful service setup (https://issues.apache.org/jira/browse/TIKA-2298)

chrismattmann closed pull request #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159

ASF GitHub Bot commented on TIKA-2298 - To improve object recognition parser so that it may work without external RESTful service setup (https://issues.apache.org/jira/browse/TIKA-2298)

chrismattmann commented on issue #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-304461966

superseded by #182

ASF GitHub Bot commented on TIKA-2298 - To improve object recognition parser so that it may work without external RESTful service setup (https://issues.apache.org/jira/browse/TIKA-2298)

asmehra95 opened a new pull request #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182

Note: This is a modified form of #159 raised earlier by ...

Steve Reynolds commented on TIKA-2378 - Error extracting text from application/x-msaccess mime type (https://issues.apache.org/jira/browse/TIKA-2378)

Searching for Tika on OPF Labs

Found 9 search result(s) for Tika.

Page: EVAL ARC2WARC-TOMAR with Tika (SCAPE)
EVAL ARC2WARC-TOMAR with Tika Evaluator(s) Sven Schlarb <sven.schlarb@onb.ac.at> Evaluation points Assessment of measurable points Metric ...
Mar 10, 2014
Page: SO11 The Tika characterisation Tool (SCAPE)
... versioning, so further work is needed to add versions to other mimetypes. Can Tika be extended to support regexp like Fido? Prototype of Tika (https://github.com/openplanets/tika) up and running that used regular expressions ...
Jun 14, 2012
Page: Tika Batch File Identification (SPRUCE)
... investigation and feedback to Apache Tika. Some files are only identified as application/octet-stream (Tika default). Needs further investigation and feedback to Tika. Some problems with character encoding of metadata returned by Tika causing issues when trying to load JSON output ...
Jun 13, 2012
Labels: spruce, spruce_glasgow, identification, solution
Page: Example - Working with Apache Tika (SCAPE)
... button) # Cloned the fork locally ($ git clone --recursive https://github.com/openplanets/tika) # cd into your local repository ($ cd tika) # Link with the upstream repository ($ git remote add upstream git://github.com/apache ... ...
Jan 24, 2012
Page: Extracting and aggregating metadata with Apache Tika (SPRUCE)
At the Glasgow Mashup Peter May created a Python wrapper for Apache Tika. Carl Wilson extended this work, creating a Java utility class that wrapped Tika ...
Sep 28, 2012
Labels: spruce_london, solution, characterisation
Page: Parsing PST OST file using TIKA (Knowledge Base)
Detailed description: The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents ...
Jun 05, 2013
Labels: chapel_hill, solution, appraisal_assessment, characterisation
Page: Large scale document characterization and identification with Tika and DROID on SCAPE Azure platform (SCAPE)
... tools, user should SCAPE Azure platform. We measured the speed of the Apache Tika Content Analysis Toolkit and the DROID File Format Identification Tool when they were ...
Jul 15, 2014
Page: SO17 Web Archive Mime-Type detection workflow based on Droid and Apache Tika (SCAPE)
... Involved tools: unARC: A tool (by SB) to unpack ARC files. TIFOWA (using the TIKA API): TIFOWA (by ONB) is using the TIKA API for extracting metadata from the files contained in a folder structure ...
Mar 01, 2012
Labels: identification, solution
Page: PC.WP1 Tool tracker (SCAPE)
Tika package by BL; UNIX file package by ?; FITS package by ?; ffprobe package by SB ...
Nov 14, 2012
Labels: characterisation, java, identification, extraction, tool