|Purpose||Detects and extracts metadata and text content from documents.|
| Source Code Repository
|License||Apache License, Version 2.0|
Java-based tool for detecting and extracting metadata and text content from documents.
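As a rough illustration of what type detection involves, the sketch below matches the leading bytes of a file against a few known signatures and falls back to a generic type when nothing matches. It uses only the Java standard library and is not Tika's implementation or API; the class name `MagicSniffer`, the `detect` method and the three-entry signature table are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Apache Tika.
final class MagicSniffer {
    // Generic fallback for content that matches no known signature.
    static final String FALLBACK = "application/octet-stream";

    // Small illustrative signature table (an assumption; real detectors
    // consult far larger registries and also look beyond the first bytes).
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("application/pdf", "%PDF-".getBytes(StandardCharsets.US_ASCII));
        SIGNATURES.put("image/png", new byte[] {(byte) 0x89, 'P', 'N', 'G'});
        SIGNATURES.put("application/zip", new byte[] {'P', 'K', 3, 4});
    }

    // Returns the first MIME type whose signature prefixes the given bytes.
    static String detect(byte[] head) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(head, entry.getValue())) {
                return entry.getKey();
            }
        }
        return FALLBACK;
    }

    // True when data begins with every byte of prefix, in order.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample));
    }
}
```

A prefix match like this is cheap but coarse: container formats such as ZIP-based office documents share the same leading bytes, which is one reason a dedicated tool is used instead.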
AQuA/SCAPE/Hackathon issues that use the tool:
- IS25 Web Content Characterisation
- SO11 The Tika characterisation Tool
- SO17 Web Archive Mime-Type detection workflow based on Droid and Apache Tika
Found 9 search result(s) for Tika.
EVAL ARC2WARCTOMAR with Tika Evaluator(s) Sven Schlarb Evaluation points Assessment of measurable points Metric ...
Mar 10, 2014
... versioning, so further work is needed to add versions to other mimetypes. Can Tika be extended to support regexp like Fido? Prototype of Tika (https://github.com/openplanets/tika) up and running that used regular expressions
Jun 14, 2012
... investigation and feedback to Apache Tika. Some files are only identified as application/octet-stream (Tika default). Needs further investigation and feedback to Tika. Some problems with character encoding of metadata returned by Tika causing issues when trying to load JSON output ...
Jun 13, 2012
Labels: spruce, spruce_glasgow, identification, solution
... button) # Cloned the fork locally ($ git clone --recursive https://github.com/openplanets/tika) # cd into your local repository ($ cd tika) # Link with the upstream repository ($ git remote add upstream git://github.com/apache ...
Jan 24, 2012
Extracting and aggregating metadata with Apache Tika: At the Glasgow Mashup, Peter May created a Python wrapper for Apache Tika. Carl Wilson extended this work, creating a Java utility class that wrapped Tika ...
Sep 28, 2012
Labels: spruce_london, solution, characterisation
Parsing PST OST file using TIKA: The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents ...
Jun 05, 2013
Labels: chapel_hill, solution, appraisal_assessment, characterisation
Large scale document characterization and identification with Tika: ... tools, user should SCAPE Azure platform. We measured the speed of the Apache Tika Content Analysis Toolkit and the DROID File Format Identification Tool when they were ...
Jul 15, 2014
... Involved tools: unARC: A tool (by SB) to unpack ARC files. TIFOWA (using the TIKA API): TIFOWA (by ONB) is using the TIKA API for extracting metadata from the files contained in a folder structure ...
Mar 01, 2012
Labels: identification, solution