Extracting and aggregating metadata with Apache Tika



Apache Tika was used with a custom wrapper to extract metadata (e.g. author, title, extent, dates and file formats) and content (text) from collection files. A Java program was then used to produce a report summarising the metadata and content of the collection. This report will be used to inform collection management decisions and to identify potential preservation issues.
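The aggregation step might look something like the minimal Java sketch below. It is not the wrapper or reporting code used in this solution, which is not published on this page. It assumes that each file's Tika output has already been collected into a map of metadata key to value. The class name `MetadataReport`, its methods and the sample records are hypothetical, and the key names, while modelled on common Tika metadata names, are assumptions that should be checked against the actual extraction output.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical aggregation step: summarises per-file metadata records
 * (assumed to come from Tika, one Map per file) into counts per value
 * of a chosen key, plus the total extent of the collection in bytes.
 */
public class MetadataReport {

    // Assumed key names, modelled on Tika's metadata naming; verify against real output.
    static final String FORMAT = "Content-Type";
    static final String AUTHOR = "dc:creator";
    static final String SIZE = "Content-Length";

    /** Counts how many files share each value of the given key; missing values count as "(unknown)". */
    static Map<String, Integer> countBy(List<Map<String, String>> records, String key) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys make reports easy to compare
        for (Map<String, String> record : records) {
            String value = record.getOrDefault(key, "(unknown)");
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    /** Sums file sizes in bytes; records with a missing or non-numeric size are skipped. */
    static long totalBytes(List<Map<String, String>> records) {
        long total = 0;
        for (Map<String, String> record : records) {
            try {
                total += Long.parseLong(record.getOrDefault(SIZE, ""));
            } catch (NumberFormatException e) {
                // No usable size for this file: leave it out of the total.
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up records standing in for the wrapper's per-file output.
        List<Map<String, String>> records = List.of(
                Map.of(FORMAT, "application/pdf", AUTHOR, "A. Smith", SIZE, "1024"),
                Map.of(FORMAT, "application/msword", AUTHOR, "A. Smith", SIZE, "2048"),
                Map.of(FORMAT, "application/pdf", SIZE, "512"));
        System.out.println("Files per format: " + countBy(records, FORMAT));
        System.out.println("Files per author: " + countBy(records, AUTHOR));
        System.out.println("Total extent (bytes): " + totalBytes(records));
    }
}
```

Counting missing values under "(unknown)" rather than skipping them keeps gaps in the extracted metadata visible in the summary, which is often what flags a potential preservation issue.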

Solution Champion

Thom Carter, Rebecca Webster

Corresponding Issue(s)
Produce a report summarising collection metadata and content
Sorting, appraising and metadata creation for deposited personal collections

Tool/code link
A link to code on GitHub or a corresponding myExperiment entry, if applicable

Tool Registry Link
Add an entry to the OPF Tool Registry, and provide a link to it here.

Evaluation
Any notes or links on how the solution performed.
