Extracting and aggregating metadata with Apache Tika

Apache Tika was used with a custom wrapper to extract metadata (e.g. author, title, extent, dates and file formats) and content (text) from collection files. A report was produced that summarised the metadata and content from the collection. This information will be used to inform collection management decisions and identify potential preservation issues.
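The extract-then-aggregate approach described above can be sketched as follows. This is a minimal illustration and not the custom wrapper itself, which is not linked from this page. The function names, the `tika-app.jar` location, and the choice of `Content-Type` and `dc:creator` as the keys to summarise are all assumptions made for the example; the metadata keys Tika actually returns vary by file format.

```python
import json
import subprocess
from collections import Counter


def extract_metadata(path, tika_jar="tika-app.jar"):
    """Run the Tika app on one file and return its metadata as a dict.

    The jar location is a placeholder (assumption); --json asks tika-app
    to print the file's metadata as a JSON object.
    """
    result = subprocess.run(
        ["java", "-jar", tika_jar, "--json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def first_value(record, key, default="unknown"):
    """Return a single string for a key that may be missing or multi-valued."""
    value = record.get(key, default)
    if isinstance(value, list):
        value = value[0] if value else default
    return str(value)


def summarise(records):
    """Aggregate per-file metadata dicts into simple collection-level counts."""
    formats = Counter()
    creators = Counter()
    for record in records:
        # Drop parameters such as "; charset=UTF-8" so formats group together.
        content_type = first_value(record, "Content-Type").split(";")[0].strip()
        formats[content_type] += 1
        # "dc:creator" is an assumed key; some formats report the author elsewhere.
        creators[first_value(record, "dc:creator")] += 1
    return {
        "files": len(records),
        "formats": dict(formats),
        "creators": dict(creators),
    }
```

In use, `extract_metadata` would be called once per collection file and the resulting list passed to `summarise`. Keeping the aggregation step separate from the Tika call means the summary logic can be checked against hand-written metadata without needing Java or the Tika jar.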

Solution Champion
Thom Carter, Rebecca Webster

Corresponding Issue(s)
Produce a report summarising collection metadata and content
Sorting, appraising and metadata creation for deposited personal collections

Tool/code link
A link to the code on GitHub, or to a corresponding myExperiment workflow, if applicable.

Tool Registry Link
Add an entry to the OPF Tool Registry, and provide a link to it here.

Any notes or links on how the solution performed.
