Extracting and aggregating metadata with Apache Tika


Extracting and aggregating metadata with Tika

Apache Tika was used with a custom wrapper to extract metadata (e.g. author, title, extent, dates and file formats) and content (text) from collection files. A Java script was then used to produce a report that summarised the metadata and content across the collection. This information will be used to inform collection management decisions and to identify potential preservation issues.
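
As an illustration of the aggregation step only, the sketch below tallies how often each value of one metadata field occurs across a set of per-file records. It is not the code used for this solution. It assumes the wrapper's Tika output has already been loaded into one map of field name to value per file, and it uses the Java standard library only. The class name MetadataSummary, the method countByField, the "(missing)" label and the sample records are all hypothetical; "Content-Type" is the field name Tika commonly reports for file format, but the fields actually available depend on the files and the Tika version.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: tallies one metadata field across per-file records. */
final class MetadataSummary {

    /** Label used when a file has no value for the requested field (an assumption). */
    static final String MISSING = "(missing)";

    /**
     * Counts how many records carry each distinct value of the given field.
     * Each record is assumed to hold the metadata extracted for one file.
     * A TreeMap keeps the report sorted by value.
     */
    static Map<String, Long> countByField(List<Map<String, String>> records, String field) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map<String, String> record : records) {
            String value = record.getOrDefault(field, MISSING);
            counts.merge(value, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample records standing in for real extraction output.
        List<Map<String, String>> records = List.of(
            Map.of("Content-Type", "application/pdf", "title", "Example report"),
            Map.of("Content-Type", "application/pdf"),
            Map.of("Content-Type", "text/plain"));

        // Print a tab-separated summary of file formats in the collection.
        countByField(records, "Content-Type")
            .forEach((value, n) -> System.out.println(value + "\t" + n));
    }
}
```

Counting files with no value for a field separately, rather than dropping them, keeps gaps in the metadata visible in the report, which is relevant when looking for potential preservation issues.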

Solution Champion
Thom Carter, Rebecca Webster

Corresponding Issue(s)
Produce a report summarising collection metadata and content
Sorting, appraising and metadata creation for deposited personal collections

Tool/code link
[Link to Pete's code]

Tool Registry Link
Apache Tika

Evaluation
Any notes or links on how the solution performed.
