Extracting and aggregating metadata with Apache Tika

Apache Tika was used with a custom wrapper to extract metadata (e.g. author, title, extent, dates and file formats) and text content from the files in two large digital archive collections. A script written in Java then produced an HTML report summarising the metadata and content across the collection. The report will be used to inform collection management decisions and to identify potential preservation issues.
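The aggregation step described above can be sketched roughly as follows. This is not the project's actual script, only a minimal illustration that assumes the per-file metadata has already been extracted (for example by Tika) into simple key/value maps. The class name MetadataReport, its method names, the "(missing)" bucket and the sample records are all hypothetical; "Content-Type" is used as the field name on the assumption that the wrapper keeps Tika's usual key for the detected file format.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: aggregates already-extracted metadata into an HTML summary.
class MetadataReport {

    // Count how often each value of one metadata field occurs across all files.
    // Files that lack the field are counted under "(missing)".
    static Map<String, Long> countByField(List<Map<String, String>> records, String field) {
        Map<String, Long> counts = new TreeMap<>(); // sorted keys give a stable report order
        for (Map<String, String> record : records) {
            String value = record.getOrDefault(field, "(missing)");
            counts.merge(value, 1L, Long::sum);
        }
        return counts;
    }

    // Escape the characters that would otherwise break the HTML markup.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Render the counts for one field as a two-column HTML table.
    static String toHtmlTable(String field, Map<String, Long> counts) {
        StringBuilder html = new StringBuilder();
        html.append("<table>\n<tr><th>").append(escapeHtml(field))
            .append("</th><th>Files</th></tr>\n");
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            html.append("<tr><td>").append(escapeHtml(entry.getKey()))
                .append("</td><td>").append(entry.getValue()).append("</td></tr>\n");
        }
        html.append("</table>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // Illustrative records only; real input would come from the extraction step.
        List<Map<String, String>> records = List.of(
            Map.of("Content-Type", "application/pdf", "title", "Annual report"),
            Map.of("Content-Type", "application/pdf"),
            Map.of("Content-Type", "text/plain"),
            Map.of("title", "File with no detected format"));

        Map<String, Long> counts = countByField(records, "Content-Type");
        System.out.println(toHtmlTable("Content-Type", counts));
    }
}
```

Counting one field at a time keeps the sketch small; the same method could be called once per field of interest (author, dates, and so on) to build up a fuller report.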

Solution Champion
Thom Carter, Rebecca Webster

Corresponding Issue(s)
Produce a report summarising collection metadata and content
Sorting, appraising and metadata creation for deposited personal collections

Tool/code link
[Link to Pete's code]

Tool Registry Link
Apache Tika

Evaluation
