Extracting and aggregating metadata with Tika
Apache Tika was run through a custom wrapper to extract metadata (e.g. author, title, extent, dates and file format) and text content from the files in the collection. The results were aggregated into a report summarising the metadata and content across the collection. This information will be used to inform collection management decisions and to identify potential preservation issues.
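The extract-then-aggregate approach described above can be sketched roughly as follows. This is a minimal illustration, not the wrapper that was actually used: it assumes the Tika app jar is available locally as `tika-app.jar` (a hypothetical path), that it is called with its `--json` option to emit one metadata record per file, and that `Content-Type` and `dc:creator` are the fields of interest (the keys Tika reports vary by version and file format). The function names are invented for the example.

```python
import json
import subprocess
from collections import Counter
from pathlib import Path

TIKA_JAR = "tika-app.jar"  # assumption: path to a local Tika app jar


def extract_metadata(path, jar=TIKA_JAR):
    """Run the Tika app on one file and return its metadata as a dict."""
    result = subprocess.run(
        ["java", "-jar", jar, "--json", str(path)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarise(records, fields=("Content-Type", "dc:creator")):
    """Count the values seen for each metadata field across all records."""
    summary = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field, "(missing)")
            # Multi-valued fields may arrive as lists; count each value.
            values = value if isinstance(value, list) else [value]
            summary[field].update(str(v) for v in values)
    return summary


def report(root):
    """Extract metadata for every file under root and summarise it."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return summarise(extract_metadata(p) for p in files)
```

Calling `report("path/to/collection")` would return one counter per field, for example how many files were identified as each format and how many lack an author, which is the kind of summary that can point to preservation issues such as unidentified formats.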