Extracting and aggregating metadata with Tika
Apache Tika was used with a custom wrapper to extract metadata (e.g. author, title, extent, dates and file format) and content (text) from the files in two large digital archive collections. A Java script was then used to produce a report summarising the extracted metadata and content across the collections. This information will be used to inform collection management decisions and to identify potential preservation issues.
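The aggregation step can be illustrated with a minimal sketch. It is not Pete's code: it assumes the wrapper has already produced one key/value map of metadata per file, and the class name MetadataSummary, the method countBy, the sample records and the choice of the keys "Content-Type" and "Author" are all hypothetical, chosen only for illustration. The Tika extraction itself is not shown. The sketch counts how many files share each value of a given metadata field and groups files that lack the field under "(missing)", which is one simple way to spot gaps that may matter for preservation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical summariser: assumes metadata has already been extracted
// into one Map<String, String> per file (field name -> value).
class MetadataSummary {

    // Counts files per distinct value of one metadata field.
    // Files where the field is absent or blank are counted as "(missing)".
    static Map<String, Integer> countBy(List<Map<String, String>> records, String field) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for a stable report
        for (Map<String, String> record : records) {
            String value = record.get(field);
            if (value == null || value.trim().isEmpty()) {
                value = "(missing)";
            }
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    // Builds one record from alternating key/value strings (illustration only).
    static Map<String, String> record(String... keyValues) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        // Made-up sample records standing in for the wrapper's output.
        List<Map<String, String>> records = new ArrayList<>();
        records.add(record("Content-Type", "application/pdf", "Author", "A. Smith"));
        records.add(record("Content-Type", "application/pdf"));
        records.add(record("Content-Type", "image/tiff", "Author", "A. Smith"));

        System.out.println("Files by format: " + countBy(records, "Content-Type"));
        System.out.println("Files by author: " + countBy(records, "Author"));
    }
}
```

The same method can be called once per field of interest (format, author, dates and so on) to build up a collection-level report; a real report would probably also want totals and percentages.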
[Link to Pete's code]