Page: Analysis of Lucene Index Word Frequency
One line summary: Create a word frequency list from a Lucene index and try to ascertain the subject matter of the collection that the index was created against.
Detailed description: The solution for AQuA:Characterising Externally Generated Content generated a Lucene index of the collection ...
Page: Characterising Externally Generated Content
One line summary: Tool to create a manifest of digital content, including format and SHA256 digest, and to index content where possible.
Detailed description: Java code; currently runs as a command-line application. Uses Apache Tika to obtain ...
Page: Identifying the content of MS Office documents
One line summary: We have OLE2 Office documents, which may contain more documents, and we want to identify which version of Office each was created by.
Detailed description: The older binary Office document formats (OLE) are effectively ...
Page: Unknown born-digital file history
One line summary: History of individual files and how they relate to others within the wider collection ...
