Related Labels: sox, word, 3gpp, jpg, jhears, ocr, fu-script, audio, gif, quality_assurance, video, wmv, comparison, obsolescence, macromedia, bmp, flvdump, apache, api

Page: Analysis of Lucene Index Word Frequency
One line summary: Create a word frequency list from a Lucene index and try to ascertain the subject matter of the collection that the index was created against.
Detailed description: The solution for AQuA:Characterising Externally Generated Content generated a Lucene index of the collection ...
Other labels: aqua
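A minimal sketch of the ranking step behind a word frequency list, assuming the term counts have already been read out of the Lucene index into a map (that reading step is not shown here). The class name, method name, and stop-word list are hypothetical and do not come from the page itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TermFrequencySketch {
    // Hypothetical stop-word list; a real run would need a fuller one.
    static final Set<String> STOP_WORDS = Set.of("the", "and", "of", "a", "to", "in");

    /** Returns up to n terms with the highest counts, skipping stop words. */
    static List<Map.Entry<String, Long>> topTerms(Map<String, Long> termCounts, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>();
        for (Map.Entry<String, Long> e : termCounts.entrySet()) {
            if (!STOP_WORDS.contains(e.getKey())) {
                entries.add(Map.entry(e.getKey(), e.getValue()));
            }
        }
        // Highest count first; ties are broken alphabetically so output is stable.
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }
}
```

The most frequent terms that survive stop-word filtering give a rough hint of the collection's subject matter.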
Page: Characterising Externally Generated Content
One line summary: Tool to create a manifest of digital content, including format and SHA256 digest, and index content where possible.
Detailed description: Java code, currently runs as a command line application. Uses Apache Tika to obtain ...
Other labels: aqua, characterisation, fixity
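A minimal sketch of what one manifest entry could look like, using only the Java standard library. It is not the tool's actual code: `Files.probeContentType` stands in here for the Apache Tika format identification the page mentions, and the class name, method names, and tab-separated layout are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ManifestSketch {
    /** Streams the file through SHA-256 and returns the digest as lowercase hex. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Builds one tab-separated manifest line: path, detected type, digest. */
    static String manifestLine(Path file) throws IOException, NoSuchAlgorithmException {
        // Stand-in for Tika; the result is platform dependent and may be null.
        String type = Files.probeContentType(file);
        return file + "\t" + (type == null ? "unknown" : type) + "\t" + sha256Hex(file);
    }
}
```

Recording the digest alongside the format allows a later fixity check by recomputing the SHA-256 value and comparing it with the manifest.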