Validate and report filetypes per file

Title Validate and report filetypes per file
Detailed description To report file type discrepancies on a per-file basis, we:
 - open each file from the archive
 - analyse the file type using DROID
 - compare the file type with the one recorded in the web archive and report any mismatch (we may need to normalize variant spellings of the same MIME type, e.g. image/jpg = image/jpeg)
 - export the report as CSV
Investigate whether this tool can be offered as a module/add-on for the jhove2-bfn fork.
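The compare-and-report steps in the description above could be sketched roughly as follows. This is a minimal illustration, not the heritrix-imp code: the function names, the alias table and the CSV column names are all assumptions, and the DROID identification itself is not shown (its result is simply passed in as a string).

```python
import csv
import io

# Hypothetical alias table: maps variant spellings to one canonical MIME type.
MIME_ALIASES = {
    "image/jpg": "image/jpeg",
    "image/pjpeg": "image/jpeg",
}


def normalize_mime(value):
    """Drop parameters (e.g. '; charset=utf-8'), lower-case, fix backslashes, apply aliases."""
    base = value.split(";", 1)[0].strip().lower().replace("\\", "/")
    return MIME_ALIASES.get(base, base)


def find_mismatches(records):
    """Yield the (name, archive_mime, identified_mime) tuples whose normalized types differ."""
    for name, archive_mime, identified_mime in records:
        if normalize_mime(archive_mime) != normalize_mime(identified_mime):
            yield name, archive_mime, identified_mime


def write_report(records, out):
    """Write one CSV row per mismatch to the file-like object 'out'; return the mismatch count."""
    writer = csv.writer(out)
    writer.writerow(["file", "archive_mimetype", "identified_mimetype"])
    count = 0
    for row in find_mismatches(records):
        writer.writerow(row)
        count += 1
    return count
```

In a real run, each record would pair the content type recorded in the archive with the type the identification tool reports for the unpacked file; that wiring is left out here.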
Solution Champion Lucien van wouw <[email protected]>
Corresponding Issue(s) Identifying web content
Tool/code link heritrix-imp
Tool Registry Link Heritrix and Droid
Evaluation Relatively time-consuming, as the entire archive needs to be unpacked before each file can be identified. But the end result clearly shows, in a simple CSV, the differences between the archive's format description and the external tool (DROID in this case). The solution could be seen as a stand-alone validation tool, but with regard to the problem description it ought to be seen as a proof of concept. That is, similar functionality ought to be ported into the jhove2-bfn module.
Labels: solution, quality_assurance, identification