Title | Validate and report filetypes per file |
Detailed description | To report file type discrepancies on a per-file basis, we: - open each file from the archive - analyse the file type using DROID - compare that file type with the one recorded in the web archive and report any mismatch (we may need to normalize variant expressions of the same MIME type, e.g. image/jpg = image/jpeg) - export the report as CSV. Investigate whether this tool can be offered as a module/add-on for jhove2-bfn |
Solution Champion | Lucien van wouw <[email protected]> |
Corresponding Issue(s) | Identifying web content |
Tool/code link | heritrix-imp |
Tool Registry Link | Heritrix |
Evaluation | Relatively time-consuming, as the entire archive needs to be unpacked before each file can be identified. But the end result clearly shows, in a simple CSV, the differences between the archive's format description and the external tool (DROID in this case). The solution could be seen as a stand-alone validation tool, but with regard to the problem description it ought to be seen as a proof of concept. That is, similar functionality ought to be ported into the jhove2-bfn module. |
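
The compare-and-report step from the detailed description could be sketched roughly as follows. This is a minimal illustration, not the heritrix-imp code: the alias table, the function names and the sample records are all assumptions, and the identified MIME types are hard-coded here rather than taken from an actual DROID run.

```python
import csv
import io

# Hypothetical alias table: variant spellings mapped to one canonical MIME type.
# Only the image/jpg case comes from the description; extend as needed.
MIME_ALIASES = {
    "image/jpg": "image/jpeg",
}


def normalize_mime(value):
    """Drop parameters (e.g. charset), lower-case, fix backslashes, apply aliases."""
    base = value.split(";", 1)[0].strip().lower().replace("\\", "/")
    return MIME_ALIASES.get(base, base)


def find_mismatches(records):
    """Return the records whose two MIME types differ after normalization.

    records: iterable of (file_name, archive_mime, identified_mime) tuples.
    """
    return [
        (name, archive_mime, identified_mime)
        for name, archive_mime, identified_mime in records
        if normalize_mime(archive_mime) != normalize_mime(identified_mime)
    ]


def write_report(rows, out):
    """Write the mismatches as CSV, preceded by a header row."""
    writer = csv.writer(out)
    writer.writerow(["file", "archive_mime", "identified_mime"])
    writer.writerows(rows)


# Made-up sample data: (file, type recorded in the archive, type from the identifier).
sample = [
    ("a.jpg", "image/jpg", "image/jpeg"),
    ("b.html", "text/html; charset=UTF-8", "text/html"),
    ("c.gif", "image/gif", "image/png"),
]

mismatches = find_mismatches(sample)
buffer = io.StringIO()
write_report(mismatches, buffer)
print(buffer.getvalue())
```

Normalizing both sides before comparing keeps spelling variants such as image/jpg out of the report, so only genuine disagreements (here, c.gif) end up in the CSV.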