
| *Title* | Validate and report filetypes per file |
| *Detailed description* | _To report filetype discrepancies on a per-file basis, we:_ \\
_ \- open each file from the archive_ \\
_ \- analyse the filetype using Droid_ \\
_ \- compare the filetype with the one recorded in the web archive and report any mismatch (we may need to normalize variant spellings of the same MIME type, e.g. image/jpg = image/jpeg)_ \\
_ \- export the report as CSV_ \\
_Investigate whether this tool can be offered as a module/add-on for the_ _[jhove2-bfn|]_ _fork_ \\ |
| *Solution Champion* | Lucien van wouw <[email protected]> \\ |
| *Corresponding Issue(s)* | [REQ:Identifying web content]\\ |
| *Tool/code link* | [heritrix-imp|] |
| *[Tool Registry Link|]* | [Heritrix|] and [Droid|]\\ |
| *Evaluation* | _Relatively time-consuming, as the entire archive needs to be unpacked before each file can be identified. However, the end result clearly shows, in a simple CSV, the differences between the filetype recorded in the archive and the one reported by the external tool (Droid in this case). The solution could be seen as a stand-alone validation tool, but with regard to the problem description it ought to be seen as a proof of concept. That is, similar functionality ought to be ported into the jhove2-bfn module._ |
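
The compare-and-report steps described above can be sketched roughly as follows. This is only an illustration, not the code of the linked tool: it assumes the MIME type recorded in the archive and the one reported by the identification tool have already been collected for each file, and the names `MIME_ALIASES`, `normalize_mime`, `find_mismatches` and `write_report`, as well as the CSV column names, are hypothetical.

```python
import csv
import io

# Hypothetical alias table: maps variant spellings to one canonical MIME type.
# Only the image/jpg example from the description is included here.
MIME_ALIASES = {
    "image/jpg": "image/jpeg",
}


def normalize_mime(value):
    """Lower-case, drop parameters such as '; charset=...', and resolve aliases."""
    base = value.split(";", 1)[0].strip().lower()
    return MIME_ALIASES.get(base, base)


def find_mismatches(records):
    """Yield (path, archive_mime, tool_mime) rows whose normalized types differ."""
    for path, archive_mime, tool_mime in records:
        if normalize_mime(archive_mime) != normalize_mime(tool_mime):
            yield (path, archive_mime, tool_mime)


def write_report(records, out):
    """Write the mismatching rows as CSV to `out`; return the number of mismatches."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["file", "archive_mimetype", "identified_mimetype"])
    rows = list(find_mismatches(records))
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Made-up sample input: (file path, type recorded in archive, type from tool).
    sample = [
        ("img/logo.jpg", "image/jpg", "image/jpeg"),
        ("index.html", "text/html; charset=UTF-8", "text/html"),
        ("doc/report", "text/html", "application/pdf"),
    ]
    buf = io.StringIO()
    write_report(sample, buf)
    print(buf.getvalue(), end="")
```

Normalizing both values before comparing keeps spelling variants and parameters out of the report, while the CSV still carries the original, unnormalized strings so that the raw values remain visible.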