
| *Title* \\ | Checking significant properties of documents have been retained after migration |
| *Detailed description* | At the Archaeology Data Service we follow a migration-based preservation strategy, so documents are routinely migrated into new formats for either preservation or dissemination. We can batch process these file migrations, but we cannot yet automate a check that the significant properties of each file have been retained from one version to the next. We often spot-check a couple of randomly chosen files in each batch, but this is not foolproof, so an automated check and comparison of a few key quantifiable properties would be very valuable: \\
* font (possibly the most important, as a change of font can shift the page numbering)
* number of pages
* number of words
* number of characters
* number of images/graphics |
| *Issue champion* | [~jennymitcham] \\ |
| *Other interested parties* \\ | _Any other parties who are also interested in applying Issue Solutions to their Datasets_ |
| *Possible Solution approaches* | _Several tools from previous events already characterise Word documents:_
* _Apache POI tool on AQUA_
* _Office Analyser tool by Andy Jackson_
* _PLANETS tool by Maurice?_
We need to establish which of these we can use, and then create similar ways of characterising PDF, PDF/A and OpenOffice files (odt/sxw).\\ |
| *Context* | It is essential that we can demonstrate the authenticity of the files we preserve, and checking files after migration should be part of this. \\ |
| *Lessons Learned* | _Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice_ \\ |
| *Datasets* | [SPR:Archaeology Data Service archive]\\ |
| *Solutions* | _Reference to the appropriate Solution page(s), by hyperlink_ |
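
The comparison described in the issue could work roughly as follows. This is a minimal Python sketch, assuming the properties have already been extracted from both versions of a file by some characterisation tool; {{SignificantProperties}}, {{compare_properties}} and the per-property tolerance are hypothetical names introduced for illustration, not part of any of the tools listed on this page.

```python
from dataclasses import dataclass, fields
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class SignificantProperties:
    """Hypothetical container for the key properties listed in the description."""
    fonts: FrozenSet[str]
    page_count: Optional[int]
    word_count: Optional[int]
    character_count: Optional[int]
    image_count: Optional[int]


def compare_properties(
    before: SignificantProperties,
    after: SignificantProperties,
    tolerances: Optional[Dict[str, int]] = None,
) -> Dict[str, Tuple[object, object]]:
    """Return {property: (before_value, after_value)} for every property that changed.

    `tolerances` (an assumption of this sketch) maps a numeric property name to the
    allowed absolute difference. A value that could only be measured on one side
    (None on the other) is always reported as a difference.
    """
    tolerances = tolerances or {}
    differences = {}
    for field in fields(SignificantProperties):
        old = getattr(before, field.name)
        new = getattr(after, field.name)
        if old == new:
            continue  # identical values, including both unmeasured (None)
        tolerance = tolerances.get(field.name, 0)
        if isinstance(old, int) and isinstance(new, int) and abs(old - new) <= tolerance:
            continue  # small numeric drift accepted for this property
        differences[field.name] = (old, new)
    return differences


if __name__ == "__main__":
    original = SignificantProperties(frozenset({"Arial"}), 12, 4500, 27000, 3)
    migrated = SignificantProperties(frozenset({"Liberation Sans"}), 13, 4498, 27000, 3)
    # Font and page count are reported; the word count is within its tolerance.
    print(compare_properties(original, migrated, tolerances={"word_count": 5}))
```

A per-property tolerance is included because word and character counts can vary slightly between extraction tools, whereas any change of font is always flagged, since it is the property most likely to shift the page numbering.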
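
For OpenOffice files in ODF format, some of these properties may be readable without a separate tool: an ODF package is a ZIP archive whose {{meta.xml}} can contain a {{meta:document-statistic}} element with attributes such as {{meta:page-count}}, {{meta:word-count}}, {{meta:character-count}} and {{meta:image-count}}. The Python sketch below reads them using only the standard library. It assumes an ODF file such as .odt; older .sxw files use a different OpenOffice 1.x namespace and are not handled, fonts are not extracted, and the figures are whatever the saving application recorded, so they may be missing or out of date. {{read_odf_statistics}} is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespaces used in meta.xml.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"

# Attributes of <meta:document-statistic> relevant to this issue.
STAT_ATTRIBUTES = ("page-count", "word-count", "character-count", "image-count")


def read_odf_statistics(path_or_file):
    """Return the stored document statistics from an ODF package's meta.xml.

    The values are read as recorded by the application that saved the file;
    nothing is recomputed here. Attributes that are absent are simply omitted.
    """
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("meta.xml"))
    # meta.xml layout: <office:document-meta><office:meta><meta:document-statistic/>
    stats_element = root.find(f"{{{OFFICE_NS}}}meta/{{{META_NS}}}document-statistic")
    if stats_element is None:
        return {}
    stats = {}
    for name in STAT_ATTRIBUTES:
        value = stats_element.get(f"{{{META_NS}}}{name}")
        if value is not None:
            stats[name] = int(value)
    return stats
```

Running it on the original and migrated versions of an .odt file gives two dictionaries that can be compared directly, or mapped onto whatever comparison structure the check uses.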