Checking that significant properties are preserved after migration

| *Title* \\ | Checking that significant properties are preserved after migration |
| *Detailed description* | _After a file migration (from PDF to PDF/A, DOC to DOCX, or DOC to PDF/A) we should check that the conversion has been successful and that the significant properties of the object are maintained. We do not do this consistently at present. We may check a handful of files after a batch process, but that means we are likely to miss the one conversion that has not been successful. It would be great to have a tool that could open the two documents (the original and the migrated file), compare several quantifiable metrics (for example word count, page count, number of images, paragraph count, or anything else measurable) and report on those conversions where the numbers don't match up. These could then be assessed by eye individually and re-migrated if necessary._ \\ |
| *Issue champion* | _Jenny Mitcham [email protected]_  (Archaeology Data Service) \\ |
| *Issue champion* | [~jennymitcham]\\ |
| *Other interested parties* \\ | _Any other parties who are also interested in applying Issue Solutions to their Datasets_ |
| *Possible Solution approaches* | * Apache POI for looking inside MS Word docs - see what metrics can be extracted