
h1. Resources

h2. Software

* [JHOVE |] The bespoke PDF module used by the digital preservation (DP) community.
* [Apache Tika |] An open-source file characterisation and content extraction toolkit.
* [Apache PDFBox |] The open-source PDF parsing library that powers PDF handling in [Apache Tika |].
* [pdfeh |] A wrapper around the Apache PDFBox Preflight functionality.
* [pdf-preflight |] A Ruby preflight project on GitHub.

h1. Ideas

Some ideas for tools that could usefully be built during the hackathon:

* Create a scalable test of whether a PDF file can be opened by Adobe Acrobat Reader, using the [PDF Library |].
* Create a scalable comparison workflow that renders both PDF files (the original and the new representation) to images and compares them, e.g. with the matchbox tool, to detect visible differences.
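The comparison step of the second idea could start from something as simple as the sketch below. It assumes both PDFs have already been rendered page by page to images at the same resolution (for example with PDFBox's PDFRenderer). It substitutes a plain pixel-by-pixel diff for matchbox, which matches image features rather than raw pixels. The class name PageDiff, the tolerance of 16 and the 0.1% threshold are hypothetical choices for illustration only.

{code:java}
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/**
 * Hypothetical helper: compares two rendered page images pixel by pixel.
 * Assumes both images were rendered at the same resolution beforehand.
 * This is a plain pixel diff, not the matchbox feature-matching approach.
 */
public final class PageDiff {

    private PageDiff() {}

    /** Returns the fraction of pixels whose channels differ by more than tolerance (0-255). */
    public static double diffRatio(BufferedImage a, BufferedImage b, int tolerance) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return 1.0; // different page geometry: treat the pages as entirely different
        }
        long differing = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (channelDelta(a.getRGB(x, y), b.getRGB(x, y)) > tolerance) {
                    differing++;
                }
            }
        }
        return (double) differing / ((long) a.getWidth() * a.getHeight());
    }

    /** Largest absolute difference across the alpha, red, green and blue channels of two ARGB pixels. */
    private static int channelDelta(int p, int q) {
        int max = 0;
        for (int shift = 0; shift <= 24; shift += 8) {
            int d = Math.abs(((p >>> shift) & 0xFF) - ((q >>> shift) & 0xFF));
            max = Math.max(max, d);
        }
        return max;
    }

    /** Usage: java PageDiff original-page.png migrated-page.png */
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PageDiff <original.png> <migrated.png>");
            System.exit(2);
        }
        BufferedImage original = ImageIO.read(new File(args[0]));
        BufferedImage migrated = ImageIO.read(new File(args[1]));
        if (original == null || migrated == null) {
            System.err.println("could not decode one of the images");
            System.exit(2);
        }
        double ratio = diffRatio(original, migrated, 16); // tolerance 16: hypothetical, absorbs anti-aliasing noise
        System.out.printf("%.4f of pixels differ%n", ratio);
        System.exit(ratio > 0.001 ? 1 : 0); // 0.1% threshold: hypothetical example value
    }
}
{code}

Because each page pair is compared independently, the comparison parallelises naturally across pages and files, which is what would make the workflow scalable. A strict pixel diff is sensitive to small layout shifts, which is one reason a feature-based tool such as matchbox may be preferable in practice.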