h1. Resources

h2. Software

* [JHOVE |https://github.com/gmcgath/jhove] Format validation tool whose bespoke PDF module is used by the digital preservation (DP) community.
* [Apache Tika |https://tika.apache.org/] Open source characterisation and content extraction tool.
* [Apache PDFBox |http://pdfbox.apache.org] The open source PDF parsing library that powers [Apache Tika |https://tika.apache.org/].
* [pdfeh |https://github.com/openplanets/pdfeh] A wrapper around the PDFBox Preflight functionality.
* [pdf-preflight |https://github.com/yob/pdf-preflight] A Ruby pre-flight project on GitHub.

h1. Ideas

Some ideas for tools that would be useful to build during the Hackathon.

* Create a scalable test of whether a PDF file can be opened by Adobe Acrobat Reader, using the [PDF Library |http://www.adobe.com/devnet/pdf/library.html].
* Create a scalable comparison workflow that converts both PDF files (the original and the new representation) to images and compares them, e.g. with the matchbox tool, to check for visible differences.
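
The comparison idea could be prototyped roughly as follows. This is only a sketch under several assumptions: Poppler's {{pdftoppm}} command is installed and used for rendering, a naive per-pixel difference stands in for the matchbox tool, and the function names ({{render_pages}}, {{read_pgm}}, {{diff_ratio}}, {{compare_pdfs}}) and the 1% threshold are hypothetical choices made for illustration.

```python
import re
import subprocess
import tempfile
from pathlib import Path


def render_pages(pdf_path, out_dir, dpi=72):
    """Rasterise each page to a greyscale PGM file with Poppler's pdftoppm."""
    stem = Path(pdf_path).stem
    prefix = Path(out_dir) / stem
    subprocess.run(
        ["pdftoppm", "-r", str(dpi), "-gray", str(pdf_path), str(prefix)],
        check=True,
    )
    # pdftoppm zero-pads page numbers, so a lexical sort keeps page order.
    return sorted(Path(out_dir).glob(stem + "-*.pgm"))


def read_pgm(path):
    """Read a binary (P5) PGM file without header comments.

    Returns (width, height, pixel_bytes).
    """
    data = Path(path).read_bytes()
    header = re.match(rb"P5\s+(\d+)\s+(\d+)\s+(\d+)\s", data)
    if header is None:
        raise ValueError(f"{path} is not a plain P5 PGM file")
    width, height = int(header.group(1)), int(header.group(2))
    return width, height, data[header.end():]


def diff_ratio(pixels_a, pixels_b, tolerance=0):
    """Fraction of pixels whose grey values differ by more than tolerance."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    if not pixels_a:
        return 0.0
    differing = sum(
        1 for a, b in zip(pixels_a, pixels_b) if abs(a - b) > tolerance
    )
    return differing / len(pixels_a)


def compare_pdfs(original, migrated, threshold=0.01):
    """Return True if every page pair differs in at most threshold of pixels."""
    with tempfile.TemporaryDirectory() as dir_a, \
            tempfile.TemporaryDirectory() as dir_b:
        pages_a = render_pages(original, dir_a)
        pages_b = render_pages(migrated, dir_b)
        if len(pages_a) != len(pages_b):
            return False
        for page_a, page_b in zip(pages_a, pages_b):
            width_a, height_a, pixels_a = read_pgm(page_a)
            width_b, height_b, pixels_b = read_pgm(page_b)
            if (width_a, height_a) != (width_b, height_b):
                return False
            if diff_ratio(pixels_a, pixels_b) > threshold:
                return False
    return True
```

Calling {{compare_pdfs("original.pdf", "migrated.pdf")}} (hypothetical file names) would render both files and report whether they look alike page by page. Because each file pair is processed independently, a batch of pairs could be distributed across workers to make the workflow scale. A pixel difference is sensitive to small rendering shifts, so a more robust comparison tool such as matchbox may be preferable in practice.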