- JHOVE: bespoke PDF module used by the DP community.
- Apache Tika: open source characterisation / content extraction tool.
- Apache PDFBox: the open source PDF parsing library that powers Apache Tika.
- pdfeh: a wrapper around the PDFBox preflight functionality.
- pdf-preflight: a Ruby pre-flight project on GitHub.
Some ideas for tools that could be useful to build during the Hackathon:
- Create a scalable test of whether a PDF file can be opened by Acrobat Reader, using the PDF Library (though I guess that is not open source).
- Create a scalable comparison workflow that converts both PDF files (the original and the new representation) to images and compares them, e.g. with the matchbox tool, to check for visible differences.
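
The comparison idea in the last bullet could be prototyped roughly as follows. This is only a sketch under several assumptions: pages are rendered with Poppler's `pdftoppm` (assumed to be installed and on the PATH), and a plain pixel-difference ratio stands in for matchbox, which is not used here. The function names, the `tolerance` and `threshold` values, and the directory layout are all made up for illustration.

```python
import subprocess
from pathlib import Path


def render_pages(pdf_path, out_dir, dpi=72):
    """Render every page to a grayscale PGM file using pdftoppm (assumed installed)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["pdftoppm", "-gray", "-r", str(dpi), str(pdf_path), str(out_dir / "page")],
        check=True,
    )
    # Assumption: page numbers in the file names are zero-padded, so sorting gives page order.
    return sorted(out_dir.glob("page-*.pgm"))


def read_pgm(data):
    """Parse 8-bit binary PGM (P5) bytes into (width, height, pixel_bytes)."""
    fields, pos = [], 0
    while len(fields) < 4:
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":  # header comment runs to end of line
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        fields.append(data[start:pos])
    width, height, maxval = int(fields[1]), int(fields[2]), int(fields[3])
    if fields[0] != b"P5" or maxval > 255:
        raise ValueError("only 8-bit binary PGM is supported")
    # Exactly one whitespace byte separates the header from the pixel data.
    pixels = data[pos + 1:pos + 1 + width * height]
    if len(pixels) != width * height:
        raise ValueError("truncated pixel data")
    return width, height, pixels


def diff_ratio(pgm_a, pgm_b, tolerance=16):
    """Fraction of pixels whose gray values differ by more than `tolerance`."""
    wa, ha, pa = read_pgm(pgm_a)
    wb, hb, pb = read_pgm(pgm_b)
    if (wa, ha) != (wb, hb):
        return 1.0  # different page sizes count as fully different
    if not pa:
        return 0.0
    changed = sum(1 for x, y in zip(pa, pb) if abs(x - y) > tolerance)
    return changed / len(pa)


def pdfs_look_alike(original, migrated, work_dir, threshold=0.01):
    """True if both PDFs have the same page count and every page pair is nearly identical."""
    pages_a = render_pages(original, Path(work_dir) / "original")
    pages_b = render_pages(migrated, Path(work_dir) / "migrated")
    if len(pages_a) != len(pages_b):
        return False
    return all(
        diff_ratio(a.read_bytes(), b.read_bytes()) <= threshold
        for a, b in zip(pages_a, pages_b)
    )
```

A call such as `pdfs_look_alike("original.pdf", "migrated.pdf", "work")` (hypothetical file names) would then give a yes/no answer per file pair. Because each pair is independent, the check could be run over many files in parallel. A raw pixel comparison is sensitive to small rendering shifts, which is one reason a feature-based tool such as matchbox may be the better choice in practice.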