
Resources
Software
- JHOVE
Bespoke PDF module used by the digital preservation (DP) community.
- Apache Tika
Open Source characterisation / content extraction tool.
- Apache PDFBox
The open-source Java PDF library that Apache Tika uses to parse PDF files.
- pdfeh
A wrapper around the Preflight (PDF/A validation) functionality of Apache PDFBox.
- pdf-preflight
A Ruby pre-flight project on GitHub.
Ideas
Some ideas for tools that could usefully be built during the Hackathon:
- Create a scalable test of whether a PDF file can be opened by Acrobat Reader, using the PDF Library
- Create a scalable comparison workflow that renders both PDF files (the original and the new representation) to images and compares them, e.g. with the matchbox tool, to detect visible differences
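The comparison step of the second idea can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not matchbox: it assumes each page of both PDFs has already been rendered to an 8-bit greyscale PGM image (for example with poppler's `pdftoppm -gray -r 150`), and it uses a plain pixel-difference measure instead of matchbox's comparison. The function names and the `tolerance` and `max_ratio` thresholds are illustrative assumptions that would need tuning on real data.

```python
"""Hypothetical sketch: flag visible differences between two rendered PDF pages.

Assumes both pages were rendered to binary 8-bit greyscale PGM (P5) images,
e.g. with `pdftoppm -gray`. This is a simple pixel comparison, not matchbox.
"""


def read_pgm(data: bytes) -> tuple[int, int, bytes]:
    """Parse a binary (P5) PGM image with maxval <= 255; return (width, height, pixels)."""
    tokens: list[bytes] = []
    pos = 0
    # The header holds four whitespace-separated tokens: magic, width, height, maxval.
    while len(tokens) < 4:
        # Skip whitespace between tokens.
        while data[pos:pos + 1].isspace():
            pos += 1
        # Skip '#' comments up to the end of the line.
        if data[pos:pos + 1] == b"#":
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1  # a single whitespace byte separates the header from the pixel data
    magic, width, height, maxval = tokens[0], int(tokens[1]), int(tokens[2]), int(tokens[3])
    if magic != b"P5" or maxval > 255:
        raise ValueError("expected an 8-bit binary PGM (P5) image")
    pixels = data[pos:pos + width * height]
    if len(pixels) != width * height:
        raise ValueError("truncated pixel data")
    return width, height, pixels


def diff_ratio(a: bytes, b: bytes, tolerance: int = 16) -> float:
    """Fraction of pixels whose grey levels differ by more than `tolerance`."""
    wa, ha, pa = read_pgm(a)
    wb, hb, pb = read_pgm(b)
    if (wa, ha) != (wb, hb):
        return 1.0  # different page dimensions: treat the pages as entirely different
    if wa * ha == 0:
        return 0.0  # empty images have nothing to compare
    changed = sum(1 for x, y in zip(pa, pb) if abs(x - y) > tolerance)
    return changed / (wa * ha)


def pages_differ(a: bytes, b: bytes, tolerance: int = 16, max_ratio: float = 0.001) -> bool:
    """Report a visible difference when more than `max_ratio` of the pixels changed."""
    return diff_ratio(a, b, tolerance) > max_ratio
```

Because each page pair is compared independently, the check parallelises naturally: `pages_differ` can be mapped over all page pairs with `concurrent.futures`, or run as one job per document on a cluster, which is where the "scalable" part of the idea would come in.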