
Description
Some software applications produce PDFs that do not conform to the PDF format specification (PDF 1.7 / ISO 32000-1 or the earlier pre-ISO specifications).
Risks
- The PDF may not render correctly (or may not render at all)
- Future migration to an alternative format may result in loss of data (or may fail altogether)
Assessment
Validation is problematic for PDF, mainly because of the complexity of the format and the lack of reliable tools.
- JHOVE includes a PDF module, but it doesn't support PDF 1.7 / ISO 32000 (yet?). In addition, its principal author considers JHOVE to be "approaching the end of its life".
- The website of the PDF Association lists a number of commercially available tools that validate PDF (presumably ISO 32000?) and/or PDF/A.
- Presentation by Duff Johnson on the possibilities of an open source PDF validator that may be developed at some point.
Apache Preflight (part of Apache PDFBox) does not validate against the PDF format specification. However, it does include a Processing error category, which is described as "not necessarily a specific PDF/A validation error but a PDF specification requirement that isn't respected". If Preflight raises an exception, this may also indicate a malformed file.
Reference file | Description | Error Code(s) | Details |
sample file needed | Malformed PDF | 8 | Processing error – replace with actual error message |
sample file needed | Malformed PDF | 8.1 | Mandatory element missing (possibly malformed PDF) |
sample file needed | Malformed PDF | Exception | |
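
The Preflight indicators described above could be turned into a simple triage rule. The following is a minimal sketch, not part of Preflight itself. It assumes the error codes have already been extracted from Preflight's output as strings such as "8" or "8.1", and that the caller knows whether Preflight raised an exception. The function names are hypothetical.

```python
from typing import Iterable


def is_processing_error(code: str) -> bool:
    """Return True if a Preflight error code falls in category 8.

    Category 8 is the "Processing error" category; "8" and "8.1" are
    the codes listed for malformed PDFs in this entry.
    """
    # Compare the first dotted component so that "8" and "8.1" match,
    # while an unrelated code such as "80" does not.
    return code.strip().split(".")[0] == "8"


def possibly_malformed(error_codes: Iterable[str],
                       raised_exception: bool = False) -> bool:
    """Flag a file as possibly malformed.

    A file is flagged if Preflight raised an exception, or if any
    reported error code belongs to the Processing error category.
    This is only an indication, not a validation result.
    """
    if raised_exception:
        return True
    return any(is_processing_error(code) for code in error_codes)
```

For example, `possibly_malformed(["8.1"])` and `possibly_malformed([], raised_exception=True)` both flag the file, whereas a list that contains only codes from other categories does not. A flagged file still needs manual inspection, since Preflight does not validate against the PDF format specification.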