Some software applications produce PDFs that do not conform to the PDF format specification (PDF 1.7 / ISO 32000-1 or the earlier pre-ISO specifications).
- PDF may not render correctly (or may not render at all)
- Future migration to an alternative format may result in loss of data (or it may fail altogether)
Validation is problematic for PDF, mainly because of the complexity of the format and the lack of reliable tools.
- JHOVE includes a PDF module, but it doesn't support PDF 1.7 / ISO 32000 (yet?). In addition, its principal author considers JHOVE to be "approaching the end of its life".
- The website of the PDF Association lists a number of commercially available tools that validate PDF (presumably ISO 32000?), PDF/A, or both.
Apache Preflight (part of Apache PDFBox) does not validate against the PDF format specification. However, it does include a Processing error category, which is described as "not necessarily a specific PDF/A validation error but a PDF specification requirement that isn't respected". In addition, if Preflight raises an exception, this may indicate a malformed file.
|Reference file||Description||Error Code(s)||Details|
|sample file needed||Malformed PDF||8||Processing error – replace with actual error message|
|sample file needed||Malformed PDF||8.1||Mandatory element missing (possibly malformed PDF)|
|sample file needed||Malformed PDF||Exception||Preflight raises an exception (possibly malformed PDF)|
- No authoritative or generally accepted tools exist for PDF validation. However, running Apache Preflight and checking its output for processing errors will at least detect PDFs that are seriously malformed.
- Use Apache Preflight and check for processing errors.
- In some cases it may be possible to obtain an intact version of malformed files from the original depositor/publisher.
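
The Preflight check recommended above could be automated roughly as follows. This is only a sketch: it assumes Preflight's report has already been captured as text, and that each reported error appears on its own line as `<code> : <details>` (this line format is an assumption and may differ between Preflight versions). The function names are hypothetical. Error code 8 and its subcodes (such as 8.1) are treated as processing errors, following the table above.

```python
import re
from typing import List, Tuple

# ASSUMPTION: one error per line, shaped like "8.1 : Mandatory element missing".
# Adjust this pattern to the actual output of your Preflight version.
ERROR_LINE = re.compile(r"^\s*(\d+(?:\.\d+)*)\s*:\s*(.*)$")


def is_processing_error(code: str) -> bool:
    """Return True for error code 8 and its subcodes (8.1, 8.2, ...)."""
    return code == "8" or code.startswith("8.")


def find_processing_errors(report: str) -> List[Tuple[str, str]]:
    """Collect (code, details) pairs for processing errors in a report."""
    errors = []
    for line in report.splitlines():
        match = ERROR_LINE.match(line)
        if match and is_processing_error(match.group(1)):
            errors.append((match.group(1), match.group(2).strip()))
    return errors


def looks_malformed(report: str, raised_exception: bool = False) -> bool:
    """Flag a file if Preflight raised an exception or reported a processing error."""
    return raised_exception or bool(find_processing_errors(report))
```

A file flagged by `looks_malformed` is only a candidate for closer inspection. Preflight does not validate against the PDF specification itself, so an unflagged file is not thereby shown to be well-formed.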