In my sample, I found 35 error messages concerning PDF files that are invalid, malformed, or both:
- PdfMalformedException: Invalid name tree Offset: 541014
- Invalid destination object
- Expected dictionary for font entry in page resource
- Invalid object number in cross-reference stream
- Annotation object is not a dictionary
- Invalid object number or object stream
- Improperly constructed page tree
- Invalid page tree node
- Invalid Annotation property
- Invalid Resources Entry in document
- Invalid outline dictionary item
- Compression method is invalid or unknown to JHOVE
- Improperly formed date
- Invalid page dictionary object
- Lexical error
- No document catalog dictionary
- Malformed dictionary
- Invalid Font entry in Resources
- Invalid character in hex string
- Malformed outline dictionary
- Invalid dictionary data for page
- No PDF header
- Missing expected element in page number dictionary
- java.lang.ClassCastException: PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary
- java.lang.NullPointerException
- Invalid Names dictionary
- Improperly nested array delimiters
- Invalid cross-reference table
- Invalid ID in trailer
- Invalid object definition
- Unexpected error in findFonts
- Invalid outline dictionary object
- Outline dictionary missing required entry
- Malformed filter
- Improperly constructed page tree
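The same underlying problem can appear in more than one spelling: prefixed with the JHOVE exception class that raised it (as in "PdfMalformedException: Invalid name tree Offset: 541014") and with a file-specific byte offset appended. Before counting messages across many files, it can help to reduce each one to a comparable key. The following is a minimal Python sketch of such a normalization. The function name `normalize_message` and the exact prefix and offset patterns are assumptions inferred from the messages quoted here, not part of JHOVE itself.

```python
import re

# Assumption: JHOVE's PDF module prefixes some messages with its exception
# class, either fully qualified or bare (as seen in the lists in this post).
_PREFIX = re.compile(
    r"^(?:edu\.harvard\.hul\.ois\.jhove\.module\.pdf\.)?"
    r"Pdf(?:Malformed|Invalid)Exception:\s*"
)
# Assumption: a file-specific byte offset may trail the message text.
_OFFSET = re.compile(r"\s*,?\s*Offset:\s*\d+\s*$")


def normalize_message(message: str) -> str:
    """Reduce a JHOVE PDF error message to a key for grouping.

    Strips the PdfMalformedException/PdfInvalidException prefix and any
    trailing "Offset: N". java.lang.* exceptions are left untouched, since
    they indicate a failure inside JHOVE rather than a specific PDF defect.
    """
    text = message.strip()
    text = _PREFIX.sub("", text)
    text = _OFFSET.sub("", text)
    return text
```

With this key, "PdfMalformedException: Invalid name tree Offset: 541014" and a plain "Invalid name tree" would be counted as one message, while "java.lang.NullPointerException" stays a separate entry.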
The following messages are relevant for PDF/A compliance:
- No encryption dictionary
- No Encrypt or Info entries in trailer
- Document catalog dictionary specifies RFC1766 language
- Document catalog dictionary has no AA or OCProperties
- Form fields do not have AA actions
- No Launch, Sound, Movie, ResetForm, ImportData, or JavaScript actions
- Fonts have recognized encoding
- Uncalibrated color spaces have OutputIntent specified
- Page objects do not have Movie, Sound, or FileAttachment
- Non-text annotations have Contents key
- Unfiltered metadata stream
A typical sample of 4145 PDF files from our Open Access Repository contains 233 invalid PDF files, which produce the following error messages:
Sample consists of 31 different JHOVE error messages
1: 390 x ErrorMessage: Annotation object is not a dictionary
2: 82 x ErrorMessage: Compression method is invalid or unknown to JHOVE
3: 862 x ErrorMessage: Expected dictionary for font entry in page resource
4: 245 x ErrorMessage: Improperly constructed page tree
5: 42 x ErrorMessage: Improperly formed date
6: 1 x ErrorMessage: Improperly nested array delimiters
7: 131 x ErrorMessage: Invalid Annotation property
8: 5 x ErrorMessage: Invalid Font entry in Resources
9: 2 x ErrorMessage: Invalid Names dictionary
10: 93 x ErrorMessage: Invalid Resources Entry in document
11: 5 x ErrorMessage: Invalid character in hex string
12: 1 x ErrorMessage: Invalid cross-reference table
13: 44 x ErrorMessage: Invalid destination object
14: 4 x ErrorMessage: Invalid dictionary data for page
15: 470 x ErrorMessage: Invalid object number in cross-reference stream
16: 68 x ErrorMessage: Invalid object number or object stream
17: 86 x ErrorMessage: Invalid outline dictionary item
18: 23 x ErrorMessage: Invalid page dictionary object
19: 169 x ErrorMessage: Invalid page tree node
20: 10 x ErrorMessage: Lexical error
21: 2 x ErrorMessage: Malformed dictionary
22: 4 x ErrorMessage: Malformed dictionary: Vector must contain an even number of objects, but has 15
23: 5 x ErrorMessage: Malformed outline dictionary
24: 4 x ErrorMessage: Missing expected element in page number dictionary
25: 5 x ErrorMessage: No PDF header
26: 8 x ErrorMessage: No document catalog dictionary
27: 1055 x ErrorMessage: edu.harvard.hul.ois.jhove.module.pdf.PdfInvalidException: Invalid destination object
28: 2440 x ErrorMessage: edu.harvard.hul.ois.jhove.module.pdf.PdfMalformedException: Invalid name tree
29: 287 x ErrorMessage: edu.harvard.hul.ois.jhove.module.pdf.PdfMalformedException: Invalid object number or object stream
30: 2 x ErrorMessage: java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary
31: 2 x ErrorMessage: java.lang.NullPointerException
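The listing above reads like the output of a small counting script; the post does not say how it was produced. The following is a minimal Python sketch of one way to build such a tally, assuming the JHOVE reports were saved with JHOVE's plain-text output handler, where each problem appears on its own `ErrorMessage:` line. The script name `tally.py`, the report file names, and the helper functions are hypothetical. Sorting with Python's default string ordering puts uppercase before lowercase, which is consistent with the `edu.harvard.hul.ois.jhove` and `java.lang` entries appearing last above.

```python
import re
import sys
from collections import Counter
from pathlib import Path
from typing import Iterable

# Matches lines such as "  ErrorMessage: No PDF header" in JHOVE text output.
_ERROR_LINE = re.compile(r"^\s*ErrorMessage:\s*(.+?)\s*$")


def tally_error_messages(lines: Iterable[str]) -> Counter:
    """Count how often each distinct ErrorMessage text occurs."""
    counts: Counter = Counter()
    for line in lines:
        match = _ERROR_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def format_tally(counts: Counter) -> list[str]:
    """Render counts as '<n>: <count> x ErrorMessage: <text>', sorted by text."""
    rows = [f"Sample consists of {len(counts)} different JHOVE error messages"]
    for index, message in enumerate(sorted(counts), start=1):
        rows.append(f"{index}: {counts[message]} x ErrorMessage: {message}")
    return rows


if __name__ == "__main__":
    # Hypothetical usage: python tally.py report1.txt report2.txt ...
    all_lines: list[str] = []
    for name in sys.argv[1:]:
        text = Path(name).read_text(encoding="utf-8", errors="replace")
        all_lines.extend(text.splitlines())
    for row in format_tally(tally_error_messages(all_lines)):
        print(row)
```

Because the counts are per occurrence, a single broken file can contribute the same message many times; passing each message through a normalization step before counting would also merge variants such as the prefixed and plain "Invalid destination object" entries.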