Description
PDFs may contain file attachments. There are two ways to include an attachment in a PDF:
- Page-level attachments, which use a File Attachment Annotation (section 12.5.6.15 of ISO 32000)
- Document-level attachments, which are defined by the EmbeddedFiles entry in the document’s name dictionary (section 7.7.4 of ISO 32000)
Both mechanisms are just references to the actual file attachment data, which in both cases is stored as an Embedded File Stream in the document. However, an Embedded File Stream can also represent multimedia content (see also this blog post on embedded files in PDF), so the presence of such a stream alone cannot be used to identify a file attachment.
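As a rough illustration of the distinction above, the sketch below searches the raw bytes of a PDF for the name keys involved: `/Subtype /FileAttachment` (page-level annotation), `/EmbeddedFiles` (document-level name dictionary entry) and `/Type /EmbeddedFile` (embedded file stream). The function name `scan_pdf_bytes` and the result keys are made up for this example. This is only a heuristic: it misses dictionaries stored inside compressed object streams and embedded file streams that omit the optional `/Type` key, so it is no substitute for a real PDF parser or for Preflight.

```python
import re

# Name keys as used in ISO 32000; searching raw bytes for them is a heuristic only.
FILE_ATTACHMENT_ANNOT = re.compile(rb"/Subtype\s*/FileAttachment\b")
EMBEDDED_FILES_KEY = re.compile(rb"/EmbeddedFiles\b")
EMBEDDED_FILE_STREAM = re.compile(rb"/Type\s*/EmbeddedFile\b")


def scan_pdf_bytes(data: bytes) -> dict:
    """Report which attachment-related keys occur in the raw PDF bytes.

    Hypothetical helper: keys hidden in compressed object streams are not found.
    """
    return {
        "page_level_annotation": bool(FILE_ATTACHMENT_ANNOT.search(data)),
        "document_level_entry": bool(EMBEDDED_FILES_KEY.search(data)),
        "embedded_file_stream": bool(EMBEDDED_FILE_STREAM.search(data)),
    }
```

Note that `embedded_file_stream` on its own says nothing about attachments, since multimedia content can trigger it as well.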
Risks
Attachments can be in any format, so their long-term accessibility may be at risk. Attached malicious software is also a security risk.
Assessment
The following table shows the relevant output of Apache Preflight (part of Apache PDFBox) for PDFs with file attachments. Results obtained with Preflight 2.0.0:
| Reference file | Description | Error code(s) | Details |
|---|---|---|---|
| fileAttachment.pdf | Contains a document-level file attachment that is defined using the EmbeddedFiles entry in the document’s name dictionary | 1.2.9; 1.4.7 | Body Syntax error, EmbeddedFile entry is present in a FileSpecification dictionary; Trailer Syntax error, EmbeddedFile entry is present in the Names dictionary |
| fileAttachment_fileAttachmentAnnotation.pdf | Contains a page-level file attachment that is defined using a File Attachment Annotation | 1.2.9; 5.2.1 | Body Syntax error, EmbeddedFile entry is present in a FileSpecification dictionary; Forbidden field in an annotation definition, The subtype isn't authorized : FileAttachment |
| PDF___FileAttachment.pdf | From File Attachment Testing on Adobe Acrobat Engineering website | 1.4.7 | Trailer Syntax error, EmbeddedFile entry is present in the Names dictionary |
| non_ACRO___FileAttachment.pdf | From File Attachment Testing on Adobe Acrobat Engineering website | 1.4.7 | Trailer Syntax error, EmbeddedFile entry is present in the Names dictionary |
| non_PDF_ACRO___FileAttachment.pdf | From File Attachment Testing on Adobe Acrobat Engineering website | 1.4.7 | Trailer Syntax error, EmbeddedFile entry is present in the Names dictionary |
Notes
Error 1.2.9 may also indicate multimedia content
Error code 1.2.9 ('EmbeddedFile entry is present in a FileSpecification dictionary') is also reported for PDFs that contain multimedia content represented as Embedded File Streams (see above), so on its own it does not prove that a file attachment is present.
Page-level and document-level attachments result in different errors
Also note from the above results that a document-level attachment produces error 1.4.7 (Trailer Syntax error, EmbeddedFile entry is present in the Names dictionary), whereas a page-level file attachment will result in error 5.2.1 ('Forbidden field in an annotation definition, The subtype isn't authorized : FileAttachment'). For the second case the error message as a whole should be taken into account, as 5.2.1 is a generic error code that encompasses a number of different annotation types.
Preflight doesn't report Embedded File Stream for Acrobat Engineering PDFs
The table above shows that error 1.2.9 isn't reported for the Acrobat Engineering PDFs, even though a manual check in a hex editor confirms that these files do contain embedded file streams. This is most likely a bug in Preflight (reported here).
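The three notes above can be summarised as a small decision rule. The sketch below assumes the Preflight results have already been parsed into (error code, details) pairs like those in the table; the parsing step, the function name `classify_preflight_errors` and the label strings are made up for this example. Error 1.4.7 is treated as sufficient evidence of a document-level attachment because 1.2.9 is not always reported, and 5.2.1 is only counted when the message mentions FileAttachment.

```python
DOCUMENT_LEVEL = "document-level attachment"
PAGE_LEVEL = "page-level attachment"
EMBEDDED_STREAM = "embedded file stream (attachment or multimedia)"


def classify_preflight_errors(errors):
    """Map (code, details) pairs from Preflight onto attachment findings.

    Hypothetical helper; `errors` is assumed to be parsed from Preflight output.
    """
    findings = set()
    for code, details in errors:
        if code == "1.4.7":
            # EmbeddedFile entry is present in the Names dictionary
            findings.add(DOCUMENT_LEVEL)
        elif code == "5.2.1" and "FileAttachment" in details:
            # 5.2.1 is generic, so the annotation subtype in the message matters
            findings.add(PAGE_LEVEL)
        elif code == "1.2.9":
            # Ambiguous on its own: may also indicate multimedia content
            findings.add(EMBEDDED_STREAM)
    return findings
```

A file for which only `EMBEDDED_STREAM` is returned needs a closer look, as it may contain multimedia rather than an attachment.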
Recommendations
Pre-ingest
- Formulate policy on how to deal with file attachments, and the long-term accessibility requirements of attached files.
- Use Apache Preflight to establish if files contain file attachments.
- If attached files are to remain accessible in the long term, a possible option would be to extract attached files before ingest, and ingest the attachments as supplementary file objects to the PDF.
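As an illustration of the last option, the sketch below stores already-extracted attachments in a directory alongside the PDF so they can be ingested as supplementary objects. It assumes the extraction itself has been done by a PDF library and yields a mapping of attachment name to bytes; the function name `write_attachments` and the `<pdf name>_attachments` directory convention are hypothetical. Attachment names come from the PDF and are untrusted, so only the final path component is used.

```python
from pathlib import Path


def write_attachments(pdf_path, attachments, out_root):
    """Write extracted attachments to <out_root>/<pdf stem>_attachments/.

    Hypothetical helper; `attachments` maps attachment name -> bytes and is
    assumed to come from a PDF library of your choice.
    """
    target = Path(out_root) / (Path(pdf_path).stem + "_attachments")
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, data in attachments.items():
        # Drop any directory part so a crafted name cannot escape the target
        safe_name = Path(name).name or "unnamed"
        out = target / safe_name
        out.write_bytes(data)
        written.append(out)
    return written
```

Attachments that share the same final name component would overwrite each other in this sketch; a production workflow would need to handle such collisions.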
Existing collections
- Use Apache Preflight to detect files with file attachments in the collection.
Example files
- PDF Cabinet of Horrors on OPF Format Corpus: http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/
- File Attachment Testing on Adobe Acrobat Engineering website: http://acroeng.adobe.com/wp/?page_id=276
References
- Van der Knijff, J.M. What do we mean by "embedded" files in PDF? Explains the use of Embedded File Streams for both file attachments and multimedia.