File attachments

Current version by Johan van der Knijff, Jul 11, 2014 15:10.

The table above shows that Preflight does not report error 1.2.9 for the Acrobat Engineering PDFs, even though a manual check in a hex editor confirms that these files do contain embedded file streams. This is most likely a bug in Preflight, which has been reported as [PDFBOX-1758|https://issues.apache.org/jira/browse/PDFBOX-1758].
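The manual hex-editor check can be approximated with a short script. The sketch below is a heuristic, not a PDF parser, and its function names are hypothetical. It searches the raw bytes of a PDF for the name {{/EmbeddedFile}}, which is the optional {{/Type}} value of an embedded file stream and also the prefix of the {{/EmbeddedFiles}} name tree key. It can miss attachments when the relevant dictionaries sit inside compressed object streams or use escaped names such as {{/Embedded#46ile}}, and it may flag files where the token occurs for unrelated reasons.

```python
import sys
from pathlib import Path

# PDF name used as the (optional) /Type of an embedded file stream. As a
# substring it also matches the /EmbeddedFiles name tree key in /Names.
TOKEN = b"/EmbeddedFile"


def has_embedded_file_hint(data: bytes) -> bool:
    """Heuristic check on raw PDF bytes, mirroring a hex-editor search.

    Does not decompress object streams or decode #xx name escapes, so a
    False result does not prove the file has no attachments.
    """
    return TOKEN in data


def scan_file(path: str) -> bool:
    """Read a file from disk and apply the byte-level heuristic."""
    return has_embedded_file_hint(Path(path).read_bytes())


if __name__ == "__main__":
    # Print one tab-separated line per file given on the command line.
    for name in sys.argv[1:]:
        status = "embedded-file-hint" if scan_file(name) else "none-found"
        print(f"{name}\t{status}")
```

A hit is best treated as a prompt for closer inspection, for example cross-checking against the Preflight result for the same file.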

h2. Recommendations

h3. Pre-ingest

* Formulate a policy on how to deal with file attachments, including the long-term accessibility requirements for attached files.
* Use [Apache Preflight|Apache PDFBox] to establish whether files contain file attachments.
* If attached files must remain accessible in the long term, one option is to extract them before ingest and ingest them as supplementary file objects alongside the PDF.

h3. Existing collections

* Use [Apache Preflight|Apache PDFBox] to detect files with file attachments in existing collections.
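When Preflight is run over a whole collection, its output has to be filtered for error 1.2.9. The sketch below assumes the plain-text output of the Preflight command-line validator, where each error is printed on its own line as {{<code> : <details>}}; this format is an assumption and may differ between PDFBox versions. The function names are hypothetical. Because of the bug described above, the absence of 1.2.9 does not prove that a file has no attachments.

```python
import re

# Assumed format of one error line in Preflight's text output:
#   "<error code> : <details>", e.g. "1.2.9 : ..."
ERROR_LINE = re.compile(r"^\s*(\d+(?:\.\d+)+)\s*:\s*(.*)$")

# Preflight error code for embedded files.
EMBEDDED_FILES_CODE = "1.2.9"


def error_codes(preflight_output: str) -> list[str]:
    """Return the error codes found in Preflight's text output, in order."""
    codes = []
    for line in preflight_output.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            codes.append(match.group(1))
    return codes


def reports_embedded_files(preflight_output: str) -> bool:
    """True if the output lists error 1.2.9 (exact code match)."""
    return EMBEDDED_FILES_CODE in error_codes(preflight_output)
```

Matching the code exactly, rather than searching for the substring "1.2.9", avoids false hits on longer codes or on numbers that happen to appear in the error details.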

h2. Example files
* [http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/] - PDF Cabinet of Horrors on the OPF Format Corpus