
h2. Description
PDF documents may contain embedded JavaScript, which a PDF viewer can execute, for example when the document is opened.

h2. Risks
Embedded JavaScript can be a security risk, because a PDF viewer may execute the code without the user being aware of it.

h2. Assessment
The following table shows the relevant output of _Apache Preflight_ (part of [Apache PDFBox]) for PDFs with JavaScript. Results obtained with _Preflight_ 2.0.0:

|*Reference file*|*Description*|*Error Code(s)*|*Details*|
|[javascript.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/javascript.pdf]|Contains embedded JavaScript|6.2.5|Action is forbidden, The action JavaScript is forbidden|

h2. Notes

h3. Error code is generic
The [Preflight source code|http://svn.apache.org/repos/asf/pdfbox/trunk/preflight/src/main/java/org/apache/pdfbox/preflight/PreflightConstants.java] reveals that error code 6.2.5 is a generic code for any action that is forbidden in PDF/A-1. The JavaScript action is just one of these. As a result, the error code alone is not enough to identify embedded JavaScript: the error description (i.e. the contents of the _details_ field in the output) must be taken into account as well.
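
As an illustration of the point above, the following minimal Python sketch filters a list of Preflight error lines on both the error code and the details text. The function name {{javascript_errors}} is hypothetical, and the line shape "error code : details" is an assumption made for this example; the pattern must be adapted to the report format that is actually produced.

```python
import re

# ASSUMPTION: each error is reported on one line as "<error code> : <details>".
# Adjust this pattern to the real report format before relying on it.
ERROR_LINE = re.compile(r"^\s*(\d+(?:\.\d+)+)\s*:\s*(.+?)\s*$")


def javascript_errors(report_lines):
    """Return the details of 6.2.5 errors that mention the JavaScript action."""
    hits = []
    for line in report_lines:
        match = ERROR_LINE.match(line)
        if not match:
            continue  # not an error line in the assumed shape
        code, details = match.groups()
        # 6.2.5 is generic, so the details text has to be checked as well.
        if code == "6.2.5" and "javascript" in details.lower():
            hits.append(details)
    return hits
```

Matching on the details text is fragile, because the wording may change between Preflight versions, so any such filter should be re-checked against the reference file after an upgrade.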

h3. JavaScript not detected in all cases
Additional tests with more complex PDFs show that _Apache Preflight_ (revision 1530740) does not always detect JavaScript. The following page compares the output of Adobe Acrobat Preflight and Apache Preflight for a selection of PDFs from the [Adobe Acrobat Engineering website|http://acroeng.adobe.com/wp/]:

[Analysis of Acrobat Engineering PDFs with Acrobat Preflight and Apache Preflight]

The following PDFs contain JavaScript (confirmed by both Acrobat Preflight and a manual check with a hex editor), but this is not reported by _Apache Preflight_:

* [http://acroeng.adobe.com/Test_Files/classic_multimedia//Disney-Flash.pdf]
* [http://acroeng.adobe.com/Test_Files/classic_multimedia//Service%20Form_media.pdf]
* [http://acroeng.adobe.com/Test_Files/classic_multimedia//Trophy.pdf]
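
The manual hex-editor check mentioned above can be roughly approximated by scanning the raw bytes of a file for the {{/JavaScript}} and {{/JS}} name tokens. The Python sketch below is only a heuristic under stated assumptions, and its function names are hypothetical: it will miss JavaScript inside compressed object streams or behind hex-escaped names, and it may report tokens that merely occur inside stream data.

```python
import re

# Name tokens that introduce JavaScript in a PDF: /JavaScript and /JS.
# The lookahead keeps /JS from matching longer names that merely start with "JS".
JS_TOKEN = re.compile(rb"/(JavaScript|JS)(?![A-Za-z0-9])")


def find_javascript_tokens(data):
    """Return the sorted, distinct JavaScript-related name tokens in raw PDF bytes."""
    return sorted({m.group(0).decode("ascii") for m in JS_TOKEN.finditer(data)})


def scan_file(path):
    """Read a file as bytes and report which JavaScript name tokens it contains."""
    with open(path, "rb") as handle:
        return find_javascript_tokens(handle.read())
```

An empty result does not prove that a file is free of JavaScript; a non-empty result is a reason to inspect the file more closely.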

h2. Recommendations

h3. Pre-ingest

* Formulate policy on how to deal with JavaScript in PDFs.
* Use [Apache Preflight|Apache PDFBox] to establish whether files contain JavaScript (but note that Preflight does not yet detect JavaScript in all cases).

h3. Existing collections

* Use [Apache Preflight|Apache PDFBox] to establish whether files contain JavaScript (but note that Preflight does not yet detect JavaScript in all cases).

h2. Example files
* [http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/] - PDF Cabinet of Horrors on OPF Format Corpus