Encryption

Current version by Johan van der Knijff on Jul 11, 2014 14:36.

h2. Assessment

The _Preflight_ component of [Apache PDFBox] (an open source _PDF/A_ validator) is able to detect encryption in a PDF (_any_ PDF; it doesn't have to be _PDF/A_). The following table shows the combinations of error codes and descriptions (_details_ element of _Preflight_'s XML output) for 4 different types of password protection. Results obtained with _Preflight_ 2.0.0, revision 1530740:

|*Reference file*|*Description*|*Error Code(s)*|*Details*|
|[encryption_openpassword.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_openpassword.pdf]|Requires password to open the file|1.0|Syntax error|
|[encryption_nocopy.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_nocopy.pdf]|Requires password to copy document contents|1.0|Syntax error|
|[encryption_noprinting.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_noprinting.pdf]|Requires password for printing|1.0|Syntax error|
|[encryption_notextaccess.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_notextaccess.pdf]|Requires password to enable text access for screen reader devices for the visually impaired|1.0|Syntax error|

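Note that these results aren't particularly helpful: all four files produce the same generic error code and message, neither of which identifies encryption as the cause. However, previous work with an earlier version of _Preflight_ (1.8.0) produced notably different results: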

|*Reference file*|*Description*|*Error Code(s)*|*Details*|
|[encryption_openpassword.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_openpassword.pdf]|Requires password to open the file|1.0|Syntax error, Error (CryptographyException) while creating security handler for decryption: Error: The supplied password does not match either the owner or user password in the document|
|[encryption_nocopy.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_nocopy.pdf]|Requires password to copy document contents|1.4.2|Trailer Syntax error, The trailer dictionary contains Encrypt|
|[encryption_noprinting.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_noprinting.pdf]|Requires password for printing|1.4.2|Trailer Syntax error, The trailer dictionary contains Encrypt|
|[encryption_notextaccess.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_notextaccess.pdf]|Requires password to enable text access for screen reader devices for the visually impaired|1.4.2|Trailer Syntax error, The trailer dictionary contains Encrypt|

The _Preflight_ 2.0.0 results therefore look like a bug, which was reported in the following [bug report|https://issues.apache.org/jira/browse/PDFBOX-1659].
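
The difference between the two tables can be made concrete with a small Python sketch that classifies the (error code, details) pairs shown above. It is only an illustration: the function name and the three result labels are hypothetical, and the matching rules are derived from the two tables rather than from any _Preflight_ documentation.

```python
def classify_preflight_errors(errors):
    """Classify (code, details) pairs copied from Preflight output.

    Returns "encrypted" when an error points at encryption, "unknown"
    when only a generic syntax error is present, and "no-indication"
    otherwise. The labels are hypothetical.
    """
    generic = False
    for code, details in errors:
        text = details.lower()
        # Pattern seen with Preflight 1.8.0 for restricted files.
        if code == "1.4.2" and "encrypt" in text:
            return "encrypted"
        # Pattern seen with Preflight 1.8.0 for the open-password file.
        if "password" in text or "decryption" in text:
            return "encrypted"
        # A bare code 1.0 (as in the 2.0.0 table) says nothing about the cause.
        if code == "1.0":
            generic = True
    return "unknown" if generic else "no-indication"


if __name__ == "__main__":
    print(classify_preflight_errors([("1.0", "Syntax error")]))
    print(classify_preflight_errors(
        [("1.4.2", "Trailer Syntax error, The trailer dictionary contains Encrypt")]
    ))
```

With the 2.0.0 output every test file ends up in the "unknown" category, which is why those results cannot be used to detect encryption reliably.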

h3. Detection of encryption and access permissions using ExifTool
Aside from telling you that a PDF is encrypted, Apache Preflight doesn't provide any details on the specific access rights and restrictions (e.g. Print, Modify, Copy, Extract, etc.). If this information is needed, one option is to use [ExifTool]. Based on tests with the test documents above, _ExifTool_'s behavior is as follows:

* if a document requires a password to open it, _ExifTool_ reports a warning;
* if any functionality is restricted, _ExifTool_'s output contains an _Encryption_ element, as well as a _UserAccess_ element that lists all the permitted functionality.

The following table shows the result for the test documents (in this case _ExifTool_ was run with the _-X_ switch, producing RDF output):

|*Reference file*|*Location in output*|*Text*|
|[encryption_openpassword.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_openpassword.pdf]|"/rdf:RDF/rdf:Description/ExifTool:Warning" \\ "/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess"|Document is password protected (use Password option) \\Standard V4.4 (128-bit) \\ Print, Modify, Copy, Annotate, Fill forms, Extract, Print high-res|
|[encryption_nocopy.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_nocopy.pdf]|"/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess" |Standard V4.4 (128-bit) \\ Print, Modify, Annotate, Fill forms, Extract, Print high-res|
|[encryption_noprinting.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_noprinting.pdf]|"/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess"|Standard V4.4 (128-bit) \\ Modify, Copy, Annotate, Fill forms, Extract|
|[encryption_notextaccess.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_notextaccess.pdf]|"/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess"|Standard V4.4 (128-bit) \\ Print, Modify, Annotate, Fill forms, Print high-res|
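
As a rough illustration of how the elements in the table could be read programmatically, the Python sketch below parses RDF output and derives the restricted functions from the _UserAccess_ list. Several things here are assumptions: the sample document is hand-written to mimic the element names in the table (the namespace URI in it is not verified against real _ExifTool_ output), the function names are hypothetical, and the seven permissions listed for the first test file are taken as the complete reference set.

```python
import xml.etree.ElementTree as ET

# Permissions listed for encryption_openpassword.pdf in the table above,
# used here as the reference set (an assumption).
ALL_PERMISSIONS = ["Print", "Modify", "Copy", "Annotate",
                   "Fill forms", "Extract", "Print high-res"]


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def encryption_summary(rdf_xml):
    """Collect the warning, Encryption and UserAccess values from RDF text."""
    root = ET.fromstring(rdf_xml)
    summary = {"open_password": False, "encryption": None,
               "permitted": None, "restricted": None}
    for description in root:
        for element in description:
            name = local_name(element.tag)
            text = (element.text or "").strip()
            if name == "Warning" and "password protected" in text:
                summary["open_password"] = True
            elif name == "Encryption":
                summary["encryption"] = text
            elif name == "UserAccess":
                permitted = [p.strip() for p in text.split(",") if p.strip()]
                summary["permitted"] = permitted
                summary["restricted"] = [p for p in ALL_PERMISSIONS
                                         if p not in permitted]
    return summary


# Hand-written sample modelled on the encryption_noprinting.pdf row.
SAMPLE = """<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
 <rdf:Description rdf:about='encryption_noprinting.pdf'
   xmlns:PDF='http://ns.exiftool.ca/PDF/PDF/1.0/'>
  <PDF:Encryption>Standard V4.4 (128-bit)</PDF:Encryption>
  <PDF:UserAccess>Modify, Copy, Annotate, Fill forms, Extract</PDF:UserAccess>
 </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    print(encryption_summary(SAMPLE))
```

Matching on local element names rather than full namespace URIs keeps the sketch independent of the exact namespaces used in the real output.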

h2. Recommendations

h3. Pre-ingest

* Use [Apache Preflight|Apache PDFBox] to establish if files are encrypted.
* Formulate a policy on how to deal with encryption (e.g. reject any PDFs that are encrypted, reject PDFs that require an open password, etc.).
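
The two example policies above could be written down as a simple decision rule. The Python sketch below is only an illustration; the function name and the policy labels are hypothetical, and a real policy would likely need more cases.

```python
def ingest_decision(encrypted, open_password, policy):
    """Return "accept" or "reject" for one PDF under a named policy.

    encrypted:     True if any encryption was detected.
    open_password: True if a password is needed to open the file.
    policy:        hypothetical label, one of "reject_any_encryption"
                   or "reject_open_password".
    """
    if policy == "reject_any_encryption":
        return "reject" if (encrypted or open_password) else "accept"
    if policy == "reject_open_password":
        return "reject" if open_password else "accept"
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    # A file with printing restrictions but no open password.
    print(ingest_decision(True, False, "reject_any_encryption"))
    print(ingest_decision(True, False, "reject_open_password"))
```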

h3. Existing collections

* Use [Apache Preflight|Apache PDFBox] to detect encrypted files in the collection.
* In some cases it may be possible to obtain an unencrypted version from the original depositor/publisher.


h2. Example files
* [http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/] - PDF Cabinet of Horrors on OPF Format Corpus