Encryption

Current version by Johan van der Knijff, Jul 11, 2014 14:36.

h2. Description
PDF permits the use of encryption as a means of restricting access to, or (re-)use of, content. Restrictions range from requiring a password to open a document to disabling specific functionality (e.g. printing, copying content).

h2. Risks
* Content may become inaccessible if passwords are not known (even though "cracking" is often technically possible, institutions may not be legally permitted to do this)
h2. Assessment

The _Preflight_ component of [Apache PDFBox] (an open source _PDF/A_ validator) is able to detect encryption in a PDF (_any_ PDF, it doesn't have to be _PDF/A_).
The following table shows the relevant output of _Apache Preflight_ for 4 different types of password protection. Results obtained with _Preflight_ 2.0.0, revision 1530740:

|*Reference file*|*Description*|*Error Code(s)*|*Details*|
|[encryption_openpassword.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_openpassword.pdf]|Requires password to open the file|1.0|Syntax error, Error (CryptographyException) while creating security handler for decryption: Error: The supplied password does not match either the owner or user password in the document|
|[encryption_nocopy.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_nocopy.pdf]|Requires password to copy document contents|1.4.2|Trailer Syntax error, The trailer dictionary contains Encrypt|
|[encryption_noprinting.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_noprinting.pdf]|Requires password for printing|1.4.2|Trailer Syntax error, The trailer dictionary contains Encrypt|
|[encryption_notextaccess.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_notextaccess.pdf]|Requires password to enable text access for screen reader devices for the visually impaired|1.4.2|Trailer Syntax error, The trailer dictionary contains Encrypt|
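
The two error patterns in the table can be told apart programmatically. The Python sketch below is one possible way to do this, assuming the error code and details have already been extracted from the _Preflight_ output as plain strings. The function name {{classify_preflight_error}} and the returned labels are hypothetical and not part of _Preflight_; the matching rules are based only on the four test files above.

```python
def classify_preflight_error(code: str, details: str) -> str:
    """Map one Preflight (code, details) pair to a coarse encryption label.

    The rules mirror the table rows for Preflight 2.0.0; other versions
    may word their messages differently, so treat this as a starting point.
    """
    # Row 1: the file cannot be opened without a password.
    if code == "1.0" and "CryptographyException" in details:
        return "open-password"
    # Rows 2-4: the file opens, but the trailer declares encryption.
    if code == "1.4.2" and "trailer dictionary contains Encrypt" in details:
        return "restricted"
    return "not-encryption-related"


label = classify_preflight_error(
    "1.4.2", "Trailer Syntax error, The trailer dictionary contains Encrypt"
)
print(label)
```

Note that the label for rows 2-4 only says that some restriction is present; _Preflight_ reports the same error for all three files.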

In _Preflight_ 1.8.0 an open password results in error code 1.0 (syntax error) with the following accompanying description:

bq. Syntax error, Error (CryptographyException) while creating security handler for decryption: Error: The supplied password does not match either the owner or user password in the document
h3. Detection of encryption and access permissions using ExifTool
Aside from telling you that a PDF is encrypted, Apache Preflight doesn't provide any details on the specific access rights and restrictions (e.g. Print, Modify, Copy, Extract, etc.). If this information is needed, one option is to use [ExifTool]. Based on tests with the test documents above, _ExifTool_'s behavior is as follows:

* if a document requires a password to open it, _ExifTool_ reports a warning;
* if any functionality is restricted, _ExifTool_'s output contains an _Encryption_ element, as well as a _UserAccess_ element that lists all the permitted functionality.

The following table shows the results for the test documents (in this case _ExifTool_ was run with the _-X_ switch, producing RDF output):

|*Reference file*|*Location in output*|*Text*|
|[encryption_openpassword.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_openpassword.pdf]|"/rdf:RDF/rdf:Description/ExifTool:Warning" \\ "/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess"|Document is password protected (use Password option) \\ Standard V4.4 (128-bit) \\ Print, Modify, Copy, Annotate, Fill forms, Extract, Print high-res|
|[encryption_nocopy.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_nocopy.pdf]|"/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess" |Standard V4.4 (128-bit) \\ Print, Modify, Annotate, Fill forms, Extract, Print high-res|
|[encryption_noprinting.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_noprinting.pdf]|"/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess"|Standard V4.4 (128-bit) \\ Modify, Copy, Annotate, Fill forms, Extract|
|[encryption_notextaccess.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_notextaccess.pdf]|"/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess"|Standard V4.4 (128-bit) \\ Print, Modify, Annotate, Fill forms, Print high-res|
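
Because _UserAccess_ lists what is permitted rather than what is blocked, the restricted functions have to be derived by comparison against a full set. The Python sketch below illustrates this under several assumptions: the full permission set is taken from the first table row (it may not be exhaustive), the sample RDF is a trimmed, hand-written stand-in whose _PDF_ namespace URI is hypothetical, and the function name {{summarise_exiftool_rdf}} is made up for this example. Elements are matched on their local name so that the actual namespace URIs do not matter.

```python
import xml.etree.ElementTree as ET

# Assumption: the permissions reported for encryption_openpassword.pdf
# (nothing restricted) form the complete set for these test files.
ALL_PERMISSIONS = {
    "Print", "Modify", "Copy", "Annotate",
    "Fill forms", "Extract", "Print high-res",
}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def summarise_exiftool_rdf(rdf_text: str) -> dict:
    """Collect the Warning, Encryption and UserAccess values from RDF text."""
    root = ET.fromstring(rdf_text)
    found = {}
    for element in root.iter():
        name = local_name(element.tag)
        if name in ("Warning", "Encryption", "UserAccess"):
            found[name] = (element.text or "").strip()

    permitted = {
        item.strip()
        for item in found.get("UserAccess", "").split(",")
        if item.strip()
    }
    return {
        "open_password": "password protected" in found.get("Warning", ""),
        "encrypted": "Encryption" in found,
        # Restricted = full set minus whatever UserAccess lists as permitted.
        "restricted": sorted(ALL_PERMISSIONS - permitted)
        if "UserAccess" in found else [],
    }


# Hand-written sample modelled on the encryption_nocopy.pdf row.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:PDF="http://example.org/hypothetical/PDF">
  <rdf:Description>
    <PDF:Encryption>Standard V4.4 (128-bit)</PDF:Encryption>
    <PDF:UserAccess>Print, Modify, Annotate, Fill forms, Extract, Print high-res</PDF:UserAccess>
  </rdf:Description>
</rdf:RDF>"""

print(summarise_exiftool_rdf(SAMPLE))
```

For the sample, the only entry in the restricted list is _Copy_, which matches the description of that test file.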

h2. Recommendations

|*Tool*|*Affected if expression returns _True_*|
|[Apache+PDFBox]| {{"/preflight/errors/error/code = '1.0' and /preflight/errors/error/details = 'Syntax error, Error (CryptographyException) while creating security handler for decryption: Error: The supplied password does not match either the owner or user password in the document'"}}|
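
The expression in the table can be approximated without a full XPath engine. The Python sketch below is a hedged illustration: the XML layout ({{preflight/errors/error}} with {{code}} and {{details}} children) is inferred only from the paths in the expression, the sample document is hand-written, and the function name {{requires_open_password}} is hypothetical. As in XPath 1.0, the check is true when any {{code}} equals 1.0 and any {{details}} equals the message, not necessarily within the same {{error}} element.

```python
import xml.etree.ElementTree as ET

# Message text copied from the expression in the table above.
OPEN_PASSWORD_DETAILS = (
    "Syntax error, Error (CryptographyException) while creating security "
    "handler for decryption: Error: The supplied password does not match "
    "either the owner or user password in the document"
)


def requires_open_password(preflight_xml: str) -> bool:
    """Evaluate the table's expression against Preflight XML output."""
    root = ET.fromstring(preflight_xml)
    if root.tag != "preflight":
        return False
    # Gather every code and details value under /preflight/errors/error.
    codes = [e.text for e in root.findall("./errors/error/code")]
    details = [e.text for e in root.findall("./errors/error/details")]
    return "1.0" in codes and OPEN_PASSWORD_DETAILS in details


# Hand-written sample; real Preflight output may contain more elements.
SAMPLE = f"""<preflight>
  <errors>
    <error>
      <code>1.0</code>
      <details>{OPEN_PASSWORD_DETAILS}</details>
    </error>
  </errors>
</preflight>"""

print(requires_open_password(SAMPLE))
```
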
h3. Pre-ingest

* Use [Apache Preflight|Apache PDFBox] to establish if files are encrypted.
* Formulate a policy on how to deal with encryption (e.g. reject any PDFs that are encrypted, reject PDFs that require an open password, etc.).

h3. Existing collections

* Use [Apache Preflight|Apache PDFBox] to detect encrypted files in the collection.
* In some cases it may be possible to obtain an unencrypted version from the original depositor/publisher.


h2. Example files
* [http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/] - PDF Cabinet of Horrors on OPF Format Corpus