Current version by Johan van der Knijff on Jul 11, 2014 14:36.

h2. Description
PDF permits the use of encryption as a means of restricting access to, or (re-)use of, content. Restrictions range from requiring a password to open a document to disabling specific functionality (e.g. printing or copying content).

h2. Risks
* Content may become inaccessible if passwords are not known. Even though "cracking" is often technically possible, institutions may not be legally permitted to do this.

h2. Assessment
The following table shows the relevant output of _Apache Preflight_ (part of [Apache PDFBox]) for four different types of password protection. The results were obtained with _Preflight_ 2.0.0, revision 1530740:

|*Reference file*|*Description*|*Error Code(s)*|*Details*|
|[encryption_openpassword.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_openpassword.pdf]|Requires password to open the file|1.0|Syntax error, Error (CryptographyException) while creating security handler for decryption: Error: The supplied password does not match either the owner or user password in the document|
|[encryption_nocopy.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_nocopy.pdf]|Requires password to copy document contents|1.4.2|Trailer Syntax error, The trailer dictionary contains Encrypt|
|[encryption_noprinting.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_noprinting.pdf]|Requires password for printing|1.4.2|Trailer Syntax error, The trailer dictionary contains Encrypt|
|[encryption_notextaccess.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_notextaccess.pdf]|Requires password to enable text access for screen reader devices for the visually impaired|1.4.2|Trailer Syntax error, The trailer dictionary contains Encrypt|
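
The table suggests a simple triage rule: error 1.0 with a password message indicates an open password, while error 1.4.2 only indicates that the trailer contains an _Encrypt_ entry. The following is a minimal sketch in Python, assuming the error codes and detail strings have already been extracted from the _Preflight_ output. The function name {{classify_preflight_errors}}, the input shape and the returned labels are hypothetical, and the string matching is based only on the four reference files above.

```python
def classify_preflight_errors(errors):
    """Classify a PDF from a list of (error_code, details) pairs.

    Returns "open-password", "encrypted" or "no-encryption-detected".
    The rules mirror the four reference files in the table and are an
    assumption, not a complete description of Preflight's behaviour.
    """
    encrypted = False
    for code, details in errors:
        text = details.lower()
        # Error 1.0 with a password message: the file could not be decrypted.
        if code == "1.0" and "password" in text:
            return "open-password"
        # Error 1.4.2: the trailer dictionary contains an Encrypt entry.
        if code == "1.4.2" and "encrypt" in text:
            encrypted = True
    return "encrypted" if encrypted else "no-encryption-detected"
```

Note that this rule cannot tell the three restriction types (copying, printing, text access) apart, since _Preflight_ reports the same error for each of them.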

h3. Detection of encryption and access permissions using ExifTool
Aside from telling you that a PDF is encrypted, Apache Preflight doesn't provide any details on the specific access rights and restrictions (e.g. Print, Modify, Copy, Extract, etc.). If this information is needed, one option is to use [ExifTool]. Based on tests with the test documents above, _ExifTool_'s behavior is as follows:

* If a document requires a password to open it, _ExifTool_ reports a warning.
* If any functionality is restricted, _ExifTool_'s output contains an _Encryption_ element, as well as a _UserAccess_ element that lists all the permitted functionality.

The following table shows the results for the test documents (in this case _ExifTool_ was run with the _-X_ switch, producing RDF output):

|*Reference file*|*Location in output*|*Text*|
|[encryption_openpassword.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_openpassword.pdf]|"/rdf:RDF/rdf:Description/ExifTool:Warning" \\ "/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess"|Document is password protected (use Password option) \\ Standard V4.4 (128-bit) \\ Print, Modify, Copy, Annotate, Fill forms, Extract, Print high-res|
|[encryption_nocopy.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_nocopy.pdf]|"/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess" |Standard V4.4 (128-bit) \\ Print, Modify, Annotate, Fill forms, Extract, Print high-res|
|[encryption_noprinting.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_noprinting.pdf]|"/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess"|Standard V4.4 (128-bit) \\ Modify, Copy, Annotate, Fill forms, Extract|
|[encryption_notextaccess.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_notextaccess.pdf]|"/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess"|Standard V4.4 (128-bit) \\ Print, Modify, Annotate, Fill forms, Print high-res|
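
The elements listed in the table can be pulled out of the RDF output with a few lines of Python. The sketch below is illustrative only: the function names and the returned dictionary are hypothetical, the list of all permissions is assumed from the _UserAccess_ value of the open-password file above, and elements are matched on their local names so that the code does not depend on the exact namespace URIs used by a particular _ExifTool_ version.

```python
import xml.etree.ElementTree as ET

# Assumed to be the complete set of permissions, taken from the
# UserAccess value reported for the open-password reference file.
ALL_PERMISSIONS = ["Print", "Modify", "Copy", "Annotate",
                   "Fill forms", "Extract", "Print high-res"]


def local_name(tag):
    """Strip the '{namespace-uri}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def summarise_exiftool_rdf(xml_data):
    """Summarise encryption details from ExifTool RDF output (str or bytes)."""
    root = ET.fromstring(xml_data)
    found = {}
    for description in root:
        if local_name(description.tag) != "Description":
            continue
        for child in description:
            name = local_name(child.tag)
            if name in ("Warning", "Encryption", "UserAccess"):
                found[name] = (child.text or "").strip()

    # UserAccess is a comma-separated list of permitted functionality.
    permitted = [p.strip() for p in found.get("UserAccess", "").split(",")
                 if p.strip()]
    # Only derive restrictions when a UserAccess element was present.
    restricted = ([p for p in ALL_PERMISSIONS if p not in permitted]
                  if "UserAccess" in found else [])
    return {
        "open_password": "password protected" in found.get("Warning", ""),
        "encryption": found.get("Encryption"),
        "permitted": permitted,
        "restricted": restricted,
    }
```

For the output of _encryption_noprinting.pdf_, for example, the {{restricted}} entry would contain _Print_ and _Print high-res_, which is easier to act on than the list of permitted functionality.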

h2. Recommendations
h3. Pre-ingest

* Use [Apache Preflight|Apache PDFBox] to establish whether files are encrypted.
* Formulate a policy on how to deal with encryption (e.g. reject any PDFs that are encrypted, reject PDFs that require an open password, etc.).
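
Once formulated, such a policy can be written down as a small decision rule. The following Python sketch encodes the two example policies mentioned above. The function name, the policy labels and the boolean inputs are all hypothetical; the inputs are assumed to come from an earlier detection step.

```python
def ingest_decision(encrypted, needs_open_password,
                    policy="reject-open-password"):
    """Return "accept" or "reject" for a PDF under a named policy.

    Both policy names are illustrative labels for the two example
    policies in the text, not part of any existing tool.
    """
    if policy == "reject-all-encrypted":
        # Any form of encryption, including an open password, is refused.
        return "reject" if (encrypted or needs_open_password) else "accept"
    if policy == "reject-open-password":
        # Restricted functionality is tolerated; unreadable files are not.
        return "reject" if needs_open_password else "accept"
    raise ValueError("unknown policy: %s" % policy)
```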

h3. Existing collections

* Use [Apache Preflight|Apache PDFBox] to detect encrypted files in the collection.
* In some cases it may be possible to obtain an unencrypted version from the original depositor/publisher.

h2. Example files
* [http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/] - PDF Cabinet of Horrors on OPF Format Corpus