h2. Description
PDF permits the use of encryption as a means of restricting access or (re-)use of content. This may range from documents that can only be opened after providing a password, to disabling specific functionality (e.g. printing, copying content).
h2. Risks
* Content may become inaccessible if passwords are not known (even though "cracking" is often technically possible, institutions may not be legally permitted to do this)
* Printing / copy restrictions may complicate any future preservation actions
h2. Assessment
The following table shows the relevant output of _Apache Preflight_ (part of [Apache PDFBox]) for 4 different types of password protection. Results obtained with _Preflight_ 2.0.0, revision 1530740:
|*Reference file*|*Description*|*Error Code(s)*|*Details*|
|[encryption_openpassword.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_openpassword.pdf]|Requires password to open the file|1.0|Syntax error, Error (CryptographyException) while creating security handler for decryption: Error: The supplied password does not match either the owner or user password in the document|
|[encryption_nocopy.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_nocopy.pdf]|Requires password to copy document contents|1.4.2|Trailer Syntax error, The trailer dictionary contains Encrypt|
|[encryption_noprinting.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_noprinting.pdf]|Requires password for printing|1.4.2|Trailer Syntax error, The trailer dictionary contains Encrypt|
|[encryption_notextaccess.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_notextaccess.pdf]|Requires password to enable text access for screen reader devices for the visually impaired|1.4.2|Trailer Syntax error, The trailer dictionary contains Encrypt|
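The mapping in the table above can be expressed as a small rule set. The following is only a minimal sketch: it assumes that the error code and details text have already been extracted from the _Preflight_ output as plain strings, the function name and the category labels are hypothetical, and the rules are derived solely from the _Preflight_ 2.0.0 results shown here (other versions may report differently).

```python
def classify_preflight_result(error_code, details):
    """Map one Preflight (error code, details) pair to an encryption category.

    The rules mirror the four test results in the table above and nothing more.
    """
    text = details.lower()
    # 1.4.2: the trailer dictionary contains an Encrypt entry,
    # i.e. the file is encrypted but could still be parsed.
    if error_code == "1.4.2" and "encrypt" in text:
        return "encrypted"
    # 1.0 is a generic syntax error; it only points to an open password
    # when the details mention a password mismatch.
    if error_code == "1.0" and "password" in text:
        return "open-password"
    return "not-encryption-related"
```

Because error code 1.0 is also raised for unrelated syntax problems, the sketch checks the details text as well rather than relying on the code alone.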
h3. Detection of encryption and access permissions using ExifTool
Aside from telling you that a PDF is encrypted, Apache Preflight doesn't provide any details on the specific access rights and restrictions (e.g. Print, Modify, Copy, Extract). If this information is needed, one option is to use [ExifTool]. Based on tests with the test documents above, _ExifTool_'s behavior is as follows:
* if a document requires a password to open it, _ExifTool_ reports a warning;
* if any functionality is restricted, _ExifTool_'s output contains an _Encryption_ element, as well as a _UserAccess_ element that lists all the permitted functionality.
The following table shows the results for the test documents (in this case _ExifTool_ was run with the _-X_ switch, producing RDF output):
|*Reference file*|*Location in output*|*Text*|
|[encryption_openpassword.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_openpassword.pdf]|"/rdf:RDF/rdf:Description/ExifTool:Warning" \\ "/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess"|Document is password protected (use Password option) \\ Standard V4.4 (128-bit) \\ Print, Modify, Copy, Annotate, Fill forms, Extract, Print high-res|
|[encryption_nocopy.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_nocopy.pdf]|"/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess" |Standard V4.4 (128-bit) \\ Print, Modify, Annotate, Fill forms, Extract, Print high-res|
|[encryption_noprinting.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_noprinting.pdf]|"/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess"|Standard V4.4 (128-bit) \\ Modify, Copy, Annotate, Fill forms, Extract|
|[encryption_notextaccess.pdf|http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/encryption_notextaccess.pdf]|"/rdf:RDF/rdf:Description/PDF:Encryption" \\ "/rdf:RDF/rdf:Description/PDF:UserAccess"|Standard V4.4 (128-bit) \\ Print, Modify, Annotate, Fill forms, Print high-res|
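The three elements listed in the table can be pulled out of the RDF output with a few lines of Python. This is a minimal sketch, not a definitive implementation: it assumes the _exiftool_ executable is on the PATH, the function and dictionary key names are hypothetical, and elements are matched on their local names (Warning, Encryption, UserAccess) so that the code does not depend on the exact namespace URIs.

```python
import subprocess
import xml.etree.ElementTree as ET

# Local element names from the table above, mapped to (hypothetical) result keys.
WANTED = {"Warning": "warning", "Encryption": "encryption", "UserAccess": "user_access"}


def run_exiftool(path):
    """Run 'exiftool -X' on one file and return its raw RDF/XML output as bytes."""
    # Assumes the exiftool executable can be found on the PATH.
    return subprocess.run(["exiftool", "-X", path], capture_output=True).stdout


def encryption_summary(rdf_xml):
    """Extract the warning, encryption and permitted-access values from RDF/XML."""
    summary = {"warning": None, "encryption": None, "user_access": []}
    for elem in ET.fromstring(rdf_xml).iter():
        # Tags look like '{namespace-uri}Name'; compare on the local name only.
        key = WANTED.get(elem.tag.rsplit("}", 1)[-1])
        text = (elem.text or "").strip()
        if key == "user_access":
            # UserAccess is a comma-separated list of permitted actions.
            summary[key] = [item.strip() for item in text.split(",") if item.strip()]
        elif key:
            summary[key] = text
    return summary
```

For a single file this could be called as encryption_summary(run_exiftool("encryption_nocopy.pdf")). Since _UserAccess_ lists the permitted functionality, a restriction shows up as a missing entry: in the test results above, "Copy" is absent from the list for encryption_nocopy.pdf.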
h2. Recommendations
h3. Pre-ingest
* Use [Apache Preflight|Apache PDFBox] to establish if files are encrypted.
* Formulate a policy on how to deal with encryption (e.g. reject any PDFs that are encrypted, reject PDFs that require an open password, etc.).
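Once a policy has been formulated, it can be applied mechanically to the detection results. The sketch below encodes the two example policies mentioned above; the policy names, the function name and its boolean inputs are all hypothetical, and a real policy may need further options.

```python
from enum import Enum


class Policy(Enum):
    # Hypothetical policy names, taken from the two examples in the list above.
    REJECT_ANY_ENCRYPTION = "reject any PDF that is encrypted"
    REJECT_OPEN_PASSWORD = "reject only PDFs that require an open password"


def accept_for_ingest(is_encrypted, needs_open_password, policy):
    """Return True if a file may be ingested under the given policy."""
    if policy is Policy.REJECT_ANY_ENCRYPTION:
        # A file that needs an open password is encrypted by definition.
        return not (is_encrypted or needs_open_password)
    if policy is Policy.REJECT_OPEN_PASSWORD:
        # Restrictions such as no-print or no-copy are tolerated here.
        return not needs_open_password
    raise ValueError(f"Unknown policy: {policy!r}")
```

Under the second policy, a file such as encryption_noprinting.pdf (encrypted, but no open password) would be accepted, whereas encryption_openpassword.pdf would be rejected under both.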
h3. Existing collections
* Use [Apache Preflight|Apache PDFBox] to detect encrypted files in the collection.
* In some cases it may be possible to obtain an unencrypted version from the original depositor/publisher.
h2. Example files
* [http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/] - PDF Cabinet of Horrors on OPF Format Corpus