Encryption

Description

PDF permits the use of encryption as a means of restricting access to, or (re-)use of, content. Restrictions range from documents that can only be opened after a password is supplied to the disabling of specific functionality (e.g. printing, copying content).

Risks

  • Content may become inaccessible if passwords are not known (even though "cracking" is often technically possible, institutions may not be legally permitted to do this)
  • Printing/copy restrictions may complicate any future preservation actions

Assessment

The following table shows the relevant output of Apache Preflight (part of Apache PDFBox) for 4 different types of password protection. Results obtained with Preflight 2.0.0, revision 1530740:

| Reference file | Description | Error code(s) | Details |
|---|---|---|---|
| encryption_openpassword.pdf | Requires password to open the file | 1.0 | Syntax error, Error (CryptographyException) while creating security handler for decryption: Error: The supplied password does not match either the owner or user password in the document |
| encryption_nocopy.pdf | Requires password to copy document contents | 1.4.2 | Trailer Syntax error, The trailer dictionary contains Encrypt |
| encryption_noprinting.pdf | Requires password for printing | 1.4.2 | Trailer Syntax error, The trailer dictionary contains Encrypt |
| encryption_notextaccess.pdf | Requires password to enable text access for screen reader devices for the visually impaired | 1.4.2 | Trailer Syntax error, The trailer dictionary contains Encrypt |
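Preflight's 1.4.2 error is raised because the trailer dictionary contains an Encrypt entry. For a quick pre-screen before running Preflight, that signal can be approximated by scanning the raw bytes for the name /Encrypt, since the trailer (or, in files using cross-reference streams, the stream dictionary) is always stored as plain text. The sketch below is a heuristic, not a PDF parser, and the function names are hypothetical: it scans the whole file, so it can report false positives and will miss a name written with #xx escapes.

```python
import re
from pathlib import Path

# PDF names end at whitespace (including NUL) or a delimiter character, so
# /Encrypt only counts as a complete name; /EncryptMetadata does not match.
_ENCRYPT_NAME = re.compile(rb"/Encrypt(?![^\s\x00()<>\[\]{}/%])")


def looks_encrypted(data: bytes) -> bool:
    """Heuristic: does the raw PDF contain an /Encrypt key? (hypothetical helper)

    Scans the whole file rather than parsing the trailer, so a literal
    /Encrypt name inside some other uncompressed object would be a false
    positive, and a name written with #xx escapes would be missed.
    """
    return _ENCRYPT_NAME.search(data) is not None


def file_looks_encrypted(path: str) -> bool:
    """Read a file from disk and apply the heuristic."""
    return looks_encrypted(Path(path).read_bytes())
```

Files that match can then be passed to Preflight or ExifTool for a full assessment.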

Detection of encryption and access permissions using ExifTool

Aside from telling you that a PDF is encrypted, Apache Preflight doesn't provide any details on the specific access rights and restrictions (e.g. Print, Modify, Copy, Extract). If this information is needed, one option is to use ExifTool. Based on tests with the test documents above, ExifTool behaves as follows:

  • if a document requires a password to open it, ExifTool reports a warning;
  • if any functionality is restricted, ExifTool's output contains an Encryption element, as well as a UserAccess element that lists all the permitted functionality.
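The permitted functionality is stored by the PDF standard security handler as a signed 32-bit integer, the P entry of the encryption dictionary, with one bit per permission (ISO 32000-1, Table 22). The sketch below decodes such a value into labels. The labels mirror the ExifTool output on this page, but the label-to-bit mapping is an assumption of this sketch based on the specification's descriptions; "Assemble" (bit 11) is defined by the specification but does not appear in the ExifTool output shown here. The function name is hypothetical.

```python
# Permission bits of the standard security handler's P entry, numbered from 1
# as in ISO 32000-1, Table 22. The labels mirror the ExifTool output on this
# page; mapping them to these bits is an assumption of this sketch.
PERMISSION_BITS = {
    3: "Print",
    4: "Modify",
    5: "Copy",
    6: "Annotate",
    9: "Fill forms",
    10: "Extract",
    11: "Assemble",
    12: "Print high-res",
}


def permitted_functions(p: int) -> list:
    """Return the labels of the permissions granted by a P value.

    P is a signed 32-bit integer and is often negative in real files
    (e.g. -3904). Python's & operator treats negative ints as two's
    complement, so the bit tests work on the raw value directly.
    """
    return [label for bit, label in PERMISSION_BITS.items() if p & (1 << (bit - 1))]


print(permitted_functions(-44))
# → ['Print', 'Copy', 'Fill forms', 'Extract', 'Assemble', 'Print high-res']
```

For security handlers of revision 2 the specification defines only bits 3 to 6, so the higher labels should be ignored for such files.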

The following table shows the results for the test documents (in this case ExifTool was run with the -X switch, producing RDF output):

| Reference file | Location in output | Text |
|---|---|---|
| encryption_openpassword.pdf | /rdf:RDF/rdf:Description/ExifTool:Warning | Document is password protected (use Password option) |
| encryption_openpassword.pdf | /rdf:RDF/rdf:Description/PDF:Encryption | Standard V4.4 (128-bit) |
| encryption_openpassword.pdf | /rdf:RDF/rdf:Description/PDF:UserAccess | Print, Modify, Copy, Annotate, Fill forms, Extract, Print high-res |
| encryption_nocopy.pdf | /rdf:RDF/rdf:Description/PDF:Encryption | Standard V4.4 (128-bit) |
| encryption_nocopy.pdf | /rdf:RDF/rdf:Description/PDF:UserAccess | Print, Modify, Annotate, Fill forms, Extract, Print high-res |
| encryption_noprinting.pdf | /rdf:RDF/rdf:Description/PDF:Encryption | Standard V4.4 (128-bit) |
| encryption_noprinting.pdf | /rdf:RDF/rdf:Description/PDF:UserAccess | Modify, Copy, Annotate, Fill forms, Extract |
| encryption_notextaccess.pdf | /rdf:RDF/rdf:Description/PDF:Encryption | Standard V4.4 (128-bit) |
| encryption_notextaccess.pdf | /rdf:RDF/rdf:Description/PDF:UserAccess | Print, Modify, Annotate, Fill forms, Print high-res |
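The three elements in the table can be pulled out of ExifTool's -X output with a standard XML parser. The sketch below matches the children of rdf:Description by local name (Warning, Encryption, UserAccess) rather than by namespace URI, so it does not depend on the exact namespaces ExifTool declares; that choice, and all function names, are assumptions of this sketch. The sample in the test is hand-written for illustration (with a placeholder namespace), not captured ExifTool output.

```python
from __future__ import annotations

import subprocess
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Local names of the elements listed in the table above.
WANTED = ("Warning", "Encryption", "UserAccess")


def run_exiftool_rdf(path: str) -> bytes:
    """Run `exiftool -X` on one file and return its RDF/XML output.

    Raises CalledProcessError if ExifTool exits with a non-zero status.
    """
    return subprocess.run(
        ["exiftool", "-X", path], capture_output=True, check=True
    ).stdout


def encryption_fields(rdf_xml: str | bytes) -> dict[str, str]:
    """Extract Warning, Encryption and UserAccess from `exiftool -X` output.

    Only the first rdf:Description is read; elements are matched by local
    name, ignoring their namespace URI.
    """
    root = ET.fromstring(rdf_xml)
    description = root.find(f"{{{RDF_NS}}}Description")
    fields: dict[str, str] = {}
    if description is None:
        return fields
    for child in description:
        local_name = child.tag.rsplit("}", 1)[-1]
        if local_name in WANTED:
            fields[local_name] = (child.text or "").strip()
    return fields
```

Following the behaviour described above, a Warning key points to an open password, while an Encryption key with a shortened UserAccess list points to usage restrictions.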

Recommendations

Pre-ingest

  • Use Apache Preflight to establish if files are encrypted.
  • Formulate a policy on how to deal with encryption (e.g. reject any PDFs that are encrypted, reject PDFs that require an open password, etc.).
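A policy of this kind can be written as a small decision function over the ExifTool fields. The sketch below encodes one hypothetical policy: reject files that need an open password, send files with any usage restriction to review, and accept the rest. The baseline list is the UserAccess value reported for encryption_openpassword.pdf in the ExifTool table; treating it as "nothing restricted" is an assumption of this sketch, not an ExifTool or PDF rule, and the warning-text match relies on the wording observed in those tests.

```python
from __future__ import annotations

# Hypothetical baseline: the UserAccess list reported for
# encryption_openpassword.pdf, taken here to mean "nothing restricted".
BASELINE = {"Print", "Modify", "Copy", "Annotate", "Fill forms", "Extract", "Print high-res"}


def restricted_functions(user_access: str) -> set[str]:
    """Baseline permissions missing from an ExifTool UserAccess string."""
    granted = {item.strip() for item in user_access.split(",") if item.strip()}
    return BASELINE - granted


def ingest_decision(fields: dict[str, str]) -> str:
    """Apply the hypothetical policy to Warning/Encryption/UserAccess fields."""
    # Matches the warning text seen in the tests on this page.
    if "password protected" in fields.get("Warning", ""):
        return "reject: open password required"
    if "Encryption" not in fields:
        return "accept: not encrypted"
    missing = restricted_functions(fields.get("UserAccess", ""))
    if missing:
        return "review: restricted " + ", ".join(sorted(missing))
    return "accept: encrypted, no restrictions"
```

Applied to the test documents, this policy rejects encryption_openpassword.pdf and sends the other three files to review, each with its missing permissions listed.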

Existing collections

  • Use Apache Preflight to detect encrypted files in the collection.
  • In some cases it may be possible to obtain an unencrypted version from the original depositor/publisher.

Example files
