Current version by William Palmer on Aug 13, 2014 12:11.




# Using the Govdocs1 corpus (231,683 PDFs / 127.8 GB) for initial testing - [http://digitalcorpora.org/corpora/files/|http://digitalcorpora.org/corpora/files/]
# -Seeking access to internal dataset of PDFs (~40k) (not currently tested)-



h2. Workflow

* Manual scan of the PDF for the "/Encrypt" keyword (the trailer dictionary key that marks an encrypted file)
* Check PdfReader.isEncrypted() with iText
* NOTE: print/copy restrictions and similar permission settings are not currently checked
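
The keyword scan in the first step could be sketched roughly as below. This is only an illustration using the Java standard library, not the code actually used in the workflow: the class name {{EncryptKeywordScan}} and the method {{containsEncryptKeyword}} are hypothetical. It assumes a case-sensitive byte search for "/Encrypt" is what is wanted, and that files are small enough to read into memory. A raw scan can also match the same bytes elsewhere in a file, which is one reason to pair it with the iText {{PdfReader.isEncrypted()}} check.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
public class EncryptKeywordScan {

    // Returns true if the raw bytes contain the ASCII sequence "/Encrypt".
    // The comparison is case-sensitive, as PDF names are.
    static boolean containsEncryptKeyword(byte[] data) {
        byte[] needle = "/Encrypt".getBytes(StandardCharsets.US_ASCII);
        outer:
        for (int i = 0; i + needle.length <= data.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) {
                    continue outer; // mismatch, try the next offset
                }
            }
            return true;
        }
        return false;
    }

    // Convenience overload: reads the whole file into memory, then scans it.
    static boolean containsEncryptKeyword(Path pdf) throws IOException {
        return containsEncryptKeyword(Files.readAllBytes(pdf));
    }

    // Prints one tab-separated line per file given on the command line.
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + "\t" + containsEncryptKeyword(Paths.get(arg)));
        }
    }
}
```

Files flagged by the scan would then be passed to the iText check in the second step; comparing the two results per file shows where the approaches disagree.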

Ideally, the current validity and DRM checks will themselves be validated against a set of files with known ground truth.