|| Detecting Encryption/DRM in Digital Content
|Detailed description|| Many file formats make provision for the encryption of content, e.g. password-protected PDFs. Independently of file formats, software exists that encrypts data at the file, directory, and device level, e.g. encrypted hard drives. Encrypted content is not suitable for long-term preservation because it is inaccessible without the key or password. No single tool will solve this issue, because many forms of encryption are format specific; instead, a collection of tools will be needed.
Digital Rights Management (DRM) is closely related to encryption and is usually used by content producers to ensure that content is accessible only to legitimate (paying) users.
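Because the checks are format specific, a collection of tools first has to decide which format-specific check to run on each file. The sketch below illustrates that first step under a stated assumption: that the container can be recognised from well-known leading byte signatures. The function name `sniff_format` and its labels are hypothetical, and a production workflow would more likely rely on a dedicated format identification tool. Note that OOXML (.docx etc.) and EPUB files share the ZIP signature, while password-protected OOXML files carry the OLE2 compound file signature instead.

```python
# Hypothetical first stage of a multi-tool pipeline: sniff the container
# format from its leading "magic" bytes so that a format-specific
# encryption check can be chosen afterwards.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also OOXML (.docx/.xlsx/.pptx) and EPUB
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "cfb"),  # OLE2 Compound File
]


def sniff_format(data: bytes) -> str:
    """Return a coarse format label based on the first bytes of a file."""
    for magic, label in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

Only exact matches at offset 0 are tested here; real files sometimes deviate (e.g. leading junk before a PDF header), which is one reason to prefer a proper identification tool.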
| Scalability Challenge
|| The large variety of encryption techniques employed across many formats makes this a complex issue.
|Issue champion||Maureen Pennock (BL)|
| Other interested parties
||Potentially many; this is a generic and common issue.|
|Possible Solution approaches|| Apache PDFBox for PDF encryption.
Apache POI for Microsoft Office documents, or consult Microsoft Research.
Java zip library (java.util.zip) for container formats (e.g. ZIP).
Calibre to detect DRM in e-book formats.
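The following is a minimal sketch of what such format-specific checks look like at the byte level. It uses only the Python standard library rather than the tools listed above, and every function name is hypothetical. The checks are heuristics: ZIP entries flag encryption in general-purpose bit 0; encrypted PDFs reference an /Encrypt dictionary from the trailer; password-protected OOXML files are wrapped in an OLE2 compound file containing an "EncryptionInfo" stream. The EPUB rules are assumptions to verify against the EPUB OCF specification: that META-INF/rights.xml indicates Adobe DRM, and that the two font-obfuscation algorithm URIs below are the only non-DRM uses of META-INF/encryption.xml.

```python
import io
import re
import xml.etree.ElementTree as ET
import zipfile

# Heuristic checks only; a production tool would parse the formats properly
# (e.g. Apache PDFBox for PDF, Apache POI for Office files).


def zip_has_encrypted_entries(data: bytes) -> bool:
    """True if any ZIP entry sets general-purpose flag bit 0 (entry encrypted)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return any(info.flag_bits & 0x1 for info in zf.infolist())


def pdf_has_encrypt_dictionary(data: bytes) -> bool:
    """Heuristic: an encrypted PDF's trailer references an /Encrypt dictionary.

    The word boundary stops '/EncryptMetadata' from matching. Text that merely
    mentions '/Encrypt' in an uncompressed stream would be a false positive.
    """
    return re.search(rb"/Encrypt\b", data) is not None


CFB_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 Compound File header


def ooxml_is_encrypted(data: bytes) -> bool:
    """Heuristic: a password-protected .docx/.xlsx/.pptx is an OLE2 compound
    file holding an 'EncryptionInfo' stream (entry names are UTF-16LE)."""
    return data.startswith(CFB_MAGIC) and "EncryptionInfo".encode("utf-16-le") in data


# Assumed list of algorithms used only for EPUB font obfuscation (not DRM).
FONT_OBFUSCATION_ALGORITHMS = {
    "http://www.idpf.org/2008/embedding",  # IDPF font obfuscation
    "http://ns.adobe.com/pdf/enc#RC",      # Adobe font obfuscation
}


def epub_drm_suspected(data: bytes) -> bool:
    """Heuristic: flag an EPUB that carries META-INF/rights.xml, or whose
    META-INF/encryption.xml names any non-font-obfuscation algorithm."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        if "META-INF/rights.xml" in names:
            return True
        if "META-INF/encryption.xml" not in names:
            return False
        root = ET.fromstring(zf.read("META-INF/encryption.xml"))
    algorithms = {el.get("Algorithm") for el in root.iter() if el.get("Algorithm")}
    return bool(algorithms - FONT_OBFUSCATION_ALGORITHMS)
```

Each check reads the whole file into memory, which keeps the sketch short but would not scale to very large files.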
|Context|| Details of the institutional context to the Issue. (May be expanded at a later date)
|Lessons Learned|| Notes on Lessons Learned from tackling this Issue that might be useful to inform the development of Future Additional Best Practices, Task 8 (SCAPE TU.WP.1 Dissemination and Promotion of Best Practices)
|Datasets||A requirement to build a dataset of sample encrypted content in different formats.|
|Solutions||Reference to the appropriate Solution page(s), by hyperlink|
|Objectives||Which SCAPE objectives do this issue and a future solution relate to? e.g. scalability, robustness, reliability, coverage, preciseness, automation|
|Success criteria||Describe the success criteria for solving this issue - what are you able to do? - what does the world look like?|
|Automatic measures|| What automated measures would you like the solution to provide so that it can be evaluated against this specific issue? Which measures are important?
If possible specify very specific measures and your goal - e.g.
* process 50 documents per second
* handle 80 GB files without crashing
* identify 99.5% of the content correctly
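Measures like these can be computed by running a detector over a labelled sample set. The sketch below is a hypothetical harness (the name `evaluate_detector` and the `(data, is_encrypted)` sample shape are assumptions) that reports accuracy and documents per second, and also counts false negatives separately, since encrypted content that goes undetected is the error that matters most for preservation.

```python
import time
from typing import Callable, Iterable, Tuple


def evaluate_detector(
    detector: Callable[[bytes], bool],
    samples: Iterable[Tuple[bytes, bool]],
) -> dict:
    """Run a detector over labelled samples (data, is_encrypted) and report
    accuracy, error counts and throughput in documents per second."""
    total = correct = false_negatives = false_positives = 0
    start = time.perf_counter()
    for data, is_encrypted in samples:
        total += 1
        flagged = detector(data)
        if flagged == is_encrypted:
            correct += 1
        elif is_encrypted:
            false_negatives += 1  # encrypted content missed: the costly error
        else:
            false_positives += 1
    elapsed = time.perf_counter() - start
    return {
        "documents": total,
        "accuracy_percent": 100.0 * correct / total if total else 0.0,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "docs_per_second": total / elapsed if elapsed > 0 else float("inf"),
    }
```

The returned figures can then be compared with targets such as the 50 documents per second and 99.5% correct identification given as examples above; the file-size target would need a separate test with large inputs.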
|Manual assessment|| Apart from the automated measures you would like to get, do you foresee any manual assessment being necessary to evaluate the solution to this issue?
If possible specify measures and your goal - e.g.
* Solution installable with basic Linux system administration skills
* User interface understandable by non-developer curators
|Actual evaluations||Links to actual evaluations of this Issue/Scenario|