
# We need to be able to assess legacy files and deal with them appropriately
# We need to be able to assess files prior to ingest and deal with them appropriately
# We would ideally do 2 & 3 on the basis of some machine-readable policy
h2. Experiments
_Create experiments as child pages and they should appear automatically here_
{pageTree:
[email protected]}
Characterisation of ebook formats to identify DRM, etc. as per BL ingest policy (PC)
Data: No. Awaiting test data from publishers. Will not be public.
Workflow: No.
Issues: See data.
Wrap tool for use in Rosetta & execute over some content (OK)
Data: TBD
Workflow: No
Issues: Not yet\!
h2. Developer Notes
TBC. For PDF, a possible approach would be to use the Apache Preflight PDF/A validator (part of PDFBox) to identify all potential risks, and then evaluate its output against a set of business rules that correspond to low-level (control) policies. This could be done with Schematron, which would require developing an XML output handler for Preflight, resulting in an approach similar to the JPEG 2000 / jpylyzer work. See also:
[http://www.openplanetsfoundation.org/comment/385#comment-385|http://www.openplanetsfoundation.org/comment/385#comment-385]
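As a rough illustration of the "validator output evaluated against business rules" idea, here is a minimal Python sketch. It is only a stand-in for a real Schematron step. The XML report layout, the error codes and the policy entries are all hypothetical, since Preflight does not yet have an XML output handler; only the standard library's {{xml.etree.ElementTree}} is used.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL report format: Preflight has no XML output handler yet,
# so this layout and these error codes are invented for illustration only.
SAMPLE_REPORT = """
<preflight file="example.pdf">
  <error code="ENCRYPTION" details="Document is encrypted"/>
  <error code="FONT_NOT_EMBEDDED" details="Font is not embedded"/>
</preflight>
"""

# HYPOTHETICAL control policy: each rule names one error code that
# must not appear in the validator report.
POLICY = [
    {"id": "no-encryption", "forbidden_code": "ENCRYPTION"},
    {"id": "fonts-embedded", "forbidden_code": "FONT_NOT_EMBEDDED"},
    {"id": "no-javascript", "forbidden_code": "JAVASCRIPT"},
]


def evaluate(report_xml, policy):
    """Return the policy rules that the given report violates."""
    root = ET.fromstring(report_xml.strip())
    # Collect every error code reported by the validator.
    found_codes = {error.get("code") for error in root.iter("error")}
    # A rule is violated when its forbidden code was reported.
    return [rule for rule in policy if rule["forbidden_code"] in found_codes]


if __name__ == "__main__":
    for rule in evaluate(SAMPLE_REPORT, POLICY):
        print("Policy violation:", rule["id"])
```

In a Schematron-based setup each policy entry would instead be an assertion over the report XML, much as in the jpylyzer work. The dictionary-based policy above is just the simplest machine-readable form to show the principle.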
For EPUB, something similar could be done using the EpubCheck tool.
This kind of policy validation is also something that SCAPE's SCOUT should (or at least could) deal with.
h2. Related Documents