
| *Detailed description* | Some forms of content arrive at the preserving institution and will be preserved "as is" regardless of how the files have been constructed (e.g. web-archived content). Other content can be acquired under a specific agreement with the creator or publisher, and the preserving institution typically expects the content in a particular form. Such an agreement may go beyond naming the formats used and describe specific technical constraints on the construction of the files. For example, the BL's Technical Guidelines for Digitisation state that digitised TIFFs should be TIFF version 6, LZW compressed, and that each TIFF should contain only one image. These technical constraints are typically described as a "format profile". \\
If content received from the creator or publisher does not conform to the agreed profile, the preserving institution can reject the content and request new/revised/re-scanned content. However, the preserving organisation must have the capability to verify a digital object's compliance with a profile, and if it is not compliant, identify how it fails. It is necessary to perform this check in an automated manner. \\
\\
The SCAPE project proposal calls this "Policy Driven Validation". "Policy" is most likely not the right word; something like "profile" would be better. \\
\\
Image files may be constructed imperfectly or may be damaged during storage or transfer. It would therefore be useful to be able to verify in an automated fashion that the files are complete (i.e. have not been arbitrarily truncated) and that the files are valid and/or will render in one or more common viewing applications without error. For example, truncated JPEG2000s in the JISC1 dataset are typically reported as valid and well-formed by JHOVE. \\
\\
Example 1: JISC1 Newspapers \\
Within this dataset there are a number of truncated JPEG2000 images. These should be checked for completeness, validity and renderability (i.e. renders in one or more typical JPEG2000 viewers). \\
\\
Example 2: Brightsolid Newspapers \\
Digitisation of this collection is ongoing. There is a need to check in and QA new JPEG2000 images. This should involve a check that each image conforms to the new BL JPEG2000 profile, as well as checking for completeness, validity and renderability. The BL profile can be found at the end of this page. \\ |
| *Scalability Challenge* \\ | Large scale digitisation projects need to check in content and verify its compliance with a profile quickly and efficiently despite the high volume of data. For example, JPEG2000s digitised for a current BL project will be received at between 0.25 and 0.5TB per day. Checking must be performed at a sufficient rate to prevent a build-up of material and to allow timely rejection of content that does not match the profile (problem pages can be re-digitised if issues are identified in a timely manner, i.e. within days rather than weeks). |
| *Issue champion* | [Maureen Pennock|https://portal.ait.ac.at/sites/Scape/_layouts/userdisp.aspx?ID=28] (BL) |
| *Other interested parties* \\ | [Sven Schlarb|https://portal.ait.ac.at/sites/Scape/_layouts/userdisp.aspx?ID=32] (ONB) \\
Christy Henshaw (Wellcome Library, UK) (external) \\
Ross Spencer (The National Archives, UK) (external) \\
[Bjarne Andersen|https://portal.ait.ac.at/sites/Scape/_layouts/userdisp.aspx?ID=8] (SB) - SB is interested, but cannot work on this issue until the relevant digitisation project (Newspapers) has begun |
| *Possible Solution approaches* | * Any developments to meet this Issue should consider the following, and ensure appropriate liaison where solutions may exist or may be under development:
** JHOVE/JHOVE2 may provide some of the solution if developed further. Will JHOVE2 developments meet these needs?
** Wellcome Library may develop some solutions in this area
** Ross Spencer has done some development in this area which might work well with Johan's developments (discussions are ongoing)
** Modification of existing rendering tools to do thorough parsing / rendering check
* KEEPS
** Watch may contribute to the solution with the following triggers:
*** Monitor characterization tools
*** Monitor changes in policy
* SB
** _1. Develop language (XML ?) to describe institutional collection profiles_
** _2. Write comparator that compares the output of characterisation tools with the profile to judge if files conform not only to the formal file format specification but also to the local institutional requirements_
** _3. This "judgement" could potentially be used in a Taverna workflow to sort large numbers of files into two piles: those that conform to the profile and those that do not._ |
| *Context* | _Details of the institutional context to the Issue. (May be expanded at a later date)_ \\ |
| *Lessons Learned* | _Notes on Lessons Learned from tackling this Issue that might be useful to inform the development of Future Additional Best Practices, Task 8 (SCAPE TU.WP.1 Dissemination and Promotion of Best Practices)_ \\ |
| *Training Needs* | _Is there a need for providing training for the Solution(s) associated with this Issue? Notes added here will provide guidance to the SCAPE TU.WP.3 Sustainability WP._ \\ |
| *Datasets* | [SP:JISC1 19th Century Digitised Newspapers]\\
Brightsolid Newspapers (TBC) \\
[Danish scanned books (TIFF format)] |
| *Solutions* | * [SO15 JP2 validator and properties extractor (jpylyzer)|SO15 JP2 validator and properties extractor] provides a basic check that a JP2 file is complete
* [SP:SO30 Automated assessment of JP2 against a technical profile] uses jpylyzer output and assesses it against schematron-encoded profile |
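
The completeness check described above can be illustrated with a very rough heuristic. The sketch below is not how jpylyzer works internally and is no substitute for it; it only tests whether a file ends with the JPEG 2000 end-of-codestream (EOC) marker, {{0xFFD9}}. It assumes the codestream is the last thing in the file, which is common for JP2 files but not guaranteed by the format. The function names are hypothetical.

```python
EOC_MARKER = b"\xff\xd9"  # JPEG 2000 end-of-codestream marker


def ends_with_eoc(data: bytes) -> bool:
    """Return True if the byte string ends with the EOC marker."""
    return data.endswith(EOC_MARKER)


def looks_complete(path: str) -> bool:
    """Heuristic truncation check: read only the last two bytes of the file.

    Assumption: the codestream is the final element of the file, so a
    complete file ends with the EOC marker. A file that fails this test
    is a candidate for closer inspection, not proof of truncation.
    """
    with open(path, "rb") as handle:
        handle.seek(0, 2)          # jump to the end to learn the file size
        if handle.tell() < 2:
            return False           # too short to hold a marker at all
        handle.seek(-2, 2)         # step back two bytes from the end
        return handle.read(2) == EOC_MARKER
```

Because only two bytes are read per file, a check like this is cheap enough to run over a whole delivery before the fuller validation and profile checks are started.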


| *Parameter/Field* | *Value* |
| Compression | Lossy (detail TBC) \\ |
| Number of components | 3 |
| Component Transform | Yes (irreversible) |
| Tile size | One tile for entire image |
| Wavelet Filter | 9-7 irreversible |
| Number of levels | Variable; 6 used for test image |
| Number of layers | Multiple |
| Progression order | RPCL |
| Codestream markers | Packet-length markers |
| Precincts | 256x256, 256x256, 128x128 |
| Codeblock size | 64x64 |
| Coder Bypass | Yes |
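
As an illustration of how a profile like the one above might be compared against extracted file properties and used to sort files into two piles, here is a minimal sketch. SO30 encodes the profile in Schematron and applies it to jpylyzer output; this sketch swaps in plain Python rules purely for readability. The property names ({{components}}, {{tiles}}, and so on) are hypothetical and do not correspond to any particular tool's output, and only the unambiguous rows of the table are encoded.

```python
from typing import Any, Callable, Dict, List, Tuple

# Rules derived from the profile table. Keys are hypothetical property
# names; a real comparator would map them to its characterisation tool.
PROFILE: Dict[str, Callable[[Any], bool]] = {
    "components": lambda v: v == 3,
    "tiles": lambda v: v == 1,                      # one tile for entire image
    "wavelet_filter": lambda v: v == "9-7 irreversible",
    "layers": lambda v: isinstance(v, int) and v > 1,  # "Multiple"
    "progression_order": lambda v: v == "RPCL",
    "codeblock_size": lambda v: v == (64, 64),
    "coder_bypass": lambda v: v is True,
}


def check_profile(props: Dict[str, Any]) -> List[str]:
    """Return a list of failure messages; an empty list means conformance."""
    failures = []
    for name, rule in PROFILE.items():
        if name not in props:
            failures.append(f"{name}: missing")
        elif not rule(props[name]):
            failures.append(f"{name}: unexpected value {props[name]!r}")
    return failures


def sort_into_piles(
    files: Dict[str, Dict[str, Any]]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split files into conforming names and rejected names with reasons."""
    conforming: List[str] = []
    rejected: Dict[str, List[str]] = {}
    for filename, props in files.items():
        failures = check_profile(props)
        if failures:
            rejected[filename] = failures
        else:
            conforming.append(filename)
    return conforming, rejected
```

Returning the reasons alongside each rejected file reflects the requirement that a non-compliant object should also be reported with how it fails, so that it can be sent back for re-digitisation.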
h1.