Title | Identify Files Affected by Truncated/Fuzzy JPEG2000 |
Detailed description | Initially it was felt that the structure of the J2 files would be deficient in some way. Johan van der Knijff has developed a Python tool that does a simple top-level check of J2 structure: it checks that all top-level J2 boxes exist and that the main end marker is in the right place.
j2structCheck across 8 images (4 assumed good, 4 known bad):
- j2structCheck found 1 image with a problem (a true positive)
Evidence from j2 filesize (the same 8 images, plus some easy-to-find known worse ones):
- Each j2 for a particular newspaper set should have a comparable filesize (i.e. for two of the sets, approx. 20MB and 17MB respectively)
- Files smaller than 1MB are definitely affected by truncation (and would also be identified by j2structCheck)
- Files smaller than half the average j2 filesize are likely to exhibit problems
Evidence from original tif (converted to j2):
- If the tifs are available, there should be a one-to-one match of tif to j2
- If the count of tifs does not equal the count of j2s (the most likely case is j2 count < tif count), the set is suspect
- The j2 count should be even, except for rare circumstances where both sides have not been digitised
- The j2 count is very likely to be a multiple of 4
Potential enhancements to j2structCheck:
- Use a known bad file (such as the Belfast Newsletter 5MB j2) and analyse the whole structure rather than only the top level
- See if there are any truncated markers or other structural oddities (compared to an assumed good file)
XnView refused to open any of the j2 files that were in the known bad set, but opened all of the assumed goods, so there has to be a structural difference that differentiates them for this tool.
(XnView is built on jj2000; it is not clear how open source it is, although jad may reveal what is going on too.) |
Solution Champion | Who owns the Solution? Include an email address if possible |
Corresponding Issue(s) | Truncated JPEG2000 |
Tool/code link | JP2 file structure checker |
Tool Registry Link | Johan van der Knijff - j2structCheck.py. Johan's blog post about the JP2 file structure checker: http://openplanetsfoundation.org/blogs/2011-09-01-simple-jp2-file-structure-checker Tool registry link: jp2StructCheck |
Evaluation | Any notes or links on how the solution performed |
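The file-size and file-count heuristics from the detailed description could be scripted roughly as follows. This is a minimal sketch, not part of j2structCheck: the function names, the `.jp2` and `.tif` suffix defaults, and the reading of "1MB" as 2**20 bytes are all assumptions made for illustration. Note that the average is taken over every file in the set, damaged ones included, so a set with many truncated files will pull the threshold down.

```python
from pathlib import Path
from statistics import mean

ONE_MB = 1024 * 1024  # assumption: "1MB" is read as 2**20 bytes


def classify_by_size(sizes):
    """Label each file 'truncated', 'suspect' or 'ok' from its size.

    sizes: dict of file name -> size in bytes, all from one newspaper set.
    """
    if not sizes:
        return {}
    average = mean(sizes.values())
    result = {}
    for name, size in sizes.items():
        if size < ONE_MB:
            result[name] = "truncated"  # definitely affected by truncation
        elif size < average / 2:
            result[name] = "suspect"    # likely to exhibit problems
        else:
            result[name] = "ok"
    return result


def count_warnings(j2_count, tif_count=None):
    """Return warnings from the tif/j2 count heuristics (empty list if none)."""
    warnings = []
    if tif_count is not None and j2_count != tif_count:
        warnings.append("j2 count %d != tif count %d" % (j2_count, tif_count))
    if j2_count % 2 != 0:
        warnings.append("j2 count is odd")
    elif j2_count % 4 != 0:
        warnings.append("j2 count is not a multiple of 4")
    return warnings


def scan_directory(directory, j2_suffix=".jp2", tif_suffix=".tif"):
    """Apply both heuristics to one directory (hypothetical layout:
    the j2 files and, if available, the tifs sit side by side)."""
    files = list(Path(directory).iterdir())
    sizes = {p.name: p.stat().st_size
             for p in files if p.suffix.lower() == j2_suffix}
    tif_count = sum(1 for p in files if p.suffix.lower() == tif_suffix)
    # With no tifs present, the one-to-one comparison is skipped.
    return classify_by_size(sizes), count_warnings(len(sizes), tif_count or None)
```

Calling `scan_directory` on the folder for one newspaper set returns the per-file labels and the list of count warnings. An odd count or a count that is not a multiple of 4 is only a hint, since the description allows for rare exceptions.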
3 Comments
Sep 29, 2011
Johan van der Knijff
Great to see some testing of jp2Structcheck! Two comments:
1. For November I'm planning to have a go at expanding the functionality of the tool to do full JP2 validation. See also the comment I added to my blog post:
http://www.openplanetsfoundation.org/blogs/2011-09-01-simple-jp2-file-structure-checker#comment-200
If someone (probably Paul) could make the test images available to me that would be really helpful. Apparently there's some more complex corruption going on here, and not a plain truncation of the code stream. I'd like to have a more in-depth look at this and see if it's something that can be detected without decoding the whole codestream.
2. I see you're using XnView here to assess the quality of your images. I would strongly advise not to use this software for this particular purpose, as the last time I checked it was using the JasPer library (not JJ2000!) for its JPEG 2000 support, and this library has known performance issues that may result in all sorts of unexpected behaviour for large images. I once did some tests with XnView which confirmed this.
Two (better) alternatives are:
- IrfanView with the format plugins installed. The plugins pack includes a JPEG 2000 plugin by Luratech (both free, but not open source)
- The kdu_view application that is part of Kakadu (also free but not open source)
Sep 30, 2011
Derek Sergeant
Responding to both comments in turn:
1) This would be great, and I am sure that Paul (or Karl) could get you access to a wide range of j2 images from this particular collection. It would certainly be worth looking at the 4 "known bad" files, of which at least one is definitely bad but does contain all the relevant top level boxes and the correct end of stream marker. I am not sure whether the two larger images are actually bad.
2) Although XnView might not be the best choice for assessing the quality of images, it was randomly chosen from a page of tools built using jj2000 (it might not belong in that list, if it does use JasPer). It did behave differently on "perfect" j2s (from the Belfast Newsletter) than on the truncated j2s (all 4 "known bad"): for the latter it threw an error saying that it could not open that type of file (not a j2?). This shows that there is a difference (probably structural) between the two sets of j2s, which were (hopefully) created by the same software workflow from the same tiffs. I did take IrfanView and use it to open all of the j2s from both sets. If I had spent more time on this problem, it would have been helpful to look at what metadata these tools say is associated with the j2s. So just to restate: XnView was going to be used simply to view j2s, but it also turns out to be fussier about which j2s it can render.
3) jp2structCheck is neat, and easy to get hold of and use quickly. (All I did was add python to cygwin, and perhaps add one module so that it was available.) The startup ramp of taking this and using it to read j2s (and perhaps convert to cropped pngs or do analysis) would certainly be less than writing an ImageJ program.
Dec 05, 2011
Johan van der Knijff
An update on this: at the moment I'm working on a JP2 validator/properties extractor tool that extends the functionality of jp2structcheck to a full validation of the JP2 format, and a more thorough (although still somewhat limited) validation of the codestream. While working on this I found a better way to identify the truncated JP2s. In order to understand how this works, see the figure below for a general overview of the layout of the codestream (figure is based on Figure A.2 in 'Annex A: Codestream Syntax' of the standard):
What we see here is that a codestream starts with a 'start of codestream' marker, followed by a series of main header fields. The remainder is a series of one or more tile parts. Each tile part is made up of a 'start of tile-part' marker (SOT), followed by a series of tile-part header fields. These header fields are terminated by a 'start of data' marker (SOD), which is followed by the compressed bitstream for that tile part. The codestream is terminated by an 'end of codestream' (EOC) marker.
The important thing here is that each tile-part's 'start of tile-part' (SOT) marker contains a field called 'Psot' that defines the tile-part's total length (in bytes). This provides a way of checking a tile-part's completeness: starting from each SOT marker, moving 'Psot' bytes forward should always get us to the next SOT marker (or, for the last tile-part, to the EOC marker). If this is not the case, it means that something is wrong (e.g. missing data).
I incorporated this check in my validator tool, and I managed to identify the damaged images in a small set of test images that I received from Carl Wilson.
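As an illustration of the Psot check described above, here is a minimal sketch. It is not the code of the validator tool: the function name `check_tile_parts` and the wording of the messages are made up for this example. It assumes that the raw codestream has already been extracted (for a JP2 file, the contents of the contiguous codestream box). The marker values and the layout of the SOT segment are those defined in Annex A of the standard.

```python
import struct

# Marker codes from Annex A of the JPEG 2000 Part 1 standard.
SOC, SOT, EOC = 0xFF4F, 0xFF90, 0xFFD9


def check_tile_parts(codestream):
    """Follow the Psot fields through a raw codestream (bytes).

    Returns a list of problems; an empty list means every tile-part
    ends exactly where its Psot field says it should.
    """
    def marker_at(pos):
        # 2-byte big-endian marker at pos, or None past the end of the data.
        if pos < 0 or pos + 2 > len(codestream):
            return None
        return struct.unpack(">H", codestream[pos:pos + 2])[0]

    if marker_at(0) != SOC:
        return ["codestream does not start with SOC"]

    # Skip the main header. Each segment is a 2-byte marker followed by a
    # 2-byte length that counts itself but not the marker.
    pos = 2
    while marker_at(pos) not in (SOT, None):
        if pos + 4 > len(codestream):
            return ["main header is cut off"]
        (length,) = struct.unpack(">H", codestream[pos + 2:pos + 4])
        pos += 2 + length
    if marker_at(pos) is None:
        return ["no SOT marker found after the main header"]

    while marker_at(pos) == SOT:
        if pos + 12 > len(codestream):
            return ["SOT segment at offset %d is cut off" % pos]
        # SOT segment: marker, Lsot (2), Isot (2), Psot (4), TPsot (1), TNsot (1)
        (psot,) = struct.unpack(">I", codestream[pos + 6:pos + 10])
        if psot == 0:
            # Allowed for the last tile-part: it runs up to the EOC marker,
            # so its length cannot be verified from Psot.
            if marker_at(len(codestream) - 2) != EOC:
                return ["codestream does not end with EOC"]
            return []
        if marker_at(pos + psot) not in (SOT, EOC):
            return ["tile-part at offset %d: Psot=%d does not lead to SOT or EOC"
                    % (pos, psot)]
        pos += psot
    return []  # the loop ended on the EOC marker
```

The function stops at the first inconsistency, because once one Psot jump lands in the wrong place the positions of all later tile-parts are unknown.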
There is one tricky bit here: the standard says that Psot may have a value of 0 (zero) for the last tile-part in the codestream. In that case the tile-part is assumed to contain all data until the EOC marker. The BL's specifications state that no tiles should be used (if I remember correctly this was to avoid tiling artefacts with lossy compression), which means that each image only contains a single tile-part. Now the images that I've seen do contain Psot values that are actually meaningful, but the standard doesn't give any guarantees on this! It's entirely possible that other encoders will produce a 0 value, and in that case there would be no way to detect this corruption (unless perhaps we were to dig even deeper into the codestream).
Based on this the BL might want to reconsider their decision not to use tiles, as the information in the tile-parts can be really helpful for detecting within-codestream corruption.
I will publish my JP2 validator through the OPF blog shortly, probably by mid-December.