
h2. Description
JP2 files may fall victim to bit- or byte-level corruption. Examples are:
# Hardware or software failure during the creation process results in files that are truncated or have missing data
# A JP2 gets damaged because of a faulty file transfer operation
# A JP2 gets damaged because of deterioration of the physical storage medium ('bit rot')

The first two examples result in damage at the byte level (i.e. bytes are lost, or extra bytes are inserted). In the case of 'bit rot' the total number of bytes remains the same, but individual bit values are changed (zeros become ones and vice versa).
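To create test files, both kinds of damage can be simulated on a copy of a JP2. The Python sketch below illustrates the difference: flipping a bit leaves the file size unchanged, whereas truncation removes bytes. The function names are hypothetical and not part of any tool.

```python
def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate 'bit rot': invert a single bit; the length stays the same."""
    buf = bytearray(data)
    byte_pos, bit_pos = divmod(bit_index, 8)
    # Bit 0 is taken to be the most significant bit of byte 0
    buf[byte_pos] ^= 1 << (7 - bit_pos)
    return bytes(buf)


def truncate(data: bytes, length: int) -> bytes:
    """Simulate byte-level loss, e.g. an interrupted write or transfer."""
    return data[:length]


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equally long byte strings."""
    if len(a) != len(b):
        raise ValueError("lengths differ: byte-level rather than bit-level damage")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```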

h2. Risks
Bit- and byte-level corruption can render files unreadable. Alternatively, damaged JP2s may still display, but with information missing. Note that JPEG 2000 was specifically designed to allow viewers to display (part of) the information in an image without loading the entire file, so depending on the viewer used (and its settings), damaged files may still display.

h2. Assessment

h3. Byte-level corruption
[Jpylyzer] detects most types of byte-level corruption. JP2s that are damaged at the byte level will therefore usually fail validation.

|*Tool*|*Affected if expression returns _True_*|
|[Jpylyzer]| {{"/jpylyzer/isValidJP2 = 'False'"}}|
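A minimal Python sketch of how the expression above could be evaluated on a jpylyzer report, for example one saved by redirecting the output of the jpylyzer command-line tool to a file. It assumes the report layout implied by the XPath (a {{jpylyzer}} root element with an {{isValidJP2}} child and no XML namespace); other jpylyzer releases may structure their output differently, so check the path against the version you use. The function name {{is_affected}} is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_affected(jpylyzer_xml: str) -> bool:
    """Return True if /jpylyzer/isValidJP2 = 'False' in the given report.

    Assumes the report layout implied by that XPath (no XML namespace).
    """
    root = ET.fromstring(jpylyzer_xml)
    if root.tag != "jpylyzer":
        raise ValueError(f"unexpected root element: {root.tag}")
    node = root.find("isValidJP2")  # direct child of the root element
    if node is None:
        raise ValueError("report contains no isValidJP2 element")
    return (node.text or "").strip() == "False"
```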

At the codestream level, [Jpylyzer] performs the following checks:

|*Test*|*Description*|
|foundExpectedNumberOfTiles|Number of encountered tiles is consistent with the expected number of tiles|
|foundExpectedNumberOfTileParts|For all tiles, the number of encountered tile parts is consistent with the expected number of tile parts|
|foundEOCMarker|Last two bytes of the codestream constitute an end of codestream (EOC) marker|
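The sketch below illustrates the kind of reasoning behind these tests; it is not jpylyzer's own code. It assumes the relevant header values have already been read from the codestream: the image and tile dimensions from the SIZ marker segment, and the _Isot_ (tile index) and _TNsot_ (number of tile parts) fields from each SOT marker segment. The expected number of tiles follows from the tile grid defined in the SIZ marker segment, and _TNsot_ = 0 means the number of tile parts is not specified in that tile part. All function names are hypothetical.

```python
from collections import Counter


def expected_tile_count(xsiz, ysiz, xtosiz, ytosiz, xtsiz, ytsiz) -> int:
    """ceil((Xsiz - XTOsiz) / XTsiz) * ceil((Ysiz - YTOsiz) / YTsiz)."""
    num_x = -(-(xsiz - xtosiz) // xtsiz)  # integer ceiling division
    num_y = -(-(ysiz - ytosiz) // ytsiz)
    return num_x * num_y


def found_expected_number_of_tiles(tile_indices, expected: int) -> bool:
    """tile_indices: the Isot value of every tile part found."""
    return len(set(tile_indices)) == expected


def found_expected_number_of_tile_parts(tile_parts) -> bool:
    """tile_parts: one (Isot, TNsot) pair per tile part found."""
    counts = Counter(isot for isot, _ in tile_parts)
    # Only tile parts that declare a count (TNsot != 0) can be checked
    return all(counts[isot] == tnsot for isot, tnsot in tile_parts if tnsot != 0)


def found_eoc_marker(codestream: bytes) -> bool:
    """The EOC marker is 0xFFD9."""
    return codestream[-2:] == b"\xff\xd9"
```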

In addition, [Jpylyzer] checks the actual length (in bytes) of each tile part against the length given in its tile part header (the _Psot_ field in the _Start of tile-part_ (SOT) marker segment). A limitation of this approach is that the standard permits encoders to _omit_ this length information for the _last_ tile part in the codestream: _Psot_ is then set to 0, and the tile part is assumed to contain all remaining data in the codestream, up to the end of codestream (EOC) marker. Most encoders appear to include the length anyway, but where they do not, [Jpylyzer] may fail to detect missing (or extra) bytes in the last tile part of an image.
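The following Python sketch shows how a _Psot_-based length check might work, and where the limitation described above comes from. It assumes {{cs}} holds a raw codestream (for instance the payload of the JP2's contiguous codestream box) and only walks the tile part lengths; it is a simplified illustration, not jpylyzer's implementation, and {{check_tile_part_lengths}} is a hypothetical name.

```python
import struct

SOC = b"\xff\x4f"  # start of codestream marker
SOT = 0xFF90       # start of tile-part marker


def check_tile_part_lengths(cs: bytes) -> list[str]:
    """Compare each tile part's Psot value with the bytes actually present.

    Returns a list of problem descriptions (empty if no mismatch was found).
    """
    if not cs.startswith(SOC):
        return ["codestream does not start with an SOC marker"]

    # Skip the main header: each marker segment after SOC has a 2-byte length
    # field that counts itself but not the 2-byte marker.
    offset = 2
    while offset + 4 <= len(cs):
        marker, length = struct.unpack(">HH", cs[offset:offset + 4])
        if marker == SOT:
            break
        offset += 2 + length
    else:
        return ["no SOT marker found"]

    eoc_offset = len(cs) - 2  # where the EOC marker should start
    while offset < eoc_offset:
        if offset + 12 > len(cs):
            return [f"truncated SOT marker segment at byte {offset}"]
        # SOT segment: marker, Lsot, Isot, Psot, TPsot, TNsot (12 bytes)
        marker, _lsot, _isot, psot, _tpsot, _tnsot = struct.unpack(
            ">HHHIBB", cs[offset:offset + 12])
        if marker != SOT:
            return [f"expected SOT at byte {offset}, found 0x{marker:04X}"]
        if psot == 0:
            # Last tile part runs up to EOC: its length cannot be verified
            return []
        offset += psot  # Psot counts from the first byte of the SOT marker

    if offset != eoc_offset:
        return [f"tile parts end at byte {offset}, expected {eoc_offset}"]
    return []
```

When the last tile part has _Psot_ = 0, the function simply stops, so bytes missing from that tile part go unnoticed; this mirrors the limitation described above.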

h3. Bit-level corruption
Periodic checksum checks will detect bit-level corruption caused by deterioration of the physical storage medium. [Jpylyzer] is of little value here: it only picks up bit-level corruption ('bit rot') if it affects fields in the JP2 or codestream headers, which make up only a tiny fraction of the file.
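A checksum check can be sketched in a few lines of Python using the standard {{hashlib}} module. SHA-256 is used here as an example; any algorithm will do, as long as the same one is used when the checksum is first computed and when it is verified. The function names are hypothetical. Because even a single flipped bit changes the digest, a mismatch reveals 'bit rot' anywhere in the file, not only in the headers.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """SHA-256 digest of an in-memory byte string, as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fixity_ok(path: str, expected_hex: str) -> bool:
    """True if the file still matches the checksum recorded earlier."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```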

h2. Recommendations

h3. Image creation
# Encode using JPEG 2000's error resilience features (start of packet (SOP) markers, end of packet header (EPH) markers and segmentation symbols). If a codestream gets damaged, this will at least allow decoders to recover from errors and limit their impact on the rendered image. See also [this paper by Volker Heydegger|].
# Do not encode the whole image as one single tile: depending on the encoder's defaults, this may limit the possibilities for verifying the completeness of the codestream (a single tile part is also the last one, so its length field may be set to 0).
# Use Jpylyzer to establish that files are valid JP2
# Do an exact (lossless) or approximate (lossy) pixel-wise comparison between the source and destination images
# Compute a checksum immediately after the image is created
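For the pixel-wise comparison, here is a minimal Python sketch, assuming both images have already been decoded (with a library of your choice) into flat sequences of sample values in the same order. For lossless conversion the maximum absolute difference should be 0; for lossy conversion a quality measure such as the peak signal-to-noise ratio (PSNR) can be compared against a threshold of your own choosing. The function names and the default {{max_value}} of 255 (8-bit samples) are assumptions.

```python
import math


def max_abs_difference(src, dst) -> int:
    """Largest per-sample difference; 0 means the images are identical."""
    if len(src) != len(dst):
        raise ValueError("images differ in size")
    return max((abs(a - b) for a, b in zip(src, dst)), default=0)


def psnr(src, dst, max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    if len(src) != len(dst) or not src:
        raise ValueError("images must be non-empty and equally sized")
    mse = sum((a - b) ** 2 for a, b in zip(src, dst)) / len(src)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)
```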

h3. Pre-ingest
# Verify checksum (if available)
# Use Jpylyzer to establish that files are valid JP2

h3. Existing collections
Perform regular checksum checks and keep multiple copies.
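Checksums and multiple copies complement each other: the checksum tells you which copy is damaged, and an intact copy provides the replacement. A minimal Python sketch, assuming a checksum recorded at creation time ({{reference}}) and freshly computed checksums for each copy; the copy names and the function {{plan_repairs}} are hypothetical.

```python
def plan_repairs(reference: str, copies: dict[str, str]) -> dict[str, str]:
    """Map each damaged copy to an intact copy it can be restored from.

    copies maps a copy's location to its current checksum.
    """
    intact = [loc for loc, digest in copies.items() if digest == reference]
    damaged = [loc for loc, digest in copies.items() if digest != reference]
    if damaged and not intact:
        raise RuntimeError("all copies are damaged; no intact source available")
    return {loc: intact[0] for loc in damaged}
```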

h2. Example files
* [] - last byte missing
* [] - image truncated at byte 5000
* [] - missing data in most of last tile-part

h2. References

[Heydegger, V. (2009). Just One Bit in a Million: On the Effects of Data Corruption in Files. In: Proceedings of the 13th European Conference on Research and Advanced Technology for Digital Libraries, pp. 315-326.|]