Effects of bit and byte corruption

Description

JP2 files may fall victim to bit- or byte-level corruption. Examples are:

  1. Hard- or software failure during the creation process results in files that are truncated or have missing data
  2. A JP2 gets damaged because of a faulty file transfer operation
  3. A JP2 gets damaged because of deterioration of the physical storage medium ('bit rot')

The first two examples result in damage at the byte level (i.e. bytes are lost, or extra bytes are inserted). In the case of 'bit rot' the total number of bytes remains the same, but individual bit values change (zeros become ones and vice versa).
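
The difference between the two kinds of damage can be illustrated with a minimal Python sketch. The function names and the sample byte string are made up for illustration and are not part of any JP2 tool.

```python
def truncate(data: bytes, n_lost: int) -> bytes:
    """Simulate byte-level damage: the last n_lost bytes are missing."""
    return data[:len(data) - n_lost]


def flip_bit(data: bytes, byte_index: int, bit_index: int) -> bytes:
    """Simulate 'bit rot': invert one bit, leaving the length unchanged."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit_index
    return bytes(damaged)


original = bytes(range(16))            # stand-in for the bytes of a JP2 file
print(len(truncate(original, 4)))      # byte-level damage changes the length
print(len(flip_bit(original, 3, 0)))   # bit-level damage does not
```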

Risks

Bit- and byte-level corruption can result in files becoming unreadable. Alternatively, damaged JP2s may display, but with information missing. Note that JPEG 2000 was specifically designed to allow viewers to display (part of) the information of an image without loading the file in its entirety, so depending on the viewer used (and its settings) damaged files may still display.

Assessment

Byte-level corruption

Jpylyzer detects most types of byte-level corruption: a JP2 in which such damage is detected does not pass validation.

Tool        Affected if expression returns True
Jpylyzer    "/jpylyzer/isValidJP2 = 'False'"
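
As a rough sketch, the expression above could be evaluated in Python against Jpylyzer's XML output once that output has been captured as a string. The sample XML below is a heavily simplified, hypothetical stand-in for real output. Because the element layout and namespace may differ between Jpylyzer versions, this sketch ignores namespaces and searches the whole tree rather than relying on the exact path.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix of the form '{uri}' from a tag name."""
    return tag.rsplit("}", 1)[-1]


def is_affected(jpylyzer_xml: str) -> bool:
    """Return True if the output reports isValidJP2 as 'False'."""
    root = ET.fromstring(jpylyzer_xml)
    for element in root.iter():
        if local_name(element.tag) == "isValidJP2":
            return (element.text or "").strip() == "False"
    raise ValueError("no isValidJP2 element found in the output")


# Hypothetical, simplified output for a damaged file.
sample = "<jpylyzer><isValidJP2>False</isValidJP2></jpylyzer>"
print(is_affected(sample))
```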

At the codestream level, Jpylyzer performs the following checks:

foundExpectedNumberOfTiles : Number of encountered tiles is consistent with expected number of tiles
foundExpectedNumberOfTileParts : For all tiles, number of encountered tile parts is consistent with expected number of tile parts
foundEOCMarker : Last 2 bytes in codestream constitute an end of codestream (EOC) marker segment
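
The last of these checks is simple enough to sketch in a few lines of Python, assuming the codestream has already been extracted from the JP2 file. The function name is hypothetical and this is not Jpylyzer's own code; the only fact relied on is that the EOC marker has the code 0xFFD9.

```python
EOC = b"\xff\xd9"  # end of codestream marker code


def found_eoc_marker(codestream: bytes) -> bool:
    """Return True if the last 2 bytes of the codestream are the EOC marker."""
    return codestream[-2:] == EOC


complete = b"\xff\x4f" + b"\x00" * 10 + EOC   # toy data, not a real codestream
print(found_eoc_marker(complete))
print(found_eoc_marker(complete[:-1]))        # truncated by one byte
```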

In addition, Jpylyzer checks the actual length (in bytes) of each tile part against the length information in the tile-part header (the Psot field in the start of tile-part (SOT) marker segment). A limitation of this approach is that the standard permits encoders to omit this length information from the header of the last tile part in the codestream: Psot is then set to 0, and the tile part is assumed to contain all the remaining data in the codestream, up to the end of codestream (EOC) marker. Most encoders appear to include this information anyway, but Jpylyzer may still fail to detect missing (or extra) bytes in the last tile part of an image.
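
The idea behind the Psot check, and its blind spot, can be sketched as follows. This is an illustration under simplifying assumptions, not Jpylyzer's implementation: the caller supplies the offset of the first SOT marker, and make_tile_part is a hypothetical helper that builds toy tile parts whose payload stands in for the real tile-part data.

```python
import struct

SOT = b"\xff\x90"  # start of tile-part marker code
EOC = b"\xff\xd9"  # end of codestream marker code


def tile_part_lengths_consistent(codestream: bytes, first_sot_offset: int) -> bool:
    """Follow each Psot value and check that it lands on the next SOT or on EOC."""
    offset = first_sot_offset
    while True:
        header = codestream[offset:offset + 12]
        if len(header) < 12 or header[:2] != SOT:
            return False
        # Lsot, Isot, Psot, TPsot and TNsot follow the 2-byte marker.
        _lsot, _isot, psot, _tpsot, _tnsot = struct.unpack(">HHIBB", header[2:])
        if psot == 0:
            # Length not recorded: the tile part runs up to the EOC marker,
            # so missing or extra bytes in it cannot be detected this way.
            return codestream.endswith(EOC)
        offset += psot  # Psot counts from the first byte of the SOT marker
        if codestream[offset:offset + 2] == EOC:
            return offset + 2 == len(codestream)


def make_tile_part(index: int, payload: bytes, record_length: bool = True) -> bytes:
    """Hypothetical helper: build a toy tile part (12-byte SOT segment + payload)."""
    psot = 12 + len(payload) if record_length else 0
    return SOT + struct.pack(">HHIBB", 10, index, psot, 0, 1) + payload


stream = make_tile_part(0, b"\x00" * 8) + make_tile_part(1, b"\x00" * 8) + EOC
damaged = stream[:15] + stream[16:]   # one payload byte lost from the first tile part
print(tile_part_lengths_consistent(stream, 0))
print(tile_part_lengths_consistent(damaged, 0))
```

If the last tile part is built with record_length=False (Psot set to 0), removing a byte from its payload goes unnoticed by this check, which is the limitation described above.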

Bit-level corruption

Periodic checksum checks will detect bit-level corruption caused by deterioration of the physical storage medium. Jpylyzer is of little value here, as it only picks up bit-level corruption (or 'bit rot') if it affects fields in the JP2 or codestream headers, which make up only a tiny fraction of the file.
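
A checksum check of this kind might look like the sketch below. SHA-256 is used here as an assumption, since the text does not prescribe an algorithm, and the function names are illustrative.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, stored_checksum):
    """Compare the file's current checksum with one recorded earlier."""
    return sha256_of_file(path) == stored_checksum.lower()
```

Because any single changed bit alters the checksum, the comparison catches damage anywhere in the file, not just in the header fields.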

Recommendations

Image creation

  1. Encode using JPEG 2000's error resilience features (start-of-packet markers, end-of-packet markers and segmentation symbols). In case of damaged codestreams this will at least allow decoders to recover from errors and limit their impact on the rendered image. See also Heydegger (2009).
  2. Do not encode the whole image as one single tile (as, depending on the encoder's defaults, this may limit the possibilities for verifying the completeness of the codestream).
  3. Use Jpylyzer to establish that files are valid JP2.
  4. Do an exact (lossless) or approximate (lossy) pixel-wise comparison between source and destination images.
  5. Compute a checksum directly after an image is created.
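
The pixel-wise comparison in step 4 could be sketched as below, assuming both images have already been decoded into flat sequences of sample values of equal length (the decoding itself is outside the scope of this sketch). For lossless compression the maximum absolute difference should be 0; for lossy compression a quality measure such as PSNR can be compared against a threshold of your own choosing. The function names are illustrative.

```python
import math


def max_abs_difference(source, decoded):
    """Largest per-sample difference; 0 means the images are identical."""
    if len(source) != len(decoded):
        raise ValueError("images differ in number of samples")
    return max((abs(a - b) for a, b in zip(source, decoded)), default=0)


def psnr(source, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    if len(source) != len(decoded) or len(source) == 0:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(source, decoded)) / len(source)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


source_samples = [10, 20, 30, 40]
print(max_abs_difference(source_samples, [10, 20, 30, 40]))  # lossless case
print(psnr(source_samples, [10, 21, 30, 39]))                # lossy case
```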

Pre-ingest

  1. Verify checksum (if available)
  2. Use Jpylyzer to establish that files are valid JP2

Existing collections

Perform regular checksum checks and keep multiple copies of each file.
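
Keeping multiple copies pays off when a checksum check fails, because an intact copy can then be used to replace the damaged one. A minimal sketch, assuming a SHA-256 checksum was recorded when the image was created; the function name and the idea of passing a list of copy locations are illustrative only, and each file is read into memory in one go for brevity.

```python
import hashlib
from pathlib import Path


def find_intact_copies(copy_paths, stored_checksum):
    """Return the paths of all copies whose checksum still matches the stored one."""
    intact = []
    for path in copy_paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        if digest == stored_checksum.lower():
            intact.append(path)
    return intact
```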

Example files

References

Heydegger, V. (2009). Just One Bit in a Million: On the Effects of Data Corruption in Files. Proceedings, 13th European Conference on Research and Advanced Technology for Digital Libraries, pp. 315-326.
