Description
JP2 files may fall victim to bit- or byte-level corruption. Examples are:
- Hardware or software failure during the creation process results in files that are truncated or have missing data
- A JP2 gets damaged because of a faulty file transfer operation
- A JP2 gets damaged because of deterioration of the physical storage medium ('bit rot')
The first two examples result in damage at the byte level (i.e. bytes are lost, or extra bytes are inserted). In the case of 'bit rot' the total number of bytes remains the same, but individual bit values are changed (zeros become ones and vice versa).
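The difference between the two kinds of damage can be made concrete with a few helper functions. This is a minimal sketch: `flip_bit`, `truncate` and `changed_bits` are hypothetical names, intended only for producing damaged test copies (never for use on master files). A flipped bit leaves the file length unchanged, while truncation shortens it.

```python
def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate 'bit rot': invert a single bit; the length stays the same."""
    buf = bytearray(data)
    byte_pos, bit_pos = divmod(bit_index, 8)
    buf[byte_pos] ^= 1 << bit_pos
    return bytes(buf)


def truncate(data: bytes, new_length: int) -> bytes:
    """Simulate byte-level loss: everything after new_length is dropped."""
    return data[:new_length]


def changed_bits(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    if len(a) != len(b):
        # Different lengths point to byte-level rather than bit-level damage.
        raise ValueError("lengths differ: byte-level damage")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

For example, `changed_bits(original, flip_bit(original, 13))` returns 1, whereas comparing `original` with `truncate(original, 10)` raises an error because the lengths no longer match.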
Risks
Bit- and byte-level corruption can make files unreadable. Alternatively, damaged JP2s may still display, but with information missing. Note that JPEG 2000 was specifically designed to allow viewers to display (part of) the information in an image without loading the entire file, so, depending on the viewer used (and its settings), damaged files may still display.
Assessment
Byte-level corruption
Jpylyzer detects most types of byte-level corruption: JP2s that are damaged at the byte level generally do not pass validation.
| Tool | Affected if expression returns True |
|------|-------------------------------------|
| Jpylyzer | "/jpylyzer/isValidJP2 = 'False'" |
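The expression in the table can be evaluated on the XML report that jpylyzer writes to standard output. The sketch below is a minimal illustration, not part of jpylyzer: `is_affected` is a hypothetical helper, and it searches the whole tree for `isValidJP2` while ignoring XML namespaces, on the assumption that the element's exact position and namespace may vary between jpylyzer versions (newer versions may also name it differently, so check the output of the version you use).

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if present, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def is_affected(jpylyzer_xml: str) -> bool:
    """True if the report says the file is not a valid JP2.

    Mirrors /jpylyzer/isValidJP2 = 'False'. Namespaces are ignored and the
    element is looked for anywhere in the tree (assumption, see above).
    """
    root = ET.fromstring(jpylyzer_xml)
    for elem in root.iter():
        if local_name(elem.tag) == "isValidJP2":
            return (elem.text or "").strip() == "False"
    raise ValueError("no isValidJP2 element found in report")
```

Raising an error when the element is missing, instead of returning False, avoids silently treating an unexpected report format as "not affected".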
At the codestream level, Jpylyzer performs the following checks:
| Check | True if |
|-------|---------|
| foundExpectedNumberOfTiles | Number of encountered tiles is consistent with expected number of tiles |
| foundExpectedNumberOfTileParts | For all tiles, number of encountered tile parts is consistent with expected number of tile parts |
| foundEOCMarker | Last 2 bytes in codestream constitute an end of codestream (EOC) marker segment |
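Two of these checks can be sketched in a few lines. This is an illustration of the idea, not jpylyzer's own code. The expected number of tiles follows from the image and tile dimensions in the SIZ marker segment of JPEG 2000 Part 1 (number of tiles = ceil((Xsiz − XTOsiz) / XTsiz) × ceil((Ysiz − YTOsiz) / YTsiz)); the function assumes these values have already been read from the header. The EOC check assumes `codestream` holds the raw codestream bytes, for instance the payload of the contiguous codestream box.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def expected_number_of_tiles(xsiz: int, ysiz: int, xtsiz: int, ytsiz: int,
                             xtosiz: int = 0, ytosiz: int = 0) -> int:
    """Number of tiles implied by the SIZ fields (image size, tile size, offsets)."""
    tiles_x = ceil_div(xsiz - xtosiz, xtsiz)
    tiles_y = ceil_div(ysiz - ytosiz, ytsiz)
    return tiles_x * tiles_y


EOC = b"\xff\xd9"  # end of codestream marker


def found_eoc_marker(codestream: bytes) -> bool:
    """True if the last 2 bytes of the codestream are the EOC marker."""
    return codestream[-2:] == EOC
```

A 1000 × 800 image with 256 × 256 tiles, for example, should contain 4 × 4 = 16 tiles; a foundExpectedNumberOfTiles-style check then compares this number with the tiles actually found in the codestream.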
In addition, Jpylyzer checks the actual length (in bytes) of each tile part against the length given in its tile part header (the Psot field of the start of tile-part (SOT) marker segment). A limitation of this approach is that the standard permits encoders to omit this length for the last tile part in the codestream. In that case Psot is set to 0, and the tile part is assumed to contain all remaining data in the codestream, up to the end of codestream (EOC) marker. Most encoders appear to include the length anyway, but Jpylyzer may still fail to detect missing (or extra) bytes in the last tile part of an image.
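The length check described above can be sketched as a walk over the tile parts: each SOT marker segment gives Psot, the tile part's length counted from the start of the SOT marker, so the next SOT (or the EOC marker) must start exactly Psot bytes further on. This is a simplified illustration, not jpylyzer's implementation. It assumes `cs` holds a raw codestream and that every main header marker segment carries a length field; `check_tile_part_lengths` is a hypothetical name. As in the text, a Psot of 0 stops the check, because such a tile part cannot be verified this way.

```python
import struct

SOT, EOC = 0xFF90, 0xFFD9


def first_sot_offset(cs: bytes) -> int:
    """Skip SOC and the main header marker segments; return the first SOT offset."""
    if cs[:2] != b"\xff\x4f":
        raise ValueError("codestream does not start with SOC")
    pos = 2
    while pos + 4 <= len(cs):
        marker, length = struct.unpack(">HH", cs[pos:pos + 4])
        if marker == SOT:
            return pos
        pos += 2 + length  # marker (2 bytes) + segment, whose length includes itself
    raise ValueError("no SOT marker found")


def check_tile_part_lengths(cs: bytes) -> list:
    """Walk the tile parts using Psot; return a list of problems (empty if none)."""
    problems = []
    pos = first_sot_offset(cs)
    while True:
        if pos + 12 > len(cs):  # an SOT marker segment is 12 bytes long
            problems.append(f"truncated SOT segment at offset {pos}")
            break
        marker, _lsot, isot, psot = struct.unpack(">HHHI", cs[pos:pos + 10])
        if marker != SOT:
            problems.append(f"expected SOT at offset {pos}")
            break
        if psot == 0:
            # Last tile part without length information: runs up to EOC,
            # so missing or extra bytes cannot be detected here.
            break
        nxt = pos + psot
        if nxt + 2 > len(cs):
            problems.append(f"tile {isot}: data ends before the next marker")
            break
        next_marker = struct.unpack(">H", cs[nxt:nxt + 2])[0]
        if next_marker == EOC:
            break
        if next_marker != SOT:
            problems.append(f"tile {isot}: no SOT or EOC where Psot says the tile part ends")
            break
        pos = nxt
    return problems
```

Removing the last byte of a codestream (as in the balloon_trunc1.jp2 example file) makes the last tile part appear too short, which this walk reports unless that tile part has Psot set to 0.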
Bit-level corruption
Periodic checksum checks will detect bit-level corruption that is due to deterioration of the physical storage medium. Jpylyzer is of little value here, as it only picks up bit-level corruption (or 'bit rot') if fields in the JP2 or codestream headers are affected (which only make up a tiny fraction of the file).
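A periodic fixity check amounts to recomputing a checksum and comparing it with the value recorded earlier. The sketch below uses SHA-256 from Python's standard library; the choice of algorithm and the function names `sha256_of` and `verify` are assumptions for illustration, not requirements of the text.

```python
import hashlib


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large JP2s need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex: str) -> bool:
    """True if the file still matches the checksum recorded earlier."""
    return sha256_of(path) == expected_hex.lower()
```

Because a single flipped bit changes the digest completely, this detects bit rot anywhere in the file, including the image data that Jpylyzer does not inspect.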
Recommendations
Image creation
- Encode using JPEG 2000's error resilience features (start-of-packet (SOP) markers, end-of-packet-header (EPH) markers and segmentation symbols). If a codestream is damaged, these at least allow decoders to recover from errors and limit their impact on the rendered image. See also this paper by Volker Heydegger.
- Do not encode the whole image as one single tile (as, depending on the encoder's defaults, this may limit the possibilities for verifying the completeness of the codestream).
- Use Jpylyzer to establish that files are valid JP2
- Do exact (lossless) or approximate (lossy) pixel-wise comparison between source and destination images
- Compute checksum directly after an image is created
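As an illustration of the encoding recommendations, the sketch below builds a command line for OpenJPEG's `opj_compress`. It assumes the options `-t` (tile size), `-SOP`, `-EPH` and `-M 32` (the mode switch for segmentation symbols) as documented for that tool; other encoders use different options, and the 1024 × 1024 tile size is an arbitrary example, not a recommendation from this page. The function `opj_compress_args` is a hypothetical helper.

```python
def opj_compress_args(src, dst, tile=(1024, 1024)) -> list:
    """Build an opj_compress command with error resilience features enabled.

    Assumes OpenJPEG's opj_compress options; the tile size is only an example.
    """
    width, height = tile
    return [
        "opj_compress",
        "-i", str(src),
        "-o", str(dst),
        "-t", f"{width},{height}",  # avoid encoding the image as one single tile
        "-SOP",                     # start-of-packet markers
        "-EPH",                     # end-of-packet-header markers
        "-M", "32",                 # segmentation symbols
    ]
```

The list could then be passed to `subprocess.run(..., check=True)`, after which the resulting file is validated with Jpylyzer, compared pixel-wise with the source, and checksummed.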
Pre-ingest
- Verify checksum (if available)
- Use Jpylyzer to establish that files are valid JP2
Existing collections
Perform regular checksum checks and keep multiple copies.
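Checksums and multiple copies work together: when one copy fails its check, an intact copy can replace it. The sketch below compares the checksums of several copies of the same file. It is illustrative only: `find_damaged_copies` is a hypothetical helper, copies are passed as bytes for brevity, and the majority vote used when no reference checksum was recorded assumes that most copies are still intact.

```python
import hashlib
from collections import Counter
from typing import Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(copies: dict, reference: Optional[str] = None) -> list:
    """Return the names of copies whose checksum differs from the reference.

    copies maps a copy name (e.g. a storage location) to its contents. Without
    a recorded reference checksum, the most common checksum is used instead.
    """
    sums = {name: sha256_hex(data) for name, data in copies.items()}
    if reference is None:
        reference, _count = Counter(sums.values()).most_common(1)[0]
    return sorted(name for name, s in sums.items() if s != reference)
```

Comparing against a checksum recorded at creation time is preferable; the majority vote is only a fallback and cannot decide between two copies that disagree.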
Example files
- http://www.opf-labs.org/format-corpus/jp2k-test/byteCorruption/balloon_trunc1.jp2: last byte missing
- http://www.opf-labs.org/format-corpus/jp2k-test/byteCorruption/balloon_trunc2.jp2: image truncated at byte 5000
- http://www.opf-labs.org/format-corpus/jp2k-test/byteCorruption/balloon_trunc3.jp2: missing data in most of last tile-part