Valid and well-formed TIFFs with scanline corruption
At NANETH we sometimes encounter TIFFs that render incorrectly or do not render at all, even though JHOVE reports them as 'valid' and 'well-formed'.
Photoshop shows a corrupted 'double image', while IrfanView shows only some random pixels on top of a black image. The ImageMagick viewer cannot render the image, and ImageMagick 'identify' reports that there is not enough data available in scanline 'x'. The SPRUCE "Black and White Pixel Detector", which uses the Python Imaging Library (PIL), does not report any corruption (i.e. black or white pixels) either.
This is a major issue because validation tools mark the images as 'valid' and 'well-formed'. A solution to this issue would detect this corruption. Ideally the solution is a Python CLI application that can also be used in an automated workflow. Since the images are quite big (around 10 MB), it would need a clever algorithm and parallel processing or multiprocessing.
Maurice de Rooij
Other interested parties
Any other parties who are also interested in applying Issue Solutions to their Datasets.
Possible Solution approaches
This issue potentially impacts all TIFF images in our collection. Checking that a file is valid and well-formed does not seem to be enough to prove that it is not corrupted. Ideally we would need a non-visual renderer in our workflows that covers and respects each aspect of the TIFF format specification and reports back any error.
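One narrow check of this kind can be sketched in Python using only the standard library. This is a minimal sketch, not a full non-visual renderer. It rests on an assumption that has not been verified against the NANETH files: that "not enough data in scanline" means the strips hold fewer bytes than the image dimensions require, or point past the end of the file. It only handles classic (non-BigTIFF), stripped TIFFs, looks at the first IFD only, and can only compute the expected size for uncompressed data in chunky layout. The names `read_first_ifd`, `check_strips` and `main` are hypothetical.

```python
import struct
import sys
from multiprocessing import Pool

# TIFF field types handled here: BYTE, SHORT, LONG -> (size in bytes, struct code)
FIELD_TYPES = {1: (1, "B"), 3: (2, "H"), 4: (4, "I")}


def read_first_ifd(data):
    """Return {tag: tuple_of_values} for the first IFD of a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a classic TIFF header")
    (ifd_offset,) = struct.unpack(order + "I", data[4:8])
    (n_entries,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(n_entries):
        start = ifd_offset + 2 + 12 * i
        tag, ftype, count = struct.unpack(order + "HHI", data[start:start + 8])
        if ftype not in FIELD_TYPES:
            continue  # types such as RATIONAL or ASCII are not needed here
        size, code = FIELD_TYPES[ftype]
        nbytes = size * count
        pos = start + 8  # values of up to 4 bytes are stored inline
        if nbytes > 4:   # otherwise the field holds an offset to the values
            (pos,) = struct.unpack(order + "I", data[pos:pos + 4])
        if pos + nbytes > len(data):
            raise ValueError(f"values of tag {tag} lie past the end of the file")
        tags[tag] = struct.unpack(order + code * count, data[pos:pos + nbytes])
    return tags


def check_strips(path):
    """Return a list of problems found in the strip layout of one file."""
    try:
        with open(path, "rb") as f:
            data = f.read()
        tags = read_first_ifd(data)
        offsets, counts = tags[273], tags[279]  # StripOffsets, StripByteCounts
        width, height = tags[256][0], tags[257][0]  # ImageWidth, ImageLength
        uncompressed = tags.get(259, (1,))[0] == 1  # Compression
        chunky = tags.get(284, (1,))[0] == 1        # PlanarConfiguration
    except (OSError, ValueError, KeyError, IndexError, struct.error) as exc:
        return [f"cannot read strip layout: {exc!r}"]
    problems = []
    for i, (offset, count) in enumerate(zip(offsets, counts)):
        if offset + count > len(data):
            problems.append(f"strip {i} runs past the end of the file")
    if uncompressed and chunky:
        # BitsPerSample has one value per sample, so the sum is bits per pixel.
        bits_per_pixel = sum(tags.get(258, (1,)))
        expected = (width * bits_per_pixel + 7) // 8 * height
        if sum(counts) < expected:
            problems.append(
                f"pixel data is short: {sum(counts)} of {expected} bytes")
    return problems


def main(paths):
    if not paths:
        print("usage: check_strips.py FILE.tif [FILE.tif ...]")
        return
    with Pool() as pool:  # one worker process per CPU core by default
        results = pool.map(check_strips, paths)
    for path, problems in zip(paths, results):
        print(f"{path}: {'; '.join(problems) or 'OK'}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name such as check_strips.py, the script would be run as `python check_strips.py a.tif b.tif`. Each file is read once and checked in a separate worker process, which suits files of around 10 MB. Compressed strips are only checked against the file size, because their expected length cannot be derived from the image dimensions.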
Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice