Malformed TIFF images solution

Title Malformed TIFF images solution
Detailed description A Python 2 script takes files and directories as command line arguments. It attempts to open each image using PIL (Python Imaging Library) and collects statistics that indicate how much of the image is pure black. An unmodified scan or photo shouldn't contain any pure black. The output from the script can be opened in a spreadsheet application.
Examining the TIFF files with hexdump shows that the black areas are caused by black pixels (0s), rather than by unterminated tags.
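The counting step described above can be sketched as follows. This is a minimal illustration, not the original script: it is written for Python 3 with Pillow (the maintained fork of PIL), and the function names `is_black`, `black_pixel_stats` and `stats_for_file` are hypothetical. It assumes that "pure black" means every channel is 0 after converting the image to RGB; 16-bit or palette images may need different handling.

```python
def is_black(pixel):
    """True for pure black: 0 for greyscale values, all-zero channels for tuples."""
    if isinstance(pixel, tuple):
        return all(channel == 0 for channel in pixel)
    return pixel == 0


def black_pixel_stats(pixels, width, height):
    """Return (black_count, percent_black) for an iterable of pixel values."""
    total = width * height
    black = sum(1 for pixel in pixels if is_black(pixel))
    percent = 100.0 * black / total if total else 0.0
    return black, percent


def stats_for_file(path):
    """Return (width, height, black_count, percent_black), or None if unreadable.

    Hypothetical helper: the original script may gather different statistics.
    """
    from PIL import Image  # imported lazily so the helpers above work without Pillow

    try:
        with Image.open(path) as img:
            width, height = img.size
            # Raw RGB bytes are laid out as R, G, B, R, G, B, ...
            data = img.convert("RGB").tobytes()
    except OSError:
        # Pillow raises OSError (or a subclass) when it cannot read the file.
        return None
    pixels = zip(data[0::3], data[1::3], data[2::3])
    black, percent = black_pixel_stats(pixels, width, height)
    return width, height, black, percent
```

Calling `stats_for_file("somefile.tif")` would return the image dimensions together with the black pixel count and percentage, or `None` for a file that cannot be opened as an image.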
In the sample, most TIFFs did not have black areas. Quite a few did, and a small number of TIFFs were nothing but 0s.
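Files that are nothing but 0s can also be detected without an imaging library, much as a hexdump would show them. The sketch below is an assumption about how such a check might look, not part of the original script; `zero_byte_counts` and `is_all_zeros` are hypothetical names.

```python
def zero_byte_counts(path, chunk_size=1 << 20):
    """Return (zero_bytes, total_bytes) for a file, reading it in chunks."""
    zeros = 0
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            zeros += chunk.count(b"\x00")
            total += len(chunk)
    return zeros, total


def is_all_zeros(path):
    """True if the file is non-empty and every byte in it is 0."""
    zeros, total = zero_byte_counts(path)
    return total > 0 and zeros == total
```

A file for which `is_all_zeros` returns `True` has no TIFF header at all, so it cannot be opened as an image.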
Solution Champion Swithun Crowe
Corresponding Issue(s)
Tool/code link https://github.com/downloads/openplanets/SPRUCE/tiff_black_pixel_reporter.py
also checked in as SPRUCE/tree/master/black_pixels
Tool Registry Link
Notes The script is run as follows:


It will process any images below directory1, the files somefile.tif, another.tiff and any images below directory2. The output is in CSV:


The first line shows an image which has no black pixels, so no percentage of black pixels. The second line shows an image which has a large number of black pixels, covering 95% of the image. The third line shows an image which is full of zeros. It couldn't be opened by PIL as an image, so the width and height couldn't be established.
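To illustrate the three cases described above, here is a sketch of writing such rows with Python's `csv` module. The column order (path, width, height, black pixel count, percentage) and the sample values are assumptions made up for illustration; they are not the script's actual output. Unknown values are passed as `None`, which `csv.writer` writes as an empty field.

```python
import csv
import io


def write_report(rows, out):
    """Write one CSV line per image; None values become empty fields."""
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow(row)


# Hypothetical rows: (path, width, height, black_pixels, percent_black)
example_rows = [
    ("clean.tif", 100, 100, 0, None),           # no black pixels, no percentage
    ("damaged.tif", 100, 100, 9500, 95.0),      # mostly black
    ("zeros.tif", None, None, None, None),      # could not be opened as an image
]

buffer = io.StringIO()
write_report(example_rows, buffer)
report_text = buffer.getvalue()
```

Writing to a file opened with `newline=""` instead of a `StringIO` buffer gives a report that a spreadsheet application can open directly.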
Evaluation
  • Cause of problem determined, solution is quite straightforward
  • Python script, uses imaging library
  • High numbers of black pixels are strongly indicative of problem files
  • Recurses through files and sub-directories
  • Issue owner: This solves the problem and is easy to use. The cause of the black pixels can now be investigated, and the script is useful for checking the outputs of other digitisation projects. Jenny had a similar problem previously, so is also interested in this solution.
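The recursion through files and sub-directories noted in the evaluation can be sketched with `os.walk`. This is a minimal illustration under the assumption that every file found is handed to the imaging library, which then decides whether it is an image; `iter_input_paths` is a hypothetical name, not a function from the original script.

```python
import os


def iter_input_paths(arguments):
    """Yield file paths for a mix of file and directory arguments.

    Directories are walked recursively; anything else is yielded unchanged.
    """
    for argument in arguments:
        if os.path.isdir(argument):
            for dirpath, dirnames, filenames in os.walk(argument):
                dirnames.sort()  # visit sub-directories in a stable order
                for name in sorted(filenames):
                    yield os.path.join(dirpath, name)
        else:
            yield argument
```

In a command line script the arguments would typically come from `sys.argv[1:]`.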
Labels: tiff, black, pixels, report, csv, spruce, spruce_glasgow, solution, bit_rot_detection