Corrupted JPEG and JPEG2000 files

Detailed description Some JPEG/JPEG2000 scans are corrupted: they contain areas whose content comes from other areas of the scan. The corruption is most visible when a misplaced area includes a dark region, such as the edge of the scan where the page edges show. It is harder to spot when one area of text lies on top of another area of text. The images were also rotated after the corruption occurred.
Also described here: Shifted Crop Corruption
Issue champion Paul Wheatley
Possible Solution approaches
  • Run the images through an edge-detection filter
  • Write a program to find dark areas
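
The second approach could be sketched as follows. This is only a minimal illustration, not the solution adopted for this issue. It assumes the scan has already been decoded into rows of 0-255 grayscale values, and the function names (dark_blocks, interior_dark_blocks), the block size, and the thresholds are all hypothetical choices. The sketch flags blocks that are mostly dark, then keeps only those away from the image border, on the assumption that a dark scan edge appearing in the interior suggests a misplaced area.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of 0-255 grayscale values (assumed input format)


def dark_blocks(img: Gray, block: int = 8, dark_max: int = 40,
                min_fraction: float = 0.8) -> List[Tuple[int, int]]:
    """Return (block_row, block_col) for every full block that is mostly dark."""
    height, width = len(img), len(img[0])
    hits = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            # Count pixels in this block at or below the darkness threshold.
            dark = sum(
                1
                for y in range(top, top + block)
                for x in range(left, left + block)
                if img[y][x] <= dark_max
            )
            if dark >= min_fraction * block * block:
                hits.append((top // block, left // block))
    return hits


def interior_dark_blocks(img: Gray, block: int = 8, margin_blocks: int = 1,
                         **kwargs) -> List[Tuple[int, int]]:
    """Keep only dark blocks that are at least margin_blocks away from the border."""
    rows = len(img) // block
    cols = len(img[0]) // block
    return [
        (r, c)
        for r, c in dark_blocks(img, block, **kwargs)
        if margin_blocks <= r < rows - margin_blocks
        and margin_blocks <= c < cols - margin_blocks
    ]
```

Any block returned by interior_dark_blocks would only be a candidate for manual review: dark illustrations or heavy print would also be flagged, and the thresholds would need tuning against the actual collection.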
Datasets BL 19th Century digitised newspaper collection
Solutions Corrupted JPEG and JPEG2000 files solution
