Corrupted JPEG and JPEG2000 files solution

Current version by Paul Wheatley on May 08, 2012 12:55.

| *Title* | _Corrupted JPEG and JPEG2000 files solution_ \\ |
| *Detailed description* | {excerpt}{_}I tried some edge detection, but the pages full of text and line drawings had too many edges, so the edges of the corrupted areas were no more visible than before._ \\ _I converted the JPEGs to smaller 1 bit PNGs, so that processing them would be quicker._ \\
{code}
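# Scale each JPEG to 30%, threshold it to 1 bit, and save it as a PNG (quoting keeps odd file names intact).
for j in directory/*.jpg; do jpegtopnm "$j" | pamscale -xscale 0.3 -yscale 0.3 | pamditherbw -threshold | pnmtopng > "small_directory/$(basename "$j" .jpg).png"; done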
{code}
_I wrote a Python 2 script to find areas of black. The program first looks for contiguous rows that have a higher than average number of black pixels. Within these rows, it then looks for contiguous columns that are largely black. It reports the files that contain such areas and also produces mask image files showing where the black areas were found._ \\ _It is run like so:_ \\
{code}
./black_area_identifier.py small_newspapers/DUCR-1896-10-31-000[0-9].png > results.txt
{code}
The mask files will be put into small_newspapers, and the names of any images with black areas will be put into results.txt.{excerpt} |
| *Solution Champion* | _[~swithun]_ \\ |
| *Corresponding Issue(s)* | * _[SPR:Corrupted JPEG and JPEG2000 files]_ |
| *Tool/code link* | _[SPRUCE/tree/master/black_pixels|https://github.com/openplanets/SPRUCE/tree/master/black_pixels]_ \\ |
| *[Tool Registry Link|http://wiki.opf-labs.org/display/TR/Home]* | * _[Python|http://wiki.opf-labs.org/label/TR/python]_
* [NetPBM|http://netpbm.sourceforge.net/] |
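
The row-then-column search described in the detailed description can be illustrated with a minimal sketch. This is not the black_area_identifier.py script from the linked repository: it is written in Python 3 rather than Python 2, it works on an in-memory grid of 0/1 values (1 = black) instead of reading PNG files, and the function names and the thresholds min_rows, min_cols and col_fraction are assumptions chosen only for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, bottom, left, right), ends exclusive


def runs(flags: List[bool], min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) pairs for contiguous True runs of at least min_len."""
    found = []
    start = None
    # A trailing False sentinel closes a run that reaches the end of the list.
    for i, flag in enumerate(list(flags) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                found.append((start, i))
            start = None
    return found


def find_black_areas(
    pixels: List[List[int]],
    min_rows: int = 3,          # assumed minimum height of a suspicious band
    min_cols: int = 3,          # assumed minimum width of a suspicious block
    col_fraction: float = 0.9,  # assumed meaning of "largely black"
) -> List[Box]:
    """Find rectangular black areas in a grid where 1 means a black pixel."""
    if not pixels or not pixels[0]:
        return []
    width = len(pixels[0])

    # Step 1: rows with more black pixels than the average row.
    row_counts = [sum(row) for row in pixels]
    mean = sum(row_counts) / len(row_counts)
    dark_rows = [count > mean for count in row_counts]

    boxes = []
    # Step 2: within each contiguous band of dark rows, look for
    # contiguous columns that are largely black.
    for top, bottom in runs(dark_rows, min_rows):
        height = bottom - top
        dark_cols = [
            sum(pixels[y][x] for y in range(top, bottom)) / height >= col_fraction
            for x in range(width)
        ]
        for left, right in runs(dark_cols, min_cols):
            boxes.append((top, bottom, left, right))
    return boxes


if __name__ == "__main__":
    # A 10x10 white page with one stray black pixel and a 4x5 black block.
    page = [[0] * 10 for _ in range(10)]
    page[0][0] = 1
    for y in range(2, 6):
        for x in range(3, 8):
            page[y][x] = 1
    print(find_black_areas(page))
```

Each reported box could then be painted into a mask image, as the original script does; a real page of text would probably need larger thresholds than the defaults assumed here.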