| *Title* | _Corrupted JPEG and JPEG2000 files solution_ \\ |
| *Detailed description* | {excerpt}{_}I tried edge detection, but pages full of text and line drawings contain so many edges that the edges of the corrupted areas did not stand out._ \\ _I converted the JPEGs to smaller, 1-bit PNGs so that they would be quicker to process._ \\
\\
{code}
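# Scale each JPEG to 30%, threshold it to 1-bit black and white, and save it as a PNG (quoted so filenames with spaces survive)
for j in directory/*.jpg; do jpegtopnm "$j" | pamscale -xscale 0.3 -yscale 0.3 | pamditherbw -threshold | pnmtopng > "small_directory/$(basename "$j" .jpg).png"; done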
{code}
_I wrote a Python 2 script to find areas of black. It first looks for contiguous rows that have a higher-than-average number of black pixels. Within those rows, it then looks for contiguous columns that are largely black. It reports the files that contain such areas and also produces mask image files showing where the black areas were found._ \\ _Run it as follows:_ \\
\\
{code}
./black_area_identifier.py small_newspapers/DUCR-1896-10-31-000[0-9].png > results.txt
{code}
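A minimal Python 3 sketch of the row-then-column search described above, assuming the page has already been loaded as a list of rows of 0/1 values (1 = black). The function names and the default weightings (row_factor, col_fraction, min_run) are hypothetical and are not taken from black_area_identifier.py; for brevity the sketch reports only the longest dark band and the longest black column run inside it, rather than every area on the page.
```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # rows of 0/1 values, 1 = black (assumed layout)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def longest_run(flags: List[bool]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest run of True values, or None if there is none."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    for i, flag in enumerate(flags + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if best is None or (i - 1 - start) > (best[1] - best[0]):
                best = (start, i - 1)
            start = None
    return best


def find_black_area(img: Image, row_factor: float = 1.0,
                    col_fraction: float = 0.9, min_run: int = 3) -> Optional[Box]:
    """Return a bounding box around one suspected corrupted (solid black) area, or None.

    row_factor:   a row is "dark" if its black-pixel count exceeds
                  row_factor * (mean black pixels per row); 1.0 = "higher than average".
    col_fraction: inside the dark band, a column is "black" if at least this
                  fraction of its pixels are black.
    min_run:      minimum number of contiguous rows and columns to report.
    """
    if not img or not img[0]:
        return None
    counts = [sum(row) for row in img]
    mean = sum(counts) / len(counts)

    # Step 1: longest contiguous band of rows with an above-average black count.
    rows = longest_run([c > row_factor * mean for c in counts])
    if rows is None or rows[1] - rows[0] + 1 < min_run:
        return None
    top, bottom = rows
    band = img[top:bottom + 1]

    # Step 2: within that band, longest contiguous run of mostly black columns.
    width = len(img[0])
    black_cols = [sum(r[x] for r in band) >= col_fraction * len(band) for x in range(width)]
    cols = longest_run(black_cols)
    if cols is None or cols[1] - cols[0] + 1 < min_run:
        return None
    return (top, cols[0], bottom, cols[1])
```
For example, on a 10 x 12 page that is white apart from two stray black pixels in row 0 and a solid block in rows 3 to 6, columns 4 to 8, find_black_area returns (3, 4, 6, 8). Raising row_factor or col_fraction makes the search stricter, which is one way the weightings could be tuned.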
Running this writes the mask files into small_newspapers and lists the names of any images with black areas in results.txt.{excerpt} |
| *Solution Champion* | _[~swithun]_ \\ |
| *Corresponding Issue(s)* | * _[SPR:Corrupted JPEG and JPEG2000 files]_ |
| *Tool/code link* | _[SPRUCE/tree/master/black_pixels|https://github.com/openplanets/SPRUCE/tree/master/black_pixels]_ \\ |
| *[Tool Registry Link|http://wiki.opf-labs.org/display/TR/Home]* | * _[Python|http://wiki.opf-labs.org/label/TR/python]_
* [NetPBM|http://netpbm.sourceforge.net/] |
| *Evaluation* | * _Solution champion: I had hoped to use Fortran 77, but the PBM (ASCII) library didn't work. Python is reasonably fast on the small images, but slow on the original images._
* _Converts the page to a monochrome image (PBM), treats a large area of black as likely corruption, identifies contiguous rows with a high black-pixel count, and creates a bounding box around the area_
* _The weightings used to identify corrupted areas can be tweaked._
* _Issue owner: Excellent\! This has been looked at in previous mashups but not solved. We can now take this away, run it over the collection to determine the scope of the problem, and test the accuracy of the solution more thoroughly._ |
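The evaluation mentions monochrome PBM images, bounding boxes and mask files. As a hedged illustration only, this Python 3 sketch writes a rectangular bounding box as a mask in the plain (ASCII, "P1") PBM format, which NetPBM tools such as pnmtopng can read. The function names, the example file name and the (top, left, bottom, right) box convention are assumptions, not the code in the SPRUCE repository.
```python
import textwrap
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def mask_to_plain_pbm(width: int, height: int, box: Optional[Box]) -> str:
    """Return a plain (ASCII, "P1") PBM image: 1 (black) inside box, 0 (white) elsewhere."""
    lines = ["P1", f"{width} {height}"]
    for y in range(height):
        row = " ".join(
            "1" if box is not None and box[0] <= y <= box[2] and box[1] <= x <= box[3] else "0"
            for x in range(width)
        )
        # The plain PBM format asks that no line exceed 70 characters,
        # so long rows are wrapped at the spaces between pixels.
        lines.extend(textwrap.wrap(row, 70))
    return "\n".join(lines) + "\n"


def write_mask(path: str, width: int, height: int, box: Optional[Box]) -> None:
    """Write the mask to a file such as 'page-mask.pbm' (hypothetical naming)."""
    with open(path, "w", encoding="ascii") as f:
        f.write(mask_to_plain_pbm(width, height, box))
```
For example, mask_to_plain_pbm(4, 3, (1, 1, 1, 2)) produces a 4 x 3 mask whose middle row is 0 1 1 0; passing None yields an all-white mask for pages where no black area was found.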