Duplicate images within a collection or job

One line summary Duplicate images within a collection (set of images) or complex object (scanned book)

Detailed description
  • duplicated pages in a scanned book, introduced during the scanning process (job) by operator error or machine error, resulting in an incomplete final package
  • missing pages are a related error, but are out of scope for this solution
  • duplicate image files uploaded by the same user or by different users, resulting in multiple copies of a file in the digital library and confusing users
Issue champion Jodie
Possible approaches Perceptual Image Difference utility
http://pdiff.sourceforge.net/
Maurice de Rooij has created an example PHP wrapper script that uses the Windows binary to iterate through the files in a single directory.
More about this wrapper script can be found at Perceptual Image Diff comparison

pHash - open source perceptual hash library
http://www.phash.org/
NOT TESTED YET

Wavelet Scalar Quantization (WSQ)
http://www.cognaxon.com/index.php
TOO COMPLEX TO CREATE SIMPLE PROTOTYPE

Image To ASCII conversion (concept)
Convert image to 8 bits per pixel which leaves 256 possible values (0-255) per pixel.
If a character is assigned to each value (plain ASCII defines only 128 characters, so an 8-bit character set is needed), the image can be saved as a string.
Using full text search one could compare this saved string to other saved strings.
This approach would work well if the compared images are exactly the same, but an exact string match fails if even ONE pixel differs. With a difference threshold this might still work.
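A minimal sketch of this concept, assuming the images are already reduced to flat lists of 8-bit grayscale values of equal size. The function names (pixels_to_string, fraction_different, is_duplicate) and the threshold parameter are hypothetical and only illustrate the idea; they are not taken from any existing prototype. Python's chr() is used for the mapping, which yields code points 0-255 rather than strict 7-bit ASCII.

```python
from typing import Sequence


def pixels_to_string(pixels: Sequence[int]) -> str:
    """Map each 8-bit pixel value (0-255) to a single character."""
    if any(not 0 <= p <= 255 for p in pixels):
        raise ValueError("pixel values must be in the range 0-255")
    # chr() gives code points 0-255, not strict 7-bit ASCII (assumption).
    return "".join(chr(p) for p in pixels)


def fraction_different(a: str, b: str) -> float:
    """Return the share of positions at which the two strings differ."""
    if len(a) != len(b):
        return 1.0  # different image sizes are treated as fully different
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def is_duplicate(a: str, b: str, threshold: float = 0.0) -> bool:
    """threshold=0.0 demands an exact match; larger values tolerate noise."""
    return fraction_different(a, b) <= threshold


# Two tiny 4-pixel "images" that differ in exactly one pixel.
page_a = pixels_to_string([0, 64, 128, 255])
page_b = pixels_to_string([0, 64, 129, 255])

exact_match = is_duplicate(page_a, page_b)
tolerant_match = is_duplicate(page_a, page_b, threshold=0.25)
```

With the default threshold the one-pixel difference makes the comparison fail, while a threshold of 0.25 (one pixel in four) accepts the pair as duplicates. A position-by-position comparison like this cannot be done with a plain full text search, so the threshold variant would need its own comparison step.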
Context  
AQuA Solutions
Collections  
Labels: image, fingerprint, prototyped, issue, duplication