One line summary | The same artefact has been scanned two or more times. Duplicate objects must be identified and flagged for action. |
Detailed description | An artefact (e.g. a play bill) is scanned and, at a later date, rescanned. The rescanned version corrects shortcomings of the original scan, e.g. it is captured at a higher resolution or rotated to the correct orientation. We wish to identify duplicate scanned objects and flag them for action by a staff member. It is expected that the older version will be classified for storage only. The metadata of the two versions differs to some extent, e.g. the identifier is different and the title may differ slightly. |
Issue champion | Gareth Knight |
Possible approaches | Metadata extraction and comparison, fuzzy checksum |
Context | |
AQuA Solutions | ssdeep for duplicate image detection; Perceptual Image Diff comparison; Identifying rotated, duplicate images using pHash |
Collections | East London Theatre Archive |
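The "metadata extraction and comparison" approach listed under Possible approaches could be sketched roughly as follows. This is a minimal illustration, not one of the AQuA solutions: it assumes the metadata has already been extracted into dictionaries with hypothetical `identifier` and `title` keys, uses Python's standard `difflib` for the fuzzy title match, and the 0.85 threshold and sample records are invented for the example. It covers only the metadata side; a fuzzy checksum or perceptual hash of the image content would be a separate check.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(title):
    """Lower-case the title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def title_similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two titles."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def find_candidate_duplicates(records, threshold=0.85):
    """Flag pairs of records with different identifiers but similar titles.

    `threshold` is an assumed starting value and would need tuning
    against a real collection.
    """
    flagged = []
    for first, second in combinations(records, 2):
        if first["identifier"] == second["identifier"]:
            continue  # same record, not a rescan
        score = title_similarity(first["title"], second["title"])
        if score >= threshold:
            flagged.append((first["identifier"], second["identifier"], score))
    return flagged


# Hypothetical records: the first two stand in for an original scan and a rescan.
records = [
    {"identifier": "scan-0001", "title": "Playbill for Hamlet, Theatre Royal"},
    {"identifier": "scan-0457", "title": "Play bill for Hamlet,  Theatre Royal"},
    {"identifier": "scan-0099", "title": "Programme for a pantomime"},
]

for id_a, id_b, score in find_candidate_duplicates(records):
    print(f"Possible duplicate: {id_a} / {id_b} (title similarity {score:.2f})")
```

Pairs flagged this way are only candidates for a staff member to review; deciding which version is kept for access and which is classified for storage only remains a manual step.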