Image content identification and categorisation

Content identification and categorisation.

Detailed description
No consistent naming conventions have been applied to the dataset, and the majority of files have no meaningful titles. In some cases names have been given at folder level but not at file level. Copyright ownership of the images cannot be ascertained until they are identified. This makes appraisal and re-use of the images very difficult and time-consuming. We would like a tool that could help with appraisal by identifying what type of document is in the image, for example a map, photograph or written document. It would need to cope with large sets of data and be relatively simple to operate.
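As a rough illustration of the kind of triage such a tool might perform, the sketch below flags images whose pixels are dominated by a light background, which is typical of scanned written pages. It is a crude heuristic and not pattern recognition software: the function name looks_like_written_page, both thresholds, and the assumption that each image has already been decoded into a list of 0-255 greyscale values are all illustrative. It makes no attempt to tell maps from photographs.

```python
def looks_like_written_page(grey_pixels, light_cutoff=200, min_light_share=0.7):
    """Return True when most pixels are light (paper-coloured).

    grey_pixels: iterable of greyscale values in 0..255 (assumed already decoded).
    light_cutoff and min_light_share are illustrative guesses, not tuned values.
    """
    pixels = list(grey_pixels)
    if not pixels:
        raise ValueError("no pixel data supplied")
    # Count pixels at least as bright as the cutoff and compare their share
    # of the whole image with the required minimum.
    light = sum(1 for value in pixels if value >= light_cutoff)
    return light / len(pixels) >= min_light_share
```

A result of True would only suggest that an image deserves a closer look as a possible written document; the thresholds would need checking against a hand-labelled sample of the collection before being relied on.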

Issue champion
Cassandra Johnson

Other interested parties
Any other parties who are also interested in applying Issue Solutions to their Datasets.

Possible Solution approaches

  • pattern recognition software
  • FITS for metadata extraction and file format evaluation
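To give a sense of what file format identification involves, the sketch below reads the first few bytes of each file and matches them against well-known format signatures, then tallies the results for a whole directory tree. It is not FITS and covers only a handful of formats; FITS wraps several dedicated identification tools and extracts far more metadata. The names SIGNATURES, identify_format and tally_formats are hypothetical and chosen for this illustration only.

```python
import os
from collections import Counter

# Leading bytes ("magic numbers") of a few common formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF", "PDF"),
]


def identify_format(header):
    """Return a format name for the given leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def tally_formats(root):
    """Walk root recursively and count files per identified format."""
    counts = Counter()
    for folder, _subfolders, filenames in os.walk(root):
        for filename in filenames:
            # Only the first 8 bytes are read, so large images stay cheap to scan.
            with open(os.path.join(folder, filename), "rb") as handle:
                counts[identify_format(handle.read(8))] += 1
    return counts
```

Because the file name is never consulted, a tally like this still works when files have no meaningful titles or extensions.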

Maurice de Rooij: Several online services offer an API, web service or management tool to recognise the content of images.

DHC (Dorset History Centre) has digitised parts of its analogue collections over time in a very inconsistent manner, using various equipment and with no standard guidelines on metadata or formats. We want to avoid re-digitising material that has already been digitised, but we do not know what we already have. Automating the assessment of the collections is preferable to having a member of staff go through each of the 20,000 images individually.

Lessons Learned
Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice

Dorset History Centre collection of digitised images

Image content identification and categorisation solution
File Format Identification and Metadata Extraction using FITS

Labels: spruce_london_2, issue, appraisal_assessment, unknown_characteristics