Content identification and categorisation.

No consistent naming conventions have been applied to the dataset, and the majority of files have no meaningful titles. In some cases names have been given at folder level but not at file level. Copyright ownership of the images cannot be ascertained until they are identified. This makes appraisal and re-use of the images very difficult and time-consuming. We would like a tool that could help with appraisal by identifying what type of document is in each image, e.g. map, photograph or written document. It would need to cope with large sets of data and be relatively simple to operate.

Cassandra Johnson

Any other parties who are also interested in applying issue solutions to their datasets.

  • pattern recognition software
  • FITS for metadata extraction and file format evaluation

Maurice de Rooij: there are several online services that offer an API, web service or management tool for recognising the content of images.

DHC has digitised parts of its analogue collections over time in a very inconsistent manner, using various equipment and with no standard guidelines on metadata or formats. We want to avoid re-digitising material that has already been digitised, but we don't know what we already have. Automating the assessment of the collections is preferable to a member of staff going through each of the 20,000 images individually.
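As a first automated pass over the 20,000 images, a simple inventory could at least establish what is there and flag byte-identical copies before any content recognition is attempted. The sketch below is a hypothetical illustration, not part of the original issue: the function names `inventory` and `extension_counts` are invented here, it uses only the Python standard library, and it does not identify image content (map, photograph, document). It only records each file's extension, size and SHA-256 checksum and groups files whose checksums match. Re-scans of the same item will not be caught, because they differ at byte level.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def inventory(root):
    """Walk `root` recursively (hypothetical helper, not an existing tool).

    Returns (records, duplicates):
      records    - one dict per file with path, lower-cased extension, size and checksum
      duplicates - checksum -> list of paths, only where two or more files share it
    """
    records = []
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories; only files are inventoried
        digest = sha256_of(path)
        records.append({
            "path": str(path),
            "extension": path.suffix.lower() or "(none)",
            "size": path.stat().st_size,
            "sha256": digest,
        })
        by_hash[digest].append(str(path))
    duplicates = {h: paths for h, paths in by_hash.items() if len(paths) > 1}
    return records, duplicates


def extension_counts(records):
    """Count files per (lower-cased) extension as a rough view of the format mix."""
    counts = defaultdict(int)
    for record in records:
        counts[record["extension"]] += 1
    return dict(counts)
```

Usage would be along the lines of `records, duplicates = inventory("/path/to/collection")`, followed by writing `records` to a CSV for staff review. A format-identification tool such as FITS could then be run over the same file list to replace the extension-based view with proper format evaluation.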

Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice

