Title |
IS44 Migrated image metadata must map or match to those of the original |
Detailed description | IS2 Do acquired files conform to an agreed technical profile, are they valid and are they complete? deals with the format-specific significant properties of a migration, and with whether those properties either match the original or have been translated into the migration format in a way sympathetic to the needs of digital preservation. Image files also contain other embedded metadata, such as EXIF data. In some cases the original will contain data that does not need to be, or would be inappropriate to be, carried over to the migration (a file format value, for example), but significant properties such as "Creator" or "Creation Date" should often be preserved in the migration. There may also be times when original metadata values need to be mapped or translated in the migration (e.g. where a capture agency uses a shorthand for the organization name). |
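As an illustration of the "map or match" rule described above, the following is a minimal sketch, assuming the embedded metadata has already been extracted into plain dictionaries. The field names, the `IGNORE`/`MATCH`/`TRANSLATE` policy tables, the "BL" shorthand mapping and the function `check_metadata` are all hypothetical and only show the shape of the check.

```python
# Hypothetical policy tables: how each embedded field is treated on migration.
IGNORE = {"FileFormat", "Compression"}               # format-specific, not carried over
MATCH = {"Creator", "CreationDate"}                  # must be identical in the migration
TRANSLATE = {"Agency": {"BL": "British Library"}}    # assumed shorthand -> full value


def check_metadata(original, migrated):
    """Return a list of (field, expected, actual) mismatches."""
    problems = []
    for field, value in original.items():
        if field in IGNORE:
            continue
        if field in TRANSLATE:
            # Map the original value; unknown values are expected unchanged.
            expected = TRANSLATE[field].get(value, value)
        elif field in MATCH:
            expected = value
        else:
            continue  # field is not covered by the policy
        actual = migrated.get(field)
        if actual != expected:
            problems.append((field, expected, actual))
    return problems


# Made-up example values, not taken from any real file.
original = {"Creator": "Operator 7", "CreationDate": "2009-03-14",
            "Agency": "BL", "FileFormat": "TIFF"}
migrated = {"Creator": "Operator 7", "CreationDate": "2009-03-14",
            "Agency": "British Library", "FileFormat": "JPEG2000"}
incomplete = {"CreationDate": "2009-03-14", "Agency": "British Library"}

ok_result = check_metadata(original, migrated)
bad_result = check_metadata(original, incomplete)
```

Here `ok_result` is empty because every policy-covered field maps or matches, while `bad_result` reports the missing "Creator" field. Keeping the policy in data rather than code means the list of significant properties can be agreed separately from the checking logic.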
Scalability Challenge |
The quantity of the images and the size of some of the TIFFs. |
Issue champion | Peter Cliff (BL) |
Other interested parties |
Schlarb Sven |
Possible Solution approaches | The solution used depends on what we consider to be the significant properties of the original and which of those properties need to be successfully captured in the migration. Each original will have embedded metadata. The first task will be to identify which, if any, of those fields should be migrated to the new format; those field values should then be extracted from both the original and the migration and compared to see if they match. The migration process may need to know which embedded metadata to migrate, possibly as a two-step process: migrate the image and then re-insert the metadata fields. This suggests that the migration tool should accept parameters specifying which metadata fields to keep. Each original will also have visual properties that we may want to capture and verify on migration. This could include comparing the two images pixel by pixel, comparing their histograms, or using perceptual hashing techniques. We might also use OCR to extract text from the images and treat that extracted text as a significant property. This sounds like a useful approach, but it is specific to the migration of images of text, so I think this validation method should be considered secondary to a more general image solution. Finally, while each image in the collection was validated at ingest, it may be worth validating both the input and output (migrated) formats, meaning we will need format validation tools. (Luckily some very good ones exist!) |
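The pixel-by-pixel and histogram comparisons mentioned above can be sketched as follows. This is a minimal illustration, assuming both images have already been decoded into flat lists of 8-bit grayscale values; decoding TIFF or JPEG2000 is out of scope here, and the function names and sample pixel values are hypothetical.

```python
from collections import Counter


def pixels_identical(a, b):
    """Pixel-by-pixel check: same length and same value at every position."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def histogram(pixels, levels=256):
    """Count how many pixels take each value in 0..levels-1.

    Values outside that range are ignored.
    """
    counts = Counter(pixels)
    return [counts.get(v, 0) for v in range(levels)]


def histogram_intersection(a, b, levels=256):
    """Fraction of pixels shared by the two histograms (1.0 = identical)."""
    total = max(len(a), len(b))
    if total == 0:
        return 1.0  # two empty images are trivially alike
    ha, hb = histogram(a, levels), histogram(b, levels)
    return sum(min(x, y) for x, y in zip(ha, hb)) / total


# Made-up 2x2 images flattened to lists.
original = [0, 0, 128, 255]
lossless = [0, 0, 128, 255]
lossy = [0, 1, 128, 255]

same = pixels_identical(original, lossless)        # True
changed = pixels_identical(original, lossy)        # False
overlap = histogram_intersection(original, lossy)  # 0.75
```

An exact pixel match is only a sensible pass criterion for lossless migration. For lossy output, a histogram score would need an agreed threshold, and because it ignores where pixels sit in the image it is a weaker check than a positional comparison.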
Context | See LSDRT2 Validating files migrated from TIFF to JPEG2000 |
Lessons Learned | TBD |
Training Needs | Should be added to the Solution |
Datasets | British Library 19th Century newspapers |
Solutions | SO32 - Metadata Extraction SO33 - Metadata Comparison |