Wellcome Library digitisation

Basic description Collections being digitised now, or within the next 12 months, are listed below. Solutions need to scale to large-scale digitisation (millions of images):
  • Archives - mixed printed, typescript and handwritten content (ca. 400k images)
  • Books - modern printed books (post 1850), OCR'able (ca. 400k images)
  • Manuscripts - handwritten, Western and Arabic collections
Licensing Manuscripts are open access (Creative Commons). The others are not (or do not yet exist).
Institution Wellcome Library
Collection expert Christy Henshaw (Digital Library Programme Manager)
List of issues
  • File conversion:
    • Validating JPEG2000 files on conversion from TIFF, and identifying and tracing the source of any errors
  • Image quality (quantitative):
    • Identifying skipped or duplicated images (i.e. pages) within items
  • OCR of mixed content:
    • Working out what can and what cannot be OCR'ed in mixed archival content
  • Administrative metadata [N.B. Not addressed during AQuA] - file and folder naming according to different rules, including starting out with one set of folder names and, later in the workflow, changing these to new DAM-friendly identifiers:
    • File naming is consistent and accurate
    • Folder hierarchies are correct
    • Folder naming is correct
    • Same images in destination folder as started out in capture folder
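
Two of the checks listed above, "same images in destination folder as started out in capture folder" and "duplicated images", can be partly sketched with content checksums. The following is a minimal illustration only, not the Library's workflow: the function names and the report keys are hypothetical. It assumes files are copied or renamed without being re-encoded, so it does not apply across the TIFF to JPEG2000 conversion, and it only finds byte-identical duplicates, not a page that was photographed twice.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def hash_folder(folder):
    """Map each content digest to the sorted relative paths that carry it."""
    root = Path(folder)
    by_digest = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_digest[file_digest(path)].append(path.relative_to(root).as_posix())
    return {digest: sorted(paths) for digest, paths in by_digest.items()}


def compare_folders(capture, destination):
    """Compare two folder trees by content, ignoring file and folder names.

    Hypothetical report keys:
      missing    - capture files whose content is absent from the destination
      unexpected - destination files whose content was never in capture
      duplicates - groups of destination files with identical content
    """
    src = hash_folder(capture)
    dst = hash_folder(destination)
    return {
        "missing": sorted(p for d, paths in src.items() if d not in dst for p in paths),
        "unexpected": sorted(p for d, paths in dst.items() if d not in src for p in paths),
        "duplicates": sorted(paths for paths in dst.values() if len(paths) > 1),
    }
```

Because the comparison is keyed on content rather than on names, it is unaffected by the later switch to DAM-friendly identifiers. Checking that the new names themselves follow the naming rules would need a separate test.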