Basic description |
Collections being digitised now, or due to be digitised in the next 12 months, are listed below. Solutions need to scale to large-scale digitisation (millions of images):
- Archives - mixed printed, typescript and handwritten content (ca. 400k images)
- Books - modern printed books (post-1850), OCR'able (ca. 400k images)
- Manuscripts - handwritten, Western and Arabic collections
|
Licensing |
Manuscripts are open access (Creative Commons). The others are not (or do not yet exist). |
Institution |
Wellcome Library |
Collection expert |
Christy Henshaw (Digital Library Programme Manager) |
List of issues |
- File conversion:
  - Validating JPEG2000 files on conversion from TIFF; identifying and tracing the source of errors
- Image quality (quantitative):
  - Identifying skipped or duplicated images (i.e. pages) within items
- OCR of mixed content:
  - Working out what can, and what cannot, be OCR'ed in mixed archival content
- Administrative metadata [N.B. Not addressed during AQuA] - file and folder naming according to different rules, including starting out with one set of folder names and, later in the workflow, changing these to new DAM-friendly identifiers:
  - File naming is consistent and accurate
  - Folder hierarchies are correct
  - Folder naming is correct
  - Same images in destination folder as started out in capture folder
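
Two of the checks listed above (skipped or duplicated images, and the same images arriving in the destination folder as left the capture folder) can be sketched in a few lines of Python. This is an illustrative sketch only, not something produced during AQuA. It assumes files are copied or renamed without being re-encoded, so that a SHA-256 checksum identifies an image regardless of its name. It will not match a TIFF against its JPEG2000 derivative, and it will not catch the same page photographed twice. The function names and the `_NNNN.ext` sequence-number pattern are hypothetical and would need adapting to the actual naming rules.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digests(folder):
    """Map each content digest to the sorted relative paths that carry it."""
    folder = Path(folder)
    out = defaultdict(list)
    for p in sorted(folder.rglob("*")):
        if p.is_file():
            out[file_digest(p)].append(p.relative_to(folder).as_posix())
    return dict(out)


def compare_folders(capture, destination):
    """Compare two folders by file content, so renamed files still match."""
    src, dst = digests(capture), digests(destination)
    return {
        # captured files whose content never reached the destination
        "missing": sorted(n for d, names in src.items() if d not in dst for n in names),
        # destination files whose content was not in the capture folder
        "unexpected": sorted(n for d, names in dst.items() if d not in src for n in names),
        # groups of byte-identical files in the destination
        "duplicates": sorted(names for names in dst.values() if len(names) > 1),
    }


# Hypothetical naming rule: a sequence number just before the extension,
# e.g. "item_0001.jp2". Adjust to the real file-naming convention.
SEQ = re.compile(r"_(\d+)\.[^.]+$")


def sequence_gaps(filenames):
    """Return sequence numbers missing between the lowest and highest found."""
    numbers = sorted(int(m.group(1)) for name in filenames if (m := SEQ.search(name)))
    if not numbers:
        return []
    present = set(numbers)
    return [n for n in range(numbers[0], numbers[-1] + 1) if n not in present]
```

Calling `compare_folders("capture", "destination")` returns the three lists for review, and `sequence_gaps` can be run per item folder to flag page numbers that never appear. Comparing by checksum rather than by name is a deliberate choice here, since the folder and file names are expected to change during the workflow.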
|
Issue |
|