Brightsolid digitisation of British Library newspapers

Basic description: A project to digitise items from the British Library newspaper archive. Scanning started in October 2010, and scanning from paper is currently running at around 5,000 pages per working day. Scanning from microfilm is due to start shortly, with the aim of having over a million images scanned in total by the time the commercial service launches later this year.
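As a rough check on the target, assuming one image per scanned page and counting the paper rate alone (no microfilm), a million images would take:

$$
\frac{1{,}000{,}000\ \text{images}}{5{,}000\ \text{images per working day}} = 200\ \text{working days} \approx 40\ \text{five-day working weeks}
$$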
Data is processed with CCS DocWorks, which uses ABBYY FineReader 9 for OCR, and is output as METS, ALTO, and JP2 files: one METS file per newspaper issue, and one ALTO and JP2 pair per scanned page. Various QC processes and validations are applied to these outputs before they are transferred to the British Library for archiving.
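The per-page ALTO files are where the OCR metadata lives. The following is a minimal sketch of reading it, assuming each recognised word is an ALTO String element with a CONTENT attribute (the text) and an optional WC attribute (word confidence, 0 = unsure, 1 = sure), as the ALTO schema allows. Whether the DocWorks output populates WC, and which ALTO version and namespace it uses, are assumptions; the namespace is stripped so the code does not depend on it. The function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from statistics import mean


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix, so different ALTO versions are handled alike."""
    return tag.rsplit("}", 1)[-1]


def alto_words(alto_xml: str | bytes) -> list[tuple[str, float | None]]:
    """Return (word, confidence) pairs for every String element on an ALTO page.

    Confidence is read from the optional WC attribute; it is None where absent.
    """
    root = ET.fromstring(alto_xml)
    words = []
    for el in root.iter():
        if local_name(el.tag) == "String":
            wc = el.get("WC")
            words.append((el.get("CONTENT", ""), float(wc) if wc is not None else None))
    return words


def mean_confidence(words: list[tuple[str, float | None]]) -> float | None:
    """Average word confidence for a page, ignoring words that have no WC value."""
    scores = [wc for _, wc in words if wc is not None]
    return mean(scores) if scores else None
```

For a real page, pass the file's raw bytes (e.g. `alto_words(open(path, "rb").read())`) so that ElementTree honours the encoding declared in the file. A per-page mean confidence is one simple way to flag pages whose OCR text may be unreliable for analysis.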
Licensing: TBC
Institution: Brightsolid / British Library
Collection expert: Toby Atkin-Wright
List of issues:
- Use of OCR metadata
- Using METS data to inform analysis
- Newspaper issue dates
- Finding duplicate images
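Since there is one METS file per issue, the issue date can in principle be read from it. A minimal sketch, assuming the METS descriptive metadata embeds MODS with a `dateIssued` element holding an ISO date (YYYY-MM-DD); the actual METS profile produced by DocWorks may record the date elsewhere or in another format. The function name is hypothetical, and partial dates such as "1855-03" are skipped rather than guessed.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix (e.g. the MODS namespace) from a tag."""
    return tag.rsplit("}", 1)[-1]


def issue_date(mets_xml: str | bytes) -> Optional[date]:
    """Return the first parseable dateIssued value in a METS file, or None."""
    root = ET.fromstring(mets_xml)
    for el in root.iter():
        if local_name(el.tag) == "dateIssued" and el.text:
            try:
                return date.fromisoformat(el.text.strip())
            except ValueError:
                continue  # not a full YYYY-MM-DD date; try the next candidate
    return None
```

Collecting these dates across all issues of a title and sorting them is one way to spot missing or misdated issues.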
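For finding duplicate images, a minimal sketch that detects only byte-identical JP2 files: it groups files by size and then by SHA-256 digest. Images of the same page that differ at all (for example a rescan, or the same page captured from both paper and microfilm) will not be matched; that would need a perceptual-hashing approach instead. The `*.jp2` pattern and the function names are assumptions for illustration.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large JP2 files are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path, pattern: str = "*.jp2") -> list[list[Path]]:
    """Return groups of byte-identical files found anywhere under root."""
    # Only files sharing a size can be identical, so hash just those.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob(pattern):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_digest: dict[str, list[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        for path in same_size:
            by_digest[file_digest(path)].append(path)

    return sorted(sorted(group) for group in by_digest.values() if len(group) > 1)
```

Grouping by size first avoids hashing files that cannot have a duplicate, which matters at the scale of a million images.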