Basic description | Project to digitise items from the British Library newspaper archive. Scanning started in October 2010, and scanning from paper is currently running at around 5,000 pages per working day. Scanning from microfilm is due to start shortly, with the aim of having over a million images scanned in total by the time the commercial service launches later this year. Data is processed with CCS DocWorks, with ABBYY FineReader 9 for OCR, and output as METS, ALTO, and JP2 files. There is one METS file per newspaper issue, and one ALTO and JP2 pair per scanned page. Various QC processes and validations are applied to these outputs before they are transferred to the British Library for archiving. |
Licensing | TBC |
Institution | Brightsolid / British Library |
Collection expert | |
List of issues | Use of OCR metadata; Using METS data to inform analysis; Newspaper issue dates; Finding duplicate images |
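
The output structure described above (one METS file per issue, one ALTO and JP2 pair per page) lends itself to a simple completeness check. The following is only a minimal sketch of such a check, not the project's actual QC process. It assumes a hypothetical layout in which each issue has its own directory, page files share a stem (`0001.xml` and `0001.jp2`), and the METS file name ends in `_mets.xml`. None of these naming conventions come from the project itself.

```python
from pathlib import Path


def find_unpaired_pages(issue_dir):
    """Return (alto_without_jp2, jp2_without_alto) as sorted lists of file stems.

    Assumed, hypothetical layout: one directory per issue, ALTO pages as
    <stem>.xml, images as <stem>.jp2, and the METS file as <name>_mets.xml.
    """
    issue_dir = Path(issue_dir)
    # Collect ALTO stems, skipping the per-issue METS file.
    alto_stems = {
        p.stem for p in issue_dir.glob("*.xml") if not p.stem.endswith("_mets")
    }
    jp2_stems = {p.stem for p in issue_dir.glob("*.jp2")}
    # A page is complete only when both halves of the pair are present.
    return sorted(alto_stems - jp2_stems), sorted(jp2_stems - alto_stems)
```

Calling `find_unpaired_pages("path/to/issue")` returns two lists: pages with OCR output but no image, and pages with an image but no OCR output. Both lists are empty for a complete issue under the assumed naming scheme.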