One line summary | Can we use metadata in METS files to help us target QC analysis of the OCRed text? |
Detailed description | METS files describe the structure of documents, listing the pages (with links to their ALTO and image files) and showing what type of content each page contains. Examples of content types include headlines, articles, illustrations, family notices, and adverts. Can we use this structure to target our QC analysis of the OCR text? |
Issue champion | Toby Atkin-Wright |
Possible approaches | Perform statistical analysis of the article text in each issue, ignoring other content types. Article text should match expected English usage more closely than other text on the page (for example, adverts or family notices), so it should give a cleaner signal of OCR quality. |
Context | |
AQuA Solutions | |
Collections | Brightsolid digitisation of British Library newspapers |
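
The approach listed under "Possible approaches" could be sketched roughly as follows. This is a minimal illustration, not a tested workflow for this collection. It assumes that the METS structMap marks articles with `TYPE="ARTICLE"` on `div` elements, that each `area` element's `BEGIN` attribute holds the ID of an ALTO `TextBlock`, and that a plain word list is an adequate stand-in for "expected English usage". The function names, the `wanted_types` parameter, and the vocabulary are hypothetical; the actual type labels and linking conventions in the Brightsolid files would need to be checked.

```python
import xml.etree.ElementTree as ET

# Standard METS namespace; the ALTO namespace differs between versions,
# so ALTO elements are matched on their local name instead.
METS_NS = "{http://www.loc.gov/METS/}"


def local(tag):
    """Return an element tag without its namespace prefix."""
    return tag.rsplit("}", 1)[-1]


def article_block_ids(mets_root, wanted_types=("ARTICLE",)):
    """Collect ALTO block IDs referenced by structMap divs of a wanted TYPE.

    Assumption: area/@BEGIN carries the ID of an ALTO TextBlock.
    """
    ids = set()
    for div in mets_root.iter(METS_NS + "div"):
        if div.get("TYPE", "").upper() in wanted_types:
            for area in div.iter(METS_NS + "area"):
                begin = area.get("BEGIN")
                if begin:
                    ids.add(begin)
    return ids


def words_in_blocks(alto_root, block_ids):
    """Return the CONTENT of every String inside the selected TextBlocks."""
    words = []
    for elem in alto_root.iter():
        if local(elem.tag) == "TextBlock" and elem.get("ID") in block_ids:
            for s in elem.iter():
                if local(s.tag) == "String" and s.get("CONTENT"):
                    words.append(s.get("CONTENT"))
    return words


def dictionary_hit_rate(words, vocabulary):
    """Fraction of alphabetic tokens found in the vocabulary, or None if empty."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in words]
    tokens = [t for t in tokens if t.isalpha()]
    if not tokens:
        return None
    return sum(t in vocabulary for t in tokens) / len(tokens)
```

In use, one would parse the METS file and each linked ALTO file with `ET.parse(path).getroot()`, call `article_block_ids` once per issue, and then compute `dictionary_hit_rate` per page or per issue. A low rate on article text would flag an issue for closer QC, while the same measure on adverts or family notices would be expected to be noisier.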