Using METS data to inform analysis

One line summary Can we use metadata in METS files to help us target QC analysis of the OCRed text?
Detailed description METS files describe the structure of documents: they list the pages (with links to their ALTO and image files) and show what type of content each page contains. Examples of content types include headlines, articles, illustrations, family notices, and adverts.
Can we use this structure to target our QC analysis of the OCR text?
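As a first look at what a METS file offers, the sketch below counts the content types declared in its structural map. It uses only the standard METS namespace and the TYPE attribute on div elements. The sample document and the TYPE values in it (ISSUE, ARTICLE, ADVERT, FAMILY_NOTICE) are invented for illustration; the values used in the Brightsolid METS profile may differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard METS namespace URI.
METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical, heavily simplified METS fragment. The TYPE values are
# assumptions for this example, not taken from a real newspaper profile.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="LOGICAL">
    <mets:div TYPE="ISSUE">
      <mets:div TYPE="ARTICLE" ID="a1"/>
      <mets:div TYPE="ADVERT" ID="a2"/>
      <mets:div TYPE="ARTICLE" ID="a3"/>
      <mets:div TYPE="FAMILY_NOTICE" ID="a4"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def count_div_types(mets_xml: str) -> Counter:
    """Count how often each div TYPE occurs anywhere in the METS document."""
    root = ET.fromstring(mets_xml)
    return Counter(
        div.get("TYPE")
        for div in root.iterfind(".//mets:div", METS_NS)
        if div.get("TYPE")
    )


if __name__ == "__main__":
    for div_type, count in count_div_types(SAMPLE_METS).most_common():
        print(div_type, count)
```

Running this over a sample of real METS files would show which TYPE values actually occur, and so which ones are worth targeting.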
Issue champion Toby Atkin-Wright
Possible approaches Perform statistical analysis of the article text in each issue, ignoring other content types. Article text should match expected English usage more closely than other text on the page.
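One way this approach might look in practice is sketched below. It assumes the text blocks of an issue have already been extracted from ALTO and tagged with their METS content type, and it uses a dictionary hit rate (the fraction of tokens found in a word list) as the statistic. Both the statistic and the names used here (KNOWN_WORDS, dictionary_hit_rate, article_hit_rate, the "ARTICLE" label) are assumptions for illustration; the page does not prescribe a particular measure.

```python
import re

# Tiny illustrative word list; a real run would load a full English lexicon.
KNOWN_WORDS = {
    "the", "a", "of", "and", "was", "in",
    "town", "council", "met", "on", "monday",
}


def dictionary_hit_rate(text, lexicon=KNOWN_WORDS):
    """Return the fraction of alphabetic tokens found in the lexicon.

    Returns None when the text contains no alphabetic tokens.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return None
    return sum(token in lexicon for token in tokens) / len(tokens)


def article_hit_rate(blocks, article_types=("ARTICLE",)):
    """Compute the hit rate over article blocks only, ignoring other types.

    `blocks` is an iterable of (content_type, text) pairs.
    """
    article_text = " ".join(
        text for content_type, text in blocks if content_type in article_types
    )
    return dictionary_hit_rate(article_text)


if __name__ == "__main__":
    # Invented sample data: two article blocks (one with OCR errors)
    # and one advert that should be excluded from the statistic.
    blocks = [
        ("ARTICLE", "The town council met on Monday"),
        ("ADVERT", "PEARS SOAP 6d per tablet"),
        ("ARTICLE", "Tbe conncil was in the town"),
    ]
    print(article_hit_rate(blocks))
```

Restricting the calculation to article blocks keeps prices, abbreviations and proper names in adverts and notices from lowering the score for reasons unrelated to OCR quality.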
Context  
AQuA Solutions  
Collections Brightsolid digitisation of British Library newspapers
Labels: metadata, qa, issue