Newspaper issue dates - solution

One line summary For cataloguing purposes, it is essential that the issue date metadata is accurate. How can we ensure this, and can we predict where issues may be missing?
Detailed description Code was written to extract issue number and publication date information from a folder hierarchy containing METS files.
By analysing the publication pattern across a year using statistical techniques, potentially incorrect issue dates are identified. By looking for gaps in the expected publication date sequence, and comparing these with the issue number sequence, potential missing issues are identified with greater accuracy than was previously possible.
The code was run against multiple years of different newspapers, with publication patterns ranging from one or two days a week up to six days a week. The results compared favourably with the existing Brightsolid issue analysis, with the additional benefit that missing issues are identified with greater confidence thanks to the issue number analysis.
The proof-of-concept code has proven its worth, and the concepts will rapidly be introduced into the Brightsolid QC process, resulting in higher quality data being delivered to the British Library.
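The two checks described above can be illustrated with a minimal Python sketch. It is not the MetsParser code: the function names, the (issue number, date) tuple format, the min_share threshold and the verdict strings are all assumptions made for illustration, and a simple weekday frequency count stands in for whatever statistical technique the original used. It assumes the issue numbers and dates have already been extracted from the METS files.

```python
from collections import Counter
from datetime import date, timedelta

# Illustrative sketch only; names and thresholds are hypothetical.
# An "issue" here is a tuple (issue_number, publication_date).

def regular_weekdays(dates, min_share=0.1):
    """Weekdays (0 = Monday) that carry at least min_share of all issues."""
    counts = Counter(d.weekday() for d in dates)
    total = sum(counts.values())
    return {wd for wd, n in counts.items() if n / total >= min_share}

def suspicious_dates(issues, min_share=0.1):
    """Issues whose date falls on a weekday outside the usual pattern."""
    pattern = regular_weekdays([d for _, d in issues], min_share)
    return [(num, d) for num, d in issues if d.weekday() not in pattern]

def expected_dates_between(start, end, pattern):
    """Expected publication dates strictly between start and end."""
    found = []
    day = start + timedelta(days=1)
    while day < end:
        if day.weekday() in pattern:
            found.append(day)
        day += timedelta(days=1)
    return found

def missing_issue_candidates(issues, min_share=0.1):
    """Compare gaps in the date sequence with gaps in the issue numbers.

    Returns tuples (previous_number, next_number, gap_dates, verdict).
    """
    ordered = sorted(issues, key=lambda issue: issue[1])
    pattern = regular_weekdays([d for _, d in ordered], min_share)
    findings = []
    for (n1, d1), (n2, d2) in zip(ordered, ordered[1:]):
        gap_dates = expected_dates_between(d1, d2, pattern)
        number_gap = n2 - n1 - 1  # issue numbers skipped between neighbours
        if not gap_dates and number_gap == 0:
            continue  # dates and numbers are both consecutive
        if gap_dates and number_gap == len(gap_dates):
            verdict = "missing"        # both sequences agree on the gap
        elif gap_dates and number_gap == 0:
            verdict = "not published"  # numbering continues unbroken
        else:
            verdict = "inconsistent"   # needs a human check
        findings.append((n1, n2, gap_dates, verdict))
    return findings
```

A date gap that is matched by an equal gap in the issue numbers is the strongest signal of a missing issue. A date gap with unbroken numbering more plausibly means nothing was published that day, and any other mismatch is best passed to a human checker.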
Solution champion Richard Boulderstone
Git link https://github.com/openplanets/AQuA/tree/master/MetsParser
Evaluation
  • Enables clearer conclusions to be drawn when missing issues detected
  • Brightsolid will incorporate into workflow in the next few weeks
  • The concept of cross-checking METS file data to detect errors is transferable; the actual code is specific to this collection
  • Supports targeting of limited human QA resources
  • Further ways identified to utilise the existing data we have to support QA
  • Noted that none of the information available is completely dependable (particularly when OCR sourced), so cross-checking to improve confidence, followed by targeted human effort, is essential
Tool (link)  
Issue Newspaper issue dates
Labels: solution, aqua, quality_assurance, structural_relationships