One line summary | For cataloguing purposes, it is essential that issue date metadata is accurate. How can we ensure this, and can we predict where issues may be missing? |
Detailed description | Code was written to extract issue number and publication date information from a folder hierarchy of METS files. Statistical analysis of the publication pattern across a year identifies issue dates that are potentially incorrect. Gaps in the expected sequence of publication dates are then compared with the issue number sequence, which identifies potentially missing issues more accurately than was previously possible. The code was run against multiple years of several newspapers with publication patterns ranging from one or two days a week up to six days a week. The results compared favourably with the existing Brightsolid issue analysis, with the additional benefit that the issue number analysis identifies missing issues with greater confidence. The proof-of-concept code has proven its worth, and its concepts will be introduced rapidly into the Brightsolid QC process, resulting in higher-quality data being delivered to the British Library. |
Solution champion | Richard Boulderstone |
Git link | https://github.com/openplanets/AQuA/tree/master/MetsParser |
Evaluation | |
Tool (link) | |
Issue | Newspaper issue dates |
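The date-pattern and gap analysis in the detailed description might look roughly like the following minimal Python sketch. It is not the MetsParser code. It assumes the issue dates and numbers have already been extracted from the METS files as (date, issue number) pairs. The function names, the `min_share` threshold and the sample data are hypothetical. For illustration, the statistical step is a simple weekday-frequency count.

```python
from collections import Counter
from datetime import date, timedelta


def regular_weekdays(dates, min_share=0.5):
    """Infer the publication pattern as a set of weekdays (Monday=0).

    Hypothetical rule: a weekday belongs to the pattern if issues fall on it
    in at least `min_share` of the ISO weeks that contain any issue.
    """
    weeks = {d.isocalendar()[:2] for d in dates}  # distinct (year, week) pairs
    counts = Counter(d.weekday() for d in dates)
    return {wd for wd, n in counts.items() if n >= min_share * len(weeks)}


def suspect_dates(dates, pattern):
    """Issue dates on a weekday outside the pattern may be mis-recorded."""
    return sorted(d for d in dates if d.weekday() not in pattern)


def classify_gaps(issues, pattern):
    """Compare date gaps with issue-number gaps between consecutive issues.

    `issues` is a list of (issue_date, issue_number) pairs. Issues with a
    suspect date are left out so they do not hide or create gaps.
    Returns a list of (kind, gap_dates, skipped_numbers) tuples.
    """
    issues = sorted(i for i in issues if i[0].weekday() in pattern)
    findings = []
    for (d1, n1), (d2, n2) in zip(issues, issues[1:]):
        # Expected publication dates strictly between the two issues.
        between = (d1 + timedelta(days=k) for k in range(1, (d2 - d1).days))
        gap = [d for d in between if d.weekday() in pattern]
        skipped = n2 - n1 - 1  # issue numbers absent between the two
        if not gap and skipped == 0:
            continue  # consecutive issues, nothing to report
        if skipped == len(gap):
            kind = "missing"  # date gap and number gap agree
        elif skipped == 0:
            kind = "no-issue-day"  # dates absent but numbering continuous
        else:
            kind = "inconsistent"  # counts disagree; needs a manual check
        findings.append((kind, gap, skipped))
    return findings


if __name__ == "__main__":
    # Invented sample: a Monday/Thursday title for January 2024.
    records = [
        (date(2024, 1, 1), 1), (date(2024, 1, 4), 2), (date(2024, 1, 8), 3),
        (date(2024, 1, 13), 4),  # a Saturday: outside the expected pattern
        (date(2024, 1, 15), 5), (date(2024, 1, 18), 6),
        (date(2024, 1, 25), 7), (date(2024, 1, 29), 8),
    ]
    dates = [d for d, _ in records]
    pattern = regular_weekdays(dates)
    print("pattern:", sorted(pattern))
    print("suspect dates:", suspect_dates(dates, pattern))
    for finding in classify_gaps(records, pattern):
        print(finding)
```

Applied to the invented January 2024 sample, the sketch infers a Monday/Thursday pattern and flags 13 January (a Saturday) as a suspect date. It reports 11 January as a likely missing issue, because issue number 4 is absent from the correctly dated run. It reports 22 January as a day with no issue, because the numbering continues unbroken. Taken together, the first two findings suggest that issue 4 was mis-dated rather than lost. Treating a date gap as a missing issue only when the issue numbers also skip keeps holidays and other non-publication days from being reported as losses.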