
| *One line summary* | For cataloguing purposes, it is essential that the issue date metadata is accurate. How can we ensure this, and can we predict where issues may be missing? |
| *Detailed description* | Newspapers are structured by title, by year, and by issue within each year. Issue boundaries are determined at scan time and confirmed in DocWorks. Issue dates are then added by a human operator, who reads the OCRed issue date and compares it against the scanned image. How can we ensure that these dates are accurate? And how can we identify missing issues? |
| *Issue champion* | Toby Atkin-Wright |
| *Possible approaches* | Currently the Brightsolid project reads the issue date from each issue in a year and predicts the likely publication pattern. Using this estimated pattern, it highlights issues that do not fit and therefore need human QC investigation. The calculation of the publication pattern is naive and could be significantly improved. \\
The issue order is also compared against the order of the originally scanned TIFF files; if the order does not match, the issues are highlighted for human QC investigation. \\
Once these issue dates have been confirmed or fixed, possible missing issues are identified based on the estimated publication pattern. This could be improved by taking account of the volume and issue numbers that are included in the METS files. \\
The volume and issue numbers could be used to confirm issue order, and to offer possible corrections where issues appear to be out of order. |
| *Context* | [Brightsolid] |
| *AQuA Solutions* | [Newspaper issue dates - solution] |
| *Collections* | [Brightsolid digitisation of British Library newspapers] |
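
The approaches listed above can be illustrated with a minimal Python sketch. It is not the Brightsolid implementation: it assumes the publication pattern can be approximated by the set of weekdays on which a title usually appears, and that issue numbers have already been extracted from the METS files (the parsing is not shown). All function names and the {{min_share}} threshold are hypothetical.

```python
from collections import Counter
from datetime import timedelta


def estimate_pattern(issue_dates, min_share=0.5):
    """Guess the publication weekdays (0 = Monday) for one title and year.

    Assumption: a weekday belongs to the pattern if it carries at least
    `min_share` times as many issues as the most common weekday.
    """
    counts = Counter(d.weekday() for d in issue_dates)
    if not counts:
        return set()
    threshold = max(counts.values()) * min_share
    return {weekday for weekday, n in counts.items() if n >= threshold}


def off_pattern_issues(issue_dates, pattern):
    """Issue dates that fall outside the estimated pattern (candidates for QC)."""
    return [d for d in issue_dates if d.weekday() not in pattern]


def missing_dates(issue_dates, pattern):
    """Dates between the first and last issue where the pattern expects an
    issue but none was scanned."""
    seen = set(issue_dates)
    if not seen:
        return []
    day, last = min(seen), max(seen)
    missing = []
    while day <= last:
        if day.weekday() in pattern and day not in seen:
            missing.append(day)
        day += timedelta(days=1)
    return missing


def out_of_order(issues):
    """Positions in scan order where the date or the issue number goes backwards.

    `issues` is a list of (issue_date, issue_number) tuples in the order of
    the scanned files; the issue numbers are assumed to come from METS.
    """
    return [
        i
        for i in range(1, len(issues))
        if issues[i][0] < issues[i - 1][0] or issues[i][1] <= issues[i - 1][1]
    ]
```

For a weekly title, {{estimate_pattern}} would return a single weekday, {{off_pattern_issues}} would list any issue dated on another day (for example a mis-keyed date), and {{missing_dates}} would list the weeks with no issue. A weekday set cannot represent fortnightly or monthly titles, nor holidays and special editions, so anything flagged here is a prompt for human QC, not a confirmed error.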