Label: ocr

Content with label ocr in AQuA
Related Labels: mj2, pdf, workflow, jp2k, jpeg2000, qa, jpm, xml, hocr, levenshtein, image, mets, aqua, quality_assurance, alto, solution, tesseract, obsolescence, comparison

Page: BOPCRIS issue - ABBYY "Unknown error"
One line summary: ABBYY Recognition Server 3 inconsistently returns an "Unknown error" message when processing collection files. ...
Other labels: image, qa, issue, obsolescence
Page: Check consistency between metadata and content
One line summary: Check that the METS, OCR, JPEG2000 masters and PDFs are consistent. Detailed description: As shown in the diagram on the page, check the image and ALTO file information defined in METS against the actual files stored in separate Zip files. Also ...
Other labels: mets, metadata, jpeg2000, jp2k, pdf, jp2, jpx, mj2
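The METS-versus-Zip check above can be sketched in a few lines. This is a minimal illustration, not the AQuA solution itself: it assumes file references live in standard mets:FLocat elements with an xlink:href attribute, and it compares base file names only (an assumption; real packages may need full relative paths). The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

METS_NS = "http://www.loc.gov/METS/"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def mets_file_refs(mets_xml: str) -> set:
    """Base names of every file referenced by a mets:FLocat element."""
    root = ET.fromstring(mets_xml)
    refs = set()
    for flocat in root.iter(f"{{{METS_NS}}}FLocat"):
        href = flocat.get(XLINK_HREF)
        if href:
            # Assumption: matching on the base name is good enough here.
            refs.add(PurePosixPath(href).name)
    return refs


def zip_members(zip_source) -> set:
    """Base names of the files (not directories) stored in one Zip archive."""
    with zipfile.ZipFile(zip_source) as zf:
        return {PurePosixPath(n).name for n in zf.namelist() if not n.endswith("/")}


def compare_mets_to_zips(mets_xml: str, zip_sources) -> dict:
    """Report files METS expects but the Zips lack, and vice versa."""
    referenced = mets_file_refs(mets_xml)
    stored = set()
    for source in zip_sources:
        stored |= zip_members(source)
    return {
        "missing_from_zip": sorted(referenced - stored),
        "not_in_mets": sorted(stored - referenced),
    }
```

Each Zip can be given as a path or an open binary file object, since zipfile.ZipFile accepts both.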
Page: Compare OCR results of the same source material in different formats (TIFF, JP2)
One line summary: This solution compares two OCR results produced from the same image in two different formats: the original TIFF file and a JP2 (JPEG 2000) representation of that TIFF file. The goal was to find ...
Other labels: jp2, jpeg2000, levenshtein, solution, aqua, quality_assurance
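Given the levenshtein label, one plausible way to score the TIFF and JP2 OCR outputs is the Levenshtein edit distance, normalised by the longer text. The sketch below is a textbook two-row dynamic-programming implementation; the whitespace normalisation and the function names are assumptions for illustration, not taken from the solution page.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic DP, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def normalise(text: str) -> str:
    """Hypothetical pre-processing: collapse runs of whitespace."""
    return " ".join(text.split())


def ocr_similarity(tiff_text: str, jp2_text: str) -> float:
    """1.0 for identical texts, falling towards 0.0 as they diverge."""
    a, b = normalise(tiff_text), normalise(jp2_text)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

For example, levenshtein("kitten", "sitting") is 3. Normalising by the longer text keeps scores comparable between short and long pages.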
Page: Identifying missed or duplicated pages
One line summary: Identifying missed or duplicated pages in books, archives and manuscripts. Detailed description: Multipaged items form the vast bulk of digitisation projects. There is always ...
Other labels: qa, image, issue, duplication
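The page does not say how missed or duplicated pages should be detected, so the following is only a hypothetical first pass under two stated assumptions: duplicates are byte-identical image files (re-scans of the same page will not be caught), and page image file names end in a sequence number such as 0001.tif, so a missing number suggests a missed capture.

```python
import hashlib
import re
from collections import defaultdict


def find_exact_duplicates(pages: dict) -> list:
    """Group file names whose contents are byte-identical (SHA-256 match)."""
    by_hash = defaultdict(list)
    for name, data in sorted(pages.items()):
        by_hash[hashlib.sha256(data).hexdigest()].append(name)
    return [names for names in by_hash.values() if len(names) > 1]


def find_sequence_gaps(names) -> list:
    """Sequence numbers missing between the lowest and highest file number.

    Assumes names end in digits followed by an extension, e.g. '0003.tif'.
    """
    numbers = set()
    for name in names:
        match = re.search(r"(\d+)\.\w+$", name)
        if match:
            numbers.add(int(match.group(1)))
    if not numbers:
        return []
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]
```

Both checks only flag candidates for a human to inspect; perceptual image hashing would be needed to catch near-identical re-scans.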
Page: Newspaper issue dates
One line summary: For cataloguing purposes, it is essential that the issue date metadata is accurate. How can we ensure this, and can we predict where issues may be missing? Detailed description: Newspapers are structured by title, by year, and by issue in each ...
Other labels: metadata, issue, unknown_characteristics
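One way to predict where issues may be missing is to compare the recorded issue dates against an expected publication schedule. The sketch below assumes a hypothetical schedule of Monday to Saturday publication (Python's weekday() numbers Monday as 0); real titles change schedule over time, so the weekday set would have to be supplied per title and period.

```python
from datetime import date, timedelta


def expected_dates(start: date, end: date, weekdays):
    """Yield every date in [start, end] that falls on a publication weekday."""
    day = start
    while day <= end:
        if day.weekday() in weekdays:
            yield day
        day += timedelta(days=1)


def check_issue_dates(issue_dates, weekdays=frozenset(range(6))) -> dict:
    """Flag expected-but-absent dates and dates that break the schedule.

    weekdays defaults to Monday (0) through Saturday (5): an assumption.
    """
    present = set(issue_dates)
    if not present:
        return {"possibly_missing": [], "off_schedule": []}
    expected = set(expected_dates(min(present), max(present), weekdays))
    return {
        "possibly_missing": sorted(expected - present),
        "off_schedule": sorted(present - expected),
    }
```

Off-schedule dates are as interesting as gaps: they often point to a mistyped issue date rather than a genuinely unusual issue.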
Page: OCR Comparison
One line summary: Compare two different OCR results. If the results are not sufficiently close, the source pages may differ, indicating possible issues. Detailed description: See the detailed scenario descriptions on the page. Solution champion: Georg Petz & Sven ...
Other labels: taverna, workflow, comparison, tiff, jpeg2000, tesseract, hocr, quality_assurance
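The original solution is a Taverna workflow; the standalone sketch below only illustrates the core idea of "sufficiently close". It extracts words from Tesseract-style hOCR (elements with class ocrx_word) and compares the two word sequences with Python's difflib.SequenceMatcher, a swapped-in similarity measure rather than the workflow's own. The 0.9 threshold is a hypothetical value that would need tuning on real material.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class HocrWords(HTMLParser):
    """Collect the text of every element whose class list contains 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._depth = 0  # > 0 while inside an ocrx_word element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1  # nested markup such as <strong> inside a word
        elif "ocrx_word" in classes:
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                word = "".join(self._buf).strip()
                if word:
                    self.words.append(word)

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def hocr_words(hocr: str) -> list:
    parser = HocrWords()
    parser.feed(hocr)
    parser.close()
    return parser.words


def pages_match(hocr_a: str, hocr_b: str, threshold: float = 0.9):
    """Return (is_close_enough, ratio) for two hOCR pages, word by word."""
    ratio = SequenceMatcher(None, hocr_words(hocr_a), hocr_words(hocr_b),
                            autojunk=False).ratio()
    return ratio >= threshold, ratio
```

Comparing at word level rather than character level makes the score less sensitive to single misrecognised characters and more sensitive to missing or extra text, which is what a page mix-up would produce.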
Page: Use of OCR metadata
One line summary: How can we use OCR metadata to identify pages for human QC investigation? Detailed description: The ABBYY FineReader 9 engine outputs various OCR statistics, which are expressed in the ALTO files. For each page there is a predicted ...
Other labels: qa, issue
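The per-page statistic named on the page is cut off in this excerpt, so the sketch below deliberately uses a different, well-defined ALTO field: the optional WC (word confidence, 0 to 1) attribute on String elements. It averages WC per page and flags pages below a hypothetical 0.8 threshold, or with no confidence data at all, for human QC. Namespaces are ignored by matching local element names, on the assumption that files may use different ALTO schema versions.

```python
import xml.etree.ElementTree as ET
from statistics import mean
from typing import Optional


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def mean_word_confidence(alto_xml: str) -> Optional[float]:
    """Average of the WC attributes on all String elements, or None if absent."""
    root = ET.fromstring(alto_xml)
    scores = [
        float(el.get("WC"))
        for el in root.iter()
        if local_name(el.tag) == "String" and el.get("WC") is not None
    ]
    return mean(scores) if scores else None


def pages_for_review(pages: dict, threshold: float = 0.8) -> list:
    """Names of pages whose mean word confidence is low or unavailable."""
    flagged = []
    for name, alto_xml in sorted(pages.items()):
        score = mean_word_confidence(alto_xml)
        if score is None or score < threshold:
            flagged.append(name)
    return flagged
```

Treating missing confidence data as a reason for review is a conservative choice: a page with no WC values may indicate an OCR failure rather than a clean result.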