* Could apply this approach to evaluate the use of lossy compression when preserving master images, and verify that discarding some detail through lossy compression does not reduce future OCR quality
* Technique could easily be applied to other collections |
| *Detailed Evaluation* \\ | # How well does the solution meet your issue?
### The solution provides a useful generic approach to paired image and/or OCR validation. Further testing is required to examine edge cases such as partially corrupt files.
# What more would you like the solution to do?
# Do you think you can implement the solution in your organisation?
### Good potential to embed the solution, although scalability is probably an issue. As noted above, a parallel computing approach may mitigate the long OCR times.
## What further investigation/development/testing would be required before implementation at your organisation?
## Are there any process, workflow or technical obstacles to implementation?
# Summarise the benefits to your organisation that the solution could provide?
### Thorough QA of file migration would be particularly valuable, especially where the resulting files are close but not exactly the same as the original. Without this additional QA, errors can pass unnoticed.
# What potential exists to apply the solution elsewhere? e.g. with other collections, in other organisations, or to meet other issues?
### Excellent potential to apply to other cases with minimal work. Verification of the level of lossy compression (noted above) would be an interesting application of the workflow.
# What more would you like to understand and/or document about the issue and/or the solution?
### Further testing is probably the highest priority, perhaps followed by further work to assess how the OCR results are compared. |
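
As an illustration of what assessing the comparison between OCR results could involve, the following is a minimal sketch only. It assumes the OCR text of the original and migrated images has already been extracted to plain strings. The function names (`ocr_similarity`, `passes_qa`) and the 0.95 threshold are hypothetical, and the word-level matching from Python's standard `difflib` module is a stand-in, not the comparison method used by the solution.

```python
from difflib import SequenceMatcher


def words(text: str) -> list[str]:
    """Split OCR text into words, discarding layout whitespace."""
    return text.split()


def ocr_similarity(original_text: str, migrated_text: str) -> float:
    """Return a word-level similarity between 0.0 and 1.0 for two OCR outputs."""
    a, b = words(original_text), words(migrated_text)
    if not a and not b:
        # Two empty results are treated as identical.
        return 1.0
    # autojunk=False disables difflib's popularity heuristic, which would
    # otherwise ignore very frequent words in long texts.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def passes_qa(original_text: str, migrated_text: str,
              threshold: float = 0.95) -> bool:
    """Flag a migrated file as acceptable if OCR similarity meets the threshold.

    The threshold is an illustrative assumption and would need tuning
    against a collection with known-good and known-bad migrations.
    """
    return ocr_similarity(original_text, migrated_text) >= threshold
```

A pair such as `"the quick brown fox"` and `"the quick brawn fox"` shares three of four words on each side, so it would fall below the illustrative threshold and be flagged for manual inspection. Whether word-level or character-level matching is the better measure is one of the questions further testing would need to settle.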
*Scenario 1: Compare hOCR instances with Tesseract OCR results*