One line summary | The tool classifies files as Bad, Substandard, Good or Unprocessed, depending on file type and on the metadata requirements set by the content owner. |
Detailed description | Solution for EAP Issue 3: metadata extraction. This is the same tool as for EAP Issue 1: broken TIFF images. Developed using PHP and FITS (plus a little jQuery/jQuery UI for the results page). 1. Scan the specified directory for files. 2. If the file type is of interest (TIFF in the demo), process the file; otherwise it is Unprocessed. 3. If the file is a valid TIFF, extract its technical metadata; otherwise it is Bad. 4. If the technical metadata meets the required standard, the file is Good; otherwise it is Substandard. 5. After each file is checked, write a progress file to disk so progress can be monitored. The tool returns lists of Bad, Substandard, Good and Unprocessed files. Ideally, the solution would run automatically when new media is detected (AutoRun). Workflow diagram: see attachment ("Files detected" → "Check if valid against type" → ...). |
Solution champion | John Salter, Matt Ruane |
Git link | |
Evaluation | User-friendly interface, useful for Content Owners and Project Holders. It can be included in the Project Holder workflow, hopefully catching problems early so that digitisation can be redone if necessary. The information the tool provides will also help us narrow down further QA activity. We will need to consult the Digital Preservation Team and IT before implementing it. Further work on the tool is possible: extending it to other file types (audio and video), and generating thumbnail views of "Good" files. It could possibly tie in with the tool that identifies compressed TIFFs and converts them to uncompressed ones. |
Tool (link) |
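The workflow in the detailed description can be sketched roughly as follows. This is a minimal illustration, not the tool itself: it is written in Python rather than the tool's PHP. It also reads a few baseline TIFF tags straight from the file header instead of calling FITS. The thresholds in `REQUIREMENTS`, the `progress.json` file name and its JSON layout, and all function names are hypothetical placeholders, since the source does not specify them.

```python
import json
import struct
from pathlib import Path

# Hypothetical content-owner requirements: the real thresholds are not
# given in the source, so these values are placeholders for illustration.
REQUIREMENTS = {
    "min_width": 1000,
    "min_height": 1000,
    "min_bits_per_sample": 8,
    "uncompressed_only": True,
}

TIFF_EXTENSIONS = {".tif", ".tiff"}

# Baseline TIFF 6.0 tag IDs read from the first image file directory (IFD).
TAGS = {256: "width", 257: "height", 258: "bits_per_sample", 259: "compression"}


def read_tiff_metadata(path):
    """Return basic tags from the first IFD, or None if the file is not a valid TIFF."""
    data = Path(path).read_bytes()  # reads the whole file; acceptable for a sketch
    if len(data) < 8:
        return None
    if data[:2] == b"II":
        order = "<"  # little-endian byte order
    elif data[:2] == b"MM":
        order = ">"  # big-endian byte order
    else:
        return None
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42 or ifd_offset + 2 > len(data):
        return None
    count = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])[0]
    meta = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        entry = data[start:start + 12]
        if len(entry) < 12:
            return None  # truncated directory: treat the file as broken
        tag, typ, n = struct.unpack(order + "HHI", entry[:8])
        size = {3: 2, 4: 4}.get(typ)  # SHORT or LONG; other field types are skipped
        if tag not in TAGS or size is None or n == 0:
            continue
        if size * n <= 4:
            raw = entry[8:8 + size]  # value fits inline in the entry
        else:
            offset = struct.unpack(order + "I", entry[8:12])[0]
            raw = data[offset:offset + size]  # values stored elsewhere; keep only the first
            if len(raw) < size:
                return None
        meta[TAGS[tag]] = struct.unpack(order + ("H" if size == 2 else "I"), raw)[0]
    return meta


def classify(path, req=REQUIREMENTS):
    """Map one file to Unprocessed, Bad, Substandard or Good."""
    if Path(path).suffix.lower() not in TIFF_EXTENSIONS:
        return "Unprocessed"  # not a file type of interest
    meta = read_tiff_metadata(path)
    if meta is None:
        return "Bad"  # not a readable TIFF
    good = (
        meta.get("width", 0) >= req["min_width"]
        and meta.get("height", 0) >= req["min_height"]
        # BitsPerSample defaults to 1 and Compression to 1 (none) when absent.
        and meta.get("bits_per_sample", 1) >= req["min_bits_per_sample"]
        and (not req["uncompressed_only"] or meta.get("compression", 1) == 1)
    )
    return "Good" if good else "Substandard"


def scan(directory, progress_path="progress.json"):
    """Classify every file under directory, rewriting a progress file after each one."""
    results = {"Bad": [], "Substandard": [], "Good": [], "Unprocessed": []}
    files = sorted(p for p in Path(directory).rglob("*") if p.is_file())
    for i, path in enumerate(files, start=1):
        results[classify(path)].append(str(path))
        Path(progress_path).write_text(json.dumps({"checked": i, "total": len(files)}))
    return results
```

Calling `scan("/path/to/new/media")` returns the four lists, and a results page could poll the progress file while the scan runs. Checking only a handful of header tags is enough to separate broken files from substandard ones. FITS wraps several identification and validation tools and covers many more formats, which matters for the planned audio and video support.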