
| *Title* \\ | IS22 Characterise and Validate very large mpeg-1 and mpeg-2 files |
| *Detailed description* | Collections of very large video files (50 GB\+ each) are hard to characterise and validate. Existing characterisation tools do not necessarily handle very large files well, and not all of the required formats are well supported, if at all, by the known tools (JHove, JHove2, FITS, XC*L). \\ |
| *Scalability Challenge* \\ | _Tools need_ to be able to work on very large files (50 GB\+) and to run in a distributed environment in order to scale (SB holds more than 400 Tbytes of mpeg-1/2) \\ |
| *[Issue champion|SP:Responsibilities of the roles described on these pages]* | [Gry Elstrøm|https://portal.ait.ac.at/sites/Scape/_layouts/userdisp.aspx?ID=65] (SB) |
| *Other interested parties* \\ | |
| *Possible Solution approaches* | _1. Survey and test existing tools for scalability when it comes to file size_ \\
_2. Survey and test existing tools for support for mpeg-1 and mpeg-2_ \\
_3. Adapt tools to support large files and/or extend format support in tools_ \\ |
| *Context* | \\ |
| *Lessons Learned* | \\ |
| *Training Needs* | \\ |
| *Datasets* | [Danish TV broadcasts, mpeg videos]\\ |
| *Solutions* | Optional solution - [SO25 Rosetta v3.0 Implementation Integrated with DROID 6|http://wiki.opf-labs.org/display/SP/SO25+Rosetta+v3.0+Implementation+Integrated+with+DROID+6]\\
Characterise (not validate) solution - [SO06 FFprobe|http://wiki.opf-labs.org/display/SP/SO06+Use+Ffprobe+to+characterise+Wav] |
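
The FFprobe-based characterisation referenced above could be sketched roughly as follows. This is a minimal illustration, not the SO06 implementation: the function names ({{build_ffprobe_command}}, {{summarise}}, {{characterise}}) and the choice of summary fields are assumptions made for this example, and it assumes {{ffprobe}} is installed and on the PATH. It only characterises a file; it does not validate it.

```python
import json
import subprocess


def build_ffprobe_command(path):
    """Return an ffprobe command that emits container and stream metadata as JSON."""
    return [
        "ffprobe",
        "-v", "error",            # suppress everything except errors
        "-print_format", "json",  # machine-readable output
        "-show_format",           # container-level metadata
        "-show_streams",          # per-stream metadata (video, audio, ...)
        path,
    ]


def summarise(probe):
    """Reduce parsed ffprobe JSON to a few fields (hypothetical selection)."""
    fmt = probe.get("format", {})
    streams = probe.get("streams", [])
    return {
        "container": fmt.get("format_name"),
        # ffprobe reports size and duration as strings, so convert them.
        "size_bytes": int(fmt["size"]) if "size" in fmt else None,
        "duration_s": float(fmt["duration"]) if "duration" in fmt else None,
        "codecs": [(s.get("codec_type"), s.get("codec_name")) for s in streams],
    }


def characterise(path):
    """Run ffprobe on one file and return the summary dictionary."""
    result = subprocess.run(
        build_ffprobe_command(path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ffprobe exits non-zero
    )
    return summarise(json.loads(result.stdout))
```

Keeping command construction and JSON parsing separate from the subprocess call means the parsing can be checked against stored ffprobe output without access to the 50 GB\+ files themselves.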

h1. Evaluation

| *Objectives* | This issue concerns robustness and scalability as well as advanced functionality. The corresponding collection is currently over 700 Tbytes. |
| *Success criteria* | We will have a workflow that produces characterisation output for all the mpeg files and validates them, identifying problematic files that current or future tools might have trouble with \\ |
| *Automatic measures* | 1. Tool support for very large files (75 GB) \\
2. Process 2 Tbytes of sample content in less than 24 hours \\ |
| *Manual assessment* | 1. The workflow gives useful output that curators can understand \\
2. The majority of the files deemed NOT VALID have a human-understandable problem \\ |
| *Actual evaluations* | Links to actual evaluations of this Issue/Scenario |
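
The second automatic measure can be turned into a sustained throughput target with a little arithmetic. The sketch below assumes decimal units (1 Tbyte = 10^12 bytes), which the page does not specify, and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600
TB = 10**12  # assumption: decimal terabytes; the page does not say which unit is meant


def required_throughput(total_bytes, hours):
    """Sustained bytes per second needed to process total_bytes within the given hours."""
    return total_bytes / (hours * SECONDS_PER_HOUR)


def days_for_collection(collection_bytes, bytes_per_second):
    """Days needed to process a whole collection at a constant rate."""
    return collection_bytes / bytes_per_second / (24 * SECONDS_PER_HOUR)


target = required_throughput(2 * TB, 24)
print(f"{target / 10**6:.1f} MB/s")                    # prints "23.1 MB/s"
print(f"{days_for_collection(400 * TB, target):.0f}")  # prints "200"
```

At exactly the target rate, a single pipeline would need about 200 days for the 400 Tbytes quoted in the scalability challenge, which illustrates why the tools also need to run in a distributed environment.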