| *Title* \\ | IS24 Characterisation of large amounts of wav audio |
| *Detailed description* | SB holds large amounts of WAV audio (200 TB \+) in different resolutions (ranging from 22 kHz 16-bit to 96 kHz 24-bit). Different resolutions have been chosen over the years for different reasons (equipment, storage budgets, quality of the original media being digitised). Before we ingest all these older collections into our new DOMS, we need to run a simple characterisation of the files so that we generate correct technical metadata (in PREMIS format) for them. We know that certain collections that claim to hold only, e.g., 48 kHz 16-bit files also contain files in other resolutions, most likely as a result of mis-operation of the digitisation equipment. \\ |
| *Scalability Challenge* \\ | _Large amounts of data (200 TB \+). Simple characterisation requires little CPU but a lot of I/O._ \\
Some of the files are rather large (8 GB), which could be a problem for some characterisation tools (but not for tools that only read header information and magic bytes). \\ |
| *[Issue champion|SP:Responsibilities of the roles described on these pages]* | [Bjarne Andersen|] (SB) |
| *Other interested parties* \\ | |
| *Possible Solution approaches* | Should be simple. \\
1. Find an appropriate characterisation tool that supports WAV in a suitable manner \\
    * The XC*L framework seems to have good WAV support - evaluate and test. \\
2. Ensure this tool runs within the SCAPE platform \\ |
| *Context* | \\ |
| *Lessons Learned* | \\ |
| *Training Needs* | \\ |
| *Datasets* | [WAV with Danish Radio broadcasts, ripped audio CD’s, and SB in-house audio digitization|SP:WAV with Danish Radio broadcasts, ripped audio CD’s, and SB in-house audio digitization]\\ |
| *Solutions* | [SO06 Use Ffprobe to characterise Wav]\\
[SO25 Rosetta v3.0 Implementation Integrated with DROID 6|]\\ |
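
The header-only characterisation described above can be illustrated with a minimal sketch. It is not one of the listed solutions and is not the XC*L or Ffprobe approach; the function names {{read_wav_format}} and {{matches_claim}} are hypothetical. It assumes plain RIFF/WAVE files with a standard {{fmt }} chunk. Since the RIFF size field is 32 bits, files above 4 GB cannot be plain RIFF and may use an extension such as RF64, which this sketch simply rejects. Only chunk headers are read and audio data is skipped with a seek, so the cost per file should not depend on file size.

```python
import io
import struct


def read_wav_format(stream):
    """Return (sample_rate_hz, bits_per_sample, channels) from a RIFF/WAVE stream.

    Reads only the chunk headers and the 'fmt ' chunk; other chunks are skipped
    with seek(), so no audio data is loaded.
    """
    head = stream.read(12)
    if len(head) < 12:
        raise ValueError("stream too short to be a WAVE file")
    riff, _size, wave_id = struct.unpack("<4sI4s", head)
    if riff != b"RIFF" or wave_id != b"WAVE":
        # Assumption: anything else (including RF64) is out of scope here.
        raise ValueError("not a plain RIFF/WAVE stream")

    while True:
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            raise ValueError("no 'fmt ' chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
        if chunk_id == b"fmt ":
            fmt = stream.read(16)
            if len(fmt) < 16:
                raise ValueError("truncated 'fmt ' chunk")
            # Layout: format tag, channels, sample rate, byte rate,
            # block align, bits per sample (all little-endian).
            _tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", fmt
            )
            return rate, bits, channels
        # Chunks are word-aligned: an odd-sized chunk is followed by a pad byte.
        stream.seek(chunk_size + (chunk_size & 1), 1)


def matches_claim(actual, claimed_rate_hz, claimed_bits):
    """True if the measured (rate, bits, channels) agrees with the collection's claim."""
    rate, bits, _channels = actual
    return rate == claimed_rate_hz and bits == claimed_bits
```

For a file on disk, a call such as {{read_wav_format(open(path, "rb"))}} followed by {{matches_claim(result, 48000, 16)}} would flag files that deviate from a collection claimed to be 48 kHz 16-bit. The values found would still need to be mapped into PREMIS technical metadata, which this sketch does not attempt.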
h1. Evaluation

| *Objectives* | _Which SCAPE objectives do this issue and a future solution relate to? e.g. scalability, robustness, reliability, coverage, preciseness, automation_ |
| *Success criteria* | _Describe the success criteria for solving this issue - what are you able to do? - what does the world look like?_ |
| *Automatic measures* | _What automated measures would you like the solution to give to evaluate the solution for this specific issue? which measures are important?_ \\
_If possible specify very specific measures and your goal - e.g._ \\
_ \* process 50 documents per second_ \\
_ \* handle 80Gb files without crashing_ \\
_ \* identify 99.5% of the content correctly_ \\ |
| *Manual assessment* | _Apart from automated measures that you would like to get do you foresee any necessary manual assessment to evaluate the solution of this issue?_ \\
_If possible specify measures and your goal - e.g._ \\
_ \* Solution installable with basic Linux system administration skills_ \\
_ \* User interface understandable by non-developer curators_ \\ |