
| *Title* \\ | IS20 Detect audio files with very bad sound quality |
| *Detailed description* | _In a collection of mp3 files (20 Tbytes - 360.000 files) we have discovered files with very bad sound quality. Before ingesting everything into our DOMS we would like to be able to discover the bad files and potentially get those re-digitized from the original analogue media._ |
| *Scalability Challenge* \\ | Large amounts of data (20 Tbytes - 360.000 files). We foresee that solving this will require both I/O-intensive work and heavy CPU use for analysing the actual content. \\ |
| *[Issue champion|SP:Responsibilities of the roles described on these pages]* | [Bjarne Andersen|https://portal.ait.ac.at/sites/Scape/_layouts/userdisp.aspx?ID=8] (SB) |
| *Other interested parties* \\ | |
| *Possible Solution approaches* | * SB: Analyze the waveform to detect high-frequency noise. Test our commercial Dobbin Audio Analyser to see whether it detects problems with the files. |
| *Context* | \\ |
| *Lessons Learned* | |
| *Training Needs* | |
| *Datasets* | [mp3 (128kbit) with Danish Radio broadcasts|Danish Radio broadcasts, mp3]\\ |
| *Solutions* | |
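
The waveform approach listed under possible solutions could be prototyped along the following lines. This is only a minimal sketch, not the method SB uses: it assumes the mp3 files have already been decoded to a list of PCM samples by some external tool, it swaps in a simple first-difference energy ratio as a crude stand-in for a proper spectral analysis, and the names {{high_frequency_ratio}}, {{looks_noisy}} and the threshold value are hypothetical.

```python
import math
import random


def high_frequency_ratio(samples):
    """Energy of the first difference divided by the energy of the signal.

    The first difference acts as a crude high-pass filter: the ratio is close
    to 0 for low-frequency content and approaches 4 at the Nyquist frequency.
    """
    signal_energy = sum(s * s for s in samples)
    if signal_energy == 0.0:
        return 0.0  # silence: no high-frequency content to report
    diff_energy = sum((b - a) ** 2 for a, b in zip(samples, samples[1:]))
    return diff_energy / signal_energy


def looks_noisy(samples, threshold=1.0):
    # The threshold is an assumed value; it would need tuning on known-bad files.
    return high_frequency_ratio(samples) > threshold


if __name__ == "__main__":
    rate = 8000  # synthetic test signals, one second each
    tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    rng = random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(rate)]
    # The pure tone is not flagged, the broadband noise is.
    print(looks_noisy(tone), looks_noisy(noise))
```

A single ratio per file is cheap to compute, which matters at this collection size, but real broadcasts would probably need the measure evaluated per time window and compared against manually verified bad files before any threshold is trusted.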

h1. Evaluation

| *Objectives* | This issue is primarily about \\
* scalability, because we need to process 150.000 files
* reliability and precision, since we need to detect files with very bad sound: files that we will potentially remove from the repository |
| *Success criteria* | When this issue is solved we will be able to scan through all files in our repository and automatically detect files with bad quality. \\ |
| *Automatic measures* | Speed is not critical for this issue since this is a one-time job, but of course it cannot take years. \\
1. Process 20 files per hour per node |
| *Manual assessment* | 1. None of the detected files are false positives (i.e. flagged files that do not actually have bad sound quality) \\ |
| *Actual evaluations* | Links to actual evaluations of this Issue/Scenario |
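
To put the two measures above into numbers, the following sketch works out how long a full scan would take at the target rate of 20 files per hour per node, and expresses the manual criterion as a precision value. It assumes the 360.000-file figure from the detailed description; the node counts and the function names are hypothetical and only serve as illustration.

```python
def days_to_process(total_files, files_per_hour_per_node, nodes):
    """Wall-clock days needed to scan the collection at a fixed per-node rate."""
    node_hours = total_files / files_per_hour_per_node
    return node_hours / nodes / 24


def precision(true_positives, false_positives):
    """Fraction of flagged files that really have bad sound quality."""
    flagged = true_positives + false_positives
    if flagged == 0:
        return 1.0  # nothing flagged, so nothing was flagged wrongly
    return true_positives / flagged


# 360.000 files at 20 files per hour per node is 18.000 node-hours.
print(days_to_process(360_000, 20, nodes=1))   # 750.0
print(days_to_process(360_000, 20, nodes=10))  # 75.0
```

At the target rate a single node would need roughly two years, so the "cannot take years" requirement implies running on several nodes in parallel. The manual assessment criterion of no false positives corresponds to {{precision(...) == 1.0}} on the sample of flagged files that is checked by hand.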