IS20 Detect audio files with very bad sound quality
Detailed description In a collection of mp3 files (20 Tbytes - 360.000 files) we have discovered files with very bad sound quality. Before ingesting everything into our DOMS we would like to be able to discover the bad files and potentially get those re-digitized from the original analogue media.
Scalability Challenge
Large amounts of data (20 Tbytes - 360.000 files). We foresee that solving this will require both I/O-intensive work and heavy CPU use to analyse the actual content.
Issue champion Bjarne Andersen (SB)
Other interested parties
Possible Solution approaches
  • SB: Analyze the WAV form to detect high-frequency noise. Test our commercial Dobbin Audio Analyser to see whether it detects the problems with the files
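The high-frequency noise idea can be sketched as follows. This is only a minimal illustration, not the method used by the Dobbin Audio Analyser: it assumes the mp3 has already been decoded to a list of PCM samples by some external tool, and it uses the energy of the first difference of the signal as a crude high-pass measure. The function names and the threshold of 1.0 are hypothetical and would need tuning against files known to be good and bad.

```python
import math
import random


def high_frequency_ratio(samples):
    """Energy of the first difference divided by the energy of the signal.

    The first difference x[n] - x[n-1] acts as a crude high-pass filter,
    so the ratio is close to 0 for low-frequency content and close to 2
    for white noise.
    """
    energy = sum(s * s for s in samples)
    if energy == 0:
        return 0.0  # digital silence: nothing to measure
    diff_energy = sum((b - a) ** 2 for a, b in zip(samples, samples[1:]))
    return diff_energy / energy


def is_suspect(samples, threshold=1.0):
    """Flag a signal whose high-frequency ratio exceeds a hypothetical threshold."""
    return high_frequency_ratio(samples) > threshold


# Synthetic demo data (assumption: one second of mono audio at 44.1 kHz).
rate = 44100
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(rate)]

print("tone suspect:", is_suspect(tone))
print("noise suspect:", is_suspect(noise))
```

In practice the ratio would probably be computed per short window rather than per file, so that a brief noisy passage is not averaged away by the rest of the broadcast.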
Lessons Learned  
Training Needs  
Datasets mp3 files (128 kbit/s) with Danish Radio broadcasts


Objectives This issue is primarily about
  • scalability because we need to process 150.000 files
  • reliability and precision since we need to detect files with very bad sound - files that we will potentially remove from the repository
Success criteria When this issue is solved we will be able to scan through all files in our repository and automatically detect files with bad quality.
Automatic measures Speed is not really important for this issue since this is a one-time job. It cannot take years, of course.
1. Process 20 files per hour per node
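To put this throughput target in perspective, here is a rough back-of-the-envelope estimate. It assumes the 360.000 files from the description and perfect scaling across nodes; the node counts are illustrative assumptions, not figures from this page.

```python
def wall_clock_days(total_files, files_per_hour_per_node, nodes):
    """Estimated days to process all files, assuming perfect scaling across nodes."""
    node_hours = total_files / files_per_hour_per_node
    return node_hours / nodes / 24


# 360.000 files at 20 files per hour per node is 18.000 node-hours in total.
for nodes in (1, 10, 50):
    days = wall_clock_days(360_000, 20, nodes)
    print(f"{nodes} node(s): {days:.1f} days")
```

Under these assumptions a single node would need about two years, while ten nodes would finish in roughly 75 days.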
Manual assessment 1. None of the detected files are false positives (files that do not actually have bad sound quality)
Actual evaluations links to actual evaluations of this Issue/Scenario
Labels characterisation, lsdr, qa, issue