



IS20 Detect audio files with very bad sound quality
Detailed description In a collection of mp3 files (20 Tbytes - 360.000 files) we have discovered files with very bad sound quality. Before ingesting everything into our DOMS we would like to be able to discover the bad files and potentially get those re-digitized from the original analogue media.
Scalability Challenge
Large amounts of data (20 Tbytes - 360.000 files). We foresee that solving this will require both I/O-intensive work and heavy CPU use for analysing the actual content.
Issue champion Bjarne Andersen (SB)
Other interested parties
Possible Solution approaches
  • SB: Analyze the waveform to detect high-frequency noise. Test our commercial Dobbin Audio Analyser to see whether this system detects problems with the files
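As a rough illustration of the waveform idea above, the sketch below scores a signal by how much of its energy sits in sample-to-sample changes, which is a crude proxy for high-frequency content. It is not the method used by SB or by Dobbin Audio Analyser. It assumes the mp3 has already been decoded to a list of PCM samples by some external tool; the function name `high_freq_ratio`, the test tones, and any threshold chosen from its output are hypothetical.

```python
import math

def high_freq_ratio(samples):
    """Energy of the first difference divided by the signal energy.

    Slowly varying (low-frequency) signals give values near 0;
    signals dominated by high frequencies give larger values.
    """
    energy = sum(s * s for s in samples)
    if energy == 0:
        return 0.0  # silence: nothing to measure
    diff_energy = sum((b - a) ** 2 for a, b in zip(samples, samples[1:]))
    return diff_energy / energy

def sine(freq_hz, rate_hz=44100, n=4410):
    """Synthetic test tone standing in for decoded audio (assumption)."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

low = high_freq_ratio(sine(440))      # low-pitched tone
high = high_freq_ratio(sine(15000))   # high-pitched tone
print(f"440 Hz tone: {low:.4f}, 15 kHz tone: {high:.4f}")
```

A real detector would need a threshold calibrated against files already known to be good or bad, and probably a proper spectral analysis rather than this single number.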
Lessons Learned  
Training Needs  
Datasets mp3 (128kbit) with Danish Radio broadcasts


Objectives This issue is primarily about
  • scalability because we need to process 150.000 files
  • reliability and precision since we need to detect files with very bad sound - files that we potentially will remove from the repository
Success criteria When this issue is solved we will be able to scan through all files in our repository and automatically detect files with bad quality.
Automatic measures Speed is not really important for this issue since this is a one-time-only job. It cannot take years, of course.
1. Process 20 files per hour per node
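To see what the 20 files per hour per node target means in wall-clock time, the following back-of-the-envelope sketch computes the run time for both collection sizes mentioned on this page (150.000 and 360.000 files). The node counts are hypothetical; the page does not say how many nodes are available.

```python
def days_to_finish(total_files, files_per_hour_per_node, nodes):
    """Wall-clock days needed if every node sustains the target rate."""
    hours = total_files / (files_per_hour_per_node * nodes)
    return hours / 24

# Collection sizes taken from this page; node counts are assumptions.
for total in (150_000, 360_000):
    for nodes in (1, 10):
        days = days_to_finish(total, 20, nodes)
        print(f"{total} files on {nodes} node(s): {days:.1f} days")
```

On a single node the larger figure works out to about two years, so the target only keeps the job well under a year if several nodes run in parallel.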
Manual assessment 1. None of the detected files are false positives (files that do not actually have bad sound quality)
Actual evaluations links to actual evaluations of this Issue/Scenario
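The manual criterion above amounts to requiring a precision of 1.0 on the detected files. A minimal sketch of that check, assuming a reviewer has listened to the flagged files and listed the ones that really are bad; the function name and the file names are hypothetical:

```python
def precision(flagged, truly_bad):
    """Fraction of flagged files that a reviewer confirmed as bad."""
    flagged = set(flagged)
    if not flagged:
        return None  # nothing was flagged, precision is undefined
    return len(flagged & set(truly_bad)) / len(flagged)

# Hypothetical review result: one of the two flagged files was fine.
flagged = ["a.mp3", "b.mp3"]
confirmed_bad = ["a.mp3"]
print(precision(flagged, confirmed_bad))  # the criterion requires 1.0
```

Note that this measures only false positives; it says nothing about bad files the detector missed.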


  1. Oct 11, 2012

    This scenario needs some work.

    It's not clear whether the collection is 360.000 files or 150.000. The numbers conflict.

    The length of the audio files should also be described.

    In some places it says that scalability is an issue to be solved, but a few lines after it says "Speed is not really important for this issue since this is a one time only job. Can't take years off cause.".

  2. Nov 02, 2012

    I could not find a link to the product - Dobbin Audio Analyser. I couldn't even find the company's website.

    Can someone provide a link?