

IS24 Characterisation of large amounts of wav audio
Detailed description SB holds large amounts of WAV audio (200 TB+) in different resolutions (ranging from 22 kHz 16 bit to 96 kHz 24 bit). Different resolutions have been chosen over the years for different reasons (equipment, budgets for storage space, quality of original media in digitisation). Before we ingest all these older collections into our new DOMS we need to run a simple characterisation of the files to ensure that we generate correct technical metadata (in PREMIS format) for them. We know that certain collections that claim to hold only e.g. 48 kHz 16 bit files contain files in other resolutions, most likely as a result of mis-operation of the digitisation equipment.
Scalability Challenge
Large amounts of data (200 TB+). Simple characterisation requires little CPU but a lot of I/O.
Some of the files are rather large (8 GB), which could be a problem for some characterisation tools (but not for tools that only read header information and magic bytes).
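To illustrate why header-only tools are largely unaffected by file size, here is a minimal sketch in Python that reads only the chunk headers and the `fmt ` chunk, seeking past everything else instead of reading it. The function name `read_wav_format` is hypothetical, and the sketch assumes a conventional little-endian RIFF/WAVE layout in which the `fmt ` chunk precedes the audio data. Accepting the `RF64` magic for files above 4 GB is likewise an assumption that has not been tested against the SB collections.

```python
import struct


def read_wav_format(path):
    """Return (channels, sample_rate_hz, bits_per_sample) from the 'fmt ' chunk.

    Only chunk headers and the 16 fixed bytes of 'fmt ' are read, so the
    cost does not grow with the size of the audio data.
    """
    with open(path, "rb") as f:
        head = f.read(12)
        if len(head) < 12:
            raise ValueError("file too short to be a WAV file")
        magic, _size, form = struct.unpack("<4sI4s", head)
        # Assumption: RF64 files share the same chunk layout up to 'fmt '.
        if magic not in (b"RIFF", b"RF64") or form != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                fmt = f.read(16)
                if len(fmt) < 16:
                    raise ValueError("truncated 'fmt ' chunk")
                _tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                    "<HHIIHH", fmt
                )
                return channels, rate, bits
            # Skip the chunk body; chunks are padded to an even length.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

The returned tuple could then be compared with the resolution a collection claims to hold, e.g. `read_wav_format(path) == (2, 48000, 16)`, to flag files that do not match.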
Issue champion Gry Elstrøm (SB)
Other interested parties
Possible Solution approaches Should be simple.
1. Find appropriate characterisation tool that supports wav in a suitable manner
    * XC*L framework seems to have good wav support - evaluate and test.
2. Ensure this tool runs within the SCAPE platform
Lessons Learned
Training Needs
Datasets WAV with Danish Radio broadcasts, ripped audio CD’s, and SB in-house audio digitization
Solutions SO06 Use Ffprobe to characterise audio+video

SO25 Rosetta v3.0 Implementation Integrated with DROID 6, JHOVE1, NLNZ tool and more...
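As a rough illustration of what an ffprobe-based characterisation step might look like, the Python sketch below runs `ffprobe` with JSON output and picks out the properties of the first audio stream. It assumes `ffprobe` is available on the PATH. The function names `run_ffprobe` and `summarise_audio` are hypothetical, and this is not the SB or Rosetta implementation.

```python
import json
import subprocess


def run_ffprobe(path):
    """Run ffprobe on one file and return its JSON report as a dict."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarise_audio(report):
    """Return basic properties of the first audio stream, or None if absent."""
    for stream in report.get("streams", []):
        if stream.get("codec_type") == "audio":
            return {
                "codec": stream.get("codec_name"),
                # ffprobe reports the sample rate as a string.
                "sample_rate_hz": int(stream["sample_rate"]),
                "channels": stream.get("channels"),
                "bits_per_sample": stream.get("bits_per_sample"),
            }
    return None
```

A call such as `summarise_audio(run_ffprobe("some_file.wav"))` (the file name is a placeholder) would give a small dictionary that could feed the technical metadata. Mapping it to PREMIS is left out of this sketch.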


Objectives This is about scalability and functionality
Success criteria We will have a workflow that can process WAV (and BWF) files, including larger files of up to 10 GB
Automatic measures 1. Support for both WAV and BWF
2. Support for larger files, up to 10 GB
3. Process 2 TB of sample content in less than 24 hours
4. 100% of the files are identified correctly
5. 100% of the files get useful and correct characterisation output
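To put measure 3 in perspective, this back-of-the-envelope Python sketch computes the sustained read rate needed if every byte of the sample had to be read. It assumes decimal units (1 TB = 10^12 bytes), which the measures above do not specify, and the function name is hypothetical.

```python
def required_throughput_mb_per_s(total_bytes, hours):
    """Sustained read rate in MB/s needed to read total_bytes within hours."""
    return total_bytes / (hours * 3600) / 1e6


# 2 TB of sample content within 24 hours.
rate = required_throughput_mb_per_s(2e12, 24)
print(f"{rate:.1f} MB/s")  # prints "23.1 MB/s"

# At that same rate, the full 200 TB would take about this many days.
days = 200e12 / (rate * 1e6) / 86400
print(f"{days:.0f} days")  # prints "100 days"
```

A tool that reads only headers touches a small fraction of each file, so in that case the time needed would probably depend more on the number of files than on their total size.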
Manual assessment 1. Sample checking of the generated characterisation output
Actual evaluations links to actual evaluations of this Issue/Scenario


Title SO06 Use Ffprobe to characterise audio/video
Detailed description A detailed description of the Solution. Feel free to include links to further information (eg. OPF blog posts!). Note that a Solution is a specific digital preservation application of a software tool or tools. It might for example be a scripted tool, or a myExperiment workflow.
Solution Champion
Asger Askov Blekinge
Corresponding Issue(s)
IS24 Characterisation of large amounts of wav audio
IS22 Characterise and Validate very large mpeg-1 and mpeg-2 files
myExperiment Link
A link to a corresponding workflow on myExperiment
Tool Registry Link
Any notes or links on how the solution performed. This will be developed and formalised by the Testbed SP.
  1. Jan 09, 2012

    This Scenario will be implemented and tested by Exlibris Rosetta.

    1. Jan 12, 2012

      Hi Nir

      Could you maybe tell us a bit more? Which tools do you plan to use? Will you write a solution page? Which work package are you working in? SB is also planning an in-house ffprobe characterisation of these collections. I however do not expect the SCAPE platform to be up and running, so the characterisation will be on the existing SB platform. I hope we can use this as a base example in the LSDR testbed work package, and then we can hopefully improve on it.


      1. Jan 13, 2012

        Hi Bolette,

        We are working on LSDR TB WP. Yes - we will write a solution page which will describe the solution by using FFPROBE integrated with Rosetta. We have experience of such integrations with other tools such as MXF Extractor or MediaInfo. As part of SCAPE we will install Rosetta to be used exclusively by the members and you (or anyone else from the SB) can use it for testing etc.

        We can elaborate more at the Braga meetings.


  2. Oct 11, 2012

    This scenario is still missing a solution page.

    What do the following points mean?

    - 100% of the files are identified correctly 
    - 100% of the files get useful and correct characterisation output

    Do they mean that you have a 0 tolerance for CC errors?