compared with
Current by Niels Bjarke Reimer
on Jul 16, 2013 07:18.

SB currently owns a small collection of Real Audio files (digitised CDs). They are part of the Danish publications that SB preserves. The rest of the Danish CD collection is in WAV. WAV has been chosen as the preservation format because it is a raw format, which needs fewer layers of interpretation to be understood by humans, and because it is a robust format. \\
\\
The Danish Radio Broadcast mp3 files from the [mp3 (128kbit) with Danish Radio broadcasts|SP:mp3 (128kbit) with Danish Radio broadcasts, mp3] dataset are also to be migrated to WAV according to policy. The actual migration will be done using FFmpeg, which is one of the tools recommended by the SCAPE Action Services. The QA will be split into a number of steps. The first step validates that the migrated file is a correct file in the wanted format. This is done using JHOVE2 to analyse the file and produce a JHOVE2 property XML file, and then using a Beanshell in Taverna to check that the JHOVE2 feature “isValid” is true. The second step compares the header properties of the original and the migrated files to see if they are ‘close enough’. This is done using FFprobe to extract the header information and Taverna Beanshells to compare the extracted properties. Another step could be to extract more properties by ‘playing’ the two files. \\
\\
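The header comparison in the second step can be sketched outside Taverna as follows. This is only an illustration, not the Beanshell code used in the workflow: the function names `ffprobe_properties` and `close_enough`, the choice of properties (sample rate, channel count, duration) and the 0.1 second duration tolerance are all assumptions made for the example.

```python
import json
import subprocess


def ffprobe_properties(path):
    """Run ffprobe and return a few header properties of the first audio stream."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_format", "-show_streams",
        "-of", "json", path,
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    info = json.loads(out)
    audio = next(s for s in info["streams"] if s.get("codec_type") == "audio")
    return {
        "sample_rate": int(audio["sample_rate"]),
        "channels": int(audio["channels"]),
        "duration": float(info["format"]["duration"]),
    }


def close_enough(original, migrated, duration_tolerance=0.1):
    """Return the names of properties that differ; an empty list means 'close enough'.

    The tolerance (in seconds) is an assumed value, not one taken from the workflow.
    """
    mismatches = []
    for key in ("sample_rate", "channels"):
        if original[key] != migrated[key]:
            mismatches.append(key)
    if abs(original["duration"] - migrated["duration"]) > duration_tolerance:
        mismatches.append("duration")
    return mismatches
```

With FFprobe installed, `close_enough(ffprobe_properties("in.mp3"), ffprobe_properties("out.wav"))` would return an empty list when the two headers agree within the chosen tolerance.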
The third step uses an analysis tool that compares the sound waves. To do this we have to ‘play’ or interpret the mp3 files, just as a human needs to ‘play’ or interpret the files to hear the sound. A human cannot look at fileA and tell whether it is correct or corrupted. We therefore choose a player P and define ‘fileA played on player P’ to be correct. A small, randomly chosen subset of files will be played on player P and checked by human ears, to make this definition plausible. The player used in this workflow is MPG321. Note that MPG321 is an independent implementation of an mp3 decoder and is thus independent of FFmpeg, which is used to actually migrate the file. The result of playing fileA on player P (when no one is listening) is a WAV file. The migrated file is already a WAV file, and we can compare the two files using the analysis tool xcorrSound/migrationQA, see [SP:SO2 xcorrSound QA audio comparison tool]. \\
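The idea of comparing two decoded waveforms can be illustrated with a small block-wise cross-correlation check. This is a simplified sketch and not the algorithm implemented in xcorrSound/migrationQA: it only looks at zero lag (no search for an offset between the two decoders' outputs), and the function names, block size and threshold are assumptions chosen for the example.

```python
import math


def normalized_correlation(a, b):
    """Normalised cross-correlation at zero lag of two equal-length sample blocks."""
    energy_a = sum(x * x for x in a)
    energy_b = sum(y * y for y in b)
    if energy_a == 0 or energy_b == 0:
        # Two silent blocks match; silence against signal does not.
        return 1.0 if energy_a == energy_b else 0.0
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(energy_a * energy_b)


def waveforms_match(reference, migrated, block_size=4096, threshold=0.98):
    """Compare two sample sequences block by block over their common length.

    Every block must correlate at or above the (assumed) threshold.
    """
    n = min(len(reference), len(migrated))
    if n == 0:
        return False
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        if normalized_correlation(reference[start:end], migrated[start:end]) < threshold:
            return False
    return True
```

Here `reference` would hold the samples produced by the player P and `migrated` the samples of the FFmpeg output; a block that was silenced or corrupted in one of the files makes the check fail.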
| *Evaluation* \\ | This Migration and QA solution has been developed as a Taverna workflow using web services. This puts the focus on availability rather than scalability. The sparse tests run in February 2012 have also been run through Taverna. TODO IN DIRE NEED OF AN UPDATE\!\!\! \\
\\
We tested the _Mp3 to Wav Migrate Validate Compare Workflow_ on the file P1_1000_1200_890106_001.mp3 from the [mp3 (128kbit) with Danish Radio broadcasts|SP:mp3 (128kbit) with Danish Radio broadcasts, mp3] testbed dataset. The file is 112 MB and the duration is 2 hours, \\
2 minutes and 5.23 seconds. The workflow was run from Taverna on a local workstation using the web services deployed on the SB iapetus test machine. The total time for the workflow is 2.3 minutes, and the most expensive nested workflow is the \\
JHove2Validate workflow with 1.3 minutes, closely followed by the FFmpegMigrate workflow with 59.2 seconds. The FFprobeExtractCompare workflow, with 12.9 seconds, seems to be the cheapest QA step in this set-up. The result of the workflow is a migrated \\
\\
We note that we are working on 2-hour sound files. The average file size of the original mp3 files is only 118 MB, but the migrated WAV files are approximately 1.4 GB. This means we can probably not hope to improve much on the performance of the \\
actual FFmpeg migration of the individual files. The [mp3 (128kbit) with Danish Radio broadcasts|SP:mp3 (128kbit) with Danish Radio broadcasts, mp3] collection is 20 TB and around 150000 files. This means that running the basic workflow migrations sequentially on the test machine would take around 300 \\
days. We can, however, hope to improve on this by using the SCAPE execution platform instead of doing the migrations sequentially. \\
\\