Title |
IS45 Audio and Video Recordings have unreliable broadcast time information |
Detailed description | The Danish State and University Library (SB) holds large collections of radio and TV broadcasts. The WAV (22.05 kHz, 16 bit) Danish radio broadcast files in the testbed run from approximately 20 minutes to 10.5 hours, so some recordings cover a number of shows. The MPEG-2 video files with Danish TV broadcasts in the testbed dataset run from approximately 20 minutes to 17 hours and contain a number of shows. The MPEG-1 video files with Danish TV broadcasts in the testbed dataset run from approximately 10 minutes to 16 hours, again containing a number of shows. The metadata for the files consists of the radio or TV channel ID, start time and end time (part of the file names). The SB also holds the program listings in a separate collection. However, the recordings usually start 'a few minutes early' (just before the top of the hour) and end 'a few minutes late', e.g. from 2 minutes to 9 am until 3 minutes past 10 am. In addition, the programs do not always start precisely at the announced time. It would be useful to link the program listings to exact timestamps in the audio and video files, as this would make it possible to cut out single programs automatically on request. (Note that the MPEG-2 transport stream recordings of Danish TV broadcasts are one-hour recordings. These also contain metadata on the shows being broadcast.) |
Scalability Challenge |
The combined size of the collections in question is 630 TB. |
Issue champion | Bolette Jurik |
Other interested parties |
|
Possible Solution approaches | Most of the programs start with a jingle or some sort of recognizable intro. If we can search for this in the audio and video files, we would be able to find the exact start times of different shows. |
Context | Details of the institutional context to the Issue. (May be expanded at a later date) |
Lessons Learned | Notes on Lessons Learned from tackling this Issue that might be useful to inform the development of Future Additional Best Practices, Task 8 (SCAPE TU.WP.1 Dissemination and Promotion of Best Practices) |
Training Needs | Is there a need for providing training for the Solution(s) associated with this Issue? Notes added here will provide guidance to the SCAPE TU.WP.3 Sustainability WP. |
Datasets | |
Solutions | SO36 Perform scalable search for small sound chunks in large audio archive; SO2 xcorrSound QA audio comparison tool
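The jingle search described under "Possible Solution approaches" could be sketched roughly as follows. This is a minimal illustration only, assuming the recording and the jingle are already decoded into mono NumPy sample arrays at the same sample rate; it is not the implementation used by xcorrSound or SO36. The function names `find_jingle_offset` and `to_broadcast_time` are hypothetical, and the recording start time is assumed to have been parsed from the file name beforehand.

```python
from datetime import datetime, timedelta

import numpy as np


def find_jingle_offset(recording, jingle, sample_rate):
    """Locate `jingle` inside `recording` by FFT-based cross-correlation.

    Returns (offset_seconds, score), where score is the normalised
    correlation (at most 1.0) of the best-matching window.
    """
    rec = np.asarray(recording, dtype=np.float64)
    jin = np.asarray(jingle, dtype=np.float64)
    rec = rec - rec.mean()
    jin = jin - jin.mean()

    # Zero-pad to a power of two so circular correlation equals linear correlation.
    n = len(rec) + len(jin) - 1
    nfft = 1 << (n - 1).bit_length()

    # corr[k] = sum_t rec[t + k] * jin[t]; keep only lags where the jingle fits fully.
    spectrum = np.fft.rfft(rec, nfft) * np.conj(np.fft.rfft(jin, nfft))
    corr = np.fft.irfft(spectrum, nfft)[: len(rec) - len(jin) + 1]

    # Simplification: the peak is picked on the unnormalised correlation,
    # which can favour loud passages; only the winning window is normalised.
    best = int(np.argmax(corr))
    window = rec[best : best + len(jin)]
    denom = np.linalg.norm(window) * np.linalg.norm(jin)
    score = float(corr[best] / denom) if denom > 0 else 0.0
    return best / sample_rate, score


def to_broadcast_time(recording_start, offset_seconds):
    """Convert an offset inside a recording into a wall-clock timestamp."""
    return recording_start + timedelta(seconds=offset_seconds)
```

In practice the search would probably be restricted to a window of a few minutes around the announced start time from the program listings, both to reduce false matches and to keep the cost manageable across a collection of this size. A low score would indicate that no convincing match was found and that the listing time should not be overridden.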
Evaluation
Objectives | Which SCAPE objectives do this issue and a future solution relate to? e.g. scalability, robustness, reliability, coverage, preciseness, automation |
Success criteria | Describe the success criteria for solving this issue - what are you able to do? - what does the world look like? |
Automatic measures | What automated measures would you like the solution to give to evaluate the solution for this specific issue? Which measures are important? If possible, specify very specific measures and your goal - e.g. * process 50 documents per second * handle 80 GB files without crashing * identify 99.5% of the content correctly |
Manual assessment | Apart from the automated measures that you would like to get, do you foresee any necessary manual assessment to evaluate the solution of this issue? If possible, specify measures and your goal - e.g. * Solution installable with basic Linux system administration skills * User interface understandable by non-developer curators |
Actual evaluations | Links to actual evaluations of this Issue/Scenario |