Title |
IS3 Large media files are difficult to characterise without mass processing, and preservation risks cannot be identified in uncharacterised files |
Description | At SB, data received from broadcasters includes very large media files, such as MPEG-2 transport streams (MPEG2-TS). An end user agreement allows this data to be streamed, but does not allow copies of the archived content to be distributed. SB captures broadcast television as complex MPEG2-TS, in which the video content is accompanied by metadata typically used to support the production of TV guides. SB keeps the MPEG2-TS files as the preservation masters. Chunks of this data that relate to specific programmes are extracted, migrated and served to users as streaming Flash video. The master MPEG2-TS files are so large that characterising them, and in particular extracting their metadata at scale, is a significant challenge; typical validation tools are also difficult to apply to files of this size. Deep characterisation, in this context, means that for container formats the contained streams (typically MPEG-2 or MPEG-4 (H.264) video and AAC audio) are also identified and characterised. A detailed characterisation of the MPEG2-TS is needed to identify the technical dependencies for extracting or rendering the embedded content. This would make it possible to monitor preservation risks related to current access services and to take action as necessary to ensure continued access and preservation. |
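Identifying the streams inside the container does not require decoding any video. The sketch below is a minimal Python illustration, not SB's actual tooling. It reads 188-byte transport stream packets sequentially, follows the Program Association Table (PAT, PID 0) to each Program Map Table (PMT), and maps each elementary stream's `stream_type` code, as assigned in ISO/IEC 13818-1, to a codec name. It makes several assumptions: plain 188-byte packets, PSI sections that fit in a single packet, and no CRC checking. The function names are hypothetical. Broadcast streams normally repeat the PAT and PMT at short intervals, so `max_packets` can limit the scan to the head of a huge file.

```python
from typing import BinaryIO, Dict, Iterator, Optional, Tuple

PACKET_SIZE = 188  # assumption: plain 188-byte TS packets (not 192-byte M2TS)
SYNC_BYTE = 0x47

# A subset of the stream_type codes assigned in ISO/IEC 13818-1
STREAM_TYPES = {
    0x01: "MPEG-1 video",
    0x02: "MPEG-2 video",
    0x03: "MPEG-1 audio",
    0x04: "MPEG-2 audio",
    0x0F: "AAC audio (ADTS)",
    0x11: "AAC audio (LATM)",
    0x1B: "H.264 video",
}


def packets(stream: BinaryIO) -> Iterator[bytes]:
    """Yield fixed-size TS packets, reading the file sequentially."""
    while True:
        pkt = stream.read(PACKET_SIZE)
        if len(pkt) < PACKET_SIZE:
            return  # end of file; a trailing partial packet is ignored
        if pkt[0] != SYNC_BYTE:
            raise ValueError("lost packet sync")
        yield pkt


def payload(pkt: bytes) -> Tuple[int, bool, bytes]:
    """Return (PID, payload_unit_start_indicator, payload bytes) of one packet."""
    pusi = bool(pkt[1] & 0x40)
    pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
    afc = (pkt[3] >> 4) & 0x03  # adaptation_field_control
    offset = 4
    if afc & 0x02:  # adaptation field present: skip it
        offset += 1 + pkt[4]
    if not afc & 0x01:  # this packet carries no payload
        return pid, pusi, b""
    return pid, pusi, pkt[offset:]


def section(data: bytes) -> bytes:
    """Extract one PSI section (without its CRC_32) from a unit-start payload.

    Assumes the whole section fits in this one packet.
    """
    start = 1 + data[0]  # skip the pointer_field
    length = ((data[start + 1] & 0x0F) << 8) | data[start + 2]
    return data[start:start + 3 + length - 4]


def identify_streams(stream: BinaryIO, max_packets: Optional[int] = None) -> Dict[int, str]:
    """Map elementary-stream PID -> codec name, using the PAT and PMTs."""
    pmt_pids = set()
    found: Dict[int, str] = {}
    for n, pkt in enumerate(packets(stream)):
        if max_packets is not None and n >= max_packets:
            break
        pid, pusi, data = payload(pkt)
        if not pusi or not data:
            continue  # PSI sections start only in unit-start packets
        if pid == 0x0000:  # PAT: list of programs and their PMT PIDs
            sec = section(data)
            if sec[0] != 0x00:
                continue
            for i in range(8, len(sec), 4):
                program = (sec[i] << 8) | sec[i + 1]
                if program != 0:  # program 0 points to the network PID
                    pmt_pids.add(((sec[i + 2] & 0x1F) << 8) | sec[i + 3])
        elif pid in pmt_pids:  # PMT: the elementary streams of one program
            sec = section(data)
            if sec[0] != 0x02:
                continue
            info_len = ((sec[10] & 0x0F) << 8) | sec[11]
            i = 12 + info_len  # skip program-level descriptors
            while i + 5 <= len(sec):
                stype = sec[i]
                es_pid = ((sec[i + 1] & 0x1F) << 8) | sec[i + 2]
                es_len = ((sec[i + 3] & 0x0F) << 8) | sec[i + 4]
                found[es_pid] = STREAM_TYPES.get(stype, f"unknown (0x{stype:02X})")
                i += 5 + es_len  # skip this stream's descriptors
    return found
```

For a hypothetical file `capture.ts`, calling `identify_streams(f, max_packets=100_000)` on `open("capture.ts", "rb")` would return a mapping from elementary-stream PID to codec name. A production tool would also reassemble sections that span several packets and verify CRC_32. It would then go on to characterise each stream (resolution, frame rate, sample rate and so on). This sketch does none of that.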
Scalability Challenge |
Extremely large files. Checksumming the collection currently takes around 3 months on existing hardware. |
Issue champion | Blekinge, Asger Askov |
Other interested parties |
|
Possible approaches |
|
Context | |
Lessons Learned | Notes on Lessons Learned from tackling this Issue that might be useful to inform the development of Future Additional Best Practices, Task 8 (SCAPE TU.WP.1 Dissemination and Promotion of Best Practices) |
Training Needs | Is there a need for providing training for the Solution(s) associated with this Issue? Notes added here will provide guidance to the SCAPE TU.WP.3 Sustainability WP. |
Datasets | MPEG-2 transport streams containing Danish TV broadcasts |
Solutions |
Evaluation
Objectives | scalability, coverage, precision, automation |
Success criteria | Being able to extract all the provided metadata |
Automatic measures | Being able to process streams faster than their defined bitrate (i.e. keeping up with real time rather than falling behind it) |
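The automatic measure above can be expressed as a real-time factor. This is the play-out duration of a stream (its size in bits divided by its bitrate) divided by the wall-clock time a tool needs to process it; a factor above 1 means the tool keeps up with the stream. Below is a minimal Python sketch, assuming a constant nominal bitrate per file. The names `realtime_factor` and `measure` are hypothetical helpers, and `tool` stands for whatever characterisation tool is being evaluated.

```python
import os
import time
from typing import Callable


def realtime_factor(size_bytes: int, bitrate_bps: float, elapsed_s: float) -> float:
    """Play-out duration of the stream divided by the time taken to process it.

    Assumes a constant nominal bitrate. A result above 1.0 means the stream
    was processed faster than its bitrate, i.e. faster than real time.
    """
    if bitrate_bps <= 0 or elapsed_s <= 0:
        raise ValueError("bitrate and elapsed time must be positive")
    playout_s = size_bytes * 8 / bitrate_bps
    return playout_s / elapsed_s


def measure(path: str, bitrate_bps: float, tool: Callable[[str], object]) -> float:
    """Time a (hypothetical) characterisation tool on one file."""
    start = time.perf_counter()
    tool(path)
    elapsed = time.perf_counter() - start
    return realtime_factor(os.path.getsize(path), bitrate_bps, elapsed)
```

For example, consider one hour of a stream at an illustrative 8 Mbit/s. That is 3,600,000,000 bytes. If a tool processes it in 600 seconds, `realtime_factor(3_600_000_000, 8_000_000, 600)` gives `6.0`, so the tool runs six times faster than real time.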
Manual assessment | Which of the above-mentioned metadata sources we can extract |
Actual evaluations | Links to actual evaluations of this Issue/Scenario |