||IS3 Large media files are difficult to characterise without mass processing + We cannot identify preservation risks in uncharacterised files|
|Description|| At SB, data from broadcasters includes very large media files, for example MPEG2 transport streams (MPEG2-TS). An end user agreement allows this data to be streamed, but does not allow copies of the archived content to be distributed. SB captures broadcast television as complex MPEG2-TS. The video content is accompanied by metadata, typically used to support the production of TV guides. SB preserves the MPEG2-TS as the preservation masters. Chunks of this data that relate to specific programmes are extracted, migrated and served to users as streaming Flash video. The master MPEG2-TS files are so large that characterisation is a significant challenge.
The difficulty lies in extracting metadata from these huge media files at large scale. Deep characterisation, in this context, means that for container formats the contained streams (typically MPEG-2 or MPEG-4 (H.264) video and AAC audio) are also identified and characterised.
It is difficult to apply typical validation tools to such large files. A detailed characterisation of the MPEG2-TS is needed in order to identify technical dependencies for extracting from or rendering the embedded content in the MPEG2-TS. This would enable preservation risks related to current access services to be monitored and action taken as necessary to ensure continued access and preservation.
| Scalability Challenge
||Extremely large files. Checksumming the collection currently takes around 3 months on existing hardware.|
|Issue champion||Blekinge, Asger Askov (SB)|
| Other interested parties
|Lessons Learned|| Notes on Lessons Learned from tackling this Issue that might be useful to inform the development of Future Additional Best Practices, Task 8 (SCAPE TU.WP.1 Dissemination and Promotion of Best Practices)
|Training Needs|| Is there a need for providing training for the Solution(s) associated with this Issue? Notes added here will provide guidance to the SCAPE TU.WP.3 Sustainability WP.
|Datasets|| MPEG-2 transport streams with Danish TV broadcasts
|Objectives||scalability, coverage, preciseness, automation|
|Success criteria|| Being able to extract all the provided metadata
|Automatic measures||Being able to process streams faster than their defined bitrate (i.e. not lose the race against time)|
|Manual assessment|| Which of the above-mentioned metadata sources we can extract
|Actual evaluations||links to actual evaluations of this Issue/Scenario|
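
The automatic measure above (processing a stream faster than its defined bitrate) and the checksumming cost noted under the scalability challenge can be illustrated together with a minimal Python sketch. It is not SB's tooling. It assumes packet-aligned 188-byte MPEG2-TS input read from a regular file, uses SHA-256 as a stand-in for whatever checksum the archive actually uses, and only counts packets per PID rather than performing deep characterisation of the contained streams. The names `scan_transport_stream`, `realtime_factor` and `make_packet` are hypothetical helpers introduced for this example.

```python
import hashlib
import io
from collections import Counter

TS_PACKET_SIZE = 188  # bytes per MPEG-2 transport stream packet
SYNC_BYTE = 0x47      # first byte of every valid TS packet


def scan_transport_stream(stream, packets_per_read=4096):
    """Read the stream once, computing a checksum and per-PID packet counts.

    Assumes the input starts on a packet boundary and that read() returns
    full chunks until end of file (true for regular files and BytesIO).
    """
    pid_counts = Counter()
    digest = hashlib.sha256()  # assumption: the real checksum type may differ
    sync_errors = 0
    total_bytes = 0
    while True:
        chunk = stream.read(TS_PACKET_SIZE * packets_per_read)
        if not chunk:
            break
        digest.update(chunk)
        total_bytes += len(chunk)
        # Walk whole packets only; a trailing partial packet is checksummed
        # but not counted.
        for offset in range(0, len(chunk) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
            if chunk[offset] != SYNC_BYTE:
                sync_errors += 1
                continue
            # The PID is the low 5 bits of byte 1 followed by all of byte 2.
            pid = ((chunk[offset + 1] & 0x1F) << 8) | chunk[offset + 2]
            pid_counts[pid] += 1
    return {
        "pid_counts": pid_counts,
        "sync_errors": sync_errors,
        "total_bytes": total_bytes,
        "sha256": digest.hexdigest(),
    }


def realtime_factor(total_bytes, bitrate_bps, elapsed_seconds):
    """Return stream duration divided by processing time.

    A value above 1.0 means the stream was processed faster than its bitrate.
    """
    if bitrate_bps <= 0 or elapsed_seconds <= 0:
        raise ValueError("bitrate and elapsed time must be positive")
    stream_seconds = total_bytes * 8 / bitrate_bps
    return stream_seconds / elapsed_seconds


def make_packet(pid, sync=SYNC_BYTE):
    """Build a synthetic TS packet with an empty payload (illustration only)."""
    header = bytes([sync, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))


# Synthetic input: three packets on PID 0x100, two on PID 0x101, one corrupt.
sample = (make_packet(0x100) * 3 + make_packet(0x101) * 2
          + make_packet(0x100, sync=0x00))
report = scan_transport_stream(io.BytesIO(sample))
```

On a real file, the scan would be timed (for example with `time.perf_counter()`) and the elapsed time passed to `realtime_factor` together with the stream's defined bitrate. Combining the checksum and the packet scan in one pass is a design choice made here so that each large file is read from disk only once.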