This scenario focuses on the identification of preservation risks in order to ensure render/extraction services can continue to be supported.




IS3 Large media files are difficult to characterise without mass processing + We cannot identify preservation risks in uncharacterised files
Description At SB, data from broadcasters contains huge media files, for example MPEG2 transport streams (MPEG2-TS). The end user agreement allows this data to be streamed only; copies of the archived content may not be distributed. SB captures broadcast television as complex MPEG2-TS. The video content is accompanied by metadata, typically used to support the production of TV guides. SB preserves the MPEG2-TS files as the preservation masters. Chunks of this data that relate to specific programmes are extracted, migrated and served to users as streaming Flash video. The master MPEG2-TS files are so large that characterisation is a significant challenge.

The difficulty lies in pulling out metadata from these huge media files at large scale. Deep characterisation, in this context, means that for container formats the contained streams (typically MPEG-2 or MPEG-4 (H.264) video and AAC audio) are also identified and characterised.

It is difficult to apply typical validation tools to such large files. A detailed characterisation of the MPEG2-TS is needed in order to identify technical dependencies for extracting from or rendering the embedded content in the MPEG2-TS. This would enable preservation risks related to current access services to be monitored and action taken as necessary to ensure continued access and preservation.
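As a rough illustration of what deep characterisation of a container could look like, the sketch below asks FFprobe for a JSON report and reduces it to one entry per contained stream. It assumes that `ffprobe` is installed and on the PATH; the function names `probe` and `summarise_streams` are hypothetical and not part of any SCAPE tool.

```python
import json
import subprocess


def probe(path):
    """Run ffprobe on a file and return its JSON report as a dict.

    Assumption: the ffprobe executable is available on the PATH.
    """
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarise_streams(report):
    """Reduce a report to (index, codec_type, codec_name) per contained stream."""
    return [
        (s.get("index"), s.get("codec_type"), s.get("codec_name"))
        for s in report.get("streams", [])
    ]
```

A call such as `summarise_streams(probe("broadcast.ts"))` (the file name is a placeholder) would list the video, audio and data streams found in the transport stream, which is the starting point for spotting technical dependencies.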

Scalability Challenge
Extremely large files. Checksumming the collection currently takes around 3 months on existing hardware.
Issue champion Blekinge, Asger Askov (SB)
Other interested parties

Possible approaches
  • ALL
    • A deep characterisation service is required for MPEG2-TS. Analysis of the characterisation results would facilitate risk identification.
  • EXL
    • We're not sure how this scenario fits in. The TB work package is meant to design test cases to test the work that the platform team (WP PT) does. The platform team should find an appropriate test scenario to test whatever MD extractor is developed or found which can work on large files.
    • Checksums can be computed on chunks of a predefined size, and the work can therefore be map/reduced. The final checksum can then be a checksum over the partial checksums. The checksum algorithm to be used needs to be specified. The problem is not clearly stated: does JHOVE crash? What is the maximum file size JHOVE can handle?
    • Watch may contribute to the solutions with the triggers:
      • Monitor characterization tools
      • Monitor new versions of rendering software
      • Monitor rendering software features or supported characteristics
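The chunked checksum idea from the list above can be sketched as follows. This is a minimal single-machine sketch, not a map/reduce job: the "map" step hashes each fixed-size chunk independently and the "reduce" step hashes the ordered partial digests. SHA-256, the 64 MiB chunk size and the names `chunk_digest` and `chunked_checksum` are assumptions made for illustration, since the algorithm is left unspecified above. Note that the result is not equal to a plain checksum of the whole file, so the chunk size and algorithm would have to be recorded alongside it.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size; must be fixed and recorded


def chunk_digest(path, offset, size):
    """Map step: SHA-256 digest of one chunk, read independently of the others."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.sha256(f.read(size)).digest()


def chunked_checksum(path, chunk_size=CHUNK_SIZE, workers=4):
    """Reduce step: checksum over the partial checksums, in chunk order."""
    offsets = range(0, os.path.getsize(path), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the offsets
        partials = list(
            pool.map(lambda off: chunk_digest(path, off, chunk_size), offsets)
        )
    final = hashlib.sha256()
    for digest in partials:
        final.update(digest)
    return final.hexdigest()
```

Because each chunk is hashed on its own, the map step could be distributed across nodes, with only the small partial digests sent to the reducer.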
Lessons Learned Notes on Lessons Learned from tackling this Issue that might be useful to inform the development of Future Additional Best Practices, Task 8 (SCAPE TU.WP.1 Dissemination and Promotion of Best Practices)
Training Needs Is there a need for providing training for the Solution(s) associated with this Issue? Notes added here will provide guidance to the SCAPE TU.WP.3 Sustainability WP.
Datasets mpeg-2 transport stream with Danish TV broadcasts


Objectives scalability, coverage, preciseness, automation
Success criteria Being able to extract all the provided metadata 
  • The technical metadata, which is used by the player machines to decode the stream
  • The program metadata that is used to display program and channel information
  • The subtitles, which to some extent are a full-text dump of the program content
  • TextTV information
    With this metadata extracted, search engine integration could be very powerful. With the ability to extract this metadata, the transport stream could be used as a self-describing object.
Automatic measures Being able to process streams faster than their defined bitrate (i.e. not lose the race against time)
Manual assessment Which of the above-mentioned metadata sources we can extract
Actual evaluations Links to actual evaluations of this Issue/Scenario
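The automatic measure above (processing streams faster than their defined bitrate) could be checked with a simple ratio. The sketch below is one possible formulation; the function names `realtime_factor` and `keeps_up` are hypothetical, and the stream bitrate is assumed to be given in bits per second.

```python
def realtime_factor(bytes_processed, elapsed_seconds, stream_bitrate_bps):
    """Processing throughput divided by the stream bitrate.

    A value above 1.0 means the tool processes data faster than the
    stream delivers it.
    """
    if elapsed_seconds <= 0 or stream_bitrate_bps <= 0:
        raise ValueError("elapsed time and bitrate must be positive")
    throughput_bps = bytes_processed * 8 / elapsed_seconds
    return throughput_bps / stream_bitrate_bps


def keeps_up(bytes_processed, elapsed_seconds, stream_bitrate_bps):
    """True if processing is faster than the stream's defined bitrate."""
    return realtime_factor(bytes_processed, elapsed_seconds, stream_bitrate_bps) > 1.0
```

For example, processing 10,000,000 bytes of a 10 Mbit/s stream in 4 seconds gives a factor of 2.0, i.e. twice real time.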


The solution to this scenario is to solve the following scenarios

LSDRT7 Characterise very large video files


LSDRT9 Characterisation of large amounts of wav audio


LSDRT16 Evaluate preservation risks from FFProbe and Manzanita Crosscheck characterisation information

  1. Oct 23, 2012

    This scenario is broken. No issues exist.

    1. Oct 24, 2012

      That was because IS3 had a name change. I have now updated the issue page include.