
Status

Active

This story is associated with a success story: http://wiki.opf-labs.org/display/SP/QA+and+Characterisation+of+Web+Content

Contact

Per Møldrup-Dalum, SB, pmd@statsbiblioteket.dk

William Palmer, BL (william (.) palmer (@) bl (.) uk)

User Story

As a Web Archive I need a Digital Preservation System that can process both ARC and WARC files, identify the file formats of the items they contain and characterise those items, so that I can assess preservation risks and plan which tools will be required for access to those formats.

User Requirements/Components

  1. A tool that can efficiently work through the content of an ARC file and identify the types of the files found.
    1. Must provide a report in a usable format; where the dataset is large, this could be a database of some sort.
    2. Ideally the tool will also perform file format identification on files within container formats: media streams, zips and other compressed files, etc.
  2. Lookup of an appropriate access tool or software for each identified format would be a bonus.
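
The requirements above can be illustrated with a minimal Python sketch. It is not the tool used in any of the experiments below. It assumes the payloads have already been extracted from the ARC/WARC records by some reader and are passed in as (name, bytes) pairs. The signature table, the function names (identify, characterise, build_report) and the SQLite table layout are all hypothetical and chosen only for illustration; a real tool would rely on a full format signature registry. The sketch identifies each item by its leading bytes, descends into zip containers, and writes one row per item to a database.

```python
import io
import sqlite3
import zipfile

# Hypothetical, deliberately tiny signature table (assumption for this sketch);
# a real tool would use a complete signature registry.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
]


def identify(payload):
    """Guess a MIME type from the leading bytes of the payload."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    return "application/octet-stream"  # Unknown to this small table.


def characterise(name, payload, depth=0, max_depth=3):
    """Yield (name, mime, size) for the payload and for members of zip containers."""
    mime = identify(payload)
    yield name, mime, len(payload)
    if mime == "application/zip" and depth < max_depth:
        try:
            with zipfile.ZipFile(io.BytesIO(payload)) as archive:
                for info in archive.infolist():
                    if info.is_dir():
                        continue
                    inner = archive.read(info)
                    inner_name = f"{name}!/{info.filename}"
                    yield from characterise(inner_name, inner, depth + 1, max_depth)
        except zipfile.BadZipFile:
            pass  # Signature matched but the container is damaged; keep the outer row.


def build_report(records, db_path=":memory:"):
    """Store one row per identified item in an SQLite database and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (name TEXT, mime TEXT, size INTEGER)"
    )
    for name, payload in records:
        conn.executemany(
            "INSERT INTO items VALUES (?, ?, ?)", characterise(name, payload)
        )
    conn.commit()
    return conn
```

With the rows in a database, a format profile for risk assessment could then be obtained with a query such as SELECT mime, COUNT(*), SUM(size) FROM items GROUP BY mime. Other container types (media streams, gzip, tar) would need their own unpacking step, which this sketch leaves out.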

Experiments

Create experiments as child pages and they should appear automatically here

  • Characterisation of Web Archive Content Using a 'Stack' of Tools * (SS)
    Data: ONB Web Archive
    Workflow: No.
    Issues: Ability to do risk analysis - do we have obsolete files? Rendering tools? Cost of rendering now?
  • Characterisation of Web Archive Content Using a 'Stack' of Tools * (PC)
    Data: BL Web Archive
    Workflow: No.
    Issues: Ability to do risk analysis - do we have obsolete files? Rendering tools? Cost of rendering now?
  • Characterisation of Web Archive Content Using a 'Stack' of Tools on Rosetta * (OK)
    Data: ONB Web Archive (Representative set 1.3TB)/SB Web Archive
    Workflow: Rosetta stack
    Issues: Getting the data to Rosetta or Rosetta to the data.
  • Characterisation of Web Archive Content using SB Tool * (PMD)
    Data: SB Web Archive
    Workflow: No.
    Issues:
  • Characterise 2012 Web Archive Data * (NBR)
    Data: SB Web Archive
    Workflow: Yes.
    Issues: Looking at presenting the data.

Developer Notes

Space for discussion, suggested solutions, links to other scenarios, etc.

Related Documents

Scenarios, case studies, etc. that provide background to this story.
