h2. Status
{tip:title=Active}
This story is associated with a success story: [http://wiki.opf-labs.org/display/SP/QA+and+Characterisation+of+Web+Content]
{tip}

h2. Contact

Per Møldrup-Dalum, SB, [email protected]

William Palmer, BL (william (.) palmer (@) bl (.) uk)

h2. User Story

As a Web Archive, I need a Digital Preservation System that can process both ARC and WARC files and identify the file formats of, and characterise, the items they contain, so that I can assess preservation risks and plan which tools will be required to access those formats.

h2. User Requirements/Components

# A tool that can efficiently work through the content of an ARC or WARC file and identify the types of the files found.
## Must provide a report in a usable format. For a large dataset, this could be a database of some sort.
## Ideally, the tool will also identify the formats of files held inside container formats: media streams, ZIPs and other compressed files, etc.
# Lookup of an appropriate access tool/software would be a bonus\!
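
The requirements above could be sketched roughly as follows. This is a minimal illustration, not a proposed implementation. It assumes that some ARC/WARC reader (not shown) has already yielded {{(name, payload)}} pairs. It identifies each payload by its leading bytes, descends into ZIP containers, and writes the report to an SQLite table. The signature table, the function names ({{identify}}, {{walk}}, {{build_report}}) and the {{items}} table are all hypothetical. A real tool would use a full signature registry instead of this handful of magic numbers.

```python
import io
import sqlite3
import zipfile

# Hypothetical, deliberately tiny signature table (assumption for this sketch).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
]


def identify(payload):
    """Guess a MIME type from the leading bytes of the payload."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    return "application/octet-stream"  # unknown format


def walk(name, payload, depth=0, max_depth=3):
    """Yield (name, mime) for the payload and, recursively, for ZIP members."""
    mime = identify(payload)
    yield name, mime
    if mime == "application/zip" and depth < max_depth:
        try:
            with zipfile.ZipFile(io.BytesIO(payload)) as archive:
                for info in archive.infolist():
                    if info.is_dir():
                        continue
                    member = name + "!/" + info.filename
                    yield from walk(member, archive.read(info), depth + 1, max_depth)
        except zipfile.BadZipFile:
            pass  # signature matched, but the container is damaged


def build_report(records, db_path=":memory:"):
    """Store one row per identified item in SQLite and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, mime TEXT)")
    for name, payload in records:
        conn.executemany("INSERT INTO items VALUES (?, ?)", walk(name, payload))
    conn.commit()
    return conn
```

A per-format summary for risk assessment could then be read back with a query such as {{SELECT mime, COUNT(*) FROM items GROUP BY mime}}. The {{max_depth}} limit is a design choice to keep deeply nested archives from causing unbounded recursion.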

h2. Experiments

_Create experiments as child pages and they should appear automatically here_
{pageTree:[email protected]}

* Characterisation of Web Archive Content Using a 'Stack' of Tools * (SS)
Data: ONB Web Archive
Workflow: No.
Issues: Ability to do risk analysis - do we have obsolete files? Rendering tools? Cost of rendering now?

* Characterisation of Web Archive Content Using a 'Stack' of Tools * (PC)
Data: BL Web Archive
Workflow: No.
Issues: Ability to do risk analysis - do we have obsolete files? Rendering tools? Cost of rendering now?

* Characterisation of Web Archive Content Using a 'Stack' of Tools on Rosetta * (OK)
Data: ONB Web Archive (representative set, 1.3 TB) / SB Web Archive
Workflow: Rosetta stack
Issues: Getting the data to Rosetta or Rosetta to the data.

* Characterisation of Web Archive Content using SB Tool * (PMD)
Data: SB Web Archive
Workflow: No.
Issues:

* Characterise 2012 Web Archive Data * (NBR)
Data: SB Web Archive
Workflow: Yes.
Issues: Looking at presenting the data.

h2. Developer Notes

_Space for discussion, suggested solutions, links to other scenarios, etc._

h2. Related Documents

_Scenarios, case studies, etc. that provide background to this story._