h2. Status
{tip:title=Active}
This story is associated with a success story: [http://wiki.opf-labs.org/display/SP/QA+and+Characterisation+of+Web+Content]
h2. Contact
Per Møldrup-Dalum, SB, [email protected]
William Palmer, BL, william (.) palmer (@) bl (.) uk
h2. User Story
As a Web Archive, I need a Digital Preservation System that can process both ARC and WARC files and identify the file formats of, and characterise, the items they contain, so that I can assess preservation risks and plan which tools will be required for access to those formats.
h2. User Requirements/Components
# A tool that can efficiently work through the content of an ARC or WARC file and identify the types of the files found.
## Must provide a report in a usable format - where this is a large dataset, this could be a database of some sort
## Ideally the tool will also perform file format identification on files within container formats - media streams, zips and other compressed files, etc.
# Lookup of an appropriate access tool or software for each identified format would be a bonus\!
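The requirements above can be illustrated with a minimal sketch. It is not one of the tools used in the experiments below, and it does not parse ARC or WARC files itself: it assumes the record payloads have already been extracted as (name, bytes) pairs. The signature table, the function names (identify, characterise, build_report, summarise) and the table layout are all hypothetical and chosen only for illustration. The sketch identifies formats by magic bytes, looks inside zip containers, and stores the results in an SQLite database so that a large dataset can be queried afterwards.

```python
import io
import sqlite3
import zipfile

# Hypothetical, deliberately tiny signature table (magic bytes -> label).
# A real identification tool would rely on a full signature registry.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
]


def identify(payload: bytes) -> str:
    """Return a format label based on the leading bytes of the payload."""
    for magic, label in SIGNATURES:
        if payload.startswith(magic):
            return label
    return "application/octet-stream"


def characterise(name, payload, depth=0, max_depth=3):
    """Yield (name, format, depth) for a payload and for zip members inside it."""
    fmt = identify(payload)
    yield name, fmt, depth
    if fmt == "application/zip" and depth < max_depth:
        try:
            with zipfile.ZipFile(io.BytesIO(payload)) as zf:
                for info in zf.infolist():
                    if not info.is_dir():
                        member = f"{name}!/{info.filename}"
                        yield from characterise(
                            member, zf.read(info), depth + 1, max_depth
                        )
        except zipfile.BadZipFile:
            # Looked like a zip but could not be opened; keep the outer row only.
            pass


def build_report(items, db_path=":memory:"):
    """Store one row per identified item in an SQLite table named 'formats'."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS formats (name TEXT, format TEXT, depth INTEGER)"
    )
    for name, payload in items:
        conn.executemany(
            "INSERT INTO formats VALUES (?, ?, ?)", characterise(name, payload)
        )
    conn.commit()
    return conn


def summarise(conn):
    """Return a {format: count} dictionary across all stored rows."""
    rows = conn.execute(
        "SELECT format, COUNT(*) FROM formats GROUP BY format ORDER BY format"
    )
    return dict(rows.fetchall())
```

Passing a file path instead of the default ":memory:" to build_report would keep the report on disk. The depth column records how deeply an item was nested inside containers, which makes it possible to report on embedded content separately.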
h2. Experiments
_Create experiments as child pages and they should appear automatically here_
{pageTree:[email protected]}
* Characterisation of Web Archive Content Using a 'Stack' of Tools * (SS)
Data: ONB Web Archive
Workflow: No.
Issues: Ability to do risk analysis - do we have obsolete files? Rendering tools? Cost of rendering now?
* Characterisation of Web Archive Content Using a 'Stack' of Tools * (PC)
Data: BL Web Archive
Workflow: No.
Issues: Ability to do risk analysis - do we have obsolete files? Rendering tools? Cost of rendering now?
* Characterisation of Web Archive Content Using a 'Stack' of Tools on Rosetta * (OK)
Data: ONB Web Archive (Representative set 1.3TB)/SB Web Archive
Workflow: Rosetta stack
Issues: Getting the data to Rosetta or Rosetta to the data.
* Characterisation of Web Archive Content using SB Tool * (PMD)
Data: SB Web Archive
Workflow: No.
Issues:
* Characterise 2012 Web Archive Data * (NBR)
Data: SB Web Archive
Workflow: Yes.
Issues: Looking at presenting the data.
h2. Developer Notes
_Space for discussion, suggested solutions, links to other scenarios, etc._
h2. Related Documents
_Scenarios, case studies, etc. that provide background to this story._