Status
Active
This story is associated with a success story: http://wiki.opf-labs.org/display/SP/QA+and+Characterisation+of+Web+Content
Contact
Per Møldrup-Dalum, SB, [email protected]
William Palmer, BL (william (.) palmer (@) bl (.) uk)
User Story
As a Web Archive, I need a Digital Preservation System that can process both ARC and WARC files and identify the file formats of, and characterise, the items they contain, so that I can assess preservation risks and plan which tools will be required to access those formats.
User Requirements/Components
- A tool that can efficiently work through the content of an ARC or WARC file and identify the types of the files found.
- Must provide a report in a usable format; for a large dataset, this could be a database of some sort.
- Ideally the tool will also perform file format identification on files within container formats - media streams, zips and other compressed files, etc.
- Lookup of an appropriate access tool/software would be a bonus!
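The requirements above can be sketched roughly as follows. This is a minimal illustration, not the tool used in any of the experiments: it assumes an uncompressed WARC stream (ARC and gzipped files are not handled), uses a tiny hypothetical signature table in place of a real identification tool, and writes the report to an SQLite table whose name and columns (`formats`, `uri`, `member`, `mime`) are invented for the example. Zip payloads are opened so that their members are identified too.

```python
import io
import sqlite3
import zipfile

# Illustrative signature table only (an assumption, not a real registry).
# A production tool would rely on a full format identification tool instead.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
]


def identify(payload: bytes) -> str:
    """Guess a MIME type from the leading bytes of the payload."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    return "application/octet-stream"


def iter_warc_records(stream):
    """Yield (headers, block) for each record of an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines separate consecutive records
        if not line.startswith(b"WARC/"):
            raise ValueError("not a WARC record header: %r" % line)
        headers = {}
        for line in iter(stream.readline, b""):
            if not line.strip():
                break  # an empty line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        yield headers, stream.read(int(headers.get("content-length", "0")))


def http_payload(block: bytes) -> bytes:
    """Drop the HTTP status line and headers stored in a response record."""
    _, sep, body = block.partition(b"\r\n\r\n")
    return body if sep else block


def characterise(stream, db):
    """Identify every response payload and store one row per item found."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS formats (uri TEXT, member TEXT, mime TEXT)"
    )
    insert = "INSERT INTO formats VALUES (?, ?, ?)"
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        uri = headers.get("warc-target-uri", "")
        payload = http_payload(block)
        mime = identify(payload)
        db.execute(insert, (uri, None, mime))
        if mime == "application/zip":
            try:
                # Look inside the container and identify each member as well.
                with zipfile.ZipFile(io.BytesIO(payload)) as zf:
                    for name in zf.namelist():
                        db.execute(insert, (uri, name, identify(zf.read(name))))
            except zipfile.BadZipFile:
                pass  # truncated or corrupt archive: keep the top-level row
    db.commit()


def format_counts(db):
    """Return (mime, count) pairs, the kind of summary a risk report needs."""
    query = "SELECT mime, COUNT(*) FROM formats GROUP BY mime ORDER BY mime"
    return db.execute(query).fetchall()
```

Calling `characterise(open(path, "rb"), sqlite3.connect("report.db"))` would fill the table for one file, and `format_counts` then gives a per-format tally. Keeping the results in a database rather than a flat report is one way to meet the large-dataset requirement, since counts per format can be queried without rereading the archive.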
Experiments
- Characterisation of Web Archive Content Using a 'Stack' of Tools * (SS)
Data: ONB Web Archive
Workflow: No.
Issues: Ability to do risk analysis - do we have obsolete files? Rendering tools? Cost of rendering now?
- Characterisation of Web Archive Content Using a 'Stack' of Tools * (PC)
Data: BL Web Archive
Workflow: No.
Issues: Ability to do risk analysis - do we have obsolete files? Rendering tools? Cost of rendering now?
- Characterisation of Web Archive Content Using a 'Stack' of Tools on Rosetta * (OK)
Data: ONB Web Archive (Representative set 1.3TB)/SB Web Archive
Workflow: Rosetta stack
Issues: Getting the data to Rosetta or Rosetta to the data.
- Characterisation of Web Archive Content using SB Tool * (PMD)
Data: SB Web Archive
Workflow: No.
Issues:
- Characterise 2012 Web Archive Data * (NBR)
Data: SB Web Archive
Workflow: Yes.
Issues: Looking at presenting the data.
Developer Notes
Space for discussion, suggested solutions, links to other scenarios, etc.
Related Documents
Scenarios, case studies, etc. that provide background to this story.