Investigator(s)
Per Møldrup-Dalum, SB, [email protected]
Dataset
http://wiki.opf-labs.org/display/SP/SB+Web+Archive+Data
Platform
http://wiki.opf-labs.org/display/SP/Platform+SB+1
(this page may have been renamed to http://wiki.opf-labs.org/display/SP/SB+Test+Platform)
Workflow
Since November 2011 we have been running FITS (link to component @ myexperiment) on a selection of our web content covering the years 2005 to 2011.
The data is stored in ARC files on a SAN. Each ARC file is fetched from the SAN and unpacked, and FITS is run on each ARC record.
Running FITS on an ARC record produces one XML file. The XML files from a single ARC file are packed into one TGZ file, which is made available to the Planning and Watch subproject.
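The per-ARC steps above can be sketched roughly as follows. This is a minimal illustration, not the workflow actually deployed: it assumes the records of one ARC file have already been unpacked into a directory (ARC parsing is left out), that the FITS launcher is available on the PATH as fits.sh, and the names run_fits and characterise_arc are hypothetical.

```python
import subprocess
import tarfile
import tempfile
from pathlib import Path


def run_fits(record: Path, xml_out: Path) -> None:
    # Assumption: the FITS command-line launcher is on PATH as "fits.sh";
    # -i names the input file and -o the XML output file.
    subprocess.run(["fits.sh", "-i", str(record), "-o", str(xml_out)], check=True)


def characterise_arc(record_dir: Path, tgz_path: Path, tool=run_fits) -> int:
    """Run `tool` on every unpacked record and pack the XML output into one TGZ.

    Returns the number of records processed. `tool` is injectable so that
    another characterisation tool (or a test stub) can be swapped in.
    """
    # List the records first so the archive itself is never picked up as input.
    records = sorted(p for p in record_dir.iterdir() if p.is_file())
    with tempfile.TemporaryDirectory() as tmp:
        xml_dir = Path(tmp)
        with tarfile.open(str(tgz_path), "w:gz") as tgz:
            for record in records:
                xml_out = xml_dir / (record.name + ".fits.xml")
                tool(record, xml_out)
                # Store each XML file under its bare name inside the archive.
                tgz.add(str(xml_out), arcname=xml_out.name)
    return len(records)
```

A call such as characterise_arc(Path("unpacked/some-arc"), Path("some-arc.tgz")) would then yield one TGZ per ARC file; both paths are placeholders.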
Requirements and Policies
ThroughputGbytesPerHour >= 60
OrganisationalFit = ?
FITS is already in use within this institution, so file format identification using FITS would be useful. However, other tools may be used.
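As an illustration of how the ThroughputGbytesPerHour requirement above could be checked for a single run, the sketch below converts a byte count and an elapsed time into Gbytes per hour. It assumes 1 Gbyte = 10^9 bytes, which this page does not specify, and the function names are hypothetical.

```python
def throughput_gbytes_per_hour(bytes_processed: int, elapsed_seconds: float) -> float:
    """Convert a processed byte count and wall-clock time into Gbytes per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Assumption: 1 Gbyte = 10**9 bytes (decimal), not 2**30.
    gbytes = bytes_processed / 1e9
    hours = elapsed_seconds / 3600.0
    return gbytes / hours


def meets_throughput_requirement(bytes_processed: int,
                                 elapsed_seconds: float,
                                 minimum: float = 60.0) -> bool:
    # The default of 60 mirrors the requirement ThroughputGbytesPerHour >= 60.
    return throughput_gbytes_per_hour(bytes_processed, elapsed_seconds) >= minimum
```

For example, processing 60 * 10**9 bytes in 3600 seconds gives exactly 60 Gbytes per hour and so just meets the requirement.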