Version 9 by Peter Cliff on Jul 10, 2013 14:21.

Workflow: Yes.
Issues: Snapshots are currently compared using checksums, which causes some problems, e.g. with animated GIFs.
HDFS file access using Wayback Machine: run Wayback on each node against HDFS-held content. Currently this isn't working, but it should\!
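The checksum-based snapshot comparison noted under Issues can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual code: the function names `file_sha256` and `snapshots_match` are hypothetical, and SHA-256 is an assumed choice of checksum. It also shows why the approach is brittle: any byte-level difference between two snapshot files, such as an animated GIF captured on a different frame, counts as a mismatch.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshots_match(path_a, path_b):
    """Hypothetical helper: True only if both files are byte-identical.

    A checksum says nothing about visual similarity, so two snapshots
    that look almost the same will still be reported as different.
    """
    return file_sha256(path_a) == file_sha256(path_b)
```

A perceptual comparison of the rendered images would tolerate such differences, at the cost of needing a similarity threshold.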

*Large Scale ARC to WARC Migration using Pagealyzer* (LM)
Data: SB Web Archive
Workflow: Yes.
Issues: The intention is to redo the ONB experiment on SB content.

QA of the migration could compare snapshots of each of the sites, or it could compare all the files in each archive. Other aspects of ARCs and WARCs (header information, logs, etc.) may need checking too. For example, has the log file format changed between the two? Is the WARC structurally sound?
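As one illustration of what "structurally sound" might mean, the sketch below walks an uncompressed WARC byte stream and checks the basic record framing: a `WARC/` version line, the four header fields every record must carry (`WARC-Type`, `WARC-Record-ID`, `WARC-Date`, `Content-Length`), a block of exactly the declared length, and the two CRLFs that end each record. It is an assumption-laden sketch using only the Python standard library. It does not handle gzip-compressed WARCs, does not validate field values, and is not a substitute for a full validator such as JWAT. The function name `count_warc_records` is hypothetical.

```python
import io

# Header fields that must be present on every WARC record.
REQUIRED_FIELDS = (b"warc-type", b"warc-record-id", b"warc-date", b"content-length")


def count_warc_records(data):
    """Return the number of well-framed records in uncompressed WARC bytes.

    Raises ValueError at the first structural problem found.
    """
    stream = io.BytesIO(data)
    count = 0
    while True:
        version = stream.readline()
        if version == b"":
            return count  # clean end of file
        if not version.startswith(b"WARC/"):
            raise ValueError("record %d: missing WARC version line" % (count + 1))

        # Read named fields up to the blank line that ends the header.
        headers = {}
        while True:
            line = stream.readline()
            if line == b"":
                raise ValueError("record %d: truncated header" % (count + 1))
            if line in (b"\r\n", b"\n"):
                break
            name, sep, value = line.partition(b":")
            if not sep:
                raise ValueError("record %d: malformed header line" % (count + 1))
            headers[name.strip().lower()] = value.strip()

        for field in REQUIRED_FIELDS:
            if field not in headers:
                raise ValueError("record %d: missing %s" % (count + 1, field.decode()))

        length = int(headers[b"content-length"].decode("ascii"))
        if length < 0:
            raise ValueError("record %d: negative Content-Length" % (count + 1))
        block = stream.read(length)
        if len(block) != length:
            raise ValueError("record %d: block shorter than Content-Length" % (count + 1))
        if stream.read(4) != b"\r\n\r\n":
            raise ValueError("record %d: missing record terminator" % (count + 1))
        count += 1
```

Comparing the record count (and, separately, per-record payload checksums) against the source ARC would give a first indication that nothing was lost in migration.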

Using JWAT

h2. Related Documents