Investigator(s)

William Palmer, British Library

Dataset

BL Web Archive SCAPE Testbed Dataset

Platform

BL Hadoop Platform

Workflow

The workflow is implemented as a native Java/Hadoop application called Nanite, which was originally developed within SCAPE and has since been developed further. Nanite uses Apache Tika and DROID for format identification and operates directly on the content of ARC/WARC files, which it reads through a Hadoop RecordReader.

The Nanite code is available at https://github.com/openplanets/nanite

The ARC/WARC files are held in HDFS.
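
To illustrate what operating directly on record content means, the following is a minimal sketch of identifying a format from the leading bytes of a record payload rather than from a file name or extension. It does not call Tika, DROID, or any Nanite code; the class PayloadSniffer, its identify method, and the three-entry signature table are hypothetical stand-ins chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a real identifier such as Tika or DROID.
// It only knows three well-known magic numbers.
class PayloadSniffer {
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("application/pdf", "%PDF".getBytes(StandardCharsets.US_ASCII));
        SIGNATURES.put("image/gif", "GIF8".getBytes(StandardCharsets.US_ASCII));
        SIGNATURES.put("image/png", new byte[] {(byte) 0x89, 'P', 'N', 'G'});
    }

    // Returns a MIME type for the payload, or a generic fallback
    // when no signature in the table matches.
    static String identify(byte[] payload) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(payload, entry.getValue())) {
                return entry.getKey();
            }
        }
        return "application/octet-stream";
    }

    // True if data begins with the given prefix bytes.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In a Hadoop job, a method like identify would be called once per record with the payload bytes supplied by the RecordReader, so the archived content never needs to be unpacked to individual files first.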

Requirements and Policies

ReliableAndStableAssessment = Is the code reliable and robust and does it handle errors sensibly with good reporting?
NumberOfFailedFiles = 0
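
One way the two measures above could be checked is to catch errors per record, log each one with its record identifier, and compare the failure count against the target of 0. The sketch below assumes this approach; the class FailedFileTally and its methods are hypothetical and are not part of Nanite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical tally of per-record failures; not part of Nanite.
class FailedFileTally {
    final List<String> failures = new ArrayList<>();
    int processed = 0;

    // Runs the identifier on one record. An exception or a null result
    // is recorded against the record id instead of aborting the whole run.
    void process(String recordId, byte[] payload, Function<byte[], String> identifier) {
        processed++;
        try {
            String type = identifier.apply(payload);
            if (type == null) {
                throw new IllegalStateException("identifier returned no result");
            }
        } catch (RuntimeException e) {
            failures.add(recordId + ": " + e);
        }
    }

    // Value to compare against the NumberOfFailedFiles target.
    int numberOfFailedFiles() {
        return failures.size();
    }

    // True only when every processed record was handled without error.
    boolean meetsPolicy() {
        return failures.isEmpty();
    }
}
```

Keeping the failure messages alongside the count means a non-zero result can be traced back to the specific records that caused it, which speaks to the reporting part of ReliableAndStableAssessment.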

Evaluations