h2. Outline

The tables below list all experiments mentioned in the SCAPE project. Some experiments are active; others are paused or aborted. Experiments are grouped first by testbed and then by user story. The tables are intended to be used during execution and evaluation of the experiments, with details filled in where possible, both to track progress and to share information with other evaluators.


h4. Large Scale Digital Repositories Testbed

|| || || || Hadoop/ \\
Other || || || Experiment || || || || ||
|| Title || Organisation || Contact || Setup || Documentation || Benchmarking || Implementation || Execution || Evaluation || Top-10 goals || Comments ||
|| US: [Characterisation of Large Audio and Video Files] || || || || || || || || || || {warning:title=Stopped} ||
| [Characterisation and validation of audio and video files during ingest] | SB | Bolette Jurik | | | | | | | | {info:title=Completed} |
| [Execute ffprobe across content] | SB | Rune Ferneke-Nielsen | | | | | | | | {warning:title=Stopped} |
| | | | | | | | | | | |
|| US: [Large Scale Audio Migration] || || || || || || || || || || ||
| [SB Experiment SO4 Audio mp3 to wav Migration and QA Workflow] | SB | Bolette Jurik | N/A | [SB Test Platform] | N/A | | | | | {info:title=Completed} |
| [SB Experiment Audio mp3 to wav Migration and QA on Hadoop Cluster] | SB | Bolette Jurik | Jan2014 | [SP:SB Hadoop Platform] | [SP:Benchmarking Hadoop installations] | | | | | {tip:title=Active} |
| | | | | | | | | | | |
|| US: [Large Scale Image Migration] || || || || || || || || || || ||
| [LSDRT2 EX1 BL Newspapers on the BL Platform] | BL | William Palmer | | [SP:BL Hadoop Platform] | [EVAL-BL-LSDRT-TIFFJP2-01|SP:EVAL-BL-LSDRT-TIFFJP2-01] | | | | | {tip:title=Active} |
| [LSDRT2 EX2 BL Newspapers on hosted Rosetta] | EXL | Opher Kutner | | | | | | | | {tip:title=Active} |
| [KB Metamorfoze Image Migration & QA] | KB | Clemens Neudecker | | [KB Hadoop Platform] | | | | | | {tip:title=Active} |
| | | | | | | | | | | |
|| US: [Large Scale Ingest] || || || || || || || || || || ||
| [Ingest of digitized book METSs into Fedora 4] | FIZ | Matthias Hahn | | | | | | | | {tip:title=Active} |
| [Ingest of digitized book METSs into Rosetta] | EXL | Opher Kutner | | | | | | | | {tip:title=Active} |
| | | | | | | | | | | |
|| US: [Policy-Driven Identification of Preservation Risks in Electronic Document Formats|Policy-Driven Identification of Preservation Risks in Electronic Document Formats] || || || || || || || || || || ||
| [Validate PDF&EPUBs and check for DRM] | BL | William Palmer | | [SP:BL Hadoop Platform] | [EVAL-BL-LSDRT-PDFDRM-01|SP:EVAL-BL-LSDRT-PDFDRM-01] | | | | | {tip:title=Active} |
| [*Inactive* Validate PDF against institutional policy|*Inactive* Validate PDF against against institutional policy] | KB | Johan van der Knijff | | | | | | | | {tip:title=Active} |
| [Characterisation of ebook formats to identify DRM, etc. as per BL ingest policy] | BL | Peter Cliff | | | | | | | | {warning:title=Stopped} |
| [Wrap tool for use in Rosetta & execute over some content] | EXL | Opher Kutner | | | | | | | | {tip:title=Active} |
| | | | | | | | | | | |
|| US: [Quality Assurance of Digitized Books] || || || || || || || || || || ||
| [Executing Matchbox over large scale collection of digitized books to find duplicates in one book] | ONB | Sven Schlarb | | | | | | | | {warning:title=Stopped} |
| [Executing Matchbox to find duplicates in different representations of the same book] | ONB | Sven Schlarb | | | | | | | | {warning:title=Stopped} |
| | | | | | | | | | | |
|| US: [Repository Profiling] || || || || || || || || || || {warning:title=Stopped} ||
| | | | | | | | | | | |
|| US: [Validation of Archival Content Against an Institutional Policy] || || || || || || || || || || ||
| [Validate JPEG2000 Newspapers Using Jpylyzer] | SB | Rune Ferneke-Nielsen | Jan2014 | [SP:SB Hadoop Platform] | [SP:Benchmarking Hadoop installations] | Feb 27, 2014 [src|https://github.com/statsbiblioteket/scape-jp2-qa/tree/b7a3b76db57276abb6b4be78821a645ae8fad7d5] | Feb 24, 2014 [log|http://fue.onb.ac.at/scape-tb-evaluation/sb/ValidationOfArchivalContentAgainstAnInstitutionalPolicy/ValidateJPEG2000NewspapersUsingJpylyzer/out1-20000.log] | Apr 3, 2014 [page|http://wiki.opf-labs.org/display/SP/Evaluation+1+-+JPEG2000+validation] | | {tip:title=Active} |


h4. Research Datasets Testbed

|| || || || Hadoop/ \\
Other || || || Experiment || || || || ||
|| Title || Organisation || Contact || Setup || Documentation || Benchmarking || Implementation || Execution || Evaluation || Top-10 goals || Comments ||
|| US: [Migration from local format to domain standard format] || || || || || || || || || || ||
| [Migrate raw to NeXuS format] | STFC | Alastair Duncan | | | | | | | | {tip:title=Active} |
| | | | | | | | | | | |
|| US: [Normalise Disparate Tabular Data Sources] || || || || || || || || || || {warning:title=Stopped} ||
| [RDST10-EX1 - UK Electoral Register|RDST10-EX1 - UK Electoral Register] | BL | William Palmer | | | | | | | | {warning:title=Stopped} |
| | | | | | | | | | | |
|| US: [Preserving the context and links to research data or preserving research objects] || || || || || || || || || || ||
| [Experiment 1 - FUSEKI - 4Store Comparison|Experiment 1 - FUSEKI - 4Store Comparison] | STFC | Antony Wilson | | | | | | | | {tip:title=Active} |
| | | | | | | | | | | |
|| US: [Research Object Linkage Monitoring Over Time] || || || || || || || || || || ||
| [Connecting research objects to SCOUT] | STFC | Antony Wilson | | | | | | | | {tip:title=TBA} |
| | | | | | | | | | | |
|| US: [Persistent Data Citation of Dynamically Created Subsets] || || || || || || || || || || ||
| | | | | | | | | | | {tip:title=TBA} |
|| US: [Identification, validation and checksumming of a complex corpus] || || || || || || || || || || ||
| [GeoLint Experiment] | BL | William Palmer | | | | | | | | |
| | | | | | | | | | | |

h4. Web Content Testbed

|| || || || Hadoop/ \\
Other || || || Experiment || || || || ||
|| Title || Organisation || Contact || Setup || Documentation || Benchmarking || Implementation || Execution || Evaluation || Top-10 goals || Comments ||
|| US: [ARC to WARC Migration] || || || || || || || || || || ||
| [Large Scale ARC to WARC Migration using JWAT with QA using PhantomJS snapshots] | ONB | Sven Schlarb | | | | | | | | {warning:title=Stopped} |
| [Large Scale ARC to WARC Migration using Pagealyzer] | IM | ? | | | | | | | | |
| [Large Scale ARC to WARC Migration using JWAT with QA using PhantomJS snapshots] | SB | Per Møldrup-Dalum | Jan2014 | [SP:SB Hadoop Platform] | [SP:Benchmarking Hadoop installations] | | | | | {warning:title=Stopped} |
| [Large Scale ARC to WARC Migration using JWAT with QA using PhantomJS snapshots|SP:ARC2WARC Experiment at KB] | KB | Clemens Neudecker | | [SP:KB Hadoop Platform] | | | | | | {tip:title=Active} |
| | | | | | | | | | | |
|| US: [Comparison of Web Snapshots] || || || || || || || || || || ||
| [WCT2-EX1 Comparing newly archived Web sites against a verified copy|EVAL 3 - Web Pages comparison - Pagelyzer] | IM | Radu Pop | central SCAPE platform instance at IMF | | | [src|https://github.com/crawler-IM/browser-shots-tool] | | | | {warning:title=Stopped} |
| [EVAL 1 - Web Pages comparison - Pagelyzer|EVAL 1 - Web Pages comparison - Pagelyzer] | IM | Stanislav Barton | central SCAPE platform instance at IMF | | | [src|https://github.com/sbarton/browser-shot-tool-mapred] | | | | {tip:title=Active} |
|| US: [File Format Identification and Characterisation of Web Archives] || || || || || || || || || || ||
| [Web Archive FITS Characterisation using ToMaR at ONB] | ONB | Sven Schlarb | | | | | | | | {tip:title=Active} |
| [WCT EX2 File ID at SB] | SB | Per Møldrup-Dalum | | | | | | | | {warning:title=Stopped} |
| [WCT EX3 File ID at BL] | BL | William Palmer | | [SP:BL Hadoop Platform] | [EVAL-BL-WCT-01|SP:EVAL-BL-WCT-01] | | | | | {tip:title=Active} |
| [Characterisation of Web Archive Content Using a 'Stack' of Tools] | ONB | Sven Schlarb | | | | | | | | {warning:title=Stopped} |
| [Characterisation of Web Archive Content Using a 'Stack' of Tools on Rosetta] | EXL | Opher Kutner | | | | | | | | {warning:title=Stopped} |
| [Characterisation of Web Archive Content using SB Tool] | SB | Per Møldrup-Dalum | Jan2014 | [SP:SB Hadoop Platform] | [SP:Benchmarking Hadoop installations] | | | | | {tip:title=Active} |
| [Characterise 2012 Web Archive Data] | SB | Per Møldrup-Dalum | N/A | N/A | N/A | | | | | {info:title=Completed} |


h4. Data Center Testbed

|| || || || Hadoop/ \\
Other || || || Experiment || || || || ||
|| Title || Organisation || Contact || Setup || Documentation || Benchmarking || Implementation || Execution || Evaluation || Top-10 goals || Comments ||
|| US: [Large-scale video processing and interlinking|Large-scale video processing and interlinking] || || || || || || || || || || ||
| [Scene reconstruction] | BUT | Pavel Smrz | | | | | | | | {tip:title=Active} |
| [Video annotation and geo localization] | BUT | Ondrej Klima | | | | | | | | {tip:title=Active} |
| | | | | | | | | | | |
|| US: [Large scale access at hospital] || || || || || || || || || || ||
| | | | | | | | | | | {tip:title=TBA} |
|| US: [Large scale access for educational purposes] || || || || || || || || || || ||
| | | | | | | | | | | {tip:title=TBA} |
|| US: [Large scale analysis] || || || || || || || || || || ||
| | | | | | | | | | | {tip:title=TBA} |
|| US: [Large scale ingest of medical data] || || || || || || || || || || ||
| | | | | | | | | | | {tip:title=TBA} |