Outline

The tables below list all experiments mentioned in the SCAPE project. Some experiments are active; others are paused or aborted. Experiments are grouped first by testbed and then by user story. The idea is to use the tables during the execution and evaluation of experiments and to fill in details where possible, both to track progress and to share information with other evaluators.

Large Scale Digital Repositories Testbed

      Hadoop/Other
    Experiment        
Title Organisation Contact Setup Documentation Benchmarking Implementation Execution Evaluation Top-10 goals Comments
US: Characterisation of Large Audio and Video Files                  
Stopped
Characterisation and validation of audio and video files during ingest SB Bolette Jurik              
Completed
[Execute ffprobe across content] SB Rune Ferneke-Nielsen              
Stopped
                     
US: Large Scale Audio Migration                    
SB Experiment SO4 Audio mp3 to wav Migration and QA Workflow SB Bolette Jurik N/A SB Test Platform N/A        
Completed
SB Experiment Audio mp3 to wav Migration and QA on Hadoop Cluster SB Bolette Jurik Jan2014 SB Hadoop Platform Benchmarking Hadoop installations        
Active
                     
US: Large Scale Image Migration                    
LSDRT2 EX1 BL Newspapers on the BL Platform BL William Palmer   BL Hadoop Platform EVAL-BL-LSDRT-TIFFJP2-01        
Active
LSDRT2 EX2 BL Newspapers on hosted Rosetta EXL Opher Kutner              
Active
KB Metamorfoze Image Migration & QA KB Clemens Neudecker   KB Hadoop Platform          
Active
                     
US: Large Scale Ingest                    
Ingest of digitized book METSs into Fedora 4 FIZ Matthias Hahn              
Active
[Ingest of digitized book METSs into Rosetta] EXL Opher Kutner              
Active
                     
US: Policy-Driven Identification of Preservation Risks in Electronic Document Formats                    
Validate PDF&EPUBs and check for DRM BL William Palmer   BL Hadoop Platform EVAL-BL-LSDRT-PDFDRM-01        
Active
*Inactive* Validate PDF against institutional policy KB Johan van der Knijff              
Active
[Characterisation of ebook formats to identify DRM, etc. as per BL ingest policy] BL Peter Cliff              
Stopped
[Wrap tool for use in Rosetta & execute over some content] EXL Opher Kutner              
Active
                     
US: Quality Assurance of Digitized Books                  
[Executing Matchbox over large scale collection of digitized books to find duplicates in one book] ONB Sven Schlarb              
Stopped
[Executing Matchbox to find duplicates in different representations of the same book] ONB Sven Schlarb              
Stopped
                     
US: Repository Profiling                  
Stopped
                     
US: Validation of Archival Content Against an Institutional Policy                    
Validate JPEG2000 Newspapers Using Jpylyzer SB Rune Ferneke-Nielsen Jan2014 SB Hadoop Platform Benchmarking Hadoop installations Feb 27, 2014 src Feb 24, 2014 log Apr 3, 2014 page  
Active

Research Datasets Testbed

      Hadoop/Other
    Experiment        
Title Organisation Contact Setup Documentation Benchmarking Implementation Execution Evaluation Top-10 goals Comments
US: Migration from local format to domain standard format                    
[Migrate raw to NeXuS format] STFC Alastair Duncan              
Active
                     
US: Normalise Disparate Tabular Data Sources                  
Stopped
RDST10-EX1 - UK Electoral Register BL William Palmer              
Stopped
                     
US: Preserving the context and links to research data or preserving research objects                    
Experiment 1 - FUSEKI - 4Store Comparison STFC Antony Wilson              
Active
                     
US: Research Object Linkage Monitoring Over Time                    
[Connecting research objects to SCOUT] STFC Antony Wilson              
TBA
                     
US: Persistent Data Citation of Dynamically Created Subsets                    
                   
TBA
US: Identification, validation and checksumming of a complex corpus                    
GeoLint Experiment BL William Palmer                
                     

Web Content Testbed

      Hadoop/Other
    Experiment        
Title Organisation Contact Setup Documentation Benchmarking Implementation Execution Evaluation Top-10 goals Comments
US: ARC to WARC Migration                    
[Large Scale ARC to WARC Migration using JWAT with QA using PhantomJS snapshots] ONB Sven Schlarb              
Stopped
[Large Scale ARC to WARC Migration using Pagelyzer] IM ?                
[Large Scale ARC to WARC Migration using JWAT with QA using PhantomJS snapshots] SB Per Møldrup-Dalum Jan2014 SB Hadoop Platform Benchmarking Hadoop installations        
Stopped
[Large Scale ARC to WARC Migration using JWAT with QA using PhantomJS snapshots] ARC2WARC Experiment at KB KB Clemens Neudecker   KB Hadoop Platform          
Active
                     
US: Comparison of Web Snapshots                    
[WCT2-EX1 Comparing newly archived Web sites against a verified copy] IM Radu Pop central SCAPE platform instance at IMF     src      
Stopped
[EVAL 1 - Web Pages comparison - Pagelyzer] IM Stanislav Barton central SCAPE platform instance at IMF     src      
Active
US: File Format Identification and Characterisation of Web Archives                    
Web Archive FITS Characterisation using ToMaR at ONB ONB Sven Schlarb              
Active
WCT EX2 File ID at SB SB Per Møldrup-Dalum              
Stopped
WCT EX3 File ID at BL BL William Palmer   BL Hadoop Platform EVAL-BL-WCT-01        
Active
[Characterisation of Web Archive Content Using a 'Stack' of Tools] ONB Sven Schlarb              
Stopped
[Characterisation of Web Archive Content Using a 'Stack' of Tools on Rosetta] EXL Opher Kutner              
Stopped
[Characterisation of Web Archive Content using SB Tool] SB Per Møldrup-Dalum Jan2014 SB Hadoop Platform Benchmarking Hadoop installations        
Active
[Characterise 2012 Web Archive Data] SB Per Møldrup-Dalum N/A N/A N/A        
Completed

Data Center Testbed

      Hadoop/Other
    Experiment        
Title Organisation Contact Setup Documentation Benchmarking Implementation Execution Evaluation Top-10 goals Comments
US: Large-scale video processing and interlinking                    
Scene reconstruction BUT Pavel Smrz              
Active
Video annotation and geo localization BUT Ondrej Klima              
Active
                     
US: Large scale access at hospital                    
                   
TBA
US: Large scale access for educational purposes                    
                   
TBA
US: Large scale analysis                    
                   
TBA
US: Large scale ingest of medical data                    
                   
TBA