These Scenarios describe specific preservation issues that will drive the development and evaluation of a number of key outputs from the SCAPE Project.
The pages below form a network of Datasets, preservation Issues affecting those Datasets, and Solutions to those Issues. A triple composed of a Dataset, an Issue, and a Solution is called a Scenario. These Scenarios provide a convenient view onto a set of SCAPE activities spread across a number of workpackages and sub projects (each Scenario page merely includes by reference the relevant details from its Dataset, Issue, and Solution wiki pages). This structure was first developed as part of the AQuA Project hackathons, which captured preservation issues and solutions on similar wiki pages (first in AQuA, then in a subsequent DPC/OPF hackathon). Those results may be of interest, and some have already been incorporated into or linked from these SCAPE pages.
The aims of the SCAPE Scenarios are to:
- Provide a record of SCAPE Project developments and identify what will be evaluated in SCAPE Testbeds
- Ensure effective liaison and collaboration between the various WPs and SPs within the project, by illustrating how each piece of work fits together
- Enable Issue Owners to see what Solutions are being developed for them and to ensure that developers understand their requirements
- Enable Solution Owners to see the challenge they are being asked to solve (with its associated requirements) and to know whom they can contact for more information
- Enable Issue and Solution Owners to agree on how Solutions will be evaluated in the Testbeds
- Support WP and SP monitoring and management, by describing key project results in one place
- Support the gathering of feedback from outside the project, publicise project results, and engage with interested users and practitioners
To add a new Dataset, Issue, Solution, or Scenario, see the following instructions: How to add a new Dataset, Issue, Solution or Scenario. Ensure you are aware of the Responsibilities of the roles described on these pages.
WARNING: There is a Confluence Wiki bug that sometimes causes problems with bulleted lists inside tables. If you're having trouble, don't use a list inside a table!
Datasets:
These are the Datasets that relate to specific preservation Issues which in turn have Solutions developed for them. Click here to create a new Dataset.
- State and University Library Denmark - Danish National Heritage Video, Audio, and Image Collections
- British Library - Books & Newspapers Collections
- National Library of the Netherlands - Image Repository Content
- Austrian National Library - Web Archive
- British Library - International Dunhuang Project Manuscripts
- State and University Library Denmark - Web Archive Data
- Govdocs1 Open Corpus
- STFC Scientific Datasets
- British Library - Research Datasets
- KB Open Access Journals PDFs
- Camera raw file images
- Internet Memory Web Archive
Scenarios:
Scenarios:
- are based on typical preservation processes that address challenges (or 'issues') associated with particular collections/datasets;
- must be practical and reflect real use cases for participating institutions;
- should enable subsequent demonstration (in WP19) of key issues addressed by the project, namely efficient processing of very large-scale collections of heterogeneous and complex digital content.
Each Scenario should have only one Issue associated with it; multiple Datasets may be associated with that Issue.
Click here to create a new Scenario.
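To make the Dataset/Issue/Solution structure described above more concrete, here is a minimal sketch in Python. It is purely illustrative and not part of any SCAPE tooling: the class and field names are assumptions made for this example, and the particular IS25/SO11 pairing shown is not taken from an actual Scenario page.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative model only; names and structure are assumptions, not SCAPE code.

@dataclass
class Dataset:
    name: str                      # e.g. "Austrian National Library - Web Archive"

@dataclass
class Issue:
    identifier: str                # e.g. "IS25"
    title: str
    datasets: List[Dataset] = field(default_factory=list)       # an Issue may span several Datasets
    testbed_label: str = ""                                      # lsdr, webarchive or researchdatasets
    functional_labels: List[str] = field(default_factory=list)   # e.g. characterisation, migration

@dataclass
class Solution:
    identifier: str                # e.g. "SO11"
    title: str

@dataclass
class Scenario:
    issue: Issue                   # exactly one Issue per Scenario
    solution: Solution             # the Solution proposed for that Issue
    # The Dataset(s) of the triple are reached through the Issue.

# Hypothetical example triple assembled from the lists on this page:
web_archive = Dataset("Austrian National Library - Web Archive")
is25 = Issue("IS25", "Web Content Characterisation", [web_archive],
             "webarchive", ["characterisation"])
so11 = Solution("SO11", "The Tika characterisation Tool")
scenario = Scenario(issue=is25, solution=so11)
```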
Issues:
These are the preservation or other business-driven Issues that are found in particular Datasets and have Solutions developed to solve them.
Issues should describe particular preservation problems. They must be well defined so that Solutions and evaluation points can be proposed and tested.
Click here to add a new Issue.
Don't forget to:
- add a Testbed Label (lsdr, webarchive, researchdatasets),
- add a relevant Functional Label (e.g. characterisation, migration), and
- remove the "untagged" Label.
- IS1 Digitised TIFFs do not meet storage and access requirements
- IS2 Do acquired files conform to an agreed technical profile, are they valid and are they complete?
- IS3 Large media files are difficult to characterise without mass processing + We cannot identify preservation risks in uncharacterised files
- IS5 Digital objects archive contains unidentified content
- IS6 Determine render-ability of displayable web objects
- IS7 Incompleteness and inconsistency of web archive data
- IS8 Diversity of office document formats in digital objects archive
- IS9 Archive system migration preserving and enriching AIPs
- IS10 Potential bit rot in image files that were stored on CD
- IS11 PDF files may face preservation risks
- IS12 ARC to WARC migration
- IS13 wmv to Video Format-X Migration Results in Out-of-sync Sound and Video
- IS14 Diverse preservation risks in large archives with millions of objects
- IS15 Long-term access and decoding of JP2 images
- IS16 Normalisation of JPEG 2000 images
- IS17 Characterisation of text-based formats
- IS18 Verify bitstream integrity
- IS19 Migrate whole archive to new archiving system
- IS20 Detect audio files with very bad sound quality
- IS21 Migration of mp3 to wav
- IS22 Characterise and Validate very large mpeg-1 and mpeg-2 files
- IS24 Characterisation of large amounts of wav audio
- IS25 Web Content Characterisation
- IS26 Dealing with difficult identification cases
- IS27 Quality assurance in redownload workflows of digitised books
- IS28 Structural and visual comparisons for web page archiving
- IS29 Characterisation and validation of very large data files
- IS30 Fixity capturing and checking of very large data files
- IS31 Semantic checking of very large data files
- IS32 Basic Migration of RAW to NeXus data
- IS33 Enhanced migration of RAW to NeXus data
- IS34 ISIS instrument website no longer applicable or available
- IS35 Mantid website or software no longer applicable or available
- IS36 Examine the long term value of the preserved datasets
- IS37 Preserving the verifiability and provenance of processed datasets
- IS38 (W)ARC to HBASE migration
- IS39 Format obsolescence detection
- IS40 Complexity of camera raw files
- IS41 Analyse huge text files containing information about a web archive
- IS42 Detecting Encryption and DRM in Digital Content
- IS43 Determining general 'document' properties
- IS44 Migrated image metadata must map or match to those of the original
- IS45 Audio and Video Recordings have unreliable broadcast time information
- IS46 Book page image duplicate detection within one book
- IS47 Identify Preservation Risks from audio+video characterisation information
- IS48 Validate archival files against an institutional content policy regarding formats
- IS49 Large scale ingest of a large book collection
Solutions:
These are Solutions that address Issues that relate to particular Datasets. Click here to add a new Solution.
- SO1 Simple JP2 file structure checker
- SO2 xcorrSound QA audio comparison tool
- SO3 Comparing identification tools
- SO4 Audio mp3 to wav Migration and QA Workflow
- SO5 Video Migration and QA
- SO06 Use Ffprobe to characterise audio+video
- SO07 Develop Warc Unpacker
- SO8 QA for TIFF to JP2K conversion (image comparison tool based on histograms and profiles)
- SO9 Matchbox - Image comparison tool based on bag-of-(visual-)words matching
- SO10 QA for TIFF to correspondent JP2K comparison (image comparison tool based on SIFT-matching)
- SO11 The Tika characterisation Tool
- SO12 Tool testing framework
- SO14 Fuse mounting (w)arc files
- SO15 JP2 validator and properties extractor
- SO16 QA for estimation of affine transformation (image comparison tool based on SSIM algorithm)
- SO17 Web Archive Mime-Type detection workflow based on Droid and Apache Tika
- SO18 Comparing two web page versions for web archiving
- SO19 Recognize inaccurate graphical image files based on a pattern-set
- SO20 Extending JHOVE to characterise NeXus data format
- SO21 Extending the NeXus validation toolkit to cope with very large data files
- SO22 Developing a Raw-to-NeXus migration tool
- SO23 Pushing additional metadata into NeXus metadata fields
- SO24 Use Preservation Network Model to record "deep" dependencies and to allow tracking over time
- SO25 Rosetta v3.0 Implementation Integrated with DROID 6, JHOVE1, NLNZ tool and more...
- SO26 Automated RAW to DNG migration+QA
- SO27 Analyse huge text files containing information about a web archive using Hadoop
- SO28 A heuristic measure for detecting undesired influence of lossy JP2 compression on OCR in the absence of ground truth
- SO29 Extending JHOVE to characterise very large NeXus data file
- SO30 Automated assessment of JP2 against a technical profile
- SO31 Preservation Grade TIFF to JPEG2000 Migration
- SO32 Image Metadata Extractor
- SO33 Image Metadata Compare
- SO34 Use Manzanita Crosscheck to validate mpeg transport streams
- SO35 Use schematron as the content profile language to validate files by evaluating their characterisation information
- SO36 Perform scalable search for small sound chunks in large audio archive
- SO37 Connector API Technical Compatibility Kit