Title
Archive system migration preserving/enriching AIPs
Detailed description Problem: a typical characteristic of digital archives that aim for “long-term preservation” is that the life cycle of the technical infrastructure on which they are based is much shorter than the period for which their contained materials should be preserved. This means that migrations from one archival system to another are inevitable. In the simplest case this could be nothing more than a migration of AIPs from one storage medium to another. However, in most cases this will also involve the migration of metadata, and the contents of each AIP from the source system may need to be taken apart and re-assembled on the destination system. This will result in changes to the AIP’s internal structure that must be accounted for in the migrated (structural) metadata. Finally, such migrations may also involve one or more metadata enrichment steps (for example, because the availability of new or improved characterisation tools makes it possible to automatically extract technical and preservation metadata that couldn’t be established within the old system).
Scalability Challenge

Issue champion To be defined
Other interested parties
 
Possible Solution approaches
  • ALL
    • At the most basic level we would like to ensure that the system migration does not result in the loss or alteration of any archived objects. In the case of a pure medium migration this could be realised very easily using checksums. More sophisticated mechanisms are needed for migrations where, as an example, AIPs that are held together in a physical container (e.g. a TAR file) on the source system need to be taken apart and subsequently re-assembled on the destination system. In that case we will need to check the integrity of each single file within the AIP, before and after the migration.
    • Possible issues: Because repositories use a wide variety of legacy, publicly available and custom-built archiving systems, with a correspondingly wide variety of data models and structures, it may be difficult to establish use cases that are sufficiently generic to be of interest to more than one SCAPE partner. The best approach may be to start with a limited number of relatively simple, generic and universally applicable use cases, such as:
      • Migrate one object from one medium to another and verify the integrity of the migrated object.
      • Migrate a set of files from one container file to another and verify the integrity of all constituent components.
    • We could then establish a checkpoint where, based on the outcome of the work on the simple use cases above, we decide whether continuing the work on this scenario is worth further effort. A more thorough understanding of the problem space (including key aspects for validation) would in itself be a useful output here.
  • EXL
    • This scenario sounds like a requirement for AIP migration. There has been some recent work in this area. See this paper.
  • KEEPS
    • Watch can contribute to the solution with the triggers:
      • Monitor new repository systems or new versions of existing ones
      • Monitor repository system features and tools
      • Monitor repository system popularity and support
      • Monitor operating systems
      • Monitor policies (policies may require functionality that is not supported in current repository system)
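The two integrity checks described under ALL above (a whole-object checksum for a pure medium migration, and a per-file check when a container such as a TAR file is taken apart and re-assembled) could be sketched roughly as follows. This is a minimal illustration, not a SCAPE component: the function names are hypothetical, SHA-256 is an assumed choice of checksum, and the sketch assumes that file names inside the container are unchanged by the migration. If the destination system renames or re-paths files, a mapping between old and new names would be needed first.

```python
import hashlib
import io
import tarfile


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def tar_manifest(fileobj):
    """Map each regular file inside a TAR container to its SHA-256 digest."""
    manifest = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if member.isfile():
                manifest[member.name] = sha256_of_stream(archive.extractfile(member))
    return manifest


def compare_manifests(source, destination):
    """Report files that are missing, unexpected, or altered after migration."""
    shared = set(source) & set(destination)
    return {
        "missing": sorted(set(source) - set(destination)),
        "unexpected": sorted(set(destination) - set(source)),
        "altered": sorted(n for n in shared if source[n] != destination[n]),
    }


def build_tar(files):
    """Helper for the demo only: build an in-memory TAR from {name: bytes}."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Same files, re-assembled in a different order on the "destination".
    source = build_tar({"aip/data.bin": b"payload", "aip/mets.xml": b"<mets/>"})
    destination = build_tar({"aip/mets.xml": b"<mets/>", "aip/data.bin": b"payload"})

    # Whole-container checksums: sufficient only for a pure medium migration.
    same_container = sha256_of_stream(source) == sha256_of_stream(destination)
    print("container checksums match:", same_container)

    # Per-file comparison: what a re-assembling migration needs.
    source.seek(0)
    destination.seek(0)
    print(compare_manifests(tar_manifest(source), tar_manifest(destination)))
```

In practice the file objects would come from `open(path, "rb")` on the source and destination systems. The source manifest would be recorded before the migration and compared afterwards, since the source container may no longer be available once the migration completes.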
Context Ideally this should be a representative cross-section of AIPs in a repository. However, the solutions that are needed for this scenario will most likely be highly dependent on the data (and metadata) models used by the source and destination systems, as well as on the specific hard- and software infrastructures.
At the time of writing, KB is exploring making a dataset of AIPs available.
Lessons Learned Notes on Lessons Learned from tackling this Issue that might be useful to inform the development of Future Additional Best Practices, Task 8 (SCAPE TU.WP.1 Dissemination and Promotion of Best Practices)
Training Needs Is there a need for providing training for the Solution(s) associated with this Issue? Notes added here will provide guidance to the SCAPE TU.WP.3 Sustainability WP.
Datasets
Solutions  

Evaluation

Objectives Which SCAPE objectives do this issue and a future solution relate to? e.g. scalability, robustness, reliability, coverage, preciseness, automation
Success criteria Describe the success criteria for solving this issue - what are you able to do? - what does the world look like?
Automatic measures What automated measures would you like the solution to provide so that it can be evaluated for this specific issue? Which measures are important?
If possible specify very specific measures and your goal - e.g.
 * process 50 documents per second
 * handle 80 GB files without crashing
 * identify 99.5% of the content correctly
Manual assessment Apart from the automated measures that you would like to get, do you foresee any manual assessment being necessary to evaluate the solution of this issue?
If possible specify measures and your goal - e.g.
 * Solution installable with basic Linux system administration skills
 * User interface understandable by non-developer curators
Actual evaluations Links to actual evaluations of this Issue/Scenario
Labels: qa, lsdr, issue, planning, watch, system_obsolescence