Title | IS19 Migrate whole archive to new archiving system
Detailed description | Problem: a typical characteristic of digital archives that aim for “long-term preservation” is that the life cycle of the technical infrastructure on which they are based is much shorter than the period for which their contained materials should be preserved. This means that migrations from one archival system to another are inevitable. In the simplest case this could be nothing more than a migration of AIPs from one storage medium to another. However, in most cases this will also involve the migration of metadata, and the contents of each AIP from the source system may need to be taken apart and re-assembled on the destination system. This will result in changes to the AIP’s internal structure that must be accounted for in the migrated (structural) metadata. Finally, such migrations may also involve one or more metadata enrichment steps (for example, because the availability of new or improved characterisation tools makes it possible to automatically extract technical and preservation metadata that couldn’t be established within the old system). |
Scalability Challenge | Volume of data (whole archival system).
Issue champion | Johan van der Knijff
Other interested parties |
|
Possible Solution approaches | At the most basic level we want to ensure that the system migration does not result in the loss or alteration of any archived objects. In the case of a pure medium migration this can be verified easily using checksums. More sophisticated mechanisms are needed for migrations where, for example, AIPs that are held together in a physical container (e.g. a TAR file) on the source system need to be taken apart and subsequently re-assembled on the destination system. In that case we need to check the integrity of each individual file within the AIP, both before and after the migration (a minimal sketch of such a check is given below).
Possible issues: due to the wide variety of legacy, publicly available and custom-built archiving systems used by different repositories, and the resulting variety of data models and structures, it may be difficult to establish use cases that are sufficiently generic to be of interest to more than one SCAPE partner. The best approach may be to start out with a limited number of relatively simple, generic and universally applicable use cases, such as:
* Migrate one object from one medium to another and verify the integrity of the migrated object
* Migrate a set of files from one container file to another and verify the integrity of all constituent components
We could then establish a checkpoint where, based on the outcome of the work on these simple use cases, we decide whether continuing the work on this scenario is worth any further effort. A more thorough understanding of the problem space (including key aspects for validation) would in itself be a useful output here.
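A minimal sketch of such a per-file integrity check, assuming the source AIP is a single TAR container and the destination AIP has been re-assembled as a plain directory, might look as follows in Python. All file names and paths (aip_0001.tar, /destination/aip_0001) are hypothetical, and the sketch further assumes that member names in the TAR map one-to-one onto relative paths in the destination directory; real AIP structures, and any structural changes made during migration, would need additional mapping logic.

```python
import hashlib
import tarfile
from pathlib import Path


def checksum_stream(stream, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file-like object, read in chunks."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def checksums_in_container(tar_path):
    """Map each regular file inside a TAR-based AIP to its checksum."""
    checksums = {}
    with tarfile.open(tar_path, "r") as archive:
        for member in archive.getmembers():
            if member.isfile():
                with archive.extractfile(member) as stream:
                    checksums[member.name] = checksum_stream(stream)
    return checksums


def checksums_on_disk(root_dir):
    """Map each file below root_dir (as a relative path) to its checksum."""
    root = Path(root_dir)
    checksums = {}
    for path in root.rglob("*"):
        if path.is_file():
            with path.open("rb") as stream:
                checksums[str(path.relative_to(root))] = checksum_stream(stream)
    return checksums


def compare(source, destination):
    """Report files that are missing, unexpected or altered after migration."""
    missing = sorted(set(source) - set(destination))
    unexpected = sorted(set(destination) - set(source))
    altered = sorted(
        name for name in set(source) & set(destination)
        if source[name] != destination[name]
    )
    return missing, unexpected, altered


if __name__ == "__main__":
    # Hypothetical paths: a TAR-packaged AIP on the source system and the
    # directory into which it was re-assembled on the destination system.
    source_checksums = checksums_in_container("aip_0001.tar")
    destination_checksums = checksums_on_disk("/destination/aip_0001")
    missing, unexpected, altered = compare(source_checksums, destination_checksums)
    if not (missing or unexpected or altered):
        print("AIP migrated without loss or alteration")
    else:
        print("Missing:", missing)
        print("Unexpected:", unexpected)
        print("Altered:", altered)
```

For the simpler pure medium migration use case the same idea reduces to comparing a single checksum per AIP container file before and after the copy.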
Context | Details of the institutional context of the Issue. (May be expanded at a later date.)
Lessons Learned | Notes on Lessons Learned from tackling this Issue that might be useful to inform the development of Future Additional Best Practices, Task 8 (SCAPE TU.WP.1 Dissemination and Promotion of Best Practices) |
Training Needs | Is there a need for providing training for the Solution(s) associated with this Issue? Notes added here will provide guidance to the SCAPE TU.WP.3 Sustainability WP. |
Datasets | To be confirmed |
Solutions | SO18 Comparing two web page versions for web archiving |
Evaluation
Objectives | Which SCAPE objectives do this Issue and a future Solution relate to? e.g. scalability, robustness, reliability, coverage, precision, automation
Success criteria | Describe the success criteria for solving this Issue: what are you able to do? What does the world look like?
Automatic measures | What automated measures should the solution provide to evaluate it for this specific Issue? Which measures are important? If possible, specify very specific measures and your goal, e.g.:
* process 50 documents per second
* handle 80 GB files without crashing
* identify 99.5% of the content correctly
Manual assessment | Apart from the automated measures, do you foresee any manual assessment being necessary to evaluate the solution of this Issue? If possible, specify measures and your goal, e.g.:
* solution installable with basic Linux system administration skills
* user interface understandable by non-developer curators
Actual evaluations | Links to actual evaluations of this Issue/Scenario