Monday 16 September - Tuesday 17 September
See the session plan example and agenda structure for guidance.
Learning Outcomes (by the end of the session the attendees will be able to):
- Understand scalable platforms and evaluate the situations in which such environments are required.
- Apply knowledge of existing tools to solve migration and quality control problems.
- Combine and modify tool chains in order to create automated workflows for migration and quality control.
- Implement best practice for discovering and sharing workflows for use and re-use.
- Make use of a scalable environment and apply a number of workflows to automatically perform migration and quality assurance checks on a large number of objects.
- Identify a number of potential problems when working in a scalable environment and propose solutions.
- Understand the potential to use scalable platforms in digital preservation and synthesise new opportunities within their own environments.
16 September
Session One
Learning outcomes:
1. Understand scalable platforms and evaluate the situations in which such environments are required.
2. Apply knowledge of existing tools to solve migration and quality control problems.
Time | Outline Plan/Teacher Activity | Activities & Resources |
---|---|---|
09:30 - 11:00 | Introduction. This session introduces the fundamentals of preservation at scale and outlines the importance of scalability: Why scalable platforms? Which scalable platforms? What are the key considerations? Delegates will also get a chance to experiment with a pre-built environment, as used by the British Library to migrate TIFF images to JPEG 2000, so don't forget your own images! Over the course of the following sessions, delegates will get hands-on experience of building workflows for scalable environments and defining the files that control it all. | Make available a multi-node demonstrator that attendees can load a set of TIFF images onto in a certain format, e.g. images.zip on a memory key. The cluster detects this zip, extracts all the images and does the following: |
11:15 - 12:30 | Migration and Quality Assurance. Starting with the tools, we look at migration and quality control tools for images and at how these are invoked on a single machine instance. | Introduce ImageMagick and jpylyzer and show how they are run for a single TIFF to JPEG 2000 conversion (see the sketch below). |
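For reference, a minimal sketch of this single-machine step, assuming ImageMagick (built with JPEG 2000 support) and jpylyzer are on the PATH; the file names are placeholders, not part of the exercise materials:

```python
#!/usr/bin/env python
"""Sketch: migrate one TIFF to JPEG 2000, then validate the result."""
import subprocess
import sys

src = "example.tif"   # placeholder input TIFF
dst = "example.jp2"   # placeholder output JPEG 2000 file

# 1. Migration: ImageMagick's convert command writes a JPEG 2000 copy.
subprocess.run(["convert", src, dst], check=True)

# 2. Quality assurance: jpylyzer writes an XML validation report to stdout;
#    its isValid element states whether dst is a valid JP2.
report = subprocess.run(["jpylyzer", dst], check=True,
                        capture_output=True, text=True).stdout

if "<isValid" in report and ">True<" in report:
    print("%s validated OK" % dst)
else:
    sys.exit("validation failed for %s" % dst)
```

A fuller workflow would parse jpylyzer's XML report properly rather than string-matching, but this is enough to show the two steps that later sessions chain together and scale up.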
Session Two
Learning outcomes:
3. Combine and modify tool chains in order to create automated workflows for migration and quality control.
Time | Outline Plan/Teacher Activity | Activities & Resources |
---|---|---|
13:30 - 15:00 | Workflows. With the tools explored, we introduce workflows and look at how they can be used to invoke multiple operations to both migrate content and run quality control checks on the results. Again, this exercise is done in your own local instance and is not built to scale. | Migrate an image using ImageMagick, then use jpylyzer to check that the result is a valid JPEG 2000 image. Extension: also integrate exiftool to check metadata (see the sketch below). |
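One possible shape for the chained workflow, again only as a sketch: the input folder name, the validity check, and the choice of comparing image dimensions with exiftool are illustrative, not prescribed by the session.

```python
#!/usr/bin/env python
"""Sketch of a migration + QA workflow over a folder of TIFFs.
Assumes ImageMagick, jpylyzer and exiftool are installed and on the PATH."""
import glob
import os
import subprocess

def run(cmd):
    """Run a command and return its stdout as text, raising on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

for tiff in glob.glob("input/*.tif"):          # placeholder input folder
    jp2 = os.path.splitext(tiff)[0] + ".jp2"

    run(["convert", tiff, jp2])                # migration step

    report = run(["jpylyzer", jp2])            # QA step 1: format validation
    valid = "<isValid" in report and ">True<" in report

    # QA step 2 (extension): use exiftool to compare the pixel dimensions
    # of source and target as a simple property check.
    src_size = run(["exiftool", "-s3", "-ImageWidth", "-ImageHeight", tiff])
    dst_size = run(["exiftool", "-s3", "-ImageWidth", "-ImageHeight", jp2])

    print("%s -> %s  valid=%s  dimensions_match=%s"
          % (tiff, jp2, valid, src_size == dst_size))
```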
Session Three
Learning outcomes:
4. Implement best practice for discovering and sharing workflows for use and re-use.
Time | Outline Plan/Teacher Activity | Activities & Resources |
---|---|---|
15:15 - 16:30 | How to share your workflow. Having built a workflow, we look at how to share it and how to discover other workflows. | Register on myExperiment, then describe and upload workflows. Get delegates to upload their own workflow. |
17 September
Session Four
Learning outcomes:
5. Set up a local test instance of a scalable environment and apply a number of workflows to automatically perform migration and quality assurance checks on a large number of objects.
6. Identify a number of potential problems in scalable environments and propose solutions.
Time | Outline Plan/Teacher Activity | Activities & Resources |
---|---|---|
09:30 - 12:30 | Building Scalable Environments. This session introduces the Hadoop platform and its application for executing preservation workflows in a distributed environment. More than just "getting the job done", we look at the tools for monitoring and controlling complex operations at scale and at how these can be used to identify potential problems. | By the end of this session each attendee should have a working (pseudo-distributed) Hadoop test installation on his/her laptop running the same scripts as the demonstration cluster (see the sketch below). They should also be able to read and analyse the various log files in order to identify potential problems (e.g. tool versions). |
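One way to wrap the earlier migration/QA steps for Hadoop is via Hadoop Streaming, with a Python mapper that reads one image path per input line and shells out to the same tools. The sketch below assumes the TIFFs are readable from the worker nodes (for example via a shared mount) and that the tool invocations match the earlier sessions; it is an illustration, not the demonstration cluster's actual script.

```python
#!/usr/bin/env python
"""Sketch of a Hadoop Streaming mapper for the migration/QA workflow.
Assumed input: one readable TIFF path per line on stdin.
Output: one tab-separated record per image: <tiff path> TAB <status>."""
import os
import subprocess
import sys

def process(tiff):
    jp2 = os.path.splitext(tiff)[0] + ".jp2"
    try:
        subprocess.run(["convert", tiff, jp2], check=True)
        report = subprocess.run(["jpylyzer", jp2], check=True,
                                capture_output=True, text=True).stdout
        if "<isValid" in report and ">True<" in report:
            return "OK"
        return "INVALID_JP2"
    except subprocess.CalledProcessError as err:
        # Tool failures (e.g. a wrong tool version on a node) surface here
        # and can be traced back through the task logs.
        return "FAILED: %s" % err

for line in sys.stdin:
    path = line.strip()
    if path:
        print("%s\t%s" % (path, process(path)))
```

Submission would then look roughly like `hadoop jar <path-to-hadoop-streaming.jar> -input paths.txt -output results -mapper mapper.py -file mapper.py -numReduceTasks 0`, with the exact jar path depending on the installation. Failures recorded in the output and in the task logs are exactly the kind of problems (such as mismatched tool versions across nodes) this session asks attendees to diagnose.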
Session Five
Learning outcomes:
5. Set up a local test instance of a scalable environment and apply a number of workflows to automatically perform migration and quality assurance checks on a large number of objects.
7. Understand the potential to use scalable platforms in digital preservation and synthesise new opportunities within your own environments.
Time | Outline Plan/Teacher Activity | Activities & Resources |
---|---|---|
13:30 - 14:30 | Invited talk: Introduction to the SCAPE repository reference implementation. This talk will introduce the SCAPE repository reference implementation as a guide to getting started with eSciDoc on top of Hadoop and HDFS. It will discuss the opportunities and future potential for scalability with respect to digital object management systems. | |
14:45 - 16:00 | Integrating Taverna and Hadoop. This final session recaps the work done up to this point and allows attendees to fully integrate a number of workflows (both of their own making and existing ones) into their own scalable preservation platform. | |
16:00 - 17:00 | Panel and wrap up | Discussion |