
Monday 16 September - Tuesday 17 September 

See the session plan example and agenda structure for guidance.

Learning Outcomes (by the end of the session the attendees will be able to):

  1.  Understand scalable platforms and evaluate the situations in which such environments are required. 
  2.  Apply knowledge of existing tools to solve migration and quality control problems. 
  3.  Combine and modify tool chains in order to create automated workflows for migration and quality control. 
  4.  Implement best practice for discovering and sharing workflows for use and re-use. 
  5.  Make use of a scalable environment and apply a number of workflows to automatically perform migration and quality assurance checks on a large number of objects.
  6.  Identify a number of potential problems when working in a scalable environment and propose solutions.
  7.  Understand the potential to use scalable platforms in digital preservation and synthesise new opportunities within your own environments. 

16 September

Session One

Learning outcomes:

1. Understand scalable platforms and evaluate the situations in which such environments are required. 

2. Apply knowledge of existing tools to solve migration and quality control problems. 

Time Outline Plan/Teacher Activity Activities & Resources
09:30 - 11:00 Introduction
This session introduces the fundamentals of preservation at scale
and outlines the importance of scalability.

Why scalable platforms?
What scalable platforms?
What are the key considerations?

In this session, delegates will also get a chance to experiment with
a pre-built environment, as used by the British Library to migrate
TIFF images to JPEG2000. So remember to bring your own images!

Over the course of the following sessions, delegates will get hands-on
experience of building workflows for scalable environments and defining
the files that control them.
Make available a multi-node demonstrator onto which attendees can load a set of
TIFF images in an agreed format, e.g. images.zip on a memory key.
The cluster detects this zip, extracts all the images and does the following:
  1. Run identification to ensure they are TIFFs and compatible with the workflow
  2. Divide them into groups and distribute them to be migrated to JPEG2000 (JP2)
  3. Perform verification on the JP2s using jpylyzer
  4. Return the JP2s in images_migrated.zip on the memory key
  5. Place the log files of the process on the memory key as well
All of these operations should be visible either graphically or in a series of terminals,
so that delegates feel closely connected to the process.
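
The identification and grouping steps above could be sketched roughly as follows. This is an illustration only, not the demonstrator's actual code: the function names are hypothetical, identification is reduced to checking the classic TIFF header bytes (BigTIFF and deeper format checks are left out), and the batch size is an arbitrary parameter.

```python
import zipfile

# Classic TIFF files start with one of these 4-byte headers
# (little-endian "II" or big-endian "MM", followed by the number 42).
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def is_tiff(header):
    """Return True if the first four bytes look like a classic TIFF header."""
    return header[:4] in TIFF_MAGIC


def tiff_members(archive):
    """List the names of archive members whose content starts with a TIFF header."""
    names = []
    for info in archive.infolist():
        if info.is_dir():
            continue
        with archive.open(info) as handle:
            if is_tiff(handle.read(4)):
                names.append(info.filename)
    return names


def make_batches(names, batch_size):
    """Split the list of names into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [names[i:i + batch_size] for i in range(0, len(names), batch_size)]
```

For example, opening images.zip with zipfile.ZipFile and calling make_batches(tiff_members(archive), 10) would give groups of up to ten images that could each be handed to a different node.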
11:15 - 12:30 Migration and Quality Assurance
Starting with the tools, we look at migration and quality control tools
for images and at how these are invoked on a single machine
instance.

Introduce ImageMagick and jpylyzer and show how they are run for a single TIFF to
JPEG2000 conversion.
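
As a rough illustration of a single conversion followed by a validity check, the sketch below calls both tools from Python. It rests on several assumptions: ImageMagick is available as the `convert` command (newer releases use `magick`) and was built with JPEG2000 support, jpylyzer is on the PATH and writes its XML report to standard output, and the validity flag lives in an element named `isValid` or `isValidJP2` depending on the jpylyzer version. The function names are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET


def convert_command(tiff_path, jp2_path):
    """Build the ImageMagick call; the output format follows the .jp2 extension."""
    return ["convert", tiff_path, jp2_path]


def jpylyzer_says_valid(xml_text):
    """Return True if the jpylyzer XML report marks the file as valid.

    Element names are matched without their namespace, because the
    namespace and the element name differ between jpylyzer versions
    (an assumption, check the output of the installed version).
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ("isValid", "isValidJP2"):
            return (element.text or "").strip() == "True"
    return False


def migrate_and_check(tiff_path, jp2_path):
    """Convert one TIFF to JP2, then ask jpylyzer whether the result is valid."""
    subprocess.run(convert_command(tiff_path, jp2_path), check=True)
    report = subprocess.run(
        ["jpylyzer", jp2_path], check=True, capture_output=True, text=True
    )
    return jpylyzer_says_valid(report.stdout)
```

Calling migrate_and_check("page.tif", "page.jp2") needs both tools installed; the two helper functions can be tried without them.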

Session Two

Learning outcomes:

3. Combine and modify tool chains in order to create automated workflows for migration and quality control. 

Time Outline Plan/Teacher Activity Activities & Resources
13:30 - 15:00 Workflows
With the tools explored, we introduce workflows and look at how      
these can be used to invoke multiple operations to both migrate 
content and run quality control checks on the results.

Again, this exercise is done on your own local instance and
is not built to scale.

Migrate an image using ImageMagick, then use jpylyzer to check that the result is a valid JPEG2000 (JP2) image.
Extension: Also integrate ExifTool to check metadata.
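
One possible shape for the metadata extension is sketched below in Python rather than as a workflow in the session's own tooling. It assumes ExifTool is on the PATH, that its `-json` option returns a list with one object per file, and that the tags `ImageWidth` and `ImageHeight` are reported for both the TIFF and the JP2. Which tags to compare is a choice for the exercise, and the function names are hypothetical.

```python
import json
import subprocess


def read_metadata(path):
    """Return ExifTool's metadata for one file as a dictionary."""
    result = subprocess.run(
        ["exiftool", "-json", path], check=True, capture_output=True, text=True
    )
    return json.loads(result.stdout)[0]


def metadata_mismatches(source_meta, target_meta, tags=("ImageWidth", "ImageHeight")):
    """Return {tag: (source value, target value)} for every tag that differs."""
    mismatches = {}
    for tag in tags:
        source_value = source_meta.get(tag)
        target_value = target_meta.get(tag)
        if source_value != target_value:
            mismatches[tag] = (source_value, target_value)
    return mismatches
```

An empty result from metadata_mismatches(read_metadata("page.tif"), read_metadata("page.jp2")) would suggest that the migrated image kept the original dimensions.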

Session Three

Learning outcomes:

4. Implement best practice for discovering and sharing workflows for use and re-use. 

Time Outline Plan/Teacher Activity Activities & Resources
15:15 - 16:30 How to share your workflow
Having built a workflow, we look at how to share it and how to
discover other workflows.


Register on myExperiment, describe and upload workflows.
Get delegates to upload their own workflow.

17 September

Session Four

Learning outcomes:

5. Set up a local test instance of a scalable environment and apply a number of workflows to automatically perform migration and quality assurance checks on a large number of objects.

6. Identify a number of potential problems in scalable environments and propose solutions.

Time Outline Plan/Teacher Activity Activities & Resources
09:30 - 12:30 Building Scalable Environments
This session introduces the Hadoop platform and its
application for executing preservation workflows in a distributed
environment.
More than just "getting the job done", we look at the tools for
monitoring and controlling complex operations at scale and
at how these can be used to identify potential problems.

By the end of this session each attendee should be set up with a working (pseudo-distributed)
Hadoop test installation on their laptop, running the same scripts as the demonstration cluster.
They should also be able to read and analyse the various log files in order to identify potential problems (e.g. tool version issues).
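
To give a feel for how a migration step might run on Hadoop, here is a minimal sketch of a mapper written in the Hadoop Streaming style, where each mapper reads input lines from standard input and writes tab-separated key/value records to standard output. It is an assumption that Streaming is used at all; the scripts on the demonstration cluster are not shown here. `placeholder_migrate` is a hypothetical stand-in for the real migrate-and-verify step.

```python
import sys


def map_lines(lines, migrate):
    """Yield one 'path<TAB>status' record for each non-empty input line."""
    for line in lines:
        path = line.strip()
        if not path:
            continue
        try:
            status = "OK" if migrate(path) else "INVALID"
        except Exception as error:
            # Keep the job running and record the failure so it shows up in the output.
            status = "ERROR " + type(error).__name__
        yield path + "\t" + status


def placeholder_migrate(path):
    """Hypothetical stand-in: a real mapper would convert and validate the file."""
    return path.lower().endswith((".tif", ".tiff"))


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Entry point for use as a streaming mapper script."""
    for record in map_lines(stdin, placeholder_migrate):
        stdout.write(record + "\n")
```

Saved as a script with a call to main() at the end, this could be tested locally by piping a list of file paths into it before trying it on the pseudo-distributed installation. Recording a status per file, rather than stopping at the first failure, makes problems easier to trace afterwards.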

Session Five

Learning outcomes:

5. Set up a local test instance of a scalable environment and apply a number of workflows to automatically perform migration and quality assurance checks on a large number of objects.

7. Understand the potential to use scalable platforms in digital preservation and synthesise new opportunities within your own environments. 

Time Outline Plan/Teacher Activity Activities & Resources
13:30 - 14:30 Invited talk: Introduction to the SCAPE repository reference implementation
This talk will introduce the SCAPE repository reference
implementation as a guide to get you started with using
eSciDoc on top of Hadoop and HDFS. It will discuss future
opportunities and the potential for scalability in
digital object management systems.
 
14:45 - 16:00 Integrating Taverna and Hadoop
This final session recaps the work that has been done to this point and allows attendees
to fully integrate a number of workflows (both of their own making as well as existing ones)
into their own scalable preservation platform. 


 
16:00 - 17:00 Panel and Wrap-up Discussion