h3. Event Evaluation Survey

Please complete the event evaluation survey at: http://www.surveymonkey.com/s/TCZHBTD. 

h4. Learning outcomes (by the end of the training event attendees will be able to):

{panel}
# Distinguish between different file types and identify the requirements for characterising each of them.
# Carry out a number of identification, characterisation, and duplication detection experiments on example files.
# Critically evaluate characterisation and identification tools and assess their advantages and disadvantages when used in different scenarios.
# Compare and contrast the differences in running characterisation and identification tools both stand-alone and within workflows.
# Envisage a system that combines workflows with identification, characterisation and validation tools to suit a variety of scenarios.
# Conduct an in-depth analysis of large volumes of identification and characterisation data and find representative sample records suitable for preservation planning experiments.{panel}


h4. Thursday 6 December (the agenda is subject to change)



|| Time || Session || Facilitator || Learning outcomes ||
| 09.30 - 10.00 | Registration | | |
| 10.00 - 10.15 | Welcome and housekeeping | Miguel Ferreira, KEEPS | |
| 10.15 - 11.15 | *Introduction to file formats* \\
Understanding the different requirements for identification and characterisation experiments \\
\\
*File format identification and characterisation tools: file, fido, tika, exiftool* \\
What can they do? \\
File Format Identification, File Format Characterisation, \\
File Format Validation, File Format Signature Files \\ | \\
Carl Wilson, OPF \\
Dave Tarrant, OPF | 1 |
| 11.15 - 11.30 | Coffee | | |
| 11.30 - 12.45 | *Applying file format tools to different scenarios* (demonstrations) \\
How do they compare? | Carl Wilson, OPF  \\
Dave Tarrant, OPF | 1 |
| 12.45 - 13.45 | Lunch | | |
| 13.45 - 15.15 | *Break out groups: practical exercises* \\
Creating file format profiles with an example dataset \\
Command line processing \\
\\
Evaluation of the results \\ | Carl Wilson, OPF \\
Dave Tarrant, OPF \\
\\ | 2 |
| 15.15 - 15.30 | Coffee | | |
| 15.30 - 16.30 | *Wrapping tools for identification and characterisation* \\
FITS (File Information Tool Set)  \\
\\
*Panel session: advantages and disadvantages of wrapping tools* \\
Q&A | Petar Petrov, TUWIEN \\
\\
\\
All | 3 \\ |
| 16.30 - 17.00 | Wrap up | Dave Tarrant, OPF | |
| 17.00 | Close | | |
| 20.00 | Event dinner | | |

h4. Open Feedback - Day 1

* Need to provide cheat sheets for each tool so people can choose the options and experiments they wish to run.
* The datasets need to be unzipped in the virtual machines, ready for instant use.
* Need to focus more closely on what data is output by the different tools for different formats.
* Need to get to the quickscripts that use some of the tools in more complex ways to produce summaries in Excel.
* Need to put FITS on the machines and allow people to look at a FITS profile.
* The discussion at the end of the day was effective and a good result.


h4. Friday 7 December

|| Time || Session || Facilitator || Learning outcomes ||
| 09.15 - 09.30 | Welcome back, overview of agenda for the day | Dave Tarrant, OPF | |
| 09.30 - 10.15 | *Content profiling and planning*   \\
Introduction and motivation of large-scale content profiling for preservation analysis | Petar Petrov, TUWIEN \\ | 5 |
| 10.15 - 10.45 | *Practical exercise:* analysing an example scenario file set without a content profiler   \\
Discussion of results | Petar Petrov, TUWIEN \\ | 6 |
| 10.45 - 11.00 | Coffee | | |
| 11.00 - 11.30 | *c3po* (A content profiling prototype) demonstration of the tool and its capabilities \\ | Petar Petrov, TUWIEN | 6 |
| 11.30 - 12.00 | *Practical exercise:* analysing the scenario file set using c3po   \\
Comparing the results and lessons learned | Petar Petrov, TUWIEN | 6 |
| 12.00 - 12.30 | *Quality control for digital collections: the matchbox tool*   \\
Identifying duplicate images in digital collections | Roman Graf, AIT | 4 |
| 12.30 - 13.30 | Lunch and presentation of certificates | | |
| 13.45 - 15.15 | *Using file format identification tools as part of a workflow* \\
Introduction to Taverna workflows \\
Demonstration: Web archive content identification over ARC files \\
using tika in a Taverna workflow \\ | Sven Schlarb, ONB | 4 |
| 15.15 - 15.30 | Coffee | | |
| 15.30 - 16.30 | *Comparing the Taverna workflow with a DROID version of the workflow* \\
Introduction to file format identification using a Hadoop cluster (demonstration) \\
Understanding the implementation differences \\ | Sven Schlarb, ONB \\ | 4 \\ |
| 16.00 - 16.30 | Comparison of results | Sven Schlarb, ONB \\ | |
| 16.30 - 17.00 | Wrap up discussion and event evaluation | Dave Tarrant, OPF \\ | |
| 17.00 | Close | | |

h4. Open Feedback - Day 2

* Need to make sure data is pre-loaded into the VMs. Also, the contents of data.zip didn't seem to hold any significance (e.g. documents, images, etc.).
* A recap was good, but needs to be more in line with day 1.
* c3po needs a bit of work to ensure that it is running and that it is clear how to find a collection.
* The graphs and discussion on why the system is modular worked well. 
* Perhaps matchbox should have a practical exercise; maybe something to add to the machine images for next time.
