
Event Evaluation Survey
Please complete the event evaluation survey at: http://www.surveymonkey.com/s/TCZHBTD.
Learning outcomes (by the end of the training event attendees will be able to):
1. Distinguish between different file types and identify the requirements for characterising each of them.
2. Carry out a number of identification, characterisation, and duplication detection experiments on example files.
3. Critically evaluate characterisation and identification tools and assess their advantages and disadvantages when used in different scenarios.
4. Compare and contrast the differences in running characterisation and identification tools both stand-alone and within workflows.
5. Envisage a system that combines workflows with identification, characterisation and validation tools to suit a variety of scenarios.
6. Conduct an in-depth analysis of large volumes of identification and characterisation data and find representative sample records suitable for preservation planning experiments.
Thursday 6 December (the agenda is subject to change)
Time | Session | Facilitator | Learning outcomes |
---|---|---|---|
09.30 - 10.00 | Registration | ||
10.00 - 10.15 | Welcome and housekeeping | Miguel Ferreira, KEEPS | |
10.15 - 11.15 | Introduction to file formats. Understanding the different requirements for identification and characterisation experiments. File format identification and characterisation tools: file, fido, tika, exiftool. What can they do? File Format Identification, File Format Characterisation, File Format Validation, File Format Signature Files | Carl Wilson, OPF; Dave Tarrant, OPF | 1 |
11.15 - 11.30 | Coffee | ||
11.30 - 12.45 | Applying file format tools to different scenarios (demonstrations). How do they compare? | Carl Wilson, OPF; Dave Tarrant, OPF | 1 |
12.45 - 13.45 | Lunch | ||
13.45 - 15.15 | Break out groups: practical exercises. Creating file format profiles with an example dataset. Command line processing. Evaluation of the results | Carl Wilson, OPF; Dave Tarrant, OPF | 2 |
15.15 - 15.30 | Coffee | ||
15.30 - 16.30 | Wrapping tools for identification and characterisation. FITS (File Information Tool Set). Panel session: advantages and disadvantages of wrapping tools. Q&A | Petar Petrov, TUWIEN; All | 3 |
16.30 - 17.00 | Wrap up | Dave Tarrant, OPF | |
17.00 | Close | ||
20.00 | Event dinner | ||
Open Feedback - Day 1
- Need to provide cheat sheets for each tool so people can choose the options and experiments they wish to run.
- The datasets need to be unzipped in the virtual machines, ready for instant use.
- Need to focus a little more closely on what data the different tools output for different formats.
- Need to get to the quickscripts that use some of the tools in more complex ways to produce summaries in Excel.
- Need to put FITS on the machines and allow people to look at a FITS profile.
- The discussion at the end of the day was effective, a good result.
Friday 7 December
Time | Session | Facilitator | Learning outcomes |
---|---|---|---|
09.15 - 09.30 | Welcome back, overview of agenda for the day | Dave Tarrant, OPF | |
09.30 - 10.15 | Content profiling and planning. Introduction and motivation of large-scale content profiling for preservation analysis | Petar Petrov, TUWIEN | 5 |
10.15 - 10.45 | Practical exercise: analysing an example scenario file set without a content profiler. Discussion of results | Petar Petrov, TUWIEN | 6 |
10.45 - 11.00 | Coffee | ||
11.00 - 11.30 | c3po (a content profiling prototype): demonstration of the tool and its capabilities | Petar Petrov, TUWIEN | 6 |
11.30 - 12.00 | Practical exercise: analysing the scenario file set using c3po. Comparing the results and lessons learned | Petar Petrov, TUWIEN | 6 |
12.00 - 12.30 | Quality control for digital collections: the matchbox tool. Identifying duplicate images in digital collections | Roman Graf, AIT | 4 |
12.30 - 13.30 | Lunch and presentation of certificates | ||
13.45 - 15.15 | Using file format identification tools as part of a workflow. Introduction to Taverna workflows. Demonstration: Web archive content identification over ARC files using tika in a Taverna workflow | Sven Schlarb, ONB | 4 |
15.15 - 15.30 | Coffee | ||
15.30 - 16.30 | Comparing the Taverna workflow with a DROID version of the workflow. Introduction to file format identification using a Hadoop cluster (demonstration). Understanding the implementation differences | Sven Schlarb, ONB | 4 |
16.00 - 16.30 | Comparison of results | Sven Schlarb, ONB | |
16.30 - 17.00 | Wrap up discussion and event evaluation | Dave Tarrant, OPF | |
17.00 | Close | ||
Open Feedback - Day 2
- Need to make sure data is pre-loaded into VMs. Also, the contents of data.zip didn't seem to hold any significance, e.g. documents, images, etc.
- A recap was good, but needs to be more in line with day 1.
- c3po needs a bit of work to ensure it is running and it is clear how to find a collection.
- The graphs and discussion on why the system is modular worked well.
- Perhaps matchbox should have a practical exercise; maybe something to add to the machine images for next time.