To sign up for a wiki account visit: http://jira.opf-labs.org/secure/Signup!default.jspa
Below is our preliminary agenda for the OPF Cologne Hackathon. In order to meet our members' needs, we invite you to comment and provide feedback on the topics that you would like to be addressed at this event.
For information about getting there, hotels and the event dinner visit: http://www.openplanetsfoundation.org/events/2011-08-31-opf-hackathon
Register at: http://opf-hackathon-cologne.eventbrite.com/
AGENDA
31 August: Coffee and registration from 9.30am. Sessions 10am - 4pm (Event dinner 7.30pm)
Morning
Welcome from Manfred Thaller
Introduction from Bram van der Werf
Demo of remote emulation and automation - Dirk von Suchodoletz and Isgandar Valizada, University of Freiburg
11.15 - 11.30 Coffee break
Emulation continued. Discussion and outlining use cases
13.00 - 14.00 Lunch
Afternoon
File identification tools - Maurice de Rooij
This afternoon's session will survey the available file identification tools, their features and usage. The main differences between the tools will be outlined, including performance and scalability. The aim is to build a better understanding of the different tools and why each one differs even though the core functionality remains the same.
The second session will begin with a recap of each tool's current level of support and emphasise why this matters in a changing landscape of file types. Participants will then investigate the tools themselves and discuss their relative merits. Finally, a discussion of the requirements and shortcomings of each tool should lead to short-term actions for the OPF community.
15.15 - 15.30 Coffee break
The OPF-REF
This session will build on the morning's file identification sessions and introduce the OPF Results Evaluation Framework. The OPF-REF has been designed to objectively compare preservation tools against a large corpus of test data. All results will also be published in raw form according to Linked-Data guidelines, allowing easy re-processing and further experiments to be carried out.
Late Afternoon Session
This is an open session focussing on the subjects of the first day. There will be opportunities here for ad-hoc focus groups to form and chat/hack on the REF and file evaluation tools.
16.00 Close
19.30 Event dinner
1 September: 9am - 5pm
09.00 - 09.15 Coffee
09.15 Morning session
Testbed Emulation demo - Isgandar Valizada, University of Freiburg
KEEP presentation - Antonio Ciuffreda, University of Portsmouth
11.15 - 11.30 Coffee break
Developers - Hacking (OPF REF) facilitated by David Tarrant, or Emulation facilitated by Dirk/Isgandar
Practitioners - Use cases and requirements session facilitated by Bram van der Werf
13.00 - 14.00 Lunch
14.00 Afternoon session
Database Archiving
Demo by the Danish State Archives of a SIARD implementation
(15.00 - 15.15 Coffee break)
15:15-17:00 Break-out sessions hacking (REF & Emulation Service), requirement discussions
17.00 Close
2 September: 9am - 4pm
09.00 - 09.15 Coffee
09.15 Morning session
Plato Demonstration - Carl Wilson
This morning's sessions will focus on preservation planning, starting with a demonstration of the Plato tool. Over the years Plato has been developed substantially as part of many research projects and, as such, has become the most advanced and perhaps most complex tool available for preservation planning. The idea of this first session is to introduce the key stages of Plato and give people a grasp of the important aspects of preservation planning.
11.00 - 11.15 Coffee break
Discussion and hacking on XML schemas
12.30 - 13.30 Lunch
Afternoon
Session 2 will look at the complexity of the Plato tool and address some of the maintainability issues encountered within the OPF project. This will lead into a discussion of how Plato does not have to be used for every stage of the planning process: the XML preservation plans mean that Plato can interoperate with other tools, which can be developed on a smaller scale to allow easier maintenance.
14.45 - 15.00 Coffee break
Late afternoon
Closing thoughts and Panel
Subject to time, this last session will allow more open discussion/hacking followed by a short closing session.
16.00 Close
Requirements & Hack Session: Results Evaluation Framework.
- DROID
- FIDO
- File Utility
- Automated execution of format identification tools against corpora
- Storage of results in a consistent and suitable form
- A way of comparing results; to start with this can just be comparing a tool against itself (e.g. DROID 6 with DROID 4).
- A corpus of old results, as complete as possible; during development we might start gathering results for old versions of the software and their supporting sig/magic files.
- Results not just identifiable by tool and version, but also by supporting data, i.e. NOT DROID v4, but DROID v4 SigFile v24.
The hacking challenge is to first put a framework in place that can achieve the above.
Further requirements will come from the techies hacking and from the content owners.
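The requirements above can be sketched in a few lines of code. This is a hypothetical illustration, not the actual OPF-REF design: results are keyed by tool, version, AND supporting data (so "DROID v4, SigFile v24" is a different run from "DROID v4, SigFile v45"), and comparison is just a diff between two runs.

```python
from collections import defaultdict

class ResultStore:
    """Toy result store keyed by (tool, version, supporting data)."""

    def __init__(self):
        # (tool, version, sigfile) -> {file path -> identified format}
        self._runs = defaultdict(dict)

    def record(self, tool, version, sigfile, path, fmt):
        self._runs[(tool, version, sigfile)][path] = fmt

    def compare(self, run_a, run_b):
        """Return the paths where two runs disagree."""
        a, b = self._runs[run_a], self._runs[run_b]
        return {p: (a.get(p), b.get(p))
                for p in set(a) | set(b)
                if a.get(p) != b.get(p)}

# Comparing a tool against itself, e.g. DROID 6 vs DROID 4
# (the PUIDs here are made up for the example):
store = ResultStore()
store.record("DROID", "4", "SigFile v24", "report.doc", "fmt/40")
store.record("DROID", "6", "SigFile v45", "report.doc", "fmt/412")
diffs = store.compare(("DROID", "4", "SigFile v24"),
                      ("DROID", "6", "SigFile v45"))
```

A real framework would persist runs (e.g. as Linked Data, per the OPF-REF session) rather than hold them in memory, but the key structure is the point: the supporting signature file is part of the run identity.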
Requirements & Roadmap for Characterisation Tools
Broadly, where should we put the available effort when it comes to characterising content?
- Should we extend XCL, or focus on JHOVE2?
- Which platform do we have the most expertise with (i.e. is the Qt/C++ that XCL depends on a problem?)
- Can we merge the two efforts?
- Where does FITS fit in? Should we even be writing our own tools when we have alternatives that others use, e.g. Apache Tika?
Tools to be aware of:
- Developed by the preservation community:
- DROID
- Fido
- FITS
- JHOVE
- JHOVE2
- XCL Tools (from Planets)
- NZ Metadata Extractor
- Developed by others
- file
- JMimeMagic (a Java implementation covering a good chunk of file's functionality)
- Apache Tika
- And many more, for specific media types, e.g. Apache PDFBox.
Some proposed requirements
- Identify single bitstreams
- Identify container bitstreams and go inside.
- Identify aggregate objects (n bitstreams in an arrangement)
- Identify non-bitstream objects (folders, URIs, NULL?)
- Identify XML: its root schema, namespace, encoding, and encodings on the inside.
- Combine identification tools and resolve the differences (FITS?)
- Extract summary properties that describe the object encoding (FITS?)
- Perform deep analysis of significant characteristics (JHOVE2)
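To make the XML requirement above concrete, here is a minimal sketch (a hypothetical helper, using only the Python standard library) that pulls out an XML document's root element name, its namespace, and the encoding declared in the XML declaration:

```python
import re
import xml.etree.ElementTree as ET

def identify_xml(data: bytes):
    """Return (root namespace, root element name, declared encoding)."""
    root = ET.fromstring(data)
    # ElementTree expands qualified tags to "{namespace}localname"
    if root.tag.startswith("{"):
        ns, local = root.tag[1:].split("}", 1)
    else:
        ns, local = None, root.tag
    # The declared encoding, if any, sits in the XML declaration;
    # XML defaults to UTF-8 when no declaration is present.
    m = re.search(rb'encoding=["\']([A-Za-z0-9._-]+)["\']', data[:100])
    encoding = m.group(1).decode("ascii") if m else "UTF-8"
    return ns, local, encoding

mets = b'<?xml version="1.0" encoding="ISO-8859-1"?>' \
       b'<mets:mets xmlns:mets="http://www.loc.gov/METS/"/>'
# identify_xml(mets) -> ("http://www.loc.gov/METS/", "mets", "ISO-8859-1")
```

"Encodings on the inside" (e.g. base64 payloads in element content) would need schema-aware inspection and is left out of this sketch.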
Database Archiving Requirements
- Allow some hacking here as well, just to get an idea of feasibility of approaches etc.
- We do need some clear requirements.
- See Database Archiving & Migration Tools
PLATO demo
A demonstration of the Plato tool. Over the years Plato has been developed substantially as part of many research projects and, as such, has become the most advanced and perhaps most complex tool available for preservation planning. The idea of this first session is to introduce the key stages of Plato and give people a grasp of the important aspects of preservation planning.
Format identification tools (DROID and file) training
Does this match with the expected audience?
- History of the tools and where to obtain them
- Installation
- How to run the tools
- Command line tools training, piping of input, redirection of error and output, using batch files
- What to do with the results
- Where the REF fits in
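The batch-running pattern covered in the training can be sketched as a small wrapper: run an identification command over many files, keep the per-file output, and redirect errors to a log. This is an illustration, not part of the training material; the `file --brief --mime-type` invocation shown in the comment is standard, but the wrapper itself is hypothetical.

```python
import subprocess
import sys

def identify_batch(cmd, paths, errlog=sys.stderr):
    """Run an identification command once per file, collecting stdout
    per file and redirecting any error output to a log stream."""
    results = {}
    for path in paths:
        proc = subprocess.run(cmd + [path], capture_output=True, text=True)
        if proc.returncode == 0:
            results[path] = proc.stdout.strip()
        else:
            # error redirection: failures go to the log, not the results
            print(f"{path}: {proc.stderr.strip()}", file=errlog)
    return results

# e.g. identify_batch(["file", "--brief", "--mime-type"], corpus_paths)
```

The same wrapper works for any command-line identification tool that writes one result to stdout, which is the shape the REF needs for automated execution against corpora.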
Emulation Demo
Demo of remote emulation and automation (University of Freiburg)
The first three demos are the vncrecord, vncreplay and qemu web services deployed within Planets. They implement the CreateView interface. In each case they will be consumed through JSP web pages, accessed via a web browser.
- Using vncrecord we will demonstrate recording user input events performed on an emulated OS image: for example, recording a user session that accesses the injected obsolete digital object (DO) using a corresponding application pre-installed in the chosen OS image.
- In the second step the vncreplay service will reproduce these input events (under the same conditions, but with different DOs) in an unattended way, as if they were performed by a user.
- The qemu service simply demonstrates the possibility of remote emulation (similar to our earlier GRATE, but now Planets-compatible).
The last demo is GRATE-R, a service implementing the Migrate interface, also deployed within Planets. This service accepts a DO and uses a prerecorded user session to convert the DO to a format of interest. Currently we support SAM > PDF+TXT and WPD > RTF migration. Testing will be done through the Testbed by assigning the corresponding experiment and connecting via the RFB protocol to watch the conversion in real time.