Helpful pages

The SPRUCE Mashup Manifesto

Digital Preservation and Data Curation Requirements and Solutions

Developing with the OPF

Monday 2 December

Time Activity Facilitator
09:00 - 09:20 Coffee and registration  
09:20 - 09:30 Welcome and housekeeping ONB
09:30 - 10:00 Introduction
Overview of the event
How the hackathon will work

Getting to know each other
Who are you? What do you do?
What are you interested in working on at the event?

Becky McGuinness, OPF
Carl Wilson, OPF
10:00 - 10:20 What is Hadoop? Clemens Neudecker, National Library of the Netherlands
10:20 - 11:00 Scenario I: 
Web-Archiving: File Format Identification/Characterisation
Sven Schlarb, Austrian National Library
11:00 - 11:15 Coffee break  
11:15 - 11:45 Scenario I continued...
  • DROID format identification
  • Demo: Tika format identification
  • Demo: Tika characterisation
  • Practical exercises
Sven Schlarb, Austrian National Library
Carl Wilson, OPF
11:45 - 12:30 Scenario II:
Digital Books: Quality Assurance, Text Mining (OCR Quality)
Stefan Majewski, Austrian National Library
12:30 - 13:30 Lunch  
13:30 - 14:15 Scenario II continued...
  • Demo: Book level processing (METS)
  • Demo: Page level processing (JPEG2000, OCR)
  • Practical exercises
Sven Schlarb, Austrian National Library
14:15 - 15:15 Big data and Twitter
https://github.com/lintool/warcbase/tree/pig
Pig scripts
Jimmy Lin, University of Maryland
15:15 - 15:30 Coffee break  
15:30 - 16:15 Group work 
Sven Schlarb, Austrian National Library
Carl Wilson, OPF
16:15 - 16:30 Wrap up  
16:30 Close  
19:00 Event dinner at Fromme Helene (http://www.frommehelene.at/)
(Please indicate if you would like to attend when registering)
 

Tuesday 3 December

Time Activity Facilitator
09:00 - 09:15 Coffee  
09:15 - 09:30 Welcome back and overview of the day  
09:30 - 10:00 Hacking
(Wiki write-up: document brainstorming session ideas and
develop a plan for the remaining days)
Carl Wilson, OPF
10:00 - 11:00 Advanced Hadoop and MapReduce
Jimmy Lin, University of Maryland
11:00 - 11:15 Coffee break  
11:15 - 11:35 SCAPE Tool-to-MapReduce Wrapper
Hacking
Matthias Rella, Austrian Institute of Technology
12:15 - 13:30 Lunch
(Tour of the State Hall 13:00 - 13:30)
 
13:30 - 14:00 Hacking
Sven Schlarb, Austrian National Library 
Carl Wilson, OPF
14:00 - 14:30 Update wiki plans and develop requirements Clemens Neudecker, National Library of the Netherlands
14:30 - 15:30 HBase introduction and Warcbase project presentation Jimmy Lin, University of Maryland
15:30 - 15:45 Coffee break  
15:45 - 16:15 Hacking Sven Schlarb, Austrian National Library 
Carl Wilson, OPF
16:15 - 17:00 Group presentations Sven Schlarb, Austrian National Library 
Carl Wilson, OPF
17:00 Close  

Wednesday 4 December

Time Activity Facilitator
09:00 - 09:15 Coffee  
09:15 - 09:30 Welcome back and overview of the day ONB / OPF
09:30 - 10:30 Update requirements / wiki write-ups Clemens Neudecker, National Library of the Netherlands
10:30 - 10:45 Coffee break  
10:45 - 12:15 Adventures in implementing Hadoop
Sharing experiences of implementing Hadoop in a real production
environment
Sven Schlarb, Austrian National Library 
Carl Wilson, OPF
12:15 - 13:15 Lunch  
13:15 - 13:45 Hacking Sven Schlarb, Austrian National Library 
Carl Wilson, OPF
13:45 - 14:15 Final hacking session 
Write-ups and presentation preparation
Check in code
Sven Schlarb, Austrian National Library 
Carl Wilson, OPF
14:15 - 15:30 Final report-back: presentations to the group Sven Schlarb, Austrian National Library 
Carl Wilson, OPF
15:30 - 15:45 Coffee break and voting  
15:45 - 16:30 *Competition winners announced*
Event evaluation and wrap up
Sven Schlarb, Austrian National Library 
Carl Wilson, OPF
16:30 Close  

Potential other topics:

  • Costs of setting up a Hadoop cluster
  • Image processing
  • OCR at scale
  • Operating Hadoop (lessons learned)