Digital Preservation and Data Curation Requirements and Solutions
Monday 2 December
Time | Activity | Facilitator |
---|---|---|
09:00 - 09:20 | Coffee and registration | |
09:20 - 09:30 | Welcome and housekeeping | ONB |
09:30 - 10:00 | Introduction: overview of the event; how the hackathon will work; getting to know each other (Who are you? What do you do? What are you interested in working on at the event?) | Becky McGuinness, OPF; Carl Wilson, OPF |
10:00 - 10:20 | What is Hadoop? | Clemens Neudecker, National Library of the Netherlands |
10:20 - 11:00 | Scenario I: Web-Archiving: File Format Identification/Characterisation | Sven Schlarb, Austrian National Library |
11:00 - 11:15 | Coffee break | |
11:15 - 11:45 | Scenario I continued | Sven Schlarb, Austrian National Library; Carl Wilson, OPF |
11:45 - 12:30 | Scenario II: Digital Books: Quality Assurance, text mining (OCR Quality) | Stefan Majewski, Austrian National Library |
12:30 - 13:30 | Lunch | |
13:30 - 14:15 | Scenario II continued | Sven Schlarb, Austrian National Library |
14:15 - 15:15 | Big data and Twitter (https://github.com/lintool/warcbase/tree/pig, Pig scripts) | Jimmy Lin, University of Maryland |
15:15 - 15:30 | Coffee break | |
15:30 - 16:15 | Group work | Sven Schlarb, Austrian National Library; Carl Wilson, OPF |
16:15 - 16:30 | Wrap up | |
16:30 | Close | |
19:00 | Event dinner at Fromme Helene (http://www.frommehelene.at/). Please indicate when registering if you would like to attend. | |
Tuesday 3 December
Time | Activity | Facilitator |
---|---|---|
09:00 - 09:15 | Coffee | |
09:15 - 09:30 | Welcome back and overview of the day | |
09:30 - 10:00 | Hacking (wiki write-up: document brainstorming session ideas and develop a plan for the remaining days) | Carl Wilson, OPF |
10:00 - 11:00 | Advanced Hadoop and MapReduce | Jimmy Lin, University of Maryland |
11:00 - 11:15 | Coffee break | |
11:15 - 11:35 | SCAPE Tool-to-MapReduce Wrapper Hacking | Matthias Rella, Austrian Institute of Technology |
12:15 - 13:30 | Lunch (Tour of the State Hall 13:00 - 13:30) | |
13:30 - 14:00 | Hacking | Sven Schlarb, Austrian National Library; Carl Wilson, OPF |
14:00 - 14:30 | Update wiki plans and develop requirements | Clemens Neudecker, National Library of the Netherlands |
14:30 - 15:30 | HBase introduction and Warcbase project presentation | Jimmy Lin, University of Maryland |
15:30 - 15:45 | Coffee break | |
15:45 - 16:15 | Hacking | Sven Schlarb, Austrian National Library; Carl Wilson, OPF |
16:15 - 17:00 | Group presentations | Sven Schlarb, Austrian National Library; Carl Wilson, OPF |
17:00 | Close | |
Wednesday 4 December
Time | Practitioner Activity | Facilitator |
---|---|---|
09:00 - 09:15 | Coffee | |
09:15 - 09:30 | Welcome back and overview for the day | ONB / OPF |
09:30 - 10:30 | Update requirements / wiki write ups | Clemens Neudecker, National Library of the Netherlands |
10:30 - 10:45 | Coffee break | |
10:45 - 12:15 | Adventures in implementing Hadoop: sharing experiences of implementing Hadoop in a real production environment | Sven Schlarb, Austrian National Library; Carl Wilson, OPF |
12:15 - 13:15 | Lunch | |
13:15 - 13:45 | Hacking | Sven Schlarb, Austrian National Library; Carl Wilson, OPF |
13:45 - 14:15 | Final hacking session: write-ups, prepare presentations, check in code | Sven Schlarb, Austrian National Library; Carl Wilson, OPF |
14:15 - 15:30 | Final report back: presentations to the group | Sven Schlarb, Austrian National Library; Carl Wilson, OPF |
15:30 - 15:45 | Coffee break and voting | |
15:45 - 16:30 | *Competition winners announced*; event evaluation and wrap up | Sven Schlarb, Austrian National Library; Carl Wilson, OPF |
16:30 | Close | |
Potential other topics:
- Costs of setting up a Hadoop cluster
- Image processing
- OCR at scale
- Operating Hadoop (lessons learned)