
h3. Helpful pages

* [SPR:The SPRUCE Mashup Manifesto]
* [REQ:Digital Preservation and Data Curation Requirements and Solutions]
* [PT:Developing with the OPF]

h3. Monday 2 December

|| Time || Activity || Facilitator ||
| 09:00 - 09:20 | Coffee and registration | |
| 09:20 - 09:30 | Welcome and housekeeping | ONB |
| 09:30 - 10:00 | *Introduction* \\
Overview of the event \\
How the hackathon will work \\
\\
*Getting to know each other* \\
Who are you? What do you do? \\
What are you interested in working on at the event? | \\
Becky McGuinness, OPF \\
\\
\\
\\
Carl Wilson, OPF |
| 10:00 - 10:20 | [What is Hadoop? |^What is Hadoop.ppt] | Clemens Neudecker, National Library of the Netherlands \\ |
| 10:20 - 11:00 | *Scenario I:* \\
[Web-Archiving: File Format Identification/Characterisation |^01-scenario1-webarchiving.ppt] | Sven Schlarb, Austrian National Library |
| 11:00 - 11:15 | Coffee break | |
| 11:15 - 11:45 | *Scenario I continued...* \\
* DROID format identification
* Demo: Tika format identification
* Demo: Tika characterisation
* Practical exercises | Sven Schlarb, Austrian National Library \\
Carl Wilson, OPF \\ |
| 11:45 - 12:30 | *Scenario II:* \\
*[Digital Books: Quality Assurance, text mining (OCR Quality)|^ABO_intro_usecase.pdf]* | Stefan Majewski, Austrian National Library \\ |
| 12:30 - 13:30 | Lunch | |
| 13:30 - 14:15 | *Scenario II continued...* \\
* Demo: Book level processing (METS)
* Demo: Page level processing (JPEG2000, OCR)
* Practical exercises | Sven Schlarb, Austrian National Library |
| 14:15 - 15:15 | *Big data and Twitter* \\
[https://github.com/lintool/warcbase/tree/pig|https://github.com/lintool/warcbase/tree/pig]\\
[Pig scripts|^hackathon-demo.txt]\\ | Jimmy Lin, University of Maryland \\ |
| 15:15 - 15:30 | Coffee break | |
| 15:30 - 16:15 | *Group work*  \\
* Divide into groups by scenario/topic
* Brainstorm ideas for development work
* Assign topic owners
* Agree on who will [document the progress on the wiki|REQ:Digital Preservation and Data Curation Requirements and Solutions] | Sven Schlarb, Austrian National Library \\
Carl Wilson, OPF |
| 16:15 - 16:30 | Wrap up | |
| 16:30 | *Close* | |
| 19:00 | Event dinner at Fromme Helene ([http://www.frommehelene.at/|http://www.frommehelene.at/]) \\
(Please indicate if you would like to attend when registering) | |

h3. Tuesday 3 December

|| Time || Activity || Facilitator ||
| 09:00 - 09:15 | Coffee | |
| 09:15 - 09:30 | Welcome back and overview of the day | |
| 09:30 - 10:00 | *Hacking* \\
(Wiki write-up: document the ideas from the brainstorming session and \\
develop a plan for the remaining days) | Carl Wilson, OPF |
| 10:00 - 11:00 | *Advanced Hadoop and MapReduce* \\ | Jimmy Lin, University of Maryland |
| 11:00 - 11:15 | Coffee break | |
| 11:15 - 11:35 | *SCAPE Tool-to-MapReduce Wrapper* \\
*Hacking* | Matthias Rella, Austrian Institute of Technology |
| 12:15 - 13:30 | Lunch \\
(Tour of the State Hall 13:00 - 13:30) | |
| 13:30 - 14:00 | *Hacking* \\ | Sven Schlarb, Austrian National Library  \\
Carl Wilson, OPF |
| 14:00 - 14:30 | *Update wiki plans and develop requirements* | Clemens Neudecker, National Library of the Netherlands |
| 14:30 - 15:30 | *HBase introduction and Warcbase project presentation* | Jimmy Lin, University of Maryland |
| 15:30 - 15:45 | Coffee break | |
| 15:45 - 16:15 | *Hacking* | Sven Schlarb, Austrian National Library  \\
Carl Wilson, OPF \\ |
| 16:15 - 17:00 | *Group presentations* | Sven Schlarb, Austrian National Library  \\
Carl Wilson, OPF \\ |
| 17:00 | Close | |

h3. Wednesday 4 December

|| Time || Activity || Facilitator ||
| 09:00 - 09:15 | Coffee | |
| 09:15 - 09:30 | Welcome back and overview for the day | ONB / OPF |
| 09:30 - 10:30 | *Update requirements / wiki write ups* | Clemens Neudecker, National Library of the Netherlands |
| 10:30 - 10:45 | Coffee break | |
| 10:45 - 12:15 | *Adventures in implementing Hadoop* \\
Sharing experiences of implementing Hadoop in a real production \\
environment | Sven Schlarb, Austrian National Library \\
Carl Wilson, OPF |
| 12:15 - 13:15 | Lunch | |
| 13:15 - 13:45 | *Hacking* | Sven Schlarb, Austrian National Library  \\
Carl Wilson, OPF |
| 13:45 - 14:15 | Final hacking session \\
*Write up results and prepare presentations* \\
Check in code | Sven Schlarb, Austrian National Library \\
Carl Wilson, OPF |
| 14:15 - 15:30 | *Final report back - presentations to the group* | Sven Schlarb, Austrian National Library  \\
Carl Wilson, OPF \\ |
| 15:30 - 15:45 | Coffee break and voting | |
| 15:45 - 16:30 | {color:#339966}*Competition winners announced*{color}\\
*Event evaluation and wrap-up* | Sven Schlarb, Austrian National Library \\
Carl Wilson, OPF |
| 16:30 | Close | |

*Potential other topics:*

* Costs of setting up a Hadoop cluster
* Image processing
* OCR at scale
* Operating Hadoop (lessons learned)