

Andy, Ivan, Peter
Ivan working with Jenny to see if the MSR tool fits her needs.
Peter looking into QA work.


Andy, Carl, Ivan, Markus, Peter, Sven

Andy: Tidying up DROID and Nanite code (incl. the Hadoop task).
Carl: Travel. Jenkins.
Ivan: Preparing scenarios and issue ideas.
Markus: Working towards understanding Hadoop. Travel.
Peter: Travel prep.
Sven: Discussing scenarios and definitions. Are they really strict triples? Sven would prefer it if so. Also, make it clear that a solution uses other solutions. Also testing Asger's workflow.

No more virtual stand-up meetings this week, as most or all of us are at the SCAPE workshop.


Andy, Carl, Markus, Peter, Sven

Carl: Travel.
Markus: Packaging files into sequence files for Hadoop processing.
Peter: Finishing RegEx patch for Tika.
Sven: How to use sequence files with the mapper.


Andy, Markus, Peter

Markus: Looking at porting Taverna workflows to Hadoop.
Peter: Mostly travel, plus the Tika RegEx patch.


Andy, Carl, Markus, Peter, Sven

Andy: Looking at making a Tika signature for the new iBooks format. Also tested tracing web pages.
Carl: Testing lots of JP2s, thinking of testing PDFs (~1 million).
Markus: Karmasphere for Hadoop, any good?
Peter: Debugged RegEx patch for Tika, traced broken unit tests.
Sven: Sketching REF usage for results.


Andy, Carl, Ivan, Markus, Peter, Rui, Shaul & Nir
Akubra API integrated into Rosetta, following Fedora (i.e. an experimental HDFS backend).
Q: Is Akubra stable?
Q: Can Akubra support spooling to file?
TODO: Ask Frank about these things.


Andy, Carl, Helder, Ivan, Markus, Peter, Sven.


Andy, Carl, Ivan, Markus, Peter, Sven
OPTIONAL: Performance measurement
Using Taverna wall-clock time, ONB reports highly variable results, e.g. differences by a factor of 2 to 3 (local and SSH execution).


Andy, Carl, Markus, Peter, Sven


Andy, Carl, Davetaz, Markus, Peter, Sven, Ivan

NOTE: New REST service code on GitHub.
IMPACT -> SCAPE server, plus a Hadoop pseudo-cluster from Debian packages.

TODO: Call for more content? Content creation event?


Andy, Carl, Peter, Asger, Ivan, Markus.

TODO: Put Asger's ID Python code on GitHub?
TODO: Share ID code between Markus and Asger at some point.


Asger: Test framework for characterisation tools; some minor Python issues. Took the GovDocs ground truth and appended the correct MIME types. Using Fido.
Carl: As Andy.
Ivan: Will respond to scenario workshop. GovDocs has a lot of office files, so it may be a useful test data set. Note that @davetaz has arranged this corpus into zips by file extension:
Markus: Hit a 'Too many open files' error. ANJ hit the same issue with the DROID codebase, which aggressively opens file handles without really needing to.
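The usual remedy for this class of error is to scope each file handle so it is closed as soon as that file has been processed, instead of holding many handles open at once. The sketch below illustrates the pattern in Python. It is a generic, hypothetical example (`sha256_of` and `hash_all` are made-up names for illustration) and is not DROID's actual code, which is Java.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash one file, closing its handle as soon as we are done with it (hypothetical helper)."""
    digest = hashlib.sha256()
    # The `with` block guarantees the handle is closed even if reading fails,
    # so at most one descriptor is held open per call.
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_all(paths):
    """Process files one at a time rather than opening them all up front (hypothetical helper)."""
    return {p: sha256_of(p) for p in paths}


if __name__ == "__main__":
    # Create more files than a common default per-process descriptor limit
    # (often 1024 on Linux), then hash them all without hitting that limit.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(2000):
            p = os.path.join(tmp, f"file_{i}.bin")
            with open(p, "wb") as fh:
                fh.write(str(i).encode())
            paths.append(p)
        print(len(hash_all(paths)))
```

Because each call opens and closes exactly one handle, the number of open descriptors stays constant no matter how many files are processed.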


Asger: GovDocs-based test harness. Using the ground truth file; parsing it is difficult.
Carl: Back today, getting back to.
Dave: People are passing data to him. Data sharing API and software pages.
Ivan: UI/UX work for the migration, and page-wise side-by-side QA. Scaling up to large numbers of documents. Looking for ways to compare the structures of documents.
Markus: Web content testbed: experimenting with a workflow for mining web archives with Tika. Later, to compare with Fido and DROID.
Peter: Tika RESTful web service on JAX-RS. Started C5 on the CC deliverable.
Sven: Also working on characterising WCT, and on JPEG 2000 comparison with OCR.

TODO: Add to somewhere near
NOTE: Proposal
NOTE: GovDocs is being hosted by OPF:

GoToMeeting - Windows, Mac, iPhone, Android: See

IDEA: Append the organisation name to user names, e.g. (BL), (SB).
IDEA: Work out why Android didn't connect.
