2012-02-08
----------------

Andy, Ivan, Peter

http://www.openplanetsfoundation.org/blogs/2012-01-26-can-we-talk-about-fmt42-fmt43-and-fmt44
http://droid7.wikispaces.com/
Ivan working with Jenny to see if the MSR tool fits her needs.
Peter looking into QA work.

2012-01-30
----------------

Andy, Carl, Ivan, Markus, Peter, Sven

Andy: Tidying up DROID and Nanite code (inc. Hadoop task). https://github.com/openplanets/nanite https://github.com/openplanets/nanite/blob/master/nanite-hadoop/src/main/java/uk/bl/wap/hadoop/profiler/ARCFormatProfiler.java https://github.com/openplanets/droid/tree/scape-dist
Carl: Travel. Jenkins.
Ivan: Preparing scenarios and issue ideas.
Markus: Working towards understanding Hadoop. Travel.
Peter: Travel prep.
Sven: Discussing scenarios and definitions. Are they really strict triples? Sven prefers it if so. Also, make it clear that a solution can use other solutions. Also testing Asger's workflow.

No more virtual stand-up meetings this week as we're most/all in the SCAPE workshop.

2012-01-26
----------------

Andy, Carl, Markus, Peter, Sven

Andy: http://wiki.opf-labs.org/display/SP/Moving+code+in+the+shared+SCAPE+repository+into+a+dedicated+one http://wiki.opf-labs.org/display/SP/Example+-+Working+with+Apache+Tika
Carl: Travel.
Markus: Packaging files into sequence files for Hadoop processing. http://www.exmachinatech.net/01/forqlift/
Peter: Finishing RegEx patch for Tika.
Sven: How to use sequence files with the mapper.

2012-01-24
----------------

Andy, Markus, Peter

Andy: https://issues.apache.org/jira/browse/TIKA-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel#issue-tabs
Markus: Looking at porting Taverna workflows to Hadoop.
Peter: Mostly travel, and Tika RegEx patch.

2012-01-23
----------------

Andy, Carl, Markus, Peter, Sven

Andy: Looking at making a Tika sig. for the new iBooks format. Tracing web pages also tested.
Carl: Testing lots of JP2s, thinking of testing PDFs (~1 million).
Markus: Karmasphere for Hadoop, any good?
Peter: Debugged RegEx patch for Tika, traced broken unit tests.
Sven: Sketching REF usage for results.

2012-01-18
----------------

Andy, Carl, Ivan, Markus, Peter, Rui, Shaul & Nir

https://wiki.duraspace.org/display/AKUBRA/Akubra+Project
Akubra API integrated into Rosetta, following Fedora - i.e. experimental HDFS backend.
Q: Is Akubra stable?
Q: Can Akubra support spooling to file?
TODO: Ask Frank about these things.

2012-01-18
----------------

Andy, Carl, Helder, Ivan, Markus, Peter, Sven.

http://wiki.opf-labs.org/display/SP/Technical+Coordinator
http://scapescenarioworkshop.eventbrite.co.uk/

2012-01-17
----------------

Andy, Carl, Ivan, Markus, Peter, Sven

https://issues.apache.org/jira/browse/TIKA-86
http://wiki.opf-labs.org/display/SP/List+of+source+code+repos+related+to+SCAPE
OPTIONAL: Performance measurement
http://wiki.opf-labs.org/display/SP/Process+performance+metrics
Using Taverna wall-clock time, ONB report very variable results, e.g. factors of 2/3 (local and ssh execution).
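
NOTE: A minimal sketch of repeated wall-clock measurement, to illustrate why a single timing can mislead when results vary by factors of 2/3. This is not the Taverna or ONB set-up; the function names `time_runs` and `summarise` and the `spread` figure are assumptions made for illustration only.

```python
import statistics
import time


def time_runs(task, repeats=5):
    """Run `task` repeatedly; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations):
    """Summarise durations; `spread` is the slowest run divided by the fastest."""
    fastest, slowest = min(durations), max(durations)
    return {
        "min": fastest,
        "median": statistics.median(durations),
        "max": slowest,
        "spread": slowest / fastest if fastest > 0 else float("inf"),
    }
```

Reporting the median together with the spread, rather than one wall-clock figure, makes variable results of the kind described above visible in the report.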

2012-01-16
----------------

Andy, Carl, Markus, Peter, Sven

2012-01-12
----------------

Andy, Carl, Davetaz, Markus, Peter, Sven, Ivan

NOTE: New REST service code in GitHub.
IMPACT->SCAPE server, plus Hadoop debs pseudo-cluster.

TODO: Call for more content? Content creation event?
http://www.forensicinnovations.com/fitools.html
http://delicious.com/beardedstoat/corpora
http://www.commoncrawl.org/data/accessing-the-data/

2012-01-11
----------------

Andy, Carl, Peter, Asger, Ivan, Markus.

TODO: Asger ID Python code in GitHub?
TODO: Share ID code between Markus and Asger at some point.

2012-01-10
----------------

Andy: http://wiki.opf-labs.org/display/SP/SCAPE+Scenario+Workshop%2C+1-3+February+2012%2C+Portugal
Asger: Test framework for char tools, some minor python issues. GovDocs groundtruth and appended correct MIME Types. Using Fido.
Carl: As Andy.
Ivan: Will respond to scenario workshop. GovDocs has a lot of office files, so may be a useful test data set. https://domex.nps.edu/corp/files/govdocs1/ Note that @davetaz has arranged this corpus into zips by file extension: http://soton.corpora.openplanetsfoundation.org/?dir=zips_by_extension/
Markus: Caught 'Too many open files'. ANJ hit the same issue with the DROID codebase aggressively opening file handles without really needing to.
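
NOTE: A minimal sketch of one way to avoid 'Too many open files' when scanning many files: open each file in a scoped block so its handle is closed before the next is opened. This is illustrative Python, not the DROID code; the names `read_header` and `headers_for` are hypothetical.

```python
def read_header(path, length=8):
    """Read the first `length` bytes of a file, closing the handle before returning."""
    with open(path, "rb") as handle:
        return handle.read(length)


def headers_for(paths, length=8):
    """Map each path to its leading bytes, holding at most one open handle at a time."""
    return {path: read_header(path, length) for path in paths}
```

Because the `with` block closes each handle as soon as the bytes are read, the number of open descriptors does not grow with the number of files processed.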

2012-01-09
----------------

Asger: GovDocs-based test harness. Parsing the groundtruth file is difficult.
Carl: Back today, getting back to.
Dave: People passing data to him. Data sharing API, and software pages.
Ivan: UI/UX work for the migration, and page-wise side-by-side QA. Scaling-up to large numbers of docs. Looking for ways to compare structures of documents.
Markus: Web content testbed: Experimenting with workflow for mining web archives, using Tika. Later, to compare with Fido and DROID.
Peter: Tika RESTful web service, on JAX-RS. Started C5 on the CC deliverable.
Sven: Also working on characterising WCT and JP2000 comparison with OCR.

TODO: Add http://blogs.msdn.com/b/mariok/archive/2011/05/11/hadoop-in-azure.aspx http://blogs.technet.com/b/microsoft_blog/archive/2011/10/12/microsoft-expands-data-platform-to-help-customers-manage-the-new-currency-of-the-cloud.aspx http://blogs.technet.com/b/port25/archive/2011/10/11/microsoft-hadoop-and-big-data.aspx to somewhere near http://wiki.opf-labs.org/display/SP/Installing+Local+Platform+Instance
NOTE: Proposal http://wiki.opf-labs.org/display/SP/Proposal+-+Extended+MIME+Type+Identifiers
NOTE: GovDocs is being hosted by OPF: http://soton.corpora.openplanetsfoundation.org/

GoToMeeting - Windows, Mac, iPhone, Android: See http://support.citrixonline.com/GoToMeeting/all_files/GTM010003

IDEA: Append org name to user name e.g. (BL), (SB).
IDEA: Work out why Android didn't connect.
