
2012-02-08
\---------------\-

Andy, Ivan, Peter

http://www.openplanetsfoundation.org/blogs/2012-01-26-can-we-talk-about-fmt42-fmt43-and-fmt44
http://droid7.wikispaces.com/
Ivan working with Jenny to see if the MSR tool fits her needs.
Peter looking at QA work.


2012-01-30
\---------------\-

Andy, Carl, Ivan, Markus, Peter, Sven

Andy: Tidying up DROID and Nanite code (inc. Hadoop task). [https://github.com/openplanets/nanite] [https://github.com/openplanets/nanite/blob/master/nanite-hadoop/src/main/java/uk/bl/wap/hadoop/profiler/ARCFormatProfiler.java] [https://github.com/openplanets/droid/tree/scape-dist]
Carl: Travel. Jenkins.
Ivan: Preparing scenarios and issue ideas.
Markus: Working towards understanding Hadoop. Travel.
Peter: Travel prep.
Sven: Discussing scenarios and definitions. Are they really strict triples? Sven prefers it if so. Also, make it clear that a solution uses other solutions. Also testing Asger's workflow.


No more virtual stand-up meetings this week as we're most/all in the SCAPE workshop.



2012-01-26
\---------------\-

Andy, Carl, Markus, Peter, Sven

Andy: [http://wiki.opf-labs.org/display/SP/Moving+code+in+the+shared+SCAPE+repository+into+a+dedicated+one] [http://wiki.opf-labs.org/display/SP/Example+-+Working+with+Apache+Tika]
Carl: Travel.
Markus: Packaging files into sequence files for Hadoop processing. [http://www.exmachinatech.net/01/forqlift/]
Peter: Finishing RegEx patch for Tika.
Sven: How to use sequence files with the mapper.

2012-01-24
\---------------\-

Andy, Markus, Peter

Andy: [https://issues.apache.org/jira/browse/TIKA-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel#issue-tabs]
Markus: Looking at porting Taverna workflows to Hadoop.
Peter: Mostly travel, and Tika RegEx patch.

2012-01-23
\---------------\-

Andy, Carl, Markus, Peter, Sven

Andy: Look at making a Tika sig. for the new ibooks format. Tracing web pages also tested.
Carl: Testing lots of JP2s, thinking of testing PDFs (~1 million).
Markus: Karmasphere for Hadoop, any good?
Peter: Debugged RegEx patch for Tika, traced broken unit tests.
Sven: Sketching REF usage for results.

2012-01-18
\---------------\-

Andy, Carl, Ivan, Markus, Peter, Rui, Shaul & Nir

[https://wiki.duraspace.org/display/AKUBRA/Akubra+Project]
Akubra API integrated into Rosetta, following Fedora - i.e. experimental HDFS backend.
Q: Is Akubra stable?
Q: Can Akubra support spooling to file?
TODO: Ask Frank about these things.


2012-01-18
\---------------\-

Andy, Carl, Helder, Ivan, Markus, Peter, Sven.

[http://wiki.opf-labs.org/display/SP/Technical+Coordinator]
[http://scapescenarioworkshop.eventbrite.co.uk/]


2012-01-17
\---------------\-

Andy, Carl, Ivan, Markus, Peter, Sven
\\

[https://issues.apache.org/jira/browse/TIKA-86]
[http://wiki.opf-labs.org/display/SP/List+of+source+code+repos+related+to+SCAPE]
OPTIONAL: Performance measurement
[http://wiki.opf-labs.org/display/SP/Process+performance+metrics]
Using Taverna wall-clock time, ONB report very variable results, e.g. factors of 2/3 (local and ssh execution).
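NOTE: A minimal sketch of how the run-to-run spread in wall-clock time could be quantified, assuming the task can simply be repeated in-process. The class and method names (WallClockSpread, timeRuns, spreadFactor, median) are made up for illustration and are not part of Taverna or of the ONB setup.

```java
import java.util.Arrays;

public class WallClockSpread {

    // Runs the task `runs` times and returns each run's elapsed wall-clock time in nanoseconds.
    static long[] timeRuns(Runnable task, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    // Ratio of the slowest run to the fastest run (the "factor" of variation).
    static double spreadFactor(long[] elapsed) {
        long min = Arrays.stream(elapsed).min().getAsLong();
        long max = Arrays.stream(elapsed).max().getAsLong();
        return (double) max / Math.max(min, 1L); // guard against a zero-length run
    }

    // Middle value of the sorted timings (upper median when the count is even).
    static long median(long[] elapsed) {
        long[] sorted = elapsed.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload; a real test would invoke the tool being measured.
        Runnable task = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println(sum);
            }
        };
        long[] elapsed = timeRuns(task, 10);
        System.out.println("median ns: " + median(elapsed));
        System.out.println("max/min factor: " + spreadFactor(elapsed));
    }
}
```

Reporting the median alongside the max/min factor makes it easier to see whether a factor of variation comes from a few outlier runs or from general noise.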

2012-01-16
\---------------\-

Andy, Carl, Markus, Peter, Sven

2012-01-12
\---------------\-

Andy, Carl, Davetaz, Markus, Peter, Sven, Ivan

NOTE: New REST service code in GitHub.
IMPACT->SCAPE server, plus HADOOP debs pseudo-cluster.

TODO: Call for more content? Content creation event?
[http://www.forensicinnovations.com/fitools.html]
[http://delicious.com/beardedstoat/corpora]
[http://www.commoncrawl.org/data/accessing-the-data/]

2012-01-11
\---------------\-

Andy, Carl, Peter, Asger, Ivan, Markus.

TODO: Asger ID Python code in GitHub?
TODO: Share ID code between Markus and Asger at some point.


2012-01-10
\---------------\-

Andy: [http://wiki.opf-labs.org/display/SP/SCAPE+Scenario+Workshop%2C+1-3+February+2012%2C+Portugal]
Asger: Test framework for char tools, some minor Python issues. GovDocs groundtruth and appended correct MIME Types. Using Fido.
Carl: As Andy.
Ivan: Will respond to scenario workshop. GovDocs has a lot of office files, so may be a useful test data set. [https://domex.nps.edu/corp/files/govdocs1/] Note that @davetaz has arranged this corpus into zips by file extension: [http://soton.corpora.openplanetsfoundation.org/?dir=zips_by_extension/]
Markus: Caught 'Too many open files'. ANJ hit the same issue with the DROID codebase aggressively opening file handles without really needing to.
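NOTE: A minimal sketch of one common way to avoid 'Too many open files', assuming the files can be processed one at a time. This is not the DROID code; BoundedHandles, readHeader and totalHeaderBytes are hypothetical names used only for illustration. Each handle is opened in a try-with-resources block, so it is closed before the next file is touched.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class BoundedHandles {

    // Reads up to buffer.length bytes from the start of one file.
    // try-with-resources closes the handle before the method returns,
    // even if the read throws.
    static int readHeader(Path file, byte[] buffer) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.readNBytes(buffer, 0, buffer.length);
        }
    }

    // Processes the files sequentially, so at most one handle is open at any moment.
    static long totalHeaderBytes(List<Path> files) throws IOException {
        byte[] buffer = new byte[512];
        long total = 0;
        for (Path file : files) {
            total += readHeader(file, buffer);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data: one small temporary file, read three times.
        Path tmp = Files.createTempFile("sample", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        System.out.println(totalHeaderBytes(List.of(tmp, tmp, tmp)));
        Files.delete(tmp);
    }
}
```

The same pattern applies to any Closeable resource; the point is that the number of open handles stays constant however many files are in the list.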



2012-01-09
\---------\-

Asger: GovDocs-based test harness. Parsing the Groundtruth file is difficult.
Carl: Back today, getting back to.
Dave: People passing data to him. Data sharing API, and software pages.
Ivan: UI/UX work for the migration, and page-wise side-by-side QA. Scaling-up to large numbers of docs. Looking for ways to compare structures of documents.
Markus: Web content testbed: Experimenting with workflow for mining web archives, Tika. Later, to compare with Fido and Droid.
Peter: Tika restful web service, on JAX-RS. Started C5 on the CC deliverable.
Sven: Also working on characterising WCT and JP2000 comparison with OCR.


TODO: Add [http://blogs.msdn.com/b/mariok/archive/2011/05/11/hadoop-in-azure.aspx] [http://blogs.technet.com/b/microsoft_blog/archive/2011/10/12/microsoft-expands-data-platform-to-help-customers-manage-the-new-currency-of-the-cloud.aspx] [http://blogs.technet.com/b/port25/archive/2011/10/11/microsoft-hadoop-and-big-data.aspx] to somewhere near [http://wiki.opf-labs.org/display/SP/Installing+Local+Platform+Instance]
NOTE: Proposal [http://wiki.opf-labs.org/display/SP/Proposal+-+Extended+MIME+Type+Identifiers]
NOTE: GovDocs is being hosted by OPF: [http://soton.corpora.openplanetsfoundation.org/]

GoToMeeting - Windows, Mac, iPhone, Android: See [http://support.citrixonline.com/GoToMeeting/all_files/GTM010003]

IDEA: Append org name to user name e.g. (BL), (SB).
IDEA: Work out why Android didn't connect.