For now, clone [https://github.com/cneud/warcbase] and check out the pig-integration branch. Running the unit tests runs the Pig Latin script above on the provided gzipped test ARC file. The language distribution reported by Tika is:

{code}
org.warcbase.pig.ArcLoader() as (url: chararray, date: chararray, mime: chararray, content: chararray);

-- Detect the MIME type of the content using Tika
a = foreach raw generate url, mime, content, org.warcbase.pig.piggybank.DetectMimeType(content) as tikaMime;