Parsing PST/OST files using Tika


Detailed description

The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries. This solution builds on the Tika toolkit.

The solution started from Tika's MBOX parser and modified it to handle PST files. The parser uses the following fields from each email message:

  • Sender
  • Recipient
  • Has Attachment [true/false]
  • Number of attachments
  • Subject
  • Date Received
  • Body
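The field list above can be pictured as one record per message. The sketch below is a minimal illustration, not code from this solution: the class `EmailRecord` and its member names are hypothetical, a single recipient string is assumed to match the list, and "Has Attachment" and "Number of attachments" are assumed to be derived from the list of attachment names rather than stored separately.

```java
import java.time.ZonedDateTime;
import java.util.List;

// Hypothetical holder for the per-message fields listed above.
// These names are illustrative only; they are not Tika or java-libpst API.
final class EmailRecord {
    final String sender;
    final String recipient;
    final String subject;
    final ZonedDateTime dateReceived;
    final String body;
    final List<String> attachmentNames;

    EmailRecord(String sender, String recipient, String subject,
                ZonedDateTime dateReceived, String body, List<String> attachmentNames) {
        this.sender = sender;
        this.recipient = recipient;
        this.subject = subject;
        this.dateReceived = dateReceived;
        this.body = body;
        // Defensive, unmodifiable copy of the attachment names
        this.attachmentNames = List.copyOf(attachmentNames);
    }

    // "Number of attachments" is derived from the attachment list
    int numberOfAttachments() {
        return attachmentNames.size();
    }

    // "Has Attachment [true/false]" follows directly from the count
    boolean hasAttachment() {
        return numberOfAttachments() > 0;
    }
}
```

Deriving the boolean from the count keeps the two attachment fields from ever disagreeing.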

The parser generates SAX events from the parsed message data.
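Tika parsers report extracted content to a SAX `ContentHandler`. As a rough sketch of what "generating SAX events" means here, the code below turns one message's fields into a start/characters/end event sequence and feeds it to a small consumer. It uses only the JDK's `org.xml.sax` classes; `EmailSaxEmitter`, `FieldCollector`, and the `email`/`field` element names are hypothetical and are not the markup this solution or Tika actually produces.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical emitter: writes one parsed message as a stream of SAX events.
final class EmailSaxEmitter {
    private static final String NS = ""; // no namespace in this sketch

    static void emit(ContentHandler handler, Map<String, String> fields) throws SAXException {
        handler.startDocument();
        handler.startElement(NS, "email", "email", new AttributesImpl());
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Each field becomes <field name="...">value</field>
            AttributesImpl attrs = new AttributesImpl();
            attrs.addAttribute(NS, "name", "name", "CDATA", field.getKey());
            handler.startElement(NS, "field", "field", attrs);
            char[] text = field.getValue().toCharArray();
            handler.characters(text, 0, text.length);
            handler.endElement(NS, "field", "field");
        }
        handler.endElement(NS, "email", "email");
        handler.endDocument();
    }
}

// Example consumer: collects the text of every "field" element, keyed by its name attribute.
final class FieldCollector extends DefaultHandler {
    final Map<String, String> collected = new LinkedHashMap<>();
    private String currentName;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("field".equals(localName)) {
            currentName = atts.getValue("name");
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // characters() may arrive in several chunks, so accumulate them
        if (currentName != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("field".equals(localName)) {
            collected.put(currentName, text.toString());
            currentName = null;
        }
    }
}
```

Because the consumer only sees SAX events, the same stream could instead go to an indexer or a text-mining handler without changing the emitting side.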

Solution Champion

Donald Mennerich, Gregory N. Jansen, Carl Wilson

Corresponding Issue(s)
Parsing PST and OST email files for textual mining and searching

Appraising OST file for restricted data

Tool/code link
A link to code on GitHub or a corresponding myExperiment entry, if applicable.

Tool Registry Link

Any notes or links on how the solution performed.

Labels: chapel_hill, solution, appraisal_assessment, characterisation