Parsing PST and OST files using Tika
The Apache Tika™ toolkit detects and extracts metadata and structured text content from various document types using existing parser libraries. This solution is built on the Tika toolkit: http://tika.apache.org/
The solution takes Tika's MBOX parser as its starting point and modifies it to handle PST files. The parser uses the following fields from each email message:
- Has Attachment [true/false]
- Number of attachments
- Date Received
Create SAX events to run against parsed data.
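The SAX step can be sketched with the JDK's own SAX classes. This is a minimal illustration, not the solution's actual code: the MessageFields class, the "message" element, and the attribute names are hypothetical, and the sample values are made up. The sketch assumes the three fields listed above have already been read from a PST message, and it shows them being pushed to a ContentHandler as SAX events.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

final class MessageSaxSketch {

    // Hypothetical holder for the fields pulled from one email message.
    static final class MessageFields {
        final boolean hasAttachment;
        final int attachmentCount;
        final String dateReceived;
        final String body;

        MessageFields(boolean hasAttachment, int attachmentCount,
                      String dateReceived, String body) {
            this.hasAttachment = hasAttachment;
            this.attachmentCount = attachmentCount;
            this.dateReceived = dateReceived;
            this.body = body;
        }
    }

    // Emit one message as SAX events: an element carrying the fields as
    // attributes, with the message body as character data.
    static void emit(MessageFields m, ContentHandler handler) throws SAXException {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "hasAttachment", "hasAttachment", "CDATA",
                String.valueOf(m.hasAttachment));
        attrs.addAttribute("", "attachmentCount", "attachmentCount", "CDATA",
                String.valueOf(m.attachmentCount));
        attrs.addAttribute("", "dateReceived", "dateReceived", "CDATA",
                m.dateReceived);

        handler.startElement("", "message", "message", attrs);
        char[] text = m.body.toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "message", "message");
    }

    // Serialize the events to an XML string so they can be inspected.
    static String toXml(MessageFields m) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler serializer = factory.newTransformerHandler();
            serializer.getTransformer()
                    .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.setResult(new StreamResult(out));

            serializer.startDocument();
            emit(m, serializer);
            serializer.endDocument();
            return out.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not serialize SAX events", e);
        }
    }

    public static void main(String[] args) {
        // Sample values for illustration only.
        MessageFields sample =
                new MessageFields(true, 2, "2013-03-19T10:15:00Z", "Hello");
        System.out.println(toXml(sample));
    }
}
```

Because emit only depends on the ContentHandler interface, the same method could in principle feed any SAX consumer, such as an indexer or a filter that flags restricted data, rather than the XML serializer used here.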
Donald Mennerich, Gregory N. Jansen
Parsing PST and OST email files for text mining and searching
Appraising OST files for restricted data
A link to code on GitHub or a corresponding myExperiment, if applicable
Any notes or links on how the solution performed.