Title
Parsing PST and OST files using Tika
Detailed description
The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries. This solution uses the Tika toolkit: http://tika.apache.org/
The parser was built by taking Tika's MBOX parser and modifying it for PST files. It uses the following fields from each email message:
- Sender
- Recipient
- Has Attachment [true/false]
- Number of attachments
- Subject
- Date Received
- Body
SAX events are then created from the parsed data so that handlers can be run against it.
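The field list and the SAX step above can be illustrated with a minimal sketch. It uses only the SAX classes that ship with the JDK, not Tika itself, and it is not the code from the linked repository. The `Email` holder class, the element names (`message`, `sender`, `hasAttachment`, and so on) and the sample values are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class EmailSaxSketch {

    /** Hypothetical holder for the fields listed above; not a Tika type. */
    static final class Email {
        final String sender, recipient, subject, dateReceived, body;
        final int attachmentCount;

        Email(String sender, String recipient, int attachmentCount,
              String subject, String dateReceived, String body) {
            this.sender = sender;
            this.recipient = recipient;
            this.attachmentCount = attachmentCount;
            this.subject = subject;
            this.dateReceived = dateReceived;
            this.body = body;
        }
    }

    /** Emits startElement, characters and endElement for one field. */
    static void field(ContentHandler h, String name, String value) throws SAXException {
        char[] chars = (value == null ? "" : value).toCharArray();
        h.startElement("", name, name, new AttributesImpl());
        h.characters(chars, 0, chars.length);
        h.endElement("", name, name);
    }

    /** Turns one parsed message into a stream of SAX events (element names are assumed). */
    static void emit(Email m, ContentHandler h) throws SAXException {
        h.startDocument();
        h.startElement("", "message", "message", new AttributesImpl());
        field(h, "sender", m.sender);
        field(h, "recipient", m.recipient);
        field(h, "hasAttachment", String.valueOf(m.attachmentCount > 0));
        field(h, "attachmentCount", String.valueOf(m.attachmentCount));
        field(h, "subject", m.subject);
        field(h, "dateReceived", m.dateReceived);
        field(h, "body", m.body);
        h.endElement("", "message", "message");
        h.endDocument();
    }

    /** Runs a simple recording handler against the events so they can be inspected. */
    static List<String> collect(Email m) {
        final List<String> events = new ArrayList<>();
        try {
            emit(m, new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    events.add("start:" + qName);
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    events.add("text:" + new String(ch, start, length));
                }

                @Override
                public void endElement(String uri, String local, String qName) {
                    events.add("end:" + qName);
                }
            });
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Sample values are invented for the example.
        Email m = new Email("alice@example.org", "bob@example.org", 2,
                "Quarterly report", "2012-03-01T09:30:00Z", "Please see attached.");
        for (String event : collect(m)) {
            System.out.println(event);
        }
    }
}
```

Because the output is a plain event stream, any `ContentHandler` can consume it, for example one that collects text for indexing or one that serialises the message to XML.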
Solution Champion
Donald Mennerich
Gregory N. Jansen
Corresponding Issue(s)
Parsing PST and OST email files for textual mining and searching
Appraising OST file for restricted data
Tool/code link
A link to code on GitHub or a corresponding myExperiment if applicable
https://github.com/dmmd/OPF_Hack
Evaluation
Any notes or links on how the solution performed.