Title
Appraising OST file for restricted data (as distinct from PPI data).
Detailed description
The goal is to appraise an email archive held in OST format for restricted data. The first step is to obtain a PST file from the OST file, followed by migration to the open, non-proprietary mbox format. A single-platform solution is fine for the migration, since this is an occasional use case. Metadata about the conversion process should be captured so that it can later be incorporated into PREMIS event records; the output log file should detail the specifics of the tool used. The messages within the OST file must then be parseable so that they can be searched for restricted data using case-specific keywords. This issue involves several steps, and the Solutions listed below address only the initial tasks in the workflow that would achieve the total solution. The Possible Solution approaches listed below are other approaches that might also help with parts of the solution set for this issue.
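The keyword search step can be sketched as follows, assuming the OST/PST data has already been migrated to an mbox file. This is only an illustration using Python's standard `mailbox` module; the function names `message_text` and `find_restricted_messages` and the sample keywords are hypothetical, and a real appraisal would probably also need to handle attachments and non-text parts.

```python
import mailbox
from email.message import Message


def message_text(msg: Message) -> str:
    """Concatenate the subject and every text/* part of a message."""
    chunks = [str(msg.get("Subject", ""))]
    for part in msg.walk():
        # Skip multipart containers and binary attachments.
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to UTF-8.
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def find_restricted_messages(mbox_path, keywords):
    """Return (key, subject, matched_keywords) for each message that
    contains at least one keyword (case-insensitive substring match)."""
    lowered = [k.lower() for k in keywords]
    hits = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, msg in box.items():
            text = message_text(msg).lower()
            matched = [k for k in lowered if k in text]
            if matched:
                hits.append((key, str(msg.get("Subject", "")), matched))
    finally:
        box.close()
    return hits
```

Plain substring matching is used here for simplicity; it will also match keywords embedded in longer words, so a case-specific keyword list may need word-boundary matching instead.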
Issue champion
Kari Smith
Bill LeFurgy
Other interested parties
Any other parties who are also interested in applying Issue Solutions to their Datasets.
Possible Solution approaches
Brief brainstorm of possible approaches to solving the Issue. Each approach should be described in a single sentence as part of a bulleted list. Further detail can go in a dedicated Solution page.
http://epadd.stanford.edu/muse/archives/ - summary metadata module for MUSE, word clouds, graphs, etc.
https://github.com/rjohnsondev/java-libpst - A library to read PST files with Java, without need for external libraries.
https://sites.google.com/a/brown.edu/google-migration-project-site/home/migration/gammo - Google Apps email migration tool
http://sourceforge.net/projects/pedalsemailextr/ - Email message to XML file extractor for digital preservation created by the Persistent Digital Archives and Library System (PeDALS) research project.
http://siarchives.si.edu/cerp/parserdownload.htm - Squeak/Smalltalk PST parser.
http://www.records.ncdcr.gov/emailpreservation/technical_resources.htm - EMCAP NC State Archives email converter
http://www.five-ten-sg.com/libpst/ - C library and Linux utilities for migrating PST/OST
https://code.google.com/p/libpff/ - Library and tools to access the Personal Folder File (PFF) and the Offline Folder File (OFF) format.
Analysis of Lucene Index Word Frequency - Lucene/Solr is a good base for creating search/browse and other visualization features
http://mobisocial.stanford.edu/muse/
http://tika.apache.org/1.3/parser.html
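Whichever of the tools above performs the conversion, the run itself can be logged so that the details are available for a PREMIS event record later. The following is a minimal sketch under stated assumptions: the function `run_and_record` is hypothetical, the field names are only loosely modelled on PREMIS event semantic units (`eventType`, `eventDateTime`, `eventDetail`, `eventOutcome`), and the extra keys (`eventEndDateTime`, `sourceSha256`, `toolOutput`) are illustrative additions rather than PREMIS terms. The command to run is supplied by the caller, so no particular tool or flag is assumed.

```python
import hashlib
import json
import subprocess
import sys
from datetime import datetime, timezone


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def run_and_record(command, source_path, log_path):
    """Run a conversion command and write a JSON log describing the run.

    `command` is a list of arguments, e.g. the converter executable
    followed by its options (chosen by the caller, not assumed here).
    """
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(command, capture_output=True, text=True)
    finished = datetime.now(timezone.utc).isoformat()
    event = {
        "eventType": "migration",
        "eventDateTime": started,
        "eventEndDateTime": finished,          # illustrative, not a PREMIS unit
        "eventDetail": " ".join(command),      # tool and arguments as invoked
        "eventOutcome": "success" if result.returncode == 0 else "failure",
        "sourceSha256": sha256_of(source_path),  # illustrative
        "toolOutput": result.stdout + result.stderr,  # illustrative
    }
    with open(log_path, "w", encoding="utf-8") as handle:
        json.dump(event, handle, indent=2)
    return event
```

Recording the checksum of the source file alongside the command line makes it possible to tie the logged event back to the exact OST or PST file that was converted.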
Context
Details of the institutional context to the Issue.
Lessons Learned
Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice
Datasets
Reference to the appropriate Dataset page, by hyperlink. Note that all Issues MUST be linked to at least one Dataset!
OST archive with attachments - MIT IASC
Email archive in OST format (LeFurgy)
Solutions
Parsing PST OST file using TIKA