|Title|| Preserving MS Outlook (.msg) E-mails with Attachments
|Detailed description|| When preserving complex electronic objects such as emails with attachments (MS Word, Excel, etc.), it is necessary to:
1. Identify the constituent parts of the email (the record)
2. Extract the attachments for preservation purposes
3. Harvest any required metadata from the attachments
4. Link the attachments back to the parent object (within the Digital Repository)
5. Ensure that the de-constructed parts retain the original data object's "recordness"
|Issue champion|| Larry Murray
|Other interested parties|| Any other parties who are also interested in applying Issue Solutions to their Datasets
|Possible Solution approaches|| An initial suggestion is to use a commercially available product (e.g. File Investigator from Forensic Innovations) to examine the content of the .msg file.
The FIDO Project may also be of benefit; further investigation is required.
|Context|| Details of the institutional context of the Issue (may be expanded at a later date).
|Lessons Learned|| Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice
|Datasets|| PRONI Digital Preservation Project
|Solutions|| Preserving MS Outlook (.msg) E-mails with Attachments - Solution
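Steps 4 and 5 of the description (linking the attachments back to the parent object and keeping the record's "recordness") could be supported by a small fixity manifest stored alongside each .msg file. The sketch below is only an illustration under stated assumptions: it presumes the attachments have already been extracted to disk by some other tool, leaves the original .msg untouched, and uses a hypothetical JSON layout ("parent", "attachments", "parent_sha256") that is not a TRIM or repository schema.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(msg_path: Path, attachment_paths: List[Path]) -> dict:
    """Describe a parent .msg and its already-extracted attachments in one record.

    The field names are a hypothetical layout chosen for illustration only.
    """
    parent_hash = sha256_of(msg_path)
    return {
        "parent": {"file": msg_path.name, "sha256": parent_hash},
        "attachments": [
            # Each attachment carries its own fixity value plus a pointer
            # back to the checksum of the original .msg it came from.
            {"file": p.name, "sha256": sha256_of(p), "parent_sha256": parent_hash}
            for p in attachment_paths
        ],
    }


def write_manifest(msg_path: Path, attachment_paths: List[Path]) -> Path:
    """Write the manifest as JSON in the same directory as the .msg and return its path."""
    manifest = build_manifest(msg_path, attachment_paths)
    out_path = msg_path.with_suffix(".manifest.json")  # memo.msg -> memo.manifest.json
    out_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return out_path
```

Linking by checksum rather than by file name means the relationship survives renaming inside the repository, and the same values can later be re-computed to confirm that neither the parent nor its attachments have changed.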
Apr 17, 2012
Johan van der Knijff
Not sure if this is relevant at all here, but if you're dealing with whole PST files (rather than individual .msg files) you might find the libPST library (and its command line tools) useful. See also my earlier comment here: Identifying the content of Email Mailboxes. Also depends on whether you want to archive the emails in their native (Outlook) format, or migrate them to something more palatable (e.g. mbox). Just for info: I once used libPST's 'readPST' utility to convert a number of Outlook PSTs to mbox format, and I was quite pleased with the results (esp. handling of formatting and attachments).
Apr 18, 2012
Thanks for the suggestion, Johan, but it's all .msg rather than PST. Useful for future reference, though!
Apr 19, 2012
Johan - Thank you very much for your input which is greatly appreciated.
My immediate problem is that the MSG files are saved directly into the Northern Ireland Civil Service (NICS) EDRM system, TRIM.
We have a 3-month deletion rule on mailboxes, which means that users' Exchange mailboxes are continually being emptied (unless the user is on long-term sick leave), and the use of PST files is strictly limited.
There are approximately 1,000,000 MSG files in TRIM (growing daily!) of which approximately 20% will be deemed worthy of permanent preservation under current Retention & Disposal rules.
I can easily identify those which contain attachments through the TRIM metadata element "HasAttachments"; my problem is how to preserve these emails and their associated attachments.
Maurice de Rooij championed this problem at the SPRUCE event and has put together a solution, Preserving MS Outlook (.msg) E-mails with Attachments - Solution, which solves many of the problems of separating out the attachments while maintaining some form of link to the original email MSG file. He is continuing to develop the solution so that it ignores emails with no attachments and recursively processes emails that have nested email attachments, which may in turn have attachments of their own, and so on.
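The recursive processing described above (skip emails without attachments, descend into nested emails) can be sketched independently of any .msg parser. The following is a hypothetical illustration only, not Maurice's implementation: `Email` and `Attachment` are made-up stand-ins for whatever a real parser would return, and the slash-separated identifiers are an assumed naming convention.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class Attachment:
    """Hypothetical stand-in for an ordinary attachment (Word, Excel, etc.)."""
    name: str


@dataclass
class Email:
    """Hypothetical stand-in for a parsed .msg email; parsing is not shown."""
    name: str
    attachments: List[Union["Attachment", "Email"]] = field(default_factory=list)


def child_links(email: Email, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (parent_id, child_id) pairs for every attachment, recursing into nested emails.

    An email with no attachments yields nothing, so it is effectively skipped.
    Identifiers are slash-separated paths from the top-level message.
    """
    parent_id = f"{path}/{email.name}" if path else email.name
    for item in email.attachments:
        child_id = f"{parent_id}/{item.name}"
        yield parent_id, child_id
        if isinstance(item, Email):
            # A nested email may carry its own attachments: recurse one level down.
            yield from child_links(item, parent_id)


if __name__ == "__main__":
    inner = Email("reply.msg", [Attachment("budget.xlsx")])
    outer = Email("memo.msg", [Attachment("report.docx"), inner])
    for parent, child in child_links(outer):
        print(parent, "->", child)
```

For the example in the `__main__` block, the walk links `report.docx` and `reply.msg` to `memo.msg`, and `budget.xlsx` to `memo.msg/reply.msg`, so every extracted part keeps an explicit path back to the top-level record.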
I imagine that the information you provided will be of use in future, when we are able to consider ingesting records from non-structured systems, but for the moment we are focussing on records from the TRIM system.
Again, many thanks to you and especially Maurice for his excellent work.