Title | Harvest a web mail account and generate an ARC file from it |
Detailed description | To harvest a web mail account, we use the JavaMail API (http://java.sun.com/developer/onlineTraining/JavaMail/contents.html#JavaMailIntro). Once we can connect to the POP3 store, we iterate over the messages and serialize their content to an ARC file using the Heritrix utility (http://crawler.archive.org/). We chose the mailto scheme to identify each record (a message or a part of a message) in the ARC file. For example, a message with 2 alternatives (one text, the other HTML) appears as 3 records:
|
Solution Champion | Thomas Ledoux |
Corresponding Issue(s) | Web based email "harvesting" |
Tool/code link | The code can be found in the attached WAR file: PopmailArchive.war |
Tool Registry Link |
|
Evaluation | A needed extension is the ability to harvest through actual HTTP(S) exchanges. This would cover providers that offer neither POP3 nor IMAP access, and would get around firewall restrictions. Some programs such as MrPostman (http://sourceforge.net/projects/mrpostman/) CO: ARC was not the original requirement, but this is OK. CO: This is the prototype that was required, and CO can now try it out with some testers. |
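The following minimal, stdlib-only Java sketch illustrates the mailto-based record naming and ARC serialization described under Detailed description. It does not call JavaMail or Heritrix. `MailPart` is a hypothetical stand-in for the `javax.mail` `Part`/`Multipart` tree. A real harvester would walk that tree after opening the POP3 `Folder`, using `Part.getContent()`, `Multipart.getCount()` and `Multipart.getBodyPart(int)`. `arcHeaderLine` hand-builds an ARC v1 record header instead of using Heritrix's writer. The following are assumptions made for illustration, not the format used by PopmailArchive.war:

- the URI layout `mailto:<mailbox>?message-id=<id>&part=<path>`
- the dotted part paths
- the `0.0.0.0` IP placeholder

The ARC file header (the `filedesc://` record) is omitted.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class MailtoArcSketch {

    /** Hypothetical, simplified stand-in for a javax.mail Part / Multipart tree. */
    static final class MailPart {
        final String contentType;
        final byte[] body;
        final List<MailPart> children;

        MailPart(String contentType, String body, List<MailPart> children) {
            this.contentType = contentType;
            this.body = body.getBytes(StandardCharsets.UTF_8);
            this.children = children;
        }
    }

    /** One archived record: its mailto identifier, MIME type and raw bytes. */
    static final class ArcRecord {
        final String uri;
        final String contentType;
        final byte[] body;

        ArcRecord(String uri, String contentType, byte[] body) {
            this.uri = uri;
            this.contentType = contentType;
            this.body = body;
        }
    }

    // ARC v1 dates are 14-digit timestamps (yyyyMMddHHmmss); UTC is assumed here.
    private static final DateTimeFormatter ARC_DATE =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    /** Flattens a message into records: the message itself, then every sub-part, depth first. */
    static List<ArcRecord> toRecords(String mailbox, String messageId, MailPart message) {
        // Hypothetical URI layout: mailto:<mailbox>?message-id=<encoded id>[&part=<path>]
        String base = "mailto:" + mailbox + "?message-id="
                + URLEncoder.encode(messageId, StandardCharsets.UTF_8);
        List<ArcRecord> out = new ArrayList<>();
        out.add(new ArcRecord(base, baseType(message.contentType), message.body));
        addChildren(base, "", message, out);
        return out;
    }

    private static void addChildren(String base, String path, MailPart parent, List<ArcRecord> out) {
        for (int i = 0; i < parent.children.size(); i++) {
            MailPart child = parent.children.get(i);
            // Dotted part paths (1, 2, 2.1, ...) keep nested multiparts unambiguous.
            String childPath = path.isEmpty() ? String.valueOf(i + 1) : path + "." + (i + 1);
            out.add(new ArcRecord(base + "&part=" + childPath, baseType(child.contentType), child.body));
            addChildren(base, childPath, child, out);
        }
    }

    /** Drops MIME parameters such as "; boundary=..." so the header stays space-delimited. */
    static String baseType(String contentType) {
        return contentType.split(";", 2)[0].trim();
    }

    /** ARC v1 record header: URL IP-address Archive-date Content-type Archive-length. */
    static String arcHeaderLine(ArcRecord r, Instant archived) {
        // mailto records have no server IP; 0.0.0.0 is a placeholder assumption.
        return r.uri + " 0.0.0.0 " + ARC_DATE.format(archived) + " "
                + r.contentType + " " + r.body.length + "\n";
    }

    public static void main(String[] args) {
        MailPart text = new MailPart("text/plain; charset=UTF-8", "Hello", List.of());
        MailPart html = new MailPart("text/html", "<p>Hello</p>", List.of());
        // "RAW" stands in for the full raw message bytes (e.g. obtained via Part.writeTo).
        MailPart message = new MailPart("multipart/alternative; boundary=\"b1\"", "RAW", List.of(text, html));
        Instant archived = Instant.parse("2008-01-02T03:04:05Z");
        for (ArcRecord r : toRecords("archive@example.org", "abc123@example.org", message)) {
            System.out.print(arcHeaderLine(r, archived));
        }
    }
}
```

In `main`, the multipart/alternative message yields 3 records: the message itself plus its text and HTML parts. This matches the count given in the description. In an actual ARC file, each header line would be followed by that record's bytes.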