Extracting embedded objects from docx files

We preserve MS Word documents as docx files. We are reasonably confident that the XML structure preserves the report text and structure well. We are not so confident about
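
As a rough illustration of what such an extraction could look like: a docx file is a ZIP package, and Word conventionally stores embedded objects under word/embeddings/ and images under word/media/. The sketch below relies on that convention, which is an assumption. The package format does not mandate those paths, and a more robust approach would follow the relationships listed in word/_rels/document.xml.rels. The function names are hypothetical.

```python
import zipfile
from pathlib import Path

# Assumed conventional locations; a producer other than Word may use different paths.
EMBEDDED_PREFIXES = ("word/embeddings/", "word/media/")


def list_embedded_parts(docx_path):
    """Return the names of package parts stored under the assumed prefixes."""
    with zipfile.ZipFile(docx_path) as package:
        return [
            name
            for name in package.namelist()
            if name.startswith(EMBEDDED_PREFIXES) and not name.endswith("/")
        ]


def extract_embedded_parts(docx_path, out_dir):
    """Copy each embedded part into out_dir and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as package:
        for name in list_embedded_parts(docx_path):
            # Keep only the file name so nothing is written outside out_dir.
            target = out / Path(name).name
            target.write_bytes(package.read(name))
            written.append(target)
    return written
```

Embedded OLE objects typically appear as oleObjectN.bin files. These are container files that wrap the original payload, so recovering the object in its native format would need a further parsing step that this sketch does not attempt.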
Web based email "harvesting"

The setting is the collection of private archives, more specifically web-based email. It should be possible to harvest emails automatically from web-based email accounts. The system should scale as the number
