Extracting embedded objects from docx files

Detailed description We preserve MS Word documents as docx files. We are reasonably confident that the XML structure preserves the report text and structure well. We are less confident about other objects within the files and wish to preserve them separately, or at least assess whether we should. Most report files contain images (stored as emf within the docx file), and some have embedded spreadsheets (as xlsx) and potentially other embedded objects. We need a tool that looks into the zip folder structure of the docx (specifically the /media/ folder) and extracts the content to a separate location where it can be dealt with accordingly.
Issue champion Jenny Mitcham
Possible Solution approaches A Python script is being written to solve this problem.
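The following is a minimal sketch of how such a script could work, not the script referred to above. It treats the docx as a zip archive and copies out every entry under a set of folder prefixes. The function name extract_embedded and the prefix list are assumptions made for illustration: word/media/ is where Word packages normally keep images, and word/embeddings/ is included on the assumption that embedded spreadsheets are stored there.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Assumed folder prefixes inside the docx package (illustrative, adjust as needed).
EMBEDDED_PREFIXES = ("word/media/", "word/embeddings/")


def extract_embedded(docx_path, out_dir, prefixes=EMBEDDED_PREFIXES):
    """Copy embedded parts of a docx into out_dir and return the paths written."""
    out_dir = Path(out_dir)
    written = []
    with zipfile.ZipFile(docx_path) as package:
        for info in package.infolist():
            # Skip directory entries and anything outside the chosen folders.
            if info.is_dir() or not info.filename.startswith(prefixes):
                continue
            part = PurePosixPath(info.filename)
            # Keep only the parent folder name and file name, so that an
            # entry with an unusual path cannot be written outside out_dir.
            target = out_dir / part.parent.name / part.name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(package.read(info))
            written.append(target)
    return written
```

A call such as extract_embedded("report.docx", "extracted/report") (both paths hypothetical) would place images under extracted/report/media and embedded files under extracted/report/embeddings, where they can be assessed separately.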
Lessons Learned Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice
Datasets ADS Grey Literature Library
Solutions Extracting embedded objects from Office OpenXML documents
