Label: embedded_objects+solution

Content with label embedded_objects+solution in Practical Preservation Issues
Related Labels: validation, opf_montpellier, convert, characterisation, email, identification, vector, opf, york_hackathon, issue, data_capture, harvesting, untagged, fonts, bit_rot_detection, preservingpdf, postscript, migration, quality_assurance

Page: Extracting embedded objects from Office OpenXML documents
Detailed description
Overview: docXtractor is a Python script that uses zipfile and lxml to extract media from OOXML files (specifically docx in the current alpha implementation). docXtractor parses ...
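
The general approach described in the overview (open the docx as a zip package, then read the embedded parts) can be sketched as follows. This is not docXtractor's code, which is not shown on this page; it is a minimal illustration under stated assumptions. The function name `read_embedded_parts` is hypothetical, the standard-library `xml.etree.ElementTree` is used in place of lxml so the sketch has no third-party dependencies, and it assumes the conventional docx layout in which the main document's relationships live in `word/_rels/document.xml.rels` and embedded items sit under `word/media/` or `word/embeddings/`.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace of package relationship files in the Open Packaging Conventions.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
# Assumed location of the main document's relationships part in a docx.
RELS_PATH = "word/_rels/document.xml.rels"


def read_embedded_parts(docx_file):
    """Return {part name: bytes} for media and embedded objects that the
    main document references. Hypothetical helper, not part of docXtractor."""
    parts = {}
    with zipfile.ZipFile(docx_file) as zf:
        names = set(zf.namelist())
        if RELS_PATH not in names:
            return parts
        root = ET.fromstring(zf.read(RELS_PATH))
        for rel in root.iter(REL_NS + "Relationship"):
            # External targets (e.g. hyperlinks) are not stored in the package.
            if rel.get("TargetMode") == "External":
                continue
            target = rel.get("Target", "")
            if target.startswith("/"):
                # Absolute targets are relative to the package root.
                part = target.lstrip("/")
            else:
                # Relative targets are resolved against the "word/" directory.
                part = posixpath.normpath(posixpath.join("word", target))
            if part in names and part.startswith(("word/media/", "word/embeddings/")):
                parts[part] = zf.read(part)
    return parts
```

Calling `read_embedded_parts("report.docx")` on a file of your own would return a dictionary whose values can be written to disk; reading the bytes rather than calling `ZipFile.extractall` keeps the choice of output location with the caller.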