Content with label embedded_objects+solution in Practical Preservation Issues
Related Labels:
validation, opf_montpellier, convert, characterisation, email, identification, vector, opf, york_hackathon, issue, data_capture, harvesting, untagged, fonts, bit_rot_detection, preservingpdf, postscript, migration, quality_assurance, rights
Page:
Extracting embedded objects from Office OpenXML documents
Overview: docXtractor is a Python script that uses zipfile and lxml hooks to extract media from OOXML files (specifically docx, in the current alpha implementation). docXtractor parses ...
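The general zipfile-plus-XML approach can be illustrated with a minimal sketch. This is not docXtractor's code. It swaps lxml for the standard-library xml.etree.ElementTree so that it has no third-party dependencies. It assumes the main document part sits at word/document.xml, which is the usual location; a more robust tool would look it up in _rels/.rels. It also assumes the transitional OOXML relationship-type URIs, because Strict OOXML uses different ones. The function names list_embedded_parts and extract_embedded are hypothetical. The sketch reads the main part's relationship file, keeps image, oleObject and package relationships that are embedded rather than externally linked, and copies those parts out of the ZIP container.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of OPC relationship parts (*.rels).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
# Assumption: transitional OOXML relationship types for embedded media/objects.
OFFICE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"
EMBED_TYPES = {OFFICE_REL + t: t for t in ("image", "oleObject", "package")}
# Assumption: main document part location (usual for docx, not guaranteed).
MAIN_PART = "word/document.xml"


def list_embedded_parts(docx_path):
    """Hypothetical helper: return (kind, part_name) pairs referenced by the main part."""
    base = posixpath.dirname(MAIN_PART)
    rels_name = posixpath.join(base, "_rels", posixpath.basename(MAIN_PART) + ".rels")
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read(rels_name))
    parts = []
    for rel in root.iter(REL_NS + "Relationship"):
        kind = EMBED_TYPES.get(rel.get("Type"))
        # Skip non-media relationships and linked (not embedded) targets.
        if kind is None or rel.get("TargetMode") == "External":
            continue
        target = rel.get("Target")
        if target.startswith("/"):
            part = target.lstrip("/")  # absolute part name inside the package
        else:
            part = posixpath.normpath(posixpath.join(base, target))  # relative to word/
        if (kind, part) not in parts:  # the same image may be referenced repeatedly
            parts.append((kind, part))
    return parts


def extract_embedded(docx_path, out_dir):
    """Hypothetical helper: copy each embedded part into out_dir and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as zf:
        for _kind, part in list_embedded_parts(docx_path):
            # Files are named by basename only; same-named parts would overwrite.
            dest = out / posixpath.basename(part)
            dest.write_bytes(zf.read(part))
            written.append(dest)
    return written
```

For example, extract_embedded("report.docx", "report_media") would write files such as image1.png or oleObject1.bin into report_media/. The sketch follows the relationships instead of simply copying everything under word/media/. As a result, it also finds OLE objects and packages under word/embeddings/, and it ignores external links.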