Identifying the content of MS Office documents

One line summary We have OLE2 Office documents, which may themselves contain embedded documents, and we want to identify which version of Office created each one.
Detailed description The older binary Office document formats (OLE2) are effectively file systems within a file, so format identification gives only superficial information about the object. We can tell that a file is an OLE 2.0 Compound Document, but we also need to know which kind of document it is and which application created it. An OLE2 file can also contain embedded sub-objects, so we want to identify those too.
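
The "file system" structure described above can be illustrated with a minimal Python sketch that uses only the standard library. It checks the OLE2 signature, reads the header fields that locate the directory, and lists the entries in the first directory sector only. The function names and the `KNOWN_STREAMS` mapping are assumptions made for illustration; the mapping is not exhaustive, and a real tool would follow the full directory chain, since the identifying stream may not sit in the first sector.

```python
import struct

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Illustrative, non-exhaustive mapping of well-known stream names to applications.
KNOWN_STREAMS = {
    "WordDocument": "Word",
    "Workbook": "Excel",
    "Book": "Excel (older)",
    "PowerPoint Document": "PowerPoint",
}

ENTRY_TYPES = {1: "storage", 2: "stream", 5: "root"}


def list_first_directory_sector(data: bytes):
    """Return (name, type) pairs from the first directory sector only."""
    if len(data) < 512 or data[:8] != OLE2_MAGIC:
        raise ValueError("not an OLE2 compound file")
    # Sector size is stored as a power of two at header offset 30.
    sector_shift = struct.unpack_from("<H", data, 30)[0]
    sector_size = 1 << sector_shift
    # Sector number of the first directory sector is at header offset 48.
    first_dir_sector = struct.unpack_from("<I", data, 48)[0]
    # Sector n starts at (n + 1) * sector_size; the header fills the first slot.
    start = (first_dir_sector + 1) * sector_size
    end = min(start + sector_size, len(data))
    entries = []
    for off in range(start, end - 127, 128):  # each directory entry is 128 bytes
        name_len = struct.unpack_from("<H", data, off + 64)[0]
        obj_type = data[off + 66]
        if obj_type not in ENTRY_TYPES or not 2 <= name_len <= 64:
            continue  # unused or malformed entry
        # Names are UTF-16LE; name_len counts the two-byte terminator.
        name = data[off:off + name_len - 2].decode("utf-16-le")
        entries.append((name, ENTRY_TYPES[obj_type]))
    return entries


def guess_kind(entries):
    """Guess the document kind from any well-known stream name found."""
    for name, kind in entries:
        if kind == "stream" and name in KNOWN_STREAMS:
            return KNOWN_STREAMS[name]
    return "unknown"
```

Calling `guess_kind(list_first_directory_sector(open(path, "rb").read()))` on a file gives a first guess at the kind of document. Entries reported as `storage` are the sub-directories in which embedded sub-objects may live, so they are the natural starting point for recursing into a container.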

Issue champion Mette van Essen
Possible approaches Use Apache POI to deconstruct the object.
Use doc2x etc. to transform the older-format documents to the newer OOXML formats and examine those.
Use the commercial library to analyse the object.
AQuA Solutions Apache POI Office Document Analyser
Collections MS Word 97-2003 Documents (NANETH)
Labels prototyped, issue, characterisation, embedded_objects, obsolescence, appraisal_assessment