Label: embedded_objects

All content with label embedded_objects.
Related Labels: word, spruce_glasgow, pdf, characterisation, jpg, ole, jpxfilter, data_capture, appraisal_assessment, png, prototyped, msg, java, aqua, gif, windows, office, solution, poi

Page: Apache POI Office Document Analyser (AQuA)
One line summary A utility based on Apache POI that can analyse MS Office documents. Detailed description Uses POI to walk through the OLE file structures and look for embedded objects and their properties. Solution champion anjackson Git link ...
Other labels: apache, poi, ms-office, office, ms, ole, word, excel
Page: Detect, extract and analyse embedded objects in PDFs (AQuA)
One line summary Detect and identify embedded objects in PDFs, then where appropriate extract and analyse them further. Detailed description The PDF specification is complex, and PDF files can contain other objects, embedded at the file or page level ...
Other labels: pdf, objects, bmp, jpg, png, gif, tiff, pdfbox
Page: Embedded objects in PDFs (AQuA)
One line summary Need to detect embedded objects within PDFs ...
Other labels: pdf, issue
Page: Extracting embedded objects from docx files (Practical Preservation Issues)
Title Extracting embedded objects from docx files Detailed description We preserve MS Word documents as docx files. We are reasonably confident that the XML structure preserves the report text and structure well. We are not so confident about ...
Other labels: york_hackathon, issue
Page: Extracting embedded objects from Office OpenXML documents (Practical Preservation Issues)
Title Extracting embedded objects from Office OpenXML documents Detailed description Overview: docXtractor is a Python script using zipfile and lxml hooks to extract media from OOXML files (specifically docx in the current alpha implementation). docXtractor parses ...
Other labels: solution
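Illustration: the excerpt above mentions using zipfile to pull media out of OOXML files. The following is a minimal sketch of that general idea only; it is not docXtractor's code, and it skips the lxml parsing step. The function names and the choice of the word/media/ and word/embeddings/ folders are assumptions made for this example, based on the usual layout of a docx package.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Assumption: a typical docx package keeps images under word/media/ and
# embedded OLE or package objects under word/embeddings/.
EMBEDDED_PREFIXES = ("word/media/", "word/embeddings/")


def list_embedded_parts(docx_path):
    """Return the names of media and embedded-object parts in a docx package."""
    with zipfile.ZipFile(docx_path) as package:
        return [
            name
            for name in package.namelist()
            if name.startswith(EMBEDDED_PREFIXES) and not name.endswith("/")
        ]


def extract_embedded_parts(docx_path, out_dir):
    """Copy each embedded part into out_dir and return the written paths.

    Only the final path component is kept, so parts with the same file name
    in different folders would overwrite each other in this sketch.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as package:
        for name in list_embedded_parts(docx_path):
            target = out / PurePosixPath(name).name
            target.write_bytes(package.read(name))
            written.append(target)
    return written
```

A call such as extract_embedded_parts("report.docx", "extracted") (both names hypothetical) would write the images and embedded objects it finds into the extracted folder, where they can be passed to a format identification tool.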
Page: Identifying the content of MS Office documents (AQuA)
One line summary We have OLE2 Office documents, which may contain more documents, and we want to identify which version of Office each was created by. Detailed description The older binary Office document formats (OLE) are effectively ...
Other labels: prototyped, issue, characterisation, obsolescence, appraisal_assessment
Page: Preserving MS Outlook (.msg) E-mails with Attachments (SPRUCE)
Title Preserving MS Outlook (.msg) Emails with Attachments Detailed description When preserving complex electronic objects such as emails with attachments (MS Word, Excel etc.) it is necessary to: 1. Identify the constituent parts of the email (the record) 2 ...
Other labels: issue, spruce, spruce_glasgow
Page: Preserving MS Outlook (.msg) E-mails with Attachments - Solution (SPRUCE)
Title Preserving MS Outlook (.msg) Emails with Attachments Detailed description The solution is an executable JAR which uses the msgparser Java library http://auxilii.com/msgparser/ to extract binary attachments from Microsoft Outlook MSG files. Extracted ...
Other labels: msg, attachment, extractor, java, batch, windows, microsoft, spruce
Page: Web based email "harvesting" (Practical Preservation Issues)
Title Web based email "harvesting" Detailed description The setting is the collection of private archives, more specifically web-based emails. It should be possible to automatically harvest emails from web-based email accounts. The system should scale as the number ...
Other labels: york_hackathon, email, issue, harvesting, data_capture