Label: pdf

Related Labels: mj2, validation, characterisation, embedded_objects, jpg, jpxfilter, jp2k, png, document, jpeg2000, qa, jpm, xml, extraction, ocr, dependency, mets, java, acroform

Page: Born-digital - migration success
One line summary Checking whether an automated normalisation produces a surrogate of sufficient quality ... Detailed description "sufficient" obviously needs to be defined in terms of significant properties relevant to the context but are there some checks which can be run to determine whether ...
Other labels: qa, comparison, characterise, office, issue
Page: Check consistency between metadata and content
One line summary Check that the METS, OCR, JPEG2000 masters and the PDFs are consistent. Detailed description As shown in the diagram below, check the image and ALTO file information defined in the METS against the actual files stored in separate Zip files. Also ...
Other labels: mets, ocr, metadata, jpeg2000, jp2k, jp2, jpx, mj2
Page: Detect, extract and analyse embedded objects in PDFs
One line summary Detect and identify embedded objects in PDFs, then where appropriate extract and analyse further. Detailed description The PDF specification is complex, and PDF files can contain other objects, embedded at the file or page level ...
Other labels: objects, bmp, jpg, png, gif, tiff, pdfbox, jpxfilter
Page: Embedded links within the PDF
One line summary Need to identify links embedded within PDFs and check whether they are still live ...
Other labels: issue, obsolescence, dependency
Page: Embedded objects in PDFs
One line summary Need to detect embedded objects within PDFs ...
Other labels: issue, embedded_objects
Page: Open Access PDFs
Basic description Open Access research outputs and etheses (sample being used from White Rose Research Online and White Rose eTheses Online) ...
Other labels: aqua, dataset, document
Page: PDF Characterisation Tool
One line summary Java program to characterise PDF files, looking for preservation concerns. ...
Other labels: characterise, pdfbox, api, fonts, issue, acroform, embedded, jpeg2000