PDF to PDF-A conversion

Title
PDF to PDF-A conversion
Detailed description Converting PDF files to PDF/A is one of our most time-consuming tasks. It is frustrating and not always successful, and it cannot be run as a batch process. The conversion often fails (the most common reason is missing fonts), and some PDF files cannot be converted at all. Is there a way of making this task easier? One thing that might help is being able to report on a batch of PDFs and highlight those that are doomed to fail before we waste any time on them.
Issue champion Jenny Mitcham
Other interested parties
Leeds / White Rose would be interested in a solution for our repositories (we would use it on eTheses first, then the research repository)
Possible Solution approaches
  • PDFBox
  • JHOVE
  • Tool to highlight all fonts needed for conversion - then we can see if we have all the necessary fonts before conversion
Context We now receive many files (particularly grey literature library files) as PDF. These come in a variety of versions. Some are secured, some are password protected, some have errors, some have missing fonts, and some seem fine but still won't convert to PDF/A. We have about 200 of these to convert to PDF/A every month and it is one of our least favourite tasks!
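A first-pass report on a month's batch might look something like the sketch below. It uses only the Python standard library and relies on crude byte-level heuristics, which are assumptions of this example rather than real validation: the `%PDF-x.y` header gives the version, an `/Encrypt` key anywhere in the file is taken as a sign of security settings or password protection, and a missing `%%EOF` marker near the end is taken as a sign of a truncated or damaged file. It says nothing about fonts, and the names `preflight` and `triage` are hypothetical.

```python
import re
from pathlib import Path

# Readers generally accept the header anywhere in the first 1024 bytes.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def preflight(data):
    """Cheap checks on raw PDF bytes; heuristics only, not validation."""
    header = HEADER_RE.search(data[:1024])
    return {
        "version": header.group(1).decode("ascii") if header else None,
        "encrypted": b"/Encrypt" in data,     # trailer key is stored in the clear
        "has_eof": b"%%EOF" in data[-1024:],  # end-of-file marker near the end
    }


def triage(folder):
    """Sort every *.pdf directly inside `folder` into 'blocked' or 'candidate'."""
    report = {"blocked": [], "candidate": []}
    for path in sorted(Path(folder).glob("*.pdf")):
        info = preflight(path.read_bytes())
        problems = []
        if info["version"] is None:
            problems.append("no PDF header")
        if info["encrypted"]:
            problems.append("encrypted or password protected")
        if not info["has_eof"]:
            problems.append("missing %%EOF marker")
        bucket = "blocked" if problems else "candidate"
        report[bucket].append((path.name, info["version"], problems))
    return report
```

Running `triage("incoming")` on a folder of new deposits (the folder name is a placeholder) would separate files that are very unlikely to convert from those worth attempting. Files in the "candidate" group can still fail for other reasons, such as missing fonts.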
Lessons Learned Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice
Datasets ADS Grey Literature Library
eTheses
Solutions PDF to PDF-A Conversion Pre-Processor
Labels: issue, york_hackathon, qa