Fonts missing, damaged or incomplete

Description

PDFs may use fonts that are not embedded in the file, or embedded fonts that are damaged or incomplete.

Risks

If fonts are not embedded, or if embedded fonts are damaged or otherwise incomplete, PDFs may be rendered incorrectly.

Assessment

The following table shows the relevant output of Apache Preflight (part of Apache PDFBox) for PDFs with non-embedded fonts. Results obtained with Preflight 2.0.0:

Reference file | Description | Error code(s) | Details
text_only_fontsNotEmbedded.pdf | Used fonts are not embedded | 3.1.3 | Invalid Font definition, FontFile entry is missing from FontDescriptor for TimesNewRomanPSMT
test_fontArialNotEmbedded.pdf | Some fonts not embedded (other fonts are) | 3.1.3 | Invalid Font definition, FontFile entry is missing from FontDescriptor for Arial,BoldItalic / Arial / TimesNewRoman / ... (multiple messages)

Preflight can report many more font issues, which are broadly subdivided into the following categories:

  1. Invalid or incomplete font dictionary errors. This includes a wide range of problems, including fonts that are not embedded.
  2. Damaged embedded font errors.
  3. Glyph errors.
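
The grouping above can be expressed as a small lookup on the error code prefix. This is a minimal sketch: the function name `font_error_category` and the category labels are hypothetical, and the prefix mapping (3.1, 3.2, 3.3) is taken from the error code table on this page, not from any Preflight API.

```python
# Prefix-to-category mapping, following the error code table on this page.
# The labels are illustrative; they are not defined by Preflight itself.
FONT_ERROR_CATEGORIES = {
    "3.1": "invalid or incomplete font dictionary",
    "3.2": "damaged embedded font",
    "3.3": "glyph error",
}


def font_error_category(code):
    """Return the font error category for a Preflight error code.

    Returns None if the code is not a font error (does not start with 3)
    or if its second-level prefix is not one of the three known groups.
    """
    parts = code.strip().split(".")
    if parts[0] != "3":
        return None  # not in the font error family
    if len(parts) == 1:
        return "font error (unspecified)"  # bare main code "3"
    return FONT_ERROR_CATEGORIES.get(".".join(parts[:2]))
```

For example, `font_error_category("3.1.3")` falls in the first category, while a code from another family, such as one starting with 2, yields `None`.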

The table below shows all possible errors; the descriptions are taken from the comments in Preflight's source code.

Error code Description
3 Main error code for font problems
  Invalid or incomplete font data errors
3.1 Main error code for invalid data in font
3.1.1 Some mandatory fields are missing from the FONT Dictionary
3.1.2 Some mandatory fields are missing from the FONT Descriptor Dictionary
3.1.3 Error on the "Font File x" in the Font Descriptor (ex : FontFile and FontFile2 are present in the same dictionary)
3.1.4 Charset declaration is missing in a Type 1 Subset
3.1.5 Encoding is inconsistent with the Font (ex : Symbolic TrueType mustn't declare encoding)
3.1.6 Width array and Font program Width are inconsistent
3.1.7 Required entry in a Composite Font dictionary is missing
3.1.8 The CIDSystemInfo dictionary is invalid
3.1.9 The CIDToGID is invalid
3.1.10 The CMap of the Composite Font is missing or invalid
3.1.11 The CIDSet entry is mandatory for a subset of a composite font
3.1.12 The CMap of the Composite Font is missing or invalid
3.1.13 Encoding entry can't be read due to IOException
3.1.14 The font type is unknown
  Damaged embedded font errors
3.2 The embedded font is damaged
3.2.1 The embedded Type1 font is damaged
3.2.2 The embedded TrueType font is damaged
3.2.3 The embedded composite font is damaged
3.2.4 The embedded type 3 font is damaged
3.2.5 The embedded CID Map is damaged
  Glyph errors
3.3 Common error for a Glyph problem
3.3.1 a glyph is missing
3.3.2 a glyph is missing

Not all of these errors are equally serious: errors 3.1.4, 3.1.5 and 3.1.6, for example, appear to be relatively harmless. It may be advisable to treat the presence of any of the above errors (possibly excepting 3.1.4, 3.1.5 and 3.1.6) as indicative of a font-related issue, although this may be overly restrictive in some cases (this section needs more work, examples and evidence).
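
The rule of thumb above could be sketched as a simple filter over the error codes reported for one file. The names `LIKELY_HARMLESS` and `serious_font_errors` are hypothetical, and the exclusion set reflects only the tentative assessment on this page, so it should be adjusted to local policy.

```python
# Codes this page tentatively describes as relatively harmless.
# This is an assumption to revisit, not an established classification.
LIKELY_HARMLESS = {"3.1.4", "3.1.5", "3.1.6"}


def is_font_error(code):
    """True if the code is '3' or belongs to the '3.' font error family."""
    return code == "3" or code.startswith("3.")


def serious_font_errors(codes):
    """Return the distinct font error codes not listed in LIKELY_HARMLESS.

    The order of first appearance is preserved; non-font codes are ignored.
    """
    kept = []
    for code in codes:
        code = code.strip()
        if is_font_error(code) and code not in LIKELY_HARMLESS and code not in kept:
            kept.append(code)
    return kept
```

A file would then be flagged as having a font-related issue whenever `serious_font_errors(...)` returns a non-empty list.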

Note on non-embedded fonts

Based on a number of tests, non-embedded fonts usually appear to return error code 3.1.3, although the description of that error indicates that it may include other font issues as well. Also, the results of this Analysis of Acrobat Engineering PDFs with Acrobat Preflight and Apache Preflight indicate that in some cases non-embedded fonts may produce other error codes. This is all a bit unclear and may need further investigation.
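
Because error code 3.1.3 alone is ambiguous, one possible heuristic is to combine it with the wording of the details message. The sketch below assumes the message text shown in the Preflight 2.0.0 results table on this page ("FontFile entry is missing from FontDescriptor for ..."); other Preflight versions may word it differently, and the function names are hypothetical. As noted above, non-embedded fonts may also surface under other codes, so this check can miss cases.

```python
import re

# Message fragment taken from the Preflight 2.0.0 results on this page.
# Assumption: other versions may phrase this differently.
MISSING_FONTFILE_TEXT = "FontFile entry is missing"
FONT_NAME_PATTERN = re.compile(r"FontDescriptor for (\S+)")


def looks_like_missing_font(code, details):
    """Heuristic: error 3.1.3 whose details mention a missing FontFile entry."""
    return code.strip() == "3.1.3" and MISSING_FONTFILE_TEXT in details


def missing_font_name(details):
    """Return the font name mentioned in the details message, or None."""
    match = FONT_NAME_PATTERN.search(details)
    return match.group(1) if match else None
```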

Recommendations

Pre-ingest

  • Formulate policy on how to deal with non-embedded, damaged or incomplete fonts.
  • Use Apache Preflight to check for font errors. Depending on the provenance of the PDFs this may result in many font errors being reported. As the meaning of Preflight's font error codes is not 100% clear, this may not be a viable solution (yet) in operational workflows.
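
Since a batch of PDFs may produce many font errors, it can help to tally them per error code before deciding on a policy. The sketch below only parses report text that has already been captured; it does not run Preflight. It assumes each error appears on its own line as `<error code> : <details>`, which is an assumption about the report layout and should be checked against the actual output of the Preflight version in use. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "<error code> : <details>",
# e.g. "3.1.3 : Invalid Font definition, ...". Verify against real output.
ERROR_LINE = re.compile(r"^\s*(\d+(?:\.\d+)*)\s*:\s*(.*)$")


def parse_report(text):
    """Return a list of (code, details) tuples found in the report text."""
    errors = []
    for line in text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            errors.append((match.group(1), match.group(2).strip()))
    return errors


def count_font_errors(text):
    """Count occurrences of each font error code (family 3) in the report."""
    return Counter(
        code
        for code, _ in parse_report(text)
        if code == "3" or code.startswith("3.")
    )
```

Summing the counters over all reports in a batch gives a quick overview of which font error codes dominate for a given source of PDFs.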

Existing collections

  • Use Apache Preflight to check for errors. However, this may not be a practical solution yet for the reason listed above.

Example files
