
This question continues one begun in comments to https://tex.stackexchange.com/a/551291/13492.

The issue:

I want to determine which glyphs in the PDF output (for example, that of the MWE below) are preventing validation against the PDF/A-2u standard. Validation fails on the rule "The Font dictionary of all fonts shall define the map of all used character codes to Unicode values, either via a ToUnicode entry, or other mechanisms as defined in ISO 19005-2, 6.2.11.7.2."

The two commands \pdfcompresslevel=0 and \pdfobjcompresslevel=0 cause the output from pdflatex to be a readable, pure-ASCII file.

The question:

What should I look for in the ASCII PDF so that I can then add a suitable \pdfglyphtounicode command?

If I process the file with pdflatex first, as it is shown, and then again but with the two lines

    \pdfglyphtounicode{summationdisplay.1}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{summationdisplay}{0060 0060 0060 0060 0060 0060 0060 0060}%

commented out, and if I then examine the resulting ASCII PDFs, the only substantial difference I detect is:

  • with those two lines included in the source, the PDF includes once the line

        dup 213 /summationdisplay.1 put

    and then the line

        /CharSet (/radicalBigg/radicalbig/radicalbigg/summationdisplay.1/uni222B.dsp)

  • with those two lines commented out, the PDF includes dup 213 /summationdisplay.1 put twice but does not include /CharSet (/radicalBigg/radicalbig/radicalbigg/summationdisplay.1/uni222B.dsp).

Erroneous inference:

From the above, I'm tempted to infer that I will need to include \pdfglyphtounicode commands in the source for glyph names occurring in pdf file lines of the GREP form

dup [0-9]+ /\S+ put

But that surely is not correct! After all, the pdf file includes many such lines, for example:

    dup 149 /period put
    dup 48 /u1D44E put
    dup 49 /u1D44F put

    dup 150 /comma put
    dup 56 /u1D456 put
    dup 58 /u1D458 put

    dup 115 /radicalBigg put
    dup 112 /radicalbig put
    dup 114 /radicalbigg put
    dup 213 /summationdisplay.1 put
    dup 185 /uni222B.dsp put

    dup 61 /equal put
    dup 8 /uni03A6 put

    dup 33 /arrowright put
    dup 49 /infinity put
    dup 0 /minus put
    dup 184 /plus put
    dup 6 /plusminus put
    dup 112 /radical put

    dup 33 /A put
    dup 34 /B put
    dup 40 /H put
    dup 41 /I put
    dup 42 /J put
    dup 43 /K put
    dup 50 /R put
    dup 52 /T put
    dup 65 /a put
    dup 66 /b put
    dup 67 /c put
    dup 12 /comma put
    dup 68 /d put
    dup 69 /e put
    dup 1 /exclam put
    dup 70 /f put
    dup 20 /four put
    dup 71 /g put
    dup 72 /h put
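As a hedged sketch of where this leads: among the names listed above, only those that are neither standard names (such as /period or /equal) nor of the form /uXXXX or /uniXXXX should need an explicit mapping. Assuming that summationdisplay and summationdisplay.1 are display-size summation signs and that the radical* names are enlarged root signs (an assumption based purely on the glyph names, not checked against the font), the placeholder 0060 sequences in the preamble could be replaced by real code points along these lines:

```latex
% Sketch only: the code points are assumed from the glyph names.
% U+2211 = N-ARY SUMMATION, U+221A = SQUARE ROOT, U+222B = INTEGRAL
\pdfglyphtounicode{summationdisplay}{2211}
\pdfglyphtounicode{summationdisplay.1}{2211}
\pdfglyphtounicode{radicalbig}{221A}
\pdfglyphtounicode{radicalbigg}{221A}
\pdfglyphtounicode{radicalBigg}{221A}
\pdfglyphtounicode{uni222B.dsp}{222B}
```

Whether these particular values satisfy a given validator is not something this sketch establishes; it only shows the shape of the mapping lines.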

The MWE:

    \documentclass{article}

    % To examine pdf as pure ASCII:
    \pdfcompresslevel=0
    \pdfobjcompresslevel=0

    \usepackage{hyperxmp}
    \RequirePackage[type={CC},modifier={by-nc-nd},version={4.0},lang={english}]{doclicense}
    \usepackage[pdfa]{hyperref}
    \hypersetup{
      pdfapart=2,
      pdfaconformance=u,
      bookmarksnumbered,
      pdftitle={A Book},
      pdfauthor={Anonymous},
      pdfcreator={somebody},
      pdfsubject={A general introducton to things},
      pdfkeywords={things, stuff},
      pdflicenseurl={http://creativecommons.org/licenses/by-nc-nd/4.0/}
    }%
    \input{glyphtounicode}
    \pdfgentounicode=1
    \pdfglyphtounicode{EM}{0058 0058 0058 0058 0058 0058 0058 0058}%
    \pdfglyphtounicode{NUL}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{uni222B.dsp}{222B}%
    \pdfglyphtounicode{summationdisplay.1}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{summationdisplay}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{radicalBigg}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{radicalbig}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{radicalbigg}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \immediate\pdfobj stream attr{/N 3} file{sRGB.icc}
    \pdfcatalog{%
      /OutputIntents [
        <<
          /Type /OutputIntent
          /S /GTS_PDFA1
          /DestOutputProfile \the\pdflastobj\space 0 R
          /OutputConditionIdentifier (sRGB)
          /Info (sRGB)
        >>
      ]
    }

    \newcommand\mytitle{A Book}
    \newcommand\myauthor{Anonymous}
    \newcommand\myabstract{An introduction to things in general.}
    \newcommand\mydate{\today}
    \title{\mytitle}
    \author{\myauthor}
    \date{\mydate}

    \usepackage{newtxtext,newtxmath}
    \usepackage[french,ngerman,russian,main=english]{babel}

    \usepackage{blindtext}

    \begin{document}
    abc abc
    \maketitle
    \blindmathpaper
    \end{document}

Comment:

The source here is just for experimentation. My real, book-length, document has a pdf ascii output file over a quarter-million lines long! Which is why I need help in what to look for (and what to do about it).

Related:

https://tex.stackexchange.com/a/551291/13492

How to find the proper glyph name required by \pdfglyphtounicode

How are the glyph (character) names in PDF-files determined?

How to fix missing or incorrect mappings from glyphtounicode.tex

murray
  • Well, basically you have to find the names that aren't yet in glyphtounicode.tex (or detected automatically). Glyph names like /A or /equal are standard. /uXXXX and /uniXXXX are detected automatically. – Ulrike Fischer Jun 29 '20 at 20:43
  • @UlrikeFischer: OK, that's a big help! (I dare to ask whether trying to make a math-ridden document pdf-compliant is a fool's errand!) – murray Jun 29 '20 at 20:46
  • well it would be easier if you would use unicode math fonts. The math times fonts are old and so it is not really surprising that they don't use unicode glyph names everywhere. – Ulrike Fischer Jun 29 '20 at 21:00
  • Does newtxmath qualify as "unicode math fonts"? – murray Jun 29 '20 at 22:12
  • No, you would use a Unicode font with lualatex or xelatex and unicode-math. – Ulrike Fischer Jun 29 '20 at 22:19
  • By "standard", do you mean in glyphtounicode.tex? For example, are any of /infinity, /minus, /plus, /plusminus, /radical "standard"? – murray Jul 01 '20 at 12:55
  • yes everything in glyphtounicode.tex is okay already. – Ulrike Fischer Jul 01 '20 at 12:58
  • @Ulrike Fischer: Perhaps make your comments an answer. – murray Jul 01 '20 at 13:12
  • Starting with package version 1.60, the docs for newtx say "newtx is now able to output a PDF/A-1b compliant pdf using pdflatex." And it says to do this by including \pdfcompresslevel=0 \pdfgentounicode=1 \input glyphtounicode.tex \usepackage{pdfx} \InputIfFileExists{glyphtounicode-cmr.tex}{}{} \InputIfFileExists{glyphtounicode-ntx.tex}{}{} in the preamble, before loading newtx (or, equivalently, newtxtext and newtxmath). – murray Jan 28 '22 at 22:25
  • With a current LaTeX you don't need to load glyphtounicode and activate \pdfgentounicode. LaTeX does it by default. And I have no idea why newtx advises uncompressing the pdf; that is useful for debugging but not required by any standard. – Ulrike Fischer Jan 28 '22 at 22:34
  • @UlrikeFischer: Just to be sure -- your comments apply when using pdflatex (not xelatex or lualatex) with newtx? – murray Jan 31 '22 at 16:47
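
For reference, the preamble lines quoted from the newtx documentation in the comments above can be laid out as follows. This is only a sketch that rearranges the lines as quoted there, for use with pdflatex; it has not been checked against the newtx documentation itself. Per the reply in the comments, a current LaTeX may make the \input and \pdfgentounicode lines redundant, and the compression setting is a debugging aid rather than a requirement of the standard.

```latex
% Preamble lines as quoted in the comment (newtx docs, version 1.60 or later).
% Place them after \documentclass and before loading newtx.
\pdfcompresslevel=0        % only useful for inspecting the PDF
\pdfgentounicode=1         % reportedly the default in a current LaTeX
\input glyphtounicode.tex
\usepackage{pdfx}
\InputIfFileExists{glyphtounicode-cmr.tex}{}{}
\InputIfFileExists{glyphtounicode-ntx.tex}{}{}
% Load newtx only after the lines above:
\usepackage{newtxtext,newtxmath}
```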
