The following document, which uses hyperref, illustrates two ways of embedding metadata into the output from pdflatex:

  1. Using the pdfx package with option a-2u. This happens when the flag validate is set to true, as it is below.
  2. Using instead the hyperxmp package, with \hypersetup including the options pdfapart=2, pdfaconformance=u. This happens when the flag validate is set to false.

Both methods seem to embed essentially the same metadata about title, author, etc., into the pdf.

With method 1, the PDF output file passes PDF/A-2u validation (checked with the veraPDF app, for example). With method 2, it fails validation. (See the extract from the veraPDF report at the end.)

Question: What can be done to allow validation to succeed with method 2?

Source:

\RequirePackage{filecontents}

\begin{filecontents}{\jobname metadata.xmp}
\hyxmp@at@end{%
% Create XMP code and write it to macro \hyxmp@xml
% (cf. hyperxmp.sty, \hyxmp@construct@packet (ll. 847-868))
\gdef\hyxmp@xml{}%
\hyxmp@add@to@xml{%
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3.1-702">^^J%
___<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns\hyxmp@hash">^^J%
}%
\hyxmp@pdf@schema
\hyxmp@xmpRights@schema
\hyxmp@dc@schema
\hyxmp@photoshop@schema
\hyxmp@photometa@schema
\hyxmp@xmp@basic@schema
\hyxmp@mm@schema
\hyxmp@add@to@xml{%
___</rdf:RDF>^^J%
</x:xmpmeta>^^J%
}%
}
\end{filecontents}

\begin{filecontents}{\jobname.xmpdata}
\Title{A Book}
\Author{Anonymous}
\Language{en-US}
\Keywords{things\sep stuff}
\Subject{matters}
\Publisher{Anonymous}
\Copyright{\copyright 2020 The Company CC-BY-NC-ND}
\CopyrightURL{https://creativecommons.org/licenses/by-nc-nd/4.0/}
\PublicationType{book}
\Lastpage{3}
\Date{2020-06-26}
\CoverDisplayDate{June\ 26,\ 2020}
\CoverDate{2020-06-26}
\end{filecontents}

\documentclass{book}

\usepackage{ifthen}
\newboolean{validate}
\setboolean{validate}{true}

\ifthenelse{\boolean{validate}}%
{\RequirePackage[a-2u]{pdfx}%
 \pdfglyphtounicode{EM}{0058 0058 0058 0058 0058 0058 0058 0058}%
 \pdfglyphtounicode{NUL}{0060 0060 0060 0060 0060 0060 0060 0060}%
 \RequirePackage[type={CC},modifier={by-nc-nd},version={4.0},lang={english}]{doclicense}
 \usepackage{hyperref}
 \hypersetup{
   pdfa,
   bookmarksnumbered,
   pdftitle={A Book},
   pdfauthor={Anonymous},
   pdfcreator={somebody},
   pdfsubject={A general introducton to things},
   pdfkeywords={things, stuff},
 }
 %
}%
{\RequirePackage{hyperxmp} % to add CC info into pdf
 \RequirePackage[type={CC},modifier={by-nc-nd},version={4.0},lang={english}]{doclicense}
 \usepackage[pdfa]{hyperref}
 \hypersetup{
   pdfapart=2,
   pdfaconformance=u,
   bookmarksnumbered,
   pdftitle={A Book},
   pdfauthor={Anonymous},
   pdfcreator={somebody},
   pdfsubject={A general introducton to things},
   pdfkeywords={things, stuff},
   pdflicenseurl={http://creativecommons.org/licenses/by-nc-nd/4.0/}
 }%
 \immediate\pdfobj stream attr{/N 3} file{sRGB.icc}
 \pdfcatalog{%
   /OutputIntents [
     <<
       /Type /OutputIntent
       /S /GTS_PDFA2
       /DestOutputProfile \the\pdflastobj\space 0 R
       /OutputConditionIdentifier (sRGB)
       /Info (sRGB)
     >>
   ]
 }
}

\newcommand\mytitle{A Book}
\newcommand\myauthor{Anonymous}
\newcommand\myabstract{An introduction to things in general.}
\newcommand\mydate{\today}
\title{\mytitle}
\author{\myauthor}
\date{\mydate}

\usepackage{newtxtext,newtxmath}
\usepackage[french,ngerman,russian,main=english]{babel}

\usepackage{blindtext}

\begin{document}
\maketitle
\blindmathpaper
\end{document}

Validation failure report for method 2:

<rule specification="ISO 19005-2:2011" clause="6.2.4.3" testNumber="4" status="failed" passedChecks="0" failedChecks="230">
   <description>DeviceGray shall only be used if a device independent DefaultGray colour space has been set when the DeviceGray colour space is used,
    or if a PDF/A OutputIntent is present.</description>
   <object>PDDeviceGray</object>
   <test>gOutputCS != null</test>
   <check status="failed">
     <context>root/document[0]/pages[2](23 0 obj PDPage)/contentStream[0](24 0 obj PDContentStream)/operators[170]/fillCS[0]</context>
   </check>
   <check status="failed">
     <context>root/document[0]/pages[2](23 0 obj PDPage)/contentStream[0](24 0 obj PDContentStream)/operators[168]/fillCS[0]</context>
   </check>
   .... [many more similar failures] 
</rule>
<rule specification="ISO 19005-2:2011" clause="6.2.11.7" testNumber="1" status="failed" passedChecks="0" failedChecks="27">
   <description>The Font dictionary of all fonts shall define the map of all used character codes to Unicode values, either via a ToUnicode entry,
    or other mechanisms as defined in ISO 19005-2, 6.2.11.7.2.</description>
   <object>Glyph</object>
   <test>toUnicode != null</test>
   <check status="failed">
     <context>root/document[0]/pages[2](23 0 obj PDPage)/contentStream[0](24 0 obj PDContentStream)/operators[150]/usedGlyphs[1](YKMMCJ+NewTXMI 67 0  0)</context>
   </check>
   <check status="failed">
     <context>root/document[0]/pages[2](23 0 obj PDPage)/contentStream[0](24 0 obj PDContentStream)/operators[150]/usedGlyphs[0](YKMMCJ+NewTXMI 109 0  0)</context>
   </check>
   <check status="failed">
     <context>root/document[0]/pages[2](23 0 obj PDPage)/contentStream[0](24 0 obj PDContentStream)/operators[135]/usedGlyphs[0](PGXZDZ+txmiaX 8 0  0)</context>
   </check>
   <check status="failed">
     <context>root/document[0]/pages[2](23 0 obj PDPage)/contentStream[0](24 0 obj PDContentStream)/operators[123]/usedGlyphs[0](YKMMCJ+NewTXMI 50 0  0)</context>
   </check>
   .... [more like the above] 
   <check status="failed">
     <context>root/document[0]/pages[1](12 0 obj PDPage)/contentStream[0](13 0 obj PDContentStream)/operators[601]/usedGlyphs[0](GEXQPI+txsys 0 0  0)</context>
   </check>
   <check status="failed">
     <context>root/document[0]/pages[1](12 0 obj PDPage)/contentStream[0](13 0 obj PDContentStream)/operators[595]/usedGlyphs[0](YKMMCJ+NewTXMI 63 0  0)</context>
   </check>
   <check status="failed">
     <context>root/document[0]/pages[1](12 0 obj PDPage)/contentStream[0](13 0 obj PDContentStream)/operators[580]/usedGlyphs[0](UOCZFW+txexs 112 0  0)</context>
   </check>
   .... [more like the above]
</rule>

Related:

pdfx + hyperref prevents setting PDF metadata [incompatible packages]

Is it possible to use both hyperxmp and xmpincl in LaTeX?

Added 2020-06-29:

I've continued the discussion begun in the comments to the answer https://tex.stackexchange.com/a/551291/13492 in a new question: How to find glyphs in pdf requiring \pdfglyphtounicode to allow validation?

murray

1 Answer


The following validates. The main error was the subtype of the output intent: it should be /GTS_PDFA1. According to the PDF reference, /GTS_PDFA2 doesn't exist.

Besides this, some glyphs in your fonts had no Unicode representation, so I added dummy mappings for them. I didn't try to include the additional XMP data.

\documentclass{article}

\usepackage{hyperxmp}
\RequirePackage[type={CC},modifier={by-nc-nd},version={4.0},lang={english}]{doclicense}
\usepackage[pdfa]{hyperref}
\hypersetup{
  pdfapart=2,
  pdfaconformance=u,
  bookmarksnumbered,
  pdftitle={A Book},
  pdfauthor={Anonymous},
  pdfcreator={somebody},
  pdfsubject={A general introducton to things},
  pdfkeywords={things, stuff},
  pdflicenseurl={http://creativecommons.org/licenses/by-nc-nd/4.0/}
}%
\input{glyphtounicode}
\pdfgentounicode=1
\pdfglyphtounicode{EM}{0058 0058 0058 0058 0058 0058 0058 0058}%
\pdfglyphtounicode{NUL}{0060 0060 0060 0060 0060 0060 0060 0060}%
\pdfglyphtounicode{uni222B.dsp}{222B}%
\pdfglyphtounicode{summationdisplay.1}{0060 0060 0060 0060 0060 0060 0060 0060}%
\pdfglyphtounicode{summationdisplay}{0060 0060 0060 0060 0060 0060 0060 0060}%
\pdfglyphtounicode{radicalBigg}{0060 0060 0060 0060 0060 0060 0060 0060}%
\pdfglyphtounicode{radicalbig}{0060 0060 0060 0060 0060 0060 0060 0060}%
\pdfglyphtounicode{radicalbigg}{0060 0060 0060 0060 0060 0060 0060 0060}%
\immediate\pdfobj stream attr{/N 3} file{sRGB.icc}
\pdfcatalog{%
  /OutputIntents [
    <<
      /Type /OutputIntent
      /S /GTS_PDFA1
      /DestOutputProfile \the\pdflastobj\space 0 R
      /OutputConditionIdentifier (sRGB)
      /Info (sRGB)
    >>
  ]
}

\newcommand\mytitle{A Book}
\newcommand\myauthor{Anonymous}
\newcommand\myabstract{An introduction to things in general.}
\newcommand\mydate{\today}
\title{\mytitle}
\author{\myauthor}
\date{\mydate}

\usepackage{newtxtext,newtxmath}
\usepackage[french,ngerman,russian,main=english]{babel}

\usepackage{blindtext}

\begin{document}
abc abc
\maketitle
\blindmathpaper
\end{document}
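
The dummy mappings in the listing above exist only to satisfy the validator. As a rough sketch, assuming one would rather get meaningful text when copying from the PDF, the same glyph names could instead be mapped to the code points of the symbols they draw. The glyph names are taken from the listing above; mapping them to U+2211 (n-ary summation) and U+221A (square root) is an assumption about what those glyphs represent, the test formula is made up for illustration, and this variant has not been checked against veraPDF here.

```latex
\documentclass{article}

\input{glyphtounicode}
\pdfgentounicode=1
% Assumed mappings: display-size sums -> U+2211, big radicals -> U+221A
\pdfglyphtounicode{summationdisplay}{2211}
\pdfglyphtounicode{summationdisplay.1}{2211}
\pdfglyphtounicode{radicalbig}{221A}
\pdfglyphtounicode{radicalbigg}{221A}
\pdfglyphtounicode{radicalBigg}{221A}

\usepackage{newtxtext,newtxmath}

\begin{document}
% Illustrative formula that uses a display sum and a radical
\[ \sum_{k=1}^{n} \sqrt{k} \]
\end{document}
```

Glyphs with no sensible Unicode counterpart would still need a placeholder value, as in the listing above.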

Ulrike Fischer
  • What is the "pdf reference" to which you refer? – murray Jun 27 '20 at 20:59
  • https://www.iso.org/standard/63534.html – Ulrike Fischer Jun 27 '20 at 21:00
  • Ouch: the ISO standards document for PDF costs more than USD 200! How else might I determine which additional \pdfglyphtounicode{...}{...} commands I need to satisfy PDF validation? (In my real, book-length document, I'm still getting many "The Font dictionary of all fonts shall define the map of all used character codes to Unicode values, either via a ToUnicode entry" rule failures referring to glyphs in the MathTime Pro 2 fonts, e.g., in the MT2MIT, MT2SYT, MT2SYS, MT2EXA, MTBMIT, MT2SYF, MT2BYST fonts.) – murray Jun 27 '20 at 21:50
  • The glyphs are not in the reference. Their names I found by looking into the uncompressed pdf. – Ulrike Fischer Jun 27 '20 at 23:08
  • What is meant by "the uncompressed pdf"? Is that the pdf that results from running pdflatex, or is it something else? – murray Jun 28 '20 at 14:29
  • You can set \pdfcompresslevel=0 and \pdfobjcompresslevel=0. Then the PDF is uncompressed, and you can open it in an editor and look at the content. (I normally use the expl3 variant \RequirePackage{l3pdf}\ExplSyntaxOn\pdf_uncompress:\ExplSyntaxOff as it works with other engines too.) – Ulrike Fischer Jun 28 '20 at 14:54
  • There is also an older, free version of the PDF reference around (for PDF up to 1.7). Google for it (but be warned, it also has around 1000 pages). – Ulrike Fischer Jun 28 '20 at 14:55
  • OK, now I can examine the uncompressed, ascii, pdf output. But what do I need to look for in order to determine what additional \pdfglyphtounicode commands I need for the file to pass pdf validation? (I need some hint in order to get started!) – murray Jun 29 '20 at 14:37
  • Well, look at the example above and search, e.g., for summationdisplay.1. Then you can try to guess what the entries for other fonts look like. – Ulrike Fischer Jun 29 '20 at 14:39
  • I've continued this discussion at: https://tex.stackexchange.com/questions/551625/how-to-find-glyphs-in-pdf-requiring-pdfglyphtounicode-to-allow-validation – murray Jun 29 '20 at 20:37
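
The inspection procedure described in the comments above can be sketched as a small test file. This is only a sketch for pdflatex: the two compression primitives are the ones named in the comments, while the test formula and the use of newtxmath as the font under inspection are assumptions made for illustration.

```latex
% Switch off compression before anything is written to the PDF
\pdfcompresslevel=0      % leave streams uncompressed
\pdfobjcompresslevel=0   % do not pack objects into compressed object streams

\documentclass{article}
\usepackage{newtxtext,newtxmath}

\begin{document}
% Illustrative formula; replace it with the symbols that fail validation
\[ \sum_{k=1}^{n} \sqrt{k} \]
\end{document}
```

After compiling, the PDF can be opened in a text editor and searched for a known name such as summationdisplay. Glyph names typically show up as tokens with a leading slash, and the names used by other fonts can be looked up in the same way.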