5

I have a preprocessor that generates and compiles some input into LaTeX and then into a PDF.

I'd like to have some regression tests so that I can tell if I've accidentally broken something in the code.

What's the best way to check, from the command line, whether two LaTeX-generated files are the same? (I could check all the sources, but there are a few hundred of them, so I'd like a faster solution.)

I've been trying to use Python's filecmp.cmp function, but it always reports a freshly generated file as different from one generated seconds before. I'm assuming that's because there is a timestamp encoded somewhere in the PDF.

AboAmmar
Joe
  • The thing is they are not identical for precisely the reason you state. Given that fact, you will have 100% non-matches from any tool which tests whether 2 of them are identical. – cfr Aug 24 '15 at 22:24
  • http://superuser.com/questions/46123/how-to-compare-the-differences-between-two-pdf-files-on-windows – SLx64 Aug 24 '15 at 22:52
  • http://www.ubuntugeek.com/diffpdf-compare-two-pdf-files-textually-or-visually.html – alfC Aug 25 '15 at 00:14
  • Not sure if it's relevant to your case, but the LaTeX team do regression testing based on the TeX output (rather than the PDF itself): http://latex-project.org/papers/tb111mitt-l3build.pdf – Joseph Wright Aug 26 '15 at 09:45
  • Hi @JosephWright - it turns out that that is pretty much what I've ended up doing - it's nice to know that's what the main team do though! – Joe Aug 26 '15 at 09:47
  • https://tex.stackexchange.com/q/229605/107497 shows how to get a reproducible build. But I'm not sure if it will still give the same PDF if you change the input data or how macros are defined. – Teepeemm Jan 29 '22 at 21:46

3 Answers

7

The LaTeX team have a regression suite for LaTeX itself which is also available to others from CTAN as l3build. This is based not on doing binary comparison of PDF files but rather on comparing results at the 'TeX end' using the .log file. This can be done by deliberately logging programming results ('Does this macro produce the correct tokens?') or by boxing up typeset material and examining the full content of the box. The latter method can be used with full pages (by setting \showoutput) and so will pick up any change in anything sent by TeX to the PDF. Importantly, this does not include 'variable' data such as the time that the run was made or the engine version. The l3build script is also designed to normalise out a variety of 'acceptable' variations between systems/engines so that tests can focus on the important changes.
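The idea of normalising the log before comparing can be sketched in a few lines of Python. This is not l3build's implementation, only a minimal illustration of the principle: the function names and the banner pattern are assumptions, and the only 'variable' data removed here is the engine banner line (which carries the engine version and the time of the run).

```python
import difflib
import re

# Assumed shape of the engine banner at the top of a .log file, e.g.
# "This is pdfTeX, Version ... (TeX Live 2015) ... 24 AUG 2015 22:24".
BANNER = re.compile(r"^This is \w*TeX, Version ")


def normalise_log(text):
    """Return the log lines with run-specific data removed."""
    lines = []
    for line in text.splitlines():
        if BANNER.match(line):
            continue  # engine version and run time vary between runs
        lines.append(line.rstrip())
    return lines


def log_diff(reference, candidate):
    """Return a unified diff of two normalised logs (empty list if they match)."""
    return list(
        difflib.unified_diff(
            normalise_log(reference),
            normalise_log(candidate),
            fromfile="reference",
            tofile="candidate",
            lineterm="",
        )
    )
```

A test would then pass when log_diff returns an empty list for the stored reference log and the freshly generated one. A real setup would need further normalisation (file paths, for instance), which is what l3build takes care of.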

Joseph Wright
1

Have you tried DiffPDF (http://www.qtrac.eu/diffpdf.html)? It works on Windows and Mac OS X. It is far from perfect, but there are cases where it is really helpful.

Pascal

0

It turns out that what I wanted to do was use latexpand to put all the sources into one file and then compare the two tex files. Exactly what I needed :)
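A minimal sketch of how that comparison could be scripted, assuming latexpand is on the PATH and that a known-good flattened file has been saved beforehand. The function names and the reference-file layout are hypothetical, not part of latexpand.

```python
import subprocess
from pathlib import Path


def flatten(main_tex):
    """Run latexpand on main_tex and return the single-file source it prints."""
    main = Path(main_tex)
    result = subprocess.run(
        ["latexpand", main.name],
        cwd=main.parent,      # assumption: \input paths are relative to this directory
        capture_output=True,
        text=True,
        check=True,           # raise if latexpand exits with an error
    )
    return result.stdout


def same_source(flattened, reference):
    """Compare two flattened sources, ignoring trailing whitespace on each line."""
    def lines(text):
        return [line.rstrip() for line in text.splitlines()]
    return lines(flattened) == lines(reference)


def check(main_tex, reference_file):
    """True if the freshly flattened source matches the stored reference."""
    return same_source(flatten(main_tex), Path(reference_file).read_text())
```

Calling check("main.tex", "expected/main.flat.tex") (both paths are placeholders) after each change to the preprocessor then gives a yes/no answer without compiling any PDF.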

Joe