
I've run into an interesting problem. I created a PDF file a long time ago. It can be downloaded here. To recall how it was generated, I decided to repeat the process and compare the PDF files. I'm getting close, but I can't make the old and new PDF files match. I remember doing everything in the usual way; I only changed the margins.

Why can't the PDF file be reproduced?

This is the process I use to generate the new PDF file:

1) Get the latest cweb sources from ftp://ftp.cs.stanford.edu/pub/cweb/cweb-3.64ah.tgz

2) In cwebmac.tex, change {NOS} fith to {NOS} fitb, either manually or with this command:

perl -i -pe 's/{NOS} fith/{NOS} fitb/' cwebmac.tex

3) Add the following to the end of cwebmac.tex:

\let\Blue=\Black
\hoffset=1.52400970458984374999999999999cm
\pageshift=2in
\advance\pageshift by-\hoffset
\advance\hoffset by-1in
\advance\pageshift by-1in

4) Build cweave

touch *.c
make

5) Run cweave on cweave.w

./cweave cweave.w

6) Generate the pdf file:

SOURCE_DATE_EPOCH=1460880679 pdftex cweave.tex

7) Compare the old PDF with the new one. For this, the objects in both PDF files must first be uncompressed:

qpdf --qdf --object-streams=disable cweave.pdf cweave-long.pdf
qpdf --qdf --object-streams=disable cweave-old.pdf cweave-old-long.pdf
diff -u cweave-old-long.pdf cweave-long.pdf

The diff shows that many values in the new PDF are smaller by exactly 0.001 than in the old PDF, and I can't make this 0.001 disappear. If I set \hoffset to 1.52400970458984375cm, the values in the new PDF are greater by 0.001 than in the old PDF. If I set \hoffset to 1.52400970458984374999999999999cm, the values in the new PDF are smaller by 0.001 than in the old PDF. I'm completely puzzled by this. Also, I remember setting \hoffset to something simple, like 1.5cm, not this value, which I constructed empirically by repeatedly comparing the diff.

Also, some hyphenation appears to have changed. For example, the following differs between the old and new PDF files:

-/F13 9.9626 Tf 125.8 495.045 Td [(i)]TJ/F3 7.9701 Tf 13.837 0 Td [(Used)-354(in)-354(secti)-1(o)1(n)]TJ
+/F13 9.9626 Tf 125.799 495.045 Td [(i)]TJ/F3 7.9701 Tf 13.837 0 Td [(Used)-354(in)-354(se)-1(ction)]TJ

i.e., how is

[(Used)-354(in)-354(secti)-1(o)1(n)]TJ

different from

[(Used)-354(in)-354(se)-1(ction)]TJ

? More importantly, why is it different? And what does this PDF code mean, anyway?

Why can't pdfTeX reproduce the PDF file?
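For reference, here is a rough reading of the two arrays. In a PDF content stream, the TJ operator shows the strings in its array, and each number between two strings shifts the following glyphs by that many thousandths of a text-space unit (a negative number widens the gap). The Python sketch below uses a hypothetical helper, parse_tj, to split each array into its strings and adjustments. It is a simplification that assumes the strings contain no escaped or nested parentheses, which holds for the two lines quoted above.

```python
import re

def parse_tj(array_text):
    """Split a TJ array into (concatenated glyph string, list of adjustments).

    Hypothetical helper for illustration only: it does not handle PDF
    string escapes or nested parentheses.
    """
    strings, adjustments = [], []
    for m in re.finditer(r"\(([^()]*)\)|(-?\d+(?:\.\d+)?)", array_text):
        if m.group(2) is not None:
            adjustments.append(float(m.group(2)))  # thousandths of a text-space unit
        else:
            strings.append(m.group(1))             # glyphs shown as-is
    return "".join(strings), adjustments

old_text, old_adj = parse_tj("[(Used)-354(in)-354(secti)-1(o)1(n)]TJ")
new_text, new_adj = parse_tj("[(Used)-354(in)-354(se)-1(ction)]TJ")

print(old_text == new_text)   # same glyph sequence in both files
print(old_adj, sum(old_adj))
print(new_adj, sum(new_adj))
```

Under this reading both lines draw the same glyphs, and the -354 entries are the inter-word spaces. Only the placement of the one-unit adjustments differs, which suggests a positioning difference rather than a change in hyphenation.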

This is the link to download the diff file: https://www.dropbox.com/s/lvbijcn2689cuye/cweave.diff?dl=1

Igor Liferenko
  • What is the purpose of step 2? How does it affect the comparison if you leave it out? – Henri Menke Nov 09 '16 at 10:15
  • @HenriMenke this change brings cwebmac.tex to the state in which it was when old pdf file was created, in order to remove irrelevant information from the diff of old and new pdf files – Igor Liferenko Nov 10 '16 at 02:50
  • 1) Dropbox requires registration to download; 2) does the old PDF's info dict have information on how it was produced (pdftex or ps2pdf or...), and with which version of pdftex? cm-super? –  Nov 10 '16 at 07:38
  • @jfbu No need to register - just click "No thanks, continue to view" at the bottom of registration form. BTW, which site do you use to store files for download? – Igor Liferenko Nov 10 '16 at 07:42
  • 1) I can view it, but when I click download it asks me to register again; 2) my internet provider allows me to create a home page, and if I need to share a file I upload it there and give people the link. I don't use and don't like social networks or cloud business. –  Nov 10 '16 at 07:43
  • @jfbu the cweb-3.64ah.tgz link does not work for some reason. Use git clone https://github.com/ascherer/cweb – Igor Liferenko Nov 10 '16 at 07:44
  • correction: you are right that there is at bottom a "no thanks, I wish to go ahead with download" (translated from French). Sorry for noise. Sorry also because I am off to work now, hence that was double noise. –  Nov 10 '16 at 07:45
  • @jfbu the file was created with default pdfTeX version 1.40.16 and CM-fonts – Igor Liferenko Nov 10 '16 at 09:20
  • The thing you added to the end of cwebmac.tex (the \advance etc.) seems quite arbitrary; is it something "standard" or something you came up with? What is its purpose? Also, what is the reason to set \hoffset and \pageshift by giving them initial values and modifying them immediately, rather than directly giving them the desired values? – ShreevatsaR Nov 10 '16 at 16:40
  • In other words, the effect of that code is (logically) to set \hoffset to 1.52400970458984374999999999999cm - 1in and \pageshift to 2in - 1.52400970458984374999999999999cm - 1in. This will correspond to some number of "points" or rather "scaled points", in TeX. (Note that there may be conversion issues, e.g. 2.54cm and 1in may not be exactly equal.) You may have better luck just setting those two values on two different lines (possibly directly as an integer number of "sp" units), rather than trying to affect them indirectly via multiple commands. (But my earlier questions remain.) – ShreevatsaR Nov 10 '16 at 17:15
  • @ShreevatsaR TeX considers dimension unit as scale factors but indeed 2.54cm and 1in do not give the exact same number of sp units. But if you multiply by 100, i.e. 254cm vs 100in you have exact identity. –  Nov 10 '16 at 17:24
  • @jfbu Yes that's what I was getting at. The effect of the OP's commands is to set \hoffset=-1894511sp \pageshift=1894512sp in one case, and \hoffset=-1894482sp \pageshift=1894483sp in the other. Somewhere in these ranges there may be pairs of values which will produce the results the OP wants, and if we know what those (ranges of) acceptable values are, we may have better luck reverse-engineering what are the likely "natural" units which may have led to those values. – ShreevatsaR Nov 10 '16 at 17:54
  • @ShreevatsaR agreed. See my answer (0.6in) –  Nov 10 '16 at 17:57
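
The unit-conversion point raised in the comments can be checked with a small Python sketch. It mimics how TeX, as far as I understand its dimension-scanning routine, turns a decimal constant with a unit into scaled points (65536sp = 1pt): only the first 17 fractional digits are used, and cm and in are applied as the integer ratios 7227/254 and 7227/100 with truncating division. The function names to_sp and offsets are made up for this illustration, and only non-negative constants in in and cm are handled.

```python
SP_PER_PT = 65536
# Unit ratios relative to pt, as integer fractions (assumed to match TeX's tables).
UNIT_RATIOS = {"in": (7227, 100), "cm": (7227, 254)}

def round_decimals(digits):
    """Convert fractional decimal digits to a multiple of 2^-16, TeX style."""
    a = 0
    for d in reversed(digits[:17]):        # digits after the 17th are ignored
        a = (a + d * 2 * SP_PER_PT) // 10
    return (a + 1) // 2

def to_sp(number, unit):
    """Scaled points for a non-negative decimal string such as '1.5' in 'cm'."""
    whole, _, frac = number.partition(".")
    cur_val = int(whole or "0")
    f = round_decimals([int(c) for c in frac])
    num, denom = UNIT_RATIOS[unit]
    cur_val, remainder = divmod(cur_val * num, denom)
    f = (num * f + SP_PER_PT * remainder) // denom   # truncating division
    return cur_val * SP_PER_PT + f

def offsets(hoffset_cm):
    """Replay the six lines appended to cwebmac.tex in step 3."""
    h = to_sp(hoffset_cm, "cm")
    pageshift = to_sp("2", "in") - h - to_sp("1", "in")
    hoffset = h - to_sp("1", "in")
    return hoffset, pageshift

print(offsets("1.52400970458984374999999999999"))  # (-1894511, 1894512)
print(offsets("1.52400970458984375"))              # (-1894482, 1894483)
print(to_sp("2.54", "cm") == to_sp("1", "in"))     # False
```

The two printed pairs agree with the sp values quoted in the comments. The 17th fractional digit (4 versus 5) is the only difference that survives the digit cutoff, and it changes the result by 29sp. Calling offsets with candidate "simple" values is one way to look for the setting that was originally used.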