
Consider this sample code:

\documentclass{article}
\usepackage{lipsum}

\begin{document}
\lipsum[1-4]
\end{document}

When I compile this with latex and then run dvipdfm, the output file is 7893 bytes. When I use pdflatex, the output PDF is a whopping 20696 bytes. Naturally, the two outputs are visually indistinguishable from one another.

Why does this happen? What does pdflatex put in there that takes so much space?

For reference, I used the latest MiKTeX 2.9 on Windows 7 and ran the commands without any extra switches.

Martin Scharrer
  • pdflatex generates "verbose" PDFs; the results are probably similar if you (losslessly) compress the PDF afterwards as described in http://tex.stackexchange.com/q/18987/3751. – Daniel Dec 13 '11 at 15:28
  • @Daniel: Yeah, when I ran ghostscript on the PDF from pdflatex, I got pretty much the same size as the other one. Still, that doesn't answer the why part of the question. – Martin Tapankov Dec 13 '11 at 15:35
  • Could it be because dvipdfm processes an already processed version of the file (.dvi), while pdflatex processes the file from scratch (.tex)? If so, the pdflatex output could contain additional (unused) information, while dvipdfm could selectively include only "the necessary components." – Werner Dec 13 '11 at 16:52
  • pdflatex embeds the font in type 1 format and dvipdfmx embeds the font in type 1c format. I think the latter compresses better, but I cannot find a reference to back the claim up. Use pdffonts <pdf-file> to see the difference in embedded font types. – Martin Heller Dec 14 '11 at 21:42

1 Answer


Martin Heller has stated the correct answer: dvipdfm embeds fonts in a different format than pdftex. You can inspect a PDF file by loading it into a text editor. Often, though, the objects are compressed and you only see binary data. So you either need a decompression algorithm built into your head, or a tool like qpdf to uncompress the objects (that is what I do):

qpdf --qdf --object-streams=disable test-pdflatex.pdf test-pdflatex-long.pdf
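To give a rough idea of what "uncompressing the objects" involves, here is a minimal Python sketch that inflates the streams of a PDF with zlib. It is a naive illustration, not a replacement for qpdf: it assumes the streams are Flate-compressed, ignores the /Filter entry and object streams, and simply skips anything zlib cannot decode. The name `inflate_streams` and the sample object are made up for this example.

```python
import re
import zlib

# Naive pattern: the bytes between a "stream" keyword and the next "endstream".
# The lookbehind keeps the "stream" inside "endstream" from matching.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.S)

def inflate_streams(pdf_bytes):
    """Return the inflated contents of every stream that zlib can decode."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the newline left before "endstream".
            results.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate data (or another filter): skip it
    return results

# Hypothetical, hand-made object used only to exercise the function.
payload = b"BT /F1 10 Tf (Hello) Tj ET"
sample = (b"4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
          + zlib.compress(payload) + b"\nendstream\nendobj\n")
print(inflate_streams(sample))
```

On a real file you would pass the result of `open("test-pdflatex.pdf", "rb").read()`. The qpdf command above is the more robust route, since it also rewrites the file into a consistently readable layout.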

The qpdf output file is now much more readable, and you can compare the output of dvipdfm and pdftex. I don't know whether this applies in all cases, but in this example you can look at the font descriptor object:

% dvipdfm:
9 0 obj
<<
  /FontFile3 11 0 R
  /Ascent 694
  /CapHeight 683
  /Descent -194
  /Flags 6
  /FontBBox [-40 -250 1009 750 ]
  /FontName /DJLCQW+CMR10
  /ItalicAngle 0
  /StemV 69
  /Type /FontDescriptor
>>
endobj

and

% pdftex:
9 0 obj
<<
  /FontFile 11 0 R
  /Ascent 694
  /CapHeight 683
  /CharSet (/A/C/D/E/I/L/M/N/P/Q/S/U/V/a/b/c/comma/d/e/f/g/h/hyphen/i/j/l/m/n/o/one/p/period/q/r/s/t/u/v/w/y)
  /Descent -194
  /Flags 4
  /FontBBox [ -40 -250 1009 750 ]
  /FontName /QJZLYL+CMR10
  /ItalicAngle 0
  /StemV 69
  /Type /FontDescriptor
  /XHeight 431
>>
endobj

The two descriptors refer to the font file through different entries (/FontFile3 and /FontFile). According to Table 126, "Embedded font organization for various font types", in the PDF specification, /FontFile refers to a Type 1 font program, while /FontFile3 refers to a font program whose format is given by the Subtype entry of the referenced stream. So we need to look at object #11 in the dvipdfm file:

11 0 obj
<<
  /Subtype /Type1C
  /Length 12 0 R
>>
stream
....
endstream
endobj

So it is Type1C, which the same table in the PDF spec describes as: "Type 1–equivalent font program represented in the Compact Font Format (CFF), as described in Adobe Technical Note #5176, The Compact Font Format Specification."
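The manual lookup above can be sketched in a few lines of Python. This assumes the file has already been converted to qpdf's QDF form, so that dictionaries are readable text; the function names are invented for this example, and the two sample strings are abbreviated copies of the objects shown above.

```python
import re

# What each font-file key of a font descriptor points to, per the PDF spec table.
FONT_FILE_KEYS = {
    "FontFile": "Type 1 font program",
    "FontFile2": "TrueType font program",
}

def font_file_entries(qdf_text):
    """Return (key, object number) for each /FontFile* reference in the text."""
    pattern = re.compile(r"/(FontFile[23]?)\s+(\d+)\s+\d+\s+R\b")
    return [(key, int(num)) for key, num in pattern.findall(qdf_text)]

def stream_subtype(qdf_text, obj_num):
    """Return the /Subtype name in the dictionary of object obj_num, or None."""
    match = re.search(r"(?ms)^%d\s+\d+\s+obj\b(.*?)\bendobj\b" % obj_num, qdf_text)
    if match is None:
        return None
    dictionary = match.group(1).split("stream", 1)[0]  # drop the stream data
    subtype = re.search(r"/Subtype\s*/(\w+)", dictionary)
    return subtype.group(1) if subtype else None

def describe(qdf_text):
    """Return (key, format) pairs; /FontFile3 is resolved via the stream's /Subtype."""
    results = []
    for key, num in font_file_entries(qdf_text):
        if key == "FontFile3":
            results.append((key, stream_subtype(qdf_text, num) or "unknown subtype"))
        else:
            results.append((key, FONT_FILE_KEYS[key]))
    return results

# Abbreviated copies of the objects shown in this answer.
DVIPDFM_SAMPLE = """9 0 obj
<<
  /FontFile3 11 0 R
  /Type /FontDescriptor
>>
endobj
11 0 obj
<<
  /Subtype /Type1C
  /Length 12 0 R
>>
stream
....
endstream
endobj
"""
PDFTEX_SAMPLE = """9 0 obj
<<
  /FontFile 11 0 R
  /Type /FontDescriptor
>>
endobj
"""

print(describe(DVIPDFM_SAMPLE))
print(describe(PDFTEX_SAMPLE))
```

The regular expressions are deliberately simple and will misfire on files that are not in QDF form or whose stream data happens to contain `endobj`; for a quick check, `pdffonts` from the comments reports the same information.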

To find out what the secret of CFF is, a look at the introduction of "The Compact Font Format Specification" suffices:

Principal space savings are a result of using a compact binary representation for most of the information, sharing of common data between fonts, and defaulting frequently occurring data.
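To make "compact binary representation" a little more concrete, here is a small Python sketch of how CFF encodes integer operands in its dictionaries, following my reading of the operand encoding table in Technical Note #5176. It covers only the first of the three savings named in the quote, and the comparison with a plain-text rendering of the same numbers is an illustration, not a measurement of the actual files. The function name is made up for this example.

```python
def encode_cff_int(value):
    """Encode an integer as a CFF DICT operand (assumed from Tech Note #5176)."""
    if -107 <= value <= 107:
        return bytes([value + 139])                # one byte
    if 108 <= value <= 1131:
        v = value - 108
        return bytes([247 + (v >> 8), v & 0xFF])   # two bytes
    if -1131 <= value <= -108:
        v = -value - 108
        return bytes([251 + (v >> 8), v & 0xFF])   # two bytes
    if -32768 <= value <= 32767:
        return bytes([28]) + value.to_bytes(2, "big", signed=True)
    return bytes([29]) + value.to_bytes(4, "big", signed=True)

# The font bounding box from the descriptors above.
bbox = [-40, -250, 1009, 750]
binary = b"".join(encode_cff_int(n) for n in bbox)
text = " ".join(str(n) for n in bbox)

print(len(binary), "bytes as CFF operands")
print(len(text), "bytes as plain text")
```

Most metric values in a font fall in the one-byte or two-byte ranges, which is presumably why the format favours them.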

topskip