
One of my biggest problems with LaTeX is the time it takes to process large documents. (I typeset books with LaTeX.) I have workarounds, such as breaking the book into chapters and running them independently, but the results are not satisfactory.

For example, the book I'm currently working on reports this after I run LaTeX:

Latexmk: All targets () are up-to-date
no errors
make  74.57s user 1.92s system 99% cpu 1:17.05 total

My computer has six cores! LaTeX uses one.

EDIT

Details of my system:

  • Mac mini (2018), 3 GHz 6-core Intel Core i5 processor, 32 GB 2667 MHz DDR4 RAM
  • 41 .tex input files, totaling 11,000 lines of LaTeX source and over 100,000 words of text.
  • Moved to xelatex due to Unicode issues, but pdflatex took roughly the same amount of time.
  • 69 included packages
  • The book currently typesets to 326 pages and will run to about 500 at completion of the project.
  • 72 images, most in the neighborhood of 20–50 KB.
  • Output logfile is 5,611 lines long (!)
  • Still using BibTeX because biber breaks and I can't debug it, but at least I'm using biblatex.
  • Compiling with latexmk
  • Multiple targets in the Makefile, including targets to typeset each chapter on its own; typesetting a single chapter takes 15.97 seconds.

LaTeX experts will say that every page depends on everything that comes before it, because LaTeX is Turing-complete. And, of course, every page also depends on every page that comes after it, because of the auxiliary files. Frankly, it's amazing that LaTeX converges at all when typesetting!

However, there are well-known tricks that could be used to solve this problem. For example, pages could checkpoint the relevant state; a new compile could then use a multi-threaded implementation that typesets each page from the previous run's checkpoint and re-runs a page only if its checkpoint changes. Something similar could speed up even single-threaded runs: if we're compiling page 265, the state at the beginning of page 265 is the same in this run as in the previous run, and no text in the document has changed between the start of page 265 and the start of page 266, then page 265 probably came out the same and the run could skip straight ahead to page 266.

It seems that optimizations like these could make LaTeX dramatically faster. So why don't we see them in LaTeX2e, and will they be in LaTeX3?

vy32
  • LaTeX3 runs on top of LaTeX2 which runs on top of TeX. Anyone can write a new TeX engine, but you have to pass a test to call it TeX. – John Kormylo Oct 10 '20 at 01:30
  • So we really need a new TeX engine? – vy32 Oct 10 '20 at 01:42
  • To do what you want, yes. – John Kormylo Oct 10 '20 at 01:43
  • LaTeX would likely benefit from a concurrent implementation. But for an open-source project like this, which has accumulated decades of legacy code, the motivation to realize such an implementation is low and the difficulty is high. Splitting up the document can greatly reduce compilation time; if done correctly, it should not change the layout of the document by much. Another factor I have found is that LaTeX distributions on Windows typically run much slower than their Linux counterparts. If you are using Windows, maybe you should try pdflatex on Linux instead. – Alan Xiang Oct 10 '20 at 03:07
  • Latex is written in tex, so asking us how to make tex go faster is the wrong question. – David Carlisle Oct 10 '20 at 08:12
  • If you are writing a book, do you really need to compile the whole thing each time? Just work on the chapter you are working on: that typically speeds things up a lot, and compared to the time spent writing, a few seconds to remake a chapter is rarely an issue. Speed is an issue in other workflows, e.g. if you are using TeX to typeset hundreds of reports or invoices from some database, where saving a few milliseconds each run adds up. But in human-oriented document authoring, less so. – David Carlisle Oct 10 '20 at 08:19
  • Another way to speed up compilation is to specify the document class option draft (see the sketch after these comments). – Mico Oct 10 '20 at 09:10
  • @AlanXiang I provided more details of my system. – vy32 Oct 10 '20 at 17:53
  • Back in 1991, when I had my first PC, I was running a 32-bit version of LaTeX on a machine with 1 MB of RAM. With all the memory churn, it took about an hour to compile a single chapter of the class notes for the LaTeX class I was teaching. There was no multitasking, of course, so I got a lot of recreational reading done, although I had to keep an eye on the computer lest there be an error. – Don Hosek Oct 10 '20 at 19:29
  • And there are non-converging scenarios: a changing value from a \pageref could create a cycle where, when the value is N, the reference moves to page N±1, and then, when the value comes out as N±1, the reference moves back to page N. – Don Hosek Oct 10 '20 at 19:31
  • See https://tex.stackexchange.com/a/8792/5763 – Martin Schröder Oct 11 '20 at 11:22
  • Trim down the number of included packages – 69 sounds excessive. – Martin Schröder Oct 11 '20 at 11:23
  • To make the old pdflatex faster, have a look at https://tex.stackexchange.com/questions/79493/ultrafast-pdflatex-with-precompiling – Jonas Stein Oct 12 '20 at 17:36
  • Do you have an example document (maybe with text changed)? I think 75 seconds for a book is ridiculous. The slowness is not from TeX itself: on CTAN (and distributed with TeX Live) you'll find a book gentle.tex. It's a single plain-TeX source file of 5397 lines, and typesets to 97 pages. On my laptop running pdftex gentle.tex or luatex gentle.tex takes 0.5 seconds. Now, your file has less than 4 times the number of lines or pages, so imagine a compilation time of about 2 seconds, not 75. Of course, gentle.tex doesn't use LaTeX or packages or BibTeX or nontrivial figures… – ShreevatsaR Oct 16 '20 at 19:32
  • I have the book itself! The 69 seconds includes multiple runs to handle the index and glossary and the bibliography as dictated by latexmk. If you wish, I can post the log file, and I'm happy to even give you access to the GitHub repo where I have the book if you want. I'd love to have another set of eyes on it. – vy32 Oct 17 '20 at 11:04
  • @vy32 Sure, my github username is "shreevatsa" and I'd like to take a look. But probably one of the recommendations would be to not have multiple runs each time, and just accept out-of-date references (or an incomplete index) in draft mode... – ShreevatsaR Oct 18 '20 at 07:55
  • BTW you can run latexmk with the -time option (and optionally -silent) to see output at the end about what took how long; see the sketch after these comments. In this case, it seems that the bulk of the 70-odd seconds is taken by four calls to xelatex, each taking about 17 seconds (interspersed with calls to makeindex and bibtex). So if you don't need the index and all references each time, you can just run xelatex and save about three-fourths of the time. (The "full" version can still be run in the background while you're reading the document, I guess.) – ShreevatsaR Oct 19 '20 at 04:48
  • Thank you so much, @ShreevatsaR. That's great to know. I'll add additional targets to my Makefile. – vy32 Oct 19 '20 at 11:52
  • I still hope to look into this example more deeply, but also, as you probably know (going by your \qonly that you asked another question about), you can use the \includeonly mechanism to compile just one chapter. In this case (and using just xelatex directly for that one chapter), compiling just one chapter would finish in about 3 to 7 seconds. Note that to make this work, you should not include your glossary.tex with \include{glossary} but with \input{glossary}; see here for example. – ShreevatsaR Oct 21 '20 at 09:57
  • Thank you so much for the comments. I discovered the problem with glossary but hadn't figured out the fix. Thanks! – vy32 Oct 22 '20 at 11:28
  • Sorry I got distracted by other things and probably can't return to this, but to recap what I had learned so far (the first two and last one mentioned above already): (1) Using latexmk -time shows where time is going, (2) Can start viewing the PDF (with stale references) after running just the first xelatex (or lualatex), (3) Removing \usepackage[english]{babel} (if it's not needed) makes it 15% faster, (4) Most of the time is spent expanding/defining macros, (5) Giving all the reference commands dummy definitions makes it another 30% faster, (6) Compiling just one chapter can be fast. – ShreevatsaR Oct 26 '20 at 10:26
  • Thanks so much for the comments! I do appreciate them. With your help, I was able to get the compile-single-chapter target to work. I am also reworking the included images so that very-low-resolution PDFs are included by default (since it appears that PDFs are faster to include than JPGs). I wasn't aware of latexmk -time but will use it. I'm not sure what you mean by "giving all the reference commands dummy definitions". Can you elaborate on that? – vy32 Oct 26 '20 at 13:30
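
To make the timing and draft-mode suggestions in these comments concrete, here is a minimal sketch; main.tex is a placeholder name, and the commented-out \renewcommand lines are only a guess at the "dummy definitions" idea mentioned above:

    % Where does the time go? As suggested above:
    %   latexmk -time -xelatex main.tex
    %
    % For everyday runs, the 'draft' class option (picked up by graphicx)
    % draws placeholder boxes instead of embedding the images:
    \documentclass[draft]{book}
    %
    % One possible reading of "give the reference commands dummy
    % definitions" (hypothetical; avoids cross-reference lookups):
    %   \renewcommand{\ref}[1]{[?]}
    %   \renewcommand{\pageref}[1]{[?]}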

2 Answers

20

The LaTeX project does spend a lot of time making sure that latex goes as fast as possible, but none of the things you suggest are relevant to latex code; you are suggesting changes to the tex language in which latex is written.

As you can see if you look through the GitHub issues, a lot of thought goes into optimising the core expl3 programming constructs: whether in each case it's quicker to use multiple \expandafter, a \fi-delimited argument, \expanded, or whatever.

Also, LaTeX releases this year have preloaded two largish packages into the format: expl3 (in February) and xparse (in October). This can make a fairly noticeable improvement in startup time, as locating package files and reading the data off the filesystem can take significantly longer than processing the tex code within the file.

Note that you can build a custom format pre-loading the packages you use, which can also speed up startup time a lot.
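
A minimal sketch of that approach, using the mylatexformat package that comes up in the comments below (file and format names are placeholders, and the exact invocation can vary between distributions and engines). First dump the preamble into a precompiled format, then point later runs at it:

    pdftex -ini -jobname=mybook "&pdflatex" mylatexformat.ltx main.tex
    pdflatex -fmt mybook main.tex

Everything up to \begin{document} is then compiled once into mybook.fmt, so the packages are not re-read and re-processed on every run.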

The kind of checkpointing that you mention is a question about the underlying tex system, so it is not addressable within LaTeX. It is the same as with other programming languages: a web page author can avoid inefficient JavaScript in their page to make the page load faster, but they cannot re-write the JavaScript engine in all possible browsers in which that code may run, which is the equivalent of what you are asking here.

The actual checkpointing is hard because page breaking is asynchronous. It is feasible at forced page breaks from \clearpage, which is exactly what the LaTeX \include system does: it saves the state of all LaTeX counters at that point, so if on the next run you skip chapters 1–3, the page numbering is preserved and the draft document starts with chapter 4. But to do that automatically, and to save more state, such as the definitions of all macros and not just the values of all counters, would require changes to the tex system, not to latex.
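
A minimal sketch of that mechanism (the chapter file names are placeholders):

    % main.tex
    \documentclass{book}
    % Typeset only chapter4.tex; counters and cross-references for the
    % skipped chapters are restored from their saved .aux files:
    \includeonly{chapter4}
    \begin{document}
    \include{chapter1}
    \include{chapter2}
    \include{chapter3}
    \include{chapter4}
    \end{document}

Removing the \includeonly line gives a full build with the same page numbering.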

You mention that later changes can affect earlier ones due to auxiliary files, but that is actually the easier case. Just consider a long paragraph that spans two or more pages: adding a comma in the last line can change the line breaking of the entire paragraph, changing earlier pages without any auxiliary files being involved.

Many tex systems these days are fast enough that latex is set up to run continuously in the background as the file is edited, updating the display whenever the pdf is successfully remade. If your build is slow, you should look to your build system: are you including high-resolution images or re-setting complicated tikz on every run? If you arrange to defer these things to more occasional "full" builds, you can usually get things to run at a reasonable speed.
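
As a sketch of that deferral, assuming the complicated pictures are TikZ: the external library caches each picture as a PDF and re-typesets it only when its code changes (it requires running latex with -shell-escape), while the graphicx draft option skips image inclusion entirely:

    \usepackage{tikz}
    \usetikzlibrary{external}
    \tikzexternalize   % rebuild a picture only when its code changes

    % Alternatively, for everyday runs, include no images at all:
    \usepackage[draft]{graphicx}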

David Carlisle
  • Thank you so much for the detailed comment. The difference between what's in TeX and what's in LaTeX is always a bit confusing to me. And thank you for reminding me about the difference of \input vs. \include. Alas, I am using \include and getting a separate .aux file for each chapter in my book, but all of the .aux files seem to be re-generated with every pass. – vy32 Oct 10 '20 at 17:45
  • I should have clarified that because of labels and whatnot, including captions, it is quite easy for a change in a later chapter to be reflected earlier. If I have a list of tables in the beginning and add 10 tables in chapter 10, then my list of tables may suddenly take an extra page, forcing every page in the book to change. – vy32 Oct 10 '20 at 17:46
  • Also, it's true that many tex systems are fast, but even Overleaf is taking about 2 minutes to typeset the book. It's not a lot of high-resolution images or running tikz. (I actually have a mode to incorporate low-resolution versions by default.) It's just a big book. – vy32 Oct 10 '20 at 17:47
  • @vy32 Well, I remember when 15 minutes per page was thought fast, so 2 minutes per book doesn't seem that slow. But Overleaf isn't necessarily the fastest compared to a local install on a local disk, and as I say, how often do you need to typeset the whole book? If it takes 2 minutes to do a full build once a week or so, how much of an issue is that? – David Carlisle Oct 10 '20 at 18:01
  • @vy32 If you are using \include, then add \includeonly{chapter4} (or whatever the filename is called) and it will process just the one chapter while keeping all cross-references to skipped chapters. That is the whole point of \include and why latex has that command and not just \input: it is there specifically to address your issue. – David Carlisle Oct 10 '20 at 18:03
  • Thanks @David! I have re-acquainted myself with \includeonly{}. It works well and cut compile time from 74 seconds to 26 seconds. Still not where I want to be, but much better. I wish I could make the package loading faster. – vy32 Oct 12 '20 at 04:40
  • @vy32 As I say in this answer, you can precompile your preamble (in most cases) into a custom format; see mylatex or mylatexformat, both in all the standard distributions. – David Carlisle Oct 12 '20 at 06:41
  • 15 minutes per page was never fast. My undergraduate thesis, run in 1987, was 84 pages, and it ran through LaTeX in under 5 minutes on a VAX 750. I do remember that it took long enough that I didn't bother to fix the one-word orphan on the last page. But it was late at night. – vy32 Oct 12 '20 at 20:32
  • I will explore mylatexformat (the author of mylatex claims that it is "truly ancient" and recommends the latter) and report back. – vy32 Oct 12 '20 at 20:34
  • @vy32 At the time, which would also have been around '87, I had a 4 MB Sun-3 and it was a lot faster than this, but I had friends in math using 640K MS-DOS boxes and sbtex, and it was around that order (for the first page at least; reading the preamble took forever :-). That's why I wrote mylatex at the time. – David Carlisle Oct 12 '20 at 20:35
  • David — I feel so bad for your friends! Presumably the thing that was causing the problems for them was the FAT32 file system. – vy32 Oct 12 '20 at 20:37
  • @vy32 But my thesis (2 years earlier) was done on a typewriter, so as I say, several minutes per page can seem fast, depending on what you compare it with. – David Carlisle Oct 12 '20 at 20:38
  • Ah. I had a Xitan machine in 1978 running a text editor and text formatter in memory, then moved to WordStar on CPM when I got my 8" floppy disk drive in 1980. The printer was a Diablo 630 Daisy Wheel printer; I had to write the driver myself. – vy32 Oct 12 '20 at 20:43
  • OMG, David Carlisle is the person who wrote mylatex. Now I feel lame. – vy32 Oct 12 '20 at 23:39
  • out of pity for colleagues with a dos box when I had a sun3. – David Carlisle Oct 12 '20 at 23:42
4

Putting this here as a bit of a provocative manifesto:

Popping back into LaTeX-world lately, I've been contemplating finally finishing the LaTeX book that I'd started in the late '80s/early '90s when I taught the TUG LaTeX classes.

LaTeX 2e was first released in 1994 as a transitional step to the eventual release of LaTeX 3. 26 years later, there still isn't a 1.0 release of LaTeX 3. In the interim, we've seen the rise of HTML and the web, the dominance of PDF as a format for representation of printed material (and now there is a plan to have PDF extended with "liquid mode" that allows reflowing of PDF text for smaller screens).

In the meantime, the TeX engine has been extended multiple times: the little-used TeX-XeT, some early efforts to support large Asian character sets, and, in widish use, pdfTeX, XeTeX, and LuaTeX, along with an assortment of abandoned engines. Worst of all, it seems that none of pdfTeX, XeTeX, or LuaTeX can serve as the one TeX to rule them all; each has limitations that can require users to switch engines depending on their needs.

As I've thought about it, the problem at its root is TeX itself. It's what would be referred to, in contemporary software-engineering parlance, as a tightly coupled monolith. Worse still, it's a tightly coupled monolith with numerous compromises baked in because of the limitations of 1970s computing hardware. It seems that the vast majority of the work that has been done on LaTeX 3 has been geared towards dealing with the limitations of TeX as a programming language.

On top of that, there's been an explosion of questionable, if not outright harmful, practices from the greater LaTeX community. Ideally, translating a document from one document class to another structurally similar class (naming-wise, the choice of "class" for document classes is unfortunate, but understandable) should not require changing anything after the preamble; better still, nothing but the \documentclass command itself. All the appearance should be handled through the document class, and packages should be employed to provide document-structure enhancements or new capabilities. There are numerous violations of this. The memoir class is a mess, claiming to be a replacement for article, report, and book (this reminds me of the mess that is PHP, where the same data structure acts as an array and as an associative array and, as a consequence, manages to merge the worst aspects of both in one inefficient construct), while at the same time providing a number of bits of functionality that belong in packages rather than in the document class. On the flip side, packages like geometry and fancyhdr fall into a category that LaTeX2e doesn't really define: bits of common code that would be helpful to document-class writers but shouldn't really be exposed to document authors.
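
In code terms, the ideal described above is that restyling a document touches exactly one line; a sketch, with arbitrary class names:

    \documentclass{report}  % restyling should mean changing only this line
    \begin{document}
    \section{Introduction}  % the body is untouched by the change of class
    ...
    \end{document}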

Given the ultimate failure of NTS and ExTeX, I'm not hopeful for a resolution to any of these issues.

Edit 20 October 2020: I've done something stupid. I'm starting a new project. It will probably never amount to anything.

Don Hosek
  • Thank you for this comment, and I'm in complete agreement with you. It is surprising that the underlying TeX engine has so ossified, but on the other hand, that is what Knuth wanted for stability. I recently met a (surprisingly not-famous) old-timer who has extended TeX to allow editing on the displayed document, with the changes automatically back-propagated to the underlying TeX document. He thought that the TeX community would not be receptive to the mods because it wouldn't automatically work with all of the myriad LaTeX packages. – vy32 Oct 12 '20 at 04:45
  • LuaTeX (or at least LuaHBTeX) does combine more-or-less all of the improvements that pdfTeX and XeTeX offer, so in many ways it's the future engine for everyone. However, it's deliberately not 100% output-compatible with pdfTeX (and thus TeX90), which is an issue for some people. And if we are talking performance, Unicode is always slower than 8-bit. – Joseph Wright Oct 12 '20 at 08:54
  • There are of course alternatives to the TeX approach that still use the ideas from TeX: most obvious would be Speedata (uses LuaTeX, but from Lua, other than a couple of bootstrap lines). The issue, as always, is not having clever ideas but getting to a critical mass. The LaTeX team have looked hard at a new format, and we've concluded that it's never going to get that critical mass, hence focussing on improving LaTeX2e. – Joseph Wright Oct 12 '20 at 08:56
  • Engine work is a whole different thing to macro programming, but you might look at HINT and JSBox as possible 'future directions' (both yet to be really tested in the wild) – Joseph Wright Oct 12 '20 at 08:57
  • " there still isn't a 1.0 release of LaTeX 3" yes and no, look at the latex startup banner LaTeX2e <2020-10-01> patch level 1 L3 programming layer <2020-10-05> xparse <2020-03-03> The LaTeX3 (expl3) code is available from 2e, a change of policy.... see Frank's talk at the virtual tug2020, there are more things planned for LaTeX3 than the lowest level expl3 code but it can be built incrementaly (as you can have a format big enough to hold both, which you couldn't back then) – David Carlisle Oct 12 '20 at 23:47
  • Well, good luck with your new project (and I mean it sincerely); I hope you can take some lessons from the menagerie of failed (La)TeX replacements. You may also find it useful to talk to people who have tried writing alternative typesetting engines (SILE, Patoline, etc.) or to people who have worked closely with the TeX engine code itself (developers of XeTeX, LuaTeX, jsBox, HINT… and also BaKoMa TeX, Texpad, SwiftLaTeX). Anecdote from Doug McKenna of jsBox: he told Knuth he was writing a new TeX engine, and Knuth warned him it would take 5 years of full-time work, and was exactly right. – ShreevatsaR Oct 21 '20 at 06:50
  • If one is less ambitious, there's another line of useful work to be done: actually I was going to post an answer to this question, making some points like: (1) one can't optimize without understanding what's going on, (2) performance will only degrade if not measured, (3) LaTeX is a great idea (semantic document structure); its implementation (as TeX macros) not so much. So I think the best way out would be to improve user understanding: hack TeX and LaTeX to provide better debuggability, profiling etc, so that users can see "everything", and are nudged towards simplicity and fewer macros. – ShreevatsaR Oct 21 '20 at 06:52
  • I read about a few would-be replacements. For, say, LaTeX and the most common packages, does the underlying engine really need to be feature-complete? Would 80% of the engine cover 99% of the existing packages? TeX also seems paranoid about changing the output, but for moving on (rather than replacing) this may not be the first priority. Speed, with reflowability (given document and package compatibility), would make a nice complement. Maybe not substituting, but complementing, would be key to adoption. – ivo Welch Mar 18 '23 at 17:30