Components of (La)TeX's memory usage

Question

After reading Increase LaTeX capacity and not being able to summon Gandalf¹, I am curious to know what contributes to the components of memory usage during a (La)TeX compilation. Take the above post's output as an example:

l.3593 ...temp.png}

If you really absolutely need more capacity,
you can ask a wizard to enlarge me.


Here is how much of TeX's memory you used:
 31937 strings out of 94500
 1176767 string characters out of 1176767
 272586 words of memory out of 1000000
 24170 multiletter control sequences out of 10000+50000
 11185 words of font info for 39 fonts, out of 500000 for 2000
 580 hyphenation exceptions out of 1000
 28i,7n,36p,345b,3810s stack positions out of 1500i,500n,5000p,200000b,5000s
PDF statistics:
 33619 PDF objects out of 300000
 7356 named destinations out of 131072
 48094 words of extra memory for PDF output out of 65536
!  ==> Fatal error occurred, the output PDF file is not finished!

What, in a "regular document" that loads packages and contains user-defined macros and environments, constitutes a string, a string character, a word of memory, a multiletter control sequence, a word of font info, a hyphenation exception, a stack position and (when running pdftex) a PDF object, a named destination and a word of extra memory?

The intent of this question is more to understand where one might run into problems if you are (say) compiling a 4000 page document - yikes! Perhaps, a more realistic scenario might be that you are typesetting a very large document and you include a large number of packages even though you only use a select few macros/environments from each package. (La)TeX still loads the entire package into memory, leaving you with less to work with.

Reading pdf object/destinations/memory limits only suggests where one might have problems and perhaps how to boost (La)TeX's available capacity. However, it doesn't state which parts of the document contribute to which memory component. The same is true of other posts I've found.

Some of the memory output may be self-explanatory, but not all. For example, multiletter control sequences probably refer to definitions like \newcommand{\mycom}{...} and \def\stuff{...} each of which I assume puts a +1 to the tally. However, this seems to exclude \def\a{...} since it is a single letter control sequence? Also, does \def\stuff{...} add +1 to the string and +5 to the string characters?

Understanding this will probably not sway me from any of my current usages, since they have never given me memory problems of such a nature without the problem being on my end rather than on the compiler's. However, it may improve my future (La)TeX programming of macros/environments.

¹ By the way, TeX's output of "If you really absolutely need more capacity, you can ask a wizard to enlarge me." is just epic.

And a follow-up question: how well do the limits correspond to today's typical personal computers, which can perform billions of operations per second and have gibibytes of operating memory? — Andrey Vihrov, Aug 21 '11 at 06:36
In my experience, most memory problems in TeX are not really memory problems but bugs in the user's macros. I once compiled a font sample book with lualatex and used up to 36 GB of memory, and TeX gave no warning and no error. Of course, that was a 64-bit version. — Yan Zhou, Aug 21 '11 at 07:37
As long as you have paragraphs and reasonable floats, i.e., no single very long piece of text, TeX will have no problem with 4000 pages and will not run out of memory, as it only keeps one page at a time in memory; remember that it uses a page model (it does not keep the whole book in memory). I tried over 4000 pages of automated text during testing and saw no adverse effects. If you have too many references, figures and the like, it will get painfully slow while writing all those logs and auxiliary files. Most problems are unclosed braces, which force TeX to keep long pieces of text in memory. — yannisl, Aug 23 '11 at 12:09
there is a summary list of the memory components in the texbook (p.300) that has short but clear descriptions of each. more detailed information can be found in victor eijkhout's tex by topic. of course, these are biased toward "plain" tex, but since latex simply makes use of these tools, the necessary information should be there. i don't know for sure whether any additional components were added with e-tex, but the list in the log shown in the question doesn't report anything that's not in the texbook list. — barbara beeton, Sep 02 '11 at 21:09

score 26 · Answer 1 · edited Apr 13 '17 at 12:34

Here is a short program that can print 15283¹ pages with no indication of memory problems.

\documentclass{article}
\usepackage{lipsum}
\begin{document}
\newcount\n
\n=0
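% \countmsg is used here so that TeX's \message primitive is not overwritten
\def\countmsg{I can count to }
\loop
  \ifnum\n<37000
  \advance\n by 1
  \countmsg\number\n : \lipsum[1-2]% the blank line below ends the paragraph

\repeat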
\end{document}

You can increase the range in \lipsum[1-2] and test both your patience and TeX's limits! As long as you allow TeX to break a page easily, you are unlikely to hit any memory problems.

Knuth created a very efficient memory management system. The details can be found in the TeX source. A somewhat limited explanation of how memory and strings are handled can be found in my answer to Delimiting a macro argument with the macro itself. Check also for TEXPOOL on your distribution.

Not only Knuth but also Lamport and all the LaTeX contributors took extreme care to conserve memory as well as to provide proper garbage collection. IMHO, studying the TeX source should be required reading in all Computer Science classes. In the link above, check the Dynamic Memory Allocation section and note Knuth's comment (clause 119), which describes the most common problem encountered by new TeX users:

If memory is exhausted, it might mean that the user has forgotten a right brace...

On a final note, TeX does not keep your "book" in memory. It always works on a page at a time ... well, almost. If you do not close a group or an argument with a right brace, it will continue scanning, potentially until the end of the source, and so it cannot de-allocate memory. Knuth introduced a lot of checks and special commands to avoid these issues (\long, \outer, etc.).
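As a rough illustration of the kind of check meant here, the sketch below contrasts a macro defined without \long (via \newcommand*) with one defined with it (via \newcommand). The macro names \shortarg and \longarg are made up for this example. With the non-\long version, a forgotten right brace should be reported at the next blank line rather than at the end of the file.

```latex
\documentclass{article}
% Hypothetical macro names, for illustration only.
% \newcommand* defines a macro WITHOUT \long: its argument may not contain \par.
\newcommand*{\shortarg}[1]{{\itshape #1}}
% \newcommand (no star) defines a \long macro: its argument may span paragraphs.
\newcommand{\longarg}[1]{{\itshape #1}}
\begin{document}
\shortarg{Fine: a single paragraph.}

\longarg{Also fine: first paragraph.

Second paragraph.}

% Uncommenting the next line (note the missing right brace) should make TeX
% stop at the following blank line with a "Paragraph ended before \shortarg
% was complete" error, instead of scanning on through the rest of the file:
% \shortarg{Oops

\end{document}
```

The design point is that the unstarred form trades this safety net for the ability to take multi-paragraph arguments, so the starred form is the safer default for short arguments.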

¹ Exercise for the reader: halve the page height and see if you can double the number of pages.

If we change \lipsum[1-2] into \lipsum*[1] (so no \par is issued) and avoid the blank line before \repeat, the 3000000 memory size is exhausted somewhere between 900 and 1000 iterations (instead of 37000). Indeed, TeX loads a paragraph entirely into memory in order to break it; thus gigantic paragraphs may also cause it to run out of main memory. — egreg, Sep 04 '11 at 20:21
@egreg Sure, that is why I also had a blank line in the code. From memory, Knuth mentioned this in the TeXbook (I forget the exact place), describing a devious student who wrote everything in one paragraph, and offered a solution. — yannisl, Sep 04 '11 at 20:24
It's on page 100: "the computer's memory capacity might be exceeded if you are typesetting the works of some philosopher or modernistic novelist who writes 200-line paragraphs." That's where James Joyce is indexed. :-) — egreg, Sep 04 '11 at 20:37

score 26 · Accepted Answer · edited Apr 13 '17 at 12:35

My attempt at understanding the wizardry may be somewhat brute force-ish, but it does touch on some of the questions about the memory usage output. It does not describe how to modify (or increase) these quantities.

Much of the discussion below revolves around modifications of the following basic "Hello world" minimal example. Let's call it Hello world¹. The superscript denotes the initial MWE, with each subsequent modification denoted by an increased superscript:

\documentclass{article}% Hello world 1
\begin{document}
Hello world.
\end{document}

TeX statistics

( ) strings out of ( )

As-is, Hello world¹ uses 203 strings out of 493633. Modifying this to Hello world²:

\documentclass{article}% Hello world 2
\begin{document}
Hello world. Hello world.
\end{document}

leaves things unchanged across all statistics. However, defining Hello world as part of a macro (via \def or \newcommand), giving Hello world³

\documentclass{article}% Hello world 3
\def\helloworld{Hello world.}% \helloworld -> Hello world.
\begin{document}
\helloworld
\end{document}

causes a unit increase in this part of TeX's memory usage, bumping it up to 204. Defining a macro adds a string, but using it does not. That is, having (say) 50 \helloworld commands in a document would still use only 204 strings. Strings are also used when reading from a file using \input or \include (Hello world⁴):

\documentclass{article}% Hello world 4
\begin{document}
\input{helloworld.txt}% Hello world.
\end{document}

\include uses fewer strings, but necessarily requires the file to have a .tex file extension.
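The earlier claim that using a macro does not add strings could be checked with a variant along the following lines. This is only a sketch: the counter \n is a single-letter name chosen so that the loop itself should not add a new multiletter name, and the expectation is that the string count in the .log matches that of Hello world³.

```latex
\documentclass{article}
\def\helloworld{Hello world.}% defined once: one new name is stored
\newcount\n% loop counter (single-letter name)
\begin{document}
\n=0
\loop
  \ifnum\n<50
  \advance\n by 1
  \helloworld\par% 50 uses of the same macro, no new definitions
\repeat
\end{document}
```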

( ) string characters out of ( )

Hello world¹ has 2308 string characters, and the same goes for Hello world². However, Hello world³ uses 2318 string characters, the extra 10 stemming from the 10-letter control sequence name \helloworld. To that extent, brevity (with clarity) reigns supreme when it comes to naming macros. That is, it's best to avoid Hello world⁵:

\documentclass{article}% Hello world 5
\def\helloworldmacrothatwillwritehelloworldastheoutput{Hello world.}%
\begin{document}
\helloworldmacrothatwillwritehelloworldastheoutput
\end{document}

( ) words of memory out of ( )

With only Hello world¹, 49245 words of memory are already used. This remains unchanged for Hello world² and Hello world³. However, if \helloworld is defined to contain a fairly large paragraph (as in the lipsum package), the count goes up (by 1000 to 50245 in this case), as in Hello world⁶:

\documentclass{article}% Hello world 6
\newcommand\helloworld{%
  Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris 
  turpis purus, posuere et suscipit eu, dictum ac lacus. In velit 
  orci, pulvinar nec tristique quis, congue id diam. Sed sit amet 
  augue tellus, sit amet placerat sapien. Vivamus scelerisque placerat 
  libero id auctor. Donec laoreet auctor velit, eu cursus nulla 
  bibendum a. Mauris id tellus vitae felis sodales sagittis. Curabitur 
  mollis vehicula sagittis. In pharetra elit vel dui mattis dapibus. 
  Pellentesque commodo magna eu sem faucibus fermentum. Nunc in turpis 
  arcu, nec venenatis enim. Quisque sed velit id velit ullamcorper 
  suscipit mattis a ipsum. Etiam viverra eleifend tellus, eget sodales 
  enim varius at. Proin quis nibh mi. Nullam sed nisl id mauris 
  aliquam feugiat.}
\begin{document}
\helloworld
\end{document}

In fact, using the code provided by @Yiannis to produce 15283 pages of wholesome goodness uses only(!) 145453 words of memory - a 189% increase from Hello world⁶. On my system I was even able to raise the 37000 limit to 100000, producing a ~61.5MB, 41305-page document in just over 3 minutes, without any noticeable increase in memory usage. This is undoubtedly because memory is flushed as paragraphs are broken into lines and completed pages are shipped out; following @egreg's comment about \par removal, only 900 iterations yield 2746549 words of memory (>90% usage). It is also evident from the compilation output that TeX "takes a deep breath" as it digests the intake, before spewing out the 222 pages of breath-taking beauty. Knuth also mentions this on p. 300 of The TeXbook:

If you have specified a gigantic paragraph or a gigantic alignment that spans more than one page, you should change your approach, because TeX has to read all the way to the end before it can complete the line-breaking or the alignment calculations; this consumes huge amounts of memory space.
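For reference, the single-paragraph variant suggested in @egreg's comment might look like the sketch below. It assumes the starred form \lipsum*[1] suppresses the final \par, as that comment describes, and it deliberately has no blank line inside the loop, so all iterations end up in one gigantic paragraph.

```latex
\documentclass{article}
\usepackage{lipsum}
\newcount\n
\begin{document}
\n=0
\loop
  \ifnum\n<900
  \advance\n by 1
  \lipsum*[1] % starred form: the text is added without a paragraph break
\repeat
\end{document}
```

With the memory size quoted above, raising the bound from 900 towards 1000 should eventually exhaust main memory, whereas the version with paragraph breaks does not.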

( ) multiletter control sequences out of ( )

(La)TeX defines a number of single-character control sequences by default: \\, \c (for cedilla), \b (for bar), etc. Since only a limited number of single-character control sequences are possible, it is more informative to count multiletter control sequences, and this count is easy to interpret. Hello world¹ produces 3587 such control sequences, doing justice to all the behind-the-scenes prep work (La)TeX does in order to get you started. In fact, the 8000-odd lines of latex.ltx are peppered with macro definitions (and \lets). Hello world³, as expected, has 3588 multiletter control sequences.
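A small sketch of what should and should not enter this tally is given below. The names \z, \stuff and "my stuff" are made up for the example. The single-letter \z should not be counted, while \stuff and the name built with \csname should each add one to the count (and, being new names, to the string statistics as well).

```latex
\documentclass{article}
% Hypothetical names, for illustration only.
\def\z{single letter}%   single-letter name: not a multiletter control sequence
\def\stuff{multiletter}% one new multiletter control sequence
% Names constructed with \csname are stored in the same table:
\expandafter\def\csname my stuff\endcsname{built with csname}
\begin{document}
\z, \stuff, \csname my stuff\endcsname.
\end{document}
```

Note that \csname applied to a previously unknown name creates an entry even if no definition follows, which is one way a macro package can quietly inflate this number.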

Further to the above excerpt from The TeXbook, Knuth writes (p. 300):

If you have built up an enormous macro library, you should remember that TeX has to remember all of the replacement texts that you define; therefore if memory space is in short supply, you should load only the macros that you need.

This is especially relevant for feature-rich document classes like beamer and packages like (say) hyperref. In fact, this Hello world⁷ in beamer:

\documentclass{beamer}% Hello world 7
\begin{document}
Hello world.
\end{document}

uses

15105 strings out of 493633;
280253 string characters out of 3146724;
353989 words of memory out of 3000000; and
18064 multiletter control sequences out of 15000+200000.

( ) words of font info for 39 fonts, out of ( )

( ) hyphenation exceptions out of ( )

Hello world¹ covers 831 hyphenation exceptions, while Hello world⁸:

\documentclass{article}% Hello world 8
\begin{document}
\hyphenation{He-ll-o} \hyphenation{w-o-r-ld}
Hello world.
\end{document}

has 833. Understandably, the starting value is language dependent. Also, only global hyphenation exceptions (set with \hyphenation) are counted here, while local/temporary discretionary hyphens (like He\-ll\-o) are not.
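The distinction between the two mechanisms can be sketched as follows, using "manuscript" as an arbitrary example word. \showhyphens is a standard command that reports the hyphenation points TeX would use; in LaTeX its output shows up in the terminal and .log as part of a box warning.

```latex
\documentclass{article}
\begin{document}
\hyphenation{man-u-script}% global exception: counted in the log statistics
\showhyphens{manuscript}%   reports the hyphenation points TeX will now use
man\-u\-script%             \- is a one-off discretionary, not an exception
\end{document}
```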

( ) stack positions out of ( )

These refer to the following (from The TeXbook, p. 300):

i: input stack size (simultaneous input sources);
n: semantic nest size (unfinished lists being constructed);
p: parameter stack size (macro parameters);
b: buffer size (characters in lines being read from files); and
s: save size (values to restore at group ends)

I did not venture into seeing how these are modified. However, as suggested in The TeXbook (p. 300), setting \tracingrestores=1 in your document produces a report (in your .log file) of how TeX removes saved items from the stack at the end of every group.
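A minimal sketch of such a trace, assuming a local definition of a hypothetical \helloworld inside an explicit group, could look like this. When the group ends, the .log should contain a line starting with {restoring for each saved item.

```latex
\documentclass{article}
\begin{document}
\tracingrestores=1
\begingroup
  \def\helloworld{Hello world.}% local definition: the old meaning is saved
  \helloworld
\endgroup% leaving the group restores the saved meaning and logs it
\tracingrestores=0
\end{document}
```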

PDF statistics

( ) PDF objects out of ( )

A very straightforward discussion on PDF objects mentions that there are essentially three types:

Annotations
Text, and
Images

It is therefore no wonder that Hello world⁹

\documentclass{article}% Hello world 9
\usepackage{graphicx}% http://ctan.org/pkg/graphicx
\begin{document}
\noindent%
\includegraphics[width=\linewidth]{tiger}% http://mirrors.ctan.org/info/examples/lgc2/pstricks/tiger.eps
\end{document}

increases the starting tally of PDF objects by a handful (to 17). Other objects include the document's pages.

( ) named destinations out of ( )

Destinations in a PDF refer to marked locations. These are typically associated with sectioning macros like \section{...}, \phantomsection, \chapter{...} and \caption{...}. Hello world¹⁰

\documentclass{article}% Hello world 10
\usepackage{hyperref}% http://ctan.org/pkg/hyperref
\begin{document}
\section{Hello world.}
\end{document}

produces 3 named destinations (a unit increment over a clean-slate Hello world¹ that only loads the hyperref package).
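Destinations can also be created by hand. The sketch below uses hyperref's \hypertarget and \hyperlink; the names greeting and sec:hello are made up for the example. Each such target should add to the named destinations tally, although I have not verified the exact counts.

```latex
\documentclass{article}
\usepackage{hyperref}
\begin{document}
\section{Hello world.}\label{sec:hello}% sectioning commands get a destination
\hypertarget{greeting}{Hello world.}%    explicit, hand-made destination

See \hyperlink{greeting}{the greeting} in Section~\ref{sec:hello}.
\end{document}
```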

( ) words of extra memory for PDF output out of ( )

All the above statistics stem from running TeX Live 2011, with pdfTeX, Version 3.1415926-2.3-1.40.12, and will almost certainly differ (by however small or large an amount) on other installations.