Compiling an 870 MB LaTeX file

Question

I have a TikZ file that is 870 MB and I want to compile it with LaTeX. How can I achieve this? I have tried increasing LaTeX's main memory to the maximum and it still doesn't render. Currently I get the following output while compiling. The file is computer generated: I produced it with Gephi (https://gephi.org) using its TikZ export.

http://pastebin.com/4bD9r1Ld

I performed the fix in the answer below and now I get this error

http://pastebin.com/KMZDW8qZ

I'm intrigued by what you can possibly have in a TikZ file of that size. — Joseph Wright, Aug 16 '12 at 20:06
That seems rather ... large. I would guess that there is something peculiar about this document making it so big. Knowing that might help find an answer to your question. — Andrew Stacey, Aug 16 '12 at 20:06
@Joshua You can't tell us you wrote 870 MB of code. So this is computer generated code. If you'd like advice on how to reduce this weight, you'll have to provide some details. Otherwise you'd better get access to a supercomputer or wait, let's say, 5 years. — Keks Dose, Aug 16 '12 at 20:58
I can see that you use plothandler and topaths library so it's most probably a marker issue. You can switch to pgfplots for complicated diagrams and data reading without the marker rendering etc. Can you decrease the sample size to more manageable numbers? — percusse, Aug 16 '12 at 21:40
@percusse The point is to render the whole graph and I don't know what you mean by sample size. — Joshua Herman, Aug 16 '12 at 21:45
I understand but there is not much you can render with a typical pdf page resolution. So most of your data fidelity will be lost in aliasing anyway. — percusse, Aug 16 '12 at 22:02
@JoshuaHerman can you generate a small file? (i.e., a network with a few nodes). So it would be possible to examine the latex file generated by the tool. — Guido, Aug 16 '12 at 23:27

score 13 · Accepted Answer · answered Aug 16 '12 at 21:37

The error message says:

! TeX capacity exceeded, sorry [pool size=3141349].

The pool size can be raised. For testing, this can be done on the command line using an environment variable:

$ pool_size=10000000 latex openordconceptnet4.tex

This can also be configured in texmf.cnf, see entry for pool_size.

However, it cannot be raised indefinitely, and there are many other memory limits. Some of them can be increased the same way as the pool size. For others, the format must be regenerated for the change to take effect.

The real problems start when the memory limits cannot be increased further. Then one can try to divide the large file into smaller files and concatenate the output files later, or try to make the TeX code use memory more efficiently.

answered Aug 16 '12 at 21:37

Heiko Oberdiek

How can I divide the large file into smaller files? – Joshua Herman Aug 16 '12 at 21:38
It depends on your file. LaTeX provides the \include feature that allows putting chapters in separate files. And via \includeonly they are compiled one after another. – Heiko Oberdiek Aug 16 '12 at 21:59
From the posted log it seems that no page is produced. Thus, it cannot be split into chapters using \include – Guido Aug 16 '12 at 23:04
@Guido The input TeX file is split into pieces. Then compiling the pieces might successfully produce output files with pages. – Heiko Oberdiek Aug 16 '12 at 23:20
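
The \include and \includeonly mechanism mentioned in the comments above can be sketched as follows. This is only an illustration under assumptions: the file names main.tex, part1.tex and part2.tex are hypothetical, and each part is assumed to hold a complete tikzpicture that fits within the memory limits on its own. An edge in one part cannot simply refer to a node defined in another part, so a single huge picture would need its node definitions repeated in every part that uses them.

```latex
% main.tex (hypothetical driver file; part1.tex and part2.tex are assumed names)
\documentclass{article}
\usepackage{tikz, tkz-graph}

% Only the parts listed here are compiled in this run.
% Change the list and rerun LaTeX to process the other parts.
\includeonly{part1}

\begin{document}
\include{part1} % compiled in this run; starts on a new page
\include{part2} % skipped in this run because it is not listed in \includeonly
\end{document}
```

Since every \include starts a new page, each part ends up on its own page or pages, and the resulting output files would still have to be combined afterwards.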

Guido · Answer 2 · 2012-08-24T20:18:00.867

Based on the code you posted for a smaller example, the tool you are using generates redundant LaTeX code:

\usepackage{tikz, tkz-graph}
\usepackage[active,tightpage]{preview}
\PreviewEnvironment{tikzpicture}
\setlength\PreviewBorder{5pt}

It is not clear to me what the last three lines are for. preview is useful for generating standalone output (for example, for the AUCTeX preview mode in Emacs).

The main code seems to be divided into two parts: one where the \node commands are defined, and one where the \Edge commands are defined. Several optimizations are possible.

\begin{tikzpicture}
\node at (-2.6682776,7.9326984) [circle, line width=1, fill=COLOR0,  inner sep=0pt, minimum size = 14.3428574pt, label={[label distance=0] 315:Myriel}] (1) {};
\node at (-4.1808344,9.4046475) [circle, line width=1, fill=COLOR0,  inner sep=0pt, minimum size = 2pt, label={[label distance=0] 315:Napoleon}] (2) {};
...
\node at (2.3879364,1.7951599) [circle, line width=1, fill=COLOR3,  inner sep=0pt, minimum size = 10.2285728pt, label={[label distance=0] 315:Brujon}] (76) {};
\node at (7.1218353,4.9839259) [circle, line width=1, fill=COLOR8,  inner sep=0pt, minimum size = 10.2285728pt, label={[label distance=0] 315:MmeHucheloup}] (77) {};

In the example, many of the node parameters are exactly the same: circle, line width=1, inner sep=0pt. Every time there is a duplicate, some words of memory are consumed for no reason. A more efficient approach is to collect the common values and declare them only once. This can be done using \tikzset{<parameters>}:

\begin{tikzpicture}
\tikzset{circle, line width=1, inner sep=0pt, label distance=0pt}
\node at (-2.6682776,7.9326984) [fill=COLOR0, minimum size = 14.3428574pt, label={315:Myriel}] (1) {};
\node at (-4.1808344,9.4046475) [fill=COLOR0, minimum size = 2pt, label={315:Napoleon}] (2) {};
...
\node at (2.3879364,1.7951599) [fill=COLOR3, minimum size = 10.2285728pt, label={315:Brujon}] (76) {};
\node at (7.1218353,4.9839259) [fill=COLOR8, minimum size = 10.2285728pt, label={315:MmeHucheloup}] (77) {};

For the \Edge commands, the style parameters are unnecessarily switched back and forth.

\tikzset{EdgeStyle/.style = {-, shorten >=1pt, >=stealth, bend right=10, line width=0.5, color=COLOR0}}
\Edge (2)(1)
\Edge (5)(1)
\Edge (6)(1)
\Edge (7)(1)
\Edge (8)(1)
\tikzset{EdgeStyle/.style = {-, shorten >=1pt, >=stealth, bend right=10, line width=1, color=COLOR0}}
\Edge (9)(1)
\tikzset{EdgeStyle/.style = {-, shorten >=1pt, >=stealth, bend right=10, line width=0.5, color=COLOR0}}
\Edge (10)(1)

First of all, there is a lot of unnecessary duplication. Some parameters can be given a value once instead of repeating the same values over and over and wasting TeX memory words. Thus the code can use \tikzset{EdgeStyle/.style = {<parameters>}} for the fixed parameters, and \tikzset{EdgeStyle/.append style = {<parameters>}} to modify the parameters that need to change for the following edges:

\tikzset{EdgeStyle/.style = {-, shorten >=1pt, >=stealth, bend right=10, line width=0.5, color=COLOR0}}
\Edge (2)(1)
\Edge (5)(1)
\Edge (6)(1)
\Edge (7)(1)
\Edge (8)(1)
\tikzset{EdgeStyle/.append style = {line width=1, color=COLOR0}}
\Edge (9)(1)
\tikzset{EdgeStyle/.append style = {line width=0.5, color=COLOR0}}
\Edge (10)(1)

From the sample it seems that the only parameters that change are line width and color. In the example there are countless cases where both parameters are set while only one of them needs to change, so a further optimisation of the code above is:

\Edge (8)(1)
\tikzset{EdgeStyle/.append style = {line width=1, color=COLOR0}}
\Edge (9)(1)
\tikzset{EdgeStyle/.append style = {line width=0.5}}
\Edge (10)(1)

Some of the above optimisations can be done by a simple search and replace (in the sample I was able to save about 10% of memory words with them). The last optimisation can be done by some clever scripts. Another optimisation would be to group together as many changes as possible (e.g., all \Edge commands with the same line width or color).
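
The grouping idea can be sketched on the sample edges shown earlier. This is only an illustration, assuming tkz-graph is loaded and COLOR0 is defined as in the generated file. Reordering the edges changes the order in which they are painted, which may matter where edges overlap.

```latex
% All edges drawn with line width=0.5 come first, under one style definition.
\tikzset{EdgeStyle/.style = {-, shorten >=1pt, >=stealth, bend right=10, line width=0.5, color=COLOR0}}
\Edge (2)(1)
\Edge (5)(1)
\Edge (6)(1)
\Edge (7)(1)
\Edge (8)(1)
\Edge (10)(1)
% The edges with line width=1 follow, after a single change of the width.
\tikzset{EdgeStyle/.append style = {line width=1}}
\Edge (9)(1)
```

Compared with the original sequence, the style is changed once instead of twice for these seven edges. Whether this lowers memory use on the full file would have to be measured.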

Hope it helps

your approach would certainly be a good approach towards simplified maintenance of that huge tex file. However, I do not see how it should improve the memory usage... the memory usage comes from TikZ's internal output buffer (it has to collect all internal low level path instructions before it can finally write them to the pdf). Thus, your approach would reduce mem usage if you manage to use "simpler path constructs" in the sense of "fewer low level instructions". — Christian Feuersänger, Aug 25 '12 at 09:22
