Coding non-trivial algorithms within TeX

Question

This is kind of a hard question to formulate correctly, so forgive me if I have to clarify this question later.

I want to draw a lot of somewhat complex TikZ diagrams. To do this effectively, the macros that draw them need a certain amount of "intelligent behaviour". To achieve this I think I have several options, none of which is ideal:

code the whole thing in TikZ and/or TeX/LaTeX macros. This is a huge challenge because while I'm fairly adept at using LaTeX and TikZ for typesetting, I have no idea how you'd go about implementing a non-trivial algorithm and it feels like attaining the needed expertise would require a major investment of time.
use LuaLaTeX. This is the obvious Right Solution, but unfortunately it would mean I couldn't post anything written using the system to arXiv, and hence it has to be dismissed.
write the layout code in Julia or Python etc. and spit out TikZ code that I paste into my document. This is by far the easiest solution but it means I have to manage all the diagrams separately from the main text, which I'm hoping to avoid.
write code in my language of choice that acts as a preprocessor for .tex files. This avoids the issues with the above option but means I can't use Overleaf for collaborations.
[added in edit] use PythonTeX - this is even slicker than writing my own preprocessor and probably by far the best option, but it does have the same issue that I have to teach my colleagues to use git because it doesn't work with Overleaf.

Because none of these are ideal, I was wondering if an additional option might exist: TeX is a Turing-complete macro processing language, so in principle it should be possible to implement a more modern-feeling basic scripting language within TeX.

I realise it's a long shot, but my question is just whether this has been done: is there a package that lets you code in TeX in a way that "feels" like writing code in a modern scripting language, in such a way that I could auto-generate TikZ diagrams without needing to know all the deep intricacies of how TeX macros work?

I'm open to other suggestions also and am happy for this to become a general question about how to get things done when you want to automate things but aren't a TeX wizard.

I have no idea how to correctly tag this question - suggestions would be appreciated. — N. Virgo, Aug 06 '22 at 10:38
you could look at pythontex or for a pure tex code look at expl3 (which is pre-loaded into latex) which gives loops, conditionals, iterating over lists, .... What programming do you need? — David Carlisle, Aug 06 '22 at 10:44
@DavidCarlisle maybe expl3 is what I need, then. (I'd heard the term but hadn't really understood what it was.) Basically what I want to achieve is that the user specifies the layout of the diagram "logically" and then the system does some computation to figure out where to put things. It's a bit like a graph layout engine (what graphviz does), but specialised to a particular type of graph, and hence hopefully much simpler. Loops, conditionals and lists is probably enough, though other data structures would probably help a lot. — N. Virgo, Aug 06 '22 at 10:50
texdoc expl3 for user guide or texdoc interface3 for reference or [tag:expl3] tag here for many examples — David Carlisle, Aug 06 '22 at 10:55
Has anyone mentioned the functional package? You still need to know some details of how TeX macros work to use it correctly, though. // Side note: you need to know a lot about how TeX works to use expl3. — user202729, Aug 06 '22 at 11:12
In looking at that expl3 documentation I'm getting a strong "welcome, person who has already achieved a godlike level of wisdom at LaTeX2e" sort of a vibe. It doesn't look like it's currently at the stage where it can be picked up for use as a general scripting language for mere mortals like me, unfortunately. — N. Virgo, Aug 06 '22 at 11:13
Well, use PythonTeX then. It has some depythonize script that automatically converts TeX-with-PythonTeX-package code (with limitations) to pure TeX code, although I haven't used that one myself. — user202729, Aug 06 '22 at 11:14
@user202729 (my previous comment was about raw expl3 rather than your suggestion) functional looks like the kind of thing I was thinking of and I will look into it. PythonTeX also sounds very useful. — N. Virgo, Aug 06 '22 at 11:20
Side note 2, if you're not careful your algorithm in TeX will be O(n) times slower than that in other languages. E.g. extracting kth character from a string in TeX is O(n) instead of O(1). — user202729, Aug 06 '22 at 11:23
I prefer the preprocessing option. For example, you can create diagrams with PlantUML code enclosed in @startuml and @enduml and have that replaced by an image before the TeX processor is invoked. — Heiko Theißen, Aug 06 '22 at 12:00
It sounds like the sagetex package is your answer. You get Python and a CAS, Sage which "builds on top of many existing open-source packages: NumPy, SciPy, matplotlib, Sympy, Maxima, GAP, FLINT, R and many more.". See here, here, here for some examples. — DJP, Aug 06 '22 at 13:05
I laughed out loud when reading your "welcome, person who has already achieved a godlike level of wisdom at LaTeX2e" remark :D it's true though. What might help is to ask a question here with an as complete as possible MWE for a specific graph, with your envisioned syntax for the input and an image of the desired output. If you're lucky somebody will provide an algorithm in expl3 or latex2e that is useful for you to learn by example. — Marijn, Aug 06 '22 at 13:38
Unrelated: lualatex can output latex/tikz code. What type of visual output do you need? Tree, syntactic, flowchart, resource, paper-folding, ...? That last is not as ludicrous as it might sound: any algorithm A, as an iterative sequence of steps, should map to any other iterative sequence of steps, A1, of arbitrary physical implementation (including bamboo and liana vines) - the sequence is constant. The complexity of the input needed will be defined by/at the user-end, so independent. — Cicada, Aug 06 '22 at 16:13
In the past I sometimes put PostScript-code for drawings into filecontents-environments, and had loaded the package epstopdf for on-the-fly-conversion to pdf and included things using \includegraphics. I also did some images that needed to be inserted repeatedly via Metafont and treated them like a font. Drawbacks: You need to know PostScript/Metafont. Having some writing in the image in the same font as is in use throughout the document is not easy. Importing data and creating image on the fly depending on that data requires re-running Metafont. — Ulrich Diez, Aug 06 '22 at 19:44
Probably knitr is of interest to you? With knitr you have files which are a mixture of TeX-code and code in the language R. knitr evaluates the R-code and creates a .tex-file where R-code is replaced by what shall be processed/typeset by TeX instead. Hereby you can use R for producing more TeX-code, e.g., forming tokens of a tikZ-picture. But for doing that you need to be at an advanced level both with R and with knitr. The time needed for achieving that level could be invested in getting to grips with TeX's expansion and TeX's other "digestive processes". ;-) — Ulrich Diez, Aug 06 '22 at 21:34
Well, full list of packages should be somewhere in Can I use an easy programming language with XeTeX? - TeX - LaTeX Stack Exchange — user202729, Aug 07 '22 at 01:24
Probably you can suggest a short algorithm well-known to you (so that you don't need to focus on the gist of the algorithm but can focus on its implementation in TeX) and I can try to describe how I would go at implementing it in TeX. — Ulrich Diez, Aug 07 '22 at 01:30
I guess the people offering to help the OP learn TeX are well-intentioned, but my opinion is that unless you have no other choice or want to learn TeX for its own sake, you should stick with something else if you just want to get the job done. (Once you know TeX ("Counterexamples in Computer Science") it's pretty interesting, though.) — user202729, Aug 07 '22 at 03:09
The documentation of that mentions depythontex script, might work in your case, not sure, although it's a bit cumbersome indeed. (I see you already read that PythonTeX is no longer supported in overleafv2 https://tex.stackexchange.com/a/534059/250119) — user202729, Aug 07 '22 at 16:07

Ulrich Diez · Answer 1 · 2022-08-07T17:21:02.223

What workflows to suggest depends on many circumstances.

If recipients of your .tex input files must be able to obtain the .pdf file simply by running LaTeX on the main .tex file on any (probably reasonably recent) TeX platform, then you probably cannot get around implementing things in TeX, because only the availability of the TeX platform itself can be taken for granted.

If you can make some assumptions regarding the computer-platform, the TeX-platform and the possibility of running/executing external programs from within TeX, then things might look different:

In LaTeX you can, e.g., use the filecontents*-environment or \immediate, \openout, \write and \closeout for creating an external text file which contains the source code for a program, which in turn is the implementation of the algorithm needed by you in your preferred programming language. The output of the program can be an external text-file which contains .tex-code representing the result of carrying out the algorithm. (\immediate\write, however, might require a bit of knowledge of how and when TeX writes tokens to file and of expansion that hereby might take place.)

Using \write18 (traditional TeX and XeTeX) or os.execute (LuaTeX), for which the package shellesc provides a unified interface, you can have TeX execute commands on your computer: first running the compiler of your choice on that source code, then running the resulting executable, and in this way obtaining the text file that contains the .tex code representing the result of carrying out the algorithm.

This text-file in turn can be processed by TeX via \input.
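As a rough sketch of this write/run/\input workflow, the following might serve as a starting point. It assumes that python3 is on the PATH and that LaTeX is run with shell escape enabled (e.g. pdflatex --shell-escape); the file names layout.py and layout-out.tex and the content of the script are made up for illustration only.

```latex
% Write the (hypothetical) layout script to disk. filecontents* writes its
% body verbatim; the [overwrite] option needs LaTeX 2019-10-01 or newer.
\begin{filecontents*}[overwrite]{layout.py}
nodes = [r"\node[draw] (n%d) at (%d,0) {%d};" % (i, 2 * i, i) for i in range(3)]
print("\n".join(nodes))
\end{filecontents*}
\documentclass{article}
\usepackage{tikz}
\usepackage{shellesc}% unified \ShellEscape for pdfTeX, XeTeX and LuaTeX
\begin{document}
% Run the script and capture the TikZ code it prints in a text file.
\ShellEscape{python3 layout.py > layout-out.tex}
\begin{tikzpicture}
  % Process the generated TikZ code as if it had been typed here.
  \input{layout-out.tex}
\end{tikzpicture}
\end{document}
```

If shell escape is disabled, the \ShellEscape line does nothing and \input fails because layout-out.tex does not exist, which is one instance of the error handling you would have to take care of yourself.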

However, such approaches are highly platform-dependent, and you might need to do a great deal of (shell-script) programming to handle the case where something goes wrong while TeX runs the external programs.

Configuring things (rights to run external programs, enabling \write18/os.execute) might also turn out to be cumbersome on some of today's computer platforms.

If you use Overleaf or a similar online platform, workflows of this kind are not really an option for you.

In the following I take the title of this question to be a question about general strategies/procedures when implementing algorithms in TeX.

When implementing things in TeX, you might want to think of things "materially":

TeX reads its input and thinks of the input as a set of instructions to put pretty glittery things on an assembly line one after another. These pretty glittery things are called "tokens".

(Excursus: These tokens are either control sequence tokens or character tokens. Control sequence tokens are those things where you usually write a backslash first in the TeX source code, for example \LaTeX, \section, ... Some control sequence tokens (e.g. macros, and some primitives like \romannumeral, \string or \csname..\endcsname) are expandable. That is, they and their arguments are replaced by other tokens at an early stage of processing.
Some control sequence tokens are not expandable. For example \relax or \hbox{...}. They are not replaced in said early stage but remain and serve in later stages e.g. as basic directives for typesetting the text.
Character tokens have a character code which in the TeX-internal character encoding scheme (either ASCII or Unicode) corresponds to the number of the code point of the respective character, and belong to a category on which depends what TeX does when the respective character token is processed. Character tokens of category 13 (active) can be used like control sequence tokens. Together with the control sequence tokens they form the control sequences. Excursus end.)

TeX processes these pretty glittery things. The first station the tokens go through on the assembly line is the "expansion department". Here expandable tokens and their arguments are taken off the assembly line and constellations of tokens are put on the assembly line to replace those tokens. How the constellations of replacement tokens are composed depends on the definition or meaning of the replaced expandable tokens.

When the replacing is over in the expansion department, there are no more expandable tokens on the assembly line. The (non-expandable) tokens resulting from going through the expansion department on the assembly line are transported to departments where they are considered as directives for typesetting text or for defining macros or for assigning values to registers/parameters or for shipping typeset pages into the .pdf or .dvi output file or the like.
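To watch the expansion department at work in isolation, one can let \edef do the expanding and then inspect the result with \meaning. This is only a small illustration; the macro names \greeting and \result are made up for this example.

```latex
\documentclass{article}
\begin{document}
\def\greeting{Hello}
% \edef sends its replacement text through the expansion department only:
%   \greeting          is a macro, so it is replaced by  Hello
%   \relax             is not expandable, so it stays as it is
%   \romannumeral 12   is an expandable primitive, replaced by  xii
\edef\result{\greeting\relax\romannumeral 12 }
% Show on the terminal and in the .log file which tokens survived:
\typeout{\meaning\result}
\end{document}
```

The terminal and the .log file should show macro:->Hello\relax xii, i.e. the expandable tokens are gone and the non-expandable \relax is still there, waiting for the later departments.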

When I implement algorithms in TeX and want to use only macros/expandable tokens, so that the algorithm is fully processed in the expansion department alone, and only tokens representing the result should reach subsequent departments, I think of it this way:

A macro can have, depending on the definition, up to nine arguments, which must be placed on the assembly line right behind the respective macro token, each argument enclosed in curly braces. The curly braces are also tokens, and the arguments themselves are also constellations of tokens. (If an argument is to consist of only one token, the curly braces around it can be omitted).

A macro denotes a step of the algorithm.
The arguments of a macro, numbered from #1 to (at most) #9, can be thought of as variables used in that step of the algorithm. The tokens that make up the arguments in an expansion step can be thought of as the values of the variables in that step.
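A tiny illustration of how arguments are picked up from the assembly line; the macro name \Swap and the other names are invented for this example.

```latex
\documentclass{article}
\begin{document}
% #1 and #2 are the two "variables" of this step; the replacement text
% puts their values back on the assembly line in reversed order.
\def\Swap#1#2{#2#1}
\edef\braced{\Swap{ab}{cd}}% each argument is a braced group of tokens
\edef\unbraced{\Swap xy}%    each argument is a single token, no braces needed
\typeout{\meaning\braced}%   macro:->cdab
\typeout{\meaning\unbraced}% macro:->yx
\end{document}
```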

An expansion step, i.e., a replacement step, represents a step of the algorithm. In such a step, the macro token denoting the step of the algorithm and the tokens that are its arguments and thus denote the values of the variables are taken off the assembly line and are replaced on the assembly line by the macro token that denotes the next step, followed by arguments required in the next step, consisting of curly brace tokens and sequences of tokens nested within them that represent the values of the variables in the next step.

You can often do this recursively. That means: after a few intermediate replacement steps, which are only there to rearrange the tokens of the arguments (i.e., to change the values of the variables of the algorithm) and to put the arguments in the right order on the assembly line, the token denoting the step of the algorithm from which the expansion cascade/replacement cascade was started is put in front of everything again. In this way, one can have steps of an algorithm repeated recursively until the arguments/values of variables meet certain conditions.
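A minimal sketch of such a recursion, assuming an engine with the e-TeX extensions (for \numexpr), which all current LaTeX formats have. The macro name \Repeat is made up for this example; \@firstofone and \@gobble are LaTeX kernel macros.

```latex
\documentclass{article}
\makeatletter
% \Repeat{<count>}{<tokens>} delivers <count> copies of <tokens>,
% using expansion only. #1 and #2 are the variables of the step.
\newcommand\Repeat[2]{%
  \ifnum#1>0
    \expandafter\@firstofone % condition met: keep the braced tokens below
  \else
    \expandafter\@gobble     % condition not met: discard them, recursion ends
  \fi
  {% deliver one copy of #2, then put \Repeat in front again with the
   % counter decremented; \the\numexpr is expanded before \Repeat
   % grabs its first argument
    #2\expandafter\Repeat\expandafter{\the\numexpr#1-1\relax}{#2}%
  }%
}
\makeatother
\begin{document}
\edef\stars{\Repeat{5}{*}}
\typeout{\meaning\stars}% macro:->*****
\end{document}
```

The \expandafter before \@firstofone and \@gobble removes the remaining \else..\fi or \fi from the assembly line before the chosen macro looks for its argument, which is the kind of bookkeeping described above.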

In principle it is a matter of replacing tokens by other tokens expansion step by expansion step until those tokens are there which are considered as the result of the algorithm. If you use subroutines, you may need to know after how many expansion steps their result is there.

It is very helpful to know how \expandafter works; how \romannumeral, in some contexts \the, and with recent TeX engines \expanded can be (mis)used to trigger further expansion steps (e.g. all expansion steps of an expansion-based "subroutine"); how \csname..\endcsname and \number work; and how the \if..\else..\fi branches work, i.e. which tokens following \if/\else/\fi are removed from or left on the token assembly line, under which conditions and in which expansion steps.

The figurative idea that tokens are items that lie one behind the other on an assembly line, combined with the question how an expansion step that gets applied to the token currently processed in the expansion department changes what tokens lie on the assembly line, and how an expansion step itself triggers further expansion steps (\romannumeral, \number, \csname, \if, \ifcat) or triggers an expansion-step to the next but one token (\expandafter) and so influences the time order in which tokens on the assembly line are expanded is helpful when it comes to macro programming in TeX.
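The difference between a single expansion step triggered via \expandafter and a whole cascade triggered via \romannumeral can be made visible like this; all macro names are invented for this illustration.

```latex
\documentclass{article}
\begin{document}
\def\stepA{\stepB}
\def\stepB{\stepC}
\def\stepC{done}
% The \expandafter chain reaches over \def, \oneStep and { and applies
% exactly one expansion step to \stepA before \def is carried out:
\expandafter\def\expandafter\oneStep\expandafter{\stepA}
% \romannumeral keeps expanding while it looks for more digits. It finds
% the number 0 followed by the non-digit d, delivers the empty roman
% numeral for 0 and leaves the fully expanded tokens  done  behind:
\expandafter\def\expandafter\allSteps\expandafter{\romannumeral0\stepA}
\typeout{\meaning\oneStep}%  macro:->\stepB (followed by a space)
\typeout{\meaning\allSteps}% macro:->done
\end{document}
```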
