Issues with TeX sub-formula formatting

Question

As is generally acknowledged, TeX offers superb formatting capabilities for math in an automated fashion. Nevertheless, there are some areas with clear deficiencies that one either has to accept or manually improve on a case-by-case basis.

One of the biggies in that respect is TeX's handling of sub-formulas (i.e., material inside a brace group, e.g., an index, but also any brace group at the top level). TeX always sets such sub-formulas at their natural width, even if the rest of the formula is subject to severe stretching or shrinking. As an example, consider the following (not very sensible) formula:

\documentclass{article}

\newlength\x

\begin{document}

\newcommand\formula{$ a+\mathbf{b+c}+d+e+f = \sum_{i=1}^{n-x-y-z} x_{i+j}$}

\settowidth\x{\formula}

\hbox to \x{\formula}    % natural width

\addtolength\x{-20pt}

\hbox to \x{\formula}   % now shrink it a lot

\addtolength\x{100pt}

\hbox to \x{\formula}   % now stretch a lot in the opposite direction

\end{document}

The formula here has a number of obvious sub-formulas in the subscripts and the superscript, but for illustration I also added a sub-formula via \mathbf. Of course, using \mathbf in this way is wrong (it should not be applied to several symbols), but I'm sure you would find this in documents. In any case, just {b+c} would have had the same effect here. Now, what do you think happens if we run this?

This:

[Image: the typeset output of the three \hbox lines]

As one can see, the sub-formulas boxed at their natural width look very wrong the moment the rest of the formula is stretched or shrunk.
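A reduced version of the test makes the distinction visible. The following sketch (the macro names \braced and \semisimple are hypothetical) stretches two otherwise identical formulas: in the first, b+c is a brace-group sub-formula; in the second, it sits inside a \begingroup…\endgroup group, which in math mode does not start a new math list. One would expect only the second line to spread the spaces around the inner + sign.

```tex
\documentclass{article}

\newlength\x

% Hypothetical test macros.
% \braced: {b+c} becomes a sub-formula that is packed at natural width.
% \semisimple: \begingroup...\endgroup does not open a new math list,
% so the spaces around the inner + remain in the top-level list.
\newcommand\braced{$a+{b+c}+d+e+f$}
\newcommand\semisimple{$a+\begingroup b+c\endgroup+d+e+f$}

\begin{document}

\settowidth\x{\braced}        % both formulas have the same natural width
\addtolength\x{60pt}          % stretch well beyond the natural width

\hbox to \x{\braced}      % spaces inside {b+c} stay at natural width

\hbox to \x{\semisimple}  % all spaces around + stretch evenly

\end{document}
```

Expect Underfull \hbox warnings, since the available stretch in the math spacing is small compared to 60pt.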

Until recently, no successor TeX engine addressed this issue. With LuaTeX opening up a lot of TeX's internals, I had some hope that this would be different. However, upon studying the manual, my conclusion is that this area has not been addressed (or considered), at least so far.

As far as I can see, the only way something could be done about this issue in LuaTeX would be to use the mlist_to_hlist callback. However, this would really mean replacing the full math typesetting algorithm, which of course could be a way to solve the problem. But that is not what is needed (99% of this algorithm is next to perfect); what is needed is support for not simply boxing sub-formulas at their natural width.
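To illustrate the point about the callback, here is a minimal plain LuaTeX sketch (an assumption: run as `luatex file.tex` with the plain-based format) that registers mlist_to_hlist and simply delegates to the built-in conversion via node.mlist_to_hlist. As far as I can tell, the callback only receives the top-level math list, so the recursive packing of {b+c} still happens inside the built-in routine; hooking the callback therefore gives all-or-nothing control over this behavior. In LuaLaTeX, callbacks would normally be registered through the luatexbase interface rather than with callback.register directly; that is not shown here.

```tex
% Plain LuaTeX sketch: run with "luatex file.tex".
% The function logs each call and then hands the whole list to the
% built-in mlist-to-hlist conversion, unchanged.
\directlua{
  callback.register("mlist_to_hlist", function(head, style, penalties)
    texio.write_nl("mlist_to_hlist called, style: " .. style)
    return node.mlist_to_hlist(head, style, penalties)
  end)
}
$a+{b+c}+d$
\bye
```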

So much for the long text; here is the short question:

Is this analysis correct, or did I overlook something?
And in case anybody knows: are there plans to look into the sub-formula issue and provide support for it eventually? (it is not listed on the "to-dos" for math)

In case somebody wonders why I talk about "this sub-formula issue" as if it were a known thing ... it is: it was already raised way back in E-TeX: guidelines to future TeX extensions, which at the time led to some thorough (not to say heated) discussions.

This is the same behavior with \newcommand\formula{A \hbox{b c} d}... — Paul Gaborit, Jun 26 '12 at 22:50
@PolGab Sure. And \hbox is what a sub-formula produces when TeX converts an mlist to an hlist. So from a conceptual perspective this is a natural thing to do, as long as you think that sequential processing is the right kind of model (which is what I have been challenging for two decades now). — Frank Mittelbach, Jun 26 '12 at 23:19
I think it's not a natural thing. Just a (bad? but pragmatic) choice of conception for building math formulae... — Paul Gaborit, Jun 26 '12 at 23:32
@PolGab: It's the same behavior but not the same situation. The analogous situation would be a \textbf{b c} d, which does not make an hbox just to change the font. It seems to me that the issue is that TeX does not have a way of doing groups in math mode that aren't boxes. — Ryan Reich, Jun 27 '12 at 06:17
@Ryan technically it is the same situation because TeX internally converts the mlist recursively to an hbox. And the fact that it happens recursively is the main culprit for anything being boxed with natural width, except for the top-level material. — Frank Mittelbach, Jun 27 '12 at 07:04
what if you wanted to embolden a "word" in math, like \mathbf{Span}(x + y) ? (yes, i know it would be better to use \DeclareMathOperator, but we see this all the time.) you wouldn't want those letters to "fly apart". to me, this is the same as @RyanReich's \textbf{b c}. i'm afraid i think that the only "good" approach is for authors to learn to "do the right thing" (a goal of which i despair). — barbara beeton, Jun 27 '12 at 12:40
@barbarabeeton adjacent letters have no stretchable glue between them so there isn't a problem there and so you might hope that it worked just like a \begingroup \fam4 b+c\endgroup group which doesn't cause an internal box to be created so allows the operator spacing to stretch or shrink (but doesn't affect inter-letter spaces). — David Carlisle, Jun 27 '12 at 12:50
@barbarabeeton \mathbf{Span} is not a problem, as this is just a sequence of alphabet letters. The issue is with \mathbf{a+b}, because this mixes letters with a symbol that is not an alphabet character, and the + symbol is a binary and so has variable spacing to its left and right ... and those get frozen. — Frank Mittelbach, Jun 27 '12 at 20:04
@FrankMittelbach I think http://tex.stackexchange.com/questions/9683/breaking-equations-with-breqn/47361#47361 illustrates the issue? — yannisl, Jun 28 '12 at 06:25
@FrankMittelbach Looking at ltfssdcl.dtx, it seems we just use { rather than \begingroup in \mathbf so that you can write 2^\mathbf{3}. Perhaps that's a high price to pay if that's all it is for? (Not that math font groups are your main issue.) — David Carlisle, Jun 28 '12 at 12:53
@David I really hate having used that example :-) because it draws attention away from the real issue that I was trying to highlight. As to your observation: yes, perhaps a high price, but back then it was a conscious decision, and it is not changeable for compatibility reasons even if we wanted to. — Frank Mittelbach, Jun 28 '12 at 13:15
Oh sure, we can't change anything, but a package that did change that and used breqn to avoid most (or at least some) of the bad effects of mathinner might mean that at least the issue is reduced. Back to the main point, though: if using the LuaTeX callbacks, it wasn't clear to me from the manual whether the Lua code has access to the original function. Is it possible to say "if this, do something, else do what you would have done"? Or do you really have to write the whole math layout if you define that callback? — David Carlisle, Jun 28 '12 at 13:36
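For reference, these are the default inter-atom spacing parameters behind the comments above (the values set by plain TeX and by the LaTeX kernel). Only \medmuskip, used around Bin atoms such as +, can both stretch and shrink; \thickmuskip, used around Rel atoms such as =, can only stretch; \thinmuskip is rigid. TeX inserts the medium and thick spaces only in display and text style, so any shrinking in the test formula can only come from the spaces around the + signs that remain at the top level.

```tex
% Default math spacing parameters (plain TeX and LaTeX kernel values)
\thinmuskip=3mu                     % e.g. between Ord and Op: rigid
\medmuskip=4mu plus 2mu minus 4mu   % around Bin atoms such as +
\thickmuskip=5mu plus 5mu           % around Rel atoms such as =: stretch only
```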

score 3 · Answer 1 · answered May 03 '22 at 12:11

Old question, but I stumbled upon it, and I think there might now be a way to get this to work using the LuaMetaTeX engine*. This is done by unpacking boxes. (This is not applied to square roots, fractions, or sub/superscripts.) Let us first take a look at the picture (not reproduced here).

Here:

1. The first line is just set with \hbox{word word $a + {\bf b + c} + \left(d + e\right) + f + \sqrt{g + h} + \frac{a + b}{c + d} = \sum_{i=1}^{n-x-y-z} x_{i+j}$ word word}.
2. The second line is set with \hbox spread 4cm {word word $a + {\bf b + c} + \left(d + e\right) + f + \sqrt{g + h} + \frac{a + b}{c + d} = \sum_{i=1}^{n-x-y-z} x_{i+j}$ word word}. Note that the spreading does not work in the bold part, but it does work inside the fenced part. As mentioned above, it is not done for the square root, the fraction, or the index.
3. The third line is set with \hbox spread 4cm {word word $a + \mathatom unpack \mathordinarycode {\bf b + c} + \left(d + e\right) + f + \sqrt{g + h} + \frac{a + b}{c + d} = \sum_{i=1}^{n-x-y-z} x_{i+j}$ word word}. So the bold part is also spread: it is unpacked and then repacked as an ord.
4. The fourth line is set with \hbox spread 4cm{word word $a + \mathbf{b + c} + \left(d + e\right) + f + \sqrt{g + h} + \frac{a + b}{c + d} = \sum_{i=1}^{n-x-y-z} x_{i+j}$ word word}. This shows that the unpacking and repacking is done by \mathbf.
5. The fifth line is set with \hbox spread -2cm{word word $a + \mathbf{b + c} + \left(d + e\right) + f + \sqrt{g + h} + \frac{a + b}{c + d} = \sum_{i=1}^{n-x-y-z} x_{i+j}$ word word}. This just shows that the material can also be squeezed together.
6. The sixth line is set with \hbox spread 4cm{word word $a + \mathatom unpack leftclass \mathordinarycode rightclass \mathbinarycode {\bf c + d} + \left(d + e\right) + f + \sqrt{g + h} + \frac{a + b}{c + d} = \sum_{i=1}^{n-x-y-z} x_{i+j}$ word word}. This is not what we want here, but it shows that, when we repack, we can say that the box should behave differently on the left and on the right. That might be useful in other cases. Here, behaving like a binary on the right turns the following plus sign into an ord, and the spacing there comes out wrong.
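A minimal complete file for trying the unpacking yourself might look like the following. It is a reduced version of the second and third lines above (the formula is shortened), and it assumes a current ConTeXt LMTX installation run with `context file.tex`, since \mathatom unpack is a LuaMetaTeX feature.

```tex
% ConTeXt LMTX (LuaMetaTeX) sketch, reduced from the lines above.
% First line: {\bf b + c} stays a sub-formula at natural width.
% Second line: it is unpacked and repacked as an ordinary atom,
% so the spaces around its inner + take part in the 4cm stretch.
\starttext

\hbox spread 4cm {word word $a + {\bf b + c} + d$ word word}

\hbox spread 4cm {word word $a + \mathatom unpack \mathordinarycode {\bf b + c} + d$ word word}

\stoptext
```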

One nice feature of unpacking and repacking is that one can have multi-line formulas that break inside fences. Example (adapted from this old question):

This formula is set with

\defineformula[XYZ][
  split=text,
  textalign=slanted,
  spaceinbetween=.5\lineheight,
  strut=yes,
  distance=1em,
]
\startXYZformula[margin=3em]
s_{i}^{G}
=
\min 
\left\{
    \sqrt{
        \min\left[(x_{i}^{G} - x)^2, (w_{T} - x_{i}^{G} - x)^2\right] + y
        }
    ,\breakhere
    \sqrt{
        \min\left[(x_{i}^{G} - x)^2, (w_{T} - x_{i}^{G} - x)^2\right] 
        + 
        \min\left[(y_{i}^{G} - y)^2, (h_{T} - y_{i}^{G} - y)^2\right]
        }
\right\}
\stopXYZformula

*In fact, \mathbf{b + c} also works in ConTeXt with LuaTeX (MkIV), but the unpacking mechanism is new in LuaMetaTeX. I do not have MkII installed, so I don't know whether it also works with pdfTeX in ConTeXt.

Could you tell me how the unpacking of \left \right constructions is done at LuaMetaTeX primitive level? :) — Weißer Kater, Feb 13 '23 at 00:36
In short (as an addition): there is an advanced math class system with lots of options (see for example math-ini.mkxl). But I don't really have a good understanding of the primitive level. The source is available if you want to dig into it. — mickep, Feb 13 '23 at 12:18

score 3 · Accepted Answer · answered Aug 06 '12 at 20:00

Looks like the answer is as follows:

This issue (which is already present in the original TeX program) is not being solved or addressed by any TeX successor, including LuaTeX.

It is true that LuaTeX allows one to replace all of the math processing with private code, but this is more along the lines of "demolish the house and build a new one" and not really warranted; after all, TeX's algorithms are really great in most respects. A pity, but perhaps understandable, as it would require taking the processing logic of the math formatting apart and reorganizing it to improve only a small fraction of it.

score 1 · Answer 3 · answered Jun 27 '12 at 18:03


There is a way to do this with unicode-math, since bold letters in Unicode math are different letters of the same font rather than letters of a different font:

% compile with XeLaTeX or LuaLaTeX (unicode-math needs a Unicode engine)
\documentclass{article}

\usepackage{unicode-math}

\setmathfont{XITS Math}

\newlength\x

\begin{document}

% 𝐛 and 𝐜 are U+1D41B and U+1D41C (MATHEMATICAL BOLD SMALL B and C)
\newcommand\formula{$ a+𝐛+𝐜+d+e+f = \sum_{i=1}^{n-x-y-z} x_{i+j}$}

\settowidth\x{\formula}

\hbox to \x{\formula}    % natural width

\addtolength\x{-20pt}

\hbox to \x{\formula}   % now shrink it a lot

\addtolength\x{100pt}

\hbox to \x{\formula}   % now stretch a lot in the opposite direction

\end{document}

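I couldn't find a way to make this work with \mathbf, but it should be doable: essentially, \mathbf could be made into something like a \begingroup … \endgroup group with local \Umathcode assignments that map each letter to its bold Unicode counterpart, instead of a brace group.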

answered Jun 27 '12 at 18:03 by Philipp

You are mistaken about what I'm trying to highlight: all of the current TeX-based engines treat sub-formulas (i.e., a straight brace group) as something that is typeset first (at natural width, and inserted into the formula internally as an hbox), and such a sub-formula then does not take part in any shrinking or stretching. The \mathbf was just an example (of deliberately incorrect input) to show how additional, surprising sub-formulas can appear. No use of unicode-math makes the ^{n-x-y-z} shrink when the rest of the formula is squeezed. – Frank Mittelbach Jun 27 '12 at 20:00

@Frank: +1 for your comment, although I should point out that there shouldn't be any shrinking in ^{n-x-y-z} anyway since in \scriptstyle there's no space around binary operations. Or am I mistaken? I think a better example would be a \left...\right construct. – Hendrik Vogt Jun 27 '12 at 21:25
@HendrikVogt Right you are, I didn't choose my example well. A top-level Inner atom would have been better, as you say. Inside subscripts the default spacing rules would only add stretchable stuff for Op atoms like \sum or \prod or something. But anyway, I hope the general issue is clear now. – Frank Mittelbach Jun 27 '12 at 21:37
@Frank: I just realized that (of course) fractions are another example of this problem, and that one's even more difficult to handle. Should only the numerator be squeezed if it's wider than the denominator? I'm not sure. But I just encountered an example where the non-squeezing of the numerators is somewhat annoying. – Hendrik Vogt Dec 19 '12 at 10:47
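Following up on the \left...\right and fraction remarks, the same stretching test can be applied to both constructs. In this sketch (the macro names \fencedformula and \fracformula are hypothetical), classical TeX engines should pack the material between \left and \right, and the numerator of the fraction, at natural width, so only the surrounding spaces stretch. \displaystyle is used for the fraction because in text style its numerator would be set in script style, where + gets no space at all.

```tex
\documentclass{article}

\newlength\x

% Hypothetical test macros: in classical TeX engines the fenced part
% and the fraction's numerator are each boxed at natural width.
\newcommand\fencedformula{$a+\left(b+c\right)+d+e+f$}
\newcommand\fracformula{$\displaystyle a+\frac{b+c}{d}+e+f$}

\begin{document}

\settowidth\x{\fencedformula}
\addtolength\x{60pt}
\hbox to \x{\fencedformula}  % only the spaces outside the parentheses stretch

\settowidth\x{\fracformula}
\addtolength\x{60pt}
\hbox to \x{\fracformula}    % the numerator b+c stays at natural width

\end{document}
```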
