As shown above, I saw many labXXX labels in the source code of tex and pdftex. I'm really confused about them, and there are no comments at all. Is there any documentation that explains the meaning of each labXXX? Thanks!
2 Answers
The file you are looking at is not really "source code": it is tex0.c, which is C derived (by web2c) from tex.web, the source code of TeX, which is written in WEB (documented Pascal).
Almost all comments are dropped in this conversion.
So what you see are various jumps to a call of a procedure that carries the arbitrary-looking label lab22.
If you look in tex.web you will see the procedure
@p procedure clear_for_error_prompt;
begin while (state<>token_list)and terminal_input and@|
(input_ptr>0)and(loc>limit) do end_file_reading;
print_ln; clear_terminal;
end;
and various places where this is referenced, e.g.
@ @<Get user's advice...@>=
loop@+begin continue: clear_for_error_prompt; prompt_input("? ");
In the generated C this ends up as
void
clearforerrorprompt ( void )
{
clearforerrorprompt_regmem
while ( ( curinput .statefield != 0 ) && ( curinput .namefield == 0 ) && (
inputptr > 0 ) && ( curinput .locfield > curinput .limitfield ) )
endfilereading () ;
println () ;
}
and a labeled reference to it
lab22: clearforerrorprompt () ;
and then some jumps to that point from other parts of the loop:
goto lab22 ;
To understand the source, it is best to read the typeset document that can be generated from tex.web, not the C that is extracted from it.
texdoc tex
in texlive will bring up a hyperlinked typeset version of this source. This section appears as:
(Just to add a bit more detail to the other answer…)
Short version of answer: For the labels specifically shown in the screenshot in the question, you can read the label lab22 as continue, the label to which the code will goto to resume a loop, and read label lab20 as restart, the label to which the program will goto to start a procedure again. (See §15 of the TeX program, or below.)
Long version:
Why does the TeX program have so many gotos?
The goto debate
Once upon a time, computers were programmed in machine language or (a slight improvement) in assembly language, where control flow is expressed using branches / jumps. Donald Knuth (who would go on to write TeX) did a lot of such programming (examples: 1957–1960 (video, texts), 1960). As higher-level programming languages evolved, they used a keyword like goto for such arbitrary jumps, and also introduced control-flow structures for common kinds of jumps like conditionals (if) and loops (while). Then in 1968, a major controversy was ignited when an article by Edsger Dijkstra (submitted as "A Case Against the Goto Statement") was published by the CACM editor Niklaus Wirth as a letter under the title “Go To Statement Considered Harmful”. Roughly the two camps of the debate were:
- The goto statement is not necessary and is better avoided; the control-flow structures provided by the high-level languages are enough.
- The existing control-flow structures are too limited; sometimes goto is indeed necessary / good.
The former position was represented by “structured programming” (Dijkstra, Wirth, Hoare, etc). Knuth, though he wrote an encyclopedic article in 1974 summarizing both (and more) sides (PDF, HTML), always had his sympathies towards the latter.
For what it's worth, the current state of the debate (see summary on Wikipedia) seems to be that while the title of the letter seems to have become dogma and almost every programmer avoids or is afraid of using the word “goto”, in practice the control-flow structures available at the time (of ALGOL 60, ALGOL W, ALGOL 68 etc) were indeed deemed insufficient, and languages have indeed gained more control-flow structures that Dijkstra wouldn't have liked and which cover the examples raised in (say) Knuth's paper: early return from functions, and in loops continue and break (even labeled ones, in languages like Java and Rust).
goto in Pascal
Knuth wrote the first version of TeX (meant for use only at Stanford) in SAIL, but when there was enough interest in the program elsewhere and the danger of incompatible implementations, he set out to rewrite the program in a (then) widely available language in a maximally portable way, and the natural choice was Pascal. Now Pascal (invented by Wirth mentioned above) tries to strongly encourage structured programming:
- Functions don't have return statements; instead you have to assign to a pseudo-variable with the same name as the function and control flow has to exit by reaching the bottom of the function.
- There is no break or continue in loops; you can use booleans (or use goto).
- goto is still available but discouraged, e.g. labels have to be declared beforehand at the top of the function (or program), and these labels have to be numbers: no symbolic names are allowed!
So for example, where in modern languages you can write a function like this (sum of all the odd numbers less than n):
def sum_odd(n):
    if n < 1: return 0
    sum = 0
    for i in range(n):
        if i % 2 != 1: continue
        sum += i
    return sum
(just a made-up example with return and continue; of course this function can be written without them) in Pascal if you wanted an exact translation, you would have to use gotos and pick some ad-hoc numeric labels:
function sumodd(n: integer): integer;
label
  42, 100;
var
  i, sum: integer;
begin
  if n < 1 then
  begin
    sumodd := 0;
    goto 100 {plays the role of an early return}
  end;
  sum := 0;
  for i := 1 to n - 1 do
  begin
    if i mod 2 <> 1 then goto 42; {plays the role of continue}
    sum := sum + i;
  42:
  end;
  sumodd := sum;
100:
end;
which I guess is sufficient inducement to avoid gotos and rewrite using booleans etc (straightforward in this case, but not always).
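For comparison, here is a minimal sketch of what the goto-free, single-exit style looks like for the same task. It is written in Python rather than Pascal only so that it can be run directly; the name sum_odd_structured is made up for this illustration.

```python
def sum_odd_structured(n):
    """Sum of all odd numbers less than n, written with a single exit point.

    Hypothetical counterpart of sum_odd above: no early return, no continue.
    """
    total = 0
    if n >= 1:                       # replaces the early "return 0"
        for i in range(1, n):
            is_odd = (i % 2 == 1)    # a boolean replaces "goto 42" / continue
            if is_odd:
                total += i
    return total                     # the only exit, like reaching the final "end"
```

Here the restructuring is trivial; the point of the goto debate is that in larger routines the flags and extra nesting can pile up.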
goto in TeX and WEB
These features of Pascal were fine for a teaching language, but Knuth (like others) must have found them annoying for writing large software programs with real-world constraints of performance etc., so he created a system called WEB that works around a lot of these limitations. The same program above in WEB could be written as:
- in some earlier section, define symbolic names and macros for the whole program:
    define exit = 10
    define continue = 22
    define return == goto exit
- then when writing your function, you can use the above names (e.g. remember to put a label called exit before the end of the function), for a slightly better experience.
This is the system followed by TeX, which uses goto but usually adhering to certain conventions, described in Section 15 of the program:
If you were using the Pascal implementation of TeX directly you would see numeric labels in the source code generated by tangle (part of WEB), but in practice most TeX users use a distribution (like TeX Live) that is based on converting this WEB/Pascal to C (using a system like web2c), in which the numeric labels are once again translated to start with lab.
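As a reading aid, the numeric labels defined in §15 of tex.web can be put in a small lookup table and used to annotate the generated C. The sketch below is only an illustration: TEX_LABELS and annotate_labels are names invented here, and it assumes the labels appear in the C as labN, as in the code from the question. Patches added on top of Knuth's TeX do not always follow the §15 conventions, so treat the annotations as hints.

```python
import re

# Symbolic label names and their numbers, as defined in section 15 of tex.web.
TEX_LABELS = {
    10: "exit", 20: "restart", 21: "reswitch", 22: "continue",
    30: "done", 31: "done1", 32: "done2", 33: "done3",
    34: "done4", 35: "done5", 36: "done6",
    40: "found", 41: "found1", 42: "found2",
    45: "not_found", 50: "common_ending",
}

# Assumption: numeric label N shows up in the generated C as "labN".
LABEL_RE = re.compile(r"\blab(\d+)\b")

def annotate_labels(c_source):
    """Append a C comment with the symbolic name after each known labN."""
    def add_name(match):
        name = TEX_LABELS.get(int(match.group(1)))
        if name is None:
            return match.group(0)        # unknown number: leave untouched
        return f"{match.group(0)} /* {name} */"
    return LABEL_RE.sub(add_name, c_source)
```

For example, annotate_labels("goto lab22 ;") gives "goto lab22 /* continue */ ;".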
Common labels
lab20 = restart
This is used for example here (§380, I've reformatted the indentation):
procedure get_x_token; {sets |cur_cmd|, |cur_chr|, |cur_tok|, and expands macros}
label
  restart, done;
begin
restart:
  get_next;
  if cur_cmd <= max_command then
    goto done;
  if cur_cmd >= call then
    if cur_cmd < end_template then
      macro_call
    else
    begin
      cur_cs := frozen_endv;
      cur_cmd := endv;
      goto done; {|cur_chr=null_list|}
    end
  else expand;
  goto restart;
done:
  if cur_cs = 0 then
    cur_tok := (cur_cmd * 256) + cur_chr
  else
    cur_tok := cs_token_flag + cur_cs;
end;
so this structure of the function body being restart: ... if (...) goto done; ... goto restart; done: ... is basically an infinite loop, exited by goto done when some condition is hit.
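In a language with break, the same restart/done shape can be sketched as a while True loop. The following is a toy model, not TeX's actual data structures: the function signature, the token list and the macros dictionary are all assumptions made for illustration.

```python
def get_x_token(tokens, macros):
    """Return the next unexpandable token, expanding macros on the way.

    Toy model only: `tokens` is a list whose front is the next token, and
    `macros` maps a macro name to its replacement list of tokens.
    """
    while True:                      # restart:
        tok = tokens.pop(0)          # get_next
        if tok not in macros:        # unexpandable, so "goto done"
            break
        tokens[:0] = macros[tok]     # expand, then fall through to "goto restart"
    return tok                       # done:
```

With tokens = ["\\a", "x"] and macros = {"\\a": ["\\b", "y"], "\\b": ["z"]}, the call returns "z" and leaves ["y", "x"] in tokens.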
lab22 = continue
This is used to re-do loops, for example things like
while true do
begin
continue:
  ...
  if ... then goto continue;
  ...
end
or variants thereof (putting continue at the end of the loop body, etc).
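When the continue label sits at the top of the loop body, as in the skeleton above, goto continue behaves like the continue statement of a modern language inside an endless loop. A minimal sketch, with a made-up prompt loop in which read_choice, replies and valid are all hypothetical names:

```python
def read_choice(replies, valid=("h", "q", "x")):
    """Return the first reply that is a valid choice, or None if none is.

    Toy model of a prompt loop; `replies` stands in for user input.
    """
    pending = iter(replies)
    while True:
        # "continue:" sits here, at the top of the loop body
        reply = next(pending, None)
        if reply is None:
            return None          # input exhausted, leave the loop
        if reply not in valid:
            continue             # the equivalent of "goto continue"
        return reply
```

Calling read_choice(["?", "z", "q", "h"]) skips the two invalid replies and returns "q".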
The specific example in the question
An added source of confusion is that what we're running (if using a modern distribution like TeX Live) is not TeX-as-written-by-Knuth directly, but one to which several changes/patches have been applied — and these changes may not always follow the same conventions for the labels, and sometimes aren't even valid Pascal/WEB (e.g. not declaring labels), as they've only been tested via the web2c pipeline (translated to C and run through a C compiler), not via a Pascal compiler. The one in the question shows things that come from EncTeX (see CTAN, Overleaf). In the source code, what looks like this:
...
if (i = start) and (not mubyte_start) then
begin
  mubyte_keep := 0;
  if (end_line_char >= 0) and (end_line_char < 256) then
    if mubyte_read [end_line_char] <> null then
    begin
      mubyte_start := true; mubyte_skip := -1;
      p := mubyte_read [end_line_char];
      goto continue;
    end;
end;
restart:
  mubyte_start := false;
  if (mubyte_read [buffer[i]] = null) or (mubyte_keep > 0) then
  begin
    if mubyte_keep > 0 then decr (mubyte_keep);
    return ;
  end;
  p := mubyte_read [buffer[i]];
continue:
  if type (p) >= 64 then
  begin
    last_type := type (p) - 64;
    p := link (p);
    mubyte_token := info (p); last_found := mubyte_skip;
  end
...
gets turned into the (almost unreadable) C code you showed in the question:
...
if ( ( *i == curinput .startfield ) && ( ! mubytestart ) )
{
  mubytekeep = 0 ;
  if ( ( eqtb [27215 ].cint >= 0 ) && ( eqtb [27215 ].cint < 256 ) ) {
    if ( mubyteread [eqtb [27215 ].cint ]!= -268435455L )
    {
      mubytestart = true ;
      mubyteskip = -1 ;
      p = mubyteread [eqtb [27215 ].cint ];
      goto lab22 ;
    }
  }
}
lab20: mubytestart = false ;
if ( ( mubyteread [buffer [*i ]]== -268435455L ) || ( mubytekeep > 0 ) )
{
  if ( mubytekeep > 0 )
    decr ( mubytekeep ) ;
  return Result ;
}
p = mubyteread [buffer [*i ]];
lab22: if ( mem [p ].hh.b0 >= 64 )
{
  lasttype = mem [p ].hh.b0 - 64 ;
  p = mem [p ].hh .v.RH ;
  mubytetoken = mem [p ].hh .v.LH ;
  lastfound = mubyteskip ;
}
...
My suggestion, if you're looking at the source code for either understanding or debugging (or rather, finding bugs), would be to start with either LuaTeX (written in C, though manually translated from WEB first: e.g. get_x_token is here) or one of the other non-WEB reimplementations — they may not have all the extra features from TeX Live, but they should be easier to work with.
[…] goto). Using goto in the program flow is a programming style that has been discouraged long ago, but is nevertheless the actual model of the underlying machine (the processor). For this reason, you often find it in autogenerated code (from a transpiler), which I assume to be the case here. AFAIK, tex was developed in a Pascal dialect called WEB, which is nowadays transformed to C before getting compiled to machine code. The code you have been looking at may be the intermediate of this process. – Daniel Jul 04 '20 at 05:33
[…] tangle or web2c) and would not be on-topic on a generic C question site like Stack Overflow. – ShreevatsaR Jul 15 '20 at 16:58
[…] goto mean?" Would that also be on topic here? – David Z Jul 15 '20 at 20:21
[…] goto then - what if someone had asked "what does if mean?" When I voted to close, I viewed this question as kind of on the same level as that. But it seems that there is something TeX-specific about that code, so in light of the comments I withdraw my claim that the question should be closed. – David Z Jul 15 '20 at 20:45
[…] lab22 - is the meaning of each label documented anywhere”, so it's specific to the TeX source code. Anyway, doesn't matter now, glad it's reopened now. :-) – ShreevatsaR Jul 16 '20 at 01:13