As of TeX version 3.141592653 and Metafont version 2.71828182, the bug has been fixed. The rest of this answer applies only to previous versions.
Note: It just occurred to me that this bug is present in Metafont as well. See the end of this answer.
For the sake of documentation, here’s exactly what makes the bug happen, since
it took me a little while to figure out.
The key players are the two integer variables selector and interaction.
Let’s focus on interaction first, since it’s the simpler of the two. It is
supposed to control whether TeX stops to interact with the user, and it has
four possible values:
When interaction = error_stop_mode = 3, TeX stops if an error occurs
(§82, §530), or if \pausing is set to a positive value (§363), or if \read
is used to get input from the terminal (§484), or if interrupt is nonzero at
certain points [after a token list has been scanned (§324), after a line of
input has been read (§343), and during the processing of ligatures (§753, §911)].
When interaction = scroll_mode = 2, TeX does not stop when a nonfatal
error occurs, unless the problem is that a file can’t be found, in which case
TeX will still prompt the user for a new file name (§530).
When interaction = nonstop_mode = 1, TeX does not stop unless a fatal
error occurs, or it gets into a situation that requires input from the
user—namely, if an \end command is not present in a file (§360), if a \read
command requests input from the terminal (§484), or if a file cannot be found
(§530). (These situations are treated as fatal, although they would not be if
interaction were scroll_mode or error_stop_mode.)
When interaction = batch_mode = 0, TeX’s behavior is as when
interaction is nonstop_mode, except that output to the terminal is omitted
(§75, §90, §92, §1328). This is important.
Note that the level of user interaction increases as the value of interaction
increases. Initially, interaction is set to error_stop_mode (§74).
The selector variable controls where TeX’s various text printing routines
send their output. In TeX82, it has twenty-two possible values, ranging from 0
to 21. When 0 ≤ selector ≤ 15, it represents one of the files opened with
\openout. The values of selector above 15 have the following meanings:
When selector = no_print = 16, printing goes nowhere.
When selector = term_only = 17, printing goes only to the terminal.
When selector = log_only = 18, printing goes only to the transcript file.
When selector = term_and_log = 19, printing goes to the terminal and to
the transcript file.
When selector = pseudo = 20, characters are “printed” to a buffer for use
by the show_context procedure, in a process called “pseudoprinting” (§315). This
setting is not relevant to us.
When selector = new_string = 21, characters get appended to the string
memory (if there’s any space left). This setting is not relevant to us either.
Initially, selector is term_only (§55, §1332), since no transcript file has
been opened.
The values of selector and interaction are mostly independent. However, as
you may expect, when interaction is batch_mode, selector should not be
term_only or term_and_log. In one case, it is set to term_only
unconditionally (though temporarily, since the previous value is saved in §534)
in §535, regardless of interaction. But the general idea is that
selector will be term_only or term_and_log if and only if interaction >
batch_mode; in particular, selector will be term_only or term_and_log
when interaction = error_stop_mode.
When TeX wishes to read a line from the terminal, it calls term_input
(usually via prompt_input; see §71). This routine makes clever use of the
numeric relation between selector’s possible values, in order to echo the
input line if appropriate. The program assumes that selector must be either
term_only or term_and_log upon entry to term_input (no other values would
make sense). Hence term_input can decrement selector and unconditionally
print the line the user input; if selector was term_only, it becomes
no_print, which is correct because the line will have been echoed already
(owing to the nature of terminals), and if selector was term_and_log, it
becomes log_only, which is correct because the line must be written to the
transcript file.
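The decrement trick can be sketched in a few lines of Python. This is only a model of the arithmetic, not a transcription of term_input; the function name echo_selector is made up for illustration, and the numeric values are the ones listed above.

```python
# Selector values from the list above (TeX82 numbering).
NO_PRINT, TERM_ONLY, LOG_ONLY, TERM_AND_LOG = 16, 17, 18, 19


def echo_selector(selector: int) -> int:
    """Return the selector in effect while the typed line is echoed.

    Hypothetical helper: it models only the decrement described above.
    The result is meaningful only if selector is TERM_ONLY or TERM_AND_LOG
    on entry, which is exactly the assumption the program makes.
    """
    return selector - 1


# The two cases the program expects:
assert echo_selector(TERM_ONLY) == NO_PRINT      # terminal already showed the line
assert echo_selector(TERM_AND_LOG) == LOG_ONLY   # line still has to reach the log
```

Nothing in this arithmetic checks the entry assumption, so any other starting value silently yields a selector with a different meaning.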
Let’s look at the error routine now. Its top level looks like this (§82):
procedure error;
label continue, exit;
var …;
begin
…
if interaction = error_stop_mode then
⟨Get the user’s advice and return⟩;
…
exit:
end;
And the outline of ⟨Get the user’s advice…⟩ looks like this:
loop
begin continue:
clear_for_error_prompt;
prompt_input("? ");
if last = first then
return;
c ← buffer[first];
if c ≥ "a" then
c ← c + "A" − "a"; {convert to uppercase}
⟨Interpret code c and return if done⟩;
⟨Print the menu of available options⟩;
end
The section ⟨Print the menu…⟩ is what it sounds like, except that the option
to type E to edit the input file is not listed if no input file is open, and
the option to type a number to delete tokens is not listed if
deletions_allowed is false (in order to thwart more than two levels of
recursion in error).
The interesting part of ⟨Interpret code c…⟩ is a big case statement,
switching on the value of c. (The uninteresting part is actually ⟨Print
the menu…⟩. I moved it to make the overall flow of the loop clearer.) In the
following descriptions, transfers of control are in bold.
1. If c is a decimal digit, and if it’s OK to delete tokens, then the number
of tokens specified by the user is deleted and control goes to continue.
2. If c is "E", then (in TeX82) the user is told what line of what file to
edit and TeX terminates.
3. If c is "H", then the help information is printed and control goes to
continue.
4. If c is "I", then a line of input is read from the terminal as the next
thing for TeX to process, and control goes to exit via the return macro.
5. If c is "Q", then interaction becomes batch_mode, selector gets
decremented (to suppress terminal output), and control goes to exit.
6. If c is "R", then interaction becomes nonstop_mode and control goes
to exit.
7. If c is "S", then interaction becomes scroll_mode and control goes
to exit.
8. If c is "X", then interaction becomes scroll_mode and TeX
terminates.
9. Otherwise, nothing happens; control falls through to ⟨Print the menu…⟩
and we go back to the top of the loop.
There is also a case for c = "D", if code for debugging isn’t commented out.
Control goes to continue afterwards.
[Something interesting to note about cases 5, 6, and 7 (Q, R, and S): Each
change of interaction is accompanied by a message saying “OK, entering ”,
followed by the new mode; e.g., when you type S, TeX says
OK, entering scrollmode.
Then the program does print("..."), so that the message ends up being
OK, entering scrollmode.... In case 5 (Q), however, selector is decremented
before the ellipsis is printed, so the ellipsis goes either to the transcript
file if selector was term_and_log or to nowhere if selector was term_only; the
... will not appear on the terminal. Knuth acknowledges this in the answer to
his sixth exercise for TeX: The Program he published in TUGboat (exercises
here, answers here).]
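To make the routing of the two halves of the Q message concrete, here is a small Python sketch. The helper names destinations and q_message_routing are hypothetical, and only selector values 16 to 19 are modelled.

```python
NO_PRINT, TERM_ONLY, LOG_ONLY, TERM_AND_LOG = 16, 17, 18, 19


def destinations(selector):
    """Return the set of places a print call reaches for selectors 16..19."""
    reached = set()
    if selector in (TERM_ONLY, TERM_AND_LOG):
        reached.add("terminal")
    if selector in (LOG_ONLY, TERM_AND_LOG):
        reached.add("log")
    return reached


def q_message_routing(selector):
    """Where the two halves of the Q message go, given selector on entry."""
    before = destinations(selector)      # "OK, entering " and the mode name
    after = destinations(selector - 1)   # "..." is printed after the decrement
    return before, after


# With a transcript file open, the ellipsis reaches only the log.
assert q_message_routing(TERM_AND_LOG) == ({"terminal", "log"}, {"log"})
# Without one, the ellipsis goes nowhere.
assert q_message_routing(TERM_ONLY) == ({"terminal"}, set())
```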
The process of deletion is pretty simple. First, the values of certain
global variables (cur_tok, cur_cmd, cur_chr,
and align_state) are saved. Then OK_to_interrupt is set to false—this
is another measure to stop unwanted recursion, since error might be called if
an interrupt occurs and OK_to_interrupt is true. Next, c is set to the
number typed in by the user. The following loop is executed:
while c > 0 do
begin
get_token; {one-level recursive call of error is possible}
decr(c);
end
Hence tokens are deleted by simply reading and ignoring tokens. The get_token
procedure can be regarded, for our purposes, as identical to get_next. The
recursion can happen because get_next might cause error to be called. Most
of the erroneous situations that can arise in get_next eventually terminate
the program; they are fatal errors. But there’s one direct call to error,
which happens when an invalid character is read (§346). The deletions_allowed
variable is set to false before the call, and to true afterwards.
So what’s the problem? Let’s consider what happens when you start up plain TeX
and enter the troublesome input. (I’m using plain TeX because it already makes ^^? an invalid character.) First \s^^?E is typed in response to the **
prompt. Because the first character of input is \ (= escape), TeX
treats it as regular code (i.e., it doesn’t assume you wanted to \input a file
named \s^^?E; see §1337). The \s is read and TeX tries to expand a
control sequence named s. Since \s has no definition, the expand routine
calls error (§370).
At this point, interaction is error_stop_mode and selector is term_only.
(This is why the error has to happen on the first line of input; otherwise the
transcript file is opened and selector changes.) The loop in §83 begins. Then
you type 1 (this is case 1 listed above) and §88 starts to be executed, and
get_next is called by get_token. The invalid character ^^? (ASCII code 127 = '177 = "7F; see Appendix C of The TeXbook) is read and control
moves to §346. The error routine is called again.
The values of interaction and selector have not changed, so the error dialog
is entered as before. Now you type Q. The code in §86 runs; interaction
becomes batch_mode, and selector gets decremented to no_print. Control
returns from error back to get_next, which skips over the invalid character
and reads the E left in the input. Then we get back to error; remember that
we’re in case 1, so control goes up to continue and the dialog loop begins
again.
At this point, interaction is batch_mode and selector is no_print = 16,
and we are at the top of the loop in §83, which should be executed only if
interaction = error_stop_mode. All the pieces of the puzzle are now in
place. The prompt_input macro first attempts to print ? ; nothing is
displayed, because of the value of selector. Then prompt_input calls
term_input, which does input_ln(term_in, true); this is why TeX waits for
input, even though it’s supposed to be in batch mode. The reason there must be
text following the invalid character is that otherwise TeX will encounter the
end of input (in get_next, §360) and report a fatal error [*** (job aborted, no legal \end found)]. The fatal_error procedure (§93) calls
normalize_selector (§92), which is intended to avoid situations just like what
I'm describing!
Next term_input decrements selector; its value becomes 15. If you typed
anything in response to the invisible ? , then term_input will attempt to
print it, by calling print on each character in buffer, which will end up
calling print_char. (Simple exercise: Why can’t term_input call print_char
directly?) The value of selector isn’t one of the six important ones
enumerated above, so print_char tries to print to write_file[selector]. The
elements of write_file are of type alpha_file, and none of them are actual
open streams, so what happens now is system-dependent. In Web2C, the result is
that putc will be called with a null pointer as its second argument (see
fixwrites.c),
which causes a segmentation fault. ∎
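The final step can be modelled as a dispatch on selector. The following Python sketch is an illustration under stated assumptions: print_destination and the write_open list are invented for this example, and the "not open" branch merely labels the case that real implementations handle in a system-dependent way.

```python
NO_PRINT, TERM_ONLY, LOG_ONLY, TERM_AND_LOG, PSEUDO, NEW_STRING = range(16, 22)

NAMED = {
    NO_PRINT: "nowhere",
    TERM_ONLY: "terminal",
    LOG_ONLY: "log",
    TERM_AND_LOG: "terminal and log",
    PSEUDO: "pseudoprinting buffer",
    NEW_STRING: "string memory",
}


def print_destination(selector, write_open):
    """Describe where one printed character would go for a given selector.

    write_open is a hypothetical list of 16 booleans saying which
    \\openout streams are open.
    """
    if selector in NAMED:
        return NAMED[selector]
    # Any other value is treated as an index into write_file[0..15].
    if write_open[selector]:
        return f"write_file[{selector}]"
    return f"write_file[{selector}] (not open)"


# After Q the selector is no_print; the echo in term_input lowers it once more.
selector_during_echo = NO_PRINT - 1
assert selector_during_echo == 15
assert print_destination(selector_during_echo, [False] * 16) == "write_file[15] (not open)"
```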
Now that we know what goes wrong, how might it be fixed? In TeX Live and in the recent tune-up, §83 was
changed to have a test at the start of the loop, so that it now looks like
loop
begin continue:
if interaction ≠ error_stop_mode then
return;
clear_for_error_prompt;
prompt_input("? ");
if last = first then
return;
c ← buffer[first];
if c ≥ "a" then
c ← c + "A" − "a"; {convert to uppercase}
⟨Interpret code c and return if done⟩;
end
(See this July 6
commit.
Here I have not lifted ⟨Print the menu…⟩ out of ⟨Interpret code c…⟩ as I
did before.)
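The effect of the added test can be checked with a toy model of the dialog loop. This Python sketch is not a transcription of tex.web: the class ErrorLoopModel, its fields, and the INVALID marker are all invented for illustration, only the digit and Q responses are modelled, and terminal replies are scripted in advance. The guarded flag switches the test at continue on or off.

```python
ERROR_STOP_MODE, BATCH_MODE = 3, 0
NO_PRINT, TERM_ONLY = 16, 17
INVALID = object()  # stands in for the invalid character ^^?


class ErrorLoopModel:
    """Toy model of error's dialog loop (hypothetical, heavily simplified)."""

    def __init__(self, replies, pending, guarded):
        self.interaction = ERROR_STOP_MODE
        self.selector = TERM_ONLY
        self.replies = list(replies)  # scripted lines typed at the "? " prompt
        self.pending = list(pending)  # input still unread when error is first called
        self.guarded = guarded        # True models the added test at continue
        self.prompts = []             # (interaction, selector) at each prompt

    def get_token(self):
        token = self.pending.pop(0)
        if token is INVALID:
            # The invalid-character error; deletions are not allowed inside it.
            self.error(deletions_allowed=False)
            # Skip the bad character and read what follows. (An empty list
            # here would correspond to the fatal end-of-input case.)
            token = self.pending.pop(0)
        return token

    def error(self, deletions_allowed=True):
        if self.interaction != ERROR_STOP_MODE:
            return
        while True:  # each iteration starts at the "continue" label
            if self.guarded and self.interaction != ERROR_STOP_MODE:
                return
            self.prompts.append((self.interaction, self.selector))
            reply = self.replies.pop(0)
            if not reply:
                return
            c = reply[0].upper()
            if c.isdigit() and deletions_allowed:
                for _ in range(int(c)):
                    self.get_token()
                continue  # goto continue
            if c == "Q":
                self.interaction = BATCH_MODE
                self.selector -= 1
                return
            return  # every other response is outside this sketch


def run(guarded):
    model = ErrorLoopModel(["1", "q", ""], [INVALID, "E"], guarded)
    model.error()  # the error raised for the undefined \s
    return model.prompts
```

Calling run(False) records a third prompt issued with interaction = batch_mode and selector = no_print, which is the buggy state described earlier; run(True) records only the two legitimate prompts.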
After studying the original code, I’ve come up with the following alternative
solution. First, we change error’s top level (§82) so that
if interaction = error_stop_mode then
⟨Get the user's advice and return⟩;
is
while interaction = error_stop_mode do
⟨Get the user's advice and return⟩;
Then we change §83 to read
begin
clear_for_error_prompt;
prompt_input("? ");
if last = first then
return;
c ← buffer[first];
if c ≥ "a" then
c ← c + "A" − "a"; {convert to uppercase}
⟨Interpret code c and return if done⟩;
continue:
end
There are also other, more drastic options. We could make the same changes, but
remove the continue label from §83 and change ⟨Interpret code c…⟩ into
something like
if (c ≥ "0") ∧ (c ≤ "9") ∧ deletions_allowed then
⟨Delete c − "0" tokens⟩
else
if (c = "E") ∧ (base_ptr > 0) then
…
else
case c of
debug "D":
begin
debug_help;
end;
gubed
"H":
⟨Print the help information⟩;
"I":
⟨Introduce new material from the terminal and return⟩;
"Q", "R", "S":
⟨Change the interaction level and return⟩;
"X":
begin
interaction ← scroll_mode;
jump_out;
end;
othercases
⟨Print the menu of available options⟩
endcases
where goto continue has been removed from the deletion code, from the
debugging code, and from the help-displaying code. In my opinion, this is worse,
because it’s not as obvious that the menu might be printed even if c is "E"
or a digit.
Other places besides error could be changed as well. We could make
term_input or prompt_input explicitly validate the assumption that
selector ∈ {term_only, term_and_log}. For example, prompt_input(#) might
be made to expand into
begin
if (selector ≠ term_only) ∧ (selector ≠ term_and_log) then
confusion("selector");
wake_up_terminal;
print(#);
term_input;
end
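The same defensive check is easy to express in Python. In this sketch the Confusion exception is a stand-in for TeX's confusion procedure, and checked_prompt_input and its read_line parameter are hypothetical names.

```python
TERM_ONLY, TERM_AND_LOG = 17, 19


class Confusion(Exception):
    """Stand-in for TeX's confusion procedure (an internal consistency failure)."""


def checked_prompt_input(selector, prompt, read_line):
    """Validate the selector assumption before prompting.

    read_line is a hypothetical callable that shows the prompt and
    returns the line typed by the user.
    """
    if selector not in (TERM_ONLY, TERM_AND_LOG):
        raise Confusion("selector")
    return read_line(prompt)


# Normal case: the prompt is issued and the reply is returned.
assert checked_prompt_input(TERM_ONLY, "? ", lambda prompt: "q") == "q"
```

With selector = no_print, as in the buggy state, the call fails loudly instead of waiting silently for input.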
Of course, that would be helpful only if any more bugs of this sort exist in the
program.
Addendum: Metafont and TeX share a lot of programming, and in fact their versions of the error routine are nearly identical. So it isn’t too surprising that this bug can happen in both programs. It’s not as bad in Metafont, though; no segmentation fault can occur. The problematic first line this time is \1:=^Ax, where ^A is control+a. (Any invalid character will do, but it must be typed directly, since Metafont doesn't have an equivalent to TeX's ^^ syntax.) You’ll get the error
Improper `:=' will be changed to `='.
The rest of the interaction proceeds as before. You type 1, Metafont decries the invalid character, then you type q, and Metafont waits for input after supposedly entering batch mode.
There are of course other ways to cause the bug. You could say \1;^Ax, but you’d have to delete two tokens instead of just one.
Most of the exposition above about TeX applies to Metafont, although many of the section numbers are different. The selector shenanigans don’t happen, since Metafont expects it to be between 0 and 5 and does nothing if it isn’t.
^^E is no longer invalid when LaTeX starts up. – egreg Jun 27 '20 at 08:32