
In §31 of tex.web it is said:

Trailing blanks are removed from the line; thus, either |last==first| (in which case the line was entirely blank) or |buffer[last-1]!=' '|.

Input should be the same when trailing blanks are present as when they are not, due to the fact that blanks are invisible at the end of the line. From this I can deduce that if trailing blanks were not removed, TeX would behave differently.

Why is it necessary to remove trailing blanks? Does anybody know which section of tex.web would behave differently if trailing blanks were present in the input line? Or is there some example input that shows the difference?

Igor Liferenko
  • Please see the accepted answer to this question: https://tex.stackexchange.com/questions/7453/what-is-the-use-of-percent-signs-at-the-end-of-lines I think it is related to that about which you inquire. – Steven B. Segletes Apr 14 '20 at 10:17
  • Those examples use %, which can keep a space at the end of a line, but it also removes the \endlinechar. What undesired effects would arise if the space at the end of the line were kept and the \endlinechar preserved? – Igor Liferenko Apr 14 '20 at 11:00
  • I think, in such a case, you would end up with two space tokens, which is not the desired default. – Steven B. Segletes Apr 14 '20 at 11:09
  • @StevenB.Segletes Well, actually what troubles me is that in §36 nothing is said about removing trailing blanks. So, they are safe in the end of command line? – Igor Liferenko Apr 14 '20 at 11:36
  • Consider also the case when the catcode of space is not the usual 10 (e.g. when it is an active character). (For possibly relevant context, also the comment by DRF quoted here, about "IBM's OS360 and VM/CMS".) – ShreevatsaR Apr 14 '20 at 11:36
  • Related to the previous comment: using verbatim*, I believe it is not possible (at least not in a straightforward way) to have lines ending with visible spaces (those printed by \verbvisiblespace). – frougon Apr 14 '20 at 11:57
  • So, considering the end of the question: example input that would presumably work differently if trailing blanks weren't removed: \begin{verbatim*} abc def \end{verbatim*} (there is a newline after \begin{verbatim*}, then abc def followed by one or more spaces, a newline and finally \end{verbatim*}). – frougon Apr 14 '20 at 12:05
  • @frougon a plain-tex example would be preferable - so that I could check it by changing code which handles command line arguments (I can recompile only Knuth's TeX, no eTeX, which is needed for latex). (I'm interested about command line handling specifically, because in §36 of tex.web nothing is said about removing trailing space) – Igor Liferenko Apr 14 '20 at 12:21
  • I think it's just to have a normalization across operating systems: some used NUL to fill fixed length records, others used spaces. This is confirmed by the words of David Fuchs in the answer quoted by @ShreevatsaR – egreg Apr 14 '20 at 13:07
  • @IgorLiferenko I've added a plain TeX example below. – frougon Apr 14 '20 at 13:18
  • If I do tex '\relax abc ' from the command line, and add def\bye at the prompt, the output will have “abc def” with a space. The space is not stripped in that case. – egreg Apr 14 '20 at 15:19
  • @egreg The following command produces the same output as in your example: echo -e '\\relax abc \ndef\\bye' | tex. Space is stripped in file mode. So it follows that in your example space is also stripped. – Igor Liferenko Apr 17 '20 at 09:31

3 Answers


I believe you gave yourself a good reason why it is not completely unreasonable to ignore trailing blanks: since most people can't see them, having different behaviors depending on their presence could be very confusing (note that I do see trailing blanks, because I have (setq-default show-trailing-whitespace t) in my Emacs configuration). There may be other reasons that I don't know—I only wrote this answer in reply to your comment here.

So, regarding your request for sample input that would behave differently if trailing blanks weren't ignored, I propose the following (which belongs to the category envisaged by ShreevatsaR: catcode different from 10 for the ASCII space character):

\def\visiblespace{{\tt\char32 }}
\obeyspaces\let =\visiblespace
abc def  ghi   
\par
\bye

where I've left three spaces after ghi (which are unfortunately invisible here). The output with my unmodified TeX engine is:

[image: typeset output]

I would expect three “visible spaces” after ghi with your modified engine that doesn't ignore trailing blanks.

Addendum

Here are two other examples, this time with the standard category code for the ASCII space (10):

  1.  {\endlinechar=`X
      abc} 
     d\par
     \bye
    

    There is one trailing space after abc}. Subtlety: the \endlinechar terminator character is appended before tokenization starts for a given line. Thus, each line is terminated according to the \endlinechar value that was current at the end of the previous line. Here, after the closing brace and invisible trailing space, an X character has already been appended as line terminator before TeX starts to tokenize abc.

  2.  {\let\par=X\obeylines%
      abc 
     }d\par
     \bye
    

    There is one trailing space after abc.

In both cases, an unmodified TeX engine outputs:

[image: typeset output]

I expect that your modified engine prints abc Xd in both cases.
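To make the buffer-level difference in example 1 concrete, here is a small Python model. It is only a sketch, not TeX's actual code: read_line and its strip_trailing flag are names made up for illustration, and tokenization is not modeled at all. It shows what would land in the buffer for the line holding abc} when \endlinechar is X.

```python
def read_line(raw: str, endlinechar: str, strip_trailing: bool = True) -> str:
    """Hypothetical model of the buffer contents for one input line.

    Trailing characters with code 32 are dropped (when strip_trailing is
    True), then the end-of-line character that is current at that moment
    is appended. Tokenization happens later and is not modeled here.
    """
    if strip_trailing:
        raw = raw.rstrip(" ")  # removes U+0020 only, not tabs
    return raw + endlinechar


# Second line of example 1: leading spaces, "abc}", one trailing space.
line = "      abc} "

stock = read_line(line, "X")                            # unmodified engine
modified = read_line(line, "X", strip_trailing=False)   # assumed modified engine

print(repr(stock))
print(repr(modified))
```

In the second case the space survives between } and X in the buffer, which is presumably where the extra space in abc Xd would come from.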

frougon

In TeX Live 2018 the interpretation of "stripping blanks" was corrected to strip only spaces and not tabs, so you can see the effect by comparing TeX Live 2017 with any later release (TeX Live 2020 here).

Consider this plain TeX input:


\catcode9\active\def	{X}

one two three

one two three

\bye

This has two tab characters (U+0009). This site strips them, so I will show them as T here:


\catcode9\active\defT{X}

one two three

one twoT three

\bye

In TeX Live 2017 the tabs are stripped and you get

[image: typeset output]

In TeX Live 2020 you get

[image: typeset output]
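The two policies can be compared side by side in a short Python sketch. This only models the stripping step as described above, not the real TeX Live source: strip_trailing and the sample line are hypothetical, and the sample deliberately ends with a tab so that the difference shows up.

```python
def strip_trailing(line: str, blanks: str) -> str:
    """Hypothetical helper: drop trailing characters that count as blanks."""
    return line.rstrip(blanks)


# Made-up input line ending in space, tab, space (not the line from the answer).
line = "abc \t "

# Policy described for later releases: only U+0020 counts as a blank.
spaces_only = strip_trailing(line, " ")

# Policy described for TeX Live 2017: tabs are stripped as well.
spaces_and_tabs = strip_trailing(line, " \t")

print(repr(spaces_only))
print(repr(spaces_and_tabs))
```

Under the first policy the tab is still in the line when tokenization starts, so an active tab defined as X would get a chance to expand. Under the second it is gone before TeX looks at category codes.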

user202729
David Carlisle
  • I'm not sure how this answers the question. – egreg Apr 14 '20 at 14:20
  • @egreg ??? it is an example input that produces different output if trailing blanks are stripped. That is exactly the question is it not? – David Carlisle Apr 14 '20 at 14:21
  • No, it's not. The question is: what would happen if TeX didn't strip blanks? The TAB issue is not relevant, as the question refers to the “original TeX”. – egreg Apr 14 '20 at 14:26
  • I guess the question is about trailing spaces mainly, rather than the complication of tabs... The intent of the question seems to be that in regular paragraphs trailing spaces don't matter anyway, so why strip them at input time. – ShreevatsaR Apr 14 '20 at 14:27
  • @egreg I don't really want to be in the middle of a fight between you and David (or is it some kind of teasing?), but it seems to me that David's example shows an input that would behave differently if traditional TeX didn't strip any trailing blank, which I believe includes tabs. Is it really so far-fetched? Regarding ShreevatsaR's comment, the question appears to carefully use the term ”blank” rather than “space,” thus I deem it reasonable to consider this word as encompassing both spaces and tabs. I could be wrong, of course. :-) – frougon Apr 14 '20 at 14:48
  • @frougon TeX originally didn't strip tabs at the end of lines. The feature was introduced at some point in time in TeX Live implementations, but now removed. TeX strips characters with character code 32, tokenization has not yet come into play at that stage. – egreg Apr 14 '20 at 14:51
  • @egreg Ah, I knew there had been a change but didn't know it was a back-and-forth. Thanks for the clarification. From the link given by ShreevatsaR, it seems that traditional TeX didn't accept TAB chars at all? Which would indeed imply it can't strip them (without signaling an error). – frougon Apr 14 '20 at 14:53
  • @egreg the point of giving an example with tabs is you can see the difference using actual implementations rather than an example with space where you have to imagine an implementation that didn't strip. – David Carlisle Apr 14 '20 at 15:33
  • @DavidCarlisle OK, fair enough. – egreg Apr 14 '20 at 15:36
  • @user202729 that's odd your edit puts Tab in the code, but as I explicitly commented in the answer that was not possible as they got changed to spaces, so I marked with T. The stackexchange editor has had some changes, it seems you can use Tab now... – David Carlisle Aug 04 '22 at 10:33
  • @DavidCarlisle Not really, there's this trick https://meta.stackexchange.com/a/294870/388243 at the expense of more ugly source code. – user202729 Aug 04 '22 at 10:36
  • @user202729 ah, OK – David Carlisle Aug 04 '22 at 10:38

The reason is to get normalized input across various operating systems.

At the time TeX was written, several operating systems used fixed-length records, as they were based on punch cards (the typical system with fixed-length records). Some of these systems filled the record with NUL characters (corresponding to no punch on a column).

[image: punch card]

Others filled it with blank spaces, for instance IBM's OS360 and VM/CMS (see the quotation in https://tex.stackexchange.com/a/389871/4427).

The problem with the NUL character was solved by giving it category code 9 (ignored). For blank spaces, the solution was to remove them when the record is read in, before tokenization. Giving the space category code 13 and defining it to be \space would not typeset trailing spaces, because the removal happens before tokenization and every trailing character with character code 32 is removed; after the removal, the \endlinechar is added (but not yet tokenized).
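As a rough illustration of the normalization, here is a Python sketch of a fixed-length record being read. The 80-column width and the helper names pad_record and normalize are assumptions for the example, not anything taken from tex.web.

```python
RECORD_LENGTH = 80  # assumed card-image width, for illustration only


def pad_record(text: str, fill: str) -> str:
    """Hypothetical model of a system storing text in a fixed-length record."""
    return text.ljust(RECORD_LENGTH, fill)


def normalize(record: str) -> str:
    """Drop trailing characters with code 32, as described for section 31."""
    return record.rstrip(" ")


text = r"\hbox{abc}"

space_padded = pad_record(text, " ")   # systems that fill with blanks
nul_padded = pad_record(text, "\0")    # systems that fill with NUL

# Space padding disappears at read time, so the line looks the same as
# on a system with variable-length lines.
from_space_padded = normalize(space_padded)

# NUL padding is untouched by this step; it is only skipped later,
# during tokenization, because of its category code.
from_nul_padded = normalize(nul_padded)

print(repr(from_space_padded))
print(len(from_nul_padded))
```

The point of the sketch is only that the two padding conventions are handled at different stages: spaces by character code when the record is read, NULs by category code when the line is tokenized.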

Some TeX implementations (TeX Live, in particular) used to also remove TABs, but nowadays this no longer happens.

In any case, a trailing space would be impossible to spot and could give surprises in output. TeX78 had a different way to cope with endlines, but TeX82 introduced \endlinechar and category code 5, besides space removal.

egreg