5

How are spaces and empty lines processed by long commands (i.e., those that accept paragraph breaks inside their arguments)? Are there different space tokens aside from " " and an empty line? It appears that an empty line counts as exactly one empty argument:

\documentclass{article}

\newcommand{\oneArg}[1]{}
\newcommand{\twoArgs}[2]{}
\newcommand{\threeArgs}[3]{}

\begin{document}

\indent
A \oneArg

B

% output:
% A B


A \twoArgs

B

C

% output:
% A
% C

A \threeArgs

B

C

% output:
% A C

\end{document}

And is there anything special one needs to know about math mode in this regard?

One pointer: Some relevant information is in this question, especially this discussion thread about \somecommand * being legal LaTeX.


Addendum: An interesting detail about short macros (those defined, for example, with \newcommand*): If I add \newcommand*{\noPar}[1]{#1} to my source code and try to compile an additional chunk of code

\noPar{
A \threeArgs

B

C
}

the compiler will throw an error. As this is semantically not a paragraph break, the long-short distinction between commands should probably be described in terms of empty lines, not paragraph breaks. Or not?

  • That's exactly what I meant: an end of a line and a following empty line are equivalent to \par which is read in as an argument to your commands. Spaces alone are ignored when scanning for arguments except when they're used as delimiters for the arguments. (... or between delimiters of a delimited argument, or in braces { } ...) – cgnieder Feb 19 '13 at 09:36
  • @cgnieder 1. Make it an answer? 2. Are spaces ignored or not ignored "between delimiters of a delimited argument"? You mean "spaces are ignored between } and { of successive arguments' delimiters", correct? And the "in braces" addendum is a bit confusing (surely not all spaces occurring somewhere inside { } are ignored, perhaps only those at the beginning or end, but I thought those are not ignored). – Lover of Structure Feb 19 '13 at 09:40
  • I don't see the link to math mode: I removed the tag – Joseph Wright Feb 19 '13 at 09:51
  • 1
    A delimited argument cannot be defined through \newcommand but it can with \def: in \def\test a#1b{(#1)} the argument has the delimiters a and b. In \test a b the space between a and b will not be ignored, similar to \newcommand\test[1]{(#1)}\test{ }. – cgnieder Feb 19 '13 at 09:53
  • @cgnieder Are two successive linebreak characters exactly equivalent to feeding the command one \par? And, what about a non-empty line, i.e. two linebreak characters separated only by space characters? – Lover of Structure Feb 19 '13 at 10:14
  • One end-of-line is the same as a single space. One end-of-line followed by an empty line is equivalent to \par; see Joseph's answer and the comments below – cgnieder Feb 19 '13 at 10:18
  • @LoverofStructure About the last addendum: your macro \noPar is not long, so it doesn't allow \par tokens in its argument; it's obvious that the code raises an error. It has already been said many times: a blank line is converted into \par, so there's no distinction to be made. – egreg Feb 19 '13 at 11:30
  • @egreg Actually, the point of the addendum was to show that it's illegal even though \threeArgs is long. This is interesting and imo worth pointing out, as it teaches (a non-expert) something about the order in which things are processed. Not knowing about TeX-internals, it's entirely plausible to assume that \threeArgs processes its arguments first. It's about greedy vs lazy evaluation order, though TeX's very special parsing algorithm doesn't quite fit into those categories and makes it tricky to figure out and understand such details. – Lover of Structure Feb 19 '13 at 11:36
  • 1
    @LoverofStructure I disagree: TeX sees \noPar and so it absorbs its argument, which happens to contain \par. Error. – egreg Feb 19 '13 at 11:39
  • @egreg Yes, so this means that \noPar does this error checking before \threeArgs has a chance to eat \par, B, and \par. Why would this be obvious to a non-expert? – Lover of Structure Feb 19 '13 at 12:26
  • @LoverofStructure Because macro expansion proceeds in the order the tokens are found. – egreg Feb 19 '13 at 12:48
  • @egreg But checking that there is no \par, which is part of macro execution of \noPar, happens before \threeArgs is executed. I don't see how this order is obvious to a casual user of LaTeX. – Lover of Structure Feb 20 '13 at 11:08
  • @LoverofStructure There is \par in the argument of \noPar. It doesn't matter what possible macros in the same argument will do with it. I've never encountered such a problem from users. – egreg Feb 20 '13 at 11:12
  • @egreg Say you want to apply a comment macro inside of \noPar, like what is described here, the comment can be multiple paragraphs. The question is not whether this is a likely thing to occur; my point is that it's something about execution order and parsing that is not obvious (to a non-expert) and needs to be documented, such as here. – Lover of Structure Feb 20 '13 at 11:16

3 Answers

9

The answer to the question in the title is technically "they are not processed" but I don't think that's the answer you want.

If you modify your definitions to

\newcommand{\oneArg}[1]{\long\def\a{[#1]}\typeout{\meaning\a}}
\newcommand{\twoArgs}[2]{\long\def\a{[#1][#2]}\typeout{\meaning\a}}
\newcommand{\threeArgs}[3]{\long\def\a{[#1][#2][#3]}\typeout{\meaning\a}}

you will see

\long macro:->[\par ]
\long macro:->[\par ][B]
\long macro:->[\par ][B][\par ]

Assuming normal catcodes are in force, a blank line is turned by TeX into the token \par (literally the command name token \par, not the primitive paragraph-end function). It does this at a very early stage, as characters are being tokenised, so before any token lists are passed to a macro. So a macro never sees a blank line in its argument. The behaviour is always as if you had replaced the blank line by \par in the input file.

Space tokens are similarly processed at this early stage. Spaces at the end of a line and at the beginning of the next are discarded and never tokenised at all, so macros have no record of them. (You cannot prevent the discarding of spaces at the end of the line, even if you change the catcode of space.) Runs of space characters produce only one space token. It is the tokens, not the file characters, that are passed to a macro.
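
A small sketch of what tokenisation leaves behind, assuming normal catcodes and the default \endlinechar. The scratch macro \tmp is a hypothetical name, not something from the question; \meaning simply prints the tokens that were stored.

```latex
\documentclass{article}
\begin{document}
% \tmp is a hypothetical scratch macro used only for this sketch.
\def\tmp{a      b}% six space characters in the file
\typeout{\meaning\tmp}% log shows: macro:->a b

\def\tmp{a

b}% the end of line after "a" gives a space, the blank line gives \par
\typeout{\meaning\tmp}% log shows: macro:->a \par b
\end{document}
```

In both cases the stored token list has no memory of how many spaces or blank lines were in the file.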

If you have non-delimited arguments, as in your example, any space tokens are skipped while looking for the argument; if you want a space to be the argument you need { }. \par can be an undelimited argument if the macro is \long; otherwise it is an error.
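
As a rough illustration of the argument-grabbing rules, here is a sketch using the same \meaning trick as above. \showTwo and \tmp are hypothetical helper names introduced only for this example.

```latex
\documentclass{article}
% \showTwo is a hypothetical helper built on the \meaning trick shown above.
\newcommand{\showTwo}[2]{\long\def\tmp{[#1][#2]}\typeout{\meaning\tmp}}
\begin{document}
\showTwo{a}    {b}% space token between } and { is skipped: [a][b]
\showTwo a b% the space token before b is skipped: [a][b]
\showTwo{ }{b}% braces make the space token the argument: [ ][b]
\showTwo

b% the blank line is \par, taken as the first argument: [\par ][b]
\end{document}
```

Because \newcommand without a star defines a \long macro, the last call is accepted; with \newcommand* it would stop with a "Paragraph ended" error.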

David Carlisle
7

TeX's tokenization process is important here. When (La)TeX reads one newline

some text
some more text

it converts the newline to a space (with any spaces at the start of the second line ignored): some text some more text. However, when TeX reads two consecutive newlines (that is, a blank line), it produces a \par token, so

some text

some more text

ends up as some text \par some more text. (I'm assuming standard setting for \endlinechar here.)

Grabbing arguments happens after this process has occurred, so you are not seeing 'blank lines' being read as arguments but rather \par tokens. It's \par tokens in the argument of a command that make the difference between 'short' and 'long' commands, and these have to appear directly in the argument to raise an error with a 'short' command.
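
To make the 'short' versus 'long' distinction concrete, here is a minimal sketch. \shortCmd and \longCmd are hypothetical names, and the failing calls are left commented out so that the file compiles.

```latex
\documentclass{article}
% \shortCmd and \longCmd are hypothetical names used only for this sketch.
\newcommand*{\shortCmd}[1]{#1}% 'short': no \par token allowed in #1
\newcommand{\longCmd}[1]{#1}% 'long': \par tokens are allowed in #1
\begin{document}
\longCmd{a

b}% fine: the blank line arrives as a \par token

\shortCmd{a\endgraf b}% fine: \endgraf is not the token \par

% \shortCmd{a\par b}
%   error: Paragraph ended before \shortCmd was complete.
% \shortCmd{\longCmd{a\par b}}
%   same error: \shortCmd grabs its argument before \longCmd is expanded.
\end{document}
```

The last commented line mirrors the \noPar addendum in the question: the outer short command sees the \par token first, whatever the inner command would have done with it.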

wipet
Joseph Wright
  • So there's a difference between "token \par" and "macro name token \par" ? – yo' Feb 19 '13 at 09:53
  • 1
    @tohecz None: the difference is between a token called \par (a macro, \let to something, ...) and the \par primitive used by TeX internally. See for example \endgraf, which is \let to the \par primitive but is allowed in 'short' arguments as it is not a \par token. – Joseph Wright Feb 19 '13 at 10:02
  • So I still don't understand, at which stage TeX recognizes the difference between \section{x\par y} and \section{x<EOL><EOL>y} so that the 1st goes through, but the 2nd throws an error? – yo' Feb 19 '13 at 10:07
  • @tohecz You never get \section{x<EOL><EOL>y}: TeX tokenizes as it reads the argument, so you can only get \section{x\par y} (unless you do verbatim-like reading by fiddling with \endlinechar). – Joseph Wright Feb 19 '13 at 10:11
  • What about <EOL> <EOL> with tons of space in between (and possible space before and after)? – Lover of Structure Feb 19 '13 at 10:23
  • 1
    @LoverofStructure TeX is in 'skipping spaces' mode after the first newline, so any intervening spaces are ignored. The TeXbook is the place to read up on this. – Joseph Wright Feb 19 '13 at 10:27
3

(1) White space not containing two <end-of-line>s is skipped when scanning for arguments. (2) White space containing at least two <end-of-line>s is converted to \par and can be passed as an argument.

On the other hand, when TeX scans an argument of a command defined without \long and spots \par (either written as \par or produced as in (2) above), it throws an error. This makes it easier to debug wrong grouping, such as a forgotten closing brace. (Remember: \newcommand => yes-\long and \newcommand* => no-\long.)


Also, \par in case (2) is valid not only as an argument but as an argument delimiter, too. However, the argument will still contain a terminating space token:

\def\xy#1\par{x#1y}

Hello \xy World

How are you

\bye

Output (one line; remember we "ate" the \par by using it as an argument delimiter):

Hello xWorld yHow are you


AFAIK this is sometimes used in ConTeXt, and it allows you to get rid of many braces. For example, consider the definition \def\Section#1\par{\section{#1}}; then your code would be:

\Section Your well-being

How are you?

Output:

Your well-being

How are you?

yo'
  • 2
    I think you need to be clearer on the distinction between space characters in the input file and space tokens, as the timing of tokenisation is the entire issue here (compare to my and @Joseph's answers). The skipping of white space tokens when looking for undelimited arguments works on token lists and is completely separate from the skipping of spaces at the start and end of a line, which works on characters only. – David Carlisle Feb 19 '13 at 10:17
  • +1 for a good quick answer (This time it was very hard for me to decide which answer to accept: they differ essentially only in the level of "detail vs high-level".) – Lover of Structure Feb 25 '13 at 22:49