
With ε-TeX, the go-to method for testing if a <token-list> is empty is the following test:

\if\relax\detokenize{<token-list>}\relax
  % empty
\else
  % not empty
\fi

The method is fool-proof as long as the <token-list> can be safely \detokenized, which is the case when it is grabbed as an argument to some other macro that does the testing.

Now looking at the expl3 sources I found the test to actually be (modulo _ and :)

\expandafter\ifx\expandafter\qnil\detokenize{#1}\qnil
  % empty
\else
  % not empty
\fi

where \qnil is a “quark” defined with \def\qnil{\qnil}, which means that \ifx\qnil<token> will only be true if <token> is \qnil, which will be the case only if #1 is empty; otherwise <token> will be some other (catcode-10 or 12) character token, which makes the test return false.

But this condition is also true for the first test: \if\relax<token> will only be true if <token> is another control sequence, which will never be the case if there's anything inside the \detokenize.

Or is it?

Is there a reason for the second method being preferred over the first? Is there an edge-case in which one of them would fail?

Both methods, as far as I can tell, apply the same treatment to the input token list, and are both robust regarding weird arguments, such as \iftrue\else\fi (which would otherwise be a problem) because in either case the <token-list> is \detokenized, so the argument can be virtually anything.
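To make the comparison concrete, here is a minimal sketch that puts both tests side by side. The names \testA and \testB are made up for this illustration, and the file assumes a LaTeX format running on an e-TeX-enabled engine (for \detokenize):

```latex
% Run with latex/pdflatex; everything is printed to the terminal and log.
\makeatletter
% Quark as described above: a macro that expands to itself.
\def\qnil{\qnil}
% Test A (hypothetical name): \if\relax\detokenize{...}\relax
\newcommand\testA[1]{%
  \if\relax\detokenize{#1}\relax
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi}
% Test B (hypothetical name): quark-based \ifx, as in the expl3 sources
\newcommand\testB[1]{%
  \expandafter\ifx\expandafter\qnil\detokenize{#1}\qnil
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi}
\makeatother
\typeout{A: \testA{}{empty}{not empty}}                % A: empty
\typeout{B: \testB{}{empty}{not empty}}                % B: empty
\typeout{A: \testA{X}{empty}{not empty}}               % A: not empty
\typeout{B: \testB{X}{empty}{not empty}}               % B: not empty
\typeout{A: \testA{\iftrue\else\fi}{empty}{not empty}} % A: not empty
\typeout{B: \testB{\iftrue\else\fi}{empty}{not empty}} % B: not empty
\stop
```

In both macros the argument is turned into harmless character tokens before the conditional ever looks at it, which is why the unbalanced-looking \iftrue\else\fi does no damage.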


Motivation:

I’m working on some code that will use this test and should be executed a few hundred times for each function call, so performance is important. According to my tests the first method is slightly (very, very slightly) faster than the second:

\RequirePackage{l3benchmark}
\ExplSyntaxOn
\prg_new_conditional:Npnn \pho_tl_if_empty:n #1 { TF }
  {
    \if:w \scan_stop: \tl_to_str:n {#1} \scan_stop:
      \prg_return_true:
    \else:
      \prg_return_false:
    \fi:
  }
\cs_new:Npn \pho_test:N #1
  {
    \benchmark_tic:
    \int_step_inline:nn { 999999 }
      {
        #1 { } { } { } % Empty
        #1 { X } { } { } % non-empty
        #1 { \iftrue \else \fi } { } { } % just in case
      }
    \benchmark_toc:
  }
\pho_test:N \pho_tl_if_empty:nTF
\pho_test:N \tl_if_empty:nTF
\stop

output:

(l3benchmark) + TIC
(l3benchmark) + TOC: 2.17 s
(l3benchmark) + TIC
(l3benchmark) + TOC: 2.32 s

. . . Yes, those are 15 hundredths of a second in one million repetitions :-)

Thus, the motivation here is to know whether I can use the (in)significantly faster method without sacrificing robustness. The real motivation is to know in what way this type of choice may come to bite me in the future.

  • note that the quark-delimited test is older than \detokenize; l3 is older than etex.... – David Carlisle Oct 23 '19 at 10:06
  • @DavidCarlisle Ah, I didn't consider the reason could be historical... – Phelype Oleinik Oct 23 '19 at 10:11
  • it's tex, almost every reason is historical:-) – David Carlisle Oct 23 '19 at 10:14
  • I tried benchmarking \if aa\fi versus \ifx aa\fi and the latter is slightly faster, but \expandafter\ifx aa\fi is noticeably slower. However, \if aa\fi is noticeably slower than \ifx\a\a\fi (with \def\a{a}) and \expandafter\ifx\aa\fi performs just slightly slower than \if aa\fi. – egreg Oct 23 '19 at 13:47
  • @egreg Interesting... It seems that \ifx and \if are more or less equally fast (at least for single tokens, which is the case in the question), and what slows the process down is the \expandafter... – Phelype Oleinik Oct 23 '19 at 14:24
  • Your question is more specific than the title suggests, but when talking about performance, I'm wondering how either of the \detokenize versions performs compared to other emptiness tests when it comes to the length of the argument. As far as I understand, \detokenize has to go through the whole list before the comparison even starts. So for very long lists it should be notably slower than e.g. a naive \ifx\relax#1\relax. – siracusa Oct 23 '19 at 14:31
  • @siracusa Yes, I've already considered that. You are right, for longer arguments the \detokenize slows down the operation by a considerable amount. However for what I'm trying to do I need the \detokenize approach because it has to cope with possibly unbalanced conditionals in the argument, in which case other approaches all fail. Thanks for pointing it out! – Phelype Oleinik Oct 23 '19 at 14:36
  • If you really really really care about performance, don't use \prg_new_conditional:Npnn, but instead code the test yourself, because the way \prg_new_conditional:Npnn sets up the branching is slow (in the produced code, not during the definition, hint: it uses \expandafter). Instead, if you want the last tiny bit of performance you should use \cs_new:Npn \__pho_fi_use_i:wnn \fi: \use_ii:nn #1 #2 { \fi: #1 } \cs_new:Npn \pho_tl_if_empty:nTF #1 { \if:w \scan_stop: \tl_to_str:n { #1 } \scan_stop: \__pho_fi_use_i:wnn \fi: \use_ii:nn } – Skillmon Nov 13 '19 at 18:42
  • @Skillmon Nice one! It gets me four more hundredths in that benchmark, but I don't have the T or F branches anymore (although if I needed I could define them manually). Depending when, it may be worth it. Although my actual use case (never trust OP :-) is \if_catcode:w \scan_stop: \tl_to_str:n \exp_after:wN { \use_none:n #1 } \scan_stop: ^ \fi: in a three-way sort-of conditional. Anyway, thanks for the suggestion! – Phelype Oleinik Nov 13 '19 at 19:49
  • @PhelypeOleinik defining the T and F version is easy, too, use \cs_new:Npn \__pho_fi_use_i:wn \fi: \use_none:n #1 { \fi: #1 } and \cs_new:Npn \__pho_fi_use_none:wn \fi: \use_i:n #1 { \fi: } but you have to put some duplicate code there (the actual \if... test). Still the general concept can be applied, if you know what the read tokens will be, it is faster to define the macro with a delimited argument instead of gobbling the token as an actual argument. And in general \expandafter is slow, so if you can get around it with the same number of expansions, do it. – Skillmon Nov 14 '19 at 01:28
  • @Skillmon I see you're doing your performance homework with that sorting code :-) (which is quite impressive, by the way). Yes, defining the variants is not the problem. My remark about defining the T and F variants was only the tradeoff between four hundredths of a second and the code duplication. You said "it is faster to define the macro with a delimited argument instead of gobbling the token as an actual argument": do you have any reference or that's just from overusing l3benchmark? – Phelype Oleinik Nov 14 '19 at 01:35
  • @PhelypeOleinik the latter, but also it seems logical if you think of how TeX (most likely) handles macro arguments (I don't know anything about the code though, so this is just by assumption). For TeX to handle arbitrary arguments it has to run a couple of tests internally (braced argument or single token?), needs to provide the read token(s) to the macro definition and only that definition says that the token isn't used at all. On the other hand, providing the tokens as an argument-delimiter just tells TeX what exactly to expect and doesn't need to add it to some list of arguments. – Skillmon Nov 14 '19 at 01:42
  • @PhelypeOleinik for a single token this rule of thumb holds true in all of my tests so far, I have no experience yet about how this would scale for many tokens (e.g., would \def\foo#1\qstop be faster than \def\foo\a\b\c\d\e\f\g\h\i\j\k\l\m\n\o\p\q\r\s\t\u\v\w\x\y\z\qstop?). Another thing that is generally faster is, don't read in an argument twice if you can read it once, better to add another token (e.g., see https://tex.stackexchange.com/a/515744/117050 the non-expandable version and the macro ...@false, but this way it gets vulnerable against input like 1pt, instead of 1pt). – Skillmon Nov 14 '19 at 01:53
  • @PhelypeOleinik never mind, just run l3benchmark at it, it scales pretty well, the latter being considerably faster than the former, taking only about 63% of the time. – Skillmon Nov 14 '19 at 02:03
  • @Skillmon Your reasoning makes total sense. In fact, it's correct! I was looking at tex.pdf and found the relevant bits in §291 (general description of how TeX stores token lists) and §397 (specific procedure to match a non-parameter parameter text). Using an argument is slow because it a) triggers a procedure to scan a parameter (delimited by whatever, which also needs checking and specific procedures), b) stores the scanned parameter in a pstack array, and c) retrieves each out_param in the replacement text from the pstack. Explicit delimiters are just matched and discarded. – Phelype Oleinik Nov 14 '19 at 03:17
  • @Skillmon What would slow your hypothetical \foo\a...\z (imagine ... being the missing letters :-) would be to define it with \def\foo#1\a...\z{} and then use with \foo\a...\y\a...\z so that the scanner in §397 is thrown off the track in the first try. But in that case the programmer would be asking for it ;-) – Phelype Oleinik Nov 14 '19 at 03:28
  • A fast \ifempty test that only fails if the argument contains \ifempty@A\ifempty@B directly after each other (taking about 70% the time a \if\relax\detokenize{#1}\relax takes): \long\def\ifempty@true\ifempty@A\ifempty@B\@secondoftwo#1#2{#1}\long\def\ifempty@#1\ifempty@A\ifempty@B{}\long\def\ifempty#1{\ifempty@\ifempty@A#1\ifempty@B\ifempty@true\ifempty@A\ifempty@B\@secondoftwo} – Skillmon Nov 19 '19 at 09:21
  • @Skillmon Well, that one is actually quite interesting! I don't think that that is a serious restriction: given the proper name space, those two tokens shouldn't even appear in the argument at all, so I'd say it's pretty safe... I think you should post an answer with your findings. It would be a shame if it got lost in the comments (though I'm inclined to drop the \if\relax\detokenize test in favour of yours now :-) – Phelype Oleinik Nov 19 '19 at 19:39
  • Instead of \if\relax\detokenize{<token-list>}\relax... I might probably do \ifcat$\detokenize{<token-list>}$... in order to not have to rely on \relax not being redefined in a way which fools the test. – Ulrich Diez Jul 04 '21 at 01:23

2 Answers


General

There are a few considerations when it comes to performance of TeX code:

  1. argument grabbing costs time, don't grab arguments unnecessarily
  2. \expandafter is slow, if you can work around it with the same amount of expansions it's faster, so instead of
    \if...
      \expandafter\@firstoftwo
    \else
      \expandafter\@secondoftwo
    \fi
    
    we'd use (this uses an aspect of the first point, too, namely if false only the contents of the true branch will be gobbled)
    \long\def\my@fi@firstoftwo\fi#1#2#3{\fi#2}
    \if...
      \my@fi@firstoftwo
    \fi
    \@secondoftwo
    
  3. gobbling tokens explicitly as delimiters for arguments is faster than gobbling them as an argument which is delimited, so the above example can further be optimized:
    \long\def\my@fi@firstoftwo\fi\@secondoftwo#1#2{\fi#1}
    \if...
      \my@fi@firstoftwo
    \fi
    \@secondoftwo
    
    But be aware that this way code becomes less readable, less reusable, and less maintainable, so the small performance gain comes at a cost.

\if... can represent any if test that results in a TeX-syntax if, such as \ifx AB, \iftrue, etc.
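As a usage sketch of the third variant, the helper can be wrapped in an actual test macro. \my@ifpositive is a hypothetical name chosen only for this illustration, and \ifnum stands in for the generic \if... above:

```latex
\makeatletter
% Helper from item 3: \fi and \@secondoftwo are matched as delimiters
% instead of being grabbed as arguments.
\long\def\my@fi@firstoftwo\fi\@secondoftwo#1#2{\fi#1}
% Hypothetical test macro: is the number #1 greater than zero?
\def\my@ifpositive#1{%
  \ifnum#1>\z@
    \my@fi@firstoftwo
  \fi
  \@secondoftwo}
\typeout{\my@ifpositive{3}{positive}{not positive}}  % positive
\typeout{\my@ifpositive{-1}{positive}{not positive}} % not positive
\makeatother
\stop
```

If the test is true, \my@fi@firstoftwo removes \fi\@secondoftwo, closes the conditional itself and leaves the first branch; if it is false, TeX skips up to \fi and \@secondoftwo picks the second branch. No \expandafter is involved in either case.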

Also, \if tests can be slow (depending on the test used) and so is \detokenize; if we can get around those, we should. Another thing to consider is that \if tests are not robust if their arguments contain other \if tests, \else or \fi. To overcome this, the standard test for an empty argument \detokenizes the argument first:

\long\def\ifemptyStandard#1%
  {%
    \if\relax\detokenize{#1}\relax
      \expandafter\@firstoftwo
    \else
      \expandafter\@secondoftwo
    \fi
  }

This yields an unbeatable robustness, as the only possible argument that might fail this test would be an unbalanced input (that arguably is not really an argument, see the comments of Phelype and Ulrich below this answer), which needs to be actively created, such as \expandafter\ifemptyStandard\expandafter{\iffalse{\fi}}{true}{false} (but who would do that anyway).

Of all the if tests built into TeX, \ifx is probably the fastest. So a naive test \ifx <some-token>#1<some-token> would be pretty fast, unfortunately this would not be robust. Cases for which it'd fail would be if \if..., \else, or \fi would be part of the argument or if #1 starts with <some-token> (though we can make <some-token> pretty unlikely).
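A small sketch of how such a naive test goes wrong, using \relax as <some-token>. The macro name \ifemptyNaive is made up for this illustration:

```latex
\makeatletter
% Naive test: NOT robust, shown only to illustrate the failure mode.
\long\def\ifemptyNaive#1{%
  \ifx\relax#1\relax
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi}
\makeatother
\typeout{[\ifemptyNaive{}{empty}{not empty}]}  % [empty]
\typeout{[\ifemptyNaive{X}{empty}{not empty}]} % [not empty]
% Argument starts with <some-token>: the test compares \relax with \relax,
% wrongly takes the "empty" branch and leaves the rest of the argument
% (plus the trailing \relax) behind as stray tokens.
\typeout{[\ifemptyNaive{\relax X}{empty}{not empty}]}
\stop
```

Arguments containing \else or \fi fail in less predictable ways, because they end or split the conditional in the middle of the test.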

Fast \ifempty

The following is a fast test, that considers some of the above mentioned aspects. We don't use any \if... test, but instead do the branching through TeX's argument grabbing logic:

\long\def\ifempty@true\ifempty@A\ifempty@B\@secondoftwo#1#2{#1}
\long\def\ifempty@#1\ifempty@A\ifempty@B{}
\long\def\ifempty#1%
  {%
    \ifempty@\ifempty@A#1\ifempty@B\ifempty@true
      \ifempty@A\ifempty@B\@secondoftwo
  }

So if #1 is empty \ifempty@ will gobble only the first \ifempty@A and \ifempty@B and \ifempty@true will be executed, gobbling the following \ifempty@A\ifempty@B\@secondoftwo and the false-branch. On the other hand, if #1 is not empty everything up to \@secondoftwo (non-inclusive) will be gobbled and \@secondoftwo will execute the false-branch.

This way we get a fast testing macro (taking about 70% of the time of the \if\relax\detokenize{#1}\relax test in my benchmarks) that's fairly robust (only input which contains \ifempty@A\ifempty@B will fail the test, and that should be rare).
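A quick way to see the branching in action is to feed a few arguments to the macro and print the result. This is only a usage sketch: the definitions are repeated from above and the test arguments are my own choice:

```latex
\makeatletter
\long\def\ifempty@true\ifempty@A\ifempty@B\@secondoftwo#1#2{#1}
\long\def\ifempty@#1\ifempty@A\ifempty@B{}
\long\def\ifempty#1%
  {%
    \ifempty@\ifempty@A#1\ifempty@B\ifempty@true
      \ifempty@A\ifempty@B\@secondoftwo
  }
\makeatother
\typeout{[\ifempty{}{empty}{not empty}]}                % [empty]
\typeout{[\ifempty{X}{empty}{not empty}]}               % [not empty]
\typeout{[\ifempty{ }{empty}{not empty}]}               % [not empty]
% Conditionals in the argument are never expanded, only gobbled:
\typeout{[\ifempty{\iftrue\else\fi}{empty}{not empty}]} % [not empty]
\stop
```

Note that \ifempty@A and \ifempty@B are never defined; they only serve as delimiters and are always removed before TeX would try to expand them (unless the argument contains the forbidden pair).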

And of course, we can use tokens which are even more unlikely than \ifempty@A and \ifempty@B, e.g., why not use <DEL> characters for both, but with different category codes (those should be very, very unlikely to ever be part of a valid argument):

\begingroup
\lccode`\&=127
\lccode`\$=127
\catcode`\&=12
\catcode`\$=11
\lowercase{\endgroup
\long\def\ifempty@true&$\@secondoftwo#1#2{#1}
\long\def\ifempty@#1&${}
\long\def\ifempty#1{\ifempty@&#1$\ifempty@true&$\@secondoftwo}
}

Fast \ifblank

As a small addition, we can also create a fast \ifblank test based on the aforementioned thoughts. The standard \ifblank looks something like the following:

\long\def\ifblankStandard#1%
  {%
    \if\relax\detokenize\expandafter{\@gobble #1.}\relax
      \expandafter\@firstoftwo
    \else
      \expandafter\@secondoftwo
    \fi
  }

So essentially the same as \ifemptyStandard but with an \expandafter and a \@gobble #1. added. But we could do the same as for our fast \ifempty test with just some small additions (I'll just add this to the slightly obfuscated variant using the <DEL> tokens). And we don't want to use some \expandafters (remember they are slow) so we use \ifblank@ to gobble one token and insert the necessary tests of \ifempty.

\begingroup
\lccode`\&=127
\lccode`\$=127
\catcode`\&=12
\catcode`\$=11
\lowercase{\endgroup
\long\def\ifempty@true&$\@secondoftwo#1#2{#1}
\long\def\ifempty@#1&${}
\long\def\ifempty#1{\ifempty@&#1$\ifempty@true&$\@secondoftwo}
\long\def\ifblank@#1{\ifempty@&}
\long\def\ifblank#1{\ifblank@#1.$\ifempty@true&$\@secondoftwo}
}
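To check the behaviour on blank versus non-blank input, the definitions can be loaded and exercised as follows. This is a usage sketch: the definitions are copied from the block above and the test arguments are my own choice:

```latex
\makeatletter
\begingroup
\lccode`\&=127
\lccode`\$=127
\catcode`\&=12
\catcode`\$=11
\lowercase{\endgroup
\long\def\ifempty@true&$\@secondoftwo#1#2{#1}
\long\def\ifempty@#1&${}
\long\def\ifblank@#1{\ifempty@&}
\long\def\ifblank#1{\ifblank@#1.$\ifempty@true&$\@secondoftwo}
}
\makeatother
\typeout{[\ifblank{}{blank}{not blank}]}   % [blank]
\typeout{[\ifblank{ }{blank}{not blank}]}  % [blank]
\typeout{[\ifblank{ X}{blank}{not blank}]} % [not blank]
% An empty brace group is not blank: \ifblank@ grabs {} and the dot stays.
\typeout{[\ifblank{{}}{blank}{not blank}]} % [not blank]
\stop
```

The leading space disappears because TeX skips spaces when it looks for the undelimited argument of \ifblank@, so for blank input the added dot is what gets gobbled.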

Faster \ifblank

It is indeed possible to create an even faster \ifblank test, which is a bit less robust. The previous \ifblank would fail for a combination of tokens which have to be directly adjacent. This test fails if the argument contains a single marker. The test again uses TeX's argument grabbing logic, maybe in an even more ingenious way.

The speed advantage stems from the fact that TeX has to reinsert the first marker token (\ifempty@A) in the fast implementation if the argument isn't empty/blank. This implementation never needs to reinsert a token, instead it gobbles the first marker if #1 is blank, because it uses two parameters, one normal one and one delimited one. Also, it needs one step of expansion less.

The result is ca. 20% faster.

\long\def\ifblank#1%
  {%
    \ifblank@#1\ifblank@mark\ifblank@false
      \ifblank@mark\@firstoftwo
  }
\long\def\ifblank@#1#2\ifblank@mark{}
\long\def\ifblank@false\ifblank@mark\@firstoftwo#1#2{#2}
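The same kind of check works for this variant. The sketch below repeats the three definitions and adds a few test calls of my own; the comment at the end describes the known limitation rather than running it, since that case raises an error:

```latex
\makeatletter
\long\def\ifblank#1%
  {%
    \ifblank@#1\ifblank@mark\ifblank@false
      \ifblank@mark\@firstoftwo
  }
\long\def\ifblank@#1#2\ifblank@mark{}
\long\def\ifblank@false\ifblank@mark\@firstoftwo#1#2{#2}
\makeatother
\typeout{[\ifblank{}{blank}{not blank}]}   % [blank]
\typeout{[\ifblank{ }{blank}{not blank}]}  % [blank]
\typeout{[\ifblank{XY}{blank}{not blank}]} % [not blank]
\typeout{[\ifblank{{}}{blank}{not blank}]} % [not blank]
% Limitation: an argument containing a single \ifblank@mark, e.g.
% \ifblank{X\ifblank@mark}{...}{...}, ends the delimited argument too
% early and leaves an undefined \ifblank@mark in the input stream.
\stop
```

For blank input the undelimited #1 swallows the first \ifblank@mark and #2 swallows \ifblank@false, so \@firstoftwo is all that remains; for non-blank input #1 takes the first token of the argument and \ifblank@false survives to select the second branch.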
Skillmon
  • I'd argue about the unbalanced input, \expandafter\ifempty\expandafter{\iffalse{\fi}}. This is, for all effects, the same as \ifempty{}}, which doesn't make any sense. Other than that, excellent answer! – Phelype Oleinik Nov 20 '19 at 17:33
  • @PhelypeOleinik as I said, it has to be created malevolently. – Skillmon Nov 20 '19 at 18:27
  • If unbalanced braces, i.e., s.th. that generally cannot be handled by macro-arguments, are created (e.g., via "\expandafter...\iffalse{\fi"-trickery) before carrying out the macro in question (\ifemptyStandard in this case), then failure is not due to the macro but due to the process which delivers the arguments of the macro. Ad "the only possible argument that might fail this test would be an unbalanced input": Unbalanced input cannot be a macro argument in TeX. ;-) The test doesn't fail. The concept of "unbalanced macro argument" is faulty in TeX. ;-) – Ulrich Diez Dec 05 '22 at 21:14
  • @UlrichDiez yes, you're right, looking back now the formulation is poor (which I thought every time I looked at this answer again), but I don't think it's poor enough to be removed from the answer... Maybe I should add a note :) – Skillmon Dec 06 '22 at 08:32
  • @Skillmon The wording somehow implies that the test has a particular weakness. But I don't see it that way. The test and the answer are good. All macros have the "weakness" that only brace-balanced token sequences are possible as arguments. – Ulrich Diez Dec 06 '22 at 11:13
  • @UlrichDiez most, not all. See the gtl package. Also with packages like pgfparser you can build stuff that acts on potentially unbalanced lists. But those are admittedly exceptions that require much code to make stuff work :) – Skillmon Dec 06 '22 at 11:36
  • @Skillmon Of course you can define macros which a) trigger expansion-cascades which in turn at some stage deliver primitives for handling "unbalancedness", e.g., via \let/\futurelet/\string/\hbox/whatsoever. b) like gtl handle "unbalancedness" by transforming balanced token-lists into s.th. where braces are represented by s.th. which itself in any case is balanced. With such "mechanisms" (functions in expl3) "unbalancedness" is not handled by macros' argument-grabbing but by other aspects of the "mechanism". You never have TeX grab an unbalanced list as an argument for a macro. – Ulrich Diez Dec 06 '22 at 12:48
  • @UlrichDiez Yes, that's correct, directly grabbing unbalanced tokens as a parameter is never possible. – Skillmon Dec 06 '22 at 19:10

In case you need an expandable empty-test which does without e-TeX-extensions and without forbidden tokens, I can offer this one:

%%-----------------------------------------------------------------------------
%% Check whether argument is empty:
%%.............................................................................
%% \CheckWhetherEmpty{<Argument which is to be checked>}%
%%                   {<Tokens to be delivered in case that argument
%%                     which is to be checked is empty>}%
%%                   {<Tokens to be delivered in case that argument
%%                     which is to be checked is not empty>}%
%%
%% The gist of this macro comes from Robert R. Schneck's \ifempty-macro:
%% <https://groups.google.com/forum/#!original/comp.text.tex/kuOEIQIrElc/lUg37FmhA74J>
%%
%% Due to \romannumeral-expansion the result is delivered after two
%% expansion-steps/after two "hits" by \expandafter.
\chardef\stopromannumeral=`\^^00
\long\def\firstoftwo#1#2{#1}%
\long\def\secondoftwo#1#2{#2}%
\long\def\CheckWhetherEmpty#1{%
  \romannumeral\expandafter\secondoftwo\string{\expandafter
  \secondoftwo\expandafter{\expandafter{\string#1}\expandafter
  \secondoftwo\string}\expandafter\firstoftwo\expandafter{\expandafter
  \secondoftwo\string}\expandafter\stopromannumeral\secondoftwo}%
  {\expandafter\stopromannumeral\firstoftwo}%
}%

Like anything else that works in terms of macros, this does not work with arguments that contain \outer-tokens.
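The remark about "two hits by \expandafter" can be tried out with a small sketch like the following. The macro name \result and the use of \typeout/\stop (i.e., a LaTeX format) are assumptions made for this illustration; the definitions are copied from above:

```latex
\chardef\stopromannumeral=`\^^00
\long\def\firstoftwo#1#2{#1}%
\long\def\secondoftwo#1#2{#2}%
\long\def\CheckWhetherEmpty#1{%
  \romannumeral\expandafter\secondoftwo\string{\expandafter
  \secondoftwo\expandafter{\expandafter{\string#1}\expandafter
  \secondoftwo\string}\expandafter\firstoftwo\expandafter{\expandafter
  \secondoftwo\string}\expandafter\stopromannumeral\secondoftwo}%
  {\expandafter\stopromannumeral\firstoftwo}%
}%
% The \expandafter chain "hits" \CheckWhetherEmpty exactly twice before
% \def is carried out: the first hit yields \romannumeral..., the second
% hit drives the \romannumeral-expansion to its end.
\expandafter\expandafter\expandafter\def
\expandafter\expandafter\expandafter\result
\expandafter\expandafter\expandafter{%
  \CheckWhetherEmpty{}{empty}{not empty}}
\typeout{\meaning\result} % macro:->empty
\stop
```

Since the macro is fully expandable, \edef would deliver the same result here; the \expandafter chain only shows that two expansion steps are already enough.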

Deviating from the requirements formulated in the question, \CheckWhetherEmpty is rather slow.

I take \CheckWhetherEmpty for a moot thing/for a slow workaround in situations where one can't take for granted that e-TeX's \detokenize is available/is allowed by the terms of the macro-writing-challenge.

I emphasize that the gist/the basic idea of "hitting" either the first token of the non-empty argument or the closing brace behind the empty argument with \string for the sake of probably "neutralizing" some braces before it comes to brace-matching in the course of gathering and removing a (to-be brace-balanced) macro-argument, and this way cranking out the brace-matching-cases, does not come from me but does come from Robert R. Schneck's \ifempty-macro.

I just added \romannumeral-expansion and stringification and removal of superfluous curly braces via \expandafter\secondoftwo\string in favor of removing superfluous curly braces via \iffalse..\fi.
I did so for ensuring that things won't break half-way through the expansion-chain due to unbalanced \if..\else..\fi at some stage popping up that might be contained in the argument or might come into being due to "hitting" the first token of the argument with \string...

Besides this the user-provided macro-argument in any stage of the expansion-cascade either is wrapped in a pair of matching curly braces or is already removed. This way the user-provided argument containing & or the like won't disturb the expansion-cascade in case the test is executed inside an alignment or tabular-environment or the like.

In order to explain how the test works, let's rewrite this with different line-breaking:

\long\def\CheckWhetherEmpty#1{%
  \romannumeral
  \expandafter\secondoftwo\string{%
  \expandafter\secondoftwo % <- The interesting \secondoftwo
  \expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
  \expandafter{%
  \string#1}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
  \expandafter
  \secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
  \expandafter\firstoftwo\expandafter{\expandafter
  \secondoftwo\string}%
  \expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
  {\expandafter\stopromannumeral\firstoftwo}%
}%

The comments about closing braces of "the interesting \secondoftwo" indicate that there are three interesting scenarios.

Let's look at these three scenarios:


Scenario 1: #1 is not empty and #1's first token is an opening brace—e.g., #1={foo}bar:

\CheckWhetherEmpty{{foo}bar}{empty}{not empty}%

Step 1: Toplevel-expansion of \CheckWhetherEmpty delivers the following tokens to TeX's gullet:

\romannumeral
\expandafter\secondoftwo\string{%
\expandafter\secondoftwo % <- The interesting \secondoftwo
\expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
\expandafter{%
\string{foo}bar}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 2: \romannumeral-expansion initiated:

%\romannumeral-expansion in progress:
\expandafter\secondoftwo\string{%
\expandafter\secondoftwo % <- The interesting \secondoftwo
\expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
\expandafter{%
\string{foo}bar}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 3: \expandafter "hits" \string and { gets stringified:

%\romannumeral-expansion in progress:
\secondoftwo{₁₂%
\expandafter\secondoftwo % <- The interesting \secondoftwo
\expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
\expandafter{%
\string{foo}bar}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 4: \secondoftwo removes {₁₂:

%\romannumeral-expansion in progress:
\expandafter\secondoftwo % <- The interesting \secondoftwo
\expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
\expandafter{%
\string{foo}bar}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 5: \expandafter-chain "hits" \string which in case of the argument not being empty stringifies the argument's first token and in case of the argument being empty stringifies the closing brace:

%\romannumeral-expansion in progress:
\secondoftwo % <- The interesting \secondoftwo
{% <- Opening brace of interesting \secondoftwo's first argument.
{%
{₁₂foo}bar}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 6: The interesting \secondoftwo acts:

%\romannumeral-expansion in progress:
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 7: \expandafter "hits" \string and } gets stringified:

%\romannumeral-expansion in progress:
\secondoftwo}₁₂% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 8: \secondoftwo removes }₁₂:

%\romannumeral-expansion in progress:
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 9: \expandafter-chain "hits" \string and } gets stringified:

%\romannumeral-expansion in progress:
\firstoftwo{\secondoftwo}₁₂%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 10: \firstoftwo acts:

%\romannumeral-expansion in progress:
\secondoftwo}₁₂%
\expandafter\stopromannumeral\secondoftwo
{empty}{not empty}%

Step 11: \secondoftwo removes }₁₂:

%\romannumeral-expansion in progress:
\expandafter\stopromannumeral\secondoftwo
{empty}{not empty}%

Step 12: \expandafter "hits" \secondoftwo:

%\romannumeral-expansion in progress:
\stopromannumeral not empty%

Step 13: While still in the stage of expanding things in the course of gathering tokens that make up \romannumeral's ⟨number⟩-quantity TeX now encounters the token \stopromannumeral which denotes the non-positive number 0 in a way which stops TeX's gathering of tokens belonging to a ⟨number⟩-quantity. TeX removes the token forming the ⟨number⟩-quantity and - as that quantity's value is not positive - silently terminates the \romannumeral-process without delivering any token in return:

%\romannumeral-expansion terminated:
not empty%
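The termination trick of Step 13 can be observed in isolation with a minimal sketch. The macro names \three and \result are made up for this illustration, and \typeout/\stop assume a LaTeX format:

```latex
\chardef\stopromannumeral=`\^^00
% A positive number is converted to lowercase roman numerals ...
\edef\three{\romannumeral 3}
\typeout{\meaning\three}  % macro:->iii
% ... while the \chardef token denotes 0, ends the number immediately,
% and makes \romannumeral vanish without leaving anything behind.
\edef\result{[\romannumeral\stopromannumeral abc]}
\typeout{\meaning\result} % macro:->[abc]
\stop
```

Because a \chardef token is a complete number by itself, TeX does not look at (or expand) any further tokens after it, so whatever follows \stopromannumeral stays in place untouched.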

Scenario 2: #1 is not empty and #1's first token is not an opening brace—e.g., #1=foobar:

\CheckWhetherEmpty{foobar}{empty}{not empty}%

Step 1: Toplevel-expansion of \CheckWhetherEmpty delivers the following tokens to TeX's gullet:

\romannumeral
\expandafter\secondoftwo\string{%
\expandafter\secondoftwo % <- The interesting \secondoftwo
\expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
\expandafter{%
\string foobar}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 2: \romannumeral-expansion initiated:

%\romannumeral-expansion in progress:
\expandafter\secondoftwo\string{%
\expandafter\secondoftwo % <- The interesting \secondoftwo
\expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
\expandafter{%
\string foobar}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 3: \expandafter "hits" \string and { gets stringified:

%\romannumeral-expansion in progress:
\secondoftwo{₁₂%
\expandafter\secondoftwo % <- The interesting \secondoftwo
\expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
\expandafter{%
\string foobar}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 4: \secondoftwo removes {₁₂:

%\romannumeral-expansion in progress:
\expandafter\secondoftwo % <- The interesting \secondoftwo
\expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
\expandafter{%
\string foobar}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 5: \expandafter-chain "hits" \string, which stringifies the argument's first token if the argument is not empty and stringifies the closing brace if the argument is empty:

%\romannumeral-expansion in progress:
\secondoftwo % <- The interesting \secondoftwo
{% <- Opening brace of interesting \secondoftwo's first argument.
{%
f₁₂oobar}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 6: The interesting \secondoftwo acts:

%\romannumeral-expansion in progress:
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 7: \expandafter-chain "hits" \string and } gets stringified:

%\romannumeral-expansion in progress:
\firstoftwo{\secondoftwo}₁₂%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 8: \firstoftwo acts:

%\romannumeral-expansion in progress:
\secondoftwo}₁₂%
\expandafter\stopromannumeral\secondoftwo
{empty}{not empty}%

Step 9: \secondoftwo removes }₁₂:

%\romannumeral-expansion in progress:
\expandafter\stopromannumeral\secondoftwo
{empty}{not empty}%

Step 10: \expandafter "hits" \secondoftwo:

%\romannumeral-expansion in progress:
\stopromannumeral not empty%

Step 11: While still in the stage of expanding things in the course of gathering tokens that make up \romannumeral's ⟨number⟩-quantity TeX now encounters the token \stopromannumeral which denotes the non-positive number 0 in a way which stops TeX's gathering of tokens belonging to a ⟨number⟩-quantity. TeX removes the token forming the ⟨number⟩-quantity and - as that quantity's value is not positive - silently terminates the \romannumeral-process without delivering any token in return:

%\romannumeral-expansion terminated:
not empty%

Scenario 3: #1 is empty:

\CheckWhetherEmpty{}{empty}{not empty}%

Step 1: Toplevel-expansion of \CheckWhetherEmpty delivers the following tokens to TeX's gullet:

\romannumeral
\expandafter\secondoftwo\string{%
\expandafter\secondoftwo % <- The interesting \secondoftwo
\expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
\expandafter{%
\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 2: \romannumeral-expansion initiated:

%\romannumeral-expansion in progress:
\expandafter\secondoftwo\string{%
\expandafter\secondoftwo % <- The interesting \secondoftwo
\expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
\expandafter{%
\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 3: \expandafter "hits" \string and { gets stringified:

%\romannumeral-expansion in progress:
\secondoftwo{₁₂%
\expandafter\secondoftwo % <- The interesting \secondoftwo
\expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
\expandafter{%
\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 4: \secondoftwo removes {₁₂:

%\romannumeral-expansion in progress:
\expandafter\secondoftwo % <- The interesting \secondoftwo
\expandafter{% <- Opening brace of interesting \secondoftwo's first argument.
\expandafter{%
\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1).
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 5: \expandafter-chain "hits" \string, which stringifies the argument's first token if the argument is not empty and stringifies the closing brace if the argument is empty:

%\romannumeral-expansion in progress:
\secondoftwo % <- The interesting \secondoftwo
{% <- Opening brace of interesting \secondoftwo's first argument.
{%
}₁₂% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is an opening brace (Scenario 1) got stringified.
\expandafter
\secondoftwo\string}% <- Closing brace of interesting \secondoftwo's first argument in case #1's first token is not an opening brace (Scenario 2).
\expandafter\firstoftwo\expandafter{\expandafter
\secondoftwo\string}%
\expandafter\stopromannumeral\secondoftwo}% <- Closing brace of interesting \secondoftwo's first argument in case #1 is empty (Scenario 3).
{\expandafter\stopromannumeral\firstoftwo}%
{empty}{not empty}%

Step 6: The interesting \secondoftwo acts:

%\romannumeral-expansion in progress:
\expandafter\stopromannumeral\firstoftwo
{empty}{not empty}%

Step 7: \expandafter "hits" \firstoftwo:

%\romannumeral-expansion in progress:
\stopromannumeral empty%

Step 8: While still in the stage of expanding things in the course of gathering tokens that make up \romannumeral's ⟨number⟩-quantity TeX now encounters the token \stopromannumeral which denotes the non-positive number 0 in a way which stops TeX's gathering of tokens belonging to a ⟨number⟩-quantity. TeX removes the token forming the ⟨number⟩-quantity and - as that quantity's value is not positive - silently terminates the \romannumeral-process without delivering any token in return:

%\romannumeral-expansion terminated:
empty%
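
The three scenarios can be reproduced with a small plain TeX test file. This is only a sketch: the definition of \CheckWhetherEmpty is transcribed from the token sequence shown in Step 1, \stopromannumeral is assumed to be a \chardef token with the value 0, and the argument {foo}bar is a made-up input for Scenario 1. \message is used merely to print each result to the terminal and the log.

```latex
% Plain TeX sketch; compile with e.g. `pdftex`.
\long\def\firstoftwo#1#2{#1}%
\long\def\secondoftwo#1#2{#2}%
% Assumption: \stopromannumeral is a \chardef token denoting the number 0.
\chardef\stopromannumeral=0 %
% Transcribed from the token sequence shown in Step 1 of the walkthrough.
\long\def\CheckWhetherEmpty#1{%
  \romannumeral\expandafter\secondoftwo\string{\expandafter
  \secondoftwo\expandafter{\expandafter{\string#1}\expandafter
  \secondoftwo\string}\expandafter\firstoftwo\expandafter{\expandafter
  \secondoftwo\string}\expandafter\stopromannumeral\secondoftwo}%
  {\expandafter\stopromannumeral\firstoftwo}%
}%
% \message fully expands its argument, so each test runs by expansion only.
\message{[\CheckWhetherEmpty{{foo}bar}{empty}{not empty}]}% Scenario 1 -> [not empty]
\message{[\CheckWhetherEmpty{foobar}{empty}{not empty}]}%   Scenario 2 -> [not empty]
\message{[\CheckWhetherEmpty{}{empty}{not empty}]}%         Scenario 3 -> [empty]
\bye
```

Because the test works inside \message, it also shows that the check is usable in expansion-only contexts such as \edef or \write.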

Based on that you can implement an \ifblank-test as follows:

%%-----------------------------------------------------------------------------
%% Check whether argument is blank (empty or only spaces):
%%-----------------------------------------------------------------------------
%% -- Take advantage of the fact that TeX discards space tokens when
%%    "fetching" _un_delimited arguments: --
%% \CheckWhetherBlank{<Argument which is to be checked>}%
%%                   {<Tokens to be delivered in case that
%%                     argument which is to be checked is blank>}%
%%                   {<Tokens to be delivered in case that argument
%%                     which is to be checked is not blank>}%
\long\def\CheckWhetherBlank#1{%
  \romannumeral\expandafter\expandafter\expandafter\secondoftwo
  \expandafter\CheckWhetherEmpty\expandafter{\firstoftwo#1{}.}%
}%

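The dot will be the second argument of \firstoftwo only if #1 is blank: TeX discards the space tokens while looking for the first undelimited argument, so \firstoftwo grabs {} as its first argument and the dot as its second.
Thus the dot is removed only if #1 is blank, and only then is the argument of \CheckWhetherEmpty empty.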
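
A few test inputs may make the role of the dot easier to follow. The file below is a sketch under the same assumptions as the rest of this answer (\stopromannumeral is taken to be a \chardef token with the value 0, and \CheckWhetherEmpty is transcribed from the walkthrough above); the test inputs are made up for illustration.

```latex
% Plain TeX sketch; compile with e.g. `pdftex`.
\long\def\firstoftwo#1#2{#1}%
\long\def\secondoftwo#1#2{#2}%
% Assumption: \stopromannumeral is a \chardef token denoting the number 0.
\chardef\stopromannumeral=0 %
\long\def\CheckWhetherEmpty#1{%
  \romannumeral\expandafter\secondoftwo\string{\expandafter
  \secondoftwo\expandafter{\expandafter{\string#1}\expandafter
  \secondoftwo\string}\expandafter\firstoftwo\expandafter{\expandafter
  \secondoftwo\string}\expandafter\stopromannumeral\secondoftwo}%
  {\expandafter\stopromannumeral\firstoftwo}%
}%
\long\def\CheckWhetherBlank#1{%
  \romannumeral\expandafter\expandafter\expandafter\secondoftwo
  \expandafter\CheckWhetherEmpty\expandafter{\firstoftwo#1{}.}%
}%
% What \firstoftwo#1{}. leaves as the argument of \CheckWhetherEmpty:
%   #1 = <nothing>: \firstoftwo{}.    -> <nothing>  (blank)
%   #1 = <space>:   \firstoftwo {}.   -> <nothing>  (blank)
%   #1 = a:         \firstoftwo a{}.  -> a.         (not blank)
%   #1 = {}:        \firstoftwo{}{}.  -> .          (not blank)
\message{[\CheckWhetherBlank{}{blank}{not blank}]}%   -> [blank]
\message{[\CheckWhetherBlank{ }{blank}{not blank}]}%  -> [blank]
\message{[\CheckWhetherBlank{a}{blank}{not blank}]}%  -> [not blank]
\message{[\CheckWhetherBlank{{}}{blank}{not blank}]}% -> [not blank]
\bye
```

Note that an argument consisting of an empty brace group counts as not blank, because the braces themselves are tokens of the argument.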


Based on the gist of the implementation of \CheckWhetherEmpty you can implement checking whether a non-delimited argument's first token is an explicit character token of category code 1 (begin group): Just ensure by appending a dot that the \string which gets carried out right before executing the "interesting \secondoftwo" never "hits" a closing brace (which implies elimination of scenario 3) and implement forking between scenario 1 and scenario 2:

%%-----------------------------------------------------------------------------
%% Check whether argument's first token is a catcode-1-character
%%-----------------------------------------------------------------------------
%% \CheckWhetherBrace{<Argument which is to be checked>}%
%%                   {<Tokens to be delivered in case that argument
%%                     which is to be checked has leading
%%                     catcode-1-token>}%
%%                   {<Tokens to be delivered in case that argument
%%                      which is to be checked has no leading
%%                      catcode-1-token>}%
%%
%% Due to \romannumeral-expansion the result is delivered after two
%% expansion-steps/after two "hits" by \expandafter.
%%
\long\def\CheckWhetherBrace#1{%
  \romannumeral\expandafter\secondoftwo\expandafter{\expandafter{%
  \string#1.}\expandafter\firstoftwo\expandafter{\expandafter
  \secondoftwo\string}\expandafter\stopromannumeral\firstoftwo}%
  {\expandafter\stopromannumeral\secondoftwo}%
}%
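
A short usage sketch for \CheckWhetherBrace, again assuming that \stopromannumeral is a \chardef token with the value 0; the test inputs are made up for illustration:

```latex
% Plain TeX sketch; compile with e.g. `pdftex`.
\long\def\firstoftwo#1#2{#1}%
\long\def\secondoftwo#1#2{#2}%
% Assumption: \stopromannumeral is a \chardef token denoting the number 0.
\chardef\stopromannumeral=0 %
\long\def\CheckWhetherBrace#1{%
  \romannumeral\expandafter\secondoftwo\expandafter{\expandafter{%
  \string#1.}\expandafter\firstoftwo\expandafter{\expandafter
  \secondoftwo\string}\expandafter\stopromannumeral\firstoftwo}%
  {\expandafter\stopromannumeral\secondoftwo}%
}%
\message{[\CheckWhetherBrace{{a}b}{brace}{no brace}]}% -> [brace]
\message{[\CheckWhetherBrace{ab}{brace}{no brace}]}%   -> [no brace]
% Empty argument: \string hits the appended dot, not a closing brace.
\message{[\CheckWhetherBrace{}{brace}{no brace}]}%     -> [no brace]
% Leading space: the first token is a space token, not a catcode-1 character.
\message{[\CheckWhetherBrace{ {a}}{brace}{no brace}]}% -> [no brace]
\bye
```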
Ulrich Diez
  • Thanks a lot for the thorough explanation! I remember dissecting your \CheckWhetherNull (that was the name at the time) macro a few months ago when adapting some code for this package, to find out precisely what it does. It's indeed a masterful expansion management (and quite hard to come up with). Though its increased robustness comes at the price of it taking twice as long to run (in my test above) compared to the \detokenize approach, so I won't use it this time. But thanks again for the thorough explanation! – Phelype Oleinik Jan 01 '20 at 23:53
  • @UlrichDiez please note that using \romannumeral with a 0 isn't as robust as using \romannumeral`\^^@ (which also is faster). – Skillmon Oct 28 '20 at 14:19
  • @Skillmon Yes, \romannumeral`\^^@ is faster. But why is \romannumeral0 not as robust as \romannumeral`\^^@? Do you have things like \upper-/\lowercase in mind in situations where 0 has some lccode/uccode assigned which may turn it into some non-zero? – Ulrich Diez Oct 28 '20 at 18:00
  • @UlrichDiez when another number follows the 0 it is considered part of the \romannumeral, so \romannumeral0\empty3 will result in iii, whereas \romannumeral`\^^@\empty3 will result in 3. In (almost?) all situations in which \romannumeral is used to trigger expansion, the second result is the one you want to get. – Skillmon Oct 28 '20 at 19:31
  • @Skillmon When TeX is gathering the <number>-quantity for \romannumeral and has found the 0, TeX looks, hereby expanding expandable tokens, for the presence of subsequent <digit>s or of an <explicit space token> (where only the catcode does matter, not the character code) which in any case terminates the number. Therefore I always (unless when producing a bug in my code) ensure that the expansion-cascade after \romannumeral0 yields a token-sequence whose very first token is that <explicit space token>. That gets discarded and terminates the search for more digits. – Ulrich Diez Oct 28 '20 at 20:03
  • @UlrichDiez in that case, consider omitting the 0 after \romannumeral and using \z@ instead of the space to end the expansion. Would be even faster and as robust as your way (but the \z@ is easier to spot than a space while coding). I'd only use \romannumeral`\^^@ to expand unknown/user input. Hence the robustness concern. – Skillmon Oct 28 '20 at 20:08
  • @Skillmon In short: I ensure that in the end you get something like this: \romannumeral0<explicit space token>. Thus with a trailing 3 you would get something like: \romannumeral0<explicit space token>3. The <explicit space token> terminates the number, therefore the 3 does not belong to what is processed by \romannumeral. \romannumeral0<explicit space token>3 just yields 3. The <explicit space token> is discarded as terminating the number. The number 0 is not positive, thus \romannumeral will silently not deliver any token at all. – Ulrich Diez Oct 28 '20 at 20:09
  • @Skillmon I tend to avoid expanding tokens coming from user-input whenever possible. ;-) I suppose you do the same. ;-) You are right: \^^@/\z@ might be easier to spot, but when it comes to branching, terminating each branch's expansion-chain by delivering a leading space/typing a single space is probably less typing work than terminating each branch's expansion-chain by delivering a leading \^^@. I am lazy. ;-) – Ulrich Diez Oct 28 '20 at 20:15
  • @UlrichDiez in that case I'd deliver a leading \z@ (a leading `\^^@ wouldn't end the expansion and further expand until a leading space or unexpandable token is found). This was quite elongated for just my simple statement. I should've made clear what exactly I meant (and in which context) from the beginning. Sorry :) – Skillmon Oct 28 '20 at 20:21
  • @Skillmon By the way: With `\^^@ a trailing optional space is searched whereby undesired expansion can take place: \def\PleaseDontExpandThis{You shouldn't!!!!}\def\empty{}\expandafter\def\expandafter\test\expandafter{\romannumeral`\^^@\empty\PleaseDontExpandThis}\show\test\bye (With LaTeX's \z@ this does not happen because \z@ is a \dimendef-token and TeX does not search for a trailing optional space when the <number>-quantity consists of a \dimendef-token.) – Ulrich Diez Oct 28 '20 at 20:30
  • @Skillmon Still the only difference concerning "robustness" I see is with \uppercase/\lowercase: This might affect 0, turning it into some non-zero-digit/into something that is not discarded. \z@ would not be affected. ;-) – Ulrich Diez Oct 28 '20 at 20:35
  • @UlrichDiez That's just what I wrote, `\^^@ is faster than 0 and doesn't read a digit as part of the number but ends expansion when it finds it, and that's the robustness difference. Else it is acting like 0. \z@ on the other hand ends the expansion at that spot. So to expand own code where you have complete control, I'd always use \z@ at the end (for performance reasons), whereas when expanding user-provided code, I'd always use `\^^@. I'd never use \romannumeral0 though. – Skillmon Oct 29 '20 at 07:34
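
The differences discussed in the comments above can be checked with a few lines of plain TeX. This is only a sketch: it relies on plain TeX's own \empty and \z@, and \PleaseDontExpandThis, \testA and \testB are illustrative names (the first one borrowed from the comment thread).

```latex
% Plain TeX sketch; \empty and \z@ are already defined by plain TeX.
\catcode`\@=11 % make @ a letter so that \z@ can be written
\message{[\romannumeral0\empty3]}%     -> [iii] (0 and 3 form one number, 03)
\message{[\romannumeral`\^^@\empty3]}% -> [3]   (the 3 ends the number, value 0)
\message{[\romannumeral\z@\empty3]}%   -> [3]   (\z@ alone is the number)
% Illustrative macro taken from the comment thread:
\def\PleaseDontExpandThis{You shouldn't!}%
\expandafter\def\expandafter\testA\expandafter
  {\romannumeral`\^^@\empty\PleaseDontExpandThis}%
\expandafter\def\expandafter\testB\expandafter
  {\romannumeral\z@\empty\PleaseDontExpandThis}%
% While searching for an optional space after `\^^@, TeX expands \empty and
% \PleaseDontExpandThis, so \testA holds the expanded text.
\message{\meaning\testA}%
% After \z@ no such search takes place, so \testB still holds the two
% unexpanded macro tokens.
\message{\meaning\testB}%
\bye
```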