Detect catcode of next character?

Question

I use only lualatex, but I suspect this is a general question, having something to do with core TeX.

Is there such a thing as \ifnextcatcode ? Pseudo-code:

\def\wanted{\ifnextcatcode{12}{\dothis}{\dothat}}

The idea is for the \wanted command to detect whether the following character has catcode 12, or not, then branch. Might need to gobble space (or not).

This would be text only, not math, not tables, not lists, not diagrams.

Although \@ifnextchar exists, there are too many possible characters to detect each one. But detecting the catcode would group them nicely.

EDIT: Why I asked. I looked at the luaquotes package, but there were issues. From my small understanding of its code, the issues would be hard to resolve, even by someone who codes better than I do.

My own experience is that the stray " gets into my text, out of force of habit. It will be typeset as a right curly quote, which is often wrong. This is a well-known problem. The error is hard to see.

I do not need " for math or for such things as calling a Unicode value, or for babel shorthands. So I thought I might disable tlig, make " an active character, then decide what to do based on whether the following character is a letter (or a small number of other things, such as backtick or left single quote). The curly left or right double quote would then be typeset accordingly.

Preliminary experiments suggest that this might work, for English. Currently, I make " active, then step a counter each time it is used. The log file tells me how many times I used it (ideally, none) in the document body. Then I can edit the plain text source file, and manually substitute the quotes.

the number of characters doesn't matter you only need to test if the next token has catcode 12 so a single \ifcat — David Carlisle, Sep 29 '23 at 19:42
@Qrrbrbirlbel related, yes. And as DC noted, \ifcat might be helpful. The answer by cfr directly addresses my question. — rallg, Sep 29 '23 at 19:47
@DavidCarlisle I edited the question, to show why I asked it, and what I hope to do with the result. I think I can take it from here. — rallg, Sep 29 '23 at 19:57
the expl3 version cfr posted is ifcat plus a few extra precautions in case the next token isn't a character, and stepping out of the if, but basically same thing. — David Carlisle, Sep 29 '23 at 19:59

David Carlisle · Answer 1 · 2023-09-29T20:10:11.307


As you already have an expl3 answer, but tagged this tex-core, I'll show this using TeX primitives and a plain TeX test file:


\def\wanted{\futurelet\tmp\xwanted}
\def\xwanted{\ifcat.\tmp[is12]\else[isnot12]\fi}
aaa \wanted a  or \wanted (x).
\bye

Beware: if the next token is an expandable macro whose expansion starts with a catcode 12 character, the above goes wrong (the other branch is safe):

\def\oops{(abc)}
aaa \wanted a  or \wanted (x)  \wanted \oops.

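gives the wrong result for the last case: \ifcat expands \tmp (which is \let to \oops) to (abc), compares . with (, and so typesets abc)[is12] before \oops itself produces (abc).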

So a safer version, which expands \tmp only once and keeps just the first token of the result, is


\def\wanted{\futurelet\tmp\xwanted}
\def\xwanted{\ifcat.\expandafter\xwantedstop\tmp x\xwantedstop[is12]\else[isnot12]\fi}
\def\xwantedstop#1#2\xwantedstop{#1}
\def\oops{(abc)}
aaa \wanted a  or \wanted (x).
aaa \wanted a  or \wanted (x)  \wanted \oops.
\bye
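The question mentions that the test might need to gobble spaces. A minimal sketch, assuming plain TeX and borrowing the space-skipping trick LaTeX uses inside \@ifnextchar, puts a space check in front of the safer \xwanted above. The names \spacetoken, \checkspace and \skipspace are hypothetical, and \: is overwritten as a scratch control symbol (it must be a control symbol, because spaces after a control word are skipped while reading the file).

```latex
% plain TeX: like \wanted above, but skip explicit space tokens first
\def\:{\let\spacetoken= } \:  % \spacetoken is now an implicit space token
\def\wanted{\futurelet\tmp\checkspace}
\def\checkspace{%
  \ifx\tmp\spacetoken
    \expandafter\skipspace % remove one space, then peek again
  \else
    \expandafter\xwanted   % the safer catcode-12 test from above
  \fi}
% \skipspace is delimited by a space token and restarts \wanted
\def\:{\skipspace}\expandafter\def\: {\wanted}
\def\xwanted{\ifcat.\expandafter\xwantedstop\tmp x\xwantedstop[is12]\else[isnot12]\fi}
\def\xwantedstop#1#2\xwantedstop{#1}
aaa \wanted a or \expandafter\wanted\space (x).
\bye
```

Typesetting the test line should give aaa [isnot12]a or [is12](x). Here \expandafter\wanted\space is used only to put a real space token after \wanted for testing.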

edited Sep 29 '23 at 20:10

answered Sep 29 '23 at 19:48

David Carlisle

I will also look at this. explN syntax is frightening (any N). – rallg Sep 29 '23 at 19:59

@rallg see update, expl3 takes care of such issues for you:-) – David Carlisle Sep 29 '23 at 20:10
This implies that . has already catcode 12 (or we have to make sure . has catcode 12 when we define \xwanted)? I guess that's exactly how and why \c_catcode_other_token is defined. – Qrrbrbirlbel Sep 29 '23 at 20:18
@Qrrbrbirlbel yes and yes – David Carlisle Sep 29 '23 at 20:19

score 9 · Accepted Answer · answered Sep 29 '23 at 19:20


Since you're using LaTeX, you could use expl3. You've not provided code, so it's hard to tailor this for the best fit, but here's an example.

\documentclass{article}
\begin{document}
\ExplSyntaxOn
\cs_new_protected:Nn \rallg_tester:
{
  \peek_remove_spaces:n
  {
    \peek_catcode:NTF \c_catcode_other_token
    {
      \rallg_dothis:n
    }{
      \rallg_dothat:n
    }
  }
}
\cs_new_eq:NN \rallgwanted \rallg_tester:
\cs_new_protected_nopar:Nn \rallg_dothis:n { #1 ~ is ~ other}
\cs_new_protected_nopar:Nn \rallg_dothat:n { #1 ~ is ~ not ~ other}
\ExplSyntaxOff
\rallgwanted   0
\rallgwanted  1
\rallgwanted  ABC
\end{document}

With more information, one of the other tests provided by l3token might be preferred. You can find details in interface3.pdf. For example, if you don't want to gobble spaces, leave out the \peek_remove_spaces:n wrapper: \peek_catcode:NTF on its own does not skip them.
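As a sketch of how the same test could serve the plan described in the question's edit (an active " that chooses a curly quote from the next token), assuming LuaLaTeX and no babel shorthand on ", \peek_catcode:NTF can pick the opening quote before a letter and the closing quote otherwise. \rallg_quote: is a hypothetical name; only letters trigger the opening quote here, so backticks, digits or a following command such as \emph would need extra tests.

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical helper: opening quote before a letter, closing quote otherwise.
% \peek_catcode:NTF does not skip spaces, so a quote followed by a space closes.
\cs_new_protected:Npn \rallg_quote:
  {
    \peek_catcode:NTF \c_catcode_letter_token
      { \textquotedblleft }
      { \textquotedblright }
  }
% Give the active " the meaning of \rallg_quote:
\char_set_active_eq:NN \" \rallg_quote:
\ExplSyntaxOff
% Make " active only in the document body
\AtBeginDocument{\catcode`\"=\active}
\begin{document}
He said "hello" and left.
\end{document}
```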


cfr

I will test that, and report back. Thanks. Edit: Yes, it works for me. Now I can use it to build what I really need (too complex to post here). – rallg Sep 29 '23 at 19:23

I edited the question, to show why I asked. – rallg Sep 29 '23 at 19:58

(+1) Yeah, plain TeX is cool and all but this doesn't deserve a −1. Tsk! – Qrrbrbirlbel Sep 29 '23 at 20:21
@Qrrbrbirlbel Thanks ;). I wish people would say why. The question says the use is for luaLaTeX, so an expl3 answer seemed reasonable enough. I took the tagging to be due to the OP thinking that's where a solution would have to be found, as opposed to a constraint on acceptable answers .... – cfr Sep 29 '23 at 20:26

OP is absolutely correct with the tagging. [tag:tex-core] applies (it's about catcodes) and [tag:tex-core] does not mean [tag:plaintex]. – Qrrbrbirlbel Sep 29 '23 at 21:15
@Qrrbrbirlbel True, but the code in the question was non-LaTeX. But I think mainly people conflate TeX with plaintex. They don't think of plain as an alternative format ;). That's just the main reason I could think of for downvoting. – cfr Sep 29 '23 at 23:26
The reason I tagged it tex core was that I imagined a useful answer would be of interest to others, even though my own purpose is more limited. I actually expected a one-line answer that would name a TeX primitive that I did not know. Something like the \ifcat but usable in all situations. How wrong I was! – rallg Oct 01 '23 at 21:49

@rallg As Qrrbrbirlbel said above, you were right. These expl3 functions are just giving access to core TeX stuff with a few extra checks and a clearer syntax. (Unless you're immersed in TeX, I guess. I expect it isn't clearer then.) I just wondered if a fan of plain didn't like an expl3 answer for that reason. But it's pure speculation anyway. Maybe they just don't like my cauldron. – cfr Oct 02 '23 at 00:04

Ulrich Diez · Answer 3 · 2023-10-06T19:07:35.883

This answer does not focus on peeking ahead at the next token of the token stream.
The focus of this answer is on determining the first token of an undelimited macro argument, or, if the undelimited argument consists of several tokens nested between curly braces, the first token inside {...}.

Due to the 30000-character limit for answers, I had to split my answer into two parts.

This is part 1 of my answer.

Part 1 of my answer holds general explanations of the workings of TeX and of the code/working example provided in part 2.

Part 2 of my answer holds the working example.

In case you wish to upvote, please upvote only one part of my answer. This prevents unfair reputation gain.
In case you wish to downvote, downvote whichever part of my answer you wish to downvote.

General considerations:

Let's note that there are 16 category codes:

0 = Escape character, normally \ .
1 = Begin grouping, normally { .
2 = End grouping, normally } .
3 = Math shift, normally $ .
4 = Alignment tab, normally & .
5 = End of line, normally <carriage return> .
6 = Parameter, normally # .
7 = Superscript, normally ^ .
8 = Subscript, normally _ .
9 = Ignored character, normally <null> .
10 = Space, normally <space> and <horizontal tab> .
11 = Letter, normally only contains the letters a,...,z and A,...,Z. These characters can be used in command names.
12 = Other, normally everything else not listed in the other categories.
13 = Active character, for example ~ .
14 = Comment character, normally % .
15 = Invalid character, normally <delete> .
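To check some of these values in a given format, one can print \the\catcode of the characters. A minimal plain TeX sketch; the output in the comment assumes Knuth's plain format, where @ is of category 12 (other).

```latex
% plain TeX: print the category codes of some characters to the terminal/log
\message{\the\catcode`\\,\the\catcode`\{,\the\catcode`\},\the\catcode`\$,%
  \the\catcode`\&,\the\catcode`\#,\the\catcode`\^,\the\catcode`\_,%
  \the\catcode`\ ,\the\catcode`\a,\the\catcode`\@,\the\catcode`\~,\the\catcode`\%}
% prints 0,1,2,3,4,6,7,8,10,11,12,13,14
\bye
```

The `\x form reads the character code without expanding \x, so control symbols such as \{ or \% can be used safely here.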

Of these, only categories 1, 2, 3, 4, 6, 7, 8, 10, 11, 12 and 13 can occur as categories of character tokens; characters of categories 0, 5, 9, 14 and 15 never become character tokens themselves.

Let's note that both implicit and explicit character tokens have categories.

E.g., after

\let\egroup=}

the control word token \egroup is an implicit character token of category 2 (end group), and
\ifcat }\egroup same\else different\fi
yields same.

\ifcat does not distinguish explicit and implicit character tokens.

Therefore the routines in part 2 of this answer do not distinguish between explicit and implicit character tokens.

Category 13 requires special attention:

\ifcat usually triggers expansion of subsequent expandable tokens until two non-expandable tokens are obtained where comparison of categories can be done.

If an active character has been made equal via \let to a non-expandable token, \ifcat treats it as if it were that non-expandable token.

If an active character is undefined, or has been turned into an expandable token via \let or \def, then with \ifcat\noexpand⟨(expandable or undefined) active character⟩... TeX assumes category 13 for it in the \ifcat comparison.

Therefore \ifcat lets you single out undefined or expandable active characters: when \noexpand prevents their expansion, TeX assumes category 13 for them in the \ifcat comparison, while frozen-\relax is assumed to be of category 16, so the two are not considered to be of the same category. Category 16 is TeX's internal way of saying that a token does not have a character category.
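A small plain TeX check of the two preceding paragraphs; * and ! are made active only for this illustration, and the results go to the terminal and log.

```latex
% plain TeX: how \ifcat sees active characters
\catcode`\*=13 \def*{abc}    % * : active, expandable (a macro)
\catcode`\!=13 \let!=\relax  % ! : active, \let to an unexpandable primitive
\message{\ifcat\noexpand*\noexpand~same\else differ\fi} % same:   13 vs 13 (~ is a macro in plain)
\message{\ifcat\noexpand*\relax same\else differ\fi}    % differ: 13 vs 16
\message{\ifcat\noexpand!\relax same\else differ\fi}    % same:   16 vs 16
\bye
```

The last line shows the limitation listed next: an active character \let to an unexpandable non-character cannot be told apart from other unexpandable control sequences this way.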

But with \ifcat

you cannot distinguish expandable active character tokens from undefined active character tokens.
you cannot distinguish (unexpandable) active characters which denote unexpandable non-characters from any other unexpandable, or \noexpand-protected, control word/symbol tokens that are not implicit characters.
you cannot distinguish (unexpandable) active characters which are implicit characters and thus denote non-active character tokens from non-active character-tokens of the same category, be they explicit or implicit.

To make such distinctions expandably, you would need a mechanism where each possible character token of category 13 occurs as the delimiter of a delimited argument. With traditional TeX engines, whose internal character representation is 8-bit with only 256 code points, this might be feasible, but it might be a problem with XeTeX- and LuaTeX-based engines, whose internal character representation is Unicode with 1114112 code points. (Of course, with LuaTeX-based engines you can use the Lua backend for examining tokens.)

Your scenario:

It seems you want a macro \ifnextcatcode. As it is a macro and not one of TeX's \if..-\else-\fi tests, I suggest giving it a name that does not begin with \if.., so as not to mix up things where \if..-\else-\fi matching applies with things where it does not.

I can offer a routine \CategoryOfArgumentsFirstToken which processes an undelimited macro argument and after triggering two expansion-steps returns a sequence of digit-tokens denoting the category of the very first token of the argument. You can use this with \ifnum. (Or for forking via delimited arguments.)

Above I used \egroup as an example because explicit braces in undelimited arguments must be balanced, so a token of category 2 can only be the very first token of a macro argument if it is implicit.

Some cases require special attention:

The argument being empty.
The very first token of the argument being an explicit character of category 1 (begin group).
The very first token of the argument being an explicit character of category 10 (space).
Brace-hacks for getting \ifcat-tests for categories 1 and 2 into a macro where braces must be balanced.
The very first token of the argument being an undefined active character or an expandable active character.
In the routine in part 2 of this answer frozen-\relax is used for testing against an unexpandable control sequence because frozen-\relax cannot be redefined to be an implicit character token or an \outer-token.

The very first token of the argument being a control sequence (i.e., an active character token or a control word token or a control symbol token) which is undefined is not a problem as with the \ifcat-tests \noexpand is used for preventing expansion.

In part 2 of this answer you find a routine \CategoryOfArgumentsFirstToken{⟨argument⟩} which produces the following digit-sequences (with no trailing spaces):

1 = The very first token of the argument is of category 1 (begin grouping).
2 = The very first token of the argument is of category 2 (end grouping).
3 = The very first token of the argument is of category 3 (math shift).
4 = The very first token of the argument is of category 4 (alignment tab).
6 = The very first token of the argument is of category 6 (parameter).
7 = The very first token of the argument is of category 7 (superscript).
8 = The very first token of the argument is of category 8 (subscript).
10 = The very first token of the argument is of category 10 (space).
11 = The very first token of the argument is of category 11 (letter).
12 = The very first token of the argument is of category 12 (other).
13 = The very first token of the argument is an undefined active character token or an active character token denoting an expandable control sequence.
16 = The very first token of the argument either is a control-word/symbol-token not denoting an implicit character or is an active-character-token denoting an unexpandable control-word/symbol-token.
17 = With the argument there is no very first token as the argument is empty.

You can use it for \ifnum-tests like

\ifnum\CategoryOfArgumentsFirstToken{!bla}=12 %
  Argument has a first token whose category is 12(other).
\else
  Argument does not have a first token whose category is 12(other).
\fi

The ⟨argument⟩ of \CategoryOfArgumentsFirstToken can consist of a control sequence which came into being in the course of peeking ahead at the token-stream's next token via \futurelet or \@ifnextchar/\kernel@ifnextchar.

Here \futurelet is used for peeking ahead at the next token:

\newcommand\wanted[1]{%
  \edef\scratchnum{\unexpanded{#1}}%
  \futurelet\@let@token\wantedinner
}%
\newcommand*\wantedinner{%
  \ifnum\CategoryOfArgumentsFirstToken{\@let@token}=\numexpr(\scratchnum)\relax
    Argument has a first token whose category is \the\numexpr(\scratchnum)\relax.
  \else
    Argument does not have a first token whose category is \the\numexpr(\scratchnum)\relax.
  \fi
}%
\wanted{12}$
\wanted{12}A
\wanted{11}A

Here \@ifnextchar/\kernel@ifnextchar, which copies the meaning of the next non-space-token to \@let@token, is used for peeking ahead at the next non-space-token:

\newcommand\wanted[1]{%
  \edef\scratchnum{\unexpanded{#1}}%
  \@ifnextchar{\relax}{\wantedinner}{\wantedinner}%
}%
\newcommand*\wantedinner{%
  \ifnum\CategoryOfArgumentsFirstToken{\@let@token}=\numexpr(\scratchnum)\relax
    Argument has a first token whose category is \the\numexpr(\scratchnum)\relax.
  \else
    Argument does not have a first token whose category is \the\numexpr(\scratchnum)\relax.
  \fi
}%
\wanted{12}   $
\wanted{12}   A
\wanted{11}   A

With the \futurelet or \@ifnextchar/\kernel@ifnextchar approach, in addition to the restrictions already mentioned, you cannot distinguish active characters in the token stream from other kinds of control sequences in the token stream, because the token that \CategoryOfArgumentsFirstToken looks at is never an active character token but always the control word token \@let@token.

Besides this you find a routine

\CategoryOfArgumentsFirstTokenFork{⟨argument⟩}%
  {⟨tokens in case argument has a first token of category 1⟩}%
  {⟨tokens in case argument has a first token of category 2⟩}%
  {⟨tokens in case argument has a first token of category 3⟩}%
  {⟨tokens in case argument has a first token of category 4⟩}%
  {⟨tokens in case argument has a first token of category 6⟩}%
  {⟨tokens in case argument has a first token of category 7⟩}%
  {⟨tokens in case argument has a first token of category 8⟩}%
  {⟨tokens in case argument has a first token of category 10⟩}%
  {⟨tokens in case argument has a first token of category 11⟩}%
  {⟨tokens in case argument has a first token of category 12⟩}%
  {⟨tokens in case argument has a first token which is an undefined active character or an expandable active character⟩}%
  {⟨tokens in case argument has a first token which either is not a character or is an active character that is equal to an unexpandable non-character⟩}%
  {⟨tokens in case argument is empty⟩}%

This routine, too, can be applied to an ⟨argument⟩ consisting of a control sequence that came into being by peeking ahead at the token stream's next token via \futurelet or \@ifnextchar.
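For instance, assuming the part 2 definitions are loaded, a peek-ahead macro could hand \@let@token to the fork. \describenext and \describenextinner are hypothetical names; as explained above, the "active character" and "empty" branches cannot be reached this way.

```latex
\makeatletter
\newcommand*\describenext{\futurelet\@let@token\describenextinner}
\newcommand*\describenextinner{%
  \CategoryOfArgumentsFirstTokenFork{\@let@token}%
    {[begin group]}{[end group]}{[math shift]}{[alignment tab]}%
    {[parameter]}{[superscript]}{[subscript]}{[space]}%
    {[letter]}{[other]}%
    {[active]}% not reachable via \futurelet (see above)
    {[not a character]}%
    {[empty]}% not reachable via \futurelet either
}
\makeatother
\describenext A  \describenext !  \describenext\relax
```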

Internally that routine uses a routine

\KeepKthOfLArguments{⟨TeX-⟨number⟩-quantity with integer-value K⟩}%
                    {⟨TeX-⟨number⟩-quantity with integer-value L⟩}%
                    ⟨list of L undelimited arguments⟩%
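A minimal usage sketch, assuming the part 2 definitions are loaded: K and L may be any TeX ⟨number⟩ quantities, such as a count register (\mycount below is hypothetical), and according to the specification in part 2 an out-of-range K just removes all L arguments.

```latex
\newcount\mycount \mycount=2 % hypothetical counter holding K
\KeepKthOfLArguments{\mycount}{3}{first}{second}{third}% K = 2: keeps the 2nd argument
\KeepKthOfLArguments{5}{3}{first}{second}{third}% K > L: removes all three
```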

If you like, you can use that routine and within the ⟨TeX-⟨number⟩-quantity with integer-value K⟩-argument use \CategoryOfArgumentsFirstTokenFork for combining some cases, e.g.:

\KeepKthOfLArguments{%
  \CategoryOfArgumentsFirstTokenFork{⟨argument⟩}%
    {1}% category 1
    {1}% category 2
    {2}% category 3
    {3}% category 4
    {3}% category 6
    {2}% category 7
    {2}% category 8
    {4}% category 10
    {5}% category 11
    {5}% category 12
    {6}% expandable active character or undefined active character
    {6}% non-character or active character equal to unexpandable non-character
    {6}% empty
}{6}%
{The first token of the argument is some brace, probably implicit.}%
{The first token of the argument is something that is interesting when doing maths.}%
{The first token of the argument is an alignment tab or a parameter.}%
{The first token of the argument is an explicit space token.}%
{The first token of the argument is a letter or an other character.}%
{The argument does not have a first token which is a nice character token.}%

So you need sub-routines for

testing whether an argument is empty; \UD@CheckWhetherNull in the example in part 2 of this answer.
testing whether an argument's first token is an explicit character token of category 1; \UD@CheckWhetherBrace in the example in part 2 of this answer.
testing whether an argument's first token is an explicit space token; \UD@CheckWhetherLeadingExplicitSpace in the example in part 2 of this answer.
extracting the first token of an argument in case the argument is not empty and the 1st token is not a space or a brace; \UD@ExtractFirstArg in the example in part 2 of this answer.
selecting the K-th of L arguments; \KeepKthOfLArguments in the example in part 2 of this answer.
doing \ifcat-tests and selecting the 1st or the 2nd of two subsequent arguments; \UD@CheckWhetherCategoriesEqual in the example in part 2 of this answer.

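The example in part 2 of this answer is LaTeX 2ε, but you can easily turn it into Knuthian TeX by using \long\def instead of \newcommand (non-starred \newcommand defines \long macros), by keeping only the definition that each \@ifdefinable test wraps, and by using \catcode`\@=11 instead of \makeatletter.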
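For instance, converting two of the part 2 definitions might look like this (a sketch; the remaining definitions follow the same pattern):

```latex
% LaTeX 2e, as in part 2:
%   \newcommand\UD@firstoftwo[2]{#1}%
%   \@ifdefinable\UD@stopromannumeral{\chardef\UD@stopromannumeral=`\^^00}%
% Knuthian TeX equivalent:
\catcode`\@=11 % instead of \makeatletter
\long\def\UD@firstoftwo#1#2{#1}%
\chardef\UD@stopromannumeral=`\^^00
\catcode`\@=12 % instead of \makeatother
```

Note that \@ifdefinable also guards against overwriting an existing command; with plain \def that safety check is gone.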

Wow. I thought my original question was simple... Shows how little I know about TeX and its variants. Will read through and attempt to digest this. — rallg, Oct 01 '23 at 18:32

egreg · Answer 4 · 2023-09-30T13:11:11.933

The characters are too many, but the testable category codes are just a few.

\documentclass{article}
\usepackage[T1]{fontenc}
\ExplSyntaxOn
\NewDocumentCommand{\nextcatcodedo}{m}
 {
  \ralig_nextcatcodedo:n { #1 }
 }
\NewDocumentCommand{\nextcatcodedoignorespaces}{m}
 {
  \peek_remove_spaces:n { \ralig_nextcatcodedo:n { #1 } }
 }
\cs_new_protected:Nn \ralig_nextcatcodedo:n
 {% #1 = {N}{1}{2}{3}{4}{7}{8}{11}{12}{10}
  \peek_catcode:NTF \c_group_begin_token
   { \tl_item:nn { #1 } { 2 } }
   {
    \peek_catcode:NTF \c_group_end_token
     { \tl_item:nn { #1 } { 3 } }
     {
      \peek_catcode:NTF \c_math_toggle_token
       { \tl_item:nn { #1 } { 4 } }
       {
        \peek_catcode:NTF \c_alignment_token
         { \tl_item:nn { #1 } { 5 } }
         {
          \peek_catcode:NTF \c_math_superscript_token
           { \tl_item:nn { #1 } { 6 } }
           {
            \peek_catcode:NTF \c_math_subscript_token
             { \tl_item:nn { #1 } { 7 } }
             {
              \peek_catcode:NTF \c_catcode_letter_token
               { \tl_item:nn { #1 } { 8 } }
               {
                \peek_catcode:NTF \c_catcode_other_token
                 { \tl_item:nn { #1 } { 9 } }
                 {
                  \peek_catcode:NTF \c_space_token
                   { \tl_item:nn { #1 } { 10 } }
                   { \tl_item:nn { #1 } { 1 } }
                 }
               }
             }
           }
         }
       }
     }
   }
 }
\cs_new:Npx \c_ralig_catcode_active_token { \exp_not:V \c_catcode_active_tl }
\ExplSyntaxOff
\NewDocumentCommand{\testnext}{}{%
 \nextcatcodedo{%
  {Not a character token \string}% N
  {Begin group token }% 1
  { End group token}% 2
  {Math toggle }% 3
  {Alignment token \string}% 4
  {\mbox{Superscript token}}% 7
  {\mbox{Subscript token}}% 8
  {Letter }% 11
  {Other character }% 12
  {Space token}% 10
 }%
}
\begin{document}
\testnext{abc}
{\itshape abc\testnext} def
\testnext$1+1$
\testnext&
$a\testnext^2$
$a\testnext_2$
\testnext a
\testnext @
\expandafter\testnext\space
\testnext\mbox
\end{document}

The argument to \nextcatcodedo should be of the form

{N}{1}{2}{3}{4}{7}{8}{11}{12}{10}

where N stands for the code to execute when the token is not a character token (including active characters) and the other parts are the code to execute if the category code is as indicated. The last one is for the space and can be omitted if \nextcatcodedoignorespaces is used.

As you see, almost every one of those code parts can end with a one-argument macro that will take the character as its argument (not for { and }, of course).

The N code might contain further code in order to test for an active character.

You can leave any of those parts empty, but there should be the corresponding {}.
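For example, a hypothetical \markletter (with an equally hypothetical helper \bracketchar) leaves every part empty except the letter case, whose code ends with a one-argument macro that receives the peeked letter. This sketch assumes the \nextcatcodedo definition above.

```latex
\newcommand\bracketchar[1]{[#1]}% hypothetical helper: #1 is the peeked letter
\NewDocumentCommand{\markletter}{}{%
  \nextcatcodedo{%
    {}{}{}{}{}{}{}% N, 1, 2, 3, 4, 7, 8: nothing
    {\bracketchar}% 11: the letter becomes the argument of \bracketchar
    {}% 12
    {}% 10
  }%
}
% \markletter abc gives [a]bc; \markletter 123 leaves 123 unchanged
```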

Ulrich Diez · Answer 5 · 2023-10-06T14:36:39.943

This is part 2 of my answer.


\errorcontextlines=10000
\makeatletter
%%=============================================================================
%% PARAPHERNALIA:
%% \UD@firstoftwo, \UD@secondoftwo, \UD@PassFirstToSecond, \UD@Exchange,
%% \UD@removespace, \UD@stopromannumeral, \UD@CheckWhetherNull,
%% \UD@CheckWhetherBrace, \UD@CheckWhetherLeadingExplicitSpace,
%% \UD@ExtractFirstArg
%%=============================================================================
\newcommand\UD@firstoftwo[2]{#1}%
\newcommand\UD@secondoftwo[2]{#2}%
\newcommand\UD@PassFirstToSecond[2]{#2{#1}}%
\newcommand\UD@Exchange[2]{#2#1}%
\@ifdefinable\UD@removespace{\UD@Exchange{ }{\def\UD@removespace}{}}%
\@ifdefinable\UD@stopromannumeral{\chardef\UD@stopromannumeral=`\^^00}%
%%-----------------------------------------------------------------------------
%% Check whether argument is empty:
%%.............................................................................
%% \UD@CheckWhetherNull{<Argument which is to be checked>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is empty>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is not empty>}%
%%
%% The gist of this macro comes from Robert R. Schneck's \ifempty-macro:
%% <https://groups.google.com/forum/#!original/comp.text.tex/kuOEIQIrElc/lUg37FmhA74J>
\newcommand\UD@CheckWhetherNull[1]{%
  \romannumeral\expandafter\UD@secondoftwo\string{\expandafter
  \UD@secondoftwo\expandafter{\expandafter{\string#1}\expandafter
  \UD@secondoftwo\string}\expandafter\UD@firstoftwo\expandafter{\expandafter
  \UD@secondoftwo\string}\expandafter\UD@stopromannumeral\UD@secondoftwo}{%
  \expandafter\UD@stopromannumeral\UD@firstoftwo}%
}%
%%-----------------------------------------------------------------------------
%% Check whether argument's first token is a catcode-1-character
%%.............................................................................
%% \UD@CheckWhetherBrace{<Argument which is to be checked>}%
%%                      {<Tokens to be delivered in case that argument
%%                        which is to be checked has a leading
%%                        explicit catcode-1-character-token>}%
%%                      {<Tokens to be delivered in case that argument
%%                        which is to be checked does not have a
%%                        leading explicit catcode-1-character-token>}%
\newcommand\UD@CheckWhetherBrace[1]{%
  \romannumeral\expandafter\UD@secondoftwo\expandafter{\expandafter{%
  \string#1.}\expandafter\UD@firstoftwo\expandafter{\expandafter
  \UD@secondoftwo\string}\expandafter\UD@stopromannumeral\UD@firstoftwo}{%
  \expandafter\UD@stopromannumeral\UD@secondoftwo}%
}%
%%-----------------------------------------------------------------------------
%% Check whether brace-balanced argument starts with a space-token
%%.............................................................................
%% \UD@CheckWhetherLeadingExplicitSpace{<Argument which is to be checked>}%
%%                                     {<Tokens to be delivered in case <argument
%%                                       which is to be checked> does have a
%%                                       leading explicit space-token>}%
%%                                     {<Tokens to be delivered in case <argument
%%                                       which is to be checked> does not have a
%%                                       leading explicit space-token>}%
\newcommand\UD@CheckWhetherLeadingExplicitSpace[1]{%
  \romannumeral\UD@CheckWhetherNull{#1}%
  {\expandafter\UD@stopromannumeral\UD@secondoftwo}%
  {%
    % Let's nest things into \UD@firstoftwo{...}{} to make sure they are nested in braces
    % and thus do not disturb when the test is carried out within \halign/\valign:
    \expandafter\UD@firstoftwo\expandafter{%
      \expandafter\expandafter\expandafter\UD@stopromannumeral
      \romannumeral\expandafter\UD@secondoftwo
      \string{\UD@CheckWhetherLeadingExplicitSpaceB.#1 }{}%
    }{}%
  }%
}%
\@ifdefinable\UD@CheckWhetherLeadingExplicitSpaceB{%
  \long\def\UD@CheckWhetherLeadingExplicitSpaceB#1 {%
    \expandafter\UD@CheckWhetherNull\expandafter{\UD@firstoftwo{}#1}%
    {\UD@Exchange{\UD@firstoftwo}}{\UD@Exchange{\UD@secondoftwo}}%
    {\expandafter\expandafter\expandafter\UD@stopromannumeral
     \expandafter\expandafter\expandafter}%
     \expandafter\UD@secondoftwo\expandafter{\string}%
  }%
}%
%%-----------------------------------------------------------------------------
%% Extract first inner undelimited argument:
%%
%%   \UD@ExtractFirstArg{ABCDE} yields  {A}
%%
%%   \UD@ExtractFirstArg{{AB}CDE} yields  {AB}
%%
%% Due to \romannumeral-expansion the result is delivered after two 
%% expansion-steps/after "hitting" \UD@ExtractFirstArg with \expandafter
%% twice.
%%
%% \UD@ExtractFirstArg's argument must not be blank.
%%
%% Use frozen-\relax as delimiter for speeding things up.
%% I chose frozen-\relax because David Carlisle pointed out in
%% <https://tex.stackexchange.com/a/578877>
%% that frozen-\relax cannot be (re)defined in terms of \outer and cannot be
%% affected by \uppercase/\lowercase.
%%
%% \UD@ExtractFirstArg's argument may contain frozen-\relax:
%% The only effect is that internally more iterations are needed for
%% obtaining the result.
%%
%%.............................................................................
\@ifdefinable\UD@RemoveTillFrozenrelax{%
  \expandafter\expandafter\expandafter\UD@Exchange
  \expandafter\expandafter\expandafter{%
  \expandafter\expandafter\ifnum0=0\fi}%
  {\long\def\UD@RemoveTillFrozenrelax#1#2}{{#1}}%
}%
\expandafter\UD@PassFirstToSecond\expandafter{%
  \romannumeral\expandafter
  \UD@PassFirstToSecond\expandafter{\romannumeral
    \expandafter\expandafter\expandafter\UD@Exchange
    \expandafter\expandafter\expandafter{%
    \expandafter\expandafter\ifnum0=0\fi}{\UD@stopromannumeral#1{}}%
  }{%
    \UD@stopromannumeral\romannumeral\UD@ExtractFirstArgLoop
  }%
}{%
  \newcommand\UD@ExtractFirstArg[1]%
}%
\newcommand\UD@ExtractFirstArgLoop[1]{%
  \expandafter\UD@CheckWhetherNull\expandafter{\UD@firstoftwo{}#1}%
  {\UD@stopromannumeral#1}%
  {\expandafter\UD@ExtractFirstArgLoop\expandafter{\UD@RemoveTillFrozenrelax#1}}%
}%
%%=============================================================================
%% Check whether categories of two tokens are equal.
%% Expansion of #1 is to yield a single unexpandable token.
%% #2 is to be a single token whose expansion will be prevented.
\newcommand\UD@CheckWhetherCategoriesEqual[2]{%
  \expandafter\UD@firstoftwo\expandafter{%
    \romannumeral
    \ifcat#1\noexpand#2%
    \expandafter\UD@stopromannumeral\expandafter\UD@firstoftwo\else
    \expandafter\UD@stopromannumeral\expandafter\UD@secondoftwo\fi
  }{}%
}%
%%=============================================================================
%% Keep only the K-th of L consecutive undelimited arguments.
%%   ( IF K < 1 OR K > L just remove L consecutive undelimited arguments.
%%     IF L < 1 do nothing. )
%%
%% \KeepKthOfLArguments{<TeX-<number>-quantity K>}%
%%                     {<TeX-<number>-quantity L>}%
%%                     <sequence of L consecutive undelimited arguments>
%%-----------------------------------------------------------------------------
%% If L < 1 yields nothing.
%% Else:
%%   If K >= 1 and K <= L  yields:
%%     <K-th undelimited argument from <sequence of L consecutive undelimited 
%%      arguments>>
%%   If K < 1 or K > L
%%     (-> there is no K-th argument in the
%%         <sequence of L consecutive undelimited arguments> )
%%   yields nothing  but removal of <sequence of L consecutive 
%%          undelimited arguments>
%%.............................................................................
\newcommand\KeepKthOfLArguments[2]{%
  \romannumeral
  % #1: <integer number K>
  % #2: <integer number L>
  \expandafter\UD@KeepKthOfLArgumentsKSmallerOneFork
  \expandafter{\romannumeral\number\number#1 000\expandafter}%
  \expandafter{\romannumeral\number\number#2 000}%
}%
%%-----------------------------------------------------------------------------
\newcommand\UD@KeepKthOfLArgumentsKSmallerOneFork[2]{%
  % #1: <K letters m>
  % #2: <L letters m >
  \UD@CheckWhetherNull{#1}{% K is smaller than one:
    \UD@KeepKthOfLArgumentsRemoveNArguments{#2}{\UD@stopromannumeral}{}%
  }{% K is not smaller than one:
    \expandafter\UD@PassFirstToSecond
    \expandafter{%
      \UD@firstoftwo{}#1%
    }{%
      \UD@KeepKthOfLArgumentsEvaluateLMinusKDifferenceLoop{#1}{#2}%
    }{#2}%
  }%
}%
%%-----------------------------------------------------------------------------
\newcommand\UD@KeepKthOfLArgumentsEvaluateLMinusKDifferenceLoop[4]{%
  % #1: <K letters m>  
  % #2: <L letters m>
  % (For detecting whether K>L or K<=L, during the loop letters m will
  %  be removed both from #1 and #2 until at least one of these arguments 
  %  is empty.
  %  When the loop terminates with 0<K<=L, #1 will be empty and #2
  %  will hold an amount of letters m corresponding to the
  %  difference L-K.
  %  When the loop terminates with K>L, #1 will not be empty and #2
  %  will be empty.
  % )
  % #3: <K-1 letters m>
  % #4: <L letters m>
  % (#3 and #4 will be left untouched during the loop so they can be 
  %  used for performing appropriate action when loop terminates as
  %  it is known whether K>L.)
  \UD@CheckWhetherNull{#1}{% We have K<=L:
     \UD@KeepKthOfLArgumentsRemoveNArguments{%
       #3%
      }{%
       \UD@KeepKthOfLArgumentsRemoveNArguments{#2}{\UD@stopromannumeral}%
      }{}%
  }{%
    \UD@CheckWhetherNull{#2}{% We have K>L:
      \UD@KeepKthOfLArgumentsRemoveNArguments{#4}{\UD@stopromannumeral}{}%
    }{% We don't know yet whether K<=L or K>L, thus remove letters m and 
      % do another iteration:
      \expandafter\UD@PassFirstToSecond
      \expandafter{%
        \UD@firstoftwo{}#2%
      }{%
        \expandafter\UD@KeepKthOfLArgumentsEvaluateLMinusKDifferenceLoop
        \expandafter{%
          \UD@firstoftwo{}#1%
        }%
      }{#3}{#4}%
    }%
  }%
}%
%%-----------------------------------------------------------------------------
%% \UD@KeepKthOfLArgumentsRemoveNArguments{<N letters m>}%
%%                                        {<argument 1>}%
%%                                        {<argument 2>}%
%%                                        <sequence of consecutive 
%%                                         undelimited arguments>
%%.............................................................................
%% Removes the first N undelimited arguments from the <sequence of 
%% consecutive undelimited arguments>, then inserts  
%% <argument 1><argument 2>
%%
%% On the one hand, when <argument 2> is provided empty, you can use
%% <argument 1> for nesting calls to \UD@KeepKthOfLArgumentsRemoveNArguments.
%% On the other hand, you can provide something that stops
%% \romannumeral-expansion (the code here uses \UD@stopromannumeral) as
%% <argument 1> and have the macro grab the <K-th undelimited argument> from
%% the <sequence of L consecutive undelimited arguments> as <argument 2>.
%%
\newcommand\UD@KeepKthOfLArgumentsRemoveNArguments[3]{%
  %% #1: <N letters m>  
  %% #2: <Argument 1>   
  %% #3: <Argument 2>
  \UD@CheckWhetherNull{#1}{#2#3}{%
    \UD@firstoftwo{%
      \expandafter\UD@KeepKthOfLArgumentsRemoveNArguments
      \expandafter{%
        \UD@firstoftwo{}#1%
      }{#2}{#3}%
    }%
  }%
}%
%%-----------------------------------------------------------------------------
%% End of code for \KeepKthOfLArguments.
%%=============================================================================
%%
%% Fork according to the category of the first token of a macro argument:
%%
\newcommand\CategoryOfArgumentsFirstTokenFork[1]{%
  \romannumeral\expandafter\UD@secondoftwo
  \KeepKthOfLArguments{%
    \UD@CheckWhetherNull{#1}{13}{% number 17 / argument 13
      \UD@CheckWhetherBrace{#1}{1}{% number 1 / argument 1
        \UD@CheckWhetherLeadingExplicitSpace{#1}{8}{%  number 10 / argument 8
           \expandafter\expandafter\expandafter
           \UD@CategoryOfArgumentsFirstTokenFork\UD@ExtractFirstArg{#1}%
        }%
      }%
    }%
  }{13}%
}%
\begingroup
\catcode`\{=1
\catcode`\}=2
\catcode`\$=3
\catcode`\&=4
\catcode`\#=6
\catcode`\^=7
\catcode`\_=8
\catcode`\ =10
\catcode`\A=11
\catcode`\.=12
\newcommand\UD@CategoryOfArgumentsFirstTokenFork[1]{%
  \endgroup  
  \newcommand\UD@CategoryOfArgumentsFirstTokenFork[1]{%
    \UD@CheckWhetherCategoriesEqual{%
      \expandafter\expandafter\expandafter{%
      \expandafter\UD@firstoftwo\expandafter{\expandafter}\string}%
    }{##1}{%
      1% number 1 / argument 1
    }{%
      \UD@CheckWhetherCategoriesEqual{%
        \expandafter\UD@firstoftwo\expandafter{\expandafter}\string{}%
      }{##1}{%
        2% number 2 / argument 2
      }{%
        \UD@CheckWhetherCategoriesEqual{$}{##1}{3}{% number 3 / argument 3
        \UD@CheckWhetherCategoriesEqual{&}{##1}{4}{% number 4 / argument 4
        \UD@CheckWhetherCategoriesEqual{####}{##1}{5}{% number 6 / argument 5
        \UD@CheckWhetherCategoriesEqual{^}{##1}{6}{% number 7 / argument 6
        \UD@CheckWhetherCategoriesEqual{_}{##1}{7}{% number 8 / argument 7
        \UD@CheckWhetherCategoriesEqual{ }{##1}{8}{% number 10 / argument 8
        \UD@CheckWhetherCategoriesEqual{A}{##1}{9}{% number 11 / argument 9
        \UD@CheckWhetherCategoriesEqual{.}{##1}{10}{% number 12 / argument 10
        \UD@CheckWhetherCategoriesEqual{#1}{##1}{12}{% number 16 / argument 12
          11% number 13 / argument 11
        }}}}}}}}}%
      }%
    }%
  }%
}%
\expandafter\expandafter\expandafter\UD@CategoryOfArgumentsFirstTokenFork
\expandafter\expandafter\expandafter{%
\expandafter\expandafter\ifnum0=0\fi}%
%%-----------------------------------------------------------------------------
%%
%% Get the category of the first token of a macro argument:
%%
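%% Possible results (read off from the argument list that
%% \CategoryOfArgumentsFirstToken passes to \CategoryOfArgumentsFirstTokenFork):
%%   1, 2, 3, 4, 6, 7, 8, 10, 11, 12 : category code of the first token,
%%                                     which is an explicit or implicit character
%%   13 : expandable or undefined active character
%%   16 : non-character token, or active character equal to an
%%        unexpandable non-character token
%%   17 : empty argument
%%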
\newcommand\CategoryOfArgumentsFirstToken[1]{%
  \romannumeral\expandafter\UD@secondoftwo
  \CategoryOfArgumentsFirstTokenFork{#1}{1}{2}{3}{4}{6}{7}{8}{10}{11}{12}{13}{16}{17}%
}%
\makeatother
\documentclass{article}
\parindent=0pt
\begin{document}
\hrule height 0cm \kern-2cm
\verb*|\CategoryOfArgumentsFirstToken{}|: \CategoryOfArgumentsFirstToken{}
\verb*|\CategoryOfArgumentsFirstToken{\relax dsda \fi}|: \CategoryOfArgumentsFirstToken{\relax dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\undefined dsda \fi}|: \CategoryOfArgumentsFirstToken{\undefined dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\fi dsda \fi}|: \CategoryOfArgumentsFirstToken{\fi dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\endcsname dsda \fi}|: \CategoryOfArgumentsFirstToken{\endcsname dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\LaTeX dsda \fi}|: \CategoryOfArgumentsFirstToken{\LaTeX dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\section dsda \fi}|: \CategoryOfArgumentsFirstToken{\section dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{{\relax} dsda \fi}|: \CategoryOfArgumentsFirstToken{{\relax} dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\egroup dsda \fi}|: \CategoryOfArgumentsFirstToken{\egroup dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{$ dsda \fi}|: \CategoryOfArgumentsFirstToken{$ dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{& dsda \fi}|: \CategoryOfArgumentsFirstToken{& dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{# dsda \fi}|: \CategoryOfArgumentsFirstToken{# dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{^ dsda \fi}|: \CategoryOfArgumentsFirstToken{^ dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{_ dsda \fi}|: \CategoryOfArgumentsFirstToken{_ dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{  dsda \fi}|: \CategoryOfArgumentsFirstToken{  dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{A  dsda \fi}|: \CategoryOfArgumentsFirstToken{A  dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{.  dsda \fi}|: \CategoryOfArgumentsFirstToken{.  dsda \fi}
\bigskip
Test with implicit characters:
\begin{verbatim}
\def\foo#1{#1}
\let\implicitDollar=$
\let\implicitAnd=&
\let\implicitHash=#
\let\implicitHat=^
\let\implicitUnderscore=_
\foo{\let\implicitSpace= } %
\let\implicitA=A
\let\implicitDot=.
\end{verbatim}
\def\foo#1{#1}
\let\implicitDollar=$
\let\implicitAnd=&
\let\implicitHash=#
\let\implicitHat=^
\let\implicitUnderscore=_
\foo{\let\implicitSpace= } %
\let\implicitA=A
\let\implicitDot=.
\verb*|\CategoryOfArgumentsFirstToken{\bgroup dsda \fi}|: \CategoryOfArgumentsFirstToken{\bgroup dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\egroup dsda \fi}|: \CategoryOfArgumentsFirstToken{\egroup dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\implicitDollar dsda \fi}|: \CategoryOfArgumentsFirstToken{\implicitDollar dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\implicitAnd dsda \fi}|: \CategoryOfArgumentsFirstToken{\implicitAnd dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\implicitHash dsda \fi}|: \CategoryOfArgumentsFirstToken{\implicitHash dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\implicitHat dsda \fi}|: \CategoryOfArgumentsFirstToken{\implicitHat dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\implicitUnderscore dsda \fi}|: \CategoryOfArgumentsFirstToken{\implicitUnderscore dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\implicitSpace dsda \fi}|: \CategoryOfArgumentsFirstToken{\implicitSpace dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\implicitA  dsda \fi}|: \CategoryOfArgumentsFirstToken{\implicitA  dsda \fi}
\verb*|\CategoryOfArgumentsFirstToken{\implicitDot dsda \fi}|: \CategoryOfArgumentsFirstToken{\implicitDot dsda \fi}
\bigskip Test with active characters:
\begin{verbatim}
\catcode`\W=13 \let W=\section \catcode`\X=13 \let X={
\catcode`\Y=13 \let Y=\hbox \catcode`\Z=13 \let Z=\UndeFinEd
\end{verbatim}
\verb*|\CategoryOfArgumentsFirstToken{W dsda \fi}|:
\begingroup
\catcode`\W=13 \let W=\section
\CategoryOfArgumentsFirstToken{W dsda \fi}
\endgroup
\verb*|\CategoryOfArgumentsFirstToken{X dsda \fi}|:
\begingroup
\catcode`\X=13 \let X={
\CategoryOfArgumentsFirstToken{X dsda \fi}
\endgroup
\verb*|\CategoryOfArgumentsFirstToken{Y dsda \fi}|:
\begingroup
\catcode`\Y=13 \let Y=\hbox
\CategoryOfArgumentsFirstToken{Y dsda \fi}
\endgroup
\verb*|\CategoryOfArgumentsFirstToken{Z dsda \fi}|:
\begingroup
\catcode`\Z=13 \let Z=\UndeFinEd
\CategoryOfArgumentsFirstToken{Z dsda \fi}
\endgroup
\newpage
\begingroup\footnotesize
\begin{verbatim}
\ifnum\CategoryOfArgumentsFirstToken{. dsda \endcsname}=12 %
  The argument `\verb|. dsda \endcsname|' has a first token which is of category 12. \else The argument `\verb|. dsda \endcsname|' does not have a first token which is of category 12.
\fi
\end{verbatim}
\endgroup
\ifnum\CategoryOfArgumentsFirstToken{. dsda \endcsname}=12 %
  The argument `\verb|. dsda \endcsname|' has a first token which is of category 12. \else The argument `\verb|. dsda \endcsname|' does not have a first token which is of category 12.
\fi
\bigskip\hrule\bigskip
\begingroup\footnotesize
\begin{verbatim}
\ifnum\CategoryOfArgumentsFirstToken{Z dsda \endcsname}=12 %
  The argument `\verb|Z dsda \endcsname|' has a first token which is of category 12. \else The argument `\verb|Z dsda \endcsname|' does not have a first token which is of category 12.
\fi
\end{verbatim}
\endgroup
\ifnum\CategoryOfArgumentsFirstToken{Z dsda \endcsname}=12 %
  The argument `\verb|Z dsda \endcsname|' has a first token which is of category 12. \else The argument `\verb|Z dsda \endcsname|' does not have a first token which is of category 12.
\fi
\bigskip\hrule\bigskip
\begingroup\footnotesize
\begin{verbatim}
\ifnum\CategoryOfArgumentsFirstToken{}=12 %
  The argument `\verb||' has a first token which is of category 12. \else The argument `\verb||' does not have a first token which is of category 12.
\fi
\end{verbatim}
\endgroup
\ifnum\CategoryOfArgumentsFirstToken{}=12 %
  The argument `\verb||' has a first token which is of category 12. \else The argument `\verb||' does not have a first token which is of category 12.
\fi
\newpage
\begingroup\footnotesize
\begin{verbatim}
\KeepKthOfLArguments{%
  \CategoryOfArgumentsFirstTokenFork{A Bla}%
    {1}% category 1
    {1}% category 2
    {2}% category 3
    {3}% category 4
    {3}% category 6
    {2}% category 7
    {2}% category 8
    {4}% category 10
    {5}% category 11
    {5}% category 12
    {6}% expandable active character or undefined active character
    {6}% non-character or active character equal to unexpandable non-character
    {6}% empty
}{6}%
{The first token of the argument is some brace, probably implicit.}%
{The first token of the argument is something that is interesting when doing maths.}%
{The first token of the argument is an alignment tab or a parameter.}%
{The first token of the argument is an explicit space token.}%
{The first token of the argument is a letter or an other character.}%
{The argument does not have a first token which is a nice character token.}%
\end{verbatim}
\endgroup
\KeepKthOfLArguments{%
  \CategoryOfArgumentsFirstTokenFork{A Bla}%
    {1}% category 1
    {1}% category 2
    {2}% category 3
    {3}% category 4
    {3}% category 6
    {2}% category 7
    {2}% category 8
    {4}% category 10
    {5}% category 11
    {5}% category 12
    {6}% expandable active character or undefined active character
    {6}% non-character or active character equal to unexpandable non-character
    {6}% empty
}{6}%
{The first token of the argument is some brace, probably implicit.}%
{The first token of the argument is something that is interesting when doing maths.}%
{The first token of the argument is an alignment tab or a parameter.}%
{The first token of the argument is an explicit space token.}%
{The first token of the argument is a letter or an other character.}%
{The argument does not have a first token which is a nice character token.}%
\bigskip\hrule\bigskip
\begingroup\footnotesize
\begin{verbatim}
\KeepKthOfLArguments{%
  \CategoryOfArgumentsFirstTokenFork{! Bla}%
    {1}% category 1
    {1}% category 2
    {2}% category 3
    {3}% category 4
    {3}% category 6
    {2}% category 7
    {2}% category 8
    {4}% category 10
    {5}% category 11
    {5}% category 12
    {6}% expandable active character or undefined active character
    {6}% non-character or active character equal to unexpandable non-character
    {6}% empty
}{6}%
{The first token of the argument is some brace, probably implicit.}%
{The first token of the argument is something that is interesting when doing maths.}%
{The first token of the argument is an alignment tab or a parameter.}%
{The first token of the argument is an explicit space token.}%
{The first token of the argument is a letter or an other character.}%
{The argument does not have a first token which is a nice character token.}%
\end{verbatim}
\endgroup
\KeepKthOfLArguments{%
  \CategoryOfArgumentsFirstTokenFork{! Bla}%
    {1}% category 1
    {1}% category 2
    {2}% category 3
    {3}% category 4
    {3}% category 6
    {2}% category 7
    {2}% category 8
    {4}% category 10
    {5}% category 11
    {5}% category 12
    {6}% expandable active character or undefined active character
    {6}% non-character or active character equal to unexpandable non-character
    {6}% empty
}{6}%
{The first token of the argument is some brace, probably implicit.}%
{The first token of the argument is something that is interesting when doing maths.}%
{The first token of the argument is an alignment tab or a parameter.}%
{The first token of the argument is an explicit space token.}%
{The first token of the argument is a letter or an other character.}%
{The argument does not have a first token which is a nice character token.}%
\bigskip\hrule\bigskip
\begingroup\footnotesize
\begin{verbatim}
\KeepKthOfLArguments{%
  \CategoryOfArgumentsFirstTokenFork{$ Bla}%
    {1}% category 1
    {1}% category 2
    {2}% category 3
    {3}% category 4
    {3}% category 6
    {2}% category 7
    {2}% category 8
    {4}% category 10
    {5}% category 11
    {5}% category 12
    {6}% expandable active character or undefined active character
    {6}% non-character or active character equal to unexpandable non-character
    {6}% empty
}{6}%
{The first token of the argument is some brace, probably implicit.}%
{The first token of the argument is something that is interesting when doing maths.}%
{The first token of the argument is an alignment tab or a parameter.}%
{The first token of the argument is an explicit space token.}%
{The first token of the argument is a letter or an other character.}%
{The argument does not have a first token which is a nice character token.}%
\end{verbatim}
\endgroup
\KeepKthOfLArguments{%
  \CategoryOfArgumentsFirstTokenFork{$ Bla}%
    {1}% category 1
    {1}% category 2
    {2}% category 3
    {3}% category 4
    {3}% category 6
    {2}% category 7
    {2}% category 8
    {4}% category 10
    {5}% category 11
    {5}% category 12
    {6}% expandable active character or undefined active character
    {6}% non-character or active character equal to unexpandable non-character
    {6}% empty
}{6}%
{The first token of the argument is some brace, probably implicit.}%
{The first token of the argument is something that is interesting when doing maths.}%
{The first token of the argument is an alignment tab or a parameter.}%
{The first token of the argument is an explicit space token.}%
{The first token of the argument is a letter or an other character.}%
{The argument does not have a first token which is a nice character token.}%
\bigskip\hrule\bigskip
\begingroup\footnotesize
\begin{verbatim}
\KeepKthOfLArguments{%
  \CategoryOfArgumentsFirstTokenFork{_ Bla}%
    {1}% category 1
    {1}% category 2
    {2}% category 3
    {3}% category 4
    {3}% category 6
    {2}% category 7
    {2}% category 8
    {4}% category 10
    {5}% category 11
    {5}% category 12
    {6}% expandable active character or undefined active character
    {6}% non-character or active character equal to unexpandable non-character
    {6}% empty
}{6}%
{The first token of the argument is some brace, probably implicit.}%
{The first token of the argument is something that is interesting when doing maths.}%
{The first token of the argument is an alignment tab or a parameter.}%
{The first token of the argument is an explicit space token.}%
{The first token of the argument is a letter or an other character.}%
{The argument does not have a first token which is a nice character token.}%
\end{verbatim}
\endgroup
\KeepKthOfLArguments{%
  \CategoryOfArgumentsFirstTokenFork{_ Bla}%
    {1}% category 1
    {1}% category 2
    {2}% category 3
    {3}% category 4
    {3}% category 6
    {2}% category 7
    {2}% category 8
    {4}% category 10
    {5}% category 11
    {5}% category 12
    {6}% expandable active character or undefined active character
    {6}% non-character or active character equal to unexpandable non-character
    {6}% empty
}{6}%
{The first token of the argument is some brace, probably implicit.}%
{The first token of the argument is something that is interesting when doing maths.}%
{The first token of the argument is an alignment tab or a parameter.}%
{The first token of the argument is an explicit space token.}%
{The first token of the argument is a letter or an other character.}%
{The argument does not have a first token which is a nice character token.}%
\end{document}

Skillmon · Answer 6 · 2023-09-30T14:46:40.970

The following is similar to the nice answer by @egreg (from which I copied the example list), but with a slightly different interface.

Instead of requiring a hard-coded number of elements inside your argument, this answer lets you use key=value input, which also supports testing only a subset of the categories, plus an else key for code to execute if none of the listed categories was found. The key=value parsing is aborted as soon as a matching category code is found. The answer uses expkv because with it, it is safe to gobble the remainder of the key=value parsing: expkv doesn't use any variables to store its current state (also, I'm the author of expkv). As a result, though, this answer mixes L3 and plain TeX syntax.

EDIT: I've added category 13 to the supported category codes to check. Also, the categories 0, 5, 9, 14, and 15 now throw an error (because it's impossible to peek for them) instead of being undefined keys. And giving one of the known keys without a value is now equivalent to an empty action (except for the else key: it defaults to being empty, and I don't see much sense in providing this).

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{expkv}
\ExplSyntaxOn
\NewDocumentCommand \peekcatcodes { m } { \ralig_peek_catcodes:n {#1} }
\NewDocumentCommand \removespacesand { m } { \peek_remove_spaces:n {#1} }
% we need to define one key outside of the following group, or else the key-set
% will not be defined. This is something which wasn't considered in the way
% expkv defines keys, and something that I'll need to fix.
% The key 1 will be redefined globally inside the following group.
\ekvdef { ralig / catcodes } 1 {}
\protected\long\ekvsetdef \__ralig_peek_catcodes:n { ralig / catcodes }
\msg_new:nnn { ralig } { impossible-peek } { Impossible~ to~ peek~ for~ #1 }
\group_begin:
  % temporarily defined auxiliaries to ease setting up the keys
  \cs_set:Npn \__ralig_setup_catcode:nn #1#2
    {
      \global\protected\long\ekvdef { ralig / catcodes } {#1}
        { \__ralig_peek_catcodes:Nnw #2 {##1} }
      \global\protected\ekvdefNoVal { ralig / catcodes } {#1}
        { \__ralig_peek_catcodes:Nnw #2 {} }
    }
  \cs_set:Npn \__ralig_inval_catcode:nn #1#2
    {
      \global\protected\long\ekvdef { ralig / catcodes } {#1}
        { \msg_error:nnn { ralig } { impossible-peek } {#2} }
      \global\protected\ekvdefNoVal { ralig / catcodes } {#1}
        { \msg_error:nnn { ralig } { impossible-peek } {#2} }
    }
  % real key setup
  \__ralig_inval_catcode:nn {  0 } { escape~ character }
  \__ralig_setup_catcode:nn {  1 } \c_group_begin_token
  \__ralig_setup_catcode:nn {  2 } \c_group_end_token
  \__ralig_setup_catcode:nn {  3 } \c_math_toggle_token
  \__ralig_setup_catcode:nn {  4 } \c_alignment_token
  \__ralig_inval_catcode:nn {  5 } { end~ of~ line }
  \__ralig_setup_catcode:nn {  6 } { ## }
  \__ralig_setup_catcode:nn {  7 } \c_math_superscript_token
  \__ralig_setup_catcode:nn {  8 } \c_math_subscript_token
  \__ralig_inval_catcode:nn {  9 } { ignored~ character }
  \__ralig_setup_catcode:nn { 10 } \c_space_token
  \__ralig_setup_catcode:nn { 11 } \c_catcode_letter_token
  \__ralig_setup_catcode:nn { 12 } \c_catcode_other_token
  \exp_args:Nne
  \__ralig_setup_catcode:nn { 13 } \c_catcode_active_tl
  \__ralig_inval_catcode:nn { 14 } { comment~ character }
  \__ralig_inval_catcode:nn { 15 } { invalid~ character }
  % temporary definition to have slightly better performance in the else key
  \cs_set:Npn \__ralig_peek_catcodes_else:nw
      #1 #2 \__ralig_peek_catcodes_else:n #3
    { #2 \__ralig_peek_catcodes_else:n {#1} }
  \global\ekvlet { ralig / catcodes } { else } \__ralig_peek_catcodes_else:nw
\group_end:
\cs_new_protected:Npn \ralig_peek_catcodes:n #1
  { \__ralig_peek_catcodes:n {#1} \__ralig_peek_catcodes_else:n {} }
\cs_new_eq:NN \__ralig_peek_catcodes_else:n \use:n
\cs_new:Npn \__ralig_peek_catcodes:Nnw #1#2 #3 \__ralig_peek_catcodes_else:n #4
  {
    \peek_catcode:NTF #1
      { #2 }
      { #3 \__ralig_peek_catcodes_else:n {#4} }
  }
\ExplSyntaxOff
\NewDocumentCommand{\testnext}{}
  {%
    \peekcatcodes{
       1    = {Begin group token }
      ,2    = { End group token}
      ,3    = {Math toggle }
      ,4    = Alignment token \string
      ,6    = Parameter token \string
      ,7    = \mbox{Superscript token}
      ,8    = \mbox{Subscript token}
      ,10   = Space token
      ,11   = {Letter }
      ,12   = {Other character }
      ,13   = Active character \string
      ,else = Not a character token \string
    }%
  }
\NewDocumentCommand\othertest{}
  {%
    \par
    \removespacesand
      {\peekcatcodes{11={Letter }, 12={Other }, else=Something else: \string}}%
  }
\begin{document}
Using \string\testnext\par
\testnext{abc}\par
{\itshape abc\testnext} def\par
\testnext$1+1$\par
\testnext&\par
\testnext#\par
$a\testnext^2$\par
$a\testnext_2$\par
\testnext a\par
\testnext @\par
\expandafter\testnext\space\par
\testnext\mbox\par
\testnext~\par
\bigskip
Using \string\othertest\par
\othertest a
\othertest @
\expandafter\othertest\space &
\end{document}
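
If only a single category is of interest (as in the question's catcode-12 case), the key=value layer can be dropped and what remains is essentially one \peek_catcode:NTF call. The following is a minimal sketch under that assumption; \IfNextOtherTF is a made-up name, while \peek_catcode:NTF and \c_catcode_other_token are the same expl3 functions the answer above relies on. Like \testnext, it does not skip spaces; wrapping it the way \removespacesand does would change that.

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical command: #1 is used if the next token has category 12 (other),
% #2 otherwise. The next token itself is not consumed.
\NewDocumentCommand \IfNextOtherTF { m m }
  { \peek_catcode:NTF \c_catcode_other_token {#1} {#2} }
\ExplSyntaxOff
\begin{document}
\IfNextOtherTF{[other] }{[not other] }!\par
\IfNextOtherTF{[other] }{[not other] }a\par
\IfNextOtherTF{[other] }{[not other] } !\par % a space token comes first here
\end{document}
```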

Thanks. Will also give this a test drive. — rallg, Sep 30 '23 at 18:09

Ulrich Diez · Answer 7 · 2023-10-06T18:07:09.670

You mentioned \@ifnextchar.

The implementation of \@ifnextchar/\kernel@ifnextchar in the LaTeX 2ε kernel is based on \futurelet.

\@ifnextchar/\kernel@ifnextchar compares tokens via \ifx, which only "looks" at their meanings, not at what The LaTeX3 Interfaces call a token's "shape".

There are some minor drawbacks with \@ifnextchar/\kernel@ifnextchar:

Looking only at the meanings of tokens implies that implicit character tokens and explicit character tokens are not distinguished.
Comparing via \ifx to predefined tokens like \@sptoken or \reserved@d implies that the test may go wrong in case the next token is not a character token but one of these predefined tokens.
If you do something like \kernel@ifnextchar{Z}{There is Z}{There is no Z}\@sptoken, you get an error message saying that the use of \reserved@c (which at that moment equals \@xifnch) doesn't match its definition.
If you do something like \kernel@ifnextchar{Z}{There is Z}{There is no Z}\reserved@d, you get "There is ZZ".
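
A minimal document illustrating the first drawback, sketched for this purpose: the names \testZ and \implicitZ are made up, and it uses \@ifnextchar, whose \ifx-based comparison is the one described above. An implicit Z created with \let has the same meaning as an explicit Z, so both of the first two lines should take the "Z next" branch; the third should not.

```latex
\documentclass{article}
\makeatletter
% Hypothetical test macro: reports which branch \@ifnextchar takes for Z.
\newcommand\testZ{\@ifnextchar Z{[Z next] }{[no Z next] }}
\makeatother
% \implicitZ is an implicit character token: its meaning is "the letter Z".
\let\implicitZ=Z
\begin{document}
\testZ Z\par          % explicit character token Z
\testZ\implicitZ\par  % implicit character token: same meaning, same branch
\testZ Y\par          % different character
\end{document}
```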

Apart from that, if you only care about category 12 (other) and expandability is not an issue, you can (ab)use \kernel@ifnextchar just to copy the meaning of the next non-space token to \@let@token, and then have TeX compare \@let@token via \ifcat to some character token of category 12 (other):

\documentclass{article}
\makeatletter
\newcommand*\wanted{%
  % Use \kernel@ifnextchar for copying the meaning of the next non-space to \@let@token:
  \kernel@ifnextchar{\relax}%
                    {\ifcat\noexpand\@let@token!\expandafter\dothis\else\expandafter\dothat\fi}%
                    {\ifcat\noexpand\@let@token!\expandafter\dothis\else\expandafter\dothat\fi}%
}%
\makeatother
% Let's define \dothis and \dothat to neutralize the next token by stringifying it:
\newcommand\dothis{\bigskip \noindent Category of next non-space is 12.\\The next non-space is: \string}
\newcommand\dothat{\bigskip \noindent Category of next non-space differs from 12.\\The next non-space is: \string}
% Helper-macro for getting several spaces:
\newcommand\MultiplySecond[2]{#1#2#2#2#2#2#2#2#2#2}%
\begin{document}
\ttfamily\frenchspacing
\wanted\LaTeX
\wanted\fi
\wanted\endcsname
\wanted $
\wanted{
\wanted}
\wanted !
%Let's get several space tokens behind \wanted:
\MultiplySecond\wanted{ }\LaTeX
\MultiplySecond\wanted{ }\fi
\MultiplySecond\wanted{ }\endcsname
\MultiplySecond\wanted{ }$
\MultiplySecond\wanted{ }{
\MultiplySecond\wanted{ }}
\MultiplySecond\wanted{ }!
\end{document}
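
\kernel@ifnextchar skips spaces before it looks at the next token. If you want the test to see a following space token instead (the question mentions it "might need to gobble space (or not)"), one possible sketch uses \futurelet directly with the same \ifcat\noexpand comparison. \IfNextOther and its helper macros are hypothetical names, not part of any package, and the sketch only tests for category 12. The last line should take the "not other" branch because a space token follows the second argument.

```latex
\documentclass{article}
\makeatletter
% Hypothetical: #1 if the very next token has category 12, #2 otherwise.
\newcommand\IfNextOther[2]{%
  \def\IfNextOther@yes{#1}%
  \def\IfNextOther@no{#2}%
  % Copy the meaning of the next token to \@let@token without consuming it.
  % No space skipping happens here, unlike \kernel@ifnextchar.
  \futurelet\@let@token\IfNextOther@test
}
\newcommand\IfNextOther@test{%
  % \noexpand keeps a macro or conditional meaning from being expanded.
  \ifcat\noexpand\@let@token!%
    \expandafter\IfNextOther@yes
  \else
    \expandafter\IfNextOther@no
  \fi
}
\makeatother
\begin{document}
\IfNextOther{[other] }{[not other] }!\par
\IfNextOther{[other] }{[not other] }a\par
\IfNextOther{[other] }{[not other] } !\par
\end{document}
```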

Thanks. Will have a look. But I must say, until you mentioned it, I had no idea that there were "implicit" and "explicit" character tokens. — rallg, Oct 06 '23 at 17:29
An implicit character token is a control sequence (a control word token or a control symbol token) or an active character that has been made equal to a character token via \let. E.g., after something like \let\ControlWordToken=x, the token \ControlWordToken is an implicit character x of category 11 (letter), and xy yields the same output as \ControlWordToken y. — Ulrich Diez, Oct 06 '23 at 17:40
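
To see the difference between a token's meaning and its "shape" in practice, here is a small sketch reusing \ControlWordToken from the comment above (everything else is illustrative). \meaning and \ifx cannot tell \ControlWordToken from an explicit x, whereas \string reveals that one is a control word and the other a character.

```latex
\documentclass{article}
\let\ControlWordToken=x % implicit character: meaning "the letter x"
\begin{document}
\ttfamily % so that \string shows the backslash
\meaning\ControlWordToken\par  % same meaning as ...
\meaning x\par                 % ... an explicit x
\ifx\ControlWordToken x equal meaning\else different meaning\fi\par
\string\ControlWordToken\par   % but the "shape" differs: a control word
\string x\par                  % versus a character token
\end{document}
```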
