This answer does not focus on peeking ahead at the next token of the token stream.
Instead, it focuses on finding out about the first token of an undelimited macro argument, i.e., about the first token inside {...} in case the undelimited argument consists of several tokens nested between curly braces.
Due to the 30000-character limit for answers I needed to split my answer into two parts.
This is part 1 of my answer.
Part 1 of my answer holds general explanations of the workings of TeX and of the code/working example provided in part 2.
Part 2 of my answer holds the working example.
In case you wish to upvote, please upvote only one part of my answer. This prevents unfair reputation gain.
In case you wish to downvote, downvote whichever part of my answer you wish to downvote.
General considerations:
Let's note that there are 16 category codes:
0 = Escape character, normally \ .
1 = Begin grouping, normally { .
2 = End grouping, normally } .
3 = Math shift, normally $ .
4 = Alignment tab, normally & .
5 = End of line, normally <carriage return> .
6 = Parameter, normally # .
7 = Superscript, normally ^ .
8 = Subscript, normally _ .
9 = Ignored character, normally <null> .
10 = Space, normally <space> and <horizontal tab> .
11 = Letter, normally only contains the letters a,...,z and A,...,Z. These characters can be used in command names.
12 = Other, normally everything else not listed in the other categories.
13 = Active character, for example ~ .
14 = Comment character, normally % .
15 = Invalid character, normally <delete> .
Of these, only 1, 2, 3, 4, 6, 7, 8, 10, 11, 12 and 13 can occur as categories of character tokens.
Let's note that both implicit and explicit character tokens have categories.
E.g., after
\let\egroup=}
the control word token \egroup is an implicit character token of category 2 (end group) and
\ifcat }\egroup same\else different\fi
yields same.
\ifcat does not distinguish explicit and implicit character tokens.
Therefore the routines in part 2 of this answer do not distinguish between explicit and implicit character tokens.
Category 13 requires special attention:
\ifcat usually triggers expansion of subsequent expandable tokens until two non-expandable tokens are obtained where comparison of categories can be done.
In case an active character via \let is made equal to a non-expandable token, \ifcat treats it as if it were that non-expandable token.
In case an active character is undefined, or has been turned into an expandable token via \let or \def, TeX assumes category 13 for it in the comparison \ifcat\noexpand⟨(expandable or undefined) active character⟩... .
Therefore with \ifcat you can single out undefined/expandable active characters: when their expansion is prevented via \noexpand, TeX does not treat them in the \ifcat-comparison as being of the same category as frozen-\relax. They are assumed to be of category 13, while frozen-\relax is assumed to be of category 16. Category 16 is TeX's internal way of saying that a token does not have a character category.
But with \ifcat
- you cannot distinguish expandable active character tokens from undefined active character tokens.
- you cannot distinguish (unexpandable) active characters which denote unexpandable non-characters from any other unexpandable or \noexpand-expansion-prevented control-word/symbol-tokens that are not implicit characters.
- you cannot distinguish (unexpandable) active characters which are implicit characters and thus denote non-active character tokens from non-active character tokens of the same category, be they explicit or implicit.
For making such distinctions expandably you would need a mechanism where each possible character token of category 13 occurs as the delimiter of a delimited argument. With traditional TeX engines, where the internal character representation is 8-bit and thus has only 256 code points, this might be feasible. With XeTeX- and LuaTeX-based engines it might be a problem, as here the internal character representation is Unicode, which has 1114112 code points (0 to 1114111). (Of course, with LuaTeX-based engines you can use the Lua backend for examining tokens.)
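A minimal sketch of the category-13 behaviour described above. It assumes that ~ is an active character defined as a macro (as in plain TeX and LaTeX); the active character ! is a hypothetical choice made only for this illustration:
% A \noexpand-ed expandable active character counts as category 13,
% while \relax does not count as a character at all:
\ifcat\noexpand~\relax same\else different\fi
yields different, whereas
\begingroup
\catcode`\!=13 % make ! active inside this group only
\let!=a% ! is now an implicit letter a
\ifcat\noexpand!b same\else different\fi
\endgroup
yields same, so \ifcat cannot tell this active character apart from a letter.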
Your scenario:
It seems you want a macro \ifnextcatcode. As it is a macro and not one of TeX's \if..-\else-\fi-tests, I suggest giving it a different name, one not beginning with \if.., in order not to mix up things where \if..-\else-\fi-matching applies with things where it does not apply.
I can offer a routine \CategoryOfArgumentsFirstToken which processes an undelimited macro argument and after triggering two expansion-steps returns a sequence of digit-tokens denoting the category of the very first token of the argument. You can use this with \ifnum. (Or for forking via delimited arguments.)
Above I used \egroup as an example as with undelimited arguments explicit braces must be balanced, so a token of category 2 can only be the very first token of a macro argument in case it is implicit.
Some cases require special attention:
- The argument being empty.
- The very first token of the argument being an explicit character of category 1 (begin group).
- The very first token of the argument being an explicit character of category 10 (space).
- Brace-hacks for getting \ifcat-tests for categories 1 and 2 into a macro where braces must be balanced.
- The very first token of the argument being an undefined active character or an expandable active character.
In the routine in part 2 of this answer frozen-\relax is used for testing against an unexpandable control sequence because frozen-\relax cannot be redefined to be an implicit character token or an \outer-token.
The very first token of the argument being a control sequence (i.e., an active character token or a control word token or a control symbol token) which is undefined is not a problem as with the \ifcat-tests \noexpand is used for preventing expansion.
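A minimal sketch of this, assuming \UndefinedMacro (a hypothetical name) has no definition:
% \noexpand makes the undefined control word behave like \relax
% for the duration of the comparison:
\ifcat\noexpand\UndefinedMacro\relax same\else different\fi
yields same and does not raise an "Undefined control sequence" error.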
In part 2 of this answer you find a routine \CategoryOfArgumentsFirstToken{⟨argument⟩} which produces the following digit-sequences (with no trailing spaces):
1 = The very first token of the argument is of category 1 (begin grouping).
2 = The very first token of the argument is of category 2 (end grouping).
3 = The very first token of the argument is of category 3 (math shift).
4 = The very first token of the argument is of category 4 (alignment tab).
6 = The very first token of the argument is of category 6 (parameter).
7 = The very first token of the argument is of category 7 (superscript).
8 = The very first token of the argument is of category 8 (subscript).
10 = The very first token of the argument is of category 10 (space).
11 = The very first token of the argument is of category 11 (letter).
12 = The very first token of the argument is of category 12 (other).
13 = The very first token of the argument is an undefined active character token or an active character token denoting an expandable control sequence.
16 = The very first token of the argument either is a control-word/symbol-token not denoting an implicit character or is an active-character-token denoting an unexpandable control-word/symbol-token.
17 = The argument is empty, so there is no very first token.
You can use it for \ifnum-tests like
\ifnum\CategoryOfArgumentsFirstToken{!bla}=12 %
Argument has a first token whose category is 12(other).
\else
Argument does not have a first token whose category is 12(other).
\fi
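Going by the table above, and assuming that the definitions from part 2 are loaded and that LaTeX's default category codes are in effect, one would expect results like these (a sketch derived from the table, not captured output):
\CategoryOfArgumentsFirstToken{abc}% expected: 11 (letter)
\CategoryOfArgumentsFirstToken{$x$}% expected: 3 (math shift)
\CategoryOfArgumentsFirstToken{{a}b}% expected: 1 (explicit begin group)
\CategoryOfArgumentsFirstToken{ a}% expected: 10 (explicit space)
\CategoryOfArgumentsFirstToken{\egroup}% expected: 2 (implicit end group)
\CategoryOfArgumentsFirstToken{~}% expected: 13 (expandable active character)
\CategoryOfArgumentsFirstToken{\relax}% expected: 16 (not a character)
\CategoryOfArgumentsFirstToken{}% expected: 17 (empty argument)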
The ⟨argument⟩ of \CategoryOfArgumentsFirstToken can consist of a control sequence which came into being in the course of peeking ahead at the token-stream's next token via \futurelet or \@ifnextchar/\kernel@ifnextchar.
Here \futurelet is used for peeking ahead at the next token:
% Needs \makeatletter to be in effect because \@let@token is used.
\newcommand\wanted[1]{%
\edef\scratchnum{\unexpanded{#1}}%
\futurelet\@let@token\wantedinner
}%
\newcommand*\wantedinner{%
\ifnum\CategoryOfArgumentsFirstToken{\@let@token}=\numexpr(\scratchnum)\relax
Argument has a first token whose category is \the\numexpr(\scratchnum)\relax.
\else
Argument does not have a first token whose category is \the\numexpr(\scratchnum)\relax.
\fi
}%
\wanted{12}$
\wanted{12}A
\wanted{11}A
Here \@ifnextchar/\kernel@ifnextchar, which copies the meaning of the next non-space-token to \@let@token, is used for peeking ahead at the next non-space-token:
% Needs \makeatletter to be in effect because \@ifnextchar and \@let@token are used.
\newcommand\wanted[1]{%
\edef\scratchnum{\unexpanded{#1}}%
\@ifnextchar{\relax}{\wantedinner}{\wantedinner}%
}%
\newcommand*\wantedinner{%
\ifnum\CategoryOfArgumentsFirstToken{\@let@token}=\numexpr(\scratchnum)\relax
Argument has a first token whose category is \the\numexpr(\scratchnum)\relax.
\else
Argument does not have a first token whose category is \the\numexpr(\scratchnum)\relax.
\fi
}%
\wanted{12} $
\wanted{12} A
\wanted{11} A
But with the \futurelet- or \@ifnextchar/\kernel@ifnextchar-approach you cannot, in addition to the restrictions already mentioned, distinguish active characters in the token stream from other kinds of control sequences in the token stream. The reason is that the token which \CategoryOfArgumentsFirstToken looks at is never an active character token but always the control word token \@let@token.
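To illustrate this restriction: with \wanted defined as above, and assuming ~ is an active character defined as a macro (as in LaTeX), one would expect the following behaviour (a sketch derived from the description, not captured output):
% \@let@token is a control word whose meaning is the macro ~,
% so the result is 16 although the next token is an active character:
\wanted{13}~% expected: the \else branch is taken
\wanted{16}~% expected: the true branch is taken
In both cases the ~ itself stays in the token stream and is processed afterwards.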
Besides this you find a routine
\CategoryOfArgumentsFirstTokenFork{⟨argument⟩}%
{⟨tokens in case argument has a first token of category 1⟩}%
{⟨tokens in case argument has a first token of category 2⟩}%
{⟨tokens in case argument has a first token of category 3⟩}%
{⟨tokens in case argument has a first token of category 4⟩}%
{⟨tokens in case argument has a first token of category 6⟩}%
{⟨tokens in case argument has a first token of category 7⟩}%
{⟨tokens in case argument has a first token of category 8⟩}%
{⟨tokens in case argument has a first token of category 10⟩}%
{⟨tokens in case argument has a first token of category 11⟩}%
{⟨tokens in case argument has a first token of category 12⟩}%
{⟨tokens in case argument has a first token which is an undefined active character or an expandable active character⟩}%
{⟨tokens in case argument has a first token which either is not a character or is an active character that is equal to an unexpandable non-character⟩}%
{⟨tokens in case argument is empty⟩}%
This routine, too, can be used with an ⟨argument⟩ which consists of a control sequence which came into being due to peeking ahead at the token-stream's next token via \futurelet or \@ifnextchar.
Internally that routine uses a routine
\KeepKthOfLArguments{⟨TeX-⟨number⟩-quantity with integer-value K⟩}%
{⟨TeX-⟨number⟩-quantity with integer-value L⟩}%
⟨list of L undelimited arguments⟩%
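A minimal usage sketch, assuming the definition from part 2 is loaded and that the routine behaves as described here, i.e., keeps the K-th of the L arguments that follow:
% K=2, L=3: of the three arguments {A}{B}{C} the second one should remain.
\KeepKthOfLArguments{2}{3}{A}{B}{C}% expected: B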
If you like, you can use that routine and within the ⟨TeX-⟨number⟩-quantity with integer-value K⟩-argument use \CategoryOfArgumentsFirstTokenFork for combining some cases, e.g.:
\KeepKthOfLArguments{%
\CategoryOfArgumentsFirstTokenFork{⟨argument⟩}%
{1}% category 1
{1}% category 2
{2}% category 3
{3}% category 4
{3}% category 6
{2}% category 7
{2}% category 8
{4}% category 10
{5}% category 11
{5}% category 12
{6}% expandable active character or undefined active character
{6}% non-character or active character equal to unexpandable non-character
{6}% empty
}{6}%
{The first token of the argument is some brace, probably implicit.}%
{The first token of the argument is something that is interesting when doing maths.}%
{The first token of the argument is an alignment tab or a parameter.}%
{The first token of the argument is an explicit space token.}%
{The first token of the argument is a letter or an other character.}%
{The argument does not have a first token which is a nice character token.}%
So you need sub-routines for
- testing whether an argument is empty;
\UD@CheckWhetherNull in the example in part 2 of this answer.
- testing whether an argument's first token is an explicit character token of category 1;
\UD@CheckWhetherBrace in the example in part 2 of this answer.
- testing whether an argument's first token is an explicit space token;
\UD@CheckWhetherLeadingExplicitSpace in the example in part 2 of this answer.
- extracting the first token of an argument in case the argument is not empty and the 1st token is not a space or a brace;
\UD@ExtractFirstArg in the example in part 2 of this answer.
- selecting the K-th of L arguments;
\KeepKthOfLArguments in the example in part 2 of this answer.
- doing \ifcat-tests and selecting the 1st or the 2nd of two subsequent arguments;
\UD@CheckWhetherCategoriesEqual in the example in part 2 of this answer.
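To give an idea of what the first of these sub-routines does, here is a minimal sketch of an expandable emptiness test. It is not \UD@CheckWhetherNull from part 2, which may be implemented differently. It is the well-known \detokenize-based test, so it assumes an engine with e-TeX extensions, and \MyIfEmpty is a hypothetical name:
\makeatletter
% \detokenize{#1} yields no tokens at all only if #1 is empty;
% in that case \if compares the two \relax tokens and is true.
\newcommand\MyIfEmpty[1]{%
  \if\relax\detokenize{#1}\relax
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi
}
\makeatother
\MyIfEmpty{}{empty}{not empty}% yields: empty
\MyIfEmpty{ }{empty}{not empty}% yields: not empty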
The example in part 2 of this answer is LaTeX 2ε, but you can easily turn it into Knuthian-TeX by doing \def instead of \newcommand and by just omitting the \@ifdefinable-tests.
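A minimal sketch of such a conversion, using the hypothetical helper \MyFirstOfTwo for illustration only. Note that \newcommand without a star defines \long macros, so the \def-version needs \long, and that plain TeX has no \makeatletter, so names containing @ need \catcode`\@=11 instead:
% LaTeX 2e form:
\newcommand\MyFirstOfTwo[2]{#1}
% Plain-TeX form with the same behaviour:
\long\def\MyFirstOfTwo#1#2{#1}
% A starred \newcommand*\MyFirstOfTwo[2]{#1} corresponds to \def without \long.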