Explicit character tokens have a character code, which is the number of the code point of the corresponding character in the TeX engine's internal character encoding scheme (ASCII with traditional TeX engines, Unicode with LuaTeX/XeTeX-based TeX engines), and one of the categories 1 (begin grouping), 2 (end grouping), 3 (math shift), 4 (alignment tab), 6 (parameter), 7 (superscript), 8 (subscript), 10 (space), 11 (letter), 12 (other), 13 (active).
Explicit character tokens of a category differing from 13(active) are not control sequences.
Explicit character tokens of category 13 (active) need special consideration: together with the control-sequence-tokens (which come in two flavors, "control-word-tokens" and "control-symbol-tokens"), they belong to the "control sequences".
[Properties of control-symbol-tokens:
- Names of control-symbol-tokens consist of a single character whose (!!!) current¹ category code is not 11(letter).
- A character of the source code whose category code at the time of tokenizing is 10(space) and which trails something that got tokenized as a control-symbol-token is not discarded but gets tokenized as an explicit space token (explicit character token of category 10(space), character code 32).
Exception: If the character that forms the name of the control-symbol-token has category code 10(space), as with the control space "\ ", then trailing characters of category code 10(space) are discarded.
- When a control-symbol-token is written unexpanded to an external text file, no space is appended.
Properties of control-word-tokens:
- Names of control-word-tokens either consist of several characters (not necessarily all of them having category code 11(letter)) or consist of a single character whose current category code is 11(letter).
However, for control-word-tokens that come into being by TeX reading and tokenizing .tex input, the name is obtained as follows: after encountering in the .tex input file the backslash (a character of category 0(escape)), which denotes that a control-sequence-token is to be created, TeX gathers from the current line of the .tex input file the characters whose current category code is 11(letter) until it encounters a character whose category code differs from 11.
Nevertheless, control-sequence-tokens, and thus also control-word-tokens, can come into being in other ways, too, e.g., by having TeX expand a \csname..\endcsname expression. This way multi-character control-word-tokens can come into being whose names also contain characters with category codes other than 11(letter).
- Characters of the source code whose category code at the time of tokenizing is 10(space) and which trail something that got tokenized as a control-word-token are discarded and don't yield any token.
- When a control-word-token is written unexpanded to an external text file, a space character is appended.
¹ When writing unexpanded to a file, you can toggle whether TeX treats a control-sequence-token whose name consists of a single character as a control-word-token (a space is appended) or as a control-symbol-token (no space is appended) by assigning the character that makes up the name of the control-sequence-token either category code 11(letter) or something other than 11. ]
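The properties listed above can be observed with a small plain-TeX fragment like the following. This is a minimal sketch, assuming a plain-TeX based engine such as pdfTeX; the macro names \cwdemo, \csdemo and \writeit are purely illustrative, and the comments show what I'd expect \message and \write16 to print on the terminal and in the log.

```tex
% Plain TeX sketch; \cwdemo, \csdemo, \writeit are illustrative names.
% Tokenizing: the space after a control word is skipped,
% the space after a control symbol becomes a space token.
\edef\cwdemo{\string\relax x}    % source tokens: \string, \relax, x
\edef\csdemo{\string\% x}        % source tokens: \string, \%, space, x
\message{\meaning\cwdemo}        % macro:->\relaxx
\message{\meaning\csdemo}        % macro:->\% x
% \csname can create a control word whose name contains non-letters:
\expandafter\def\csname foo:bar\endcsname{Hi}
\message{\expandafter\meaning\csname foo:bar\endcsname} % macro:->Hi
% Writing unexpanded: a space follows \relax, but not \%.
\immediate\write16{[\noexpand\relax\noexpand\%]}         % [\relax \%]
% Footnote: the catcode of @ at writing time decides about the space.
\def\writeit{\immediate\write16{[\noexpand\@]}}
\writeit                         % [\@]   (@ has catcode 12 in plain TeX)
{\catcode`\@=11 \writeit}        % [\@ ]
\bye
```

Note that \writeit is defined while @ has category 12, so \@ is stored as a control-symbol-token; only the category code in force when \write actually prints the token decides whether a space is appended.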
If the meaning of a control sequence is made equal via \let/\futurelet to the meaning of an explicit character token whose category is not 13(active), then that control sequence is called an "implicit character token".
If such a control sequence, whose meaning is made equal via \let/\futurelet to the meaning of an explicit character token whose category is not 13(active), is itself an explicit character token of category 13(active), then it is both an explicit character token and an implicit character token at the same time.
If the meaning of a control sequence (which can be an active character, a control-word-token or a control-symbol-token) is made equal via \let or \futurelet to the meaning of an explicit character token whose category is not 13(active), then \ifcat-, \if- and \ifx-comparisons involving that control sequence, as well as applying \meaning to it, give the same results as you get when comparing or applying \meaning with the explicit character token in place of the control sequence. With \ifcat and \if this is the case even if \noexpand is prepended to the control sequence in question: that control sequence is neither expandable nor undefined, so applying \noexpand has no effect. The same holds when doing something like \expandafter\ifx\noexpand⟨character-token that might be explicit or implicit⟩....
(I mention prepending \noexpand because in "Chapter 20: Definitions (also called Macros)" of the TeXbook you find an explanation of \ifcat which states that \noexpand is needed to suppress expansion when "looking" at active characters:
\ifcat⟨token1⟩⟨token2⟩ (test if category codes agree)
This is just like \if, but it tests the category codes, not the character codes. Active characters have category 13, but you have to say ‘\noexpand⟨active character⟩’ in order to suppress expansion when you are looking at such characters with \if or \ifcat. For example, after
\catcode`[=13 \catcode`]=13 \def[{*}
the tests ‘\ifcat\noexpand[\noexpand]’ and ‘\ifcat[*’ will be true, but the test ‘\ifcat\noexpand[*’ will be false.
Don't be confused by this. It refers to situations where active character tokens are expandable. But if a control sequence, e.g., an active character token, is turned into an implicit character token via \let/\futurelet, then it is not expandable.)
Thus \ifcat/\if/\ifx-comparison and examining the result of applying \meaning, which all focus on aspects of the meanings of tokens, are not useful for distinguishing explicit character tokens from their implicit counterparts.
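A minimal plain-TeX sketch of these comparisons, assuming a pdfTeX-like engine with the plain format (the name \impA is just illustrative): the implicit character token and the active character ~ that was \let to A behave like the explicit letter A, with or without \noexpand, whereas the expandable active [ from the TeXbook example needs \noexpand. The comments show the messages I'd expect.

```tex
% Plain TeX sketch; \impA is an illustrative name.
\let\impA=A                      % \impA: implicit character token
\message{\meaning\impA}          % the letter A
\ifx\impA A\message{ifx: true}\fi
\if\impA A\message{if: true}\fi
\ifcat\noexpand\impA A\message{ifcat: true}\fi % \noexpand changes nothing
\begingroup
  \let~=A                        % ~ is active in plain TeX; now implicit A
  \message{\meaning~}            % the letter A
  \ifcat\noexpand~A\message{active ifcat: true}\fi
\endgroup
\begingroup                      % contrast: TeXbook's expandable active [
  \catcode`[=13 \catcode`]=13 \def[{*}
  \ifcat\noexpand[\noexpand]\message{(a) true}\else\message{(a) false}\fi
  \ifcat[*\message{(b) true}\else\message{(b) false}\fi
  \ifcat\noexpand[*\message{(c) true}\else\message{(c) false}\fi
\endgroup
\bye
```

Cases (a) and (b) report true and (c) reports false, exactly as stated in the TeXbook quote, while every test involving \impA or the \let-ed ~ reports true.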
To some degree you can distinguish explicit from implicit character tokens by comparing the results of applying \string.
But there are edge cases where this is not possible.
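A short plain-TeX sketch of both situations (\impA and \X are illustrative names). Normally \string yields several tokens for the implicit character token, but with \escapechar outside the range of valid code points a one-letter control word and the explicit letter give the same result.

```tex
% Plain TeX sketch; \impA and \X are illustrative names.
\let\impA=A
\message{[\string A] [\string\impA]}   % [A] [\impA]: distinguishable
\begingroup
  \escapechar=-1                       % no escape character is printed
  \let\X=X                             % one-letter control word, implicit X
  \message{[\string X] [\string\X]}    % [X] [X]: not distinguishable
\endgroup
\bye
```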
In this context you might be interested in the answers to the question
Distinguish active characters from non-active pendants with expansion-methods only?, which I asked about four years ago.
And in the answers to the question Cases of different tokens having same meaning and same \string-representation that can occur in the stage of expansion, which I asked about a year ago.
The expl3 manuals (source3.pdf, interface3.pdf) introduce terminology for distinguishing properties/aspects of tokens:
It is important to distinguish two aspects of a token: its "shape" (for lack of a better word), which affects the matching of delimited arguments and the comparison of token lists containing this token, and its “meaning”, which affects whether the token expands or what operation it performs. One can have tokens of different shapes with the same meaning, but not the converse.
Using this terminology, one can say that implicit character tokens differ from their explicit counterparts in shape, but not in meaning.
As indicated in the quote, you can use delimited arguments to expandably distinguish specific explicit character tokens whose category is not 13(active) from their implicit counterparts.
But I know of no general expandable method that is 100% reliable for deciding whether an arbitrary character token is an explicit or an implicit character token.
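For one specific character, the delimited-argument approach can be sketched as follows. The macro \IfExplicitLetterA and its helpers are hypothetical names; the sketch assumes an e-TeX-based engine (pdfTeX, XeTeX, LuaTeX) for \detokenize and only handles a single-token argument, with no attempt to cope with braces, \outer tokens, alignments and the other complications mentioned in this answer.

```tex
% Sketch: is the (single-token) argument the explicit letter A, catcode 11?
\long\def\firstoftwo#1#2{#1}
\long\def\secondoftwo#1#2{#2}
\long\def\IfExplicitLetterA#1{\IfExplicitLetterAaux#1A\stop}
% The delimiter A matches only an explicit A of category 11, so #1 is
% empty exactly when the argument itself is that token.
\long\def\IfExplicitLetterAaux#1A#2\stop{%
  \if\relax\detokenize{#1}\relax % nothing in front of the delimiter?
    \expandafter\firstoftwo
  \else
    \expandafter\secondoftwo
  \fi}
\let\impA=A
\message{\IfExplicitLetterA{A}{yes}{no}}     % yes
\message{\IfExplicitLetterA{\impA}{yes}{no}} % no
\bye
```

The test is expandable because it only uses macro expansion, \detokenize and \if, which is why it works inside \message; but it is tied to the one token used as delimiter, which illustrates why this does not generalize to arbitrary character tokens.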
Theoretically, a mechanism could be implemented in which auxiliary macros crank out active-character-tokens and one-character-control-sequence-tokens by means of delimited arguments. But several things complicate this:
- For one thing, such a mechanism needs to be defined in a way where redefining some of these active-character-tokens/one-character-control-sequence-tokens to be \outer doesn't matter, as long as the user does not provide such \outer tokens as components of arguments her-/himself.
- For another thing, such a mechanism should be defined so that it does not break when used in alignments/tables.
- For yet another thing, consider that a traditional TeX engine's internal character encoding scheme is ASCII and that on common computer platforms a traditional TeX engine assumes input files to be encoded in some 8-bit encoding/byte encoding, so with traditional TeX engines the range of possible code-point-numbers of characters is 0..255. This could probably be handled. But the trend goes towards LuaTeX and XeTeX, which assume input files to be encoded in Unicode/UTF-8. So with these TeX engines the range of possible code-point-numbers of characters is 0..1114111. That's a lot.
- For yet one more thing, if you wish to expandably compare arbitrary token-lists, consider
  - even more edge cases, e.g., distinguishing frozen-\relax from non-frozen-\relax, frozen \font-control-sequences from their non-frozen counterparts, ...
  - token-lists where brace-groups and the like are nested.
  - token-lists where \if, \else, \fi and the like are not balanced.
  - ...
E.g., examining the result of applying \string usually relies on applying \string to control sequences yielding more than one explicit character token. This is not the case if the control sequence in question is an active character token or if the value of \escapechar is outside the range of valid code point numbers of characters while the name of the control sequence consists of a single character.
If examining is done by means of macros that process undelimited arguments, then the first and/or the second of these explicit character tokens being an explicit space token (category 10, character code 32) might be a problem, too. (Like \meaning etc., \string delivers explicit character tokens of category 12(other), with one exception: character tokens of character code 32 delivered by things like \string/\meaning/\detokenize/etc. are always of category 10(space) and thus are explicit space tokens, which may be discarded while TeX gathers the first token belonging to an undelimited macro argument.)
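These cases can be seen in a small plain-TeX sketch; \firsttoken is a hypothetical helper that grabs the first undelimited argument of what follows. With \escapechar=-1, \string applied to the control space yields nothing but a space token, which is skipped when TeX looks for the undelimited argument.

```tex
% Plain TeX sketch; \firsttoken is an illustrative helper.
\long\def\firsttoken#1#2\stop{[#1]}
\message{[\string\relax]}                  % [\relax]: several tokens
\message{[\string~]}                       % [~]: active character, one token
\begingroup
  \escapechar=-1                           % outside the valid range
  \message{[\string\relax] [\string\%]}    % [relax] [%]
  % \string\ now yields just a space token; the undelimited argument
  % #1 of \firsttoken skips it, so x is grabbed instead:
  \message{\expandafter\firsttoken\string\ x\stop} % [x]
\endgroup
\bye
```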
Applying \string to the nameless control sequence [which can come into being by expanding \csname\endcsname or by ending a line of .tex source code with a backslash (a character of category 0(escape)) while \endlinechar has a negative value] yields
- in case the value of \escapechar is not in the range of valid code-point-numbers of characters: a sequence of explicit category-12(other) character tokens csnameendcsname,
- in case the value of \escapechar is 32: an explicit space token (explicit character token of category 10 and character code 32), trailed by a sequence of explicit category-12(other) character tokens csname, trailed by an explicit space token, trailed by a sequence of explicit category-12(other) character tokens endcsname,
- in case the value of \escapechar is anything else within the range of valid code-point-numbers of characters: an explicit character token of category 12(other) whose character code equals the value of \escapechar, trailed by a sequence of explicit category-12(other) character tokens csname, trailed by an explicit character token of category 12(other) whose character code equals the value of \escapechar, trailed by a sequence of explicit category-12(other) character tokens endcsname.
You get the same by applying \string to a different token which comes into being by doing \csname csname\string\endcsname\endcsname.
If both the nameless control sequence and that token are made equal via \let or \futurelet, e.g., to the same explicit character token whose category is not 13, then distinguishing them by expansion methods is tricky and can probably only be done by cranking things out via macros that process delimited arguments.
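A plain-TeX sketch of the nameless control sequence and its look-alike (\shownameless, \strA and \strB are illustrative names): it shows the \string results for different values of \escapechar, then that the control word named csname\endcsname has the same \string representation, and finally that after \let-ing both to A their meanings agree, too.

```tex
% Plain TeX sketch; \shownameless, \strA, \strB are illustrative names.
\def\shownameless{\message{[\expandafter\string\csname\endcsname]}}
\shownameless                    % [\csname\endcsname]
{\escapechar=-1 \shownameless}   % [csnameendcsname]
{\escapechar=32 \shownameless}   % [ csname endcsname]
{\escapechar=`\/ \shownameless}  % [/csname/endcsname]
% The control word whose name is csname\endcsname looks the same:
\edef\strA{\expandafter\string\csname\endcsname}
\edef\strB{\expandafter\string\csname csname\string\endcsname\endcsname}
\ifx\strA\strB \message{same string representation}\fi
% After \let-ing both to the same letter, \meaning agrees as well:
\expandafter\let\csname\endcsname=A
\expandafter\let\csname csname\string\endcsname\endcsname=A
\message{\expandafter\meaning\csname\endcsname}                         % the letter A
\message{\expandafter\meaning\csname csname\string\endcsname\endcsname} % the letter A
\bye
```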
Be aware that LuaTeX-based engines are a different matter: as \directlua is expandable, you can, e.g., use token.scan_toks() to get a brace-delimited set of tokens into a Lua table of tokens and then examine the properties of the tokens stored in that table. token.get_next() might also be of interest.
In TUGboat, Volume 36 (2015), No. 1, you find an article "Still tokens: LuaTeX scanners" by Hans Hagen about this.
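A minimal LuaTeX sketch of this idea, assuming the plain format and the token library as described in the LuaTeX manual (token.scan_toks() returning a table of tokens, and the token fields cmdname and csname); \impA is an illustrative name. For each token of the braced list it reports the command name and the control-sequence name, which should be absent for the explicit character token. Treat it as a sketch: details of the token library may differ between LuaTeX versions.

```tex
% LuaTeX only, e.g. "luatex demo.tex" with the plain format.
% No Lua "--" comments inside \directlua: its argument becomes one line.
\let\impA=A
\directlua{
  local toks = token.scan_toks()
  for i, t in ipairs(toks) do
    texio.write_nl("token " .. i .. ": cmdname=" .. t.cmdname
      .. ", csname=" .. (t.csname or "(none: explicit character)"))
  end
}{A\impA}
\bye
```

I'd expect both tokens to report the command name letter, while only \impA reports a csname, which is exactly the shape difference that the expansion-based methods have trouble detecting.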
\def\xx{a} then \xx is a macro expanding to a but if (as here) you define \let\xx=a then \xx is not a macro. So the question in your title is misleading – David Carlisle Dec 01 '22 at 11:45
soul package typeset it in a monotype font, and it fails if the font does not have some characters in the word) --" – user202729 Dec 01 '22 at 12:01
\directlua kind cuts through the Gordian Knot in one fell swoop in the case of LuaTeX. The flip side to the weirdness of TeX is that solving a seemingly (but not actually) simple task with some mind-boggling combination of conditionals and other stuff after racking your brain on it can be oddly satisfying. – Taederias Dec 01 '22 at 12:53