
I'm still trying (again) to not succumb to the TeX syntax.

I'd like to create an "if/when" that checks whether a character is a letter (catcode 11).

 \def\whenletter#1{\expandafter\expandafter\expandafter\ifnum\getcatcode#1=11} 
 % I'm not comfortable using so many \expandafter's in one line. I wonder whether that even makes sense.
 % the issue is around the \getcatcode 
 \def\getcatcode#1{\the\catcode`\#1} %fails: \# is the # itself
 \def\getcatcode#1{\the\catcode`\\#1} %fails: \\ has no useful meaning in this case
 \def\getcatcode#1{\the\catcode`\{{##1}}} %fails: \{ generates a syntax error
 \def\getcatcode#1{\the\catcode`\\{{##1}}} %fails: kill my self!

For an expert this is trivial, but even after many tries I couldn't define \getcatcode properly. I would like to keep the code legible.

My main intention is to better understand TeX syntax and its possibilities.

Is it possible to write \getcatcode more or less in this manner? If yes, how?

Thanks in advance!

  • You wrote charcode, not catcode. Which one are you trying to retrieve? For the catcode, \def\getcatcode#1{\the\catcode`#1 } should do – Phelype Oleinik Sep 08 '22 at 00:18
  • For learning reference, you may want to start with reading the TeXbook & other resources in package writing - Where do I start LaTeX programming? - TeX - LaTeX Stack Exchange – user202729 Sep 08 '22 at 01:03
  • Be more precise - due to TeX's different stages of processing you need to be picky when phrasing things: What do you mean by "character"? A character occurring in the .tex-input file and seen by TeX's eyes and placed/copied into TeX's mouth? A character-token created in TeX's mouth, where tokenizing is done, and sent down TeX's gullet, where expandable tokens are expanded in a process of regurgitation so that unexpandable tokens reach the stomach and subsequent digestive organs for being carried out/executed/processed there? (All macro expansion is done in the gullet and applies to tokens.) – Ulrich Diez Sep 08 '22 at 02:00
  • The difference is: Characters and character tokens exist in different stages of processing. A character is not a character token. A character exists before tokenization and does not have a category. A character token exists after tokenization and has a category and a character code which corresponds to the number of the code point of the character in question in TeX's internal character representation scheme which either is ASCII (traditional TeX) or is Unicode (LuaTeX/XeTeX). The category is assigned to a character token during tokenization according to the current category-code-régime. – Ulrich Diez Sep 08 '22 at 02:12
  • So the question is whether the focus is on info on future behavior during tokenization or whether the focus is on having info about properties of tokens that already came into being by tokenization or came into being during expansion as expandable tokens' replacement-texts and during the stage of expansion can be components of arguments of macros/expandable primitives and in subsequent stages can be components of arguments of non-expandable primitives. – Ulrich Diez Sep 08 '22 at 02:19
  • A silly mistake: I wrote \charcode instead of \catcode. Sorry. The correction has been made. – Daniel Bandeira Sep 08 '22 at 10:10

2 Answers


\charcode is undefined.
I suppose you wish to use \catcode.

Never give a macro a name that begins with \if like \ifletter.
You will quickly reach a point where you cannot distinguish such macros from TeX's \if...-primitives and things defined in terms of \newif.
But the two kinds need to be distinguished from each other: when TeX does its \if..\else..\fi-matching during \if..\else..\fi-branching, instances of TeX's \if...-primitives and \newif-thingies are taken into account, while instances of macros are not.
So if you don't stick to the convention of only giving \if..-primitives/\newif-thingies names that begin with \if.., you might easily be confused about what is/is not taken into account by TeX's \if..\else..\fi-matching.

If you really wish to find out whether a specific character is currently assigned category code 11, you can do so via \ifnum\the\catcode`#1=11 ... \else ... \fi, where #1 is an alphabetic constant, i.e., either a one-letter-control-sequence or an explicit character token. Note that this is information about the behavior of TeX's reading- and tokenizing-apparatus, not information about properties of a token that already exists: it says that TeX will in the future create an explicit character token of category 11(letter) when encountering an instance of that character while reading from the .tex-input-file and creating tokens from the characters just read.

This check only tells you how the reading- and tokenizing-apparatus is currently adjusted, namely which category explicit character tokens will have when they come into being by tokenizing subsequent instances of the character in question while reading and tokenizing the .tex-input file.

This check does not tell you anything about properties of character tokens that already exist. It does not tell you which category an already tokenized character token has.
In order to find out about the latter you can use \ifcat.
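
A minimal sketch of that difference, assuming plain TeX with its default category codes (the macro name \savedtoken is made up for this illustration): an A is tokenized first, and only afterwards is the category code of A changed. \catcode then reports the new setting, while \ifcat still sees the stored token as a letter.

```tex
\def\savedtoken{A}% this A is tokenized now, as category 11 (letter)
\catcode`\A=12 % from here on, newly read A's get category 12 (other)
% \catcode reports the current setting of the tokenizer:
\message{Current catcode of A: \the\catcode`\A.}
% \ifcat looks at the token stored earlier; its category did not change:
\message{The saved token is\expandafter\ifcat\savedtoken a a letter\else not a letter\fi.}
\catcode`\A=11 % restore the default
\bye
```

If this behaves as described above, the first message should report 12 and the second should still call the saved token a letter.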

If you wish to test whether an already tokenized explicit character-token is of category 11(letter), and you can rely on TeX's default-settings for category codes and thus can rely on category code 11(letter) being assigned to the character A, so that characters A in the .tex-input file will be tokenized as explicit character tokens of category 11(letter) you can do something like this:

\long\def\firstoftwo#1#2{#1}%
\long\def\secondoftwo#1#2{#2}%
\def\CheckWhetherTokenOfletterCategory#1{%
   \ifcat A\noexpand#1\expandafter\firstoftwo\else\expandafter\secondoftwo\fi
}%

\newlinechar=`^^J
\message{^^JThe token + \CheckWhetherTokenOfletterCategory{+}{is}{is not} of category 11(letter).}
\message{^^JThe token $ \CheckWhetherTokenOfletterCategory{$}{is}{is not} of category 11(letter).}
\message{^^JThe token A \CheckWhetherTokenOfletterCategory{A}{is}{is not} of category 11(letter).}
\bye

(When it comes to forking/branching I often do the \expandafter\firstoftwo/\expandafter\secondoftwo-trick because this way the user-provided argument does not occur within the \if..\else..\fi-expression. User-provided arguments occurring within \if..\else..\fi-expressions might cause problems with proper \if..\else..\fi-matching in case the user-provided argument contains unbalanced \if.. or \else or \fi-tokens or the like.)
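
As a small illustration of that trick in isolation, here is a sketch for plain TeX. The macro \IfZero is a hypothetical example and not part of the code above. The conditional only selects \firstoftwo or \secondoftwo and is completely finished before either of the two user-provided branches is touched.

```tex
\long\def\firstoftwo#1#2{#1}%
\long\def\secondoftwo#1#2{#2}%
% Hypothetical helper: #1 is a number; the two arguments that follow
% the call are the "true" and the "false" branch.
\def\IfZero#1{%
  % \expandafter removes \else...\fi (or \fi) before the selector macro
  % grabs the two branches, so the branches never sit inside \if...\fi.
  \ifnum#1=0 \expandafter\firstoftwo\else\expandafter\secondoftwo\fi
}%
\message{\IfZero{0}{zero}{nonzero}}
\message{\IfZero{7}{zero}{nonzero}}
\bye
```

The first call should leave zero in the message and the second nonzero.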

If for some reason you cannot rely on TeX's default-settings for category codes and thus cannot rely on category code 11(letter) being assigned to character A and thus cannot rely on explicit character tokens that come into being while tokenizing the input-character A being of category 11(letter), you can adjust the category-code-régime within a local scope before having TeX read the argument from the .tex-file and tokenize it:

\begingroup
% Within the local scope/group do adjustments to the reading- and
% tokenizing-apparatus, i.e., adjustments to the category-code-régime,
% adjustments to the parameter `\endlinechar`:
\def\firstofone#1{#1}%
\catcode`\A=11 %<-Just to make sure...
\firstofone{% \firstofone ensures tokenization of its argument while
            % catcode-changes etc introduced within the local scope/
            % group are in effect. Then in the gullet the 
            % replacement of \firstofone , i.e., the set of tokens
            % that forms the argument, is delivered. It contains the
            % token \endgroup which makes it into the stomach and
            % ends the group so that catcode-changes etc are not in
            % effect any more when tokenizing stuff that follows
            % \firstofone's argument.
  \endgroup
  \long\def\firstoftwo#1#2{#1}%
  \long\def\secondoftwo#1#2{#2}%
  \def\CheckWhetherTokenOfletterCategory#1{%
     \ifcat A\noexpand#1\expandafter\firstoftwo\else\expandafter\secondoftwo\fi
  }%
}%
\newlinechar=`\^^J
\message{^^JThe token + \CheckWhetherTokenOfletterCategory{+}{is}{is not} of category 11(letter).}
\message{^^JThe token $ \CheckWhetherTokenOfletterCategory{$}{is}{is not} of category 11(letter).}
\message{^^JThe token A \CheckWhetherTokenOfletterCategory{A}{is}{is not} of category 11(letter).}
\bye

Console output:

The token + is not of category 11(letter).

The token $ is not of category 11(letter).

The token A is of category 11(letter).
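
The \begingroup...\firstofone{\endgroup...} pattern can also be observed with a different character. The following is only a sketch, assuming plain TeX, where @ has category code 12 by default; the macro name \my@name is invented for the illustration. The argument of \firstofone is tokenized while @ is a letter, so \my@name is a single control word even though the group has already ended when \def is carried out.

```tex
\begingroup
\catcode`\@=11 % inside the group, @ is tokenized as a letter
\def\firstofone#1{#1}%
\firstofone{% the whole argument is tokenized under the changed catcodes
  \endgroup % ends the group; the catcode of @ is restored
  \def\my@name{frozen}% already tokenized as the single control word \my@name
}%
% Outside the group, @ is category 12 again ...
\message{@ now has catcode \the\catcode`\@.}
% ... but the macro defined above exists under the name my@name:
\message{\csname my@name\endcsname}
\bye
```

Because \endgroup is executed before \def, the definition is made outside the group and survives it.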

If you really need to know which category explicit character tokens will have when coming into being due to tokenizing subsequent instances of the character in question while reading and tokenizing the .tex-input file, I suggest doing something like this:

\long\def\firstoftwo#1#2{#1}%
\long\def\secondoftwo#1#2{#2}%
\def\getCurrentCategoryCode#1{\the\catcode`#1}
\def\AtIfSubsequentlyTokenizedCharactersWillHaveCategoryLetter#1{%
  \ifnum\getCurrentCategoryCode{#1}=11 %
  \expandafter\firstoftwo\else\expandafter\secondoftwo\fi
}%
% Without \getCurrentCategoryCode:
%
%\def\AtIfSubsequentlyTokenizedCharactersWillHaveCategoryLetter#1{%
%  \ifnum\the\catcode`#1=11 %
%  \expandafter\firstoftwo\else\expandafter\secondoftwo\fi
%}%
\newlinechar=`\^^J
\message{^^J\getCurrentCategoryCode{#}}
\message{^^J\getCurrentCategoryCode{-}}
\message{^^J\getCurrentCategoryCode{+}}
\message{^^J\getCurrentCategoryCode{\%}}
\message{^^J\getCurrentCategoryCode{\\}}
\message{^^J\getCurrentCategoryCode{\{}}
\message{^^J\getCurrentCategoryCode{\}}}
\message{^^J\getCurrentCategoryCode{A}}
\message{^^JSubsequent instances of the hash-character occurring in the .tex-input-file will 
           \AtIfSubsequentlyTokenizedCharactersWillHaveCategoryLetter{#}{be}{not be} tokenized as explicit character tokens of letter category.}
\message{^^JSubsequent instances of the character - occurring in the .tex-input-file will 
           \AtIfSubsequentlyTokenizedCharactersWillHaveCategoryLetter{-}{be}{not be} tokenized as explicit character tokens of letter category.}
\message{^^JSubsequent instances of the character + occurring in the .tex-input-file will 
           \AtIfSubsequentlyTokenizedCharactersWillHaveCategoryLetter{+}{be}{not be} tokenized as explicit character tokens of letter category.}
\message{^^JSubsequent instances of the percent-character occurring in the .tex-input-file will 
           \AtIfSubsequentlyTokenizedCharactersWillHaveCategoryLetter{\%}{be}{not be} tokenized as explicit character tokens of letter category.}
\message{^^JSubsequent instances of the backslash character occurring in the .tex-input-file will 
           \AtIfSubsequentlyTokenizedCharactersWillHaveCategoryLetter{\\}{be}{not be} tokenized as explicit character tokens of letter category.}
\message{^^JSubsequent instances of the curly left brace occurring in the .tex-input-file will 
           \AtIfSubsequentlyTokenizedCharactersWillHaveCategoryLetter{\{}{be}{not be} tokenized as explicit character tokens of letter category.}
\message{^^JSubsequent instances of the curly right brace occurring in the .tex-input-file will 
           \AtIfSubsequentlyTokenizedCharactersWillHaveCategoryLetter{\}}{be}{not be} tokenized as explicit character tokens of letter category.}
\message{^^JSubsequent instances of the character A occurring in the .tex-input-file will 
           \AtIfSubsequentlyTokenizedCharactersWillHaveCategoryLetter{A}{be}{not be} tokenized as explicit character tokens of letter category.}
\bye

Console output:

6 
12 
12 
14 
0 
1 
2 
11

Subsequent instances of the hash-character occurring in the .tex-input-file will not be tokenized as explicit character tokens of letter category.

Subsequent instances of the character - occurring in the .tex-input-file will not be tokenized as explicit character tokens of letter category.

Subsequent instances of the character + occurring in the .tex-input-file will not be tokenized as explicit character tokens of letter category.

Subsequent instances of the percent-character occurring in the .tex-input-file will not be tokenized as explicit character tokens of letter category.

Subsequent instances of the backslash character occurring in the .tex-input-file will not be tokenized as explicit character tokens of letter category.

Subsequent instances of the curly left brace occurring in the .tex-input-file will not be tokenized as explicit character tokens of letter category.

Subsequent instances of the curly right brace occurring in the .tex-input-file will not be tokenized as explicit character tokens of letter category.

Subsequent instances of the character A occurring in the .tex-input-file will be tokenized as explicit character tokens of letter category.

Ulrich Diez
  • You supposed correctly: I meant \catcode. This solution (\catcode`\A=11 inside a group) seems somewhat redundant. You pass in information that is then retrieved again, you need to create a group to make the code portable, and you lose legibility. This is exactly what I was not looking for. Anyway, your answer is really useful and rich in information; it has exposed misconceptions I was holding. Thank you!

    PS: I do not yet realize that the approach I want is impossible. I will soon :-)

    – Daniel Bandeira Sep 08 '22 at 10:26
  • I have a doubt about your first snippet: the \firstofone ends the group before the safe catcode value of A is used. In my naive view, this should defeat the purpose of setting the catcode of A to 11. – Daniel Bandeira Sep 08 '22 at 10:49
  • applying \firstofone would tokenize its argument so the catcodes are frozen – plante Sep 08 '22 at 11:55
  • @DanielBandeira \firstofone in TeX's gullet first causes TeX to read and tokenize and deliver to the gullet the tokens that form \firstofone's argument. While tokens are created, categories are assigned to explicit character tokens. The category of an explicit character token won't change any more. Then, still in the gullet, \firstofone and its argument are replaced by the argument. Thus \endgroup, which is not expandable, reaches the stomach and ends the group, i.e., ensures that catcode-changes are not in effect any more when tokenizing things following \firstofone's argument. – Ulrich Diez Sep 08 '22 at 17:20
  • Another punch in the stomach from Mr TeX!

    It's not easy for me to accept that.

    – Daniel Bandeira Sep 09 '22 at 22:13

Let's say you want to test whether the next nonspace token is a letter, that is, category code 11. This won't work with {<more than one token>} after \isletter.

\newlinechar=`^^J % for newlines

\def\isletter#1{%
  TT\fi
  \ifcat\noexpand#1\relax
    % the next token is a control sequence
    \expandafter\testletchar
  \else
    \expandafter\testchar
  \fi
  #1%
}
\def\testletchar#1{%
  \ifcat\noexpand#1a%
}
\def\testchar#1{%
  \ifcat\noexpand#1a% \noexpand for active characters
}

\let\achar=a

\def\test#1{
  \if\isletter #1%
    \message{LETTER^^J}%
  \else
    \message{NONLETTER^^J}%
  \fi
}

\test{a}
\test{\achar}
\test{~}
\test{\hfuzz}
\test{$}
\test{b}

\bye

The output on the console is

a = LETTER

\achar = LETTER

~ = NONLETTER

\hfuzz = NONLETTER

$ = NONLETTER

b = LETTER

Doing an \ifnum test would fail with \achar.

If you want to check also for braces, you need something more complicated with \futurelet.
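
A rough sketch of what such a \futurelet approach could look like in plain TeX follows. The names \peekisletter, \peekedtoken and \reportpeeked are made up here and this is not necessarily the code the answer has in mind. \futurelet makes \peekedtoken an implicit copy of the next token without absorbing it as a macro argument, so a following brace can be inspected as well.

```tex
% Hypothetical names throughout; a sketch, not a robust implementation.
\def\peekisletter{\futurelet\peekedtoken\reportpeeked}
\def\reportpeeked{%
  % \peekedtoken now has the meaning of the token that follows;
  % \noexpand guards against it being a macro or an active character.
  \ifcat\noexpand\peekedtoken a%
    \message{LETTER}%
  \else
    \message{NONLETTER}%
  \fi
}
\peekisletter a
\peekisletter {}
\peekisletter +
\bye
```

Note that the inspected token is not removed: after the test, the a, the empty group and the + are still processed by TeX as usual.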

egreg