Expansion rule for a character redefined with `\newunicodechar`

Question

Consider the following code. With xelatex or lualatex, it produces three identical results. With pdflatex, however, it raises an error.

The main "macro" is the Unicode character ᵢ, defined as a command with \newunicodechar. For comparison, there are two other macros with similar definitions:

The first is a macro \testA, which is used directly; one \expandafter is needed for \@ifnextchar to work properly.
The second is a macro \testB, which is later called through \csname...\endcsname; according to this answer, three \expandafters are needed here.
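To see why the number of \expandafters matters, the small test below logs what a probe macro receives after one and after three \expandafters. \myprobe and \mytarget are hypothetical helpers, and this \testB is a simplified stand-in for the one in the question; the test should compile with any of the three engines.

```latex
\documentclass{article}
% Hypothetical helpers for this test only:
% \myprobe logs the single token (or braced group) that follows it.
\def\myprobe#1{\message{[\detokenize{#1}]}}
\def\mytarget{X}
\def\testB{\mytarget}% simplified stand-in for the question's \testB
\begin{document}
% One \expandafter: only \csname testB\endcsname is expanded, giving \testB
\expandafter\myprobe\csname testB\endcsname % logs [\testB ]
% Three: the first \expandafter triggers the third, which turns the \csname
% construct into \testB; the second then expands \testB once into \mytarget
\expandafter\expandafter\expandafter\myprobe\csname testB\endcsname % logs [\mytarget ]
\end{document}
```

With one \expandafter the probe only sees the unexpanded \testB; with three it sees the first token of the replacement text of \testB, which is what \@ifnextchar needs to find \@unisubA.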

With a Unicode engine, the character version seems to behave like \testA. With pdflatex, however, neither one nor three \expandafters work.

\documentclass[border=5pt]{standalone}
\usepackage{newunicodechar}
\makeatletter
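\newif\if@unisub\@unisubfalse
% open a subscript group unless one is already open
\newcommand{\@unisubA}{\if@unisub\else\sb\bgroup\fi}
% if \@unisubA comes next, keep the group open; otherwise close it
\newcommand{\@unisubB}{\@ifnextchar\@unisubA{\@unisubtrue}{\egroup\@unisubfalse}}
% one \expandafter: expand the following ᵢ or \testA once, so \@ifnextchar sees \@unisubA
\newunicodechar{ᵢ}{\@unisubA i \expandafter\@unisubB}
\def\testA{\@unisubA i \expandafter\@unisubB}
% three \expandafters: turn \csname testB\endcsname into \testB, then expand \testB once
\def\testB{\@unisubA i \expandafter\expandafter\expandafter\@unisubB}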
\makeatother
\begin{document}
$ᵢᵢᵢ$
$\testA \testA \testA$
$\csname testB\endcsname \csname testB\endcsname \csname testB\endcsname$
\end{document}

What is the expansion rule for a character defined with \newunicodechar?

I happen to have an implementation that interprets superscripts and subscripts in pdflatex and correctly supports sequences of superscript/subscript characters here (code written long ago; in retrospect I should not have used prop because it's rather slow) · user202729, Jul 02 '22 at 14:08
Side note × 2: the package only guarantees that the Unicode character does its task when executed, so I think it's a bad idea to rely on what happens when these characters are merely expanded. (Nevertheless, I think some internal code in expl3 assumes that inputenc utf8 is "csname-safe", something like that?) · user202729, Jul 02 '22 at 14:11

score 2 · Answer 1 · answered Jul 01 '22 at 17:33

With a Unicode engine, the command \newunicodechar{<char>}{<replacement>} essentially does

\catcode`<char>=\active
\protected\def<char>{<replacement>}

but in an indirect way that allows <char> to be used in <replacement> with its original category code.

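Thus <char> becomes a \protected macro. Being \protected only stops expansion in expansion-only contexts such as \edef and \write; \expandafter still expands <char> once, which is why ᵢ behaves like \testA.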
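A minimal sketch of the idea, assuming a hypothetical \mynewunicodechar that imitates (but is not) the package's code: the \lccode/\lowercase trick produces an active copy of the character and \protected-defines it. The test then shows that \edef leaves the character alone while \expandafter expands it. Compile with xelatex or lualatex.

```latex
% Compile with xelatex or lualatex
\documentclass{article}
% \mynewunicodechar is a hypothetical stand-in, NOT the code of newunicodechar:
% it makes the character active and defines it with \protected\def.
\newcommand\mynewunicodechar[2]{%
  \begingroup
  \lccode`\~=`#1\relax % ~ now lowercases to the given character ...
  \lowercase{\endgroup\protected\def~}{#2}% ... and stays active, so this defines it
  \catcode`#1=\active\relax % later occurrences are read as active characters
}
\mynewunicodechar{ᵢ}{X}
% \myprobe is a hypothetical helper that logs the token it receives
\def\myprobe#1{\message{[\detokenize{#1}]}}
\begin{document}
\edef\myx{ᵢ}\message{[\meaning\myx]}% \edef does not expand the \protected ᵢ
\expandafter\myprobe ᵢ % \expandafter does expand it: logs [X]
\end{document}
```

Because the replacement is tokenized as an argument before the category code changes, the character may appear inside its own replacement with its original category code.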

With pdflatex, things are different. If n is the hexadecimal code point of <char>, the above is essentially the same as

\DeclareUnicodeCharacter{n}{<replacement>}

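and \@ifnextchar<char> will fail miserably: with pdflatex, <char> is a multibyte UTF-8 sequence, that is, several tokens (ᵢ, U+1D62, is the three bytes E1 B5 A2), while \@ifnextchar looks at a single token only. So will any \expandafter trick. Well, maybe some cleverly applied \expandafter trickery might work.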
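One way to see the multibyte problem is to count the tokens that ᵢ produces. The sketch below uses hypothetical helper names (\mycounttoks, \mycountloop, \mycountend, \mytokcount) and only handles plain tokens, not braces or spaces; run it once with pdflatex and once with a Unicode engine and compare the log.

```latex
\documentclass{article}
% Hypothetical names, used only for this test
\newcount\mytokcount
\def\mycountend{\mycountend}% end marker: compared with \ifx, never expanded
% \mycounttoks{<tokens>} logs how many tokens it received (no braces, no spaces)
\def\mycounttoks#1{%
  \mytokcount=0
  \mycountloop#1\mycountend
  \message{[\the\mytokcount\space token(s)]}}
\def\mycountloop#1{%
  \ifx#1\mycountend
  \else
    \advance\mytokcount by 1
    \expandafter\mycountloop
  \fi}
\begin{document}
% pdflatex: 3 tokens (the bytes E1 B5 A2); xelatex or lualatex: 1 token
\mycounttoks{ᵢ}
\end{document}
```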

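You can examine <char> and decide whether it is a standard ASCII character or starts with a multibyte prefix (in UTF-8, a lead byte in the range C2 to DF, E0 to EF or F0 to F4 announces 2, 3 or 4 bytes respectively). In the latter case you can grab as many tokens as the prefix specifies, do the expansion and hope for the best.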
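A minimal sketch of the "examine and grab" idea, assuming pdflatex with UTF-8 input and that the next token is a single character token (not a brace group, a space or a multi-letter control sequence). \mygrabutf, its @-helpers, \mysubi and \mycheck are hypothetical names; the sketch does not validate the continuation bytes, and the expansion step is left to the callback.

```latex
% Compile with pdflatex; the byte ranges make no sense for a Unicode engine
\documentclass{article}
\makeatletter
% \mygrabutf{<callback>}<next token>: hypothetical helper, for illustration only.
% It checks the character code of the next token; for a UTF-8 lead byte it grabs
% the remaining bytes, so <callback> receives the whole character as one argument.
\newcommand\mygrabutf[2]{%
  \ifnum`#2<"C0 \expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
    {#1{#2}}% ASCII (or a stray continuation byte): one token
    {\ifnum`#2<"E0 \expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
      {\mygrabutf@two{#1}{#2}}% lead byte C2 to DF: 2 bytes in total
      {\ifnum`#2<"F0 \expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
        {\mygrabutf@three{#1}{#2}}% lead byte E0 to EF: 3 bytes
        {\mygrabutf@four{#1}{#2}}}}}% lead byte F0 to F4: 4 bytes
\def\mygrabutf@two#1#2#3{#1{#2#3}}
\def\mygrabutf@three#1#2#3#4{#1{#2#3#4}}
\def\mygrabutf@four#1#2#3#4#5{#1{#2#3#4#5}}
\makeatother
% Hypothetical callback: compare the grabbed bytes with a stored copy of ᵢ
\def\mysubi{ᵢ}
\def\mycheck#1{\def\mygot{#1}%
  \ifx\mygot\mysubi \message{[match]}\else \message{[no match]}\fi}
\begin{document}
\mygrabutf\mycheck ᵢ % logs [match]
\mygrabutf\mycheck x % logs [no match]
\end{document}
```

Comparing whole byte sequences with \ifx, rather than peeking at one token as \@ifnextchar does, is what lets the test recognize a multibyte character.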
