Interpreting unicode ², ³, etc... characters in math mode

Question

Short question:

Is there some font encoding or other trick that would allow the unicode character ² (exponent 2) to be correctly interpreted by LaTeX, notably in math mode (i.e. translate it automatically into ^{2})?

(and likewise for ³, ⁴, ...)

Detailed rationale:

Normally, to introduce an exponent in math mode one uses the ^ symbol, as in x^2 or e^{i\pi}. I've been using LaTeX for decades and this feels very natural. However, with my current setup, when I type the characters x ^ 2 on the keyboard, it results in the two unicode characters x².

I do have a \RequirePackage[utf8]{inputenc} around the beginning of my personal style file, and Unicode characters in general are interpreted correctly (accented characters and such).

Usually I think about it and do type x ^ ^ 2 which results in x^2, but often I'm typing fast and later have to go back to every line of code that produces a LaTeX Error: Command \texttwosuperior unavailable in en error. This is annoying.

I could implement a workaround at the level of my editor (I use vim, so it would be simple to add a mapping to convert the ² into ^2), but I'm wondering if there's a better editor-independent way to handle this.

For whatever it's worth, the Unicode Technical Committee, when math symbols were being added, didn't consider the built-in numeric sub and superscripts to be math, since no other subs or sups (alphabetic, etc.) would be added to Unicode. The stated opinion was that math subs and sups should be indicated by markup. — barbara beeton, May 11 '23 at 20:40

Henri Menke · Accepted Answer · 2016-12-15T10:44:32.573

16

ConTeXt does it right.

\starttext

x²³ and $x²³$

\stoptext

The same can be achieved with LaTeX and unicode-math.

\documentclass{article}
\usepackage{unicode-math}
\begin{document}
x²³ and $x²³$
\end{document}

In pdfLaTeX you have to use newunicodechar to redefine ² and ³ to make them math-mode aware.

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{textcomp} % for \text...superior
\usepackage{newunicodechar}
\newunicodechar{²}{\ifmmode{}^2\else\texttwosuperior\fi}
\newunicodechar{³}{\ifmmode{}^3\else\textthreesuperior\fi}
\begin{document}
x²³ and $x²³$
\end{document}

edited Dec 15 '16 at 10:44

answered Dec 15 '16 at 10:39

Henri Menke

109,596

The first LaTeX variant you suggest produces an error message with pdflatex. I was about to comment on that when I saw your edit and 3rd variant. That one is indeed working with pdflatex. Nice workaround! – Dalker Dec 15 '16 at 10:51
4

@Dalker Of course, the second variant produces an error with pdflatex because it is not a Unicode engine. Long time I was also an advocate of pdflatex (mainly for performance reasons) but not using a Unicode engine in 2016 is just not reasonable. I therefore use lualatex. – Henri Menke Dec 15 '16 at 12:32
Don't worry, @Henri Menke, I did upvote your answer earlier. As for accepting, I have a general policy in StackExchange sites to wait ~24 hours in case someone from a different time zone wants to contribute a different answer. Regarding lualatex, I guess I should take a thorough look at it sometime. Haven't done so yet mostly for lack of time, as well as to remain 100% compatible with colleagues who use pdflatex. But you have given me arguments to try it soonish. – Dalker Dec 15 '16 at 15:35
1

Did you try $\Big|²³$ ? Actually a single exponent suffices for seeing that your \ifmmode trick is wrong, because it sets an exponent to an empty formula. – egreg Jan 06 '17 at 11:05
@egreg That is true. But I don’t see an easy solution to this. One could attempt something like ' does but would need a lot of nested if statements. – Henri Menke Jan 06 '17 at 11:38
@HenriMenke A lot indeed! – egreg Jan 06 '17 at 11:42
1

Looking around a bit I don't understand how ConTeXt handles it however. \showlists just show the characters ², the mathcode is normal, the $ doesn't grab argument verbatim etc., but in the output PDF it's still identical to the output of ^2. Perhaps ConTeXt hooks into the output routine for the fonts or something. – user202729 Jan 06 '22 at 06:21

user202729 · Answer 2 · 2023-05-25T08:56:51.793

Note: This answer is mostly redundant. You can use unicode-math as described in the other answer, or if it's inapplicable, use my package unicode-math-input.

Original answer below.

If you want to use the symbols in XeTeX or LuaTeX without importing Unicode-math (because it changes the output), it's possible to copy some part of the Unicode-math source code and put it in the preamble.

The basic idea is define the characters to be active character, that when expanded repeatedly ("scan" the characters forward, using \__um_scan_sscript: and its auxiliary functions)

peek the character to the right of it (\peek_N_type:TF)
check if it's another superscript/subscript character (\prop_get:cxNTF)
If it is, push it somewhere (\l__um_ss_chain_tl) and repeat.

See "19. Unicode sub- and super-scripts" section in the documentation of the unicode-math source code (CTAN) for a more detailed explanation.

Another method is to do something like the following, inspired from an idea in the implementation of the ' macro in math mode (see e.g. appendix B part 6 "Macros for math" → "an interesting set of macros that convert f''' into f^{\prime\prime\prime}" in the TeXbook):

make ² expands to ^{2\continuesuperscript},
then ²³ expands to ^{2\continuesuperscript}^{3\continuesuperscript},
the \continuesuperscript macro is such that ^{2\continuesuperscript}^{3} is equivalent to ^{23},
such a macro works by:
- absorbs the following },
- then do an unlimited f-expansion of the following tokens (in order to expand any ³ to ^{3...} as needed, note that the possible loss of a space does not matter in math mode),
- then check if the following character is ^,
  - if yes then remove the ^, and another argument #2, and put back #2\egroup (so ^{2\continuesuperscript}^{34} becomes ^{234\egroup),
    - interestingly, by doing this ^{2\continuesuperscript}^\bgroup34\egroup becomes ^{2\bgroup\egroup34\egroup where the #2 is \bgroup, and it's as intended,
  - otherwise, then put back and execute the }.

The checking whether the following character is ^ necessitates an unexpandable implementation in non-LuaTeX engines (use \futurelet).

See https://gist.github.com/user202729/9b599cdaec827a0bdf6478fb5729b157 for a proof-of-concept implementation. (the answer becomes too long I can't include it)

A similar algorithm is possible in PDFTeX, but it needs some adaptation because each Unicode character (in UTF-8) corresponds to multiple tokens in PDFTeX. (scroll all the way down below for a proof-of-concept implementation)

Demonstration for Unicode engines: (in retrospect this was some ugly implementation, but I don't particularly feel like rewriting it)

%! TEX program = xelatex
\documentclass{article}
\usepackage{amssymb}
\ExplSyntaxOn
\prop_new:N \g__um_supers_prop
\prop_new:N \g__um_subs_prop
\cs_generate_variant:Nn \prop_gput:Nnn {Nxn}
\cs_generate_variant:Nn \prop_get:NnNTF {cxNTF}
\cs_new:Nn __um_char_gmake_mathactive:n
  {
    \tex_global:D \tex_mathcode:D \int_eval:n {#1} = "8000 \scan_stop:
  }
\cs_new:Nn __um_mathactive_remap:nn
  {
    \group_begin:
      \cs_set_protected:Npn __um_tmp: {#2}
      __um_char_gmake_mathactive:n {#1}
      \char_gset_active_eq:nN {#1} __um_tmp:
    \group_end:
  }
\cs_new:Nn __um_setup_active_superscript:nn
  {
    \prop_gput:Nxn \g__um_supers_prop { \int_eval:n {#1} } {#2}
    __um_mathactive_remap:nn {#1}
       {
        \tl_set:Nn \l__um_ss_chain_tl {#2}
        \cs_set_eq:NN __um_sub_or_super:n \sp
        \tl_set:Nn \l__um_tmpa_tl {supers}
        __um_scan_sscript:
       }
  }
\cs_new:Nn __um_setup_active_subscript:nn
  {
    \prop_gput:Nxn \g__um_subs_prop { \int_eval:n {#1} } {#2}
    __um_mathactive_remap:nn {#1}
      {
        \tl_set:Nn \l__um_ss_chain_tl {#2}
        \cs_set_eq:NN __um_sub_or_super:n \sb
        \tl_set:Nn \l__um_tmpa_tl {subs}
        __um_scan_sscript:
      }
  }
\cs_new_protected:Nn __um_scan_sscript:
  {
    __um_scan_sscript:TF
      { __um_scan_sscript: }
      { __um_sub_or_super:n {\l__um_ss_chain_tl} }
  }
\cs_new_protected:Nn __um_scan_sscript:TF
  {
    \peek_N_type:TF
      {
        \group_align_safe_begin:
        __um_scan_sscript_aux:nnN {#1} {#2}
      }
      {#2}
  }
\cs_new_protected:Nn __um_scan_sscript_aux:nnN
  {
    \tl_set:Nx \l__um_tmpa_key_tl { \tl_to_str:n {#3} }
    \prop_get:cxNTF {g__um_\l__um_tmpa_tl _prop}
      { \int_eval:n { \exp_after:wN ` \l__um_tmpa_key_tl } }
      \l__um_tmpb_tl
      {
        \tl_put_right:NV \l__um_ss_chain_tl \l__um_tmpb_tl
        \group_align_safe_end:
        #1
      }
      { \group_align_safe_end: #2 #3 }
  }
__um_setup_active_superscript:nn {"2070} {0}
__um_setup_active_superscript:nn {"00B9} {1}
__um_setup_active_superscript:nn {"00B2} {2}
__um_setup_active_superscript:nn {"00B3} {3}
__um_setup_active_superscript:nn {"2074} {4}
__um_setup_active_superscript:nn {"2075} {5}
__um_setup_active_superscript:nn {"2076} {6}
__um_setup_active_superscript:nn {"2077} {7}
__um_setup_active_superscript:nn {"2078} {8}
__um_setup_active_superscript:nn {"2079} {9}
__um_setup_active_superscript:nn {"207A} {+}
__um_setup_active_superscript:nn {"207B} {-}
__um_setup_active_superscript:nn {"207C} {=}
__um_setup_active_superscript:nn {"207D} {(}
__um_setup_active_superscript:nn {"207E} {)}
__um_setup_active_superscript:nn {"1D2C} {A}
__um_setup_active_superscript:nn {"1D2E} {B}
__um_setup_active_superscript:nn {"1D30} {D}
__um_setup_active_superscript:nn {"1D31} {E}
__um_setup_active_superscript:nn {"1D33} {G}
__um_setup_active_superscript:nn {"1D34} {H}
__um_setup_active_superscript:nn {"1D35} {I}
__um_setup_active_superscript:nn {"1D36} {J}
__um_setup_active_superscript:nn {"1D37} {K}
__um_setup_active_superscript:nn {"1D38} {L}
__um_setup_active_superscript:nn {"1D39} {M}
__um_setup_active_superscript:nn {"1D3A} {N}
__um_setup_active_superscript:nn {"1D3C} {O}
__um_setup_active_superscript:nn {"1D3E} {P}
__um_setup_active_superscript:nn {"1D3F} {R}
__um_setup_active_superscript:nn {"1D40} {T}
__um_setup_active_superscript:nn {"1D41} {U}
__um_setup_active_superscript:nn {"2C7D} {V}
__um_setup_active_superscript:nn {"1D42} {W}
__um_setup_active_superscript:nn {"1D43} {a}
__um_setup_active_superscript:nn {"1D47} {b}
__um_setup_active_superscript:nn {"1D9C} {c}
__um_setup_active_superscript:nn {"1D48} {d}
__um_setup_active_superscript:nn {"1D49} {e}
__um_setup_active_superscript:nn {"1DA0} {f}
__um_setup_active_superscript:nn {"1D4D} {g}
__um_setup_active_superscript:nn {"02B0} {h}
__um_setup_active_superscript:nn {"2071} {i}
__um_setup_active_superscript:nn {"02B2} {j}
__um_setup_active_superscript:nn {"1D4F} {k}
__um_setup_active_superscript:nn {"02E1} {l}
__um_setup_active_superscript:nn {"1D50} {m}
__um_setup_active_superscript:nn {"207F} {n}
__um_setup_active_superscript:nn {"1D52} {o}
__um_setup_active_superscript:nn {"1D56} {p}
__um_setup_active_superscript:nn {"02B3} {r}
__um_setup_active_superscript:nn {"02E2} {s}
__um_setup_active_superscript:nn {"1D57} {t}
__um_setup_active_superscript:nn {"1D58} {u}
__um_setup_active_superscript:nn {"1D5B} {v}
__um_setup_active_superscript:nn {"02B7} {w}
__um_setup_active_superscript:nn {"02E3} {x}
__um_setup_active_superscript:nn {"02B8} {y}
__um_setup_active_superscript:nn {"1DBB} {z}
__um_setup_active_superscript:nn {"1D5D} {\beta}
__um_setup_active_superscript:nn {"1D5E} {\gamma}
__um_setup_active_superscript:nn {"1D5F} {\delta}
__um_setup_active_superscript:nn {"1D60} {\phi}
__um_setup_active_superscript:nn {"1D61} {\chi}
__um_setup_active_superscript:nn {"1DBF} {\theta}
__um_setup_active_subscript:nn {"2080} {0}
__um_setup_active_subscript:nn {"2081} {1}
__um_setup_active_subscript:nn {"2082} {2}
__um_setup_active_subscript:nn {"2083} {3}
__um_setup_active_subscript:nn {"2084} {4}
__um_setup_active_subscript:nn {"2085} {5}
__um_setup_active_subscript:nn {"2086} {6}
__um_setup_active_subscript:nn {"2087} {7}
__um_setup_active_subscript:nn {"2088} {8}
__um_setup_active_subscript:nn {"2089} {9}
__um_setup_active_subscript:nn {"208A} {+}
__um_setup_active_subscript:nn {"208B} {-}
__um_setup_active_subscript:nn {"208C} {=}
__um_setup_active_subscript:nn {"208D} {(}
__um_setup_active_subscript:nn {"208E} {)}
__um_setup_active_subscript:nn {"2090} {a}
__um_setup_active_subscript:nn {"2091} {e}
__um_setup_active_subscript:nn {"2095} {h}
__um_setup_active_subscript:nn {"1D62} {i}
__um_setup_active_subscript:nn {"2C7C} {j}
__um_setup_active_subscript:nn {"2096} {k}
__um_setup_active_subscript:nn {"2097} {l}
__um_setup_active_subscript:nn {"2098} {m}
__um_setup_active_subscript:nn {"2099} {n}
__um_setup_active_subscript:nn {"2092} {o}
__um_setup_active_subscript:nn {"209A} {p}
__um_setup_active_subscript:nn {"1D63} {r}
__um_setup_active_subscript:nn {"209B} {s}
__um_setup_active_subscript:nn {"209C} {t}
__um_setup_active_subscript:nn {"1D64} {u}
__um_setup_active_subscript:nn {"1D65} {v}
__um_setup_active_subscript:nn {"2093} {x}
__um_setup_active_subscript:nn {"1D66} {\beta}
__um_setup_active_subscript:nn {"1D67} {\gamma}
__um_setup_active_subscript:nn {"1D68} {\rho}
__um_setup_active_subscript:nn {"1D69} {\phi}
__um_setup_active_subscript:nn {"1D6A} {\chi}
\ExplSyntaxOff
\begin{document}
$x²³₄₅ + \mathbb{R} + \varnothing$
\end{document}

Output:

As you can see, ℝ and ∅ are not affected. (it's possible, but complex to get the old symbols back, see 1 2. Besides, unicode-math is slow to compile)

This is an implementation for PDFTeX ucs package.

(warning: the implementation is extremely fragile and may break in the next UCS version)

Read the comment for more details.

Usage is not recommended.

And this is a version for standard (recommended) utf8 encoding.

%! TEX program = pdflatex
% vim: ts=2 sw=2 et:
\documentclass[12pt]{article}
%\usepackage[mathletters]{ucs}
%\usepackage[utf8x]{inputenc}
\usepackage[utf8]{inputenc}
\ExplSyntaxOn
% (originally) key: int value (Unicode code point), value: the corresponding (non-sscript) character
% (modified) key: sequence of UTF8,
%   value: either __um_partial:n or __um_complete {non-sscript character}
\prop_new:N \g__um_supers_prop
\prop_new:N \g__um_subs_prop
\cs_generate_variant:Nn \prop_gput:Nnn {Nxn}
\cs_generate_variant:Nn \prop_get:NnNTF {cxNTF}
\cs_generate_variant:Nn \exp_args:Nx {c}
\cs_new:Nn __um_mathactive_remap:nn
  {
    \group_begin:
      % for [utf8]
      \exp_args:Nx \DeclareUnicodeCharacter {\tl_tail:n {#1}} {#2}
  % for [utf8x] (require decimal, briefly mentioned in https://github.com/latex3/latex2e/issues/24)
  %\exp_args:Nx \DeclareUnicodeCharacter {\int_eval:n {#1}} {#2}

  % for [utf8x] with clash (see https://tex.stackexchange.com/a/620231/250119)
  %\exp_args:cx {uc@dclc} {\int_eval:n {#1}} {mathletters} {#2}
\group_end:

}
\cs_generate_variant:Nn \int_step_inline:nn {xn}
% #1: property list
% #2: the code point as "AAAA
% #3: the non-sscript corresponding character
\cs_new:Nn __um_put_prefixes:Nnn
{
\tl_set:Nx \l__um_utfviii_bytes {\char_to_utfviii_bytes:n {#2}}
% drop the trailing empty groups (nonexistent bytes)
  \tl_set:Nx \l__um_last_byte {\tl_item:Nn \l__um_utfviii_bytes {-1}}
  \bool_while_do:nn {
    \tl_if_empty_p:N \l__um_last_byte
  } {
    \tl_set:Nx \l__um_utfviii_bytes {\tl_range:Nnn \l__um_utfviii_bytes {1} {-2}}
    \tl_set:Nx \l__um_last_byte {\tl_item:Nn \l__um_utfviii_bytes {-1}}
  }
\cs_set:Nn __um_char_generate_as_other:n {
    \char_generate:nn {##1} {12} % 12: other, same as output of \tl_to_str:n
  }
% convert hex to bytes
  \tl_set:Nx \l__um_utfviii_bytes {
    \tl_map_function:NN \l__um_utfviii_bytes __um_char_generate_as_other:n
  }
% iterate through incomplete prefixes and define
  \int_step_inline:xn {\tl_count:N \l__um_utfviii_bytes - 1}
  {
    \prop_gput:Nxn #1 { \tl_range:Nnn \l__um_utfviii_bytes {1} {##1} } {__um_partial:nnnn}
  }
% define for the only complete prefix
  \prop_gput:Nxn #1 \l__um_utfviii_bytes {__um_complete:nnnnn {#3}}
}
\cs_new:Nn __um_setup_active_superscript:nn
  {
    __um_put_prefixes:Nnn \g__um_supers_prop {#1} {#2}
    __um_mathactive_remap:nn {#1}
       {
        \tl_set:Nn \l__um_ss_chain_tl {#2}
        \cs_set_eq:NN __um_sub_or_super:n \sp
        \tl_set:Nn \l__um_tmpa_tl {supers}
        __um_scan_sscript:
       }
  }
\cs_new:Nn __um_setup_active_subscript:nn
  {
    __um_put_prefixes:Nnn \g__um_subs_prop {#1} {#2}
    __um_mathactive_remap:nn {#1}
      {
        \tl_set:Nn \l__um_ss_chain_tl {#2}
        \cs_set_eq:NN __um_sub_or_super:n \sb
        \tl_set:Nn \l__um_tmpa_tl {subs}
        __um_scan_sscript:
      }
  }
\cs_new_protected:Nn __um_scan_sscript:
  {
    __um_scan_sscript:nnn
      { __um_scan_sscript: }  % true (got a new character), keep scanning
      { __um_sub_or_super:n {\l__um_ss_chain_tl} }  % "typesets what it has collected"
      {}
  }
% #1, #2, #3: same as below
\cs_new_protected:Nn __um_scan_sscript:nnn
  {
    \peek_N_type:TF
      {
        \group_align_safe_begin:
        __um_scan_sscript_aux:nnnN {#1} {#2} {#3}
      }
      {
        #2 % execute false code
        #3 % return the partial token
      }
  }
%\cs_generate_variant:Nn __um_scan_sscript:nnn {nnx}
% #1: true code (if the new token continues the chain, then \tl_put_right:NV it to the chain and execute this)
% #2: false code
% #3: the partial token (must not be stringified, in case it's returned later)
% #4: the new token (also not stringified)
\cs_new_protected:Nn __um_scan_sscript_aux:nnnN
  {
    \tl_set:Nx \l__um_tmpa_key_tl { \tl_to_str:n {#4} }
    \prop_get:cxNTF {g__um_\l__um_tmpa_tl _prop}
      %{ \int_eval:n { \exp_after:wN ` \l__um_tmpa_key_tl } }
      { \tl_to_str:n {#3}  \l__um_tmpa_key_tl }
      \l__um_tmpb_tl
      {
        % if there is, do something depends on the result
        \group_align_safe_end:
        \l__um_tmpb_tl {#1} {#2} {#3} {#4}
      }
      { \group_align_safe_end: #2 #3 #4 }  % execute the false code, then return the non-matching part back
  }
% #1: corresponding non-sscript character
% rest: as above
\cs_new:Nn __um_complete:nnnnn
  {
    \tl_put_right:Nn \l__um_ss_chain_tl #1
    #2
  }
% 1-4: as __um_scan_sscript_aux:nnnN
\cs_new:Nn __um_partial:nnnn
  {
    __um_scan_sscript:nnn {#1} {#2} {#3 #4}
  }
__um_setup_active_superscript:nn {"2070} {0}
__um_setup_active_superscript:nn {"00B9} {1}
__um_setup_active_superscript:nn {"00B2} {2}
__um_setup_active_superscript:nn {"00B3} {3}
__um_setup_active_superscript:nn {"2074} {4}
__um_setup_active_superscript:nn {"2075} {5}
__um_setup_active_superscript:nn {"2076} {6}
__um_setup_active_superscript:nn {"2077} {7}
__um_setup_active_superscript:nn {"2078} {8}
__um_setup_active_superscript:nn {"2079} {9}
__um_setup_active_superscript:nn {"207A} {+}
__um_setup_active_superscript:nn {"207B} {-}
__um_setup_active_superscript:nn {"207C} {=}
__um_setup_active_superscript:nn {"207D} {(}
__um_setup_active_superscript:nn {"207E} {)}
__um_setup_active_superscript:nn {"1D2C} {A}
__um_setup_active_superscript:nn {"1D2E} {B}
__um_setup_active_superscript:nn {"1D30} {D}
__um_setup_active_superscript:nn {"1D31} {E}
__um_setup_active_superscript:nn {"1D33} {G}
__um_setup_active_superscript:nn {"1D34} {H}
__um_setup_active_superscript:nn {"1D35} {I}
__um_setup_active_superscript:nn {"1D36} {J}
__um_setup_active_superscript:nn {"1D37} {K}
__um_setup_active_superscript:nn {"1D38} {L}
__um_setup_active_superscript:nn {"1D39} {M}
__um_setup_active_superscript:nn {"1D3A} {N}
__um_setup_active_superscript:nn {"1D3C} {O}
__um_setup_active_superscript:nn {"1D3E} {P}
__um_setup_active_superscript:nn {"1D3F} {R}
__um_setup_active_superscript:nn {"1D40} {T}
__um_setup_active_superscript:nn {"1D41} {U}
__um_setup_active_superscript:nn {"2C7D} {V}
__um_setup_active_superscript:nn {"1D42} {W}
__um_setup_active_superscript:nn {"1D43} {a}
__um_setup_active_superscript:nn {"1D47} {b}
__um_setup_active_superscript:nn {"1D9C} {c}
__um_setup_active_superscript:nn {"1D48} {d}
__um_setup_active_superscript:nn {"1D49} {e}
__um_setup_active_superscript:nn {"1DA0} {f}
__um_setup_active_superscript:nn {"1D4D} {g}
__um_setup_active_superscript:nn {"02B0} {h}
__um_setup_active_superscript:nn {"2071} {i}
__um_setup_active_superscript:nn {"02B2} {j}
__um_setup_active_superscript:nn {"1D4F} {k}
__um_setup_active_superscript:nn {"02E1} {l}
__um_setup_active_superscript:nn {"1D50} {m}
__um_setup_active_superscript:nn {"207F} {n}
__um_setup_active_superscript:nn {"1D52} {o}
__um_setup_active_superscript:nn {"1D56} {p}
__um_setup_active_superscript:nn {"02B3} {r}
__um_setup_active_superscript:nn {"02E2} {s}
__um_setup_active_superscript:nn {"1D57} {t}
__um_setup_active_superscript:nn {"1D58} {u}
__um_setup_active_superscript:nn {"1D5B} {v}
__um_setup_active_superscript:nn {"02B7} {w}
__um_setup_active_superscript:nn {"02E3} {x}
__um_setup_active_superscript:nn {"02B8} {y}
__um_setup_active_superscript:nn {"1DBB} {z}
__um_setup_active_superscript:nn {"1D5D} {\beta}
__um_setup_active_superscript:nn {"1D5E} {\gamma}
__um_setup_active_superscript:nn {"1D5F} {\delta}
__um_setup_active_superscript:nn {"1D60} {\phi}
__um_setup_active_superscript:nn {"1D61} {\chi}
__um_setup_active_superscript:nn {"1DBF} {\theta}
__um_setup_active_subscript:nn {"2080} {0}
__um_setup_active_subscript:nn {"2081} {1}
__um_setup_active_subscript:nn {"2082} {2}
__um_setup_active_subscript:nn {"2083} {3}
__um_setup_active_subscript:nn {"2084} {4}
__um_setup_active_subscript:nn {"2085} {5}
__um_setup_active_subscript:nn {"2086} {6}
__um_setup_active_subscript:nn {"2087} {7}
__um_setup_active_subscript:nn {"2088} {8}
__um_setup_active_subscript:nn {"2089} {9}
__um_setup_active_subscript:nn {"208A} {+}
__um_setup_active_subscript:nn {"208B} {-}
__um_setup_active_subscript:nn {"208C} {=}
__um_setup_active_subscript:nn {"208D} {(}
__um_setup_active_subscript:nn {"208E} {)}
__um_setup_active_subscript:nn {"2090} {a}
__um_setup_active_subscript:nn {"2091} {e}
__um_setup_active_subscript:nn {"2095} {h}
__um_setup_active_subscript:nn {"1D62} {i}
__um_setup_active_subscript:nn {"2C7C} {j}
__um_setup_active_subscript:nn {"2096} {k}
__um_setup_active_subscript:nn {"2097} {l}
__um_setup_active_subscript:nn {"2098} {m}
__um_setup_active_subscript:nn {"2099} {n}
__um_setup_active_subscript:nn {"2092} {o}
__um_setup_active_subscript:nn {"209A} {p}
__um_setup_active_subscript:nn {"1D63} {r}
__um_setup_active_subscript:nn {"209B} {s}
__um_setup_active_subscript:nn {"209C} {t}
__um_setup_active_subscript:nn {"1D64} {u}
__um_setup_active_subscript:nn {"1D65} {v}
__um_setup_active_subscript:nn {"2093} {x}
__um_setup_active_subscript:nn {"1D66} {\beta}
__um_setup_active_subscript:nn {"1D67} {\gamma}
__um_setup_active_subscript:nn {"1D68} {\rho}
__um_setup_active_subscript:nn {"1D69} {\phi}
__um_setup_active_subscript:nn {"1D6A} {\chi}
\ExplSyntaxOff
\begin{document}
[ x²³₄₅ + \left( \frac{1}{2} \right) ⁶⁷₈₉ + \int ₁² x , dx]
\end{document}

This is probably not the best method to implement it, but I couldn't figure out any better way.

(Idea from David Carlisle in chat. I can't find something similar existing, or a package) — user202729, Oct 10 '21 at 10:34
Also not sure if this fails if a braced group directly follows the super/subscript. Maybe I could use \peek_N_type. — user202729, Jul 02 '22 at 14:10

score 2 · Answer 3 · answered Apr 12 '23 at 18:10

2

OpTeX does it right.

\fontfam[lm]
x²³ and $x²³$.
\bye

answered Apr 12 '23 at 18:10

wipet

74,238

Meanwhile can you explain how it's implemented there? OpTeX only supports LuaTeX engine so it doesn't have to deal with multi-character issue but otherwise does it use active-mathcode-peekahead? (and if I recalled correctly the documentation is not in English...?) – user202729 Apr 13 '23 at 06:46
@user202729 OPMac documentation is not in English, but OpTeX is: http://petr.olsak.net/ftp/olsak/optex/optex-doc.pdf – User Apr 14 '23 at 20:36
Okay I see. It doesn't quite get the kerning right though. https://i.stack.imgur.com/0Qtca.png – user202729 Apr 15 '23 at 01:31

Interpreting unicode ², ³, etc... characters in math mode

3 Answers3

Linked

Related