Automatically combine unicode double subscripts aᵢⱼ = a_{i}_{j} as a_{ij}

Question

I want to use unicode subscripts. How can I make it so that double subscripts are automatically combined into a single one?

\documentclass{standalone}
\usepackage{newunicodechar}
\newunicodechar{ᵢ}{_{i}}
\newunicodechar{ⱼ}{_{j}}
\newunicodechar{ₖ}{_{k}}
\newunicodechar{ₗ}{_{l}}
\begin{document}
\begin{tabular}{l}
$a_{ijkl}$ \\$aᵢⱼₖₗ$  % Error: Double subscript.
\end{tabular}
\end{document}

LuaLaTeX seems to be able to handle it:

\documentclass{standalone}
\usepackage{unicode-math}
\begin{document}
\begin{tabular}{l}
$a_{ijkl}$ \\$aᵢⱼₖₗ$
\end{tabular}
\end{document}

EDIT: Increased the example size to make possible space differences visible.

More testcases using Jinwen's suggestion

\documentclass{standalone}
%\usepackage{unicode-math}
\usepackage{newunicodechar}
\newunicodechar{ⁱ}{{}^{i}}
\newunicodechar{ʲ}{{}^{j}}
\newunicodechar{ᵏ}{{}^{k}}
\newunicodechar{ˡ}{{}^{l}}
\newunicodechar{ᵢ}{{}_{i}}
\newunicodechar{ⱼ}{{}_{j}}
\newunicodechar{ₖ}{{}_{k}}
\newunicodechar{ₗ}{{}_{l}}
\begin{document}
% spacing test subscript
\begin{tabular}{l}
   $a_{ijkl}$ \ $aᵢⱼₖₗ$
\end{tabular}
% spacing test superscript
\begin{tabular}{l}
   $a^{ijkl}$ \ $aⁱʲᵏˡ$
\end{tabular}
% combined test
\begin{tabular}{l}
   $a^{i}_{j}$ \ $aⁱⱼ$
\end{tabular}
% reverse combined test
\begin{tabular}{l}
   $a_{j}^{i}$ \ $aⱼⁱ$
\end{tabular}
% long sub+superscript
\begin{tabular}{l}
   $a^{ijkl}_{ijkl}$ \ $aⁱʲᵏˡᵢⱼₖₗ$
\end{tabular}
% multiple sub+supscripts
\begin{tabular}{l}
   $a^{ij}_{kl}$ \ $aⁱₗʲₗ$   % Error: Double subscript. (fair enough!)
\end{tabular}
\end{document}

Jinwen · Answer 1 · 2022-07-01T15:15:48.257

Below is a method that answers your original question by combining the scripts into one. Taking superscripts as an example, we define:

\@unisupA, which inserts \sp\bgroup at the start of a run of superscripts;
\@unisupB, which checks whether the next macro is \@unisupA. If it is, another superscript follows and there is nothing to do; if it is not, we have reached the end of the run and \egroup must be inserted.
For this logic to work, there is also a conditional, \if@unisup.

However, this method does not allow subscripts and superscripts to be interleaved, as in your last example.
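As a rough sketch of what the macros aim to produce (assuming a run of exactly two superscripts), the input $aⁱʲ$ should end up as the token sequence in the second formula below. The document only illustrates the target; it does not use the Unicode characters or the \@unisup macros themselves.

```latex
\documentclass{standalone}
\begin{document}
% Both formulas should typeset identically: \sp is LaTeX's alias for ^,
% and \bgroup/\egroup act as implicit braces around the script.
$a^{ij}$ $a\sp\bgroup i j \egroup$
\end{document}
```

Using \bgroup and \egroup instead of literal braces is what lets the opening and closing of the script come from two different macro expansions.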

\documentclass{standalone}
\usepackage{newunicodechar}
\makeatletter
\newif\if@unisup\@unisupfalse
\newcommand{\@unisupA}{\if@unisup\else\sp\bgroup\fi}
\newcommand{\@unisupB}{\@ifnextchar\@unisupA{\@unisuptrue}{\egroup\@unisupfalse}}
\newunicodechar{ⁱ}{\@unisupA i \expandafter\@unisupB}
\newunicodechar{ʲ}{\@unisupA j \expandafter\@unisupB}
\newunicodechar{ᵏ}{\@unisupA k \expandafter\@unisupB}
\newunicodechar{ˡ}{\@unisupA l \expandafter\@unisupB}
\newif\if@unisub\@unisubfalse
\newcommand{\@unisubA}{\if@unisub\else\sb\bgroup\fi}
\newcommand{\@unisubB}{\@ifnextchar\@unisubA{\@unisubtrue}{\egroup\@unisubfalse}}
\newunicodechar{ᵢ}{\@unisubA i \expandafter\@unisubB}
\newunicodechar{ⱼ}{\@unisubA j \expandafter\@unisubB}
\newunicodechar{ₖ}{\@unisubA k \expandafter\@unisubB}
\newunicodechar{ₗ}{\@unisubA l \expandafter\@unisubB}
\makeatother
\begin{document}
% spacing test subscript
\begin{tabular}{l}
   $a_{ijkl}$ \ $aᵢⱼₖₗ$
\end{tabular}
% spacing test superscript
\begin{tabular}{l}
   $a^{ijkl}$ \ $aⁱʲᵏˡ$
\end{tabular}
% combined test
\begin{tabular}{l}
   $a^{i}_{j}$ \ $aⁱⱼ$
\end{tabular}
% reverse combined test
\begin{tabular}{l}
   $a_{j}^{i}$ \ $aⱼⁱ$
\end{tabular}
% long sub+superscript
\begin{tabular}{l}
   $a^{ijkl}_{ijkl}$ \ $aⁱʲᵏˡᵢⱼₖₗ$
\end{tabular}
% multiple sub+supscripts
% \begin{tabular}{l}
%    $a^{ij}_{kl}$ \ $aⁱₗʲₗ$   % Error: Double subscript. (fair enough!)
% \end{tabular}
\end{document}

The following is the result of your own examples:

Old answer:

You can add an empty group before the subscript.

\documentclass{standalone}
\usepackage{newunicodechar}
\newunicodechar{ᵢ}{{}_{i}}
\newunicodechar{ⱼ}{{}_{j}}
\begin{document}$aᵢⱼ$\end{document}

edited Jul 01 '22 at 15:15

answered Jul 01 '22 at 13:56


It seems like this adds a tiny bit of extra spacing. – Hyperplane Jul 01 '22 at 14:01
Also, it breaks the positioning if we try it in the presence of a superscript like typing a^{2}_{ijkl} as a²ᵢⱼₖⱼ – Hyperplane Jul 01 '22 at 14:13
@Hyperplane I've updated my answer, see if this meets your need :) – Jinwen Jul 01 '22 at 15:11
Hm it doesn't want to compile with pdflatex. It works with lualatex. – Hyperplane Jul 01 '22 at 15:19
@Hyperplane Too bad, this makes the code completely useless :( To be frank I don't quite understand why this won't work with pdflatex. I suspect it's the \expandafter that fails. – Jinwen Jul 01 '22 at 15:25
The sub/superscripts defined in Unicode are assumed to be upright. The policy of the Unicode Technical Committee is that math notation should be indicated by markup. For clarification, see Unicode Technical Report #25, section 2.8, p.15. – barbara beeton Jul 01 '22 at 15:37
@barbarabeeton Well, to be honest, I don't really care that the Unicode committee thinks that way. In fact, the case they make for not intending that usage is kind of weak and the decision shortsighted (IMO). Using Unicode subscripts removes noise from source code and makes it much more readable. In particular, a use case I have is writing mathematical expressions in docstrings that can both be printed as plaintext in a terminal and properly rendered using LaTeX / MathJax in autogenerated documentation. In such a scenario you want as little markup as possible. – Hyperplane Jul 01 '22 at 15:50
@Hyperplane -- You are welcome to do what you want, but the population of sub/superscripts in Unicode is not "complete", and whatever is not there now will not be added. Then the problem you are faced with will become more complicated, and possibly intractable. – barbara beeton Jul 01 '22 at 20:31
Using subscripts like that may make it more readable for sighted people with good eyes (for me the tiny subscripts are already nearly unreadable on a screen), but you are removing markup and so removing meaning for screen readers and automatic export to other formats like HTML. – Ulrike Fischer Jul 02 '22 at 08:40
@UlrikeFischer Too small font sizes can literally be fixed by pressing two buttons (Ctrl++ to zoom in many applications); I use it all the time, especially with a multi-monitor setup with varying DPI. And yes, this removes markup. That's the whole point. Like it or not, Unicode basically introduced a subset of what markup languages provide through the inclusion of bold, italic, blackboard bold and other alphabets. If anything, I'd advocate that in this case screen readers should become smarter and find useful ways to deal with different Unicode blocks, and not be the reason to hold back. – Hyperplane Jul 02 '22 at 09:22

Hyperplane · Answer 2 · 2024-02-26T12:11:58.393

After a lot of trial and error, and digging into how Unicode input works in 8-bit engines, I found a solution that works in both LuaTeX and pdfTeX. The key is to use

\expandafter\futurelet\expandafter\successor\expandafter\check@successor%

to store the once-expanded successor token. Under pdfTeX, if the successor is a Unicode character, this token will be one of \UTFviii@four@octets, \UTFviii@three@octets or \UTFviii@two@octets. We can then dispatch a function that grabs the next 1+n tokens and combines them into the Unicode character. Afterwards, we expand this character once and compare the first token of the result against \subscript.
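As a minimal sketch of this lookahead (not the package code itself), the document below peeks at the next token with and without expanding it first. All names in it (\marker, \wrapper, \peek, \peekexpanded, \peek@next, \peek@check) are hypothetical and chosen only for illustration. \marker plays the role of \subscript, and \wrapper plays the role of an active Unicode character whose expansion starts with \subscript.

```latex
\documentclass{article}
\makeatletter
% Hypothetical names for illustration only.
\newcommand{\marker}{M}            % stands in for \subscript
\newcommand{\wrapper}{\marker rest}% stands in for an active character
% Save the next token in \peek@next without removing it from the input.
\newcommand{\peek}{\futurelet\peek@next\peek@check}
% Same, but expand the next token once before saving it.
\newcommand{\peekexpanded}{%
  \expandafter\futurelet\expandafter\peek@next\expandafter\peek@check}
\newcommand{\peek@check}{%
  \ifx\peek@next\marker
    [next is marker]%
  \else
    [next is something else]%
  \fi
}
\makeatother
\begin{document}
\peek\marker \par          % \marker itself follows: the test succeeds
\peek\wrapper \par         % \wrapper is not \marker: the test fails
\peekexpanded\wrapper \par % \wrapper is expanded to \marker first: the test succeeds
\end{document}
```

The third case is the one that matters here: without the \expandafter chain, \futurelet only ever sees the unexpanded active character (or, under pdfTeX, its first active byte), never the \subscript hidden behind it.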

File: unicode-subscript.sty

% region preamble --------------------------------------------------------------
% IMPLEMENTATION BASED ON \expandafter + \futurelet
%
% Provides public command: `\subscript{arg}`
% Internally uses the namespace `\usubscript@`
%
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{unicode-subscript}[2024/02/21 Combining Subscripts]
\RequirePackage{iftex}
%
% Usage: \newunicodechar{ᵢ}{\subscript{i}}
% This allows to use multiple unicode subscripts in succession:
% - `xᵢⱼₖ` ⇝ `x\textsubscript{ijk}`
% - `$xᵢⱼₖ$` ⇝ `$x_{ijk}$`
%
% The package is designed to work with both pdftex and luatex.
% Note: Usage of the form `x\subscript{i}\subscript{j}` is not supported.
% endregion preamble -----------------------------------------------------------
% region Package Options -------------------------------------------------------
\newif\ifusubscript@debug\usubscript@debugfalse%  Debug flag
\newif\ifusubscript@testing\usubscript@testingfalse%  Testing flag
\DeclareOption{debug}{\usubscript@debugtrue}
\ProcessOptions\relax%
% endregion Package Options ----------------------------------------------------
% region globals and helper functions ------------------------------------------

% global subscript list variable
\newcommand{\usubscript@start}{\relax}%  marker for the start of a subscript
\newcommand{\usubscript@list@reset}{\let\usubscript@list=\usubscript@start}
\newcommand{\usubscript@list@append}[1]{\edef\usubscript@list{\unexpanded\expandafter{\usubscript@list#1}}}
\usubscript@list@reset% initialize the list
\newcommand{\usubscript@log}[1]{%
%
% Prints the given message if the debug flag is set.
%
\ifusubscript@debug\PackageInfo{subscript}{#1}\fi%
}
\newcommand{\usubscript@getfirsttok}[2]{%
%
% stores the first token of #2 in #1
%
\def\@extract##1##2\@terminator{\let#1=##1}%
\expandafter\@extract#2\@terminator%
}
% select the correct dispatch function
\ifpdftex%
    \def\usubscript@check@successor{\usubscript@check@successor@pdftex}%
\else%
    \def\usubscript@check@successor{\usubscript@check{\usubscript@successor}}%
\fi%
% endregion globals and helper functions ---------------------------------------
% region public interface ------------------------------------------------------
\newcommand{\subscript}[1]{%
%
% 1. If no subscript is in progress, \subscript initializes \usubscript@list with the given tokens.
%    Otherwise, it appends them to \usubscript@list.
% 2. Executes \usubscript@check@successor which determines if the next character is also a subscript.
%    In this case, we go back to 1, else we stop the process.
%
\ifx\usubscript@list\usubscript@start%
    % Initialize the list with the first token.
    \usubscript@log{Initializing list with '\meaning#1'}%
    \def\usubscript@list{#1}%
\else%
    % Append token to existing list.
    \usubscript@log{Appending '\meaning#1' to '\usubscript@list'}%
    \usubscript@list@append{#1}%
\fi%
%
% Check the next token to determine whether to continue the subscript or to terminate it
% Expands successor first before \futurelet, this is important to handle unicode in pdftex
\expandafter\futurelet\expandafter\usubscript@successor\expandafter\usubscript@check@successor%
}
% endregion public interface ---------------------------------------------------
% region private implementation ------------------------------------------------
\newcommand{\usubscript@check}[1]{%
%
% Test whether to terminate the subscript
%
\usubscript@log{Testing against '\meaning#1'}%
%
\ifx#1\subscript%
    \usubscript@log{ >>> Successor is another subscript!}%
\else%
    \usubscript@log{ >>> Successor is not a subscript!}%
    \usubscript@finalize%
\fi%
}
\newcommand{\usubscript@finalize}{%
%
% Terminate the subscript and insert the result
%
\usubscript@log{Terminating with current list '\meaning\usubscript@list'}%
%
\ifmmode%
    \usubscript@log{ >>> Inserting '_{\meaning\usubscript@list}{}'}%
    \sb\bgroup\usubscript@list\egroup%
\else%
    \usubscript@log{ >>> Inserting '\textsubscript{\meaning\usubscript@list}'}%
    \textsubscript{\usubscript@list}%
\fi%
%
\usubscript@list@reset%
}
\newcommand{\usubscript@check@successor@pdftex}{%
%
% There are 2 cases we consider:
% 1. The next token is a subscript, in which case we continue the process.
% 2. The next token is some unicode character, in which case:
%   2.1. We grab the necessary number of tokens if using an 8-bit engine
%   2.2. We expand the unicode character once to get the replacement tokens.
%   2.3. We compare the first token of the replacement tokens to the subscript token.
%
\usubscript@log{>>> Dispatching on '\meaning\usubscript@successor'}%
%
\ifx\usubscript@successor\UTFviii@four@octets%
    \usubscript@log{ >>> Detected Unicode 4 octets}%
    \def\usubscript@execute{\usubscript@check@unicode@four}%
\else\ifx\usubscript@successor\UTFviii@three@octets%
    \usubscript@log{ >>> Detected Unicode 3 octets}%
    \def\usubscript@execute{\usubscript@check@unicode@three}%
\else\ifx\usubscript@successor\UTFviii@two@octets%
    \usubscript@log{ >>> Detected Unicode 2 octets}%
    \def\usubscript@execute{\usubscript@check@unicode@two}%
\else%
    \usubscript@log{ >>> Detected non-Unicode}%
    \def\usubscript@execute{\usubscript@check{\usubscript@successor}}%
\fi\fi\fi%
%
% dispatch the selected command.
%
\usubscript@execute%
}%
\newcommand{\usubscript@check@unicode@four}[5]{% grabs 1+4 tokens
%
\usubscript@log{>>> Expand Unicode Quadruplet}%
%
\unless\ifcsname u8:#1#2#3#4#5\endcsname%
    \PackageError{subscript}{Detected undefined unicode.}{}% third argument: (empty) help text
\fi%
%
\expandafter\let\expandafter\usubscript@token\csname u8:#1#2#3#4#5\endcsname%
\usubscript@log{Detected unicode '\meaning\usubscript@token'}%
%
\usubscript@getfirsttok{\usubscript@firsttoken}{\usubscript@token}%
\usubscript@check{\usubscript@firsttoken}%
%
\usubscript@log{Reinserting '\meaning#1#2#3#4#5'}%
#1#2#3#4#5%
}
\newcommand{\usubscript@check@unicode@three}[4]{% grabs 1+3 tokens
%
\usubscript@log{>>> Expand Unicode Triplet}%
%
\unless\ifcsname u8:#1#2#3#4\endcsname%
    \PackageError{subscript}{Detected undefined unicode.}{}% third argument: (empty) help text
\fi%
%
\expandafter\let\expandafter\usubscript@token\csname u8:#1#2#3#4\endcsname%
\usubscript@log{Detected unicode '\meaning\usubscript@token'}%
%
\usubscript@getfirsttok{\usubscript@firsttoken}{\usubscript@token}%
\usubscript@check{\usubscript@firsttoken}%
%
\usubscript@log{Reinserting '\meaning#1#2#3#4'}%
#1#2#3#4%
}
\newcommand{\usubscript@check@unicode@two}[3]{% grabs 1+2 tokens
%
\usubscript@log{>>> Expand Unicode Duplet}%
%
\unless\ifcsname u8:#1#2#3\endcsname%
    \PackageError{subscript}{Detected undefined unicode.}{}% third argument: (empty) help text
\fi%
%
\expandafter\let\expandafter\usubscript@token\csname u8:#1#2#3\endcsname%
\usubscript@log{Detected unicode '\meaning\usubscript@token'}%
%
\usubscript@getfirsttok{\usubscript@firsttoken}{\usubscript@token}%
\usubscript@check{\usubscript@firsttoken}%
%
\usubscript@log{Reinserting '\meaning#1#2#3'}%
#1#2#3%
}
% endregion private implementation ---------------------------------------------
\endinput

File: tests.tex

\documentclass{standalone}
\usepackage{newunicodechar}
\usepackage[debug]{unicode-subscript}
\AtBeginDocument{
\newunicodechar{ᵢ}{\subscript{i}}
\newunicodechar{ⱼ}{\subscript{j}}
\newunicodechar{ₖ}{\subscript{k}}
\newunicodechar{ₗ}{\subscript{l}}
\newunicodechar{ₘ}{\subscript{m}}
\newunicodechar{ₙ}{\subscript{n}}
}
\newcommand{\needsfour}[4]{#4#3#2#1}
\begin{document}
$aᵢⱼₖ$
a\subscript{ij$kl$mn}
$a\subscript{ijk}$
% test mathmode
\begin{tabular}{l}
$a_{ijklmn}$
\ $aᵢⱼₖₗₘₙ$
\ $a\subscript{ijklmn}$
\end{tabular}
% test textmode
\begin{tabular}{ll}
a\textsubscript{ijklmn}
\ aᵢⱼₖₗₘₙ
\ a\subscript{ijklmn}
\end{tabular}
$aᵢ\needsfour4321$
dₘ
\end{document}

Result
