After a lot of trial and error, and digging into how the Unicode stuff works in 8-bit engine, I found a solution that works both in LuaTeX and PDFTeX. The key is to use
\expandafter\futurelet\expandafter\successor\expandafter\check@successor%
To store the once expanded successor token. When using PDFTeX, if the successor is unicode, this will cause it to be one of \UTFviii@four@octets, \UTFviii@three@octets or \UTFviii@two@octets. We can then dispatch a function that inspects the next 1+n tokens and combines them to the Unicode character. Afterwards, we expand this character once and compare against \subscript.
File: unicode-subscript.sty
% region preamble --------------------------------------------------------------
% IMPLEMENTATION BASED ON \expandafter + \futurelet
%
% Provides public command: `\subscript{arg}`
% Internally uses the namespaces`\usubscript@`
%
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{unicode-subscript}[2024/02/21 Combining Subscripts]
\RequirePackage{iftex}
%
% Usage: \newunicodechar{ᵢ}{\subscript{i}}
% This allows to use multiple unicode subscripts in succession:
% - `xᵢⱼₖ` ⇝ `x\textsubscript{ijk}`
% - `$xᵢⱼₖ$` ⇝ `$x_{ijk}$`
%
% The package is designed to work with both pdftex and luatex.
% Note: Usage of the form `x\subscript{i}\subscript{j}' is not supported.
% endregion preamble -----------------------------------------------------------
% region Package Options -------------------------------------------------------
\newif\ifusubscript@debug\usubscript@debugfalse% Debug flag
\newif\ifusubscript@testing\usubscript@testingfalse% Testing flag
\DeclareOption{debug}{\usubscript@debugtrue}
\ProcessOptions\relax%
% endregion Package Options ----------------------------------------------------
% region globals and helper functions ------------------------------------------
% global subscript list variable
\newcommand{\usubscript@start}{\relax}% marker for the start of a subscript
\newcommand{\usubscript@list@reset}{\let\usubscript@list=\usubscript@start}
\newcommand{\usubscript@list@append}[1]{\edef\usubscript@list{\unexpanded\expandafter{\usubscript@list#1}}}
\usubscript@list@reset% initialize the list
\newcommand{\usubscript@log}[1]{%
%
% Prints the given message if the debug flag is set.
%
\ifusubscript@debug\PackageInfo{subscript}{#1}\fi%
}
\newcommand{\usubscript@getfirsttok}[2]{%
%
% stores the first token of #2 in #1
%
\def@extract##1##2@terminator{\let#1=##1}%
\expandafter@extract#2@terminator%
}
% select the correct dispatch function
\ifpdftex%
\def\usubscript@check@successor{\usubscript@check@successor@pdftex}%
\else%
\def\usubscript@check@successor{\usubscript@check{\usubscript@successor}}%
\fi%
% endregion globals and helper functions ---------------------------------------
% region public interface ------------------------------------------------------
\newcommand{\subscript}[1]{%
%
% 1. If we are already in a subscript, \subscript appends the given tokens to the \usubscript@list
% Else, it resets the \subscriptlist
% 2. Executes \usubscript@check@successor which determines if the next character is also a subscript.
% In this case, we go back to 1, else we stop the process.
%
\ifx\usubscript@list\usubscript@start%
% Initialize the list with the frst token.
\usubscript@log{Initializing list with '\meaning#1'}%
\def\usubscript@list{#1}%
\else%
% Append token to existing list.
\usubscript@log{Appending '\meaning#1' to '\usubscript@list'}%
\usubscript@list@append{#1}%
\fi%
%
% Check the next token to determine whether to continue the subscript or to terminate it
% Expands successor first before \futurelet, this is important to handle unicode in pdftex
\expandafter\futurelet\expandafter\usubscript@successor\expandafter\usubscript@check@successor%
}
% endregion public interface ---------------------------------------------------
% region private implementation ------------------------------------------------
\newcommand{\usubscript@check}[1]{%
%
% Test whether to terminate the subscript
%
\usubscript@log{Testing against '\meaning#1'}%
%
\ifx#1\subscript%
\usubscript@log{ >>> Successor is another subscript!}%
\else%
\usubscript@log{ >>> Successor is not a subscript!}%
\usubscript@finalize%
\fi%
}
\newcommand{\usubscript@finalize}{%
%
% Terminate the subscript and insert the result
%
\usubscript@log{Terminating with current list '\meaning\usubscript@list'}%
%
\ifmmode%
\usubscript@log{ >>> Inserting '_{\meaning\usubscript@list}{}'}%
\sb\bgroup\usubscript@list\egroup%
\else%
\usubscript@log{ >>> Inserting '\textsubscript{\meaning\usubscript@list}'}%
\textsubscript{\usubscript@list}%
\fi%
%
\usubscript@list@reset%
}
\newcommand{\usubscript@check@successor@pdftex}{%
%
% There are 2 cases we consider:
% 1. The next token is a subscript, in which case we continue the process.
% 2. The next token is some unicode character, in which case:
% 2.1. We grab the necessary number of tokens if using an 8-bit engine
% 2.2. We expand the unicode character once to get the replacement tokens.
% 2.3. We compare the first token of the replacement tokens to the subscript token.
%
\usubscript@log{>>> Dispatching on \meaning\usubscript@successor'}%
%
\ifx\usubscript@successor\UTFviii@four@octets%
\usubscript@log{ >>> Detected Unicode 4 octets}%
\def\usubscript@execute{\usubscript@check@unicode@four}%
\else\ifx\usubscript@successor\UTFviii@three@octets%
\usubscript@log{ >>> Detected Unicode 3 octets}%
\def\usubscript@execute{\usubscript@check@unicode@three}%
\else\ifx\usubscript@successor\UTFviii@two@octets%
\usubscript@log{ >>> Detected Unicode 2 octets}%
\def\usubscript@execute{\usubscript@check@unicode@two}%
\else%
\usubscript@log{ >>> Detected non-Unicode}%
\def\usubscript@execute{\usubscript@check{\usubscript@successor}}%
\fi\fi\fi%
%
% dispatch the selected command.
%
\usubscript@execute%
}%
\newcommand{\usubscript@check@unicode@four}[5]{% grabs 1+4 tokens
%
\usubscript@log{>>> Expand Unicode Quadruplet}%
%
\unless\ifcsname u8:#1#2#3#4#5\endcsname%
\PackageError{subscript}{Detected undefined unicode.}%
\fi%
%
\expandafter\let\expandafter\usubscript@token\csname u8:#1#2#3#4#5\endcsname%
\usubscript@log{Detected unicode '\meaning\usubscript@token'}%
%
\usubscript@getfirsttok{\usubscript@firsttoken}{\usubscript@token}%
\usubscript@check{\usubscript@firsttoken}%
%
\usubscript@log{Reinserting '\meaning#1#2#3#4#5'}%
#1#2#3#4#5%
}
\newcommand{\usubscript@check@unicode@three}[4]{% grabs 1+3 tokens
%
\usubscript@log{>>> Expand Unicode Triplet}%
%
\unless\ifcsname u8:#1#2#3#4\endcsname%
\PackageError{subscript}{Detected undefined unicode.}%
\fi%
%
\expandafter\let\expandafter\usubscript@token\csname u8:#1#2#3#4\endcsname%
\usubscript@log{Detected unicode '\meaning\usubscript@token'}%
%
\usubscript@getfirsttok{\usubscript@firsttoken}{\usubscript@token}%
\usubscript@check{\usubscript@firsttoken}%
%
\usubscript@log{Reinserting '\meaning#1#2#3#4'}%
#1#2#3#4%
}
\newcommand{\usubscript@check@unicode@two}[3]{% grabs 1+2 tokens
%
\usubscript@log{>>> Expand Unicode Duplet}%
%
\unless\ifcsname u8:#1#2#3\endcsname%
\PackageError{subscript}{Detected undefined unicode.}%
\fi%
%
\expandafter\let\expandafter\usubscript@token\csname u8:#1#2#3\endcsname%
\usubscript@log{Detected unicode '\meaning\usubscript@token'}%
%
\usubscript@getfirsttok{\usubscript@firsttoken}{\usubscript@token}%
\usubscript@check{\usubscript@firsttoken}%
%
\usubscript@log{Reinserting '\meaning#1#2#3'}%
#1#2#3%
}
% endregion private implementation ---------------------------------------------
\endinput
File tests.tex
\documentclass{standalone}
\usepackage{newunicodechar}
\usepackage[debug]{unicode-subscript}
\AtBeginDocument{
\newunicodechar{ᵢ}{\subscript{i}}
\newunicodechar{ⱼ}{\subscript{j}}
\newunicodechar{ₖ}{\subscript{k}}
\newunicodechar{ₗ}{\subscript{l}}
\newunicodechar{ₘ}{\subscript{m}}
\newunicodechar{ₙ}{\subscript{n}}
}
\newcommand{\needsfour}[4]{#4#3#2#1}
\begin{document}
$aᵢⱼₖ$
a\subscript{ij$kl$mn}
$a\subscript{ijk}$
% test mathmode
\begin{tabular}{l}
$a_{ijklmn}$
\ $aᵢⱼₖₗₘₙ$
\ $a\subscript{ijklmn}$
\end{tabular}
% test textmode
\begin{tabular}{ll}
a\textsubscript{ijklmn}
\ aᵢⱼₖₗₘₙ
\ a\subscript{ijklmn}
\end{tabular}
$aᵢ\needsfour4321$
dₘ
\end{document}
Result

pdflatex. It works withlualatex. – Hyperplane Jul 01 '22 at 15:19pdflatex. I suspect it's the\expandafterthat fails. – Jinwen Jul 01 '22 at 15:25crtl++to zoom in many applications), I use it all the time especially with multiple monitor setup with varying DPI. And yes this removes markup. That's the whole point. Like it or not Unicode basically introduced a subset of what markup languages provide through the inclusion of bold, italic, blackboard bold and other alphabets. If anything I'd advocate that in this case screen readers should become smarter and find useful ways to deal with different Unicode blocks, and not be the reason to hold back. – Hyperplane Jul 02 '22 at 09:22