Seeking Enhancements and Solutions for Dynamic LaTeX Command Substitution

Question

I am looking for insights and possible solutions for a LaTeX macro customisation. It is inspired by a discussion in this thread, which explores dynamically replacing tokens with control sequences. The second answer in that thread is the foundation for my current approach: replacing arbitrary symbols or words with a specified control sequence using a command of the form \mspecdef somethinghere : \somecontrolseq.

Here is the code excerpt from the second answer of the aforementioned thread, demonstrating the initial attempt to create this functionality:

\long\def\isnextchar#1#2#3{\begingroup\toks0={\endgroup#2}\toks1={\endgroup#3}%
   \let\tmp=#1\futurelet\next\isnextcharA
}
\def\isnextcharA{\the\toks\ifx\tmp\next0\else1\fi\space}
\def\skipnext#1#2{#1}

\def\trynext#1{\trynextA#1\relax\relax}
\def\trynextA#1#2\relax#3\relax#4#5{%
   \ifx\relax#2\relax \def\next{\isnextchar#1{\skipnext{#4}}{#5#3}}\else
      \def\next{\isnextchar#1{\skipnext{\trynextA#2\relax#3#1\relax#4{#5}}}{#5#3}}\fi
   \next
}
\def\mspecdefA#1#2#3 : #4{\ifx#2\undefined
   \def#2{\trynext{#3}#4{#1}}\else
   \toks0={\trynext{#3}#4}\toks1=\expandafter{#2}%
   \edef#2{\the\toks0{\the\toks1}}\fi
}
\def\mspecdef#1{%
   \expandafter\ifx\csname m:#1\endcsname\relax
      \expandafter\mathchardef\csname m:#1\endcsname=\mathcode`#1 \fi \mathcode`#1="8000 
   \begingroup \lccode`~=`#1 
   \lowercase{\endgroup\expandafter\mspecdefA\csname m:#1\endcsname~}%
}
\mspecdef << : \ll
\mspecdef <> : \neq
\mspecdef <= : \leq
\mspecdef <== : \Leftarrow
\mspecdef <=> : \Leftrightarrow
\mspecdef <-- : \leftarrow
\mspecdef <-> : \leftrightarrow
\mspecdef >> : \gg
\mspecdef >= : \geq
\mspecdef --> : \rightarrow
\mspecdef -+ : \mp
\mspecdef +- : \pm
\mspecdef ... : \dots
\mspecdef == : \equiv
\mspecdef =. : \doteq
\mspecdef ==> : \Rightarrow
\mspecdef =( : \subseteq
\mspecdef =) : \supseteq
\mspecdef =[ : \sqsubseteq
\mspecdef =] : \sqsupseteq
\mspecdef integration : \int %<- an example of what I want
\mspecdef int : \int %<- this produces an error whilst...
\mspecdef int : \sin %<- does not!
test:
$$ a << b < c <= d >= e > f >> g $$
$$ a <> b = c =. d == e $$
$$ a <== b <-- c <-> d <=> e --> f ==> g $$
$$ a +- b = -(-a -+ +b) $$
$$ a, ..., z <> a + ...+ z $$
$$ a =( b =) c =[ e =] f $$
\[ x^+ \] %<- this produces an error

Although this method allows dynamic substitution, such as transforming integration into \int, it introduces several issues. Notably, it causes unexpected brace errors with certain inputs, such as \[0^+\] once \mspecdef +- : \pm has been defined. The alternative notation \[0^{+}\] avoids the error, but requiring it is not ideal. The substitution also interferes with the tilde character ~, leading to errors such as "Missing number, treated as zero". Furthermore, commands like \operatorname produce "Bad mathchar" errors because - is included in the definitions.

To address these problems and refine the command's behavior, the fourth answer of the thread offers an alternative approach using expl3 syntax as illustrated below:

\documentclass{article}
\usepackage{amsmath}
\usepackage{xparse}
\ExplSyntaxOn
\seq_new:N \l_math_subs_seq
\cs_new_protected:Npn \math_add_sub:nn #1 #2
 {
   \seq_put_right:Nn \l_math_subs_seq { { #1 } { #2 } }
 }
\cs_new_protected:Npn \math_ascii_sub:n #1
 {
  \tl_set:Nn \l_tmpa_tl { #1 }
  \seq_map_inline:Nn \l_math_subs_seq
   {
    \tl_replace_all:Nnn \l_tmpa_tl ##1
   }
  \tl_use:N \l_tmpa_tl
 }
\cs_new_protected:Npn \math_grabinline:w #1 $
 {
  \math_ascii_sub:n { #1 } $
 }
\cs_new_protected:Npn \math_grabdisplay:w #1 \]
 {
  \math_ascii_sub:n { #1 } \]
 }
% Set substitutions (be careful with order!)
% Three letter sequences first
\math_add_sub:nn { <== } { \Leftarrow }
\math_add_sub:nn { <=> } { \Leftrightarrow }
\math_add_sub:nn { <-- } { \leftarrow }
\math_add_sub:nn { <-> } { \leftrightarrow }
\math_add_sub:nn { --> } { \rightarrow }
\math_add_sub:nn { ==> } { \Rightarrow }
\math_add_sub:nn { ... } { \dots }
% Then two letter sequences
\math_add_sub:nn { << } { \ll  }
\math_add_sub:nn { <> } { \neq }
\math_add_sub:nn { <= } { \leq }
\math_add_sub:nn { >> } { \gg  }
\math_add_sub:nn { >= } { \geq }
\math_add_sub:nn { -+ } { \mp }
\math_add_sub:nn { +- } { \pm }
\math_add_sub:nn { == } { \equiv }
\math_add_sub:nn { =. } { \doteq }
\math_add_sub:nn { =( } { \subseteq }
\math_add_sub:nn { =) } { \supseteq }
\math_add_sub:nn { =[ } { \sqsubseteq }
\math_add_sub:nn { =] } { \sqsupseteq }
% Enable substitutions for $...$ and \[...\]
\everymath { \math_grabinline:w }
\tl_put_right:Nn \[ { \math_grabdisplay:w }
\ExplSyntaxOff
\begin{document}
\centering
\newcommand*{\test}[1]{%
  $#1$%
  \[#1\]%
}
\test{a << b < c <= d >= e > f >> g}
\test{a <> b = c =. d == e}
\test{a <== b <-- c <-> d <=> e --> f ==> g}
\test{a +- b = -(-a -+ +b)}
\test{a, ..., z <> a + ...+ z}
\test{a =( b =) c =[ e =] f}
\end{document}

While this method seems promising and resolves some of the issues of the initial approach, it has its own limitations. Notably, the order in which the substitutions are registered significantly affects the outcome: a shorter sequence such as <= must be registered after any longer sequence that contains it, such as <==. This makes the solution less flexible and more cumbersome for extensive use. Additionally, I attempted to create an alias for \math_add_sub:nn using \cs_new_eq:NN \mspecdef \math_add_sub:nn, but it did not behave as expected outside of \ExplSyntaxOn ... \ExplSyntaxOff.
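
To make the order dependence concrete, here is a minimal sketch that applies two replacements in the "wrong" order. The variable \l_demo_tl is a hypothetical name used only for this illustration; it is not part of the code above.

```latex
\documentclass{article}
\ExplSyntaxOn
\tl_new:N \l_demo_tl                 % hypothetical scratch variable
\tl_set:Nn \l_demo_tl { a <== b }
% Shorter pattern first: the leading "<=" of "<==" is consumed here ...
\tl_replace_all:Nnn \l_demo_tl { <= } { \leq }
% ... so this replacement no longer finds anything to match.
\tl_replace_all:Nnn \l_demo_tl { <== } { \Leftarrow }
\tl_log:N \l_demo_tl                 % writes the resulting token list to the log
\ExplSyntaxOff
\begin{document}
\end{document}
```

The log entry should show \leq followed by a stray = rather than \Leftarrow, which is why the longer sequences have to come first in the list above.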

My objective is to refine this LaTeX customization for a more robust and flexible command substitution system, ideally one that can be used conveniently within the document environment, as preamble access might not always be available. I understand that this might be a challenging or unconventional request, but I believe it presents a fascinating problem for the LaTeX community.

I welcome any insights, suggestions, or alternative approaches that could lead to an improved solution. Thank you in advance for your time and assistance.

With your mention that "the order of substitutions significantly affects the outcome", it reminds me of this answer, which attempts to translate non-LaTeX equation syntax into something recognizable by LaTeX: https://tex.stackexchange.com/questions/332012/translate-in-line-equations-to-tex-code-any-package/332061#332061 — Steven B. Segletes, Dec 24 '23 at 21:28
If you can use LuaTeX, then you can maybe adapt something from https://tex.stackexchange.com/a/676977/270600, https://tex.stackexchange.com/a/690037/270600, or https://tex.stackexchange.com/a/694316/270600. — Max Chernoff, Dec 25 '23 at 01:19
There are many ways to achieve what you are after, but it depends on the problem definition. Check, for example, PGF's parser module, which uses \futurelet extensively for a letter-by-letter parser used in the svg module and provides a domain-specific language. For more general solutions, Lua is a better option. I have used l3regex expressions to colorize code in the macrocode environments. — yannisl, Dec 25 '23 at 05:39
Other cases include transliteration, for example converting SH to a Unicode symbol in Coptic texts. In the l3doc environment, @@ is changed to a module name and underscores. To summarize, LPeg and Lua are the way to go for anything more serious. Provide a spec and there are many people here who can help you. — yannisl, Dec 25 '23 at 05:39
Thank you for the suggestions, especially regarding PGF's parser module and Lua. However, I'm working in an environment that uses pdfTeX for rendering LaTeX, specifically the TeX Live 2023 distribution, and doesn't support LuaTeX, like the TeXit bot. Could you recommend any methods for dynamic command substitution or text processing that are compatible with these constraints? Your further guidance would be greatly appreciated! — KevinHeart, Dec 25 '23 at 07:19
I think it would be safer to propose this type of redefinition only for specific environments and not globally, as this could create more problems than solutions. — projetmbc, Dec 26 '23 at 17:10
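
As an illustration of limiting the substitution to an explicit scope, the following minimal sketch applies the replacements only inside a dedicated command instead of hooking into \everymath. The names \AsciiMath, \l_demo_subs_seq and \l_demo_math_tl are assumptions made for this example. It should run on pdfTeX with a recent LaTeX kernel (one that provides \NewDocumentCommand without loading xparse) and needs only the preamble.

```latex
\documentclass{article}
\ExplSyntaxOn
\seq_new:N \l_demo_subs_seq          % hypothetical substitution list
\tl_new:N  \l_demo_math_tl           % hypothetical scratch variable
\seq_put_right:Nn \l_demo_subs_seq { { <= } { \leq } }
\seq_put_right:Nn \l_demo_subs_seq { { >= } { \geq } }
% \AsciiMath{<formula>} typesets <formula> inline after substitution;
% math outside this command is left untouched.
\NewDocumentCommand \AsciiMath { m }
 {
   \tl_set:Nn \l_demo_math_tl {#1}
   \seq_map_inline:Nn \l_demo_subs_seq
     { \tl_replace_all:Nnn \l_demo_math_tl ##1 }
   $ \tl_use:N \l_demo_math_tl $
 }
\ExplSyntaxOff
\begin{document}
\AsciiMath{a <= b >= c} and ordinary math $x^+$ is not affected.
\end{document}
```

Because nothing is grabbed up to the next $ or \], ordinary math and packages that use math mode internally are not touched. The cost is that every formula needing substitution has to be wrapped explicitly.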

Alan Xiang · Accepted Answer · 2023-12-26T16:47:14.247

I improved the LaTeX3 code provided in your question, and now it addresses two of the problems you faced.

It allows arbitrary substitution order

This is made possible by maintaining a prefix tree of the symbols. The substitution table is obtained from a post-order traversal of the prefix tree, which lists every longer symbol before any shorter symbol that is a prefix of it. Because of this, whenever new symbols are inserted, the user must call \math_sub_generate: to convert the prefix tree into a substitution table (of course, \math_add_sub:nn could always call \math_sub_generate: itself).

It supports aliases

The inconsistent behavior is likely due to category code differences between document mode and LaTeX3 mode. This can be corrected using \tl_set_rescan:Nnn.
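
One visible effect of the category code difference can be sketched as follows. The names \demo_compare:n, \DemoCompare and the two \l_demo_..._tl variables are hypothetical and used only for this illustration; the rescanning call mirrors the one in the code below.

```latex
\documentclass{article}
\ExplSyntaxOn
\tl_new:N \l_demo_raw_tl
\tl_new:N \l_demo_rescanned_tl
\cs_new_protected:Npn \demo_compare:n #1
 {
   % Keep the argument exactly as the caller tokenized it.
   \tl_set:Nn \l_demo_raw_tl {#1}
   % Re-read the same characters under expl3 (code) category codes.
   \tl_set_rescan:Nnn \l_demo_rescanned_tl
     { \cctab_select:N \c_code_cctab } {#1}
   \tl_log:N \l_demo_raw_tl
   \tl_log:N \l_demo_rescanned_tl
 }
\cs_new_eq:NN \DemoCompare \demo_compare:n
\ExplSyntaxOff
\begin{document}
\DemoCompare{ << }
\end{document}
```

Called from the document, the raw argument should keep its surrounding spaces, while the rescanned copy should contain only <<, just as if the pattern had been typed between \ExplSyntaxOn and \ExplSyntaxOff.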

Notice that in the code I have swapped the order of the two-character and three-character symbols, and the results should still be correct.
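
For comparison, a simpler way to become independent of the registration order (this is not the prefix-tree method used in this answer, only a sketch of an alternative) is to sort the substitution list so that longer patterns come first. The names \l_demo_subs_seq, \demo_pattern_length:nn and \demo_sort_subs: are hypothetical.

```latex
\documentclass{article}
\ExplSyntaxOn
\seq_new:N \l_demo_subs_seq
% Registered in the "wrong" order on purpose.
\seq_put_right:Nn \l_demo_subs_seq { { <=  } { \leq } }
\seq_put_right:Nn \l_demo_subs_seq { { <== } { \Leftarrow } }
\seq_put_right:Nn \l_demo_subs_seq { { <=> } { \Leftrightarrow } }
% Number of tokens in the pattern (the first brace group of an item).
\cs_new:Npn \demo_pattern_length:nn #1 #2 { \tl_count:n {#1} }
\cs_new_protected:Npn \demo_sort_subs:
 {
   \seq_sort:Nn \l_demo_subs_seq
    {
      \int_compare:nNnTF
        { \demo_pattern_length:nn ##1 } < { \demo_pattern_length:nn ##2 }
        { \sort_return_swapped: }   % the longer pattern moves in front
        { \sort_return_same: }
    }
 }
\demo_sort_subs:
\seq_log:N \l_demo_subs_seq          % writes the sorted list to the log
\ExplSyntaxOff
\begin{document}
\end{document}
```

After sorting, the three-character patterns should precede <= in the log, so a subsequent \tl_replace_all:Nnn pass would see them first. The prefix tree reaches the same guarantee for prefixes without comparing lengths.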

The final substitution table is shown below:

The sequence \l_math_subs_seq contains the items (without outer braces):
>  {{<<}{\ll }}
>  {{<>}{\neq }}
>  {{<==}{\Leftarrow }}
>  {{<=>}{\Leftrightarrow }}
>  {{<=}{\leq }}
>  {{<--}{\leftarrow }}
>  {{<->}{\leftrightarrow }}
>  {{>>}{\gg }}
>  {{>=}{\geq }}
>  {{-+}{\mp }}
>  {{-->}{\rightarrow }}
>  {{+-}{\pm }}
>  {{==>}{\Rightarrow }}
>  {{==}{\equiv }}
>  {{=.}{\doteq }}
>  {{=(}{\subseteq }}
>  {{=)}{\supseteq }}
>  {{=[}{\sqsubseteq }}
>  {{=]}{\sqsubseteq }}
>  {{...}{\dots }}
>  {{@@@@@}{\mbox { }FIVE\mbox { }}}
>  {{@@@@}{\mbox { }FOUR\mbox { }}}
>  {{@@@}{\mbox { }THREE\mbox { }}}.

Code

\documentclass{article}
\usepackage{amsmath}
\usepackage{xparse}
\ExplSyntaxOn
\seq_new:N \l_math_subs_seq
\msg_new:nnn {math} {symbol-exists} {symbol~#1~already~exists}
\int_new:N \g_math_struct_counter
\int_gset:Nn \g_math_struct_counter {1}
\cs_new_protected:Npn \mathstruct_new:N #1
{
  \tl_set:Nx #1 {l_mathstruct_internal_\int_use:N \g_math_struct_counter _prop}
  \prop_new:c {#1}
  \int_gincr:N \g_math_struct_counter
  % the children prop uses the next counter value; its name must match the one stored under "children"
  \prop_new:c {l_mathstruct_internal_\int_use:N \g_math_struct_counter _prop}
  \prop_put:cnn {#1} {value} {}
  \prop_put:cnn {#1} {exist} {\c_false_bool}
  \prop_put:cnn {#1} {mapping} {}
  \prop_put:cnx {#1} {children} {l_mathstruct_internal_\int_use:N \g_math_struct_counter _prop}
  \int_gincr:N \g_math_struct_counter
}
\mathstruct_new:N \l_math_prefix_tree_root
\tl_new:N \l_math_children_prop_tl
\tl_new:N \l_math_tmpa_tl
\tl_new:N \l_math_tmpb_tl
\tl_new:N \l_math_tmpc_tl
\cs_new_protected:Npn \math__recursive_add_sub:nnnn #1#2#3#4
 {
\str_if_empty:nTF {#2} 
  {
    \prop_get:cnN {#1} {exist} \l_math_tmpa_tl
    \exp_args:NV \bool_if:nTF {\l_math_tmpa_tl}
    {
      % if symbol already exists, send a warning
      \msg_warning:nnn {math} {symbol-exists} {#4}
    }
    {
      % need to set this symbol as "exists" and set the mapping
      \prop_put:cnn {#1} {exist} {\c_true_bool}
      \prop_put:cnn {#1} {mapping} {#3}
    }
  }
  {
    \prop_get:cnN {#1} {children} \l_math_children_prop_tl
    \str_set:Nx \l_math_tmpa_tl {\str_head:n {#2}}
    % see if it is one of its children
    \prop_if_in:cVTF {\l_math_children_prop_tl} \l_math_tmpa_tl
    {
      % if the node exists, continue recursively
      \tl_set:Nx \l_math_tmpc_tl {\str_tail:n {#2}}
      \prop_get:cVN {\l_math_children_prop_tl} \l_math_tmpa_tl \l_math_tmpb_tl
      \math__recursive_add_sub:Vxnn \l_math_tmpb_tl {\str_tail:n {#2}} {#3} {#4}
    }
    {
      % otherwise, need to create new node
      \mathstruct_new:N \l_math_tmpb_tl
      \prop_put:cnV {\l_math_tmpb_tl} {value} \l_math_tmpa_tl
      \prop_put:cVV {\l_math_children_prop_tl} \l_math_tmpa_tl \l_math_tmpb_tl
      % apply recursively
      \math__recursive_add_sub:Vxnn \l_math_tmpb_tl {\str_tail:n {#2}} {#3} {#4}
    }
}
 }
\cs_generate_variant:Nn \math__recursive_add_sub:nnnn {Vxnn,VVVV}
\tl_new:N \l_math_add_tmpa_tl
\tl_new:N \l_math_add_tmpb_tl
 \cs_new_protected:Npn \math_add_sub:nn #1 #2
 {
   \tl_set_rescan:Nnn \l_math_add_tmpa_tl {\cctab_select:N\c_code_cctab} {#1}
   \tl_set_rescan:Nnn \l_math_add_tmpb_tl {\cctab_select:N\c_code_cctab} {#2}
\math__recursive_add_sub:VVVV \l_math_prefix_tree_root \l_math_add_tmpa_tl \l_math_add_tmpb_tl \l_math_add_tmpa_tl
 }
% post order traversal 
 \cs_new_protected:Npn \math__sub_recursive_generate:nn #1#2
 {
  \group_begin:
    % check children first
    \prop_get:cnN {#1} {children} \l_math_children_prop_tl
    \prop_map_inline:cn {\l_math_children_prop_tl}
    {
      \math__sub_recursive_generate:nn {##2} {#2##1}
    }
    % check current node
    \prop_get:cnN {#1} {exist} \l_math_tmpa_tl
    \bool_if:nT {\l_math_tmpa_tl}
    {
      \prop_get:cnN {#1} {mapping} \l_math_tmpb_tl
      \tl_set_rescan:Nnn \l_math_tmpc_tl { \cctab_select:N \c_document_cctab } {#2}
      \seq_gput_right:Nx \l_math_subs_seq { {\exp_not:V \l_math_tmpc_tl}  {\exp_not:V \l_math_tmpb_tl} }
    }
  \group_end:
 }
% traverse the tree to get the correct order
 \cs_new_protected:Npn \math_sub_generate:
 {
  \seq_gclear:N \l_math_subs_seq
  \exp_args:NV \math__sub_recursive_generate:nn \l_math_prefix_tree_root {}
 }
\cs_new_protected:Npn \math_ascii_sub:n #1
 {
  \tl_set:Nn \l_tmpa_tl { #1 }
  \seq_map_inline:Nn \l_math_subs_seq
   {
    \tl_replace_all:Nnn \l_tmpa_tl ##1
   }
  \tl_use:N \l_tmpa_tl
 }
\cs_new_protected:Npn \math_grabinline:w #1 $
 {
  \math_ascii_sub:n { #1 } $
 }
\cs_new_protected:Npn \math_grabdisplay:w #1 \]
 {
  \math_ascii_sub:n { #1 } \]
 }
\cs_set_eq:NN \MathAddSub \math_add_sub:nn
\cs_set_eq:NN \MathGenSub \math_sub_generate:
% Enable substitutions for $...$ and \[...\]
\everymath { \math_grabinline:w }
\tl_put_right:Nn \[ { \math_grabdisplay:w }
\ExplSyntaxOff
\begin{document}
\centering
\newcommand*{\test}[1]{%
  $#1$%
  \[#1\]%
}
\MathAddSub{ << }{ \ll  }
\MathAddSub{ <> }{ \neq }
\MathAddSub{ <= }{ \leq }
\MathAddSub{ >> }{ \gg  }
\MathAddSub{ >= }{ \geq }
\MathAddSub{ -+ }{ \mp }
\MathAddSub{ +- }{ \pm }
\MathAddSub{ == }{ \equiv }
\MathAddSub{ =. }{ \doteq }
\MathAddSub{ =( }{ \subseteq }
\MathAddSub{ =) }{ \supseteq }
\MathAddSub{ =[ }{ \sqsubseteq }
\MathAddSub{ =] }{ \sqsubseteq }
\MathAddSub{ <== }{ \Leftarrow }
\MathAddSub{ <=> }{ \Leftrightarrow }
\MathAddSub{ <-- }{ \leftarrow }
\MathAddSub{ <-> }{ \leftrightarrow }
\MathAddSub{ --> }{ \rightarrow }
\MathAddSub{ ==> }{ \Rightarrow }
\MathAddSub{ ... }{ \dots }
\MathAddSub{ int }{ \int }
% generate the substitution table based on post-order traversal of the prefix tree
\MathGenSub
\ExplSyntaxOn
% show the substitution table
\seq_show:N \l_math_subs_seq
\ExplSyntaxOff
\test{a << b < c <= d >= e > f >> g}
\test{a <> b = c =. d == e}
\test{a <== b <-- c <-> d <=> e --> f ==> g}
\test{a +- b = -(-a -+ +b)}
\test{a, ..., z <> a + ...+ z}
\test{a =( b =) c =[ e =] f}
\test{int _a^b}
\test{@@@@@ @@@@ @@@}
\MathAddSub{ @@@ }{ \mbox{~}THREE\mbox{~} }
\MathAddSub{ @@@@ }{ \mbox{~}FOUR\mbox{~} }
\MathAddSub{ @@@@@ }{ \mbox{~}FIVE\mbox{~} }
% generate the substitution table again
\MathGenSub
\ExplSyntaxOn
% show the substitution table
\seq_show:N \l_math_subs_seq
\ExplSyntaxOff
\test{@@@@@ @@@@ @@@}
\end{document}

Thank you so much for providing the LaTeX solution with the \MathAddSub command! I've tried implementing it as you've described, adding \MathAddSub{ int }{ \int } followed by \MathGenSub. However, I've run into an issue. Whilst the command works wonderfully with other symbols, it doesn't seem to work with letter sequences: when I use it to replace int with the integral symbol \int, it doesn't produce the expected result. I'm not sure if I'm missing a step or if there's a limitation with the command. Any guidance you can provide would be greatly appreciated! — KevinHeart, Dec 26 '23 at 13:50
This is due to a category code issue for letters in math equations. It has been fixed in the latest update. — Alan Xiang, Dec 26 '23 at 16:45
Thank you so much for your help with the command! It's working as expected, and I really appreciate your assistance. I've noticed a slight issue, though: a conflict with the nicematrix package, which produces the following error: ./TEST.tex:229: Runaway argument? \col@sep \tabcolsep \let \d@llarbegin \begingroup \let \d@llarend \endgroup \ETC. ./TEST.tex:229: Paragraph ended before \math_grabinline:w was complete. <to be read again> I'll try to look into it further and see if I can resolve the conflict. Thanks again for your support! — KevinHeart, Dec 26 '23 at 19:09
This seems to be a problem with nicematrix, since its commands are not robust enough to be used in this way. One workaround is to capture the math content as a string (ignoring all LaTeX syntax), perform the substitution, and write the result to an external file, then use \input to read the result back. This approach will be slower, but it is free from robustness problems like this. — Alan Xiang, Dec 27 '23 at 06:29
Thanks for the suggestion and the workaround! While I understand this might circumvent the robustness issues with the nicematrix package, I'm limited to using only one preamble file in the bot I'm implementing this on. So, unfortunately, I won't be able to implement a solution that requires multiple files or external inputs. Nevertheless, I appreciate your insight and will continue to look for a solution within these constraints! — KevinHeart, Dec 27 '23 at 14:21
