Conditionally replacing sequences of characters

Question

I am a newcomer to plain TeX. All the commands used in this question were put together with the help of answers on this site and the 'LaTeX Wikibook'. I want to change the keyboard mapping of a non-Latin script using TeX commands, so I have made all the characters of that script active and redefined them with \catcode and \def. The script is unreadable to many users here, which is why my previous question did not receive a complete solution, so in this question I give a Latin example instead.

Let's assume that I've remapped A, B, C and D to P, Q, R and S, respectively.

Code:

\documentclass{article}
\makeatletter
\catcode`\A=\active
\protected\def A{P}
\catcode`\B=\active
\protected\def B{Q}
\catcode`\C=\active
\protected\def C{R}
\catcode`\D=\active
\protected\def D{S}
\makeatother

\begin{document}
    ABCD
\end{document}

This produces PQRS.

But now I have one condition: when A, B, C or D is not followed by the letter 'a', add the letter 'x' after it.

So my code is now:

\documentclass{article}
\makeatletter
\catcode`\A=\active
\protected\def A{\bgroup P\futurelet\tmp\check}
\catcode`\B=\active
\protected\def B{\bgroup Q\futurelet\tmp\check}
\catcode`\C=\active
\protected\def C{\bgroup R\futurelet\tmp\check}
\catcode`\D=\active
\protected\def D{\bgroup S\futurelet\tmp\check}
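% If the next token is 'a', close the group and drop the 'a'; otherwise append 'x' and close the group.
\protected\def\check{\ifx\tmp a\egroup\expandafter\@gobble\else x\egroup\fi}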
\makeatother

\begin{document}
    AaBCD
\end{document}

This produces PQxRxSx.

So far everything is fine. Now I have multiple conditions: if A, B, C or D is not followed by any of 'a', 'e', 'i', 'o' or 'u', then add 'x'.

I don't know how to add multiple conditions to this line:

\protected\def\check{\ifx\tmp a\egroup\expandafter\@gobble\else x\egroup\fi}

Second problem:

I also want to handle the following cases:

If 'A' is followed by 'X' change it to 'L'
If 'B' is followed by 'X' change it to 'M'
If 'C' is followed by 'X' change it to 'N'

I don't even know how to express these conditions in TeX. Any kind of help is appreciated!
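A minimal sketch of the second condition on its own, using the same \futurelet technique as the \check macro above. It assumes that the X itself should be consumed (so AX prints L), that D has no X variant, and that the 'x' rule is left out; the helper names \xcheck@, \xcheck@@, \xrepl@normal and \xrepl@withx are hypothetical. The two possible replacements are stored in macros before \futurelet runs, because writing \futurelet\tmp\check{P}{L} would make \tmp see the { of the argument instead of the next input character.

```latex
\documentclass{article}
\makeatletter
% Hypothetical helper: #1 = normal replacement, #2 = replacement used when
% the next token is the letter X. Both are stored before peeking, since
% \futurelet\tmp\check{P}{L} would set \tmp to the "{" of the argument.
\protected\def\xcheck@#1#2{%
  \def\xrepl@normal{#1}%
  \def\xrepl@withx{#2}%
  \futurelet\tmp\xcheck@@}
\def\xcheck@@{%
  \ifx\tmp X%
    % Next token is X: print the X variant and drop the X (an assumption)
    \xrepl@withx\expandafter\@gobble
  \else
    \xrepl@normal
  \fi}
\catcode`\A=\active
\protected\def A{\xcheck@ PL}
\catcode`\B=\active
\protected\def B{\xcheck@ QM}
\catcode`\C=\active
\protected\def C{\xcheck@ RN}
\catcode`\D=\active
\protected\def D{S}% no X variant for D
\makeatother
\begin{document}
AXBXCXDX ABCD
\end{document}
```

With this preamble, AXBXCXDX ABCD should typeset as LMNSX PQRS. To combine it with the vowel rule, the X test would go first inside \xcheck@@ and the vowel test would go in its \else branch.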

What you want (for a simple ASCII charset) is the same as what TikZ does to parse commands. Take a look at this file, especially the macro \tikz@handle. There are lots of nested conditionals which check for each possible letter and the ones that follow it. — Phelype Oleinik, Jul 02 '19 at 11:15
Will you need follow-up lookups? E.g. replace "f+f" by "ff-ligature" and in the next step replace "ff-ligature+i" by "ffi-ligature"? — Ulrike Fischer, Jul 02 '19 at 13:23
Not really. I am using XeTeX to map the character keys of one script to another. Yesterday, in another question, we discussed a way to do this in LuaTeX, but that is not a possible solution for me. Those are two completely different scripts, with two different fonts. — Niranjan, Jul 02 '19 at 13:40
As a general note: you should only do such invasive catcode changes within a local group. As soon as one of the changed letters occurs in a macro name, you will run into compilation errors. — siracusa, Jul 02 '19 at 18:13
@siracusa the script which I'm using is not used for writing any macros. — Niranjan, Jul 03 '19 at 04:12
Can I define a group of characters with \def\ABC{a \or e \or i \or o \or u} (as I need them in the last condition) and then put \ABC in the command like this: \protected\def\check{\ifx\tmp \ABC\egroup\expandafter\@gobble\else x\egroup\fi}? — Niranjan, Jul 03 '19 at 04:44
@Niranjan You are using a lot of macros in your code. If you make any of the used characters in those names active, things will blow up. As for your last question: No, you can't. If you want to compare to a list of tokens, you have to iterate over that token list explicitly. — siracusa, Jul 03 '19 at 07:37
@Niranjan See David's answer here on how to iterate over a list of tokens. Instead of the \meaning part you would then use your character remapping. — siracusa, Jul 03 '19 at 18:47
I read the answer. Sorry, I couldn't understand any of it. Is it possible for you to show the same using my example (a, e, i, o, u)? — Niranjan, Jul 03 '19 at 20:00
You say: "If A, B, C or D are not followed by 'a', 'e', 'i', 'o' and 'u' then add 'x'." I assume you mean: if they are followed neither by 'a', nor by 'e', nor by 'i', nor by 'o', nor by 'u', then add 'x'. Question: where should the 'x' be added? Behind that following non-'a'-'e'-'i'-'o'-'u' token? Instead of it? If the latter: do you also want it instead of that following token in case that token is one of 'A', 'B', 'C', 'D'? What should happen to the token that follows A, B, C or D in case that token is one of 'a', 'e', 'i', 'o' or 'u'? — Ulrich Diez, Jul 05 '19 at 14:48
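siracusa's suggestion of iterating over the list explicitly can be sketched with the LaTeX kernel loop \@tfor, which runs its body once for each token of a list and stores the current token in a macro. This is a minimal sketch, not what either answer below does: the names \ifnextvowel@, \testvowel@ and \vowel@ are hypothetical, and, unlike the \@gobble version in the question, the following vowel is kept in the output.

```latex
\documentclass{article}
\makeatletter
% Hypothetical helper names: \ifnextvowel@, \testvowel@, \vowel@.
\newif\ifnextvowel@
% Set \ifnextvowel@ to true if \tmp (the peeked token) is a, e, i, o or u.
\def\testvowel@{%
  \nextvowel@false
  \@tfor\vowel@:=aeiou\do{%
    % \vowel@ holds the current list item; \expandafter puts it before \tmp
    \expandafter\ifx\vowel@\tmp \nextvowel@true\fi}%
}
% Before a vowel: just close the group (the vowel stays in the input).
% Otherwise: append x, then close the group.
\protected\def\check{%
  \testvowel@
  \ifnextvowel@ \egroup\else x\egroup\fi}
\catcode`\A=\active
\protected\def A{\bgroup P\futurelet\tmp\check}
\catcode`\B=\active
\protected\def B{\bgroup Q\futurelet\tmp\check}
\catcode`\C=\active
\protected\def C{\bgroup R\futurelet\tmp\check}
\catcode`\D=\active
\protected\def D{\bgroup S\futurelet\tmp\check}
\makeatother
\begin{document}
AeBCDo
\end{document}
```

With these definitions, AeBCDo should give PeQxRxSo. If the vowel should be dropped, as the 'a' was in the question, the true branch can become \egroup\expandafter\@gobble.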

Phelype Oleinik · Accepted Answer · 2019-07-05T14:32:40.577

The code became a tad long. Much longer than I initially thought it would be.

Your conditions aren't too clear, I think. What I implemented (in crappy pseudocode) is:

procedure cur_char // (A, B, C or D)
  if next_char is lowercase_vowel
    print replacement (cur_char) // replaces ABCD by PQRS
  else
    if next_char is uppercase_X
      override_replacement // changes replacement of ABC to LMN
    end
    print replacement (cur_char) // possibly overridden because of X
    print "x"
  end
end

which, in an oversimplified way, is more or less what the code does.

With that code, the following substitutions occur:

I used expl3 for the job, otherwise the code would be much longer. I defined a macro \niranjan_define_active:Nn which you should use, for example, like this:

\niranjan_define_active:Nn A { \__niranjan_process:NN A P }

Another pair of macros \activateall and \deactivateall controls when the replacement behaviour should happen. Ideally you want this replacement active in a group to avoid chaos.

The code can be extended to handle other types of substitutions if you want. The \__niranjan_define_charset_conditional:Nn function lets you define a conditional that checks whether the next character belongs to a given set of characters.

However, the code only works for characters with catcode 11. I didn't do any verification whatsoever for other catcodes.
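The catcode-11 restriction comes from how the charset conditionals work: they turn the peeked token into its meaning with \token_to_meaning:N (the expl3 name for the \meaning primitive) and strip the prefix "the letter ". A small standalone document shows what \meaning returns in each case:

```latex
\documentclass{article}
\begin{document}
\ttfamily
% A letter with catcode 11 prints "the letter a"
\meaning a

% A character with catcode 12 prints "the character 1"
\meaning 1

% The same X with catcode 12 (inside a group) prints "the character X",
% so it would not match the "the letter " prefix
{\catcode`\X=12 \meaning X}
\end{document}
```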

I must say I doubt this code will simply work for the Devanagari script. But, as you wanted an example for latin characters, here it is:

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Main code
\cs_new_protected:Npn \__niranjan_process:NN #1#2
  {
    \cs_set_eq:NN \l__niranjan_curr_char #1
    \cs_set_eq:NN \l__niranjan_replacement_char #2
    \niranjan_deactivate_all:
    \peek_after:Nw \__niranjan_process_aux:
  }
\cs_new_protected:Npn \__niranjan_process_aux:
  {
    \__niranjan_if_lower_vowel:NTF \l_peek_token
      {
        \l__niranjan_replacement_char
        \__niranjan_rescan_token:w
      }
      {
        \__niranjan_if_upper_X:NT \l_peek_token
          { \__niranjan_followed_by_X: }
        \l__niranjan_replacement_char
        x
        \__niranjan_rescan_token:w
      }
  }
\cs_new_protected:Npn \__niranjan_followed_by_X:
  {
    \__niranjan_if_upper_ABC:NT \l__niranjan_curr_char
      {
        \exp_args:Nf \__niranjan_replace_char:n
          { \__niranjan_get_char:N \l__niranjan_curr_char }
      }
  }
\cs_new:Npn \__niranjan_replace_char:n #1
  {
    \exp_last_unbraced:NNf \cs_set_eq:NN \l__niranjan_replacement_char
    \str_case:nnF {#1}
      {
        { A } { L }
        { B } { M }
        { C } { N }
      }
      {#1}
  }
\cs_new_protected:Npn \__niranjan_rescan_token:w
  {
    \peek_N_type:TF
      { \__niranjan_rescan_token:Nw }
      { \niranjan_activate_all: }
  }
\cs_new_protected:Npn \__niranjan_rescan_token:Nw #1
  {
    \niranjan_activate_all:
    \tl_rescan:nn { } {#1}
  }
% Checking for following charset
\cs_set:Npn \__niranjan_tmp:w #1
  {
    \cs_new_protected:Npn \__niranjan_define_charset_conditional:Nn ##1 ##2
      {
        \prg_new_protected_conditional:Npnn ##1 ####1 { T, F, TF }
          {
            \exp_last_unbraced:No \__niranjan_if_charset:wn
              \token_to_meaning:N ####1 #1 \q_nil \q_stop {##2}
          }
      }
    \cs_new_protected:Npn \__niranjan_if_charset:wn ##1 #1 ##2##3 \q_stop ##4
      {
        \quark_if_nil:NTF ##2
          { \prg_return_false: }
          {
            \str_if_in:nnTF {##4} {##2}
              { \prg_return_true: }
              { \prg_return_false: }
          }
      }
    \cs_new:Npn \__niranjan_get_char:N ##1
      { \exp_last_unbraced:No \__niranjan_get_char:w \token_to_meaning:N ##1 }
    \cs_new:Npn \__niranjan_get_char:w #1 ##1 { ##1 }
  }
\use:x { \exp_not:N \__niranjan_tmp:w { \tl_to_str:n { the~letter~ } } }
\__niranjan_define_charset_conditional:Nn \__niranjan_if_lower_vowel:N { aeiou }
\__niranjan_define_charset_conditional:Nn \__niranjan_if_upper_X:N { X }
\__niranjan_define_charset_conditional:Nn \__niranjan_if_upper_ABC:N { ABC }
% Setting active chars
\tl_new:N \g__niranjan_chars_tl
\cs_new_protected:Npn \niranjan_define_active:Nn #1#2
  {
    \cs_gset_protected:cpn { __niranjan_active_letter_#1: } {#2}
    \char_set_active_eq:Nc #1 { __niranjan_active_letter_#1: }
    \tl_gput_right:Nn \g__niranjan_chars_tl { #1 }
  }
\cs_new_protected:Npn \niranjan_activate_all:
  { \tl_map_function:NN \g__niranjan_chars_tl \char_set_catcode_active:N }
\cs_new_protected:Npn \niranjan_deactivate_all:
  { \tl_map_function:NN \g__niranjan_chars_tl \char_set_catcode_letter:N }
\cs_new_eq:NN \activateall \niranjan_activate_all:
\cs_new_eq:NN \deactivateall \niranjan_deactivate_all:
\niranjan_define_active:Nn A { \__niranjan_process:NN A P }
\niranjan_define_active:Nn B { \__niranjan_process:NN B Q }
\niranjan_define_active:Nn C { \__niranjan_process:NN C R }
\niranjan_define_active:Nn D { \__niranjan_process:NN D S }
\ExplSyntaxOff
\begin{document}

\ttfamily

AaBeCiDoAu ->
\activateall
AaBeCiDoAu
\deactivateall

ABCDA ->
\activateall
ABCDA
\deactivateall

AXBXCXDXAX ->
\activateall
AXBXCXDXAX
\deactivateall

\end{document}

Ulrich Diez · Answer 2 · 2019-07-05T16:02:09.303

Using \futurelet, you can single out the cases where the following token is {, } or a space token.

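If the following token is none of these, you can have LaTeX process it as a macro argument. This check has to come first: when TeX reads an undelimited macro argument, it skips space tokens, takes a whole brace group when the next token is {, and raises an error when it is }.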
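A minimal standalone sketch of that first step: \futurelet lets \tmp equal the next token without consuming it, so braces and spaces can be told apart before anything is read as an argument. The name \peekkind is hypothetical, and it uses \ifx rather than the answer's \ifcat\noexpand tests, so it only recognizes the standard brace characters and the space token.

```latex
\documentclass{article}
\makeatletter
% \peekkind is a hypothetical name used only for this illustration.
% \futurelet\tmp\peekkind@ sets \tmp to the next token, then runs \peekkind@.
\newcommand\peekkind{\futurelet\tmp\peekkind@}
\newcommand\peekkind@{%
  \ifx\tmp\bgroup [begin-group]\else
  \ifx\tmp\egroup [end-group]\else
  \ifx\tmp\@sptoken [space]\else
  [other]\fi\fi\fi}
\makeatother
\begin{document}
\peekkind{x}% sees the opening brace

{\peekkind}% sees the closing brace

\expandafter\peekkind\space y% \space expands to a space token first

\peekkind z% sees the letter z
\end{document}
```

Each line prints its label before the token that follows, e.g. [begin-group]x and [other]z; the tokens themselves are left in the input for later processing.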

Macro-arguments in turn can be examined in various ways.

One way of interpreting your vague specifications is:

\documentclass{article}
\makeatletter
\newcommand\fetchargendgroup[1]{#1\endgroup}%
\begingroup
\def\tmp{\ActiveABCD}%
\catcode`\A=\active
\catcode`\B=\active
\catcode`\C=\active
\catcode`\D=\active
\@firstofone{%
  \expandafter\endgroup
  \expandafter\newcommand\expandafter{\tmp}{%
    \begingroup
    \protected\def A{\bgroup\futurelet\tmp\checka}%
    \protected\def B{\bgroup\futurelet\tmp\checkb}%
    \protected\def C{\bgroup\futurelet\tmp\checkc}%
    \protected\def D{\bgroup\futurelet\tmp\checkd}%
    \catcode`\A=\active
    \catcode`\B=\active
    \catcode`\C=\active
    \catcode`\D=\active
    \fetchargendgroup
  }%
}%
\newcommand\checka{\check{P}{Lx}}
\newcommand\checkb{\check{Q}{Mx}}
\newcommand\checkc{\check{R}{Nx}}
\newcommand\checkd{\check{S}{SxX}}
\protected\def\check#1#2{%
  \ifcat\noexpand\tmp\bgroup\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
  {\egroup#1x}{%
    \ifcat\noexpand\tmp\egroup\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
    {\egroup#1x}{%
      \ifcat\noexpand\tmp\@sptoken\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
      {\egroup#1x}{%
        \nextargfork{#1}{#2}%
      }%
    }%
  }%
}%
\newcommand*\mya{a}
\newcommand*\mye{e}
\newcommand*\myi{i}
\newcommand*\myo{o}
\newcommand*\myu{u}
\newcommand*\myX{X}
\newcommand\nextargfork[3]{%
  \def\tmp{#3}%
  \ifx\tmp\myX\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
  {\egroup#2}{%
    \ifx\tmp\mya\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
    {\egroup#1}{%
      \ifx\tmp\mye\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
      {\egroup#1}{%
        \ifx\tmp\myi\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
        {\egroup#1}{%
          \ifx\tmp\myo\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
          {\egroup#1}{%
            \ifx\tmp\myu\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
            {\egroup#1}{\egroup#1x#3}%
          }%
        }%
      }%
    }%
  }%
}%
\makeatother

\begin{document}
\verb|\ActiveABCD{AaeBCD AXaBXCXDX}|:  \ActiveABCD{AaeBCD AXaBXCXDX}%
\end{document}

If this does not behave as desired, please specify all requirements regarding replacement exactly.

I'm sorry for not being able to explain my problem well, but I didn't understand your code. I'll try to edit my post. — Niranjan, Jul 07 '19 at 13:58
