\tl_replace_all:Nnn recurse subgroups

Question

Suppose I have a token list variable containing abc{ab{abc}c} and want to replace every occurrence of b with d. Some of the b's sit inside nested groups, and I want those replaced too, so a simple

\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn
\tl_set:Nn \l_tmpa_tl { abc{ab{abc}c} }
\tl_replace_all:Nnn \l_tmpa_tl { b } { d }
\tl_show:N \l_tmpa_tl
\ExplSyntaxOff
\begin{document}
\end{document}

won't do, as the result is adc{ab{abc}c} (only the b in the outermost grouping level was replaced).

One might attempt to grab the first level of grouping by mapping the token list à la

\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn
\tl_new:N \l_tmpc_tl
\tl_set:Nn \l_tmpa_tl { abc{ab{abc}c} }
% iterate all tokens
\tl_map_inline:Nn \l_tmpa_tl
 {
  % obtain sub token list
  \tl_set:Nn \l_tmpb_tl { #1 }
  % replace
  \tl_replace_all:Nnn \l_tmpb_tl { b } { d }
  % append to result
  \tl_put_right:NV \l_tmpc_tl \l_tmpb_tl
 }
\tl_show:N \l_tmpc_tl
\ExplSyntaxOff
\begin{document}
\end{document}

Unfortunately, this only acts on the first level and has the undesired side-effect of completely losing that level, as the result of this is adcad{abc}c.

How can I do a recursive search and replace without losing grouping? (Bonus for full expandability!)

I hope this simple example does not lose any generality.

@ChristianHupfer I haven't used l3regex before. I am happy to read your answer using it. — Henri Menke, May 26 '16 at 21:37
\regex_replace_all:nnN is here for this purpose. In any case, I think they actually do something similar for \tl_(lower|upper)_case:nn so you might want to look at the implementation until an answer arrives. — Manuel, May 26 '16 at 21:37
@Manuel: I think that \regex_replace_all:nnN does internally something similar to what Henri tried to achieve with \tl_map_inline — , May 26 '16 at 21:45
@ChristianHupfer Yes, that's what I would use, but I didn't know he was not aware of l3regex, so I thought he was trying to solve it without that, hence my proposal of looking into \tl_.._case:nn. I think l3regex does it internally by \detokenize-ing the token list and then doing the replacements. — Manuel, May 26 '16 at 21:48
@Manuel It would actually be awesome, if one could do it with an approach similar to \tl_..._case:n, because these are fully expandable. — Henri Menke, May 26 '16 at 21:54
Replacing tokens in a variable necessarily does an assignment which never is expandable. — cgnieder, May 26 '16 at 22:14
@clemens I thought about it like this: \tl_set:Nx \l_tmpa_tl { \tl_upper_case:n { abc } } — Henri Menke, May 27 '16 at 06:56

score 8 · Answer 1 · answered May 26 '16 at 21:40

While Manuel was commenting, I remembered \regex_replace_all:nnN: its first argument is the regular expression to search for, the second is the replacement, and the third is the token list variable to operate on.

\documentclass{article}


\usepackage{l3regex}

\begin{document}

\ExplSyntaxOn

\tl_set:Nn \l_tmpa_tl { abc{ab{abc}c} }

Before:\space \l_tmpa_tl \par

\regex_replace_all:nnN  {b} {d} \l_tmpa_tl

After:\space \l_tmpa_tl

\ExplSyntaxOff

\end{document}
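
Search strings containing characters that are special in l3regex, such as [, need those characters escaped with a backslash, and a control sequence in the replacement is written as \c{...}. A minimal sketch for a multi-character search, assuming a LaTeX release in which l3regex is already part of expl3 (on older installations, load l3regex as above); the sample input is only an illustration:

\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% sample input with the search string =[ inside a group
\tl_set:Nn \l_tmpa_tl { {a=[b}\,{[]} }
% \[ matches a literal bracket; \c{sqsubseteq} inserts \sqsubseteq
\regex_replace_all:nnN { =\[ } { \c{sqsubseteq} } \l_tmpa_tl
\tl_show:N \l_tmpa_tl
\ExplSyntaxOff
\end{document}

The lone [ in the second group should be left untouched, because it is not preceded by =.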

answered May 26 '16 at 21:40

While l3regex surely works, it makes it hard to replace strings containing many special characters, because each of them has to be escaped. – Henri Menke May 26 '16 at 21:46
@HenriMenke It's not that difficult, in my opinion. Could you give an example? And, by the way, why put the code on an external webpage? – Manuel May 26 '16 at 21:51
@HenriMenke: Well, I am waiting for a use case then – May 26 '16 at 21:55
@ChristianHupfer See this answer of mine, where I have weird replacements to do like =[ → \sqsubseteq. – Henri Menke May 26 '16 at 21:57
@Manuel See comment to Christian. I didn't want to clutter the question with irrelevant code. – Henri Menke May 26 '16 at 21:58
@HenriMenke: I see -- well, that's nothing I can attack quickly. – May 26 '16 at 21:59
@HenriMenke How can a minimal example required to understand the question be irrelevant code? – cfr May 26 '16 at 21:59
@HenriMenke \regex_replace_all:nnN { =\[ } { \c{sqsubseteq} } \l_tmpa_tl – Manuel May 26 '16 at 21:59

score 6 · Answer 2 · edited Apr 13 '17 at 12:36

I don't know if this is what you mean now I've seen your discussion in comments, but it certainly works for the cases in the original question.

This code does not rely on any package or code explicitly designated as experimental by the L3 developers.

Note, however, that I have no idea what I am doing.

Caveat emptor ....

Counts are included to show that the grouping within the token lists is preserved e.g. that a{bcde}f is counted as 3 tokens and not 6 or 8 when the token list is reassembled. During processing, the string is obviously counted as having more tokens since this is necessary to search and replace within the groups.
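
The three numbers mentioned here can be checked directly. A small sketch, assuming a LaTeX release with expl3 available: \tl_count:n counts items (a brace group is one item), while \str_count:n counts the characters of the string representation, braces included.

\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
\int_show:n { \tl_count:n  { a{bcde}f } } % 3: a, {bcde}, f
\int_show:n { \tl_count:n  { abcdef }   } % 6: no grouping left
\int_show:n { \str_count:n { a{bcde}f } } % 8: every character, braces too
\ExplSyntaxOff
\end{document}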

The result of the replacement operation is stored in a globally set variable \g_henri_mod_tl.

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{expl3}

\begin{document}

\ExplSyntaxOn
\str_new:N \l_henri_mod_str
\int_new:N \l_henri_tmpa_int
\int_new:N \l_henri_tmpb_int
\int_new:N \l_henri_tmpc_int
\tl_new:N \g_henri_mod_tl

\cs_new_protected:Npn \henri_replace_all:nnn #1 #2 #3
{
  \group_begin:
    \str_clear:N \l_henri_mod_str
    \int_zero:N \l_henri_tmpa_int
    \str_set:Nn \l_tmpa_str { #1 }
    \str_set:Nn \l_tmpb_str { #2 }
    \int_set:Nn \l_henri_tmpb_int { \str_count:N \l_tmpa_str }
    \int_set:Nn \l_henri_tmpc_int { \str_count:N \l_tmpb_str }
    \int_compare:nTF { \l_henri_tmpc_int = 1 }
    {
      \int_do_until:nn { \l_henri_tmpb_int = \l_henri_tmpa_int }
      {
        \str_if_eq:eeTF { #2 } { \str_head:N \l_tmpa_str }
        {
          \str_put_right:Nx \l_henri_mod_str { #3 }
        }
        {
          \str_put_right:Nx \l_henri_mod_str { \str_head:N \l_tmpa_str }
        }
        \str_set:Nx \l_tmpa_str { \str_tail:N \l_tmpa_str }
        \int_incr:N \l_henri_tmpa_int
      }
    }
    {
      \int_do_until:nn { \l_henri_tmpb_int = \l_henri_tmpa_int }
      {
        \str_if_eq:eeTF { #2 } { \str_range:Nnn \l_tmpa_str { 1 } { \l_henri_tmpc_int } }
        {
          \str_put_right:Nx \l_henri_mod_str { #3 }
          \int_set:Nn \l_tmpa_int { \str_count:N \l_tmpa_str }
          \str_set:Nx \l_tmpa_str { \str_range:Nnn \l_tmpa_str { 1 + \l_henri_tmpc_int } { \l_tmpa_int } }
          \int_add:Nn \l_henri_tmpa_int { \l_henri_tmpc_int }
        }
        {
          \str_put_right:Nx \l_henri_mod_str { \str_head:N \l_tmpa_str }
          \str_set:Nx \l_tmpa_str { \str_tail:N \l_tmpa_str }
          \int_incr:N \l_henri_tmpa_int
        }
      }
    }
    \tl_gset_rescan:Nno \g_henri_mod_tl {} { \l_henri_mod_str }
  \group_end:
}

\cs_generate_variant:Nn \henri_replace_all:nnn { Vnn }

\henri_replace_all:nnn { abc{ab{abc}c} } { b } { d }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\tl_set:Nn \l_tmpa_tl { abc{ab{abc}c} }
\henri_replace_all:Vnn \l_tmpa_tl { b } { d }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { {a=b}\,{[]} } { [ } { \sqsubseteq }
$\g_henri_mod_tl$ {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { gydihŵs } { y } { w }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { abc{ab{abc}c} } { bc } { doodle }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { {a=[b}\,{[]} } { =[ } { \sqsubseteq }
$\g_henri_mod_tl$ {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\ExplSyntaxOff

\end{document}

EDITED to deal with searches for strings of more than one character. This can correctly substitute =[ with \sqsubseteq as mentioned in a comment.

EDIT

It is possible to define a further command which follows the target syntax. Note, however, that this will not work in all cases. In particular, it fails to work correctly with gwdihŵs.

The idea is just to do the replacement and then spit out the global variable. I am not sure that it is correct to call the macro \tl_replace_allrecursive:nnn as this lacks any appropriate prefix, but if the macro is for purely personal use and you're not worried about future breakage, that's up to you. Personally, I'd call it something like \henri_replace_allrecursive:nnn and be safe since I don't see anything to be gained from violating the naming rules.

\cs_new_protected:Npn \tl_replace_allrecursive:nnn #1 #2 #3
{
  \henri_replace_all:nnn { #1 } { #2 } { #3 }
  \g_henri_mod_tl
}

Then we can say

\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { abc{ab{abc}c} } { b } { d } }
\l_tmpa_tl \par

\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { {a=b}\,{[]} } { [ } { \sqsubseteq }  }
$\l_tmpa_tl$ \par

\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { abc{ab{abc}c} } { bc } { doodle } }
\l_tmpa_tl \par
%
\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { {a=[b}\,{[]} } {  =[ } { \sqsubseteq } }
$\l_tmpa_tl$ \par

and, comparing with the original results, we can see that the replacements are as expected (less gwdihŵs, of course).

I take it the count of tokens here is irrelevant since everything is being expanded.

Complete code:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{expl3}

\begin{document}

\ExplSyntaxOn
\str_new:N \l_henri_mod_str
\int_new:N \l_henri_tmpa_int
\int_new:N \l_henri_tmpb_int
\int_new:N \l_henri_tmpc_int
\tl_new:N \g_henri_mod_tl

\cs_new_protected:Npn \henri_replace_all:nnn #1 #2 #3
{
  \group_begin:
    \str_clear:N \l_henri_mod_str
    \int_zero:N \l_henri_tmpa_int
    \str_set:Nn \l_tmpa_str { #1 }
    \str_set:Nn \l_tmpb_str { #2 }
    \int_set:Nn \l_henri_tmpb_int { \str_count:N \l_tmpa_str }
    \int_set:Nn \l_henri_tmpc_int { \str_count:N \l_tmpb_str }
    \int_compare:nTF { \l_henri_tmpc_int = 1 }
    {
      \int_do_until:nn { \l_henri_tmpb_int = \l_henri_tmpa_int }
      {
        \str_if_eq:eeTF { #2 } { \str_head:N \l_tmpa_str }
        {
          \str_put_right:Nx \l_henri_mod_str { #3 }
        }
        {
          \str_put_right:Nx \l_henri_mod_str { \str_head:N \l_tmpa_str }
        }
        \str_set:Nx \l_tmpa_str { \str_tail:N \l_tmpa_str }
        \int_incr:N \l_henri_tmpa_int
      }
    }
    {
      \int_do_until:nn { \l_henri_tmpb_int = \l_henri_tmpa_int }
      {
        \str_if_eq:eeTF { #2 } { \str_range:Nnn \l_tmpa_str { 1 } { \l_henri_tmpc_int } }
        {
          \str_put_right:Nx \l_henri_mod_str { #3 }
          \int_set:Nn \l_tmpa_int { \str_count:N \l_tmpa_str }
          \str_set:Nx \l_tmpa_str { \str_range:Nnn \l_tmpa_str { 1 + \l_henri_tmpc_int } { \l_tmpa_int } }
          \int_add:Nn \l_henri_tmpa_int { \l_henri_tmpc_int }
        }
        {
          \str_put_right:Nx \l_henri_mod_str { \str_head:N \l_tmpa_str }
          \str_set:Nx \l_tmpa_str { \str_tail:N \l_tmpa_str }
          \int_incr:N \l_henri_tmpa_int
        }
      }
    }
    \tl_gset_rescan:Nno \g_henri_mod_tl {} { \l_henri_mod_str }
  \group_end:
}

\cs_generate_variant:Nn \henri_replace_all:nnn { Vnn }

\cs_new_protected:Npn \tl_replace_allrecursive:nnn #1 #2 #3
{
  \henri_replace_all:nnn { #1 } { #2 } { #3 }
  \g_henri_mod_tl
}

\verb|\henri_replace_all:nnn {  } {  } {  } \g_henri_mod_tl|
\smallskip\par

\henri_replace_all:nnn { abc{ab{abc}c} } { b } { d }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\tl_set:Nn \l_tmpa_tl { abc{ab{abc}c} }
\henri_replace_all:Vnn \l_tmpa_tl { b } { d }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { {a=b}\,{[]} } { [ } { \sqsubseteq }
$\g_henri_mod_tl$ {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { gydihŵs } { y } { w }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { abc{ab{abc}c} } { bc } { doodle }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { {a=[b}\,{[]} } { =[ } { \sqsubseteq }
$\g_henri_mod_tl$ {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\bigskip\par
\verb|\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { ... } { ... } { ... } }|
\smallskip\par

\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { abc{ab{abc}c} } { b } { d } }
\l_tmpa_tl \par

\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { {a=b}\,{[]} } { [ } { \sqsubseteq }  }
$\l_tmpa_tl$ \par

\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { abc{ab{abc}c} } { bc } { doodle } }
\l_tmpa_tl \par
%
\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { {a=[b}\,{[]} } {  =[ } { \sqsubseteq } }
$\l_tmpa_tl$ \par


\ExplSyntaxOff

\end{document}

In terms of 'why not in expl3': we've not needed it for our work, and likely will aim to avoid such things :-) (We do have an expandable set-up for replacing material, used for example in case changing, and that is probably more 'interesting', plus there is l3regex for tricky cases.) — Joseph Wright, May 30 '16 at 08:12
@JosephWright Thanks for the information. Were you responding to something I said? — cfr, May 30 '16 at 13:00
@JosephWright I was thinking of providing an expandable replacement like \tl_replacement_new:nnn { name } { search } { replace } and then \tl_set:Nx \l_tmpa_tl { \tl_replacement_use:nn { name } { search and destroy } } which would expand to replace and destroy. — Manuel, Jun 05 '16 at 13:25

wipet · Accepted Answer · 2016-06-05T20:26:09.957

6

If you need a fully expandable solution for replacing text, including text inside groups, here is an idea:

\def\repl#1{\replA #1{\end}}
\def\replA#1#{\replB#1\end}
\def\replB#1{\ifx\end#1\expandafter\replC \else\replX{#1}\expandafter\replB\fi}
\def\replC#1{\ifx\end#1\empty\else{\repl{#1}}\expandafter\replA\fi}
\def\replX#1{\ifx#1bd\else#1\fi}

\message{....\repl{abc{aabc{bb}}cb}}

\bye

The \message command expands its argument and prints: ....adc{aadc{dd}}cd.

This code ignores spaces between tokens. Space handling was not specified in your task. If you need to keep spaces unchanged, the code has to be slightly more complicated (about five more lines).

Edit: My estimate was off. The code which accepts spaces needs only three more lines:

\def\repl#1{\replA #1{\end}}
\def\replA#1#{\replD#1 {\end} }
\def\replD#1 #2 {\replB#1\end
   \ifx\end#2\expandafter\replC\else\space\fihere\replD#2 \fi}
\def\replB#1{\ifx\end#1\else\replX{#1}\expandafter\replB\fi}
\def\replC#1{\ifx\end#1\empty\else{\repl{#1}}\expandafter\replA\fi}
\def\replX#1{\ifx#1bd\else#1\fi}
\def\fihere#1\fi{\fi#1}

\message{....\repl{ab c{ aa bc {bb}}cb}}

\bye
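
Since these macros work purely by expansion, the result can be stored with \edef as well as printed. A minimal plain TeX sketch using the first (space-ignoring) set of definitions from above; \result is a name chosen here for illustration. It assumes the input contains only unexpandable tokens such as letters, because \edef would go on to expand any macros left in the output.

\def\repl#1{\replA #1{\end}}
\def\replA#1#{\replB#1\end}
\def\replB#1{\ifx\end#1\expandafter\replC \else\replX{#1}\expandafter\replB\fi}
\def\replC#1{\ifx\end#1\empty\else{\repl{#1}}\expandafter\replA\fi}
\def\replX#1{\ifx#1bd\else#1\fi}

% store the replaced text instead of printing it
\edef\result{\repl{abc{ab{abc}c}}}
% prints: macro:->adc{ad{adc}c}
\message{\meaning\result}

\bye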

edited Jun 05 '16 at 20:26

answered Jun 05 '16 at 18:37


I admire your skill in writing such concise macros! – Henri Menke Jun 05 '16 at 20:37
But isn't it a little dangerous to use \end as a delimiter? What if it gets expanded by accident? – Henri Menke Jun 05 '16 at 20:38
Hm. I just noticed that this is limited to a one-to-one token replacement. It would be great to be able to replace substrings of several tokens. – Henri Menke Jun 05 '16 at 20:56
@HenriMenke At least it requires defining each such replacement under a symbolic name first, like \definerepl{name}{foo}{bar} \edef\tmp{\repl{name}{foo{foo{foo}}}}. – Manuel Jun 05 '16 at 21:31
@Manuel No, you can do a fully expandable string comparison in pdfTeX using \pdfstrcmp. – Henri Menke Jun 05 '16 at 21:43
@HenriMenke But how do you know what to “grab”. How do you know (easily) that \repl{henrimenke}{here}{it would work henrimenke?}, you definitely need some sort of \def\foo#1henrimenke#2\end{#1here#2}. – Manuel Jun 05 '16 at 21:50
@HenriMenke I'd mistakenly assumed you wanted an expl3 solution. But I guess I never really understood the question. Should I delete my answer? – cfr Jun 05 '16 at 22:54
@cfr No! Your answer is of very high quality and is a good example of how to have non-expandable recursive replacing in expl3. – Henri Menke Jun 07 '16 at 20:50
@HenriMenke in wipet's world, \end is a non-expandable primitive:-) – David Carlisle Jun 08 '16 at 14:32
@DavidCarlisle True ;) But what if you hit it and it is executed (that's what I meant, not expanded)? – Henri Menke Jun 08 '16 at 14:42
@DavidCarlisle It is irrelevant where my world is. The processing fails if \end is executed as a primitive (unwanted end of all processing) or if \end is expanded (another curious bug). But \end is neither executed nor expanded in my macros:). – wipet Jun 08 '16 at 19:58
@wipet yes I know! it was just a flippant reply to Henri's comment not really about your macros at all, but I'll delete it (and this comment) in a bit:-) – David Carlisle Jun 08 '16 at 20:20
@DavidCarlisle Please, don't delete your comments. It is enjoyable to read about different worlds:) – wipet Jun 08 '16 at 20:48

Manuel · Answer 4 · 2016-08-31T06:33:28.560

Here's (hopefully) a working fully expandable solution with expl3. It's f-expandable, and, of course, x-expandable.

This solution provides three commands:

\setfreplace{name}{search}{replace}
\freplace{name}{token list where one wants to search}
\hmenke_tl_replace_nested:Nnn \l_tmpa_tl { search } { replace }

Once a replacement has been defined, \freplace searches and replaces recursively inside braces and returns the resulting token list wrapped in \unexpanded, so it won't expand further in an \edef. \hmenke_tl_replace_nested:Nnn works as expected.

\documentclass{scrartcl}

\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{etoolbox,xparse}

\ExplSyntaxOn

\cs_generate_variant:Nn \cs_generate_variant:Nn { c }

\NewDocumentCommand \setfreplace { m +m +m }
 {
  \hmenke_set_freplace:nnn { #1 } { #2 } { #3 }
 }
\DeclareExpandableDocumentCommand \freplace { +m +m }
 {
  \hmenke_freplace:nn { #1 } { #2 }
 }
\quark_new:N \q_hmenke
\cs_new:Npn \hmenke_freplace:nn #1 #2
 {
  \exp_not:f { \use:c { hmenke_freplace_#1:n } { #2 } }
 }
\cs_new_protected:Npn \hmenke_set_freplace:nnn #1 #2 #3
 {
  \cs_set:cpx { hmenke_freplace_#1:n } ##1
   {
    \exp_not:c { hmenke_freplace_#1_auxi:nw } { } ##1 { \exp_not:N \q_hmenke }
   }
  \cs_set:cpx { hmenke_freplace_#1_auxi:nw } ##1 ##2 ##
   {
    \exp_not:c { hmenke_freplace_#1_nobraces:nfn }
     { ##1 } { \exp_not:c { hmenke_freplace_#1_do:n } { ##2 } }
   }
  \cs_set:cpx { hmenke_freplace_#1_nobraces:nnn } ##1 ##2
   {
    \exp_not:c { hmenke_freplace_#1_auxii:nn } { ##1 ##2 }
   }
  \cs_generate_variant:cn { hmenke_freplace_#1_nobraces:nnn } { nf }
  \cs_set:cpx { hmenke_freplace_#1_auxii:nn } ##1 ##2
   {
    \exp_not:N \str_if_eq:nnTF { \exp_not:N \q_hmenke } { ##2 }
     { \exp_stop_f: ##1 }
     {
      \exp_not:c { hmenke_freplace_#1_addbraces:nfw }
       { ##1 } { \exp_not:c { hmenke_freplace_#1:n } { ##2 } }
     }
   }
  \cs_set:cpx { hmenke_freplace_#1_addbraces:nnw } ##1 ##2
   {
    \exp_not:c { hmenke_freplace_#1_auxi:nw } { ##1 { ##2 } }
   }
  \cs_generate_variant:cn { hmenke_freplace_#1_addbraces:nnw } { nf }
  \cs_set:cpx { hmenke_freplace_#1_do:n } ##1
   {
    \exp_not:N \tl_if_empty:nTF { ##1 }
     { \exp_stop_f: }
     {
      \exp_not:c { hmenke_freplace_#1_auxiii:nww }
       { } ##1 \exp_not:n { #2 \q_hmenke \q_stop }
     }
   }
  \cs_set:cpx { hmenke_freplace_#1_auxiii:nww } ##1 ##2 #2 ##3 \q_stop
   {
    \exp_not:N \str_if_eq:nnTF { \exp_not:N \q_hmenke } { ##3 }
     { \exp_stop_f: ##1 ##2 }
     {
      \exp_not:c { hmenke_freplace_#1_auxiii:nww }
       { ##1 ##2 \exp_not:n { #3 } } ##3 \exp_not:N \q_stop
     }
   }
 }

\cs_generate_variant:Nn \hmenke_freplace:nn { nV }
\cs_new_protected:Npn \hmenke_tl_replace_nested:Nnn #1 #2 #3
 {
  \hmenke_set_freplace:nnn { _hmenke_ } { #2 } { #3 }
  \tl_set:Nx #1 { \hmenke_freplace:nV { _hmenke_ } #1 }
 }

\ExplSyntaxOff

\setfreplace{wipet}{b}{d}
\setfreplace{cfr1}{[}{\sqsubseteq}
\setfreplace{cfr2}{y}{w}
\setfreplace{cfr3}{bc}{doodle}
\setfreplace{cfr4}{=[}{\sqsubseteq}
\setfreplace{style}{\textbf}{\textsf}


\begin{document}

%%% For this particular example to show the \detokenize of the f-expanded \freplace
\ExplSyntaxOn
\DeclareExpandableDocumentCommand \fdetokenize { } { \exp_args:Nf \tl_to_str:n }
\ExplSyntaxOff
%%%

\newcommand*\test[3][]{\par\medskip
  edef: \edef\tmp{\freplace{#2}{#3}}\meaning\tmp\par
  detokenize: \fdetokenize{\freplace{#2}{#3}}\par
  output: #1\tmp#1\par}

\test{wipet}{ab c{ aa bc {bb}}cb}
\test[$]{cfr1}{{a=b}\,{[]}}
\test{cfr2}{gydihŵs}
\test{cfr3}{abc{ab{abc}c}}
\test[$]{cfr4}{{a=[b}\,{[]}}
\test{style}{this is \textbf{boldface}, {or {is it {\textbf{sans} serif}?}}}

\ExplSyntaxOn
\tl_set:Nn \l_tmpa_tl { abc{ab{abc}c} }
\hmenke_tl_replace_nested:Nnn \l_tmpa_tl { b } { d }
\tl_show:N \l_tmpa_tl
\ExplSyntaxOff

\end{document}
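
The search step rests on a macro whose parameter text is delimited by the search string itself. The following is a minimal sketch of that idea only, not the code above: it handles a single, brace-free level, hard-codes the search string bc and the replacement doodle, and uses hypothetical names (\demo_replace_bc:n, \demo_replace_bc_aux:nw, \q_demo_mark). It assumes a LaTeX release with expl3 available.

\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% \q_demo_mark is a hypothetical end marker, not part of the answer's code
\quark_new:N \q_demo_mark
% append one extra "bc" plus the marker so the delimited macro always matches
\cs_new:Npn \demo_replace_bc:n #1
  { \demo_replace_bc_aux:nw { } #1 bc \q_demo_mark \q_stop }
% #1: text processed so far, #2: text before the next "bc", #3: the rest
\cs_new:Npn \demo_replace_bc_aux:nw #1 #2 bc #3 \q_stop
  {
    \str_if_eq:nnTF { #3 } { \q_demo_mark }
      { \exp_not:n { #1 #2 } } % only the appended "bc" was found: done
      { \demo_replace_bc_aux:nw { #1 #2 doodle } #3 \q_stop }
  }
\tl_set:Nx \l_tmpa_tl { \demo_replace_bc:n { abcabc } }
% shows: adoodleadoodle
\tl_show:N \l_tmpa_tl
\ExplSyntaxOff
\end{document}

Because the search string sits in the parameter text, a separate macro is needed for each replacement, which is why the replacements have to be set up with \setfreplace beforehand.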

Here is a variant with a different calling convention that doesn't rely on symbolic names. You “define” the replacement with

\setfreplace{search}{replace}

and then call it with \freplace{search}{replace}{text to search}, which might feel more natural:

\documentclass{scrartcl}

\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{etoolbox,xparse}

\ExplSyntaxOn

\cs_generate_variant:Nn \cs_generate_variant:Nn { c }

\NewDocumentCommand \setfreplace { +m +m }
 {
  \freplace_set:nn { #1 } { #2 }
 }
\DeclareExpandableDocumentCommand \freplace { +m +m +m }
 {
  \freplace:nnn { #1 } { #2 } { #3 }
 }
\quark_new:N \q_freplace
\quark_new:N \q_freplacestop
\cs_new:Npn \freplace:nnn #1 #2 #3
 {
  \exp_not:f { \use:c { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } ):n } { #3 } }
 }
\cs_new_protected:Npn \freplace_set:nn #1 #2
 {
  \cs_set:cpx { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } ):n } ##1
   {
    \exp_not:c { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_auxi:nw } { } ##1 { \exp_not:N \q_freplace }
   }
  \cs_set:cpx { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_auxi:nw } ##1 ##2 ##
   {
    \exp_not:c { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_nobraces:nfn }
     { ##1 } { \exp_not:c { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_do:n } { ##2 } }
   }
  \cs_set:cpx { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_nobraces:nnn } ##1 ##2
   {
    \exp_not:c { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_auxii:nn } { ##1 ##2 }
   }
  \cs_generate_variant:cn { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_nobraces:nnn } { nf }
  \cs_set:cpx { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_auxii:nn } ##1 ##2
   {
    \exp_not:N \str_if_eq:nnTF { \exp_not:N \q_freplace } { ##2 }
     { \exp_stop_f: ##1 }
     {
      \exp_not:c { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_addbraces:nfw }
       { ##1 } { \exp_not:c { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } ):n } { ##2 } }
     }
   }
  \cs_set:cpx { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_addbraces:nnw } ##1 ##2
   {
    \exp_not:c { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_auxi:nw } { ##1 { ##2 } }
   }
  \cs_generate_variant:cn { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_addbraces:nnw } { nf }
  \cs_set:cpx { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_do:n } ##1
   {
    \exp_not:N \tl_if_empty:nTF { ##1 }
     { \exp_stop_f: }
     {
      \exp_not:c { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_auxiii:nww }
       { } ##1 \exp_not:n { #1 \q_freplace \q_freplacestop }
     }
   }
  \cs_set:cpx { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_auxiii:nww } ##1 ##2 #1 ##3 \q_freplacestop
   {
    \exp_not:N \str_if_eq:nnTF { \exp_not:N \q_freplace } { ##3 }
     { \exp_stop_f: ##1 ##2 }
     {
      \exp_not:c { freplace_( \tl_to_str:n { #1 } )_( \tl_to_str:n { #2 } )_auxiii:nww }
       { ##1 ##2 \exp_not:n { #2 } } ##3 \exp_not:N \q_freplacestop
     }
   }
 }

\cs_generate_variant:Nn \freplace:nnn { nnV }
\cs_new_protected:Npn \tl_replace_nested:Nnn #1 #2 #3
 {
  \freplace_set:nn { #2 } { #3 }
  \tl_set:Nx #1 { \freplace:nnV { #2 } { #3 } #1 }
 }

\ExplSyntaxOff

\setfreplace{ }{}
\setfreplace{b}{d}
\setfreplace{\textbf}{\textsf}


\begin{document}

%%% For this particular example to show the \detokenize of the f-expanded \freplace
\ExplSyntaxOn
\DeclareExpandableDocumentCommand \fdetokenize { } { \exp_args:Nf \tl_to_str:n }
\ExplSyntaxOff

\newcommand\test[3]{\par\fdetokenize{\freplace{#1}{#2}{#3}}\par}
%%%

\test{b}{d}{ab c{ aa bc {bb}}cb}
\test{ }{}{string to {be removed} {of {spaces }}}
\test{\textbf}{\textsf}{this is \textbf{boldface}, {or {is it {\textbf{sans} serif}?}}}

\ExplSyntaxOn
\tl_set:Nn \l_tmpa_tl { abc{ab{abc}c} }
\tl_replace_nested:Nnn \l_tmpa_tl { b } { d }
\tl_to_str:N \l_tmpa_tl
\ExplSyntaxOff

\end{document}

score 0 · Answer 5 · answered Mar 26 '22 at 03:17

As a fairly general way to manipulate a token list token by token, the \tl_analysis family of expl3 functions can be used.

Alternatives

l3regex can be used too, but internally it uses peek_analysis_map_inline anyway.
tokcycle is also good, but as far as I can see it does not give direct access to the character code of { and }.

(Also, because the intermediate value may be unbalanced while the token list is being rebuilt, the approach is to store something like \iffalse { \else } \fi in it and then x-expand the result. l3regex takes the same approach.)

There's a slight disadvantage, however: this approach is not expandable.

The actual code:

\def \f #1 {
    \tl_build_begin:N \a
    \tl_analysis_map_inline:nn {#1} {
        \int_compare:nNnTF {##2} = {`b} {
            \tl_build_put_right:Nn \a {d}
        } {
            \tl_build_put_right:Nn \a {##1}
        }
    }
    \tl_build_end:N \a
    \tl_set:Nx \a {\a}
}

It takes the input as #1 and stores the result in \a.

(Needless to say, real code should not redefine important LaTeX macros such as \f, \a, etc.)
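
For reference, the same technique as a complete document with names that do not clash with existing macros. This is only a sketch: \demo_replace_b_by_d:n and \l_demo_result_tl are hypothetical names, and it assumes an expl3 version that provides \tl_analysis_map_inline:nn and the \tl_build_... functions.

\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
\tl_new:N \l_demo_result_tl
\cs_new_protected:Npn \demo_replace_b_by_d:n #1
  {
    \tl_build_begin:N \l_demo_result_tl
    % ##1: tokens that expand to the current token, ##2: its character code
    \tl_analysis_map_inline:nn { #1 }
      {
        \int_compare:nNnTF { ##2 } = { `b }
          { \tl_build_put_right:Nn \l_demo_result_tl { d } }
          { \tl_build_put_right:Nn \l_demo_result_tl { ##1 } }
      }
    \tl_build_end:N \l_demo_result_tl
    % x-expansion turns the stored brace placeholders back into real braces
    \tl_set:Nx \l_demo_result_tl { \l_demo_result_tl }
  }
\demo_replace_b_by_d:n { abc{ab{abc}c} }
\tl_show:N \l_demo_result_tl
\ExplSyntaxOff
\end{document}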
