
Suppose I want to replace every occurrence of \A{ <balanced text> } in my token list with \B{ \A{ <the same content> } }.

For example:

\process{\A{123}}  % -> \processresult = \B{\A{123}}

\process{before \A{1{2}3} after} % -> \processresult = before \B{\A{1{2}3}} after

\process{before {\A} after} % -> don't need to support this case, but probably no-op

Is there an easy way to do that?

It would be nice if the code could also:

  • handle balanced text that is nested arbitrarily deep (not just, e.g., 1 level)
  • preserve the char code of {}
  • preserve white space

I don't think l3regex can do this? (both (.*) and (.*?) don't care about balance level)

Actually, I have already written this the "hard" way myself, using \tl_analysis_map_inline:nn.
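A minimal sketch of the brace-counting idea behind that approach, not the actual implementation. \tl_analysis_map_inline:nn passes each token's category code as its third argument, so begin-group (1) and end-group (2) tokens can be recognised whatever their character code is. The names \my_max_depth:n, \g_my_depth_int and \g_my_max_depth_int are hypothetical, and the sketch assumes a LaTeX kernel that preloads expl3. It only tracks nesting depth; a real replacement routine would additionally have to rebuild the token list.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
% Hypothetical names. Global counters are used so that the sketch does not
% depend on whether the mapping code happens to run inside a group.
\int_new:N \g_my_depth_int
\int_new:N \g_my_max_depth_int
\cs_new_protected:Npn \my_max_depth:n #1
  {
    \int_gzero:N \g_my_depth_int
    \int_gzero:N \g_my_max_depth_int
    % ##1 = the token, ##2 = its character code, ##3 = its catcode (hex digit)
    \tl_analysis_map_inline:nn {#1}
      {
        \str_case:nn {##3}
          {
            { 1 } % begin-group token, whatever its character code
              {
                \int_gincr:N \g_my_depth_int
                \int_gset:Nn \g_my_max_depth_int
                  { \int_max:nn { \g_my_depth_int } { \g_my_max_depth_int } }
              }
            { 2 } % end-group token
              { \int_gdecr:N \g_my_depth_int }
          }
      }
    Maximum~brace~depth:~ \int_use:N \g_my_max_depth_int \par
  }
% \A is never executed here, so it does not need a definition.
\my_max_depth:n { before~ \A{1{2}3} ~after }
\ExplSyntaxOff
\end{document}
```

Because the decision is based on the catcode rather than on the character, the same counter also works for group tokens such as the x/y or [ ] delimiters produced by \cB and \cE in a regex replacement.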

I think it is also possible with a recursive function, although I'm not sure whether that can preserve the char code of {}.
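A minimal sketch of the recursive idea, under stated assumptions: it uses a macro delimited by \A, so TeX itself does the brace matching when it grabs the argument. \my_process_loop:w is a hypothetical helper name, and a LaTeX kernel that preloads expl3 is assumed. The sketch does not preserve the char code of the braces around the argument of \A (they are rewritten as ordinary { }), it leaves an \A hidden inside a group untouched, and it breaks if \A is the very last token without an argument.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
\tl_new:N \processresult
\cs_new_protected:Npn \process #1
  {
    \tl_clear:N \processresult
    % \prg_do_nothing: protects a leading brace group from being stripped;
    % the trailing \A is a sentinel so that the last chunk is delimited too.
    \my_process_loop:w \prg_do_nothing: #1 \A \q_recursion_tail \q_recursion_stop
    % Print the result as a string, for inspection only.
    \texttt { \tl_to_str:N \processresult } \par
  }
% Hypothetical helper: #1 is everything up to the next top-level \A,
% #2 is the (brace-stripped) argument that follows it.
\cs_new_protected:Npn \my_process_loop:w #1 \A #2
  {
    % One expansion step removes the \prg_do_nothing: in front of #1.
    \tl_put_right:No \processresult {#1}
    % Stop when #2 is the quark that follows the sentinel \A.
    \quark_if_recursion_tail_stop:n {#2}
    % TeX matched the braces while grabbing #2, at any nesting depth.
    \tl_put_right:Nn \processresult { \B { \A {#2} } }
    \my_process_loop:w \prg_do_nothing:
  }
\ExplSyntaxOff
% Called outside \ExplSyntaxOn so that spaces in the input are kept.
\process{\A{123}}
\process{before \A{1{2}3} after}
\process{before {\A} after}
\end{document}
```

Delimited arguments keep their spaces, so the white space around \A survives. The brace char codes do not, because the argument of \A is re-wrapped in literal braces.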


Example of how a (naive) application of regex fails:


%! TEX program = lualatex
\documentclass{article}
\begin{document}

\ExplSyntaxOn

% Attempt 1: greedy (.*) between a begin-group and an end-group token
\def\process #1 {
  \def \__a {#1}
  \texttt{ input:~ \exp_args:NV \detokenize \__a } \par
  \regex_replace_all:nnN {\c{A} \cB{ (.*) \cE}} {\c{B} \cBx \1 \cEy } \__a
  \texttt{ output:\exp_args:NV \detokenize \__a } \par
}

\process{before \A{1{2}3} \A{1{2}3} after}

% Attempt 2: lazy .*? with the whole match wrapped in a new group
\def\process #1 {
  \def \__a {#1}
  \texttt{ input:~ \exp_args:NV \detokenize \__a } \par
  \regex_replace_all:nnN {(\c{A} \cB{ .*? \cE})} {\c{B} \cB[ \1 \cE] } \__a
  \texttt{ output:\exp_args:NV \detokenize \__a } \par
}

\process{before \A{1{2}3} \A{1{2}3} after}

\ExplSyntaxOff

\end{document}

Result:

input: before\A {1{2}3}\A {1{2}3}after
output:before\B x1{2}3}\A {1{2}3yafter
input: before\A {1{2}3}\A {1{2}3}after
output:before\B [\A {1{2}]3}\B [\A {1{2}]3}after

Here the new group delimiters are x/y in the first run and [ ] in the second, instead of { }, to show the difference clearly. In both outputs it can be seen that the content inside the new group is not balanced.

user202729
  • do you really need to replace, why not just locally define \A{..}, to do \B{\originalA{..}} ? – David Carlisle Mar 26 '22 at 08:13
  • @DavidCarlisle Hm, actually good idea. Let's say there's some other control sequence in it, and x-expand will mess it up, and I need to store the result in some other token list. – user202729 Mar 26 '22 at 08:19
  • @UlrichDiez 1. I guess it means "l3regex cannot do it in general"... either way is okay. 2. I think the natural option is \A(X)\B{A(X)}? – user202729 Mar 26 '22 at 10:51
  • Not that I particularly need an answer (see, I already have a solution using \tl_analysis_map_inline:nn ), but maybe I'll learn how to deal with "Get \string-ification of first opening brace in argument?/Get \string-ification of first opening brace's matching closing brace in argument?" (TeX - LaTeX Stack Exchange) later expandably for... perhaps performance improvement. – user202729 Mar 26 '22 at 10:54
  • The token list contains \B {\A {1{2}3}} after the substitution. Do you mean you want the braces to print? (or the token list not to expand?) – Cicada Mar 26 '22 at 12:56
  • @Cicada I don't understand what you're talking about. – user202729 Mar 26 '22 at 13:53
  • @user202729 Sorry, I don't understand what you want the output to look like. The regex method works for me (that is, the replacement action replaces, as described in the question), so it is likely that I have misconstrued the use-case. – Cicada Mar 27 '22 at 07:09
  • @Cicada In the question I wrote "(both (.*) and (.*?) don't care about balance level)" – user202729 Mar 27 '22 at 07:13
  • @user202729 What is your expected output? Do you have an MWE? When I test, for the three cases, \tl_show:N shows that the token list contains: \B {\A {123}}, \B {\A {1{2}3}} and {\B {\A {}}} (when the 3rd case is adjusted to {\A{}}; plus, it is a separate structure from the first two, so needs its own regex). The TL prints accordingly (I made working dummy \A and \B definitions, each to print #1), as per catcodes, so catcodes are OK. – Cicada Mar 27 '22 at 08:45
  • @Cicada Uh, I forgot to mention the cases where it fails; although I think it can be seen because of how regex works... maybe will edit. – user202729 Mar 27 '22 at 08:52
  • @Cicada See there... – user202729 Mar 27 '22 at 08:58
  • The first thing to ask is whether it is possible to balance tokens with regular expressions to begin with. – egreg Mar 27 '22 at 09:08
  • @egreg Well, I know it cannot. Just asking if there's any "easy" way (not necessarily with regex), since balanced text is the "natural" unit of information in TeX. – user202729 Mar 27 '22 at 09:10
  • @user202729 The problem is that you don't want to use TeX for this. – egreg Mar 27 '22 at 09:49
  • @egreg well, the team wrote the whole LaTeX3 kernel in e-TeX instead of Lua... – user202729 Mar 27 '22 at 09:57
  • @user202729 Please, don't be offensive. Why can't you use TeX features for this? Because you don't want it to do expansion, or your structure would be destroyed before any possibility to add \B{...} around it. And TeX checks for balanced groups only when expanding macros or performing definitions and assignments. Now, what if \A is a two argument macro? You'd want \B{\A{1}{2}}; what if \A has delimited arguments? And so on. – egreg Mar 27 '22 at 10:21
  • @egreg (that was not meant to be offensive by the way, just a remark.) -- It's okay, I have a specific use case in mind: I was writing a TeX library that basically lets the user write e.g. { \some \content \EXECUTETHIS {\something} \EXPANDANDINSERTTHIS {\int_eval:n {1+1}} \some \other \content } with the result being e.g. { \some \content 2 \some \other \content}. As I said, I already implemented this by counting braces; I just want to see whether there is some simpler approach I missed. // library code still very wrong – user202729 Mar 27 '22 at 10:27
  • it's okay if there isn't, I can still do the task. – user202729 Mar 27 '22 at 10:30
