Can you use l3regex / \regex_replace_all:nnN to replace each explicit character token of category 11 in a token list with its category 12 counterpart?

Question

Can you use l3regex / \regex_replace_all:nnN to replace each explicit character token of category 11 (letter) in a token list with its category 12 (other) counterpart? Or with its category 6 (parameter) counterpart?

More generally: can you use l3regex / \regex_replace_all:nnN to replace each explicit character token of a certain category in a token list with its counterpart of another category?
(Leave aside scenarios where this would leave the token list with unbalanced braces at some point during processing.)

If so: How? What must the replacement text passed to \regex_replace_all:nnN look like?

If not: Never mind. I will write my own replacement-routine. ;-)

As I expected, the example below does not work: \scratchy_showloop:n reveals that after the regex replacement the token list still contains "letters" (character tokens of category 11) and not "characters" (character tokens of category 12).

(I provide this minimal example to satisfy those who insist on seeing one.)

\ExplSyntaxOn
\tl_new:N \l__scratchy_tl
\cs_new:Nn \scratchy_showloop:n {
  \quark_if_recursion_tail_stop:n {#1}
  \cs_show:N #1
  \scratchy_showloop:n
}
\tl_set:Nn  \l__scratchy_tl { abcdefg }
\regex_replace_all:nnN { \cL. } { \cO(\0) }  \l__scratchy_tl
\exp_args:NnV \use:nn \scratchy_showloop:n  \l__scratchy_tl  \q_recursion_tail \q_recursion_stop
\stop

If the tokens in the list are not expandable, you can insert \string before each matched character and then \edef the whole list at the end. — David Carlisle, Feb 02 '22 at 09:01
For the own-replacement-routine part, tokcycle may help if you (future reader) are open to new packages. expl3 manipulation of a token list that preserves whitespace is a little confusing (there's \tl_if_head_is_space but nothing for determining how many spaces there are). — user202729, Feb 02 '22 at 09:53
@DavidCarlisle So I chose a bad scenario. Instead assume for some obscure reason replacing each explicit character token of category 11(letter) in a token list by its category 6(parameter) pendant. ;-) — Ulrich Diez, Feb 02 '22 at 11:06
You could do that the same way, just replacing \string by \foo defined as \def\foo#1{\Ucharcat{\#1}{6}} (more or less). — David Carlisle, Feb 02 '22 at 11:09
Looks like the answer is quite clearly "no". Have to write your own, then. (Lua solution may be easier than TeX solution, but it is (should be) possible in both) — user202729, Feb 02 '22 at 11:12
@user202729 You don't have to count spaces, you can easily loop through the tokens one by one, and decide how to process each. If there are two spaces, you process one space then another. This code does that: https://pastebin.com/raw/m0cSDE3Q — Phelype Oleinik, Feb 02 '22 at 11:25
@user202729 It should be feasible in Lua. I don't know whether a TeX-only solution without shortcomings is feasible: the question subsumes the task of detecting whether an arbitrary token is an explicit character token. The token could be an implicit character token, like an active character token let equal to some non-active counterpart... This could probably be cranked out via delimited arguments, but in Unicode there are 1114112 code points... ;-) — Ulrich Diez, Feb 02 '22 at 11:25
@UlrichDiez tokcycle seems to be able to do it, although I don't know how it works internally. — user202729, Feb 02 '22 at 11:27
@PhelypeOleinik Yes, I know it's somehow possible (see your answer to "macros - How to iterate through a token list to make characters uppercase, while preserving spaces?" on TeX - LaTeX Stack Exchange), but I decided to just use tokcycle instead of figuring out how that one works. — user202729, Feb 02 '22 at 11:28
@user202729 Yeah, that's the standard approach to manipulating a token list token by token in expl3 (you can see the main loop macro \__diez_detokenize_loop:w is quite similar to \__zbp_loop:w in the other answer). I'm not familiar with tokcycle, but I know one big difference is that tokcycle is not expandable, so that may be a no go depending on the application. Other than that, I think it serves the same purpose — Phelype Oleinik, Feb 02 '22 at 11:33
@user202729 Recently I posted a routine \ReplicateEveryHash which leaves spaces in place. However as a side-effect it replaces explicit character tokens of category 1/2 by explicit {/} of category 1/2 which might bite you in edge-case-scenarios. ;-) — Ulrich Diez, Feb 02 '22 at 11:33
@PhelypeOleinik Why is the thing called \diez_detokenize:n? ;-) — Ulrich Diez, Feb 02 '22 at 11:36
@UlrichDiez Well, it kinda does what \detokenize does. Then I added four random letters to the name so it is not confused with the primitive ;-) — Phelype Oleinik, Feb 02 '22 at 11:38
@PhelypeOleinik I'm glad the name of the routine is based on randomness. ;-) For names of macros that belong to package code, I suggest following, by analogy, the guidelines for package ids at https://ctan.org/file/help/ctan/CTAN-upload-addendum. I.e., naming them not after people but after their purpose. — Ulrich Diez, Feb 02 '22 at 11:50
@UlrichDiez If I were to upload that code to CTAN, then yes, I'd invent a descriptive name, but since it's just a one-off for a stackexchange post, I didn't bother trying to exercise my creativity (plus, naming the macros after the user who asked the question makes it easier to locate the code in case someone 2 years from now decides to copy only half of it and complain it's not working). Would you like me to delete the comment with that code? — Phelype Oleinik, Feb 02 '22 at 12:01
On that note, the etl package has a macro that does a tokcycle-like task expandably, but if expandability is a requirement it has the same limitation regarding an active character let equal to another character; see the answer to "expansion - Define an expandable function for comparing a token list to a string in LaTeX3" on TeX - LaTeX Stack Exchange. — user202729, Feb 02 '22 at 12:10
@PhelypeOleinik Please don't misunderstand me: I don't take umbrage. I am amused. And I am a nitpicker who from time to time does his nitpicking in the wrong place. That's all. ;-) — Ulrich Diez, Feb 02 '22 at 13:16
@UlrichDiez Ah, no problem! It's hard to pick up tone in written communication, and I didn't quite understand if you were upset or just, as you say, nitpicking :) — Phelype Oleinik, Feb 02 '22 at 13:29
@UlrichDiez Bruno has been informed of the problem and said “Hum... indeed it would be good if that would work.” There may be a workable solution in the near future. — egreg, Feb 02 '22 at 13:42
@PhelypeOleinik It's definitely more my fault than the medium's. Unfortunately, I have a talent for appearing odd. Even people who see me face-to-face say that. Your way of responding to questions and providing answers is characterized by a level of courtesy, objectivity, expertise and diligence that I admire. I am far from taking umbrage at anything you write. :-) — Ulrich Diez, Feb 02 '22 at 13:54
@egreg Thank you. :-) When I wrote down the question, I wasn't sure if - this is not meant as a criticism - it was currently not possible or if once more I was just too clumsy to grasp the l3regex documentation correctly. — Ulrich Diez, Feb 02 '22 at 14:02
Future note/correction: for the own-replacement-routine part, the \tl_analysis_* macros would be better; tokcycle does not preserve the char code of {} tokens, and its expandable-macro option is slow (quadratic in the number of tokens). — user202729, Apr 21 '22 at 06:04
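
David Carlisle's \string suggestion from the comments above could look roughly like the following. This is only a minimal sketch, assuming the token list contains nothing expandable besides the inserted \string tokens; the variable name \l__scratchy_tl is taken from the question, everything else is an illustrative choice.

```latex
\ExplSyntaxOn
\tl_new:N \l__scratchy_tl
\tl_set:Nn \l__scratchy_tl { abcdefg }
% Put the control sequence \string in front of every catcode-11 character.
\regex_replace_all:nnN { \cL. } { \c{string}\0 } \l__scratchy_tl
% Fully expand the list: each \string turns the following letter into
% the catcode-12 character with the same character code.
% Assumption: no other token in the list is expandable.
\tl_set:Nx \l__scratchy_tl { \l__scratchy_tl }
\tl_analysis_show:N \l__scratchy_tl
\ExplSyntaxOff
```

This only covers the "to category 12" case, since \string always yields category 12 (or a space token for character code 32). Anything expandable in the list would be expanded by the final assignment as well.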

score 2 · Accepted Answer · answered Feb 02 '22 at 09:38

You can use \tl_set_rescan:Nno instead, together with category code tables:

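\ExplSyntaxOn
% Build a catcode table equal to the document table, except that every
% character in the range 1-255 with catcode 11 (letter) gets catcode 12 (other).
\cctab_new:N \g_xitoxii_cctab
\cctab_gset:Nn \g_xitoxii_cctab
 {
  \cctab_select:N \c_document_cctab
  \int_step_inline:nn { 255 }
   {
    \int_compare:nT { \char_value_catcode:n { #1 } = 11 }
     { \char_set_catcode_other:n { #1 } }
   }
 }
\tl_set:Nn \l_tmpa_tl {abcde@f&^}
% Retokenize the contents of \l_tmpa_tl under the modified catcode table.
\tl_set_rescan:Nno \l_tmpa_tl { \cctab_select:N \g_xitoxii_cctab } { \l_tmpa_tl }
\tl_analysis_show:N \l_tmpa_tl
\stop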

The console would show

The token list \l_tmpa_tl contains the tokens:
>  a (the character a)
>  b (the character b)
>  c (the character c)
>  d (the character d)
>  e (the character e)
>  @ (the character @)
>  f (the character f)
>  & (alignment tab character &)
>  ^ (superscript character ^).

Needless to say, this may also change the catcode of other characters, if they don't have standard catcode. — user202729, Feb 02 '22 at 09:52
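
If retokenizing everything is a concern, a token-by-token loop based on \tl_analysis_map_inline:Nn, as hinted at in the comments on the question, touches only the catcode-11 tokens. The following is a rough sketch, not a tested general-purpose routine: it assumes the token list contains no brace tokens and no catcode-6 characters, and the names \l__scratchy_in_tl and \l__scratchy_out_tl are made up for illustration.

```latex
\ExplSyntaxOn
\tl_new:N \l__scratchy_in_tl
\tl_new:N \l__scratchy_out_tl
\tl_set:Nn \l__scratchy_in_tl { abc~1+2~\relax }
\tl_clear:N \l__scratchy_out_tl
\tl_analysis_map_inline:Nn \l__scratchy_in_tl
  {
    % #1 = the token in a form that is safe in x-expansion,
    % #2 = its character code, #3 = its catcode as a hex digit (B = letter).
    \str_if_eq:nnTF {#3} { B }
      % Letter: regenerate the same character code with catcode 12.
      { \tl_put_right:Nx \l__scratchy_out_tl { \char_generate:nn {#2} { 12 } } }
      % Anything else: copy the token unchanged.
      { \tl_put_right:Nx \l__scratchy_out_tl {#1} }
  }
\tl_analysis_show:N \l__scratchy_out_tl
\ExplSyntaxOff
```

Replacing the 12 in \char_generate:nn with another category code it supports would give the more general "category X to category Y" variant asked about, again only under the brace-free assumption stated above.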
