Changing escape character in \tl_set_rescan:Nnn

Question

I'm playing around a bit with the \tl_set_rescan:Nnn function (originally to make this answer more concise), but I'm struggling to make even pretty simple uses of this function work.

Say we want to make every \ character a letter and give spaces their usual catcode. As far as I understand, the two halves of the following code should produce identical output:

\documentclass{article}
\usepackage{expl3}

\begin{document}
\ExplSyntaxOn

\group_begin:
\char_set_catcode_escape:N \~
~char_set_catcode_letter:N ~\
~char_set_catcode_space:n {32}

~tl_set:Nn ~l_tmpa_tl {<\verb|\LaTeX| \LaTeX>}
~tl_show:N ~l_tmpa_tl
~group_end:

%%%%%%%%%%

\tl_set_rescan:Nnn \l_tmpa_tl
  { \char_set_catcode_space:n {32} \char_set_catcode_letter:N \\ }
  {<\verb|\LaTeX| \LaTeX>}
\tl_show:N \l_tmpa_tl

\ExplSyntaxOff
\end{document}

outputs

> \l_tmpa_tl=<\verb|\LaTeX| \LaTeX>.

> \l_tmpa_tl=<\verb |\LaTeX |\LaTeX >.

The first token list is correct: \ was made a letter, so no extra spaces are output after the pseudo control sequences. In the rescan attempt, however, the control sequences are still there, as the spaces after their names show. Also note the missing space after the second |.

How do you make the second version produce the expected result? Or, more broadly, since these functions don't seem to work well with verbatim input, what are their intended use cases?

Phelype Oleinik · Answer 1 · 2019-06-21T13:11:18.793


By the time you do ~tl_set:Nn ~l_tmpa_tl {<\verb|\LaTeX| \LaTeX>}, you have already set the catcode of \ to 11 (letter) and the catcode of the space character to 10. So when ~tl_set:Nn grabs the text as its argument, the pseudo control sequences are not tokenized as control sequences and TeX doesn't add any space after them. What is tokenized is:

<_₁₂\_₁₁v_₁₁e_₁₁r_₁₁b_₁₁|_₁₂\_₁₁L_₁₁a_₁₁T_₁₁e_₁₁X_₁₁|_₁₂_₁₀\_₁₁L_₁₁a_₁₁T_₁₁e_₁₁X_₁₁>_₁₂

Note that since you did not type any space after the pseudo control sequences (\verb and \LaTeX), none shows up there, as expected. The space after the second |_₁₂ is there because spaces were not being ignored when ~tl_set:Nn grabbed its argument.

Now you end the group and everything is back to normal. When TeX expands \tl_set_rescan:Nnn and grabs <\verb|\LaTeX| \LaTeX> as argument, \ is the escape character again and the space has catcode 9, i.e., it is ignored. Thus, as soon as TeX reads the text, it is tokenized as these 7 tokens:

<_₁₂\verb |_₁₂\LaTeX |_₁₂\LaTeX >_₁₂

Notice that the space after | never existed in the first place, and notice also that after the three control sequences \verb, \LaTeX, and \LaTeX, TeX inserts the usual after-control-sequence space. So in this catcode regime, what TeX actually sees in the first place is <\verb |\LaTeX |\LaTeX >. And now \tl_set_rescan:Nnn does its thing and retokenizes the whole thing as:

<_₁₂\_₁₁v_₁₁e_₁₁r_₁₁b_₁₁_₁₀|_₁₂\_₁₁L_₁₁a_₁₁T_₁₁e_₁₁X_₁₁_₁₀|_₁₂\_₁₁L_₁₁a_₁₁T_₁₁e_₁₁X_₁₁_₁₀>_₁₂

In short: the problem is your two inputs are different to start with.
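
To check this yourself, here is a minimal sketch that dumps both token lists with \tl_analysis_show:N, which lists each token with its category code. It assumes an expl3 release recent enough to provide that function. The use of \l_tmpb_tl for the second list is only for illustration.

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn

% Case 1: the argument is grabbed under normal expl3 catcodes
% (\ is the escape character, spaces are ignored).
\tl_set:Nn \l_tmpa_tl { <\verb|\LaTeX| \LaTeX> }
\tl_analysis_show:N \l_tmpa_tl

% Case 2: the same text, rescanned with \ as a letter and the
% space character (code 32) with catcode 10.
\tl_set_rescan:Nnn \l_tmpb_tl
  { \char_set_catcode_space:n { 32 } \char_set_catcode_letter:N \\ }
  { <\verb|\LaTeX| \LaTeX> }
\tl_analysis_show:N \l_tmpb_tl

\ExplSyntaxOff
\end{document}
```

The first listing should contain real control sequences and no space token. In the second, the spaces that appear after the former control sequence names come from converting the already tokenized argument back to characters before rescanning.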

edited Jun 21 '19 at 13:11

answered Jun 19 '19 at 11:13

Phelype Oleinik


Could you look at https://tex.stackexchange.com/a/496258/7832, which prompted the original query? I think the problem is related to the use of \newlinechar before and after using \tl_set_rescan:Nnn (which internally assigns other values to \newlinechar). – Pablo González L Jun 19 '19 at 11:28
@PabloGonzálezL Give me a minute to try to understand the problem in your question... – Phelype Oleinik Jun 19 '19 at 11:46
@PhelypeOleinik I should have read the "TeXhackers note" for \tl_set_rescan:Nnn more carefully, as it implies the second argument is read in before the new catcodes are applied. To me, the documentation before that note reads as if the argument were absorbed under the new catcode regime. So those functions seem to be useless when dealing with verbatim code. – siracusa Jun 19 '19 at 22:58

@siracusa If I'm not mistaken, I read somewhere that, to support a consistent API, the expl3 functions by design don't change catcodes while grabbing an argument, precisely to avoid unexpected behaviour. Which makes sense: you can use a function without worrying whether it will grab its contents verbatim or under some peculiar catcode regime. Of course, if you do need that, you have to work around this feature yourself to get the catcodes right. – Phelype Oleinik Jun 20 '19 at 09:55
@PhelypeOleinik Thanks for the implementation. expl3 should definitely have some basic support for handling verbatim material like in your wrapper, IMHO. The thing is, if you do all the catcode setting yourself, you don't need \tl_set_rescan:Nnn anymore. Actually, replacing it by \tl_set:Nn #1 {#3} in your implementation gives the same result. – siracusa Jun 21 '19 at 00:28
@siracusa Hm... Obviously. I feel kind of stupid now. I'll delete that part of the answer. Thanks :-) – Phelype Oleinik Jun 21 '19 at 13:09
@PhelypeOleinik Nah, no need to feel stupid. I've already learned something new from your (now deleted) solution, e.g. I didn't know about the very useful \tl_analysis_show:N function. :) – siracusa Jun 22 '19 at 00:23
@siracusa Oh, yes, that one really comes in handy when debugging things with catcode changes. Glad it helped :-) Sorry it doesn't answer your question, but I don't think there's much that can be done. The space after the | can be added with a ~ circumventing the lack of space in expl3 syntax. The space after the control sequences is a harder problem, because when TeX reads the token list it inserts those spaces, but when it makes \ a letter it doesn't know it has to remove that space. I think you'd need some kind of post-processing in this case... – Phelype Oleinik Jun 22 '19 at 00:38
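
The workaround discussed in the comments (set the catcodes yourself, then grab the text with plain \tl_set:Nn) could look roughly like the sketch below. The names \example_tl_set_verb:N and \example_tl_set_verb_aux:Nn are hypothetical helpers, not part of expl3, and this is not the implementation that was deleted from the answer.

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn

% Hypothetical helper, not part of expl3: change the catcodes first,
% then let an auxiliary grab the braced text that follows.
\cs_new_protected:Npn \example_tl_set_verb:N #1
  {
    \group_begin:
    \char_set_catcode_letter:N \\
    \char_set_catcode_space:n { 32 }
    \example_tl_set_verb_aux:Nn #1
  }
% #2 is tokenized under the catcodes set above. Once grabbed, its
% catcodes are fixed, so the group can be closed before assigning.
\cs_new_protected:Npn \example_tl_set_verb_aux:Nn #1#2
  {
    \group_end:
    \tl_set:Nn #1 {#2}
  }

\example_tl_set_verb:N \l_tmpa_tl {<\verb|\LaTeX| \LaTeX>}
\tl_show:N \l_tmpa_tl

\ExplSyntaxOff
\end{document}
```

With this, \tl_show:N should display the same token list as the first, manually grouped version in the question. The sketch only works when the braced text is read directly from the file. If the call sits inside the argument of another macro, the text has already been tokenized and the catcode changes come too late.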
