
Part III (The l3names package—Namespace for primitives), Section 1 (Setting up the LaTeX3 programming language) of interface3.pdf says:

This module is entirely dedicated to primitives (and emulations of these), which should not be used directly within LaTeX3 code (outside of “kernel-level” code). As such, the primitives are not documented here: The TeXbook, TeX by Topic and the manuals for pdfTeX, XeTeX, LuaTeX, pTeX and upTeX should be consulted for details of the primitives. These are named \tex_⟨name⟩:D, typically based on the primitive’s ⟨name⟩ in pdfTeX and omitting a leading pdf when the primitive is not related to pdf output.

And in the answer to some question I recently read these statements:

One should never use \scantokens in expl3 code.
One should never use \...:D control sequences in expl3 code.

I stumbled on a case where I don't see how to manage without \scantokens/\tex_scantokens:D:

I use xparse and intend to pass a +v-argument containing \verb*|...|-directives to some function for re-tokenizing the input.

In the following (first) example, \tex_scantokens:D is used and everything works out as I expect:

\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
\group_begin:
\char_set_catcode_other:N ^^M
\use:n
  {
    \group_end:
    \NewDocumentCommand \PassVerbArgToRetokenizer { +v }
      {
        \group_begin:
        \tl_set:Nn \l_tmpa_tl {#1}
        \exp_args:Nnc \use:n
          { \exp_args:Nno \tl_put_right:Nn { \l_tmpa_tl } }
          {@percentchar}
        \exp_args:Nno \tl_put_left:Nn {\l_tmpa_tl}
          { \token_to_str:N \endgroup ^^M }
        %\tl_show:N \l_tmpa_tl
        \tex_newlinechar:D = \tex_endlinechar:D
        \exp_args:NV \tex_scantokens:D {\l_tmpa_tl}
      }
  }
\ExplSyntaxOff

\begin{document}

\PassVerbArgToRetokenizer|\verb*+Some verbatim stuff!+|

\end{document}

You get a .pdf-file containing the following:

[image of the resulting PDF: the verbatim text with visible spaces, as typeset by \verb*]

In the following (second) example, \tl_rescan:nn is used instead of \tex_scantokens:D, and you get an error message:
! LaTeX Error: \verb ended by end of line.

\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
\group_begin:
\char_set_catcode_other:N ^^M
\use:n
  {
    \group_end:
    \NewDocumentCommand \PassVerbArgToRetokenizer { +v }
      {
        \group_begin:
        \tl_set:Nn \l_tmpa_tl {#1}
        \exp_args:Nnc \use:n
          { \exp_args:Nno \tl_put_right:Nn { \l_tmpa_tl } }
          {@percentchar}
        \exp_args:Nno \tl_put_left:Nn {\l_tmpa_tl}
          { \token_to_str:N \endgroup ^^M }
        %\tl_show:N \l_tmpa_tl
        %\tex_newlinechar:D = \tex_endlinechar:D
        %\exp_args:NnV \tl_rescan:nn {\tex_newlinechar:D=\tex_endlinechar:D} {\l_tmpa_tl}
        \exp_args:NnV \tl_rescan:nn { } {\l_tmpa_tl}
      }
  }
\ExplSyntaxOff

\begin{document}

\PassVerbArgToRetokenizer|\verb*+Some verbatim stuff!+|

\end{document}

My questions are:

  1. The result when using \scantokens/\tex_scantokens:D differs from the result when using \tl_rescan:nn. Which essential/crucial difference between \scantokens/\tex_scantokens:D and \tl_rescan:nn causes the difference in the results?

(The answer to the first question may enable me to answer the following questions myself.)

(I suppose it has to do with \tl_rescan:nn re-tokenizing the entire ⟨tokens⟩ sequence and inserting all resulting tokens into the token stream at once: in the example, \verb* uses an active + as the second delimiter of its argument (this is arranged by some lowercase trickery with the active ~). If everything is already tokenized and inserted at once, the second + is already of catcode 12 (other) at the time \verb* is carried out, so TeX won't find a matching delimiter denoting the end of \verb*'s argument.

If I got it right, \tl_rescan:nn on the one hand makes TeX re-tokenize the entire ⟨tokens⟩ sequence immediately and append the entire resulting token sequence to the token stream in TeX's gullet in one go.
\scantokens/\tex_scantokens:D on the other hand makes TeX re-tokenize things on demand only, bit by bit, using \scantokens's argument as the source of input (as if the argument's tokens had been written to a file unexpanded) rather than some .tex input file; each time it produces tokens, it produces only as many as the other digestive organs demand. Those organs in turn process these tokens bit by bit and thereby, e.g., carry out catcode changes denoted by them. Such changes in turn may affect how subsequent material gets (re-)tokenized, be it via \scantokens, be it via reading/processing a .tex input file, be it via reading from the console.)
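If this mental model is right, it can be checked with a small experiment. This is a sketch of my own; the names \DemoScan and \DemoRescan are made up and not from any package. A \catcode assignment placed inside the re-scanned material should affect the characters following it only in the \tex_scantokens:D case, because there tokenization happens on demand, after the assignment has already been carried out:

```latex
\documentclass{article}
\ExplSyntaxOn
% \tex_scantokens:D tokenizes on demand: the \catcode assignment is
% carried out before the following X's are tokenized, so they get
% catcode 9 (ignored) and presumably only "A" comes out.
\NewDocumentCommand \DemoScan { }
  {
    \group_begin:
      \tex_scantokens:D { \catcode `\X = 9 \relax XAX }
    \group_end:
  }
% \tl_rescan:nn re-tokenizes the whole argument first, with the catcodes
% current at use time: both X's are already letter tokens when the
% \catcode assignment is executed, so presumably "XAX" comes out.
\NewDocumentCommand \DemoRescan { }
  {
    \group_begin:
      \tl_rescan:nn { } { \catcode `\X = 9 \relax XAX }
    \group_end:
  }
\ExplSyntaxOff
\begin{document}
\DemoScan   % expected: A
\DemoRescan % expected: XAX
\end{document}
```

If that is so, it matches the \verb* failure above: with \tl_rescan:nn the second + is already a catcode-12 token before \verb*'s delimiter trickery is carried out.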

  2. Where is my misunderstanding regarding how \tl_rescan:nn works/what \tl_rescan:nn does?
  3. What did I do wrong in the second example?
  4. How can I achieve by means of things provided by expl3 without using \tex_scantokens:D what I get in the first example where \tex_scantokens:D is used?
Ulrich Diez
    \tl_rescan:nn is not completely equivalent to \tex_scantokens:D (quite obviously), but we still don't have an interface for it, so there are places where you have to use the primitive (see the note I recently added to the :D specifier here). What you are trying to do is more or less what Pablo did in scontents. The main difference is that \tl_rescan:nn scans the entire token list with the current catcode setup (plus <setup>), whereas \tex_scantokens:D may change catcodes as it goes. – Phelype Oleinik Nov 17 '20 at 20:04
  • @PhelypeOleinik What about: “A notable difference between this function and the underlying \tex_scantokens:D primitive is that the former uses a fixed catcode setup to scan/re-tokenize all the ⟨tokens⟩ in one go, whereas the primitive scans/re-tokenizes bit by bit, on demand, whenever more tokens are needed, and in each phase of producing tokens produces only as many tokens as needed, so that the phases in which tokens are produced alternate with phases in which the things given/denoted by the produced tokens are executed/carried out.” – Ulrich Diez Nov 17 '20 at 21:57
  • @PhelypeOleinik I think my suggestion needs to be corrected. What about: “A notable difference between this function and the underlying \tex_scantokens:D primitive is that the former uses a fixed catcode setup to scan/re-tokenize all the ⟨tokens⟩ in one go, whereas the primitive delivers characters for re-tokenization on demand, whenever TeX's tokenizing apparatus needs more input characters, so that with \scantokens time intervals of delivering characters to the tokenizing apparatus are mixed with time intervals in which tokens are produced from these input-... – Ulrich Diez Nov 18 '20 at 16:10
  • @PhelypeOleinik ...from these input characters and the things given/denoted by the produced tokens are executed/carried out. As a consequence --unlike with the ⟨tokens⟩ processed by \scantokens-- directives inside the ⟨tokens⟩-argument for temporarily changing category codes (things like \verb or the verbatim environment bring along such directives) do not affect how subsequent things within that argument get re-tokenized, because those subsequent things are already re-tokenized by the time these directives are carried out. – Ulrich Diez Nov 18 '20 at 16:10

0 Answers