Stringify input - \string on token list

Question

This is a problem I came across, and I couldn't find an answer to it on the web, so I'm going to do a Q&A-style post here.

The problem

With the TeX primitive \string one can turn the following token into a string representation of itself. E.g., \string\macro will result in the (unexpandable) string \macro.
However, \string only works on the next token. I wanted to stringify a complete token list, which might be obtained via a parameter in a macro.
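
A minimal sketch of this limitation, assuming a hypothetical helper \demo that is not part of the original post: \string affects only the first token of the argument, and everything after it is processed normally.

```latex
\documentclass{article}
% \demo is a hypothetical helper for illustration only.
\newcommand\demo[1]{\texttt{\string#1}}
\begin{document}
\demo{\LaTeX}       % typesets the characters \LaTeX

\demo{\LaTeX\TeX}   % typesets the characters \LaTeX, then the TeX logo:
                    % \string never reaches the second token
\end{document}
```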

As it turns out the problem of stringifying some input can be solved much more elegantly than by using \string which is why you won't find it in the accepted answer. There are alternative answers describing a \string-approach though.

Would going the other direction be an option for you? Have (La)TeX gather/read and tokenize the macro-parameter from the tex-source-code under verbatim-catcode-régime and then apply \scantokens for re-tokenizing in situations where you do not need the stringified/verbatimized variant but the normal-catcode-régime-variant? — Ulrich Diez, Jan 12 '19 at 11:10
Next question: Can the token-list that is to be stringified also contain (nested) curly braces? I'm asking this because curly braces can interfere with macro-based mechanisms that are intended to process things token-wise while actually processing things argument-wise. — Ulrich Diez, Jan 12 '19 at 11:14
@UlrichDiez I don't really understand what you are describing in your first comment. As I understood it this approach would get into trouble if one was to pass an undefined macro to it. And yes the content that should be stringified is not restricted to certain inputs so curly braces and nestings of them are allowed. — Raven, Jan 12 '19 at 11:37
Do you know the \verb-command? It does change the category codes before reading and tokenizing its argument so that the argument gets tokenized as a set of characters, no control-sequences. No loss of space-characters. It is possible to create a similar command. In situations where you don't want the characters but the control-sequence-tokens, you can pass the characters to \scantokens which acts as if things were written to external file before \inputting that file under normal catcode-régime where you get control-sequences etc... In case of interest I can elaborate on that. — Ulrich Diez, Jan 12 '19 at 20:42

Ulrich Diez · Answer 1 · 2021-12-28T17:01:50.483

[This is part 1 of my answer.

Due to the limitation of the amount of characters within an answer I have to divide this answer into two parts.

This part contains a lot of explanations about how things work in LaTeX.

Part 2 contains a coding-example for a routine \UDcollectverbarg. ]

It seems that you wish to take a set of tokens and create .dvi- or .pdf-output or an external text file whose content looks like the tex source code from which these tokens came into being.

Due to the way in which LaTeX "digests" .tex-input/tex source code, it is not possible to reconstruct exactly from a set of tokens what the tex source code that produced these tokens looked like.

This has to do with the ways in which LaTeX acts when reading/processing the tex source code for forming tokens:

LaTeX does read the tex source code line by line, processing each line character by character for forming tokens (character tokens, control sequence tokens) that are to be inserted into the token stream for further processing.

(There are two kinds of control sequence tokens:

Control word tokens are control sequence tokens whose names consist of a single character of category code 11(letter) or of several characters. E.g., \e and \LaTeX.

Control symbol tokens have names that consist of a single character which is not of category code 11(letter). E.g., \!, \?, \7. )

One of the first things LaTeX does to a line of input, even before starting producing tokens, is removing all space characters (the number of the code point of the space character is 32 both in UTF-8 and in ASCII which are the two possible internal character encodings of LaTeX) that are at the right end of it. After that it inserts at the right end of the line a character whose code point's number equals the value of the integer parameter \endlinechar. Usually the value of \endlinechar is 13 which is the number of the code point for the carriage return character in many encodings, e.g., in ASCII and in UTF-8. ASCII is the internal character encoding with old-school TeX engines. UTF-8 is the internal character encoding with more recent TeX engines like LuaTeX and XeTeX.

Then LaTeX switches the state of its reading apparatus to state N (new line).
(LaTeX has a reading apparatus. It can have one of three states:
State N: New line. This state indicates that LaTeX is starting to process another line of input.
State M: Middle of line. This state indicates that LaTeX is processing characters somewhere within a line of input.
State S: Skipping blanks. This state indicates that LaTeX shall skip characters whose category code is 10(space) rather than inserting an explicit space token (character code 32, category code 10(space)) into the token stream.)

Then LaTeX starts to look at the line, character by character.

In LaTeX each character has a so called category code.

The category code of a character influences what action LaTeX will perform when in the input encountering that character.

E.g., when LaTeX in the input finds a character of category code 0(escape), LaTeX will start gathering the name of a control sequence token from the subsequent characters on the current line and then insert that control sequence token into the token stream. Usually the backslash-character \ is the only character whose category code is 0(escape).
Right after finding such a character, LaTeX will be right at the start of gathering the name of a control sequence token.
In case the following character does not have category code 11(letter), LaTeX will take that next character for the name of a control symbol token and stop gathering and insert the corresponding control symbol token into the token stream and switch the reading apparatus to state M.
In case the following character does have category code 11(letter), LaTeX will take that next character for the first character of the name of a control word token and will keep on gathering (hereby being somewhere in the middle of gathering) until either reaching the end of the line or reaching the end of the file or encountering a character whose category code is not 11(letter) which then will not be considered part of the name of the control word token in question but will be considered something that needs to be looked at separately.
Then LaTeX will take the characters gathered so far for the name of the control word token and insert the corresponding control word token into the token stream and switch the reading apparatus to state S.

E.g., when LaTeX in the input finds a character of category code 11(letter) while not gathering the name of a control word token, LaTeX will insert a character token into the token stream whose category is 11(letter) and whose character code equals the number of the code point of that character in LaTeX's internal character encoding (which, depending on the underlying engine, either is ASCII or is UTF-8).
When LaTeX in the input finds a character of category code 11(letter) while gathering the name of a control word token, it will take that character for a part of the name of that control word token and keep on gathering.

E.g., when LaTeX in the input finds a character of category code 12(other) while not at the start of gathering the name of a control sequence token, LaTeX will insert a character token into the token stream whose category is 12(other) and whose character code equals the number of the code point of that character in LaTeX's internal character encoding (which, depending on the underlying engine, either is ASCII or is UTF-8) and will switch the reading apparatus to state M.
E.g., when LaTeX in the input finds a character of category code 12(other) while at the start of gathering the name of a control sequence token, LaTeX will take that character for the name of a control symbol token and will insert the corresponding control symbol token into the token stream and switch the reading apparatus to state M.

By the way:

LaTeX always switches the reading apparatus to state M after tokenizing and inserting into the token stream a non-space character token or a control symbol token whose name is not formed by a character of category code 10(space).

LaTeX always switches the reading apparatus to state S after tokenizing and inserting into the token stream a control-symbol-token whose name is formed by a character of category code 10(space) or an explicit space token or a control word token.

LaTeX always switches the reading apparatus to state N when starting to process another line of TeX-input.

E.g., when LaTeX in the input finds a character of category code 10(space), there basically are two possibilities. (Usually the space character, code point 32 both in ASCII and in UTF-8, and the horizontal-tab character, code point 9 both in ASCII and in UTF-8, are the only characters of category code 10(space).)
Possibility 1:
LaTeX might be at the start of gathering the name of a control sequence token: In this case LaTeX will take the space character for the name of that control sequence token and thus will insert the control symbol token \ (control space) into the token stream and switch the reading apparatus to state S.
Possibility 2:
LaTeX might not be at the start of gathering the name of a control sequence token.
In case it is somewhere in the middle of gathering the name of a control word token, the characters gathered so far will form the name of that control word token, and that control word token will be inserted into the token stream and the reading apparatus will be switched to state S.
Further action depends on the state of the reading apparatus:
In state S, LaTeX will ignore the space character, not inserting any token for it into the token stream. (Now you see why spaces in the input behind character sequences that lead to insertion of control word tokens into the token stream won't lead to the insertion of space tokens into the token stream.)
In state N, LaTeX will ignore that space character, not inserting any token for it into the token stream.
In state M, LaTeX will insert an explicit space token (character token of category 10(space) and character code 32) into the token stream.
In any case LaTeX will switch to state S after encountering a character of category code 10(space).

This implies that with several consecutive characters of category code 10(space), the ones that follow the first one will always be skipped, not yielding any token as they will always find the reading apparatus switched to state S due to the processing of their predecessors.

Another effect of this concept is that sequences of consecutive characters of category code 10(space) that occur at the beginnings of lines in the tex source code won't yield any tokens: The first one will not yield any token due to the reading apparatus being in state N. The subsequent ones will not yield any token due to the reading apparatus being in state S.
This is why you usually can indent macro code and the like from the left by means of space characters and/or tab characters for improving the readability.
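
The effect of the states M, S and N on spaces can be observed with \meaning. The following is only a small sketch; \test is a scratch macro introduced here for illustration.

```latex
\documentclass{article}
\begin{document}
\ttfamily
% Consecutive spaces: the first yields a space token (state M),
% the others are skipped (state S).
\def\test{a     b}
\meaning\test   % shows  macro:->a b

% Indentation: the comment character ends the first line, the leading
% spaces of the second line are skipped (state N, then state S).
\def\test{a%
     b}
\meaning\test   % shows  macro:->ab
\end{document}
```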

E.g., when LaTeX in the input finds a character of category code 14(comment) while right at the start of gathering the name of a control sequence token, LaTeX will insert the control symbol token whose name corresponds to that character into the token stream and switch the reading apparatus to state M. (Exception: In case that character was a space character, you'll get a control space \ and the reading apparatus will be switched to state S.)
When LaTeX in the input finds a character of category code 14(comment) while not right at the start of gathering the name of a control sequence token, LaTeX will stop processing the current line, and thus drop subsequent characters of that line. In case of being in the middle of gathering the name of a control word token, the characters gathered so far will form the name of that control word token and the corresponding control word token will be inserted into the token stream.
As LaTeX stops processing the current line, it will continue with starting processing the next line of input if present. When grabbing that next line for processing it, the state of the reading apparatus will be switched to state N. Usually % is the only character of category code 14(comment).

E.g., when LaTeX in the input finds a character of category code 5(end of line) while right at the start of gathering the name of a control sequence token, LaTeX will insert the control symbol token whose name corresponds to that character into the token stream and switch the reading apparatus to state M. (Exception: In case that character was a space character, you'll get a control space \ and the reading apparatus will be switched to state S.)
When LaTeX in the input finds a character of category code 5(end of line) while not right at the start of gathering the name of a control sequence token, LaTeX will stop processing the current line, and thus drop subsequent characters of that line. In case of being in the middle of gathering the name of a control word token, the characters gathered so far will form the name of that control word token and the corresponding control word token will be inserted into the token stream and the reading apparatus will be switched to state S.
The next action in this case depends on the state of the reading apparatus:
If in state N, the token \par will be inserted.
If in state M, an explicit space token (character token of category 10(space) and character code 32) will be inserted.
If in state S, no token will be inserted.
In any of these three cases LaTeX will then start processing the next line, hereby switching the reading apparatus to state N.

Above it was said, that LaTeX inserts a character according to the value of \endlinechar at each line-ending and that usually \endlinechar has the value 13 which denotes the code point of the carriage return character. Now the information is added that usually the carriage return character has category code 5. This implies that usually every line-ending is processed with a carriage return character of category code 5(end of line) at the end. Thus an empty line implies insertion of a carriage return character of category code 5(end of line) at the beginning of that empty line which in turn implies processing that inserted character while the reading apparatus is still in state N which in turn implies the insertion of a \par-token. That's why usually an empty line has the same effect as \par.
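
A small sketch of the end-of-line rules, again using a scratch macro \test that is introduced here for illustration only. The end of the line holding "a" is processed in state M and yields a space token. The empty line is processed in state N and yields \par.

```latex
\documentclass{article}
\begin{document}
\ttfamily
\def\test{a

b}
\meaning\test   % shows  macro:->a \par b
                % (the space behind \par is added by \meaning itself,
                %  which puts a space behind every control word)
\end{document}
```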

After this little excursus we see that a sequence of tokens does not necessarily reflect all the characters of the tex source code whose reading and tokenizing led to that token sequence:

In many situations spaces and tabs do not yield tokens at all.
Empty lines might yield \par-tokens while \string\par does not yield linebreaks but \, p, a, r.
Characters of category code 14(comment) do not yield tokens at all.
Characters of category code 9(ignore) do not yield tokens at all.

Also, the output of the \string-primitive does not necessarily resemble the tex source code:

If \string is applied to an explicit character token, the result will be a character token of equal character code. Its category is 12(other), unless the character code is 32 (32 is the number of the space character's code point), in which case its category is 10(space).

If \string is applied to a control sequence token, the result will be a sequence of character tokens:
In case the integer parameter \escapechar has a non-negative value within the range of the code points of the internal character encoding of the engine in use, a character token will be delivered first whose character code equals the value of \escapechar and whose category is 12(other) (exception: in case of \escapechar denoting the space character, the category is 10(space)). Usually \escapechar has the value 92 which is the number of the code point of the backslash character. Then a sequence of character tokens follows, each of them denoting a character of the name of the control sequence token, the character code being the number of the code point of that character in LaTeX's internal character encoding, the category being 12(other) (in case of the code point in question not denoting the space character) or 10(space) (in case of the code point denoting the space character).

If \string is applied to the nameless control sequence token, the result will be the sequence \csname\endcsname.
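
The behavior of \string described above can be tried out with a few lines. This is only a sketch; \macro does not need to be defined, as \string does not look at the meaning of the token.

```latex
\documentclass{article}
\begin{document}
\ttfamily
\string\macro                        % \macro   (\escapechar is 92)

{\escapechar=`\/ \string\macro}      % /macro

{\escapechar=-1 \string\macro}       % macro    (no escape character)

\expandafter\string\csname\endcsname % \csname\endcsname
\end{document}
```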

Therefore I suggest going the opposite direction:

Create a macro which switches to verbatim category code régime and then gathers its argument via having LaTeX read and tokenize input from the file containing your tex source code.

"Switching to verbatim category code régime" means changing category codes of input characters in a way which leads to each character of that snippet of tex source code that forms the argument—after reading and tokenizing—having a counterpart in terms of a character token.
Under verbatim category code régime, e.g., the backslash does not have category code 0(escape) but does have category code 12(other) and thus does not lead to gathering the name of a control sequence token but does yield a backslash character token of category code 12(other). This way no control sequences will come into being while gathering the argument. Only character tokens will come into being.
Under verbatim category code régime, e.g., the space character does not have category code 10(space) but does have category code 12(other). Thus it will be treated as an ordinary thing whereafter the reading apparatus is not switched to state S but is switched to state M resulting in consecutive spaces not "collapsing into a single space token".

When LaTeX reads and tokenizes input under verbatim category code régime, then each character of the snippet of tex source code that forms the input in question will after reading and tokenizing have a counterpart in terms of a character token.
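
A much simplified sketch of such a switch follows. The names \showverb and \@showverb are hypothetical and only serve illustration; this is not the routine from part 2. It handles brace-balanced arguments only and does nothing about line endings.

```latex
\documentclass{article}
\makeatletter
% \showverb and \@showverb are hypothetical names for illustration only.
\newcommand\showverb{%
  \begingroup
  \let\do\@makeother
  \dospecials          % backslash, %, #, space, ... all get catcode 12(other)
  \catcode`\{=1        % braces keep their usual catcodes so that they
  \catcode`\}=2        % can still delimit the argument
  \@showverb
}
\newcommand\@showverb[1]{%
  \endgroup            % restore the normal catcodes
  \texttt{#1}%         % the argument consists of character tokens only
}
\makeatother
\begin{document}
\showverb{\LaTeX   is #1 % fun}
\end{document}
```

With the default OT1-encoded typewriter font the three spaces behind \LaTeX should all show up, as visible-space glyphs like in the verbatim* environment, because they are character tokens of category 12(other).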

You can "feed" an argument that got tokenized this way to the \scantokens-primitive.
The \scantokens-primitive comes along with the eTeX extensions.

The \scantokens-primitive lets LaTeX act as if it wrote the tokens that form its ⟨balanced text⟩ unexpanded to an external file and then loaded that file via \input.
During the latter part of that action, the \input-part, things get (re)tokenized under the category code régime which is in effect while \scantokens is carried out. If that is the normal category code régime, the (re)tokenization also may yield control sequence tokens etc.

During the former part, the \write-part, all the subtle rules apply that always apply when TeX writes tokens to an external text file or to the screen. (Things like hash doubling. Things like character tokens whose character code equals the value of the integer parameter \newlinechar causing LaTeX to continue writing subsequent things at the beginning of a new line in the external text file/on the screen.)
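
A small sketch of the re-tokenizing step. Here the eTeX primitive \detokenize merely stands in for an argument gathered under verbatim category code régime, and \stored is a hypothetical scratch macro. Unlike real verbatim gathering, \detokenize puts a space behind each control word.

```latex
\documentclass{article}
\begin{document}
% \detokenize (eTeX) stands in for an argument gathered under verbatim
% category code regime; \stored is a hypothetical scratch macro.
\edef\stored{\detokenize{\textbf{bold} and \emph{emphasized}}}

\texttt{\stored}   % the characters themselves

\expandafter\scantokens\expandafter{\stored}% re-tokenized and carried out
\end{document}
```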

A macro which switches to verbatim category code régime and then gathers its argument—how should you use such a macro?

As the macro is intended to collect arguments that are to be tokenized under verbatim category code régime, it should be possible to have it collect both arguments where opening curly braces and closing curly braces are balanced and arguments where these braces are not balanced.
In the former case the same syntax can be applied as with any ordinary mandatory argument, i.e., just nest the argument inside curly braces.
In the latter case the syntax of LaTeX's \verb-macro needs to be applied where you use a character for delimiting the argument which does not occur inside the argument.
Thus it would be a nice idea to have the macro detect whether the very first token of the text that is to be read and tokenized under verbatim category code régime is an opening curly brace or not. If it is, collect the remaining tokens of that argument while applying the syntax of the former case. If it is not, collect the remaining tokens of that argument while applying the syntax of the latter case.

Henceforth an argument that is to be tokenized under verbatim category code régime will be called a ⟨verbatimized argument⟩, no matter whether it is to be gathered applying the former or the latter syntax.

Another issue with handling a ⟨verbatimized argument⟩ is the treatment of the ends of lines:

Above the \endlinechar-thingie was mentioned:

Above it was said, that LaTeX inserts a character according to the value of \endlinechar at each line-ending and that usually \endlinechar has the value 13 which denotes the code point of the carriage return character. Then the information was added that usually the carriage return character has category code 5(end of line) which depending on the state of the reading apparatus may lead to the skipping of the "endline character" inserted or may lead to the coming into being of \par-tokens or explicit space tokens.

When you type text, i.e., when you create tex source code, the carriage return character does not occur in the middle of lines. It is a character which for the software which you use for creating/typing/viewing the tex source code, denotes the end of a line.

Under normal category code régime, where the ^-character has category code 7(math superscript), you can in tex source code apply ^^-notation for denoting some characters that cannot easily be typed on a keyboard.
Under normal category code régime you can in tex source code use the character-sequence ^^M for denoting the carriage return character. (M is the 13th letter in the alphabet; the carriage return character has code point number 13...)
^^-notation will be transformed while reading and processing the line of input where it occurs, even before putting tokens into the token-stream and even while gathering names of control sequence tokens.
^^-notation is not available under verbatim category code régime, as under that régime the category code of ^ is switched to 12(other).
More exact explanations related to ^^-notation can be found in the TeXbook.
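
A small sketch of ^^-notation under normal category code régime, using the hexadecimal variant of the notation (two lowercase hexadecimal digits behind ^^). \test is a scratch macro introduced here for illustration.

```latex
\documentclass{article}
\begin{document}
\ttfamily
\def\test{^^41^^42}   % ^^41 and ^^42 denote code points "41 and "42 (hex)
\meaning\test         % shows  macro:->AB

\rmfamily
\T^^65X               % ^^65 denotes e, so this is the control word \TeX
\end{document}
```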

A macro for gathering a ⟨verbatimized argument⟩ could switch the category code of the carriage return character to 12(other). This way you get explicit carriage return character tokens for the ends of lines, and only for the ends of lines. (An explicit carriage return character token is a character token of category 12(other) and character code 13. 13 is the number of the code point of the carriage return character both in ASCII and in UTF-8, the possible internal character-encodings with (La)TeX-engines.)

Thus in LaTeX within a ⟨verbatimized argument⟩ carriage return character tokens can only come into being due to the \endlinechar-thingie, and thus they can be used for denoting places where in the tex source code line-endings occurred.

Thus a nice feature of a macro that gathers a ⟨verbatimized argument⟩ would be an additional argument where you can provide tokens by which explicit carriage return character tokens (which can only come into being at line-endings, due to the \endlinechar-thingie) shall be replaced.

You could use this, e.g., for having explicit carriage return character tokens replaced by explicit line feed character tokens. (An explicit line feed character token is a character token of category 12(other) and character code 10. 10 is the number of the code point of the line feed character both in ASCII and in UTF-8, the possible internal character-encodings with (La)TeX-engines. J is the 10th letter in the alphabet. In tex source code you can under normal category code régime denote the line feed character via ^^J.)
Why would you do this? The reason has to do with LaTeX's integer parameter \newlinechar: When LaTeX writes tokens unexpanded to a text file or the screen, it will create linebreaks for explicit character tokens whose character code equals the value of \newlinechar rather than writing the corresponding characters to file.
Usually the value of \newlinechar is 10. Thus usually, when writing tokens unexpanded to a text file or the screen, explicit line feed character tokens can be used for denoting places where LaTeX shall continue writing at the beginning of a new line.

When you intend to pass the ⟨verbatimized argument⟩ to \scantokens, you can use this feature for having all the explicit carriage return character tokens that denote line-endings replaced by explicit line feed character tokens. The effect will be that when \scantokens does its unexpanded-writing-part, it will "write" line-breaks at these places.
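
The effect of \newlinechar on writing can be sketched with a plain \write to the terminal. In this sketch \newlinechar is set explicitly inside a group so that the example does not rely on the value inherited from the format.

```latex
\documentclass{article}
\begin{document}
\begingroup
\newlinechar=10 % set explicitly in this sketch; 10 is the code point of ^^J
\immediate\write16{first line^^Jsecond line}%
\endgroup
Look at the terminal or the .log file.
\end{document}
```

On the terminal and in the .log file the text should appear on two lines, as the line feed character token is turned into a linebreak rather than being written as a character.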

[Continued in part 2 of my answer.]

Ulrich Diez · Accepted Answer · 2019-03-23T18:43:34.407

[This is part 2 of my answer.

Due to the limitation of the amount of characters within an answer I have to divide this answer into two parts.

This part contains a coding-example for a routine \UDcollectverbarg.

Part 1 contains a lot of explanations about how things work in LaTeX. ]

I can offer a macro \UDcollectverbarg with the following syntax:

\UDcollectverbarg{⟨^^M-replacement⟩}{⟨non-optional 1⟩}{⟨non-optional 2⟩}⟨verbatimized argument⟩

which yields:

⟨non-optional 1⟩{⟨non-optional 2⟩{⟨verbatimized argument⟩}}

, with each character ^^M within the ⟨verbatimized argument⟩ that denotes an end of a line being replaced by the token-sequence ⟨^^M-replacement⟩.

There are no optional arguments:

The non-optional arguments are required. If they consist of several tokens, they must be nested into catcode-1/2-character-pairs / braces.
If reading and tokenizing is necessary, this will take place under unchanged category code régime.
The ⟨verbatimized argument⟩ is also required. It is to be read and tokenized under verbatim category code régime. If its first character is a brace, it will be "assumed" that the argument is nested into braces. Otherwise it will be assumed that the ending of that argument is delimited by that first character, like the argument of \verb.
Empty lines will not be ignored.

I chose this syntax as with this syntax you can collect verbatimized arguments within the second non-optional argument by nesting calls to \UDcollectverbarg within \UDcollectverbarg's first non-optional argument.

E.g.,

\UDcollectverbarg{<^^M-replacement>}%
                 {\UDcollectverbarg{<^^M-replacement>}{\UDcollectverbarg{<^^M-replacement>}{<actionA>}}}% <- Mandatory 1
                 {<actionB>}%                     <- Mandatory 2
                 <verbatimized argument 1><verbatimized argument 2><verbatimized argument 3>

yields:

\UDcollectverbarg{<^^M-replacement>}{\UDcollectverbarg{<^^M-replacement>}{<actionA>}}% <- Mandatory 1
                 {<actionB>{<verbatimized argument 1>}}%        <- Mandatory 2
                 <verbatimized argument 2><verbatimized argument 3>

yields:

\UDcollectverbarg{<^^M-replacement>}{<actionA>}% <- Mandatory 1
                 {<actionB>{<verbatimized argument 1>}{<verbatimized argument 2>}}% <- Mandatory 2
                 <verbatimized argument 3>

yields:

<actionA>{<actionB>{<verbatimized argument 1>}{<verbatimized argument 2>}{<verbatimized argument 3>}}

Assume <actionA> = \@firstofone:

\@firstofone{<actionB>{<verbatimized argument 1>}{<verbatimized argument 2>}{<verbatimized argument 3>}}

yields:

<actionB>{<verbatimized argument 1>}{<verbatimized argument 2>}{<verbatimized argument 3>}

%% Copyright (C) 2007 - 2019 by Ulrich Diez (eu_angelion@web.de)
%%
%% This work may be distributed and/or modified under the
%% conditions of the LaTeX Project Public Licence (LPPL), either
%% version 1.3 of this license or (at your option) any later
%% version. (The latest version of this license is in:
%% http://www.latex-project.org/lppl.txt
%% and version 1.3 or later is part of all distributions of LaTeX
%% version 1999/12/01 or later.)
%% The author of this work is Ulrich Diez.
%% This work has the LPPL maintenance status 'not maintained'.
%% Usage of any/every component of this work is at your own risk.
%% There is no warranty - neither for probably included
%% documentation nor for any other part/component of this work.
%% If something breaks, you usually may keep the pieces.

\errorcontextlines=10000

%%<-------------------- code for \UDcollectverbarg -------------------->
\makeatletter
%%......................................................................
%% Check whether argument is empty:
%%......................................................................
%% \UD@CheckWhetherNull{<Argument which is to be checked>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is empty>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is not empty>}%
%% The gist of this macro comes from Robert R. Schneck's \ifempty-macro:
%% <https://groups.google.com/forum/#!original/comp.text.tex/kuOEIQIrElc/lUg37FmhA74J>
\newcommand\UD@CheckWhetherNull[1]{%
  \romannumeral0\expandafter\@secondoftwo\string{\expandafter
  \@secondoftwo\expandafter{\expandafter{\string#1}\expandafter
  \@secondoftwo\string}\expandafter\@firstoftwo\expandafter{\expandafter
  \@secondoftwo\string}\expandafter\expandafter\@firstoftwo{ }{}%
  \@secondoftwo}{\expandafter\expandafter\@firstoftwo{ }{}\@firstoftwo}%
}%
%%......................................................................
\begingroup
\catcode`\^^M=12 %
\@firstofone{%
  \endgroup%
  \newcommand\UDEndlreplace[2]{\romannumeral0\@UDEndlreplace{#2}#1^^M\relax{}}%
  \newcommand*\@UDEndlreplace{}%
  \long\def\@UDEndlreplace#1#2^^M#3\relax#4#5{%
    \UD@CheckWhetherNull{#3}%
    { #5{#4#2}}{\@UDEndlreplace{#1}#3\relax{#4#2#1}{#5}}%
  }%
}%
\newcommand\UDcollectverbarg[3]{%
  \@bsphack
  \begingroup
  \let\do\@makeother % <- this and the next line switch to
  \dospecials        %    verbatim-category-code-régime.
  \catcode`\{=1      % <- give opening curly brace the usual catcode so a 
                     %    curly-brace-balanced argument can be gathered in
                     %    case of the first thing of the verbatimized-argument 
                     %    being a curly opening brace.
  \catcode`\ =10     % <- give space the usual catcode so \UD@collectverbarg
                     %    cannot catch a space as its 4th undelimited argument.
                     %    (Its 4th undelimited argument denotes the verbatim-
                     %     syntax-delimiter in case of not gathering a
                     %     curly-brace-nested argument.)
  \kernel@ifnextchar\bgroup
  {% seems a curly-brace-nested argument is to be caught:
    \catcode`\}=2    % <- give closing curly brace the usual catcode also.
    \UD@collectverbarg{#1}{#2}{#3}{}%
  }{% seems an argument with verbatim-syntax-delimiter is to be caught:
    \do\{ % <- give opening curly brace the verbatim-catcode again.
    \UD@collectverbarg{#1}{#2}{#3}%
  }%
}%
\newcommand\UD@collectverbarg[4]{%
  \do\ %             % <- Now that \UD@collectverbarg has the delimiter or
                     %    emptiness in its 4th arg, give space the 
                     %    verbatim-catcode again.
  \catcode`\^^M=12   % <- Give the carriage-return-character the verbatim-catcode.
  \long\def\@tempb##1#4{%
    \edef\@tempb{##1}%
    \@onelevel@sanitize\@tempb % <- Turn characters into their "12/other"-counterparts.
                               %    This may be important with things like the 
                               %    inputenc-package, which may make characters 
                               %    active, i.e. give them catcode 13 (active).
    \expandafter\UDEndlreplace\expandafter{\@tempb}{#1}{\def\@tempb}% <- this starts 
                               %    the loop for replacing endline-characters.
    \expandafter\UD@@collectverbarg\expandafter{\@tempb}{#2}{#3}% <- this "spits 
                               %    out" the result.
  }%
  \@tempb
}%
\newcommand\UD@@collectverbarg[3]{%
  \endgroup
  \@esphack
  #2{#3{#1}}%
}%
%%<---------------- end of code for \UDcollectverbarg ----------------->

% As a usage-example let's now define a macro \CodeAndResult which
% collects a verbatim-argument and does both print it wrapped into a
% verbatim*-environment and execute it.
% This time the eTeX primitive \scantokens is used.
% Basically \CodeAndResult is a wrapper for calling \UDcollectverbarg and
% passing the verbatimized argument to \@CodeAndResult

\newcommand\CodeAndResult{%
  \UDcollectverbarg{^^J}{\@firstofone}{\@CodeAndResult}%
}%

\begingroup
\newcommand\@CodeAndResult[1]{%
  \endgroup
  \newcommand\@CodeAndResult[1]{%
    \par\noindent \underline{Code:}
    \scantokens{\begin{verbatim*}^^J##1^^J#1}%
    \par\noindent \underline{Result:}
    \scantokens{##1}%
  }%
}%
\UDcollectverbarg{^^J}{\@firstofone}{\@CodeAndResult}|\end{verbatim*}|%
\makeatother


\documentclass{article}

\begin{document}

\noindent\hrulefill

% Test with verbatim-delimiter-syntax:
\CodeAndResult|\csname @firstofone\endcsname{\LaTeX} is funny.|

\noindent\hrulefill

% Test with brace-nested-syntax:
\CodeAndResult{\csname @firstofone\endcsname{\TeX} is funny, too.}

\noindent\hrulefill

% Test with verbatim-delimiter-syntax and linebreaks:
\CodeAndResult|Both
\csname @firstofone\endcsname{\LaTeX}
and 
\csname @firstofone\endcsname{\TeX}
are funny.% This is a comment.|

\noindent\hrulefill

% Test with brace-nested-syntax and linebreaks:
\CodeAndResult{Both
\csname @firstofone\endcsname{\TeX}
and 
\csname @firstofone\endcsname{\LaTeX}
are funny.% This is a comment.}

\noindent\hrulefill

\end{document}

[This link leads to part 1 of my answer.]

Wow - thank you very much! This really is an excellent answer to this topic, covering everything one might want to know about it. I am marking this part of the answer, since it is the one that I feel answers the concrete question best. Did you consider releasing this code as a package? I'm sure quite a few people would find it very helpful in some cases. — Raven, Jan 27 '19 at 09:07
@Raven You asked: "Did you consider releasing this code as a package?" The macro \VCverbaction from the package verbatimcopy might be of interest to you.... ;-) — Ulrich Diez, Jan 27 '19 at 10:28

Ulrich Diez · Answer 3 · 2019-01-29T05:08:59.387

If the suggestion from my other answer, which was divided into part 1 and part 2, does not help you, and you still want a routine which takes an undelimited/brace-nested argument and applies \string to each of its tokens, this is feasible.

But you need to cope with the fact that with macros you can actually do things argument-wise only, not token-wise.

This needs to be taken into account in case the argument itself contains things that are nested in curly braces.
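A minimal sketch of what "argument-wise" means in practice. \DemoGrabFirst is a hypothetical helper introduced only for this illustration; it is not part of the code further down. When an undelimited parameter meets a brace group, the whole group is taken as one argument and its outer braces are stripped:

```latex
\documentclass{article}
% \DemoGrabFirst is a hypothetical name, not part of the answer's code.
% #1 is undelimited, #2 is delimited by \relax.
\long\def\DemoGrabFirst#1#2\relax{first: [\detokenize{#1}] rest: [\detokenize{#2}]}
\begin{document}
\ttfamily
\DemoGrabFirst abc\relax     % #1 is the single token a
\par
\DemoGrabFirst{ab}c\relax    % #1 is ab: the whole group, braces removed
\end{document}
```

In the second call there is no way to tell from #1 alone that braces were present, which is why a token-wise routine has to test for a leading brace before grabbing anything.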

Another aspect that needs to be taken into account is that (La)TeX will skip/silently remove explicit space tokens that occur between undelimited/brace-nested macro arguments.
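The space-skipping can be observed with a small test file. This is only a sketch; \DemoTwoArgs is a hypothetical name used for illustration:

```latex
\documentclass{article}
% \DemoTwoArgs is a hypothetical name used only for this illustration.
\newcommand\DemoTwoArgs[2]{[\detokenize{#1}][\detokenize{#2}]}
\begin{document}
\ttfamily
\DemoTwoArgs a b      % the space between a and b is skipped: #1=a, #2=b
\par
\DemoTwoArgs a{ }b    % a braced space survives as an argument: #1=a, #2=space
\end{document}
```

A loop that simply grabs one undelimited argument per iteration therefore never sees the explicit space tokens between the other tokens.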

Thus a routine which iteratively applies \string to each of the tokens of its argument needs to check within each iteration whether the first token of the remaining argument is either an explicit space token or an opening-brace-token.

If neither of these two possibilities is the case, the first token can be taken out of the argument and \string can be applied to it.

If the first token is an explicit space token, it needs to be taken out (which requires a different trick than taking out an undelimited argument). If you wish, you can apply \string to that explicit space token, but that will not make any difference, as applying \string to an explicit space token yields an explicit space token.
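A sketch of the space-removing trick, reduced to a standalone test. The names \Demo@removespace, \DemoStripLeadingSpace and \Demo@show are hypothetical; the definition pattern itself is the one the code below uses for \UD@removespace:

```latex
\documentclass{article}
\makeatletter
% \Demo@removespace and \DemoStripLeadingSpace are hypothetical names.
% The definition trick is the one the answer's code uses for \UD@removespace:
% after \@firstoftwo has done its job, TeX sees
%   \def\Demo@removespace<space>{}
% i.e. a macro whose parameter text is one explicit space token.
\newcommand\Demo@removespace{}%
\@firstoftwo{\def\Demo@removespace}{} {}%
% Expand \Demo@removespace once in front of #1, then show what is left.
% Only safe when #1 really starts with an explicit space token.
\newcommand\DemoStripLeadingSpace[1]{%
  \expandafter\Demo@show\expandafter{\Demo@removespace#1}%
}
\newcommand\Demo@show[1]{[\detokenize{#1}]}
\makeatother
\begin{document}
\ttfamily
\DemoStripLeadingSpace{ abc}
\end{document}
```

If the argument does not start with a space token, TeX raises a "Use of ... doesn't match its definition" error, so the leading-space check has to come first.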

If the first token is an opening-brace-token, the first component of the argument itself is something that is nested in braces. Thus you need to apply the routine that you are applying to the argument, to that component also. Plus you need to crank out the surrounding braces.
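The three-way case distinction can be illustrated with a much simpler, non-expandable sketch based on \futurelet. This is a different technique from the expandable checks used in the code below and is shown only to make the three cases visible; \DemoPeek, \Demo@Classify and \Demo@next are hypothetical names:

```latex
\documentclass{article}
\makeatletter
% \DemoPeek and \Demo@Classify are hypothetical names for this sketch.
% \futurelet makes \Demo@next a copy of the token that follows,
% without removing it from the input.
\newcommand\DemoPeek{\futurelet\Demo@next\Demo@Classify}
\newcommand\Demo@Classify{%
  \ifx\Demo@next\bgroup
    (brace)%
  \else
    \ifx\Demo@next\@sptoken
      (space)%
    \else
      (other)%
    \fi
  \fi
}
\makeatother
\begin{document}
\DemoPeek{x}                       % next token is an opening brace

\expandafter\DemoPeek\space x      % next token is an explicit space

\DemoPeek x                        % next token is the letter x
\end{document}
```

Note that the \ifx test against \bgroup is also true for an implicit \bgroup token, and that \futurelet performs an assignment, so this sketch cannot be used inside \edef or \romannumeral-expansion the way the \UD@CheckWhether... macros can.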

For cranking out the surrounding braces, you can double the argument and then take out the first component from one of the copies.

Then you can have LaTeX iteratively remove things both from the extracted first component and from the first component inside one of the copies until the extracted first component is empty. When this is the case, the closing-brace from inside the copy is reached.

You will need some brace-hacking-trickery.

%% Copyright (C) 2019 by Ulrich Diez (eu_angelion@web.de)
%%
%% This work may be distributed and/or modified under the
%% conditions of the LaTeX Project Public Licence (LPPL), either
%% version 1.3 of this license or (at your option) any later
%% version. (The latest version of this license is in:
%% http://www.latex-project.org/lppl.txt
%% and version 1.3 or later is part of all distributions of LaTeX
%% version 1999/12/01 or later.)
%% The author of this work is Ulrich Diez.
%% This work has the LPPL maintenance status 'not maintained'.
%% Usage of any/every component of this work is at your own risk.
%% There is no warranty - neither for probably included
%% documentation nor for any other part/component of this work.
%% If something breaks, you usually may keep the pieces.

\errorcontextlines=10000

\documentclass{article}

\makeatletter    
%%=============================================================================
%% Paraphernalia:
%%    \UD@firstoftwo, \UD@secondoftwo,
%%    \UD@PassFirstToSecond, \UD@Exchange, \UD@removespace
%%    \UD@CheckWhetherNull, \UD@CheckWhetherBrace,
%%    \UD@CheckWhetherLeadingSpace, \UD@ExtractFirstArg
%%=============================================================================
\newcommand\UD@firstoftwo[2]{#1}%
\newcommand\UD@secondoftwo[2]{#2}%
\newcommand\UD@PassFirstToSecond[2]{#2{#1}}%
\newcommand\UD@Exchange[2]{#2#1}%
\newcommand\UD@removespace{}\UD@firstoftwo{\def\UD@removespace}{} {}%
%%-----------------------------------------------------------------------------
%% Check whether argument is empty:
%%.............................................................................
%% \UD@CheckWhetherNull{<Argument which is to be checked>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is empty>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is not empty>}%
%%
%% The gist of this macro comes from Robert R. Schneck's \ifempty-macro:
%% <https://groups.google.com/forum/#!original/comp.text.tex/kuOEIQIrElc/lUg37FmhA74J>
\newcommand\UD@CheckWhetherNull[1]{%
  \romannumeral0\expandafter\UD@secondoftwo\string{\expandafter
  \UD@secondoftwo\expandafter{\expandafter{\string#1}\expandafter
  \UD@secondoftwo\string}\expandafter\UD@firstoftwo\expandafter{\expandafter
  \UD@secondoftwo\string}\expandafter\expandafter\UD@firstoftwo{ }{}%
  \UD@secondoftwo}{\expandafter\expandafter\UD@firstoftwo{ }{}\UD@firstoftwo}%
}%
%%-----------------------------------------------------------------------------
%% Check whether argument's first token is a catcode-1-character
%%.............................................................................
%% \UD@CheckWhetherBrace{<Argument which is to be checked>}%
%%                      {<Tokens to be delivered in case that argument
%%                        which is to be checked has leading
%%                        catcode-1-token>}%
%%                      {<Tokens to be delivered in case that argument
%%                        which is to be checked has no leading
%%                        catcode-1-token>}%
\newcommand\UD@CheckWhetherBrace[1]{%
  \romannumeral0\expandafter\UD@secondoftwo\expandafter{\expandafter{%
  \string#1.}\expandafter\UD@firstoftwo\expandafter{\expandafter
  \UD@secondoftwo\string}\expandafter\expandafter\UD@firstoftwo{ }{}%
  \UD@firstoftwo}{\expandafter\expandafter\UD@firstoftwo{ }{}\UD@secondoftwo}%
}%
%%-----------------------------------------------------------------------------
%% Check whether brace-balanced argument starts with a space-token
%%.............................................................................
%% \UD@CheckWhetherLeadingSpace{<Argument which is to be checked>}%
%%                             {<Tokens to be delivered in case <argument
%%                               which is to be checked>'s 1st token is a
%%                               space-token>}%
%%                             {<Tokens to be delivered in case <argument
%%                               which is to be checked>'s 1st token is not
%%                               a space-token>}%
\newcommand\UD@CheckWhetherLeadingSpace[1]{%
  \romannumeral0\UD@CheckWhetherNull{#1}%
  {\expandafter\expandafter\UD@firstoftwo{ }{}\UD@secondoftwo}%
  {\expandafter\UD@secondoftwo\string{\UD@CheckWhetherLeadingSpaceB.#1 }{}}%
}%
\newcommand\UD@CheckWhetherLeadingSpaceB{}%
\long\def\UD@CheckWhetherLeadingSpaceB#1 {%
  \expandafter\UD@CheckWhetherNull\expandafter{\UD@secondoftwo#1{}}%
  {\UD@Exchange{\UD@firstoftwo}}{\UD@Exchange{\UD@secondoftwo}}%
  {\UD@Exchange{ }{\expandafter\expandafter\expandafter\expandafter
   \expandafter\expandafter\expandafter}\expandafter\expandafter
   \expandafter}\expandafter\UD@secondoftwo\expandafter{\string}%
}%
%%-----------------------------------------------------------------------------
%% Extract first inner undelimited argument:
%%
%%   \UD@ExtractFirstArg{ABCDE} yields  {A}
%%
%%   \UD@ExtractFirstArg{{AB}CDE} yields  {AB}
%%.............................................................................
\newcommand\UD@RemoveTillUD@SelDOm{}%
\long\def\UD@RemoveTillUD@SelDOm#1#2\UD@SelDOm{{#1}}%
\newcommand\UD@ExtractFirstArg[1]{%
  \romannumeral0%
  \UD@ExtractFirstArgLoop{#1\UD@SelDOm}%
}%
\newcommand\UD@ExtractFirstArgLoop[1]{%
  \expandafter\UD@CheckWhetherNull\expandafter{\UD@firstoftwo{}#1}%
  { #1}%
  {\expandafter\UD@ExtractFirstArgLoop\expandafter{\UD@RemoveTillUD@SelDOm#1}}%
}%
%%-----------------------------------------------------------------------------    
%% In case an argument's first token is an opening brace, stringify that and
%% add another opening brace before that and remove everything behind the 
%% matching closing brace:
%% \UD@StringifyOpeningBrace{{Foo}bar} yields {{Foo}  whereby the second
%% opening brace is stringified:
%%.............................................................................
\newcommand\UD@StringifyOpeningBrace[1]{%
  \romannumeral0%
  \expandafter\UD@ExtractFirstArgLoop\expandafter{%
    \romannumeral0\UD@Exchange{ }{\expandafter\expandafter\expandafter}%
    \expandafter\expandafter
    \expandafter            {%
    \expandafter\UD@firstoftwo
    \expandafter{%
    \expandafter}%
    \romannumeral0\UD@Exchange{ }{\expandafter\expandafter\expandafter}%
    \expandafter\string
    \expandafter}%
    \string#1%
  \UD@SelDOm}%
}%
%%-----------------------------------------------------------------------------    
%% In case an argument's first token is an opening brace, remove everything till 
%% finding the corresponding closing brace. Then stringify that closing brace:
%% \UD@StringifyClosingBrace{{Foo}bar} yields: {}bar} whereby the first closing
%% brace is stringified:
%%.............................................................................
\newcommand\UD@StringifyClosingBrace[1]{%
   \romannumeral0\expandafter\expandafter\expandafter
                 \UD@StringifyClosingBraceloop
                 \UD@ExtractFirstArg{#1}{#1}%
}%
\newcommand\UD@CheckWhetherStringifiedOpenBraceIsSpace[1]{%
%% This can happen when character 32 (space) has catcode 1...
  \expandafter\UD@CheckWhetherLeadingSpace\expandafter{%
    \romannumeral0\UD@Exchange{ }{\expandafter\expandafter\expandafter}%
    \expandafter\UD@secondoftwo
    \expandafter{%
    \expandafter}%
    \expandafter{%
    \romannumeral0\UD@Exchange{ }{\expandafter\expandafter\expandafter}%
    \expandafter\UD@firstoftwo
    \expandafter{%
    \expandafter}%
    \romannumeral0\UD@Exchange{ }{\expandafter\expandafter\expandafter}%
    \expandafter\string
    \expandafter}%
    \string#1%
  }%
}%
\newcommand\UD@TerminateStringifyClosingBraceloop[2]{%
  \UD@Exchange{ }{\expandafter\expandafter\expandafter}%
  \expandafter\expandafter
  \expandafter{%
  \expandafter\string      
  \romannumeral0\UD@Exchange{ }{\expandafter\expandafter\expandafter}%
  \expandafter#1%
  \string#2%
  }%
}%
\newcommand\UD@StringifyClosingBraceloopRemoveElement[4]{%
  \expandafter\UD@PassFirstToSecond\expandafter{\expandafter
  {\romannumeral0\expandafter\UD@secondoftwo\string}{}%
    \UD@CheckWhetherStringifiedOpenBraceIsSpace{#4}{%
      \UD@Exchange{\UD@removespace}%
    }{%
      \UD@Exchange{\UD@firstoftwo\expandafter{\expandafter}}%
    }{%
      \UD@Exchange{ }{\expandafter\expandafter\expandafter}%
      \expandafter#1%
      \romannumeral0\UD@Exchange{ }{\expandafter\expandafter\expandafter}%
      \expandafter
    }%
    \string#4%
  }{\expandafter\UD@StringifyClosingBraceloop\expandafter{#2#3}}%
}%
\newcommand\UD@StringifyClosingBraceloop[2]{%
  \UD@CheckWhetherNull{#1}{%
    \UD@CheckWhetherStringifiedOpenBraceIsSpace{#2}{%
      \UD@TerminateStringifyClosingBraceloop{\UD@removespace}%
    }{%
      \UD@TerminateStringifyClosingBraceloop{\UD@firstoftwo\expandafter{\expandafter}}%
    }%
    {#2}%
  }{%
    \UD@CheckWhetherLeadingSpace{#1}{%
      \UD@StringifyClosingBraceloopRemoveElement
      {\UD@removespace}{\UD@removespace}%
    }{%
      \UD@StringifyClosingBraceloopRemoveElement
      {\UD@firstoftwo\expandafter{\expandafter}}{\UD@firstoftwo{}}%
    }%
    {#1}{#2}%
  }%
}%
%%-----------------------------------------------------------------------------    
%% Apply <action> to the stringification of each token of the argument:
%%
%% \StringifyNAct{<action>}{<token 1><token 2>...<token n>}
%%
%% yields:  <action>{<stringification of token 1>}%
%%          <action>{<stringification of token 2>}%
%%          ...
%%          <action>{<stringification of token n>}%
%%
%% whereby "stringification of token" means the result of applying \string
%% to the token in question.
%% Due to \romannumeral-expansion the result is delivered after two
%% \expandafter-chains.
%% If you leave <action> empty, you can apply a loop on the list formed by
%%   {<stringification of token 1>}%
%%   {<stringification of token 2>}%
%%   ...
%%   {<stringification of token n>}%
%%.............................................................................
\newcommand\StringifyNAct{%
  \romannumeral0\StringifyNActLoop{}%
}%
%%.............................................................................
%% \StringifyNActLoop{{<stringification of token 1>}...{<stringification of token k-1>}}%
%%                   {<action>}%
%%                   {<token k>...<token n>}
%%.............................................................................
\newcommand\StringifyNActLoop[3]{%
  \UD@CheckWhetherNull{#3}{%
    \UD@firstoftwo{ }{}#1%
  }{%
    \UD@CheckWhetherBrace{#3}{%
      \expandafter\expandafter\expandafter\UD@Exchange
      \expandafter\expandafter\expandafter{%
        \UD@StringifyClosingBrace{#3}%
      }{%
        \expandafter\StringifyNActLoop\expandafter{%
          \romannumeral0%
          \expandafter\expandafter\expandafter\UD@Exchange
          \expandafter\expandafter\expandafter{\UD@StringifyOpeningBrace{#3}}{\StringifyNActLoop{#1}{#2}}%
        }{#2}%
      }%
    }{%
      \UD@CheckWhetherLeadingSpace{#3}{%
        \expandafter\UD@PassFirstToSecond\expandafter{\UD@removespace#3}{%
          \StringifyNActLoop{#1#2{ }}{#2}%
        }%
      }{%
        \expandafter\UD@PassFirstToSecond\expandafter{\UD@firstoftwo{}#3}{%
          \expandafter\StringifyNActLoop\expandafter{%
             \romannumeral0%
             \expandafter\expandafter\expandafter\expandafter\expandafter\expandafter\expandafter\UD@PassFirstToSecond
             \expandafter\expandafter\expandafter\expandafter\expandafter\expandafter\expandafter{%
               \expandafter\expandafter\expandafter\string
               \expandafter\UD@Exchange
               \romannumeral0\UD@ExtractFirstArgLoop{#3\UD@SelDOm}{}%
             }{ #1#2}%
          }%
          {#2}%
        }%
      }%
    }%
  }%
}%
%%.............................................................................
%% Now a routine which you can apply as <action> within \StringifyNAct:
%%.............................................................................
\newcommand\printstringifiedtoken[1]{%
  \UD@CheckWhetherLeadingSpace{#1}{%
    An \fbox{\texttt{explicit space token}}%
  }{%
    The token \fbox{\texttt{#1}}%
  } 
  was stringified.\\
}%
%%.............................................................................
%% Now a routine which you can apply when preferring iterating on the result
%% of \StringifyNAct
%%.............................................................................
\newcommand\printstringifiedtokenloop[1]{%
  \ifx\relax#1\expandafter\@gobble\else\expandafter\@firstofone\fi
  {\printstringifiedtoken{#1}\printstringifiedtokenloop}%
}%
\makeatother


\begin{document}

\begin{verbatim*}
\noindent
\StringifyNAct{\printstringifiedtoken}{%
  \textbf{\csname @firstofone\endcsname{\LaTeX} is funny.}
}
\end{verbatim*}

yields:\bigskip

\noindent
\StringifyNAct{\printstringifiedtoken}{%
  \textbf{\csname @firstofone\endcsname{\LaTeX} is funny.}
}

(The last explicit space token is due to the \verb|\endlinechar|-thingie 
while the state of \LaTeX's reading-apparatus is in state M (middle of line) 
after a curly closing brace. It also is in that state after an opening
curly brace.)

\newpage
\vspace*{-1.5cm}

\begin{verbatim*}
\noindent
\expandafter\expandafter
\expandafter\printstringifiedtokenloop
\StringifyNAct{}{%
  \textbf{\csname @firstofone\endcsname{\LaTeX} is funny.}
}%
\relax
\end{verbatim*}

yields:\bigskip

\noindent
\expandafter\expandafter
\expandafter\printstringifiedtokenloop
\StringifyNAct{}{%
  \textbf{\csname @firstofone\endcsname{\LaTeX} is funny.}
}%
\relax

(The last explicit space token is due to the \verb|\endlinechar|-thingie 
while the state of \LaTeX's reading-apparatus is in state M (middle of line) 
after a curly closing brace. It also is in that state after an opening
curly brace.)

\end{document}

Raven · Answer 4 · 2019-01-12T11:39:16.877

The solution

In order to stringify a whole token list I had to somehow apply the \string command recursively to all tokens in the token list. The way I accomplished this is

\def\stringify#1{%
    \def\stringifiedInput{}%
    \stringifyGrab#1\relax<!;!>%
}
\def\stringifyGrab#1#2<!;!>{%
    \apptocmd{\stringifiedInput}{\string#1}{}{}%
    %
    \noexpandarg
    \IfStrEq{#2}{\relax}{%
        \stringifiedInput%
    }{%
        \stringifyGrab#2<!;!>%
    }%
}

The recursive looping is done by the \stringifyGrab macro. Note that the packages etoolbox and xstring are required for this to work.

This can be tested by \stringify{\This is a \test for all \hrule and \vrules} which yields \Thisisa\testforall\hruleand\vrules. As you can see, all spaces are gone. I don't know how to prevent that, so if you have a solution for it feel free to leave a comment.

As Ulrike Fischer pointed out in the comment below, a simpler approach (preserving spaces) would be to use the e-TeX primitive \detokenize{...}.

Though, to be fair, \detokenize may produce spaces where you did not expect them: \detokenize{\This is a \test{uhhh} for all \hrule and \vrules}... — moewe, Dec 13 '18 at 09:38
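A short test file showing both points, as a sketch: \detokenize keeps the spaces of the input, and it appends a space after every control word, which is the effect moewe's comment refers to.

```latex
\documentclass{article}
\begin{document}
\ttfamily
% \detokenize is an e-TeX primitive; the control sequences in its
% argument do not need to be defined.
\detokenize{\This is a \test for all \hrule and \vrules}

% A space is appended after each control word, so a space shows up
% between \test and the opening brace although none was typed:
\detokenize{\test{uhhh}}
\end{document}
```

The second line is typeset as \test {uhhh}. Whether that extra space matters depends on what the stringified text is used for afterwards.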
