
I'd like to learn a TeX programming technique that I can always apply to handle the following type of situation:

The code

\newtoks\mytokenregister
\def\five{\four5}
\def\four{\three4}
\def\three{\two3}
\def\two{\one2}
\def\one{1}
\def\tokenstomacro#1{%
  \mytokenregister{#1}%
  \edef\macro{\the\mytokenregister}%
}%
\tokenstomacro{\one#1\two#2\three#3\four#4\five}%
\show\macro
\csname stop\endcsname
\bye

doubles # and defines:

> \macro=macro:
->\one ##1\two ##2\three ##3\four ##4\five .
l.12     \show\macro

The question is:

How can I get a \tokenstomacro which also doubles # and expands things and therefore defines

> \macro=macro:
->1##112##2123##31234##412345.

?

I know that if I double the hashes in the argument of \tokenstomacro and apply \edef directly, i.e., without the intermediate step via \the-expansion of a token register, then I get that result:

\def\five{\four5}
\def\four{\three4}
\def\three{\two3}
\def\two{\one2}
\def\one{1}
\def\tokenstomacro#1{%
  \edef\macro{#1}%
}%
\tokenstomacro{\one##1\two##2\three##3\four##4\five}%
\show\macro
\csname stop\endcsname
\bye

But I would like to get that result without the need of "manually" doubling the hashes in the argument of \tokenstomacro.

I would like this to work without Lua-extensions.
I would also like this to work without (unexpanded) writing to a (pseudo-)file and \inputting/reading back that (pseudo-)file, because the category-code régime might change between creating the tokens which form \tokenstomacro's argument and reading back the (pseudo-)file...

David Carlisle

2 Answers


With an “extended” TeX engine (not with Knuthian TeX),

\newtoks\mytokenregister
\def\five{\four5}
\def\four{\three4}
\def\three{\two3}
\def\two{\one2}
\def\one{1}
\def\tokenstomacro#1{%
  \mytokenregister\expandafter{\expanded{#1}}%
  \edef\macro{\the\mytokenregister}%
}
\tokenstomacro{\one#1\two#2\three#3\four#4\five}
\show\macro
\csname stop\endcsname
\bye

you get

> \macro=macro:
->1##112##2123##31234##412345.

As Joseph Wright suggests, there's not even need for the token register, because \unexpanded works as an “unnamed token register”:

\def\five{\four5}
\def\four{\three4}
\def\three{\two3}
\def\two{\one2}
\def\one{1}
\def\tokenstomacro#1{%
  \edef\macro{\unexpanded\expandafter{\expanded{#1}}}%
}
\tokenstomacro{\one#1\two#2\three#3\four#4\five}
\show\macro
\csname stop\endcsname
\bye
egreg

As an example you gave:

\tokenstomacro{\one#1\two#2\three#3\four#4\five}%

\macro=macro:
->1##112##2123##31234##412345.

What about

\tokenstomacro{\string#}%

?

Shall expansion/stringification take place before hash-doubling?
Shall expansion/stringification take place after hash-doubling?

That makes a difference.

In the former case \macro would expand to a single hash of category code 12(other).
In the latter case \macro would expand to a single hash of category code 12(other) trailed by a single hash of category code 6 (parameter).
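
A minimal sketch of the former order (expansion first, hash-doubling afterwards). It assumes an engine that provides \expanded and \unexpanded and borrows the \tokenstomacro definition from egreg's answer. Following the reasoning above, \macro should then hold a single hash of category code 12(other):

```tex
% Assumption: the engine provides \expanded and \unexpanded
% (this does not work with Knuthian TeX).
\def\tokenstomacro#1{%
  % \expanded runs first, so \string# has already become a catcode-12
  % hash by the time \unexpanded would double catcode-6 hashes.
  \edef\macro{\unexpanded\expandafter{\expanded{#1}}}%
}
\tokenstomacro{\string#}
\show\macro
\csname stop\endcsname
\bye
```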

Be that as it may.

Using only Knuthian TeX a general approach for achieving automatic \edef-expansion along with hash-doubling is not possible.

The reason I say "general" is this: David Carlisle drew my attention to the fact that there are various special cases in which \edef-expansion along with hash-doubling can be done with the on-board resources of Knuthian TeX:

For those special cases where the argument of \tokenstomacro contains explicit hashes only as single hashes trailed by digits in the range 1..9, you can use \edef to define an interim macro in which all sequences #1, ..., #9 that occur within the argument of \tokenstomacro become parameters #1, ..., #9. Due to \edef, things go expanded into the ⟨balanced text⟩ of the definition of that interim macro. Then, with the sequences {#1}, ..., {#9} as arguments, you can have that interim macro expanded into the ⟨balanced text⟩ of an assignment to a token register. Finally, for the sake of doubling the explicit hashes, you define \macro from the content of that ⟨token register⟩ via \edef and \the⟨token register⟩:

\def\five{\four5}
\def\four{\three4}
\def\three{\two3}
\def\two{\one2}
\def\one{1}
\def\tokenstomacro#1{%
  \edef\macro##1##2##3##4##5##6##7##8##9{#1}%
  \toks0\expandafter{\macro{##1}{##2}{##3}{##4}{##5}{##6}{##7}{##8}{##9}}%
  \edef\macro{\the\toks0}%
}%
\tokenstomacro{\one#1\two#2\three#3\four#4\five}%
\show\macro
\csname stop\endcsname
\bye

(\expandafter triggers expansion of the interim-definition of \macro within the \toks0-assignment. During expansion of the interim-definition of \macro two things take place:

  1. Single explicit catcode-6-character-tokens that are trailed by digits 1..9 and therefore denote parameters within the ⟨balanced text⟩ of the interim-definition will be replaced by sequences of single explicit hash-character-tokens of category code 6 that are trailed by one of the digits 1..9 that are gathered as arguments.
    This way the placement of sequences #1, #2, ..., #9 is sort of preserved.
  2. In case the argument of \tokenstomacro contains sequences with an even amount of adjacent hashes, these sequences make it into the ⟨balanced text⟩ of the interim-definition without being treated as something that denotes a parameter. Therefore the amount of hashes with these sequences will be halved at the time of expanding the interim definition.

Therefore with the result of expanding the interim definition one cannot distinguish sequences #1, ..., #9 that came as arguments of the interim-definition for preserving the placement of #1, ..., #9 from sequences #1, ..., #9 that came into being due to the halving of hashes of sequences ##1, .. ##9 that made it into the ⟨balanced text⟩ of \macro's interim-definition.

Therefore this method is not suitable for preserving within the final definition of \macro the placement of sequences with an even amount of adjacent hashes as, e.g., in ####1 or in ##1.

This method also is not suitable for preserving the placement of sequences like #A within the final definition of \macro.

As I am already nitpicking here, let's also mention that with this method any single explicit catcode-6(parameter)-character-token that is trailed by one of the digits 1..9 will, within the definition of \macro, be replaced by an explicit hash-character-token of category code 6(parameter). Under the usual category-code régime this doesn't matter, as there the hash is the only character whose category code is 6.)
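
The limitation with even-length hash sequences can be observed with a small variation of the example above. This is only a sketch; the argument \one##1\two#1 is chosen here purely for illustration. Following the reasoning above, both ##1 and #1 should end up as ##1 in the meaning of \macro, so the two can no longer be told apart:

```tex
\def\two{\one2}
\def\one{1}
\def\tokenstomacro#1{%
  % Interim definition: #1..#9 in the argument become parameters,
  % while ## is stored as one literal hash.
  \edef\macro##1##2##3##4##5##6##7##8##9{#1}%
  \toks0\expandafter{\macro{##1}{##2}{##3}{##4}{##5}{##6}{##7}{##8}{##9}}%
  \edef\macro{\the\toks0}%
}%
% ##1 is halved to #1 when the interim macro is expanded, and
% #1 is replaced by the argument #1: both arrive in \toks0 as #1.
\tokenstomacro{\one##1\two#1}%
\show\macro
\csname stop\endcsname
\bye
```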

But the special cases where the above method can be applied do not cover all cases one can think of.

I assume that, e.g., things like

\tokenstomacro{\one##1\two##2\three#1\four#2\five}%

shall also be taken into account and yield:

\macro=macro:
->1####112####2123##11234##212345.

With Knuthian TeX the special circumstances in which the doubling of hashes occurs are:

  1. When writing a hash-token to screen or to external file, two hashes will be written:
    \message{This is my message: #} yields on the screen: This is my message: ##.
    (Be aware that with \show and \message{\meaning\...} you don't get this hash-doubling.
    The reason is: At the time of writing to screen \show and \message actually don't deal with hash-tokens of category-code 6 (parameter) but with the "stringified" variants thereof, i.e., with hash-tokens of category code 12(other).)

  2. When \the delivers tokens of a ⟨token variable⟩, e.g., of a token-register or of a token-parameter like \everypar or \everyeof during an \edef/\xdef-expansion-context, hashes will be doubled.
    This hash-doubling is done for compensating the circumstance that hashes will be reduced after having expanded the macro in question.

Ad 1: The task of writing to screen differs from the task of defining a macro. You yourself have explicitly excluded in your question approaches in which writing to external-file and "reading back" plays a rôle.

Ad 2: With \the⟨token variable⟩ during \edef/\xdef-expansion, hash-doubling takes place, but at the same time further expansion of the content of the ⟨token variable⟩ is inhibited. Thus with \edef/\xdef\macro{\the⟨token variable⟩}, material with hashes doubled will always go unexpanded into the macro definition in question. To get that still unexpanded material expanded, \macro itself needs to be expanded. But during expansion of \macro, any two consecutive hashes of the ⟨balanced text⟩ of \macro's ⟨definition text⟩ will collapse into one, and that will cancel out the hash-doubling.
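
A minimal sketch of circumstance 2, using the scratch register \toks0 for illustration. As in the question's first example, \show should report \one ##1, i.e., the hash is doubled but \one is not replaced by 1:

```tex
\def\one{1}
\toks0{\one#1}
% \the\toks0 is delivered unexpanded inside \edef:
% the hash gets doubled, but \one stays as it is.
\edef\macro{\the\toks0}
\show\macro
\csname stop\endcsname
\bye
```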


Just for the sake of having fun let me mention the following:

Especially on April 1st with Knuthian TeX, in some (not all!!!) situations prepending \double to each hash instead of doubling each hash "inside \edef" might be a feasible but very inefficient workaround:

\def\five{\four5}
\def\four{\three4}
\def\three{\two3}
\def\two{\one2}
\def\one{1}
\def\double#1{#1#1}
\def\tokenstomacro#1{%
  \edef\macro{#1}%
}
\tokenstomacro{\one\double#1\two\double#2\three\double#3\four\double#4\five}
\show\macro
\csname stop\endcsname
\bye

The console output with this example is:

This is pdfTeX, Version 3.14159265-2.6-1.40.19 (TeX Live 2019/dev/Debian) (preloaded format=pdftex)
entering extended mode
(./test.tex
> \macro=macro:
->1##112##2123##31234##412345.
l.11     \show\macro

? 
 )
No pages of output.
Transcript written on test.log.

This workaround is funny and therefore suitable for April 1st because it is inefficient and pointless: typing ## usually is less work than typing \double#, and typing #### usually is less work than typing \double{\double#}...

Another workaround for some (but not all!!!) situations, which does not significantly increase the typing work, could be to replace the hash by an active character that delivers two hashes:

\def\five{\four5}
\def\four{\three4}
\def\three{\two3}
\def\two{\one2}
\def\one{1}
% Let's use / instead of #
\catcode`\/=13
\def/{####}
\def\tokenstomacro#1{%
  \edef\macro{#1}%
}
\tokenstomacro{\one/1\two/2\three/3\four/4\five}
\show\macro
\csname stop\endcsname
\bye

The console output with this example is:

This is pdfTeX, Version 3.14159265-2.6-1.40.19 (TeX Live 2019/dev/Debian) (preloaded format=pdftex)
entering extended mode
(./test.tex
> \macro=macro:
->1##112##2123##31234##412345.
l.13     \show\macro

? 
 )
No pages of output.
Transcript written on test.log.

If ε-TeX extensions are available, there are more special circumstances where the doubling of hashes occurs:

  3. The ε-TeX primitive \detokenize is like unexpanded-immediate-writing tokens to an external file (hereby hash-doubling takes place!) and reading back that external file under a category-code-régime where everything but space (which has category code 10(space)) has category code 12(other).
    The ε-TeX primitive \scantokens is like unexpanded-immediate-writing tokens to an external file (hereby hash-doubling takes place!) and reading back that external file under the current category-code-régime.

  4. When the ε-TeX-primitive \unexpanded delivers its ⟨balanced text⟩ during an \edef/\xdef-expansion-context, hashes will be doubled. This hash-doubling is done for compensating the circumstance that hashes will be reduced after having expanded the macro in question.

Ad 3: You yourself have explicitly excluded in your question approaches in which writing to external pseudo file and "reading back" plays a rôle.

Ad 4: A combination only of \edef/\xdef and \unexpanded will not be sufficient for achieving what you desire: Like with circumstance 2, the material where hashes are doubled goes unexpanded into the ⟨balanced text⟩ of a macro definition. And like with circumstance 2 further expansion of that material requires expanding the macro in question whereby two consecutive hashes of the ⟨balanced text⟩ of \macro's ⟨definition text⟩ will collapse into one which will cancel out the hash-doubling.
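
A minimal sketch of circumstance 4, assuming an engine with the ε-TeX extensions. With \unexpanded alone, \show should again report \one ##1: the hash is doubled, but nothing gets expanded:

```tex
% Assumption: the e-TeX primitive \unexpanded is available.
\def\one{1}
% \unexpanded doubles the hash but also keeps \one from being expanded.
\edef\macro{\unexpanded{\one#1}}
\show\macro
\csname stop\endcsname
\bye
```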

But with more recent TeX engines there is also \expanded and that's why you are in luck here:

If \expanded is available you can use a combination of \expanded (for expanding things) and \unexpanded for doubling the hashes of the \expanded-expansion-result during \edef-expansion—assignments to scratch-token registers and the like are not needed:

\def\five{\four5}
\def\four{\three4}
\def\three{\two3}
\def\two{\one2}
\def\one{1}
\def\tokenstomacro#1{%
  \edef\macro{\unexpanded\expandafter{\expanded{#1}}}%
}
\tokenstomacro{\one#1\two#2\three#3\four#4\five}
\show\macro
\csname stop\endcsname
\bye

With the example above (saved as test.tex), console output of pdfTeX is:

This is pdfTeX, Version 3.14159265-2.6-1.40.19 (TeX Live 2019/dev/Debian) (preloaded format=pdftex)
entering extended mode
(./test.tex
> \macro=macro:
->1##112##2123##31234##412345.
l.10 \show\macro

? 
 )
No pages of output.
Transcript written on test.log.

(While I was still typing the preceding paragraph, Joseph Wright submitted a comment to egreg's answer—in egreg's answer \the⟨token register⟩ is used instead of \unexpanded— and pointed out the same. So I am certainly not the first one who had this idea.)
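
The same definition can also be tried on the mixed input mentioned earlier, where ##1 and #1 occur side by side. This is only a sketch: \expanded leaves explicit hashes alone and \unexpanded then doubles each of them, so the meaning of \macro should come out as 1####112####2123##11234##212345:

```tex
% Assumption: the engine provides \expanded and \unexpanded.
\def\five{\four5}
\def\four{\three4}
\def\three{\two3}
\def\two{\one2}
\def\one{1}
\def\tokenstomacro#1{%
  \edef\macro{\unexpanded\expandafter{\expanded{#1}}}%
}
% Sequences of two hashes and of one hash in the same argument:
\tokenstomacro{\one##1\two##2\three#1\four#2\five}
\show\macro
\csname stop\endcsname
\bye
```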

For less recent TeX engines where \detokenize of the ε-TeX extensions is available but \expanded is not available, I can offer a routine \DoubleEveryHash which can serve as a workaround in some situations:

The routine \DoubleEveryHash processes an argument and by means of \romannumeral0-expansion recursively doubles every explicit catcode-6(parameter)-character token that is contained in that argument.

The gist of the check for a hash is: \string# delivers a single hash-character-token of category code 12(other) while (as said above) with \detokenize hash-doubling takes place and therefore \detokenize{#} delivers two hash-character-tokens of category code 12(other).
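
The difference that the check relies on can be made visible with a short sketch. The macro names \viaString and \viaDetokenize are made up for this illustration. The first \show should report a single hash, the second one two hashes, all of category code 12(other):

```tex
% Assumption: the e-TeX primitive \detokenize is available.
% \viaString and \viaDetokenize are hypothetical names for illustration.
\edef\viaString{\string#}           % one catcode-12 hash
\edef\viaDetokenize{\detokenize{#}} % two catcode-12 hashes
\show\viaString
\show\viaDetokenize
\csname stop\endcsname
\bye
```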

In case the argument of \DoubleEveryHash contains matching pairs of explicit character tokens of catcode 1 and 2, each of these pairs triggers another level of \romannumeral0-expansion. Therefore excessive nesting of braces within the argument of \DoubleEveryHash will take its toll on the semantic nest.

Besides this, \DoubleEveryHash does replace matching pairs of explicit character tokens of category code 1(begin grouping) and 2(end grouping) by matching pairs of opening curly braces of category code 1 and closing curly braces of category code 2.
I suppose this won't be a problem in most situations as usually the curly braces are the only characters of category code 1/2.
But this must be mentioned because this means that \DoubleEveryHash is suitable only for situations where replacing explicit begin-grouping-character-tokens and explicit end-grouping-character-tokens by explicit curly-brace-tokens of the same kind doesn't matter.

If you place \DoubleEveryHash{...} into an \edef, due to \romannumeral0-expansion hash-doubling will—in contrast with the \edef\macro{\unexpanded\expandafter{\expanded{#1}}}-approach—take place before expanding the tokens that form the argument of \DoubleEveryHash. Therefore \edef\macro{\DoubleEveryHash{\string#1}} (or \DoubleEveryHash{\edef\macro{\string#1}} if you prefer) will yield an error-message about an illegal parameter number because in the hash-doubling-step you will get two hashes trailed by the digit 1. The first hash will be stringified. The second hash, which is trailed by the digit 1, will not be stringified and therefore will be taken for a parameter #1 while the ⟨parameter text⟩ of \macro is empty.

\catcode`\@=11
%%=============================================================================
%% Paraphernalia:
%%    \UD@firstoftwo, \UD@secondoftwo,
%%    \UD@PassFirstToSecond, \UD@Exchange, \UD@removespace
%%    \UD@CheckWhetherNull, \UD@CheckWhetherBrace,
%%    \UD@CheckWhetherLeadingSpace, \UD@ExtractFirstArg
%%=============================================================================
\long\def\UD@firstoftwo#1#2{#1}%
\long\def\UD@secondoftwo#1#2{#2}%
\long\def\UD@PassFirstToSecond#1#2{#2{#1}}%
\long\def\UD@Exchange#1#2{#2#1}%
\UD@firstoftwo{\def\UD@removespace}{} {}%
%%-----------------------------------------------------------------------------
%% Check whether argument is empty:
%%.............................................................................
%% \UD@CheckWhetherNull{<Argument which is to be checked>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is empty>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is not empty>}%
%%
%% The gist of this macro comes from Robert R. Schneck's \ifempty-macro:
%% <https://groups.google.com/forum/#!original/comp.text.tex/kuOEIQIrElc/lUg37FmhA74J>
\long\def\UD@CheckWhetherNull#1{%
  \romannumeral0\expandafter\UD@secondoftwo\string{\expandafter
  \UD@secondoftwo\expandafter{\expandafter{\string#1}\expandafter
  \UD@secondoftwo\string}\expandafter\UD@firstoftwo\expandafter{\expandafter
  \UD@secondoftwo\string}\UD@firstoftwo\expandafter{} \UD@secondoftwo}%
  {\UD@firstoftwo\expandafter{} \UD@firstoftwo}%
}%
%%-----------------------------------------------------------------------------
%% Check whether argument's first token is a catcode-1-character
%%.............................................................................
%% \UD@CheckWhetherBrace{<Argument which is to be checked>}%
%%                      {<Tokens to be delivered in case that argument
%%                        which is to be checked has leading
%%                        catcode-1-token>}%
%%                      {<Tokens to be delivered in case that argument
%%                        which is to be checked has no leading
%%                        catcode-1-token>}%
\long\def\UD@CheckWhetherBrace#1{%
  \romannumeral0\expandafter\UD@secondoftwo\expandafter{\expandafter{%
  \string#1.}\expandafter\UD@firstoftwo\expandafter{\expandafter
  \UD@secondoftwo\string}\UD@firstoftwo\expandafter{} \UD@firstoftwo}%
  {\UD@firstoftwo\expandafter{} \UD@secondoftwo}%
}%
%%-----------------------------------------------------------------------------
%% Check whether brace-balanced argument's first token is an explicit
%% space token
%%.............................................................................
%% \UD@CheckWhetherLeadingSpace{<Argument which is to be checked>}%
%%                             {<Tokens to be delivered in case <argument
%%                               which is to be checked>'s 1st token is a
%%                               space-token>}%
%%                             {<Tokens to be delivered in case <argument
%%                               which is to be checked>'s 1st token is not
%%                               a space-token>}%
\long\def\UD@CheckWhetherLeadingSpace#1{%
  \romannumeral0\UD@CheckWhetherNull{#1}%
  {\UD@firstoftwo\expandafter{} \UD@secondoftwo}%
  {\expandafter\UD@secondoftwo\string{\UD@CheckWhetherLeadingSpaceB.#1 }{}}%
}%
\long\def\UD@CheckWhetherLeadingSpaceB#1 {%
  \expandafter\UD@CheckWhetherNull\expandafter{\UD@secondoftwo#1{}}%
  {\UD@Exchange{\UD@firstoftwo}}{\UD@Exchange{\UD@secondoftwo}}%
  {\UD@Exchange{ }{\expandafter\expandafter\expandafter\expandafter
   \expandafter\expandafter\expandafter}\expandafter\expandafter
   \expandafter}\expandafter\UD@secondoftwo\expandafter{\string}%
}%
%%-----------------------------------------------------------------------------
%% Extract first inner undelimited argument:
%%
%%   \UD@ExtractFirstArg{ABCDE} yields  {A}
%%
%%   \UD@ExtractFirstArg{{AB}CDE} yields  {AB}
%%.............................................................................
\long\def\UD@RemoveTillUD@SelDOm#1#2\UD@SelDOm{{#1}}%
\long\def\UD@ExtractFirstArg#1{%
  \romannumeral0%
  \UD@ExtractFirstArgLoop{#1\UD@SelDOm}%
}%
\long\def\UD@ExtractFirstArgLoop#1{%
  \expandafter\UD@CheckWhetherNull\expandafter{\UD@firstoftwo{}#1}%
  { #1}%
  {\expandafter\UD@ExtractFirstArgLoop\expandafter{\UD@RemoveTillUD@SelDOm#1}}%
}%
%%=============================================================================
%% \DoubleEveryHash{<argument>}%
%%
%%   Each explicit catcode-6(parameter)-character-token of the <argument> 
%%   will be doubled.
%%
%%   You obtain the result after two expansion-steps, i.e., 
%%   in expansion-contexts you get the result after "hitting" 
%%   \DoubleEveryHash by two \expandafter.
%%   
%%   As a side-effect, the routine does replace matching pairs of explicit
%%   character tokens of catcode 1 and 2 by matching pairs of curly braces
%%   of catcode 1 and 2.
%%   I suppose this won't be a problem in most situations as usually the
%%   curly braces are the only characters of category code 1 / 2...
%%
%%   This routine needs \detokenize from the eTeX extensions.
%%-----------------------------------------------------------------------------
\long\def\DoubleEveryHash#1{%
   \romannumeral0\UD@DoubleEveryHashLoop{#1}{}%
}%
\long\def\UD@DoubleEveryHashLoop#1#2{%
  \UD@CheckWhetherNull{#1}{ #2}{%
    \UD@CheckWhetherLeadingSpace{#1}{%
       \expandafter\UD@DoubleEveryHashLoop
       \expandafter{\UD@removespace#1}{#2 }%
    }{%
      \UD@CheckWhetherBrace{#1}{%
        \expandafter\expandafter\expandafter\UD@PassFirstToSecond
        \expandafter\expandafter\expandafter{%
        \expandafter\UD@PassFirstToSecond\expandafter{%
            \romannumeral0%
            \expandafter\UD@DoubleEveryHashLoop
            \romannumeral0%
            \UD@ExtractFirstArgLoop{#1\UD@SelDOm}{}%
        }{#2}}%
        {\expandafter\UD@DoubleEveryHashLoop
         \expandafter{\UD@firstoftwo{}#1}}%
      }{%
        \expandafter\UD@CheckWhetherHash
        \romannumeral0\UD@ExtractFirstArgLoop{#1\UD@SelDOm}{#1}{#2}%
      }%
    }%
  }%
}%
\long\def\UD@CheckWhetherHash#1#2#3{%
  \expandafter\UD@CheckWhetherLeadingSpace\expandafter{\string#1}{%
    \expandafter\expandafter\expandafter\UD@CheckWhetherNull
    \expandafter\expandafter\expandafter{%
    \expandafter\UD@removespace\string#1}{%
      \expandafter\expandafter\expandafter\UD@CheckWhetherNull
      \expandafter\expandafter\expandafter{%
      \expandafter\UD@removespace\detokenize{#1}}{%
        % something whose stringification yields a single space
        \UD@secondoftwo
      }{% explicit space of catcode 6
        \UD@firstoftwo
      }%
    }{% something whose stringification has a leading space
      \UD@secondoftwo
    }%
  }{%
    \expandafter\expandafter\expandafter\UD@CheckWhetherNull
    \expandafter\expandafter\expandafter{%
    \expandafter\UD@firstoftwo
    \expandafter{\expandafter}\string#1}{%
      \expandafter\expandafter\expandafter\UD@CheckWhetherNull
      \expandafter\expandafter\expandafter{%
      \expandafter\UD@firstoftwo
      \expandafter{\expandafter}\detokenize{#1}}{%
        % no hash
        \UD@secondoftwo
      }{% hash
        \UD@firstoftwo
      }%
    }{% no hash
      \UD@secondoftwo
    }%
  }%
  {% hash
    \expandafter\UD@DoubleEveryHashLoop
    \expandafter{\UD@firstoftwo{}#2}{#3#1#1}%
  }{% no hash
    \expandafter\UD@DoubleEveryHashLoop
    \expandafter{\UD@firstoftwo{}#2}{#3#1}%
  }%
}%
\catcode`\@=12
%%=============================================================================


\def\five{\four5}
\def\four{\three4}
\def\three{\two3}
\def\two{\one2}
\def\one{1}
\def\tokenstomacro#1{%
  \edef\macro{\DoubleEveryHash{#1}}%
}%
% Or, if you prefer:
% \def\tokenstomacro#1{%
%   \DoubleEveryHash{\edef\macro{#1}}%
% }%
\tokenstomacro{\one#1\two#2\three#3\four#4\five}
\show\macro
\tokenstomacro{\one##1\two##2\three#1\four#2\five}
\show\macro
\tokenstomacro{\one##1{{{ \two##2 }\three#1}\four#2}\five}
\show\macro
\csname stop\endcsname
\bye

The console output of this (rather large) minimal example is:

This is pdfTeX, Version 3.14159265-2.6-1.40.19 (TeX Live 2019/dev/Debian) (preloaded format=pdflatex)
entering extended mode
(./test.tex
LaTeX2e <2018-12-01>
> \macro=macro:
->1##112##2123##31234##412345.
l.174     \show\macro

? 
> \macro=macro:
->1####112####2123##11234##212345.
l.176     \show\macro

? 
> \macro=macro:
->1####1{{{ 12####2 }123##1}1234##2}12345.
l.178     \show\macro

? 
 )
No pages of output.
Transcript written on test.log.
Ulrich Diez