
Assume I have a macro like

\def\myMacro#1{<some stuff>}

And I am calling it like this

\myMacro{There are \some arguments \in \here g}

How can I iterate over each single token in the #1 argument inside \myMacro? So basically I want to know how I can iterate over a list whose delimiter is not a comma or any other character but rather the token boundaries as applied by TeX. Note that the argument may contain control sequences that are undefined so they must not be expanded.

Example of what I mean:

\def\myMacro#1{<iterate over all tokens>|\string<current token>|}
\myMacro{A \test}

which should result in

|A|| ||\test|

It is important to note that I also care about spaces, so they shouldn't be gobbled. Also, I don't want to execute any code outside of \myMacro in order for this to work (e.g. changing the catcode of spaces before calling \myMacro).

As I am really into understanding how such a thing works I'd appreciate if you could also explain how and why your provided code works :)


My attempt at this was

\def\iterate#1{%
    \tokenGrabber#1\relax<!;!>%
}
\def\tokenGrabber#1#2<!;!>{%
    |\string#1|%
    \noexpandarg%
    \IfStrEq{#2}{\relax}{%
    }{%
        \tokenGrabber#2<!;!>%
    }%
}

But this gobbles away spaces and it produces an error for empty inputs or inputs ending with a space.

Raven
  • Does the solution need to be expandable? It's not possible to get the charcode of {/} if it does ... – Joseph Wright Dec 15 '18 at 09:54
  • @JosephWright It would be nice if it was but I am primarily interested in the principle, so I don't care too much about it... – Raven Dec 15 '18 at 09:57
  • Given a{xyz}b, do you want to iterate three times (with a, xyz and b) or 7 times (with a, {, x, y, z, }, b)? And it is presumably OK to use something like \bgroup for {, as you can't hold an unmatched brace in a macro. – David Carlisle Dec 15 '18 at 10:07
  • @DavidCarlisle 7 times - though I'd also be interested in the other alternative but I guess that is another story – Raven Dec 15 '18 at 10:27
  • there are already some questions and answers about this on the site, now one has to find them... – jfbu Dec 15 '18 at 10:35
  • OK but if you need to distinguish between a{b} and a\bgroup b\egroup then it is really quite hard or even to distinguish between a and \zz if \zz is defined by \let\zz=a (there are lots of possibilities and currently the spec is rather unspecific to which is suitable....) – David Carlisle Dec 15 '18 at 11:01
  • @DavidCarlisle let's not worry too much about edge cases here. I'd be happy with a solution that works for most input (consisting of letters, spaces, control sequences and maybe braces). If the solution isn't universal and gets into trouble when one starts to use \bgroup or the like, that's not a big issue. Mostly I really care about how to approach something like this in general... – Raven Dec 15 '18 at 11:15
  • @jfbu if you find them that'd be great. I did research beforehand but I didn't really find information on this... – Raven Dec 15 '18 at 11:16
  • it's not really just edge cases; the mechanism you choose affects pretty much all uses. I've added an answer, but I'll add some notes about the consequences of this design. – David Carlisle Dec 15 '18 at 11:20
  • See also: tokcycle and etl package. – user202729 Dec 26 '21 at 12:34

3 Answers


\documentclass{article}

\makeatletter
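% Sentinel with a unique meaning; \ifx detects it to end the loop.
\def\endtest{\test!!!!}
\def\test#1{%
\par \bigskip\textbf{TESTING:} \texttt{\detokenize{#1}}\par
\testzz#1\endtest}

% Assign the next token to \tmp with \let, then run \testzzz.
% The "= " (equals sign plus exactly one space) is required.
\def\testzz{\afterassignment\testzzz\let\tmp= }

% Print the meaning of the captured token and loop until the sentinel.
\def\testzzz{%
\ifx\tmp\endtest
\else \texttt{\meaning\tmp}\par
\expandafter\testzz
\fi
}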

\begin{document}

\test{123}

\test{There are \some arguments \in \here g}

\test{ a+ {x \sqrt{\frac}}}

\end{document}

Note that this mechanism consumes the supplied list using \let. The form with = followed by exactly one space is important: it ensures that a space or = in the input is captured rather than dropped. This makes it easy to detect spaces and braces, but \let only captures the meaning of each token. It cannot distinguish { from \bgroup, nor can it distinguish between undefined commands or show which name was used: \zzzfoo, \undefined, etc. will all appear the same in the loop, as undefined.

For similar reasons, you cannot reconstruct anything equivalent to the original input from within the loop. Given \frac{a}{b} you get essentially \frac\bgroup a\egroup\bgroup b\egroup, from which it isn't possible in general to reconstruct a working fraction.
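The loss of information can be checked directly with \ifx. The following is only a small sketch; the names \tmpA, \tmpB, \tmpC, \zzzfoo and \zzzbar are made up for illustration, and \zzzfoo and \zzzbar are assumed to be undefined.

```latex
\documentclass{article}
\begin{document}

% Two different undefined names end up with the same (undefined) meaning.
\let\tmpA=\zzzfoo
\let\tmpB=\zzzbar
\ifx\tmpA\tmpB indistinguishable\else different\fi

% An explicit { captured with \let has the same meaning as \bgroup.
% The \iffalse}\fi only rebalances the braces in the source file.
\let\tmpC= {\iffalse}\fi
\ifx\tmpC\bgroup same as bgroup\else different\fi

\end{document}
```

Both tests take the true branch, which is why the loop cannot tell such tokens apart after the \let.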

So whether these restrictions matter depends on the intended use of the loop.
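As an illustration of what the captured meanings are good for, here is a hedged variation of the loop that labels spaces and braces. The names \classify, \classifynext, \classifytest and \classifystop are hypothetical; \@sptoken is the LaTeX kernel's implicit space token, which is why \makeatletter is needed.

```latex
\documentclass{article}
\makeatletter
\def\classifystop{\classifystop!}% sentinel with a unique meaning
\def\classify#1{\classifynext#1\classifystop}
% Capture the next token (including a space or a brace) in \tmp.
\def\classifynext{\afterassignment\classifytest\let\tmp= }
\def\classifytest{%
  \ifx\tmp\classifystop
    % sentinel reached: stop looping
  \else
    \ifx\tmp\@sptoken [space]%
    \else\ifx\tmp\bgroup [begin-group]%
    \else\ifx\tmp\egroup [end-group]%
    \else [\texttt{\meaning\tmp}]%
    \fi\fi\fi
    \expandafter\classifynext
  \fi}
\makeatother

\begin{document}
\classify{a {b} c}
\end{document}
```

Because only meanings are compared, \bgroup in the input would be reported as a begin-group token as well, in line with the restrictions described above.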

David Carlisle

If you need it for debugging, it's a one-liner:

\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
\NewDocumentCommand{\test}{m}
 {
  \tl_analysis_show:n { #1 }
 }
\ExplSyntaxOff

\begin{document}

\test{123}

\test{There are \some arguments \in \here g}

\test{ a+ {x \sqrt{\frac}}}

\end{document}

If you run it with pdflatex -interaction=nonstopmode, the console will show

The token list contains the tokens:
>  1 (the character 1)
>  2 (the character 2)
>  3 (the character 3).
<recently read> }

l.13 \test{123}

The token list contains the tokens:
>  T (the letter T)
>  h (the letter h)
>  e (the letter e)
>  r (the letter r)
>  e (the letter e)
>    (blank space  )
>  a (the letter a)
>  r (the letter r)
>  e (the letter e)
>    (blank space  )
>  \some (control sequence=undefined)
>  a (the letter a)
>  r (the letter r)
>  g (the letter g)
>  u (the letter u)
>  m (the letter m)
>  e (the letter e)
>  n (the letter n)
>  t (the letter t)
>  s (the letter s)
>    (blank space  )
>  \in (control sequence=\mathchar"3232=12850)
>  \here (control sequence=undefined)
>  g (the letter g).
<recently read> }

l.15 \test{There are \some arguments \in \here g}

The token list contains the tokens:
>    (blank space  )
>  a (the letter a)
>  + (the character +)
>    (blank space  )
>  { (begin-group character {)
>  x (the letter x)
>    (blank space  )
>  \sqrt (control sequence=macro:->\protect \sqrt  )
>  { (begin-group character {)
>  \frac (control sequence=macro:#1#2->{\begingroup #1\endgroup \over #2})
>  } (end-group character })
>  } (end-group character }).
<recently read> }

l.17 \test{ a+ {x \sqrt{\frac}}}
egreg
  • Is there also a way to print those statements into the document rather than the console? – Raven Dec 15 '18 at 12:30
  • @Raven Not at the moment. If you can show a real use case, it can possibly be added. – egreg Dec 15 '18 at 12:31
  • @egreg see the comment I just made under my answer:-) – David Carlisle Dec 15 '18 at 12:33
  • Actually \tl_analysis_map_inline:nn can do that, but you have to write the printing format yourself (still, it's pretty straightforward compared to figuring out how to extract the tokens in the first place) – user202729 Mar 26 '22 at 05:13
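
The mapping function mentioned in the last comment (in current expl3 it is spelled \tl_analysis_map_inline:nn, as far as I know) could be used roughly as follows to print the tokens into the document. This is a sketch under assumptions: \listtokens is a made-up name, and it relies on the documented behaviour that the inline code receives three arguments, namely tokens that o-expand to the token, the character code (-1 for a control sequence), and the category code as a hexadecimal digit.

```latex
\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
% \listtokens is a hypothetical name used only for this sketch.
\NewDocumentCommand{\listtokens}{m}
 {
  \tl_analysis_map_inline:nn { #1 }
   {
    \int_compare:nNnTF { ##2 } = { -1 }
     % control sequence: expand ##1 once to get the token, then stringify it
     { \texttt { \exp_args:No \tl_to_str:n { ##1 } } }
     % character token: rebuild it as a harmless catcode-12 character
     { \texttt { \char_generate:nn { ##2 } { 12 } } }
    \c_space_tl (catcode~##3) \par
   }
 }
\ExplSyntaxOff

\begin{document}

\listtokens{There are \some arguments \in \here g}

\end{document}
```

The printing format here is deliberately minimal; how a space or a brace should be displayed is a design choice left to the caller.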

Taking the expandable code from l3tl and recoding in classical style, we might do something like

\catcode`\@=11 %

\chardef\tl@exp@end=0 %

\long\def\@firstoftwo#1#2{#1}
\long\def\@secondoftwo#1#2{#2}
\long\def\@secondofthree#1#2#3{#2}
\long\def\@gobble#1{}

\long\def\tl@if@empty#1{%
  \expandafter\ifx\expandafter\relax\detokenize{#1}\relax
    \expandafter\@secondofthree
  \fi
  \@secondoftwo
}

\long\def\tl@if@head@N#1{%
  \ifcat
    \iffalse{\fi\tl@if@head@N@aux?#1 }%
    \expandafter\@gobble\expandafter{\expandafter{\string#1?}}%
    **%
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi
}
\long\def\tl@if@head@N@aux#1 {%
  \expandafter\tl@if@empty\expandafter{\@gobble#1}{^}{}%
  \expandafter\@gobble\expandafter{\iffalse}\fi
}

\long\def\tl@if@head@group#1{%
  \ifcat\expandafter\@gobble\expandafter{\expandafter{\string#1?}}**%
    \expandafter\@secondoftwo
  \else
    \expandafter\@firstoftwo
  \fi
}

\def\q@act@mark{\q@act@mark}
\def\q@act@stop{\q@act@stop}

\long\def\tl@act#1#2#3#4#5{%
  \ifnum\iffalse{\fi`}=\z@\fi
  \tl@act@loop#5\q@act@mark\q@act@stop
  {#4}#1#2#3%
  \tl@act@result{}%
}
\long\def\tl@act@loop#1\q@act@stop{%
  \tl@if@head@N{#1}
    {\tl@act@normal}
    {%
      \tl@if@head@group{#1}
        {\tl@act@group}
        {\tl@act@space}%
    }%
  #1\q@act@stop
}
\long\def\tl@act@normal#1#2\q@act@stop#3#4{%
  \ifx\q@act@mark#1\expandafter\tl@act@end\fi
  #4{#3}#1%
  \tl@act@loop#2\q@act@stop
  {#3}#4%
}
\long\def\tl@act@group#1#2\q@act@stop#3#4#5{%
  #5{#3}{#1}%
  \tl@act@loop#2\q@act@stop
  {#3}#4#5%
}
\expandafter\long\expandafter\def\expandafter
  \tl@act@space\space#1\q@act@stop#2#3#4#5{%
    #5{#2}%
    \tl@act@loop#1\q@act@stop
    {#2}#3#4#5%
  }
\long\def\tl@act@end#1\tl@act@result#2{%
  \ifnum`{=\z@}\fi
  \tl@exp@end
  #2%
}

\long\def\iterate#1{%
  \unexpanded\expandafter{%
    \romannumeral\tl@act
      \tl@iterate@normal
      \tl@iterate@group
      \tl@iterate@space
      { }
      {#1}%
  }%
}
\long\def\tl@iterate@normal#1#2{\tl@iterate@action{\string#2}}
\long\def\tl@iterate@group#1#2{\tl@iterate@action{{\detokenize{#2}}}}
\long\def\tl@iterate@space#1{\tl@iterate@action{ }}
\long\def\tl@iterate@action#1#2\tl@act@result#3{%
  #2%
  \tl@act@result{#3|#1|}%
}

\catcode`\@=12 %

\iterate{There are \some arguments \in \here g}

\bye

(This uses e-TeX, but that can be avoided.)

The basic idea is to grab the token list and examine the first token before branching and handling as required. I've not done it, but recursion inside groups is doable. Notice that all brace groups become { ... } (unavoidable in expandable code).
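
The "examine the first token, then branch" idea can also be sketched in a much shorter, non-expandable form with \futurelet. This is not the code above, only a simplified illustration written for LaTeX; all names starting with \iter@ and the command \peekiterate are hypothetical. Unlike a \let-based loop, each ordinary token is grabbed as a macro argument, so \string can show its actual name.

```latex
\documentclass{article}
\makeatletter
\def\iter@stop{\iter@stop}% sentinel marking the end of the list
\long\def\peekiterate#1{\iter@peek#1\iter@stop}
% Look at the next token without removing it, then branch on it.
\def\iter@peek{\futurelet\iter@tok\iter@branch}
\def\iter@branch{%
  \ifx\iter@tok\iter@stop \let\iter@next\iter@end
  \else\ifx\iter@tok\@sptoken \let\iter@next\iter@space
  \else\ifx\iter@tok\bgroup \let\iter@next\iter@group
  \else \let\iter@next\iter@normal
  \fi\fi\fi
  \iter@next}
\def\iter@end\iter@stop{}% remove the sentinel and stop
% A space cannot be grabbed as an undelimited argument, so remove it with \let.
\def\iter@space{\texttt{| |}\afterassignment\iter@peek\let\iter@tok= }
% A brace group is grabbed whole and iterated recursively.
\long\def\iter@group#1{%
  \texttt{|\char`\{|}\peekiterate{#1}\texttt{|\char`\}|}\iter@peek}
% Any other token is a single undelimited argument.
\long\def\iter@normal#1{\texttt{|\string#1|}\iter@peek}
\makeatother

\begin{document}
\peekiterate{A \test}

\peekiterate{a{xy} b}
\end{document}
```

For the first call this should give the |A|| ||\test| form asked for in the question. The sketch treats an implicit \bgroup in the input like an explicit brace group, so it shares some of the edge cases discussed in the comments.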

Joseph Wright