Following this post, I was trying to use soul as a hack-y tokenizer and ran into unexpected behavior on certain strings. I reduced the problem to the following MWE but haven't been able to make further progress:

\documentclass{article}
\usepackage{soul}

% Initially empty, then should be nonempty afterward
\newcommand*{\state}{}

\makeatletter
\def\SOUL@soeverytoken{%
  \ifx\state\empty% Should only happen once...
    \renewcommand*{\state}{Nonempty}%
    (\the\SOUL@token)%
  \else%
    [\the\SOUL@token]%
  \fi%
}
\makeatother

\begin{document}
% For some reason, the string "unrnn" gives an unexpected result.
\so{nnrnn} \par % (n)[n][r][n][n]
\so{unrnn} \par % (u)[n](r)[n][n]
\so{unnnn} \par % (u)[n][n][n][n]
\so{unrn} \par % (u)[n][r][n]
\end{document}

On the string unrnn, it prints out (r) rather than [r] as I would expect. From printing out \state, it seems like it's becoming nonempty after reading the r, but I'm not sure why this doesn't occur in the other examples.

  • May I ask why you are using soul for this when the LaTeX kernel these days has tools to go through token lists, etc.? – daleif Jun 20 '23 at 09:25
  • No principled reason. I'm fairly new to LaTeX macro programming and followed the top suggestion in the post linked above. I'll take a closer look at the kernel tools---thanks for the suggestion! – air-wreck Jun 21 '23 at 06:23

2 Answers


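I'm not sure why, but when the string unrnn is examined, TeX starts at grouping level 3; however, by the time r is processed, the grouping level has dropped to 2. Because \renewcommand acts locally, the redefinition of \state made at level 3 is undone when that group ends, so \state is empty again when r is reached.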

\documentclass{article}
\usepackage{soul}

% Initially empty, then should be nonempty afterward
\newcommand*{\state}{}

\makeatletter
\def\SOUL@soeverytoken{%
  \showthe\currentgrouplevel
  \show\state
  \showthe\SOUL@token
  \ifx\state\empty% Should only happen once...
    \renewcommand*{\state}{Nonempty}%
    (\the\SOUL@token)%
  \else
    [\the\SOUL@token]%
  \fi
}
\makeatother

\begin{document}
% For some reason, the string "unrnn" gives an unexpected result.
%  \so{nnrnn} \par % (n)[n][r][n][n]
  \so{unrnn} \par % (u)[n](r)[n][n]
%  \so{unnnn} \par % (u)[n][n][n][n]
%  \so{unrn} \par % (u)[n][r][n]
\end{document}

I added some diagnostic commands to see what happens.

> 3.
\SOUL@everytoken ->\showthe \currentgrouplevel
                                               \show \state \showthe \SOUL@t...
l.24   \so{unrnn}
                  \par % (u)[n](r)[n][n]
?
> \state=macro:
->.
\SOUL@everytoken ...urrentgrouplevel \show \state
                                                  \showthe \SOUL@token \ifx ...
l.24   \so{unrnn}
                  \par % (u)[n](r)[n][n]
?
> u.
\SOUL@everytoken ...w \state \showthe \SOUL@token
                                                  \ifx \state \empty \renewc...
l.24   \so{unrnn}
                  \par % (u)[n](r)[n][n]
?
> 3.
\SOUL@everytoken ->\showthe \currentgrouplevel
                                               \show \state \showthe \SOUL@t...
l.24   \so{unrnn}
                  \par % (u)[n](r)[n][n]
?
> \state=macro:
->Nonempty.
\SOUL@everytoken ...urrentgrouplevel \show \state
                                                  \showthe \SOUL@token \ifx ...
l.24   \so{unrnn}
                  \par % (u)[n](r)[n][n]
?
> n.
\SOUL@everytoken ...w \state \showthe \SOUL@token
                                                  \ifx \state \empty \renewc...
l.24   \so{unrnn}
                  \par % (u)[n](r)[n][n]
?
> 2.
\SOUL@everytoken ->\showthe \currentgrouplevel
                                               \show \state \showthe \SOUL@t...
l.24   \so{unrnn}
                  \par % (u)[n](r)[n][n]
?
> \state=macro:
->.
\SOUL@everytoken ...urrentgrouplevel \show \state
                                                  \showthe \SOUL@token \ifx ...
l.24   \so{unrnn}
                  \par % (u)[n](r)[n][n]
?
> r.
\SOUL@everytoken ...w \state \showthe \SOUL@token
                                                  \ifx \state \empty \renewc...
l.24   \so{unrnn}
                  \par % (u)[n](r)[n][n]
?
> 2.
\SOUL@everytoken ->\showthe \currentgrouplevel
                                               \show \state \showthe \SOUL@t...
l.24   \so{unrnn}
                  \par % (u)[n](r)[n][n]
?

If I try with \so{nnrnn} the grouping level starts from 2.
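
A minimal sketch, independent of soul, of why a drop in grouping level resets \state: \renewcommand assigns locally, so the new definition disappears when the enclosing group closes, whereas \gdef survives. The \begingroup/\endgroup pairs here only stand in for whatever groups soul opens internally; that correspondence is an assumption made for illustration.

```latex
\documentclass{article}

% Same starting point as in the question: an empty macro
\newcommand*{\state}{}

\begin{document}

\begingroup
  \renewcommand*{\state}{Nonempty}% local: reverted at \endgroup
  Inside: [\state]
\endgroup
\par
Outside: [\state]% the local redefinition has been undone

\begingroup
  \gdef\state{Nonempty}% global: survives \endgroup
\endgroup
\par
After the global definition: [\state]

\end{document}
```

The first line prints the redefined value, the second prints empty brackets, and the third prints the value again, because only the global assignment outlives its group.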

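I'd avoid overloading \so. Instead, define a wrapper command that sets the state globally with \gdef, so that it survives any groups soul opens and closes internally: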

\documentclass{article}
\usepackage{soul}

\makeatletter
\newcommand{\myso}[1]{%
  \begingroup
  \gdef\my@state{}%
  \def\SOUL@soeverytoken{%
    \ifx\my@state\empty% Should only happen once...
      \gdef\my@state{x}%
      (\the\SOUL@token)%
    \else
      [\the\SOUL@token]%
    \fi
  }%
  \so{#1}
  \endgroup
}
\makeatother

\begin{document}

\myso{nnrnn} \par % (n)[n][r][n][n]
\myso{unrnn} \par % (u)[n][r][n][n]
\myso{unnnn} \par % (u)[n][n][n][n]
\myso{unrn} \par % (u)[n][r][n]

\end{document}


A different implementation with expl3 and \text_map_inline:nn.

\documentclass{article}

\ExplSyntaxOn
\NewDocumentCommand{\myso}{m}
 {
  \airwreck_process:n { #1 }
 }

\bool_new:N \l_airwreck_first_bool

\cs_new_protected:Nn \airwreck_process:n
 {
  \bool_set_true:N \l_airwreck_first_bool
  \text_map_inline:nn { #1 }
   {
    \bool_if:NTF \l_airwreck_first_bool
     {% first item
      (##1)
      \bool_set_false:N \l_airwreck_first_bool
     }
     {% other items
      [##1]
     }
   }
 }

\ExplSyntaxOff

\begin{document}

\myso{nnrnn} \par % (n)[n][r][n][n]
\myso{unrnn} \par % (u)[n][r][n][n]
\myso{unnnn} \par % (u)[n][n][n][n]
\myso{unrn} \par % (u)[n][r][n]

\end{document}


egreg

As background: soul analyses syllables to allow for hyphenation, so the result of parsing can depend on the language. In German, unt is a syllable, and so the parser restarts:

\documentclass{article}
\usepackage[ngerman,english]{babel}
\usepackage{soul}

% Initially empty, then should be nonempty afterward
\newcommand*{\state}{}

\makeatletter
\def\SOUL@soeverytoken{%
  \ifx\state\empty% Should only happen once...
    \renewcommand*{\state}{Nonempty}%
    (\the\SOUL@token)%
  \else%
    [\the\SOUL@token]%
  \fi%
}
\makeatother

\begin{document}
\so{untnn} \par %

\selectlanguage{ngerman}

\so{untnn}
\end{document}
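
One way to check where a restart might occur is to ask TeX where it would hyphenate a string in the current language. This is only a sketch: it assumes that soul's syllable boundaries follow TeX's hyphenation patterns, which the explanation above suggests but does not state outright. \showhyphens writes the hyphenation points it finds to the log and terminal; nothing is typeset.

```latex
\documentclass{article}
\usepackage[ngerman,english]{babel}

\begin{document}

% English patterns are active here (last babel option is the main language).
% The discovered break points appear in the log, not in the PDF.
\showhyphens{nnrnn unrnn unnnn unrn untnn}

\selectlanguage{ngerman}

% Same test string as above, now with German patterns
\showhyphens{untnn}

\end{document}
```

Comparing the log entries for the two languages should show whether a break point appears in the German case that is absent in the English one.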


Ulrike Fischer
  • Ah, this makes sense as an explanation for the different behavior on different strings. Thanks for the insight! – air-wreck Jun 21 '23 at 06:28