Could someone explain the branching in xparse's \SplitList?

Question

I was digging through xparse.sty to better understand how \SplitList works and am confused about some branching that is happening there.

\SplitList is essentially \let to \__xparse_split_list:nn, which examines the token(s) to be used for splitting.

\cs_new_protected:Npn \__xparse_split_list:nn #1#2
  {
    \bool_if:nTF
      {
          \tl_if_single_p:n {#1} &&
        ! ( \token_if_cs_p:N #1 )
      }
      { \__xparse_split_list_single:Nn #1 {#2} }
      { \__xparse_split_list_multi:nn {#1} {#2} }
  }

If a single (non-control-sequence) token is used for splitting, then \__xparse_split_list_single:Nn is called. This control sequence is defined inside a group in which @ has been made an active character.

\group_begin:
\char_set_catcode_active:N \@
\cs_new_protected:Npn \__xparse_split_list_single:Nn #1#2
  {
    \tl_set:Nn \l__xparse_split_list_tl {#2}
    \group_begin:
    \char_set_lccode:nn { `\@ } { `#1 }
    \tl_to_lowercase:n
      {
        \group_end:
        \tl_replace_all:Nnn \l__xparse_split_list_tl { @ } {#1}
      }
    \__xparse_split_list_multi:nV {#1} \l__xparse_split_list_tl
  }
\group_end:

This seems completely unnecessary to me. What exactly is this command sequence doing that couldn't just be handled by directly passing #1 and #2 of \__xparse_split_list:nn to \__xparse_split_list_multi:nn?

This last macro is defined as:

\cs_set_protected:Npn \__xparse_split_list_multi:nn #1#2
  {
    \seq_set_split:Nnn \l__xparse_split_list_seq {#1} {#2}
    \tl_clear:N \ProcessedArgument
    \seq_map_inline:Nn \l__xparse_split_list_seq
      { \tl_put_right:Nn \ProcessedArgument { {##1} } }
  }

Here's a MWE where I was testing this out (to see whether I could figure out what I was missing). I basically skip the step of testing whether a single token has been passed and go directly to splitting the argument.

\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
\tl_new:N \myProcessedArgument
\seq_new:N \l__my_split_list_seq
\cs_new_protected:Npn \__my_split_list:nn #1#2
  {
    \typeout{----------------------------------------}%%
    \typeout{==>delimiter ~ is ~ "\detokenize{#1}"}
    \seq_set_split:Nnn \l__my_split_list_seq {#1}{#2}
    %%\seq_show:N \l__my_split_list_seq
    \tl_clear:N \myProcessedArgument
    \seq_map_inline:Nn \l__my_split_list_seq
      {
        \typeout{==>\detokenize{##1}}
      }
  }

\cs_new_eq:NN \mySplitList \__my_split_list:nn 
\ExplSyntaxOff

\pagestyle{empty}
\begin{document}

Trial: \mySplitList{.:}{a.b.:{c}.sdf.:ewrewr}

Trial: \mySplitList{.}{a.b.:{c}.sdf.:ewrewr}

\end{document}

But this MWE seems to work fine regardless of whether I split the token list on a single token or on several.
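A likely reason the test above always passes is that the delimiter and the list are read at the same place, under the same catcodes, so the . tokens always match. \seq_set_split:Nnn matches the delimiter the way TeX matches delimited arguments, by character code and catcode, so a mismatch only shows up when the delimiter is tokenized earlier (for example when a command is defined in the preamble) and the list is read later under a different catcode. A minimal sketch of that situation, where \mySplitDemo and \l__my_demo_seq are hypothetical names in the style of the MWE above:

```latex
\documentclass{article}
\usepackage{xparse}% provides expl3

\ExplSyntaxOn
\seq_new:N \l__my_demo_seq
% The "." below is tokenized NOW, with its preamble catcode (12, other).
\cs_new_protected:Npn \mySplitDemo #1
  {
    \seq_set_split:Nnn \l__my_demo_seq { . } {#1}
    \typeout { items: ~ \seq_count:N \l__my_demo_seq }
  }
\ExplSyntaxOff

\begin{document}
\mySplitDemo{a.b.c}% these dots are also other: should log "items: 3"
\catcode`\.=\active
\mySplitDemo{a.b.c}% these dots are active: no match, should log "items: 1"
\end{document}
```

The second call never finds its delimiter, because an active . and an other . are different tokens to TeX even though they print the same.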

score 6 · Accepted Answer · answered Apr 26 '14 at 19:11


The code here is defensive and reflects the fact that for LaTeX2e there is no 'fixed' list of active characters. Thus it's possible that a character might be (say) 'other' at the point that the document command is created but 'active' when it's used. A classic example is babel, whose shorthands make characters active, but the basic idea shows up with:

\documentclass{article}
\usepackage{xparse}

\DeclareDocumentCommand{\foo}{>{\SplitList{.}}m}{%
  \fooaux#1{oops!}%
}
\def\fooaux#1#2{\detokenize{"#1"}, \detokenize{"#2"}}
\begin{document}

\foo{ab.cd}

\catcode`\.=\active

\foo{ab.cd}

\ExplSyntaxOn
\cs_set_eq:NN \SplitList \__xparse_split_list_multi:nn
\ExplSyntaxOff

\foo{ab.cd}

\end{document}

Of course, any category code changes can give issues here, but it's active characters that are by far the most likely issue in 'real life'.
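To make the 'lowercase trick' in the single-token branch concrete, here is a standalone sketch of what that branch does. It is not the xparse code itself: it uses the primitive \tex_lowercase:D because \tl_to_lowercase:n, used in the 2014 source, was later deprecated in expl3, and it only logs the item count instead of building \ProcessedArgument. The names \__my_split_single:Nn, \l__my_split_tl, \l__my_split_seq and \splitdemo are hypothetical.

```latex
\documentclass{article}
\usepackage{xparse}% provides expl3

\ExplSyntaxOn
\tl_new:N  \l__my_split_tl
\seq_new:N \l__my_split_seq
\group_begin:
\char_set_catcode_active:N \@ % the @ tokens stored below are active
\cs_new_protected:Npn \__my_split_single:Nn #1#2
  {
    \tl_set:Nn \l__my_split_tl {#2}
    \group_begin:
      % lowercase(@) := character code of #1, e.g. "."
      \char_set_lccode:nn { `\@ } { `#1 }
      % \lowercase changes character codes but keeps catcodes, so the
      % active @ becomes an active #1; \group_end: then restores the lccode.
      % (#1 is lowercased too, so avoid upper-case letter delimiters here.)
      \tex_lowercase:D
        {
          \group_end:
          \tl_replace_all:Nnn \l__my_split_tl { @ } {#1}
        }
    % every active #1 is now the #1 the delimiter was given as
    \seq_set_split:NnV \l__my_split_seq {#1} \l__my_split_tl
    \typeout { items: ~ \seq_count:N \l__my_split_seq }
  }
\group_end:
% "." is tokenized here with catcode 12, like \SplitList{.} in a preamble
\cs_new_protected:Npn \splitdemo #1 { \__my_split_single:Nn . {#1} }
\ExplSyntaxOff

\begin{document}
\splitdemo{a.b.c}% other dots: should log "items: 3"
\catcode`\.=\active
\splitdemo{a.b.c}% active dots are replaced first: should still log "items: 3"
\end{document}
```

Because the replacement step normalises the catcode before splitting, both calls should report three items, which is the behaviour the direct call to \__xparse_split_list_multi:nn loses in the MWE above.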

Note: While it's not finalised, the current thinking is that for a stand-alone LaTeX3 format we're likely to have a 'known' list of active chars, and to extremely strongly discourage altering this. So this approach may not be needed in such circumstances.


Joseph Wright


Argh! This is giving me a headache! :) I'm now quite a bit confused. What exactly is getting replaced then in \__xparse_split_list_single:Nn? Is it that any active character becomes the delimiter for splitting? That doesn't seem right, and it isn't borne out when I experiment on my own: setting \catcode`\C=\active and then trying \foo{ab.sdfCdsdf} still splits only at the ., not at the C. So, in the case of \catcode`\.=\active, how is this related to setting the catcode of @? – A.Ellett Apr 26 '14 at 19:28
@A.Ellett No, it's still . that's being replaced! This is the standard 'lowercase trick': we don't know what #1 is, so we make @ active and set its 'lowercase' to #1. As TeX doesn't change the catcode when it lowercases, we end up searching for an 'active' . and replacing it with a 'normal' . (We intend to provide a better interface for this in expl3 at some point.) – Joseph Wright Apr 26 '14 at 19:31
http://tex.stackexchange.com/questions/156759/the-lowercase-trick – Joseph Wright Apr 26 '14 at 19:32
The lowercase trick always feels like a shell game that I can never follow. Thanks for the link. I've got to think about this some more. Right now, I'm still confuzzled. – A.Ellett Apr 26 '14 at 19:38

