3

Is there a way to query and manipulate a LaTeX3 token list as a list of tokens rather than as a list of items? For instance, given the token list

{a}b

is it possible to find out how many tokens it contains (4), and to access the first token on the list ({)?
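To make the item/token distinction concrete, here is a minimal sketch of what the standard item-based functions report for this input. It assumes a LaTeX installation where `expl3` can be loaded; `\tl_count:n` and `\tl_head:n` are the regular `l3tl` functions, which work on items rather than tokens.

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% Item view: {a} is one item and b is another.
\tl_count:n { {a}b } % typesets 2, not 4
\par
% The first item is returned with its outer braces removed.
\tl_head:n { {a}b } % typesets a, not {
\ExplSyntaxOff
\end{document}
```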

Evan Aad
  • 11,066
  • I think it's worth noting here (though it's not an answer) that the tools in expl3 get written mainly as the team or others suggest use cases. For token lists, we have mappings which can extract out 'items' (to 'do stuff' with), and some expandable code to do 'manipulations' (see \tl_upper_case:n, etc.). However, parsing in a token-by-token way is unlikely to be generic as each use case will likely have different 'rules', and so it's not been the case that we've had requests that fit within our scope (to date). – Joseph Wright Sep 27 '17 at 08:04

5 Answers

9

You can use Bruno's gtl package for that. The package performs some extremely non-trivial tasks and has a lot of subtleties, so you have to read the documentation if you want to use it.

At the same time I don't see how it could ever be useful to “deep-count” tokens. Probably this is an XY problem, i.e. your design is flawed and an easier solution exists but you didn't tell us about the actual problem.

\documentclass{article}
\usepackage{gtl}

\begin{document}

\ExplSyntaxOn

\gtl_new:N \l_evan_test_gtl

\gtl_set:Nn \l_evan_test_gtl { {a}b }

% get count
\gtl_count_tokens:N \l_evan_test_gtl

% access first token
\gtl_head_do:NN \l_evan_test_gtl \token_to_str:N

\ExplSyntaxOff

\end{document}
Henri Menke
  • 109,596
4

If you want to store the “token length” of a token list in an integer variable, you can do

\regex_count:nnN { . } { {a} b } \l_tmpa_int
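As a rough sketch of how that line could be tried out, the following wraps it in a complete document and prints the stored value. It assumes an `expl3` version that already includes the regex module; `\l_tmpa_int` is the usual scratch variable.

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% The regex . matches one token at a time, braces included.
\regex_count:nnN { . } { {a} b } \l_tmpa_int
% Typeset the stored count (the four tokens are {, a, }, b).
\int_use:N \l_tmpa_int
\ExplSyntaxOff
\end{document}
```

Note that spaces in the input are ignored here only because of `\ExplSyntaxOn`.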

Accessing the n-th token might turn out to be very difficult, because TeX doesn't allow unbalanced text in macro definitions. For tokens other than braces, the following works:

\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
\cs_new_protected:Nn \aad_tokens_count:Nn
 { % #1 = int variable, #2 = token list
  \regex_count:nnN { . } { #2 } #1
 }

\int_new:N \l__aad_tokens_count_int
\tl_new:N \l__aad_tokens_temp_tl

\cs_new_protected:Nn \aad_tokens_get:nnn
 { % #1 = control sequence name, #2 = integer, #3 = token list
  \regex_count:nnN { . } { #3 } \l__aad_tokens_count_int
  \int_compare:nT { 1 <= #2 <= \l__aad_tokens_count_int }
   {
    \tl_set:Nn \l__aad_tokens_temp_tl { #3 }
    \regex_replace_once:xnN
     {
      \exp_not:N \A
      \prg_replicate:nn { #2 - 1 } { . }
      (.)
      .*
     }
     { \c{cs_set_eq:NN}\c{#1}\1 }
     \l__aad_tokens_temp_tl
     \tl_use:N \l__aad_tokens_temp_tl
    }
 }
\cs_generate_variant:Nn \regex_replace_once:nnN { x }

\aad_tokens_get:nnn {foo} { 2 } { {a}b }

\cs_show:N \foo

This will display

> \foo=the letter a.
egreg
  • 1,121,712
1

If you just need to count the tokens, tokcycle can do that easily, including spaces.

\documentclass{article}
\usepackage{tokcycle}
\xtokcycleenvironment\counttoks
{\addcytoks{+1}}
{\addcytoks{+2}\processtoks{##1}}
{\addcytoks{+1}}
{\addcytoks{+1}}
{\stripgroupingtrue}
{\the\numexpr}
\begin{document}
\counttoks abc\endcounttoks

\counttoks a{bc}\endcounttoks

\counttoks a \relax{b{c{d e f}}}\endcounttoks
\end{document}


To find out whether the first token has category code 1 (begin-group), place \futurelet\firsttok before \counttoks and, once the token cycle has finished, test \firsttok for equivalence with \bgroup:

\documentclass{article}
\usepackage{tokcycle}
\xtokcycleenvironment\counttoks
{\addcytoks{+1}}
{\addcytoks{+2}\processtoks{##1}}
{\addcytoks{+1}}
{\addcytoks{+1}}
{\stripgroupingtrue}
{\the\numexpr}
\newcommand\counttoksplus[1]{%
  \futurelet\firsttok\counttoks #1\endcounttoks\ 
  First token \ifx\firsttok\bgroup\else NOT \fi cat-1}
\begin{document}
\counttoksplus{abc}

\counttoksplus{{bc}}
\end{document}


1

One disadvantage of the gtl package is that it does not consider the character codes of { and }.

(It also cannot convert back to a normal tl, although that is actually easy to do: just take the middle part. Read the documentation to understand its internal structure.)

expl3 actually already has functions for this, namely the tl_analysis family of functions:

\def \f #1 {
    \def \a {0}
    \def \b {}
    \tl_analysis_map_inline:nn {#1} {
        % only run this once for the first token.
        \tl_if_empty:NT \b {
            \int_compare:nNnTF {"##3} = {1} {
                \def \b {first~is~cat1}
            } {
                \def \b {first~is~not~cat1}
            }
        }

        \tl_set:Nx \a {\int_eval:n {\a+1}}
    }

}

Example usage:

\f{{a}b}
number~of~tokens:~\a    \par
\b   \par
\f{x{a}b}
number~of~tokens:~\a    \par
\b   \par

Needless to say, real code should not redefine short macro names such as \f, \a and \b, some of which LaTeX already defines.
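A possible variant that avoids clobbering short macro names is sketched below. The names `\mytok_analyse:n`, `\l__mytok_count_int` and `\l__mytok_first_is_bgroup_bool` are hypothetical and chosen only for illustration; the sketch assumes an `expl3` version that provides `\tl_analysis_map_inline:nn`.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical names, used for illustration only.
\int_new:N  \l__mytok_count_int
\bool_new:N \l__mytok_first_is_bgroup_bool
\cs_new_protected:Npn \mytok_analyse:n #1
  {
    \int_zero:N \l__mytok_count_int
    \bool_set_false:N \l__mytok_first_is_bgroup_bool
    \tl_analysis_map_inline:nn {#1}
      {
        \int_incr:N \l__mytok_count_int
        % Only inspect the category code of the first token.
        % ##3 is the category code as a hexadecimal digit.
        \int_compare:nNnT { \l__mytok_count_int } = { 1 }
          {
            \int_compare:nNnT { "##3 } = { 1 }
              { \bool_set_true:N \l__mytok_first_is_bgroup_bool }
          }
      }
  }
\ExplSyntaxOff
\begin{document}
\ExplSyntaxOn
\mytok_analyse:n { {a}b }
\int_use:N \l__mytok_count_int \c_space_tl tokens;~first~is~
\bool_if:NF \l__mytok_first_is_bgroup_bool { not~ } cat-1
\ExplSyntaxOff
\end{document}
```

The count and the flag are stored in variables rather than typeset directly, so the caller decides what to do with them.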

user202729
  • 7,143
0

If a token list doesn't contain control sequences, its length can be obtained by viewing the list as a string and retrieving the string's length with one of the \str_count:... functions. For instance, the following document typesets 4 when compiled with pdfLaTeX.

\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
    \str_count:n {{a}b}
\ExplSyntaxOff
\end{document}
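The restriction to lists without control sequences matters, as this small sketch suggests. The macro name `\foo` is only a placeholder and does not need to be defined, since the argument is turned into a string without being expanded.

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% Two tokens (\foo and b), but \str_count:n counts the characters
% of the stringified form, so the result is larger than 2.
\str_count:n { \foo b }
\ExplSyntaxOff
\end{document}
```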
Evan Aad
  • 11,066