How to handle a LaTeX3 token list as a list of tokens rather than as a list of items?

Question

Is there a way to query and manipulate a LaTeX3 token list as a list of tokens rather than as a list of items? For instance, given the token list

{a}b

is it possible to find out how many tokens it contains (4), and to access the first token on the list ({)?
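
The sketch below illustrates the distinction the question draws. It assumes an expl3 recent enough to include l3regex; loading the expl3 package explicitly only matters for formats older than LaTeX 2020-02. \tl_count:n counts items, so the brace group {a} counts once. Counting matches of the regex `.` (the device used in the answers) counts every token, braces included.

```latex
\documentclass{article}
\usepackage{expl3} % only needed with formats older than LaTeX 2020-02
\begin{document}
\ExplSyntaxOn
% Items: the brace group {a} is one item and b is another.
Items:~\tl_count:n { {a}b } \par
% Tokens: the regex . matches each token, including the explicit braces.
\regex_count:nnN { . } { {a}b } \l_tmpa_int
Tokens:~\int_use:N \l_tmpa_int
\ExplSyntaxOff
\end{document}
```

It should typeset 2 for the items and 4 for the tokens.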

Comments are not for extended discussion; this conversation has been moved to chat. — Joseph Wright, Sep 27 '17 at 07:57
I think it's worth noting here (though it's not an answer) that the tools in expl3 get written mainly as the team or others suggest use cases. For token lists, we have mappings which can extract out 'items' (to 'do stuff' with), and some expandable code to do 'manipulations' (see \tl_upper_case:n, etc.). However, parsing in a token-by-token way is unlikely to be generic as each use case will likely have different 'rules', and so it's not been the case that we've had requests that fit within our scope (to date). — Joseph Wright, Sep 27 '17 at 08:04

score 9 · Accepted Answer · answered Sep 27 '17 at 07:53

You can use Bruno's gtl package for that. The package performs some extremely non-trivial tasks and has a lot of subtleties, so you have to read the documentation if you want to use it.

At the same time I don't see how it could ever be useful to “deep-count” tokens. Probably this is an XY problem, i.e. your design is flawed and an easier solution exists but you didn't tell us about the actual problem.

\documentclass{article}
\usepackage{gtl}

\begin{document}

\ExplSyntaxOn

\gtl_new:N \l_evan_test_gtl

\gtl_set:Nn \l_evan_test_gtl { {a}b }

% get count
\gtl_count_tokens:N \l_evan_test_gtl

% access first token
\gtl_head_do:NN \l_evan_test_gtl \token_to_str:N

\ExplSyntaxOff

\end{document}

Henri Menke

Also \regex_count:nnN { . } { {a}b } \l_tmpa_int will set the variable to 4. – egreg Sep 27 '17 at 07:58
@egreg You should post an additional answer. – Henri Menke Sep 27 '17 at 07:59
Probably add \usepackage[T1]{fontenc} so the { comes out 'right' ;) – Joseph Wright Sep 27 '17 at 08:05
@JosephWright ... or typeset with LuaTeX (which is what I do). – Henri Menke Sep 27 '17 at 08:33
@HenriMenke Works with a recent format, but not with an older one :) I tend to work on the basis that unless otherwise specified all examples are for pdfTeX (particularly with LuaTeX we have the variations in behaviour from TeX90 and the 'stability' issue). – Joseph Wright Sep 27 '17 at 09:15

score 4 · Answer 2 · answered Sep 27 '17 at 08:17

If you want to store the “token length” of a token list in an integer variable, you can do

\regex_count:nnN { . } { {a} b } \l_tmpa_int

Accessing the n-th token may turn out to be very difficult when that token is a brace, because TeX doesn't allow unbalanced text in macro definitions. For tokens other than braces:

\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
\cs_new_protected:Nn \aad_tokens_count:Nn
 { % #1 = int variable, #2 = token list
  \regex_count:nnN { . } { #2 } #1
 }

\int_new:N \l__aad_tokens_count_int
\tl_new:N \l__aad_tokens_temp_tl

\cs_new_protected:Nn \aad_tokens_get:nnn
 { % #1 = control sequence name, #2 = integer, #3 = token list
  \regex_count:nnN { . } { #3 } \l__aad_tokens_count_int
  \int_compare:nT { 1 <= #2 <= \l__aad_tokens_count_int }
   {
    \tl_set:Nn \l__aad_tokens_temp_tl { #3 }
    \regex_replace_once:xnN
     {
      \exp_not:N \A
      \prg_replicate:nn { #2 - 1 } { . }
      (.)
      .*
     }
     { \c{cs_set_eq:NN}\c{#1}\1 }
     \l__aad_tokens_temp_tl
     \tl_use:N \l__aad_tokens_temp_tl
    }
 }
\cs_generate_variant:Nn \regex_replace_once:nnN { x }

\aad_tokens_get:nnn {foo} { 2 } { {a}b }

\cs_show:N \foo
\ExplSyntaxOff

\begin{document}
\end{document}

This will display

> \foo=the letter a.

Where can I find \regex_count:nnN? – Evan Aad Sep 27 '17 at 08:22
@EvanAad In texdoc interface3 – egreg Sep 27 '17 at 08:26

Steven B. Segletes · Answer 3 · Apr 17 '21 at 03:16

If you just need to count the tokens, tokcycle can do that easily, including spaces.

\documentclass{article}
\usepackage{tokcycle}
\xtokcycleenvironment\counttoks
{\addcytoks{+1}}
{\addcytoks{+2}\processtoks{##1}}
{\addcytoks{+1}}
{\addcytoks{+1}}
{\stripgroupingtrue}
{\the\numexpr}
\begin{document}
\counttoks abc\endcounttoks
\counttoks a{bc}\endcounttoks
\counttoks a \relax{b{c{d e f}}}\endcounttoks
\end{document}
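
This shows how the four directives combine. Their order is character, group content, macro, then space. The sum below reads each +1 or +2 exactly as written in the code above and assumes the group directive contributes 2 for its pair of braces before processing the contents. Under that reading, the second input a{bc} builds:

```latex
\[
  \underbrace{1}_{a}
  + \underbrace{2}_{\{\;\}}
  + \underbrace{1}_{b}
  + \underbrace{1}_{c}
  = 5
\]
```

This agrees with the five tokens a, {, b, c and }.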

If one needs to know whether the first token has catcode 1, apply \futurelet\firsttok before \counttoks and, after the token cycle, test \firsttok for equivalence with \bgroup:

\documentclass{article}
\usepackage{tokcycle}
\xtokcycleenvironment\counttoks
{\addcytoks{+1}}
{\addcytoks{+2}\processtoks{##1}}
{\addcytoks{+1}}
{\addcytoks{+1}}
{\stripgroupingtrue}
{\the\numexpr}
\newcommand\counttoksplus[1]{%
  \futurelet\firsttok\counttoks #1\endcounttoks\ 
  First token \ifx\firsttok\bgroup\else NOT \fi cat-1}
\begin{document}
\counttoksplus{abc}
\counttoksplus{{bc}}
\end{document}

score 1 · Answer 4 · answered Mar 26 '22 at 03:00

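One disadvantage of the gtl package is that it does not take into account the character codes of { and }.

(It also cannot convert a gtl back to a normal tl, although doing that yourself is actually easy: just take the middle part. Read the documentation to understand its internal structure.)

The expl3 kernel actually already has functions for this: the analysis family. For example, \tl_analysis_map_inline:nn passes its inline function three arguments for each token: the token itself (#1), its character code in decimal (#2) and its category code as a single capital hexadecimal digit (#3):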

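% assumes \ExplSyntaxOn is in force (spaces are ignored, ~ is a space)
\def \f #1 {
    \def \a {0}
    \def \b {}
    \tl_analysis_map_inline:nn {#1} {
        % only run this once, for the first token;
        % ##3 is the catcode as a hexadecimal digit
        \tl_if_empty:NT \b {
            \int_compare:nNnTF {"##3} = {1} {
                \def \b {first~is~cat1}
            } {
                \def \b {first~is~not~cat1}
            }
        }
        % increment the token count
        \tl_set:Nx \a {\int_eval:n {\a+1}}
    }
}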

Example usage:

\f{{a}b}
number~of~tokens:~\a    \par
\b   \par
\f{x{a}b}
number~of~tokens:~\a    \par
\b   \par

(Needless to say, in real code don't redefine important LaTeX macros such as \a and \b. The short names \f, \a and \b are used here only for brevity.)
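
For real code, the same idea can be packaged with expl3-style names. This is a minimal sketch with these assumptions:

- The l3kernel is recent enough to provide \tl_analysis_map_inline:nn.
- \NewDocumentCommand comes from xparse, or from a LaTeX 2020-10 or later format.
- The module prefix mytok and the command \ShowTokenInfo are hypothetical names chosen for this illustration.

Global variables are used so the result does not depend on whether the mapping runs inside a group.

```latex
\documentclass{article}
\usepackage{xparse} % provides \NewDocumentCommand on formats before 2020-10
\ExplSyntaxOn
% Hypothetical variables for this sketch (module prefix "mytok").
\int_new:N \g_mytok_count_int
\tl_new:N  \g_mytok_first_catcode_tl
% Count the tokens of #1 and record the catcode (hex digit) of the first one.
\cs_new_protected:Npn \mytok_analyse:n #1
  {
    \int_gzero:N \g_mytok_count_int
    \tl_gclear:N \g_mytok_first_catcode_tl
    \tl_analysis_map_inline:nn {#1}
      {
        % ##3 is the category code as a capital hexadecimal digit.
        \int_compare:nNnT { \g_mytok_count_int } = { 0 }
          { \tl_gset:Nn \g_mytok_first_catcode_tl {##3} }
        \int_gincr:N \g_mytok_count_int
      }
  }
% Hypothetical user command: typeset the count and the first token's catcode.
\NewDocumentCommand \ShowTokenInfo { m }
  {
    \mytok_analyse:n {#1}
    tokens:~\int_use:N \g_mytok_count_int ,~
    first~token~catcode:~\g_mytok_first_catcode_tl
  }
\ExplSyntaxOff
\begin{document}
\ShowTokenInfo{{a}b}\par % should give 4 tokens, first catcode 1
\ShowTokenInfo{x{a}b}    % should give 5 tokens, first catcode B (letter)
\end{document}
```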

score 0 · Answer 5 · Evan Aad

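If a token list doesn't contain control sequences (each of which becomes several characters when converted to a string) or parameter characters # (which are doubled by that conversion), you can get its length by viewing the list as a string and retrieving the string's length with \str_count:.... For instance, the following document typesets 4 when compiled with pdftex.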

\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
    \str_count:n {{a}b}
\ExplSyntaxOff
\end{document}

edited Sep 28 '17 at 07:18

answered Sep 28 '17 at 07:12

Evan Aad