Counting the number of occurrences of specific characters in a string in an expandable way

Question

This question shows how to count the number of occurrences of a specific character in a string. I would like to do it in an expandable way, and for a list of specific characters rather than just one.

\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn
\NewExpandableDocumentCommand{\countsep}{r[] m}{
    % CODE HERE
}
\ExplSyntaxOff
\begin{document}
\countsep[-+]{This-is-a+test} % Count the number of - and + in the string (3)
\countsep[-+]{T+h+i+s-is-a-test} % Count the number of - and + in the string (6)
\end{document}

Is "string" meant to really be string, or do you mean token lists (for instance, is \countsep[+]{a\+b} 0 or 1)? What about things nested inside braces (is \countsep[+]{a+b{+c}} 1 or 2)? — Skillmon, Feb 05 '24 at 08:14

Steven B. Segletes · Accepted Answer · 2024-02-05T03:36:10.477


\documentclass{article}

\def\countsep#1#2{\the\numexpr0\countsepy#1\Endlist#2\Endcount}
\def\countsepy#1#2\Endlist#3\Endcount{\countsepz#1#3\Endcount
  \ifx\relax#2\relax\relax\else\countsepy#2\Endlist#3\Endcount\fi}
\def\countsepz#1#2#3\Endcount{\ifx#1#2+1\fi
  \ifx\relax#3\relax\else\countsepz#1#3\Endcount\fi}

\begin{document}
\countsep{-}{This-is-a+test} % Count the number of - in the string (2)
\countsep{+}{This-is-a+test} % Count the number of + in the string (1)
\countsep{-+}{This-is-a+test} % Count the number of - and + in the string (3)

\edef\z{\countsep{-+}{T+h+i+s-is-a-test}}
\z
\end{document}

edited Feb 05 '24 at 03:36

answered Feb 05 '24 at 03:30

Steven B. Segletes


How would one have to modify \countsep so that \def\yy{T} \def\zz{TTT} \countsep{\yy}{\zz} and \countsep{ß}{ßßß} would work? (The latter case concerns utf8-encoded characters.) – Mico Feb 05 '24 at 03:57

@Mico The syntax \countsep{äÖÜß}{ßT+h+i+s-is-a-testäÖÜß} will already work as is, if the code is compiled in lualatex or xelatex. As to the case of \countsep{\yy}{\zz}, I could probably tweak a solution to meet your interpretation; however, I would prefer to interpret the syntax without any expansion of the arguments, such that \countsep{\today}{xy\today z\today pdq} would directly yield the result 2, which my code will do. – Steven B. Segletes Feb 05 '24 at 04:02

@Mico For example, \def\yy{T} \def\zz{TTT} \expandafter\expandafter\expandafter\countsep\expandafter\expandafter\expandafter{\expandafter\yy\expandafter}\expandafter{\zz} would accomplish a single expansion of each argument, which could be codified into a macro. Likewise, \expanded{\noexpand\countsep{\yy}{\zz}} for full expansion. But as I said, it makes more sense to me to interpret the arguments unexpanded. – Steven B. Segletes Feb 05 '24 at 04:20

Thanks for these additional explanations. – Mico Feb 05 '24 at 04:27
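The two expansion strategies described in the comments above could be wrapped in helper macros. The following is only a minimal sketch, assuming the \countsep definition from the accepted answer is in scope. The names \countsepo, \countsepoaux and \countsepx are hypothetical and not part of the answer, and \expanded needs a reasonably recent engine (pdfTeX and XeTeX from TeX Live 2019 onward, or LuaTeX).

```latex
% Assumes \countsep{<chars>}{<string>} from the accepted answer is already defined.

% One expansion step for each argument (hypothetical name \countsepo).
% Step 1: expand #2 once and pass the arguments on in swapped order.
\def\countsepo#1#2{\expandafter\countsepoaux\expandafter{#2}{#1}}
% Step 2: expand the former #1 once and swap the arguments back.
\def\countsepoaux#1#2{\expandafter\countsep\expandafter{#2}{#1}}

% Full expansion of both arguments (hypothetical name \countsepx).
\def\countsepx#1#2{\expanded{\noexpand\countsep{#1}{#2}}}

% Usage, with the macros from the comment:
\def\yy{T} \def\zz{TTT}
% \countsepo{\yy}{\zz} and \countsepx{\yy}{\zz} both reduce to \countsep{T}{TTT}
```

Both wrappers consist only of expandable steps, so they should remain usable inside \edef just like \countsep itself. Keeping them separate from \countsep preserves the unexpanded interpretation that the answer prefers as the default.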

Mico · Answer 2 · 2024-02-05T03:47:43.070

Here's a LuaLaTeX-based solution for \countsep. The solution is a simple refinement of a solution I gave to the earlier query mentioned in the OP's posting. Note that \countsep is expandable because \directlua and \luastring are expandable.

Observe further that both the search string and the target string can include general UTF8-encoded characters.

% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{luacode} % for '\luastring' macro
\newcommand\countsep[2]{\directlua{%
   _ , count = unicode.utf8.gsub ( "#2" , "["..\luastring{#1}.."]" , "" )
   tex.sprint ( count ) }}
\def\yy{äÖÜß}
\def\zz{ßT+h+i+s-is-a-testäÖÜß}
\edef\z{\countsep{\yy}{\zz}}
\begin{document}
\countsep{""}{This-is-a+test} % result: 0
\countsep{-+}{This-is-a+test} % result: 3
\countsep{-+}{T+h+i+s-is-a-test} % result: 6
\countsep{\yy}{\zz\zz} % result: 10
\z % result: 5
\end{document}
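As a small illustration of what expandability allows, the result can be used wherever TeX expects a number. This is a sketch that assumes the \countsep definition from the listing above and compilation with lualatex; the counter name seps is made up for the example.

```latex
% Assumes \countsep from the listing above is defined (compile with lualatex).

% Use the count directly in a numeric test.
\ifnum\countsep{-+}{This-is-a+test}>2
  more than two separators%
\else
  at most two separators%
\fi

% Store the count in a LaTeX counter (hypothetical counter name).
\newcounter{seps}
\setcounter{seps}{\countsep{-+}{T+h+i+s-is-a-test}}
\theseps
```

Neither use would work with a non-expandable macro, because \ifnum and \setcounter need the digits to appear through expansion alone.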

Skillmon · Answer 3 · 2024-02-06T09:23:12.930

The following implements this in L3 using etl. It uses the required bracket-delimited argument from your question, though I strongly advise against it. Using non-standard argument types is discouraged for a reason: [] usually delimits an optional argument, which this isn't, and I see no good reason for the parsing overhead here.

Things to mention:

This assumes you want to count occurrences in a token list (no stringification).
This assumes separators inside braces are to be counted (if not, change the nested + \vincent_count_tokens_in:nn to \use_none:nn).
This counts correctly if one of the specified separators is a space.
This doesn't work for non-ASCII separator symbols in non-UTF-8 engines (pdfTeX); otherwise it works in all engines.

\documentclass{article}
\usepackage{xparse, etl}
\ExplSyntaxOn
\etl_new_if_in:Nnn \__vincent_if_contains_space:n { ~ } { T }
\cs_new:Npn \vincent_count_tokens_in:nn #1#2
  {
    \int_eval:w 0
    \etl_act:nennn
      \__vincent_count_tokens_in:nN
      {
        \__vincent_if_contains_space:nT {#1} { + \c_one_int }
        \use_none:n
      }
      { + \vincent_count_tokens_in:nn }
      {#1}
      {#2}
    \scan_stop:
  }
\cs_generate_variant:Nn \etl_act:nnnnn { ne }
\cs_new:Npn \__vincent_count_tokens_in:nN #1#2
  { \etl_token_if_in:nNT {#1} #2 { + \c_one_int } }
\NewExpandableDocumentCommand{\countsep}{r[] m}
  { \vincent_count_tokens_in:nn {#1} {#2} }
\ExplSyntaxOff
\begin{document}
\countsep[-+]{This-is-a+test} % Count the number of - and + in the string (3)
\countsep[- +]{T+h+i+s is{-a-}test} % Count the number of -, space and + in the string (6)
\end{document}
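Following the advice above about non-standard argument types, a more conventional interface would take two mandatory arguments. This is only a sketch of a drop-in replacement for the \NewExpandableDocumentCommand lines of the listing, and it relies on \vincent_count_tokens_in:nn being defined as shown there.

```latex
% Sketch: replaces the \NewExpandableDocumentCommand definition in the listing above.
% \vincent_count_tokens_in:nn is the function defined in that listing.
\ExplSyntaxOn
\NewExpandableDocumentCommand{\countsep}{m m}
  { \vincent_count_tokens_in:nn {#1} {#2} }
\ExplSyntaxOff

% Usage in the document body:
% \countsep{-+}{This-is-a+test}
```

With this signature the separator list is passed in braces, which also matches the calling convention used by the other answers.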

Aside: This is not particularly fast, because etl has to check for spaces and groups at each list element, so it does much more in the background than the solution by @StevenB.Segletes. That extra work is only needed if you really want to be able to count spaces and/or recurse into groups.
