strlen in TeX Language

Question

While trying to implement some simple algorithms in the TeX language, I attempted a string length function (potentially a token counter). I know that spaces in the TeX engine have a different meaning from what we are used to, but even after changing the catcode of the space character, my algorithm was not able to detect them. What am I doing wrong? Which idiosyncratic detail, described in one paragraph of a not-so-technical and somewhat prolix book of about 500 pages, am I forgetting?

\def\step#1{\advance#1by1\relax}
\def\strlen{\begingroup\catcode`\ =11\strlenmain} %\catcode` =11 does not work either
\def\strlenmain#1{%
      \countdef\len=0
      \len=0
      \edef\arg{#1}
      \def\measurelen##1{%
         \ifx\endstr##1
         \else
            \step\len
            \expandafter\measurelen
         \fi
      }
      \expandafter\measurelen\arg\endstr
      \the\len
   \endgroup
}
O tamanho de "..." aberta é :
\strlen{Uma noite em Peron} %%%% outputs 15 %%%%
\def\frase{Uma noite em Peron}
O tamanho de "\frase" é :
\strlen\frase %%%% outputs 15 %%%%
\par
\def\frase{Uma\ noite\ em\ Peron}
O tamanho de "\frase" é :
\strlen\frase %%%% outputs 18 %%%%
\par
\end

(when you read the book you're already ahead of most other users :) "once a token is tokenized, its catcode will be frozen". \catcode `\ =11 will not retrospectively change the catcode of space tokens in the macro. — user202729, Nov 17 '22 at 05:33
If you use LaTeX there's \str_count:n in expl3 but it looks like you aren't. — user202729, Nov 17 '22 at 05:33
I assumed I would be forgetting part of a paragraph. Unless I have done it wrong, \strlen changes the space catcode before the text is grabbed as a macro argument. — Daniel Bandeira, Nov 17 '22 at 09:14
you are missing a space to terminate the catcode setting: \begingroup\catcode`\ =11 \strlenmain — Ulrike Fischer, Nov 17 '22 at 09:28
I would like to emphasize:
"Furthermore, spaces are not ignored after control sequences inside a token list; the ignore-space rule applies only in an input file, during the time that strings of characters are being tokenized"

Excerpt from the chapter text preceding exercise 7.3 of TeX's Magna Carta.

So I am badly missing something (unfortunately, once again). Could someone give me a hand? — Daniel Bandeira, Nov 17 '22 at 09:40
@Ulrike Fischer, thank you for the help, but I do not get what you mean. After it there is a macro that does not expand to something that could be confused with a number (in which case TeX would generate an error). Anyway, I will focus on it right now. Thank you again! — Daniel Bandeira, Nov 17 '22 at 09:44
Much more information is needed: are you using pdflatex or (Lua|Xe)LaTeX? Do you expect to have macros (say \emph or similar) in the text you want to process? — egreg, Nov 17 '22 at 09:53
@DanielBandeira sure, but it is expanded while TeX still looks for the number, and so one step too early. — Ulrike Fischer, Nov 17 '22 at 10:00
I'm using pdftex. The code is basically exactly as written.
I believed that TeX expands until it finds a number and, if it does not find one, recovers the macro ("kill my self").

So, in the case "\catcode`\ =11 " (note the trailing space), would the last space end up with category 10 or 11 (or would it even be eaten)? — Daniel Bandeira, Nov 17 '22 at 10:40
@DanielBandeira The space would be eaten. The catcode assignment is only performed when the number has been found, which includes eating the space. — egreg, Nov 17 '22 at 11:28
You could try \expandafter\strlen\frase. BTW, if you want to count characters, see https://tex.stackexchange.com/questions/233085/basics-of-parsing?r=SearchResults&s=1%7C39.3909 — John Kormylo, Nov 17 '22 at 15:08

wipet · Accepted Answer · 2022-11-17T17:39:13.057

4

If you do

\def\measurelen#1{...}

then #1 is either the first token after skipping spaces or a text enclosed in {...}. I suggest creating the expandable macro \prepspaces, which replaces the text

word word word

with the text

word{ }word{ }word

Then the text can be read by \measurelen, which now receives the spaces in its #1 too.

\def\step#1{\advance#1by1\relax}
\def\strlen{\begingroup\strlenmain}
\def\prepspaces#1 #2{#1\ifx\end#2\else{ }#2\expandafter\prepspaces\fi}
\def\strlenmain#1{%
      \countdef\len=0
      \len=0
      \edef\arg{\expandafter\prepspaces#1 \end}
      \def\measurelen##1{%
         \ifx\endstr##1
         \else
            \step\len
            \expandafter\measurelen
         \fi
      }
      \expandafter\measurelen\arg\endstr
      \the\len
   \endgroup
}
O tamanho de "..." aberta é :
\strlen{Uma noite em Peron} %%%% outputs 18 %%%%
\def\frase{Uma noite em Peron}
O tamanho de "\frase" é :
\strlen\frase %%%% outputs 18 %%%%
\def\frase{Uma\ noite\ em\ Peron}
O tamanho de "\frase" é :
\strlen\frase %%%% outputs 18 %%%%
\end
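
To see what \prepspaces hands over to \measurelen, one can expand it inside an \edef and inspect the result. This is only a sketch that reuses the definition above; \demo is a hypothetical scratch macro name.

```tex
\def\prepspaces#1 #2{#1\ifx\end#2\else{ }#2\expandafter\prepspaces\fi}
% \end is never executed here: inside \edef it is only compared by \ifx.
\edef\demo{\prepspaces Uma noite em Peron \end}
\message{\meaning\demo}% macro:->Uma{ }noite{ }em{ }Peron
\end
```

Each brace group { } is later picked up by \measurelen as one undelimited argument, so every space is counted once.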

edited Nov 17 '22 at 17:39

answered Nov 17 '22 at 13:55

Besides giving me a good answer, you taught a simple and useful concept: the right way to work with an end mark. No \empty, no \relax, no \AVERYUNSUALANDPOTENIALLYUNDEFINEDNAME... but \end... I am annoyed with myself and, at the same time, glad to have learned that. Thank you very much! Your solution is really, really good! – Daniel Bandeira Nov 17 '22 at 16:43
@DanielBandeira If you understand the purpose of that mark you'll see that it can really be anything as long as it's guaranteed not to appear in the argument... but when you're not the only user of your code (i.e. the user of the code may not understand the code) it makes sense to use an unusual name to avoid surprises. – user202729 Nov 17 '22 at 18:05
This is the point! – Daniel Bandeira Nov 17 '22 at 18:25
An unusual name is not sufficient if it is undefined, because macros typically test with \ifx, which returns true whenever both compared names are undefined. OpTeX uses \_fin for this purpose and defines it as \_protected\_long \_def \_fin \_fin {}, i.e., an unusable macro. We assume that \_fin will never appear in users' token lists because it is an internal macro (_ in its name). – wipet Nov 17 '22 at 18:28
When I referred to undefined, I also meant not used in an argument. This is a phobia I have and have never resolved well. I find it tedious to use @ in an attempt to avoid this. – Daniel Bandeira Nov 17 '22 at 19:59
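
The point about undefined end markers can be checked with a short test. This is a minimal sketch; \UndefinedMarkA, \UndefinedMarkB and \mymark are hypothetical names, and the first two are assumed to be undefined.

```tex
% \ifx compares meanings; two undefined control sequences share the same one.
\message{\ifx\UndefinedMarkA\UndefinedMarkB equal\else different\fi}% equal
% A marker defined to expand to itself has a meaning of its own:
\def\mymark{\mymark}
\message{\ifx\mymark\UndefinedMarkB equal\else different\fi}% different
\end
```

So an undefined marker would be mistaken for any other undefined control sequence that happens to occur in the argument, while a defined one would not.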

egreg · Answer 2 · 2022-11-17T22:33:46.940

I believe it's easier with expl3.

\input expl3-generic
\ExplSyntaxOn
\cs_new_protected:Npn \textlen #1
 {
  \bandeira_textlen:n { #1 }
  \int_use:N \l__bandeira_textlen_int
 }
\cs_new_protected:Npn \textlensave #1 #2
 {
  \bandeira_textlen:n { #2 }
  \tl_set:Nx #1 { \int_use:N \l__bandeira_textlen_int }
 }
\int_new:N \l__bandeira_textlen_int
\cs_generate_variant:Nn \text_map_inline:nn { e }
\cs_new_protected:Nn \bandeira_textlen:n
 {
  \bool_lazy_and:nnTF { \tl_if_single_p:n { #1 } } { \token_if_cs_p:N #1 }
   {
    \__bandeira_textlen:o { #1 }
   }
   {
    \__bandeira_textlen:n { #1 }
   }
 }
\cs_new_protected:Nn \__bandeira_textlen:n
 {
  \int_zero:N \l__bandeira_textlen_int
  \text_map_inline:en { \text_purify:n { #1 } } { \int_incr:N \l__bandeira_textlen_int }
 }
\cs_generate_variant:Nn \__bandeira_textlen:n { o }
% add other things to change here
\text_declare_purify_equivalent:Nn \' {} % ignore \' for counting
\ExplSyntaxOff
\textlen{Uma noite em Peron}
\def\frase{Uma noite em Peron}
O tamanho de ``\frase'' é:
\textlen\frase
\def\frase{Uma\ noite\ em\ Peron}
O tamanho de ``\frase'' é:
\textlen\frase
\def\frase{Uma {\it noite} em Per\'on}
O tamanho de ``\frase'' é:
\textlen\frase
\textlensave\foo{ÁÉónh EEE}
\foo
\bye

Adding support for UTF-8 and avoiding accents has already been treated here; there are plain TeX macro packages that do it, such as OpTeX.
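
For the simplest case, counting the characters of a token list without any purification, the \str_count:n function that user202729 mentions in the comments may already be enough. A minimal sketch, assuming expl3-generic loads under plain pdfTeX as in the code above:

```tex
\input expl3-generic
\ExplSyntaxOn
% While \ExplSyntaxOn is active, ~ stands for a normal space
% and literal spaces in the source are ignored.
\message{ \str_count:n { Uma~noite~em~Peron } }% should print 18
\ExplSyntaxOff
\bye
```

\str_count:n counts spaces as well; it does not expand its argument, so a macro such as \frase would have to be expanded first.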

Why bother splitting at spaces? A quick test suggests that \text_map_inline:nn correctly loops over normal chars and spaces. — Bruno Le Floch, Nov 17 '22 at 22:24

score 2 · Answer 3 · answered Nov 17 '22 at 09:28

Ah, the comment was wrong. While \strlen\frase will clearly not work (the spaces are already tokenized), the issue with the other part is that you need to modify the code to...

\def\strlen{\begingroup\catcode`\ =11 \strlenmain}

It's all obvious in retrospect (if you omit the space, TeX will look ahead to see if the number has terminated, which involves expanding the following macro and grabbing its argument), but it's a bit difficult to debug without some help.

(For help, e.g., print out #1 and see that it contains spaces with the space catcode instead of the expected letter catcode.)

Since you don't require expandability, one way to implement it with explicit space tokens (as in \frase) would be to use \let\macro=⟨explicit space⟩ to take and delete one token at a time.
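
The \let idea can be sketched as follows. This is an illustration under stated assumptions, not the answerer's code: \countnext, \checknext and \tok are hypothetical names, and the argument is assumed to contain no braces (a { grabbed by \let would leave its } unmatched).

```tex
\newcount\len
\def\endstr{\endstr}% end marker with a meaning of its own
\def\strlen#1{\edef\arg{#1}\len=0 \expandafter\countnext\arg\endstr}
% "\let\tok= " uses up the optional space after =, so the token that
% follows is assigned to \tok and removed, even if it is a space token.
\def\countnext{\afterassignment\checknext\let\tok= }
\def\checknext{%
  \ifx\tok\endstr
  \else
    \advance\len by 1
    \expandafter\countnext
  \fi}
\strlen{Uma noite em Peron}\message{\the\len}% 18
\def\frase{Uma noite em Peron}
\strlen\frase\message{\the\len}% 18
\end
```

Because the result ends up in the \len register, the routine is not expandable; \the\len has to be used after the call.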

With space given category 11, \strlen\frase %, as in the questioner's MWE, results in \strlenmain's argument being not the token \frase but the token \frase<space>, which in turn is undefined... Giving space category 12 is probably better, but \strlen's argument is to be expanded, so you may still have to cope with explicit space tokens of category 10(space) and character code 32 that come from expanding the argument... Explicit space tokens might also come from \strlen's argument being passed to \strlen by another macro. — Ulrich Diez, Nov 17 '22 at 13:12

Ulrich Diez · Answer 4 · 2022-11-17T14:11:59.650

While trying to implement some simple algorithms in the TeX language, I attempted a string length function (potentially a token counter).

The concept "string" is vague in TeX.
Right after reading and tokenizing the .tex-input-file— referring to Donald E. Knuth's analogy of TeX being an organism with eyes and a digestive tract, this is done by TeX's eyes and mouth—everything in TeX is about tokens (explicit character tokens, control word tokens, control symbol tokens).
With the expansion of macros—this is done in TeX's gullet—everything is about macro arguments. A macro argument can be empty or consist of a single token or consist of several tokens.

The mechanism implemented by you basically recursively calls the macro \measurelen for removing an undelimited macro argument and — in case the meaning of the first token of the argument does not equal the meaning of the token \endstr — incrementing a counter and calling itself again.
The mechanism implemented by you actually does not count tokens, but does count undelimited macro arguments. (However, a single macro argument, counted as one item, can consist of several character tokens...)
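
The difference between counting tokens and counting undelimited arguments shows up as soon as the text contains a brace group. A minimal sketch of the same recursion, with hypothetical names \countargs and \countargsloop:

```tex
\newcount\len
\def\endstr{\endstr}% end marker with a meaning of its own
\def\countargs#1{\len=0 \countargsloop#1\endstr}
\def\countargsloop#1{%
  \ifx\endstr#1%
  \else
    \advance\len by 1
    \expandafter\countargsloop
  \fi}
% Eight tokens (A { B C } D <space> E), but only four undelimited arguments:
\countargs{A{BC}D E}
\message{\the\len}% 4
\end
```

The group {BC} is counted as one item and the space is skipped while TeX looks for the next argument.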

Some questions arise about how your routine should handle things:

Assume the macro argument/the set of tokens which forms the "string" also contains curly braces that are balanced, i.e., some explicit character tokens of category 1(begin group) and 2(end group)—{₁/}₂. These tokens usually might not yield visible output. They might affect grouping/scoping while TeX is running. Shall such tokens be counted by the \strlen-routine anyway?
What result do you wish to obtain with \strlen{A {BCD{EFG}H } {}I J}? Shall it be 20 (braces counted)? Shall it be 14 (braces not counted)? The former might make sense when doing something like \strlen{#2}...\scantokens\expandafter{\string\verb*#1#2#1} where #2 denotes a set of unexpandable explicit character tokens not of category 6(parameter) and #1 denotes a verb-delimiter which does not occur within #2.
Assume the macro argument/the set of tokens which forms the "string" also contains explicit character tokens of category 6(parameter)/hashes #₆. Shall each hash-character-token be counted on its own or shall two consecutive hash-character-tokens be taken and counted for a single one? The latter might make sense when defining a scratch macro from the argument directly via \def so that at the time of expanding the scratch-macro two consecutive hashes are collapsed into a single one.
How to count tokens that don't yield single characters but yield, e.g., inclusion of a graphic, drawing of a rule, assigning to a macro/register, gobbling/removal of subsequent tokens?
How to count characters which do not denote the drawing of glyphs/of instances of graphemes like digits or letters but denote, e.g., that a linebreak shall (not) occur (something like ASCII's CR and LF; something like the "word joiner" of UTF-8) or that some horizontal space shall be inserted (space, nobreak-space, enspace, emspace)?
How to count token-sequences that yield a single accented letter?
Shall the mechanism work out both on 8-bit-engines and on utf-8-engines like XeTeX/LuaTeX? If so, how to handle multibyte-utf-8-characters in the 8-bit-engine? In 8-bit-engines with LaTeX utf-8-encoded files are processed using the inputenc-package with utf-8-option so that in 8-bit engines with LaTeX a multibyte-utf-8-character is represented by several tokens, each coming from tokenizing the .tex-input-file 8bit-character-wise instead of multibyte-character-wise.

A basic remark:

In TeX tokens can come into being in two ways:

By having TeX read and tokenize stuff from the .tex-input-file.
During expansion by having TeX replace tokens by tokens that form replacement text (e.g., of macro-tokens and their arguments, e.g., the result of expanding a \csname..\endcsname-expression, e.g., the result of carrying out a \the-directive).

Let's look at your code:

\def\strlen{\begingroup\catcode`\ =11\strlenmain}

Don't switch space to category 11(letter)! If after doing so the tokens that form the argument of \strlenmain are to come into being by having TeX read and tokenize from the .tex-input-file, spaces in the .tex-input-file can be taken for components of names of control sequence tokens, in the same way in which \makeatletter is used for "telling" TeX that henceforth @ can be a component of the name of a control sequence token that comes into being by having TeX read and tokenize from the .tex-input-file. E.g., with \strlen\frase %%% the argument of \strlenmain would not be a token \frase but would be a token \frase⟨space⟩.
Switching the category code of the space character does not help with material which does not come from having TeX read and tokenize from the .tex-input-file but which does come into being due to expansion or which is passed to \strlen by another macro.
E.g., with \strlen\frase the tokens that at the time of expanding \frase form the replacement text of \frase were read and tokenized from the .tex-input-file at the time when \frase was defined, i.e., at a time when \strlen's temporary change of the category code of the space character was not in effect and thus did not affect how things got tokenized.
E.g., if with \def\PassToSrlen#1{\strlen{#1}} you do \PassToSrlen{A B C}, the argument of \strlen will be A B C where spaces got tokenized as usual/as explicit space tokens of category 10(space) and character code 32 because the argument of \PassToSrlen, which is passed on to \strlen, is tokenized at a time when \strlen's temporary change of the category code of the space character is not in effect yet and thus does not affect how things get tokenized.
So you still need to consider the case of there being explicit space tokens of category 10(space) and character code 32.
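
That category codes are fixed at tokenization time can be checked directly. The sketch below (the macro name \other is hypothetical) defines the same characters once under the normal regime and once with space switched to category 12, then compares the two macros with \ifx:

```tex
\def\frase{a b}%   the space is tokenized with category 10
\begingroup
\catcode`\ =12 %   the space before % terminates the number
\gdef\other{a b}%  same characters, but this space has category 12
\endgroup
% \ifx compares the replacement texts token by token, catcodes included.
\message{\ifx\frase\other same\else different\fi}% different
\end
```

Changing the catcode later never reaches the space already stored inside \frase.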
As there is nothing between 11 and \strlenmain that might indicate that the digit-sequence forming the number is finished, TeX will expand \strlenmain for finding out whether there are more digits (and if so raising an error about there not being a valid catcode). Thus \strlenmain will be toplevel-expanded in the course of evaluating how the category-code-assignment shall be done, not when the category-code-assignment is already done. In this case it doesn't matter as the first token coming from expanding \strlenmain is \countdef which clearly is not a digit. But there are scenarios where having things expanded while the digit-token-sequence forming the ⟨number⟩-quantity is not terminated might bite you, so if a TeX-⟨number⟩-quantity is to be formed by a sequence of explicit digit-character-tokens it is good practice to terminate that sequence of digit-character-tokens with an explicit space token. The space token will be removed by TeX's routine for gathering sequences of explicit digit-character-tokens as TeX-⟨number⟩-quantities.
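
A scenario where the missing terminator does bite can be sketched with a macro that expands to a digit. \next is a hypothetical name and \count255 is used as a scratch register:

```tex
\def\next{0}
% No terminator: TeX expands \next while still scanning the number.
\count255=11\next
\message{\the\count255}% 110
% With a terminating space the number is complete before \next is seen;
% \next is then expanded afterwards and its 0 is simply typeset.
\count255=11 \next
\message{\the\count255}% 11
\end
```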

\def\strlenmain#1{%
      \countdef\len=0
      \len=0
      \edef\arg{#1}
      \def\measurelen##1 [... etc etc]
}

Probably you don't need to define \len and \measurelen each time when \strlenmain / \strlen is carried out.
I suggest defining these things outside the group/outside \strlenmain.
You use \edef for defining a scratch macro that shall deliver the expansion of the argument. If the argument contains single hash-character-tokens of category 6(parameter), these will be taken for s.th. that denotes an argument of the macro \arg while the macro \arg is being defined without the \edef-assignment having a parameter text. Thus you will get an error in this case. If the argument contains sequences of more than one such hash-character-token, at the time of expanding \arg (\expandafter\measurelen\arg\endstr) every first and every second such hash-character-token are collapsed into a single hash-character-token. This collapsing of hash-character-tokens affects the counting of tokens. With recent TeX engines you can use \expanded{...} instead of defining a scratch macro. Then the hashes won't go into the replacement text of a macro definition and thus problems coming from hashes going into the replacement text of a macro-definition are obsolete.

Actually your mechanism counts undelimited arguments, not tokens.
What result do you wish to obtain with \strlen{A {BCD{EFG}H } {}I J}?

TeX discards explicit space tokens (character code 32, category 10) while scanning for the first token that belongs to an undelimited argument. E.g., after \def\processtwo#1#2{(#1)/(#2)}, the calls \processtwo{A} {B} and \processtwo{A}{B} and \processtwo A {B} and \processtwo A{B} and \processtwo{A} B and \processtwo{A}B and \processtwo A B and \processtwo AB all yield the same, namely (A)/(B). (There being or not being an explicit space token between the first and the second undelimited argument of \processtwo doesn't matter.)

If your "string" contains unbalanced \else or \fi, then these might erroneously match up the \ifx-comparison done by \measurelen.

If you really want to count tokens, no matter

that at some stage of processing other than the stage of expansion their processing yields characters in an amount differing from the amount of tokens (e.g., the .tex-input \char65⟨space⟩ usually yields four tokens (\char, 6, 5 and an explicit space token) which in turn yield placing a single character A into the output-file/.pdf-file), and
that results of counting might differ depending on whether an 8-bit-TeX-engine or a utf-8-TeX-engine (like XeTeX or LuaTeX) is in use,

then, with e-TeX-extensions (\numexpr) and \expanded available, you can try something like this:

\errorcontextlines=10000
\catcode`\@=11
%%=============================================================================
%% PARAPHERNALIA:
%% \UD@firstoftwo, \UD@secondoftwo, \UD@PassFirstToSecond, \UD@Exchange,
%% \UD@removespace, \UD@stopromannumeral, \UD@CheckWhetherNull,
%% \UD@CheckWhetherBrace, \UD@CheckWhetherLeadingExplicitSpace,
%% \UD@ExtractFirstArg
%%=============================================================================
\long\def\UD@firstoftwo#1#2{#1}%
\long\def\UD@secondoftwo#1#2{#2}%
\long\def\UD@PassFirstToSecond#1#2{#2{#1}}%
\long\def\UD@Exchange#1#2{#2#1}%
\UD@Exchange{ }{\def\UD@removespace}{}%
\chardef\UD@stopromannumeral=`\^^00%
%%-----------------------------------------------------------------------------
%% Check whether argument is empty:
%%.............................................................................
%% \UD@CheckWhetherNull{<Argument which is to be checked>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is empty>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is not empty>}%
%%
%% The gist of this macro comes from Robert R. Schneck's \ifempty-macro:
%% <https://groups.google.com/forum/#!original/comp.text.tex/kuOEIQIrElc/lUg37FmhA74J>
\long\def\UD@CheckWhetherNull#1{%
  \romannumeral\expandafter\UD@secondoftwo\string{\expandafter
  \UD@secondoftwo\expandafter{\expandafter{\string#1}\expandafter
  \UD@secondoftwo\string}\expandafter\UD@firstoftwo\expandafter{\expandafter
  \UD@secondoftwo\string}\expandafter\UD@stopromannumeral\UD@secondoftwo}{%
  \expandafter\UD@stopromannumeral\UD@firstoftwo}%
}%
%%-----------------------------------------------------------------------------
%% Check whether argument's first token is a catcode-1-character
%%.............................................................................
%% \UD@CheckWhetherBrace{<Argument which is to be checked>}%
%%                      {<Tokens to be delivered in case that argument
%%                        which is to be checked has a leading
%%                        explicit catcode-1-character-token>}%
%%                      {<Tokens to be delivered in case that argument
%%                        which is to be checked does not have a
%%                        leading explicit catcode-1-character-token>}%
\long\def\UD@CheckWhetherBrace#1{%
  \romannumeral\expandafter\UD@secondoftwo\expandafter{\expandafter{%
  \string#1.}\expandafter\UD@firstoftwo\expandafter{\expandafter
  \UD@secondoftwo\string}\expandafter\UD@stopromannumeral\UD@firstoftwo}{%
  \expandafter\UD@stopromannumeral\UD@secondoftwo}%
}%
%%-----------------------------------------------------------------------------
%% Check whether brace-balanced argument starts with a space-token
%%.............................................................................
%% \UD@CheckWhetherLeadingExplicitSpace{<Argument which is to be checked>}%
%%                                     {<Tokens to be delivered in case <argument
%%                                       which is to be checked> does have a
%%                                       leading explicit space-token>}%
%%                                     {<Tokens to be delivered in case <argument
%%                                       which is to be checked> does not have a
%%                                       leading explicit space-token>}%
\long\def\UD@CheckWhetherLeadingExplicitSpace#1{%
  \romannumeral\UD@CheckWhetherNull{#1}%
  {\expandafter\UD@stopromannumeral\UD@secondoftwo}%
  {%
    % Let's nest things into \UD@firstoftwo{...}{} to make sure they are nested in braces
    % and thus do not disturb when the test is carried out within \halign/\valign:
    \expandafter\UD@firstoftwo\expandafter{%
      \expandafter\expandafter\expandafter\UD@stopromannumeral
      \romannumeral\expandafter\UD@secondoftwo
      \string{\UD@CheckWhetherLeadingExplicitSpaceB.#1 }{}%
    }{}%
  }%
}%
\long\def\UD@CheckWhetherLeadingExplicitSpaceB#1 {%
  \expandafter\UD@CheckWhetherNull\expandafter{\UD@firstoftwo{}#1}%
  {\UD@Exchange{\UD@firstoftwo}}{\UD@Exchange{\UD@secondoftwo}}%
  {\expandafter\expandafter\expandafter\UD@stopromannumeral
   \expandafter\expandafter\expandafter}%
   \expandafter\UD@secondoftwo\expandafter{\string}%
}%
%%-----------------------------------------------------------------------------
%% Extract first inner undelimited argument:
%%
%%   \UD@ExtractFirstArg{ABCDE} yields  {A}
%%
%%   \UD@ExtractFirstArg{{AB}CDE} yields  {AB}
%%
%% Due to \romannumeral-expansion the result is delivered after two
%% expansion-steps/after "hitting" \UD@ExtractFirstArg with \expandafter
%% twice.
%%
%% \UD@ExtractFirstArg's argument must not be blank.
%%
%% Use frozen-\relax as delimiter for speeding things up.
%% I chose frozen-\relax because David Carlisle pointed out in
%% <https://tex.stackexchange.com/a/578877>
%% that frozen-\relax cannot be (re)defined in terms of \outer and cannot be
%% affected by \uppercase/\lowercase.
%%
%% \UD@ExtractFirstArg's argument may contain frozen-\relax:
%% The only effect is that internally more iterations are needed for
%% obtaining the result.
%%
%%.............................................................................
\expandafter\expandafter\expandafter\UD@Exchange
\expandafter\expandafter\expandafter{%
\expandafter\expandafter\ifnum0=0\fi}%
{\long\def\UD@RemoveTillFrozenrelax#1#2}{{#1}}%
%
\expandafter\UD@PassFirstToSecond\expandafter{%
  \romannumeral\expandafter
  \UD@PassFirstToSecond\expandafter{\romannumeral
    \expandafter\expandafter\expandafter\UD@Exchange
    \expandafter\expandafter\expandafter{%
    \expandafter\expandafter\ifnum0=0\fi}{\UD@stopromannumeral#1{}}%
  }{%
    \UD@stopromannumeral\romannumeral\UD@ExtractFirstArgLoop
  }%
}{%
  \long\def\UD@ExtractFirstArg#1%
}%
\long\def\UD@ExtractFirstArgLoop#1{%
  \expandafter\UD@CheckWhetherNull\expandafter{\UD@firstoftwo{}#1}%
  {\UD@stopromannumeral#1}%
  {\expandafter\UD@ExtractFirstArgLoop\expandafter{\UD@RemoveTillFrozenrelax#1}}%
}%
%%=============================================================================
%% USER-MACRO:
%%=============================================================================
\long\def\strlen#1{%
  %
  \romannumeral\expandafter\tokencountloop\expanded{{#1}}{0}%
  %
  %{\edef\arg{#1}\expandafter}\expandafter
  %\romannumeral\expandafter\tokencountloop\expandafter{\arg}{0}%
}%
%%=============================================================================
%% MAIN LOOP:
%%=============================================================================
\long\def\tokencountloop#1#2{%
  \UD@CheckWhetherNull{#1}{\UD@stopromannumeral#2}{%
    \expandafter\UD@PassFirstToSecond\expandafter{%
      \number\numexpr#2+%
        \UD@CheckWhetherBrace{#1}{%
          2+% <-If you don't want curly braces to be counted, comment out.
          \romannumeral\expandafter\expandafter\expandafter\tokencountloop\UD@ExtractFirstArg{#1}{0}%
        }{1}%
      \relax
    }{%
      \UD@CheckWhetherLeadingExplicitSpace{#1}{%
        \expandafter\tokencountloop\expandafter{\UD@removespace#1}%
      }{%
        \expandafter\tokencountloop\expandafter{\UD@firstoftwo{}#1}%
      }%
    }%
  }%
}%
\catcode`\@=12
O tamanho de "..." aberta é :
\strlen{Uma noite em Peron} %%%% outputs 18
\def\frase{Uma noite em Peron}
O tamanho de "\frase" é :
\strlen\frase %%%% outputs 18
\def\frase{Uma\ noite\ em\ Peron}
O tamanho de "\frase" é :
\strlen\frase %%%% outputs 18
O tamanho de "..." aberta é :
\strlen{Uma {{noite} }em Peron} %%%% outputs 22 which might make sense with things like
                                %%%%    \strlen{#2}
                                %%%%    \scantokens\expandafter{\string\verb*#1#2#1}
                                %%%% where #2 denotes a set of unexpandable explicit character tokens
                                %%%% and #1 denotes a verb-delimiter which does not occur within #2.
\end

Assuming e-TeX-extensions (\numexpr/\detokenize) and \expanded are not available, the job of counting tokens will not be trivial and cannot be done in terms of expandable methods/routines alone:

In order to expand the argument you need to use \edef for defining a scratch-macro.

Defining a scratch-macro itself is not an expandable method.
Before defining the scratch macro every hash (every explicit character token of category 6(parameter)) in the argument needs to be doubled.
I don't see any reliable method for detecting an explicit character token of category 6(parameter) that works without any TeX-extension.
(If \detokenize was available one could check whether applying \string to the token in question yields a single token while applying \detokenize (due to its doubling of hashes) yields two tokens; however the edge case of there being an explicit character token of character code 32 and category 6 would require special attention...)
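
The hash doubling that this check would rely on can be observed on an engine with e-TeX extensions (assumed here, e.g. pdfTeX). This only shows the effect; it is not a complete detection routine.

```tex
% Requires e-TeX extensions (\detokenize); assumed to be available.
% \detokenize doubles a category-6 hash, \string does not.
\message{[\detokenize{#}] [\string#]}% [##] [#]
\end
```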
