How do I get the length of a type-set argument to a macro in latex3?

Question

I would like to determine how long a resulting character sequence is, so that I can typeset an expression properly.

Consider an argument like \varepsilon or \mathscr{A}, both of which produce a single character.

I understand that one could compile a list of known macro names and look up the argument in that list.

Is there, however, a more canonical way to find out that the argument is exactly one character wide (in which case I would, among other things, omit the parentheses)?
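For comparison, the list-lookup approach mentioned above can be sketched in expl3 as follows. This is a minimal sketch under stated assumptions: the names \user_fn_name:n and \l_user_single_glyph_seq are hypothetical, and the list of "single-glyph" constructions has to be maintained by hand. Any input that is a single token (f, \varepsilon, \alpha, ...) is left alone without needing a list entry; only multi-token constructions such as \mathscr{A} must be listed.

```latex
\documentclass{article}
\usepackage{xparse,mathrsfs} % xparse is only needed on LaTeX kernels before 2020

\ExplSyntaxOn
% Hypothetical, hand-maintained list of multi-token inputs that are
% assumed to produce a single glyph.
\seq_new:N \l_user_single_glyph_seq
\seq_set_from_clist:Nn \l_user_single_glyph_seq
  { \mathscr{A} , \mathscr{B} , \mathcal{F} }

\cs_new_protected:Npn \user_fn_name:n #1
 {
  % A single token (f, \varepsilon, ...) is typeset unchanged.
  \tl_if_single_token:nTF { #1 }
   { #1 }
   {
    % A listed construction is typeset unchanged; anything else
    % is treated as a multi-character name and set upright.
    \seq_if_in:NnTF \l_user_single_glyph_seq { #1 }
     { #1 }
     { \mathrm { #1 } }
   }
 }

\NewDocumentCommand \fn { m m }
 { \user_fn_name:n { #1 } \left( #2 \right) }
\ExplSyntaxOff

\begin{document}
$\fn{f}{x}$ \quad $\fn{\varepsilon}{x}$ \quad
$\fn{\mathscr{A}}{x}$ \quad $\fn{Velocity}{x}$
\end{document}
```

\seq_if_in:NnTF compares token lists exactly, so \mathscr{A} in the document matches the listed item, but a different spelling of the same glyph (for instance via a user-defined shortcut) would not.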

Consider a macro

\NewDocumentCommand{\fn}{m m}{
    {#1} \left( {#2} \right)
}

where #1 represents the name of the function (which is expected to be one or more characters long) and #2 represents the arguments to the function.

The macro is intended to properly typeset expressions like

\fn{f}{\frac{a}{b}}

\fn{\varepsilon}{\frac{a}{b}}

\fn{\mathscr{A}}{\frac{a}{b}}

\fn{Velocity}{vwx}

where the first argument may vary in width and should remain unmodified if it is exactly one character long (or wide), and should have \mathrm applied to it if it is longer (or wider).

I also intend to parse the input for particular tokens, such as \circ, to detect whether the first argument to \fn is a compound expression. I understand that this mainly concerns the shorthand notation for function composition, which should be a corner case.

To summarize: I am looking for a way to count the number of characters (and, likely, the width) of a character sequence as it would be typeset.
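For the width, one possible approach is to typeset the name into a box and compare its width with that of a reference glyph. The following is a minimal sketch under explicit assumptions: the names \user_fn_by_width:nn, \l_user_name_box and \l_user_ref_box are hypothetical, the reference glyph (a math italic M) and the threshold of 6/5 of its width are arbitrary choices that would need tuning for the fonts in use, and the measurement is done in text style, so results in display math may differ.

```latex
\documentclass{article}
\usepackage{xparse,mathrsfs} % xparse is only needed on LaTeX kernels before 2020

\ExplSyntaxOn
\box_new:N \l_user_name_box
\box_new:N \l_user_ref_box

\cs_new_protected:Npn \user_fn_by_width:nn #1#2
 {
  % Measure the function name and a reference glyph, both in text-style math.
  \hbox_set:Nn \l_user_name_box { $ #1 $ }
  \hbox_set:Nn \l_user_ref_box { $ M $ }
  % Assumed rule: a name wider than 6/5 of the reference glyph is
  % treated as a multi-character name and set upright.
  \dim_compare:nNnTF
    { \box_wd:N \l_user_name_box } > { \box_wd:N \l_user_ref_box * 6 / 5 }
    { \mathrm { #1 } }
    { #1 }
  \left( #2 \right)
 }

\NewDocumentCommand \fn { m m } { \user_fn_by_width:nn { #1 } { #2 } }
\ExplSyntaxOff

\begin{document}
$\fn{f}{\frac{a}{b}}$ \quad $\fn{\varepsilon}{\frac{a}{b}}$ \quad
$\fn{\mathscr{A}}{\frac{a}{b}}$ \quad $\fn{Velocity}{vwx}$
\end{document}
```

Whether a wide glyph such as the script A stays under the threshold depends on the font, and two narrow letters (for example ab) may stay under it too, so a width test is at best one ingredient of the decision.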

do you mean length in number of characters? or length in units as in the standard latex \settowidth command? (which is usually more meaningful) — David Carlisle, Feb 05 '15 at 10:59
I only need the number of characters. The length in units of measurement would likely differ for the two commands mentioned in the question. Why do you consider it more meaningful? — user71833, Feb 05 '15 at 11:08
Well, \neq uses two characters, should it count for one or two? Similarly, \notin is composed with two characters, but it's produced in a four pronged array. I'm afraid the problem is not even meaningful (not your fault, of course). — egreg, Feb 05 '15 at 11:57
@egreg: Thanks. I think I understand. Is there nevertheless a way to do what is intended if we constrain the input to letters only (Latin and Greek in various typefaces)? I think I could then extend it by combining tests for character length with tests for dimensional length. — user71833, Feb 05 '15 at 12:05
@egreg: Regarding the number of characters used for negated relation symbols: Does unicode-math rely on the same typesetting rule? — user71833, Feb 05 '15 at 12:08
Not latex3, but \documentclass{article} \newcounter{numchars} \newcommand\countchars[1]{\setcounter{numchars}{0}\counthelp#1\relax\relax\relax\thenumchars} \def\counthelp#1#2\relax{\ifx#1\relax\else\stepcounter{numchars}\counthelp#2\relax\fi} \begin{document} \countchars{abc} \countchars{a} \countchars{ab} \countchars{} \countchars{ab\ne c} \countchars{abc} \end{document} — Steven B. Segletes, Feb 05 '15 at 13:53
@StevenB.Segletes: Thanks, this seems to work for some input. Here, however, it results in 2 for \mathscr{A} (maybe because of unicode-math). I can't explain what a sequence of \relax tokens, or a macro definition with \relax in its parameter list, evaluates to. I'll read up on LaTeX2e and TeX primitives to see whether I can adapt the suggestion you kindly provided. I read the implementation of unicode-math (latex3) and, as a quick fix, will for now do the same thing: keep an associative list of values. — user71833, Feb 06 '15 at 08:43
@user71833 \relax is a "do nothing" macro that is unexpandable. Here I'm using it as a flag: \counthelp recursively digests and counts each token of the input until it hits a \relax, which it takes as the end. It finds the \relax there in the first place because \countchars placed it at the end of the argument when invoking \counthelp. Indeed, unicode chars are "double-length". FYI, I place 3 \relaxes at the end of the input when calling \counthelp, in case you pass a blank argument, since \counthelp wants to see 3 tokens at a time. — Steven B. Segletes, Feb 06 '15 at 10:47
Actually unicode chars are multibyte (2x, 3x or 4x). See this answer: http://tex.stackexchange.com/questions/86297/catcodes-of-unicode-characters-with-usepackageutf8inputenc/86300#86300 — Steven B. Segletes, Feb 06 '15 at 13:30
There is no way to do this in full generality; something like \mathbin{\dot\cup} should count as a single character; and what about \newcommand{\foo}{\mathcal{F}(I)}. How can you hope to be able to count the items? — egreg, Jun 02 '15 at 22:15
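The \counthelp macro from the comments grabs undelimited arguments one at a time, so a likely reason for the count of 2 for \mathscr{A} is that the input consists of two items, \mathscr and the braced group {A}, independently of unicode-math. The expl3 function \tl_count:n counts items in the same way (ignoring spaces). The wrapper \showcount below is a hypothetical helper, added only to make these counts visible in the output.

```latex
\documentclass{article}
\usepackage{xparse} % only needed on LaTeX kernels before 2020

\ExplSyntaxOn
% Typeset the number of items (tokens or braced groups) in the argument.
\NewDocumentCommand \showcount { m }
 { \int_to_arabic:n { \tl_count:n { #1 } } }
\ExplSyntaxOff

\begin{document}
\showcount{abc}\par         % 3
\showcount{\varepsilon}\par % 1
\showcount{\mathscr{A}}\par % 2: \mathscr and the group {A}
\showcount{f_n}             % 3: f, _ and n
\end{document}
```

Such counts measure the input, not the typeset result, which is exactly the limitation raised in the comments about \neq, \notin and user-defined macros.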

score 1 · Answer 1 · answered Jun 02 '15 at 21:11

This is a “solution” using expl3. It only counts the tokens of catcode 11 and 12 in the first argument and makes the decision whether to use \mathrm based on this reduced count.

Because this is a heuristic, there may be false positives, such as f_n in the MWE, which counts as two characters. One could improve the parsing algorithm to detect such cases, but it is still virtually impossible to cover all edge cases.

\documentclass{article}
\usepackage{xparse,mathrsfs}

\ExplSyntaxOn

\int_new:N \l_user_charcount_int

\cs_new_protected:Npn \user_parsefn:nn #1#2
 {
  % count the letters (catcode 11) and other characters (catcode 12) in #1
  \int_zero:N \l_user_charcount_int
  \tl_map_inline:nn { #1 }
   {
    \bool_if:nT { \token_if_letter_p:N ##1 || \token_if_other_p:N ##1 }
     { \int_incr:N \l_user_charcount_int }
   }
  % more than one such character: typeset the name upright
  \int_compare:nTF { \l_user_charcount_int > 1 }
   { \mathrm { #1 } }
   { #1 }
  \left( #2 \right)
 }

\NewDocumentCommand \fn { m m }
 {
  \user_parsefn:nn { #1 } { #2 }
 }

\ExplSyntaxOff

\begin{document}

$\fn{f}{\frac{a}{b}}$

$\fn{f_n}{\frac{a}{b}}$

$\fn{\varepsilon}{\frac{a}{b}}$

$\fn{\mathscr{A}}{\frac{a}{b}}$

$\fn{Velocity}{vwx}$

\end{document}
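As the answer notes, the parsing could be refined. One possible refinement is sketched below, under the assumption that everything from the first subscript or superscript token onward should be ignored, so that f_n and f^2 are treated like f. It also counts the contents of a multi-token braced group (such as the ab in \mathrm{ab}) with \tl_count:n instead of passing the whole group to an N-type predicate. The rule itself is a hypothetical choice, and this remains a heuristic with the same fundamental limits discussed in the comments.

```latex
\documentclass{article}
\usepackage{xparse,mathrsfs} % xparse is only needed on LaTeX kernels before 2020

\ExplSyntaxOn
\int_new:N \l_user_charcount_int

\cs_new_protected:Npn \user_parsefn:nn #1#2
 {
  \int_zero:N \l_user_charcount_int
  \tl_map_inline:nn { #1 }
   {
    \tl_if_single_token:nTF { ##1 }
     {
      % stop counting at the first _ or ^ (catcode 8 or 7)
      \bool_lazy_or:nnT
        { \token_if_math_subscript_p:N ##1 }
        { \token_if_math_superscript_p:N ##1 }
        { \tl_map_break: }
      % count letters (catcode 11) and other characters (catcode 12)
      \bool_lazy_or:nnT
        { \token_if_letter_p:N ##1 }
        { \token_if_other_p:N ##1 }
        { \int_incr:N \l_user_charcount_int }
     }
     % a braced group with several items: add its item count
     { \int_add:Nn \l_user_charcount_int { \tl_count:n { ##1 } } }
   }
  \int_compare:nNnTF \l_user_charcount_int > 1
   { \mathrm { #1 } }
   { #1 }
  \left( #2 \right)
 }

\NewDocumentCommand \fn { m m } { \user_parsefn:nn { #1 } { #2 } }
\ExplSyntaxOff

\begin{document}
$\fn{f_n}{\frac{a}{b}}$ \quad $\fn{f^2}{x}$ \quad
$\fn{\mathscr{A}}{\frac{a}{b}}$ \quad $\fn{Velocity}{vwx}$
\end{document}
```

With this rule, \mathscr{A} counts as one character (the control sequence \mathscr is not counted, and the group {A} is passed to the mapping as the single letter A), while Velocity still counts as eight.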
