Macro changing a string depending on letter case

Question

I would like to define a new mathmode command such that:

the input is a string s;
the output is the same string where the font of each character has been changed depending on whether the character is lowercase or uppercase.

In pseudo-code, that would look like:

\newcommand{\cat}[1]{
    forEach (character char in #1) {
        if (isLowerCase(char)) -> print \mathit{char}
        else -> print \mathcal{char}
    }
}

It seems I can use \ifthenelse from the ifthen package for the if/else part. What I couldn't find is how to:

extract the characters from a string;
loop over these characters;
check whether a character is lowercase or not.
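A minimal sketch of all three steps in plain LaTeX2e, assuming the argument contains only the letters a-z and A-Z. A recursive macro reads one token at a time (this both extracts the characters and loops over them), \relax marks the end of the input, and the case test compares character codes with the primitive \ifnum, since every lowercase ASCII letter has a larger code than Z. The helper name \cat@loop is hypothetical; ifthen is not needed for this test.

```latex
\documentclass{article}
\makeatletter
% \cat hands its argument to a loop that reads one token at a time;
% \relax is appended as an end marker
\newcommand{\cat}[1]{\cat@loop#1\relax}
\newcommand{\cat@loop}[1]{%
  \ifx#1\relax
    % end marker reached: nothing more to do
  \else
    % a-z have larger character codes than Z, so this tests "lowercase?"
    \ifnum`#1>`Z \mathit{#1}\else\mathcal{#1}\fi
    % remove the \fi, then read the next token
    \expandafter\cat@loop
  \fi}
\makeatother
\begin{document}
$\cat{Set}\quad\cat{Grp}$
\end{document}
```

Each letter ends up in its own \mathit or \mathcal group; tokens other than letters are not handled by this sketch.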

By the way, if anyone knows a readable but nonetheless exhaustive reference on defining commands, I would appreciate a pointer; I could only find references that were either too basic or "way over my head".

Thanks for your help!

PS: for context, I'm working in category theory, where the typical font for categories is \mathcal. Since \mathcal has no lowercase letters, I need a way around it.

EDIT: I describe here my understanding of Ulrike Fischer's solution, as it might be useful to others.

\ExplSyntaxOn
\NewDocumentCommand\cat{m}
 {
   \tl_set:Nn\l_tmpa_tl {#1}
   \regex_replace_all:nnN {[a-z]}{\c{mathit}\cB\{\0\cE\}}\l_tmpa_tl
   \regex_replace_all:nnN {[A-Z]}{\c{mathcal}\cB\{\0\cE\}}\l_tmpa_tl
   \l_tmpa_tl
 }
\ExplSyntaxOff

\ExplSyntaxOn ... \ExplSyntaxOff: switch to the expl3 code regime, in which spaces are ignored and ":" and "_" are treated as letters. This is necessary to use expl3 function and variable names. See interface3.pdf, p. 7.
\NewDocumentCommand \cat {m} {...}: creates a new document-level command. You give, in order, its name (\cat), its argument specification ({m}: one mandatory argument, referred to as #1 in the code) and its code ({...}). See xparse.pdf, p. 7.
\tl_set:Nn \l_tmpa_tl {#1}: \tl_set:Nn sets the value of a token list variable (hence tl). The Nn part of the name gives the argument types: N is a single token and n is a list of tokens in braces. \l_tmpa_tl is a predefined scratch token list variable for local assignments (the l_ prefix means local).
the first argument of \regex_replace_all:nnN is a regular expression describing what to match, e.g. {[a-z]} (any single lowercase letter). The second is the replacement text, e.g. {\c{mathit}\cB\{\0\cE\}}. The replacement text is read with regex escape syntax, where a backslash followed by a letter is an escape sequence, so a control sequence such as \mathit has to be written as \c{mathit}. \0 stands for the whole match. \cB\{ and \cE\} insert an explicit begin-group and end-group brace (category codes 1 and 2), so that \mathit receives \0 as a braced argument. Finally, the third argument is the token list variable in which all the replacements are made.
the last \l_tmpa_tl simply inserts the resulting tokens, which are then typeset.
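To check what the two replacements actually produce, one can write the token list to the terminal and the log with \tl_show:N, which pauses compilation much like \show. A minimal sketch for the input Set (xparse is loaded only so that expl3 is available on LaTeX releases from before 2020):

```latex
\documentclass{article}
\usepackage{xparse}% loads expl3 on older LaTeX releases
\ExplSyntaxOn
\tl_set:Nn \l_tmpa_tl { Set }
% wrap each lowercase letter, then each uppercase letter
\regex_replace_all:nnN { [a-z] } { \c{mathit} \cB\{ \0 \cE\} } \l_tmpa_tl
\regex_replace_all:nnN { [A-Z] } { \c{mathcal} \cB\{ \0 \cE\} } \l_tmpa_tl
% write the resulting token list to the terminal and the log
\tl_show:N \l_tmpa_tl
\ExplSyntaxOff
\begin{document}
\end{document}
```

The log should show \mathcal {S}\mathit {e}\mathit {t}: one \mathit group per lowercase letter, because [a-z] matches a single character at a time.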

Try perhaps https://tex.stackexchange.com/questions/479/lowercase-mathcal — Benjamin McKay, Dec 27 '19 at 18:03
There is probably a way to get the same functionality with the range option of https://ctan.org/pkg/unicode-math. — CampanIgnis, Dec 27 '19 at 18:11
@BenjaminMcKay Thanks, but unfortunately I don't find any of these alternatives satisfying. On the other hand I have seen an article (https://arxiv.org/abs/1701.04133) using what I want to implement (i.e. mathcal for upper case and mathit for lower case), and it looked quite fine — Léo S., Dec 27 '19 at 18:11
which engine are you using? And if you are using lualatex/xelatex do you use unicode-math? — Ulrike Fischer, Dec 27 '19 at 18:16
@UlrikeFischer I'm not sure: I compile with pdfLaTeX; does that answer your question? — Léo S., Dec 27 '19 at 18:19
"string" is not TeX-terminology. Please tell exactly what tokens can make up what you call a "string". E.g., what about curly braces and expandable tokens? — Ulrich Diez, Dec 27 '19 at 18:45
@UlrichDiez I meant the letters a-z and A-Z. Note that Ulrike Fischer has answered my question. Any help on understanding the source code (or simply good references for doing so) would still be welcome, though! — Léo S., Dec 27 '19 at 19:09

score 3 · Answer 1 · answered Dec 27 '19 at 18:42


Assuming that the argument contains only the letters a-z and A-Z:

\documentclass{article}

\usepackage{xparse}
\ExplSyntaxOn
\NewDocumentCommand\mycal{m}
 {
   \tl_set:Nn\l_tmpa_tl {#1}
   \regex_replace_all:nnN {[a-z]}{\c{mathit}\cB\{\0\cE\}}\l_tmpa_tl
   \regex_replace_all:nnN {[A-Z]}{\c{mathcal}\cB\{\0\cE\}}\l_tmpa_tl
   \l_tmpa_tl
 }
\ExplSyntaxOff
\begin{document}
$\mycal{abcABC}$
\end{document}


Ulrike Fischer


Thanks a lot! This is exactly the effect I needed. Any suggestion on where I could look to understand the source code? – Léo S. Dec 27 '19 at 18:53
The regex code is described in interface3.pdf; you can open it with texdoc interface3. – Ulrike Fischer Dec 27 '19 at 19:20

score 1 · Accepted Answer · answered Dec 31 '19 at 14:20

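Ulrike’s answer is fine, but we can improve it by removing the unwanted space (the italic correction) left after an uppercase calligraphic letter that is followed by a lowercase letter.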

\documentclass{article}
\usepackage{amsmath}
\usepackage{xparse}

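% record the math family used by \mathcal; the formula is typeset in a
% scratch box, so nothing is added to the page
\AtBeginDocument{\sbox0{$\mathcal{\global\chardef\calfam=\fam}$}}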

\ExplSyntaxOn

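% negative kern cancelling the italic correction of the calligraphic
% letter #1; \mathchoice selects the font for the current math style
\cs_new_protected:Nn \__trynopsis_cat_neg:n
 {
  \mathchoice
   { \__trynopsis_cat_neg_aux:Nn \textfont { #1 } }
   { \__trynopsis_cat_neg_aux:Nn \textfont { #1 } }
   { \__trynopsis_cat_neg_aux:Nn \scriptfont { #1 } }
   { \__trynopsis_cat_neg_aux:Nn \scriptscriptfont { #1 } }
 }
% #1 is \textfont, \scriptfont or \scriptscriptfont; #2 is the letter
\cs_new_protected:Nn \__trynopsis_cat_neg_aux:Nn
 {
  \kern -\fontcharic#1\calfam`#2
 }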

\NewDocumentCommand{\cat}{m}
 {
  \trynopsis_cat:n { #1 }
 }

\tl_new:N \l__trynopsis_cat_tl

\cs_new_protected:Nn \trynopsis_cat:n
 {
  \tl_set:Nn \l__trynopsis_cat_tl { #1 }
  \regex_replace_all:nnN
   { ([A-Z]) ([a-z]) } % search uppercase letter followed by a lowercase letter
   { \1 \c{__trynopsis_cat_neg:n} \1 \2 } % add a negative space in between
   \l__trynopsis_cat_tl
  \regex_replace_all:nnN
   { [A-Z]+ } % search any run of uppercase letters
   { \c{mathcal}\cB\{\0\cE\} } % enclose it in \mathcal
   \l__trynopsis_cat_tl
  \regex_replace_all:nnN
   { [a-z]+ } % search any run of lowercase letters
   { \c{mathit}\cB\{\0\cE\} } % enclose it in \mathit
   \l__trynopsis_cat_tl
  \regex_replace_all:nnN
   { \c{__trynopsis_cat_neg:n} \c{mathcal} }
   { \c{__trynopsis_cat_neg:n} }
   \l__trynopsis_cat_tl
  % deliver the new token list
  \tl_use:N \l__trynopsis_cat_tl
 }

\ExplSyntaxOff

\begin{document}

$\cat{Set}\quad\cat{Grp}\quad\cat{Abc\_def}$

$\scriptstyle\cat{Set}\quad\cat{Grp}\quad\cat{Abc\_def}$

\end{document}

The fourth replacement transforms the unwanted \__trynopsis_cat_neg:n \mathcal {<letter>} into \__trynopsis_cat_neg:n {<letter>}.
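To watch the four replacements at work, the sketch below replays them on the input Vec and shows the result. \NEG is a hypothetical placeholder standing in for \__trynopsis_cat_neg:n; it is only stored in the token list and never executed, so it needs no definition.

```latex
\documentclass{article}
\usepackage{xparse}% loads expl3 on older LaTeX releases
\ExplSyntaxOn
\tl_set:Nn \l_tmpa_tl { Vec }
% 1: duplicate an uppercase letter that precedes a lowercase one, behind \NEG
\regex_replace_all:nnN { ([A-Z]) ([a-z]) } { \1 \c{NEG} \1 \2 } \l_tmpa_tl
% 2 and 3: wrap runs of uppercase / lowercase letters
\regex_replace_all:nnN { [A-Z]+ } { \c{mathcal} \cB\{ \0 \cE\} } \l_tmpa_tl
\regex_replace_all:nnN { [a-z]+ } { \c{mathit} \cB\{ \0 \cE\} } \l_tmpa_tl
% 4: drop the \mathcal that step 2 put right after \NEG
\regex_replace_all:nnN { \c{NEG} \c{mathcal} } { \c{NEG} } \l_tmpa_tl
\tl_show:N \l_tmpa_tl
\ExplSyntaxOff
\begin{document}
\end{document}
```

The log should show \mathcal {V}\NEG {V}\mathit {ec}: the duplicated V survives only as the braced argument of the correcting macro, while the visible V stays in \mathcal.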

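The purpose of \__trynopsis_cat_neg:n is to remove the excess space when an uppercase (calligraphic) letter is followed by a lowercase (italic) one: it inserts a negative kern equal to the italic correction (\fontcharic) of that letter in the calligraphic family, and \mathchoice picks the font matching the current math style.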
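A minimal pdfLaTeX sketch isolating this mechanism for a single letter in text style. It assumes \mathcal uses math family 2, which holds for the default Computer Modern setup; the answer's code instead reads the family from \calfam and chooses the font per style with \mathchoice.

```latex
\documentclass{article}
\begin{document}
% left: \mathcal{V} keeps its italic correction before the italic letters
$\mathcal{V}\mathit{ec}$ \qquad
% right: a negative kern of the same size is inserted by hand
% (\fontcharic is an e-TeX primitive; \textfont2 assumed to be the cal font)
$\mathcal{V}\kern-\fontcharic\textfont2`V \mathit{ec}$
\end{document}
```

Comparing the two formulas, the gap between the V and the ec should be smaller on the right.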

Thanks! That indeed looks better, especially with the Vec category. — Léo S., Jan 02 '20 at 13:39
