
I am currently using \StrLen{#1} inside my \newcommand. This works flawlessly for any ordinary string written in the Latin alphabet.

"Hello" has a string length of 5, for example. The problem is with Chinese characters. The string length of "容容" is 8, which is technically correct, but I wasn't able to find a multibyte alternative to \StrLen that would return 2.

Note: I am using pdflatex.

Regards, Jan

David Carlisle
  • 757,742

2 Answers


You can count the UTF-8 start bytes instead of all bytes, so for example:


\documentclass{article}
\usepackage[utf8]{inputenc}

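\makeatletter
% \zz{<text>}: start the loop with a count of 0; \relax marks the end of the input.
\def\zz#1{\zzz0#1\relax}
% \zzz{<count>}<token>: add 1 for an ASCII byte (< "80) or a UTF-8 start byte (> "BF),
% add 0 for a continuation byte ("80 to "BF), then recurse on the next token.
\def\zzz#1#2{%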
\ifx\relax#2 \the\numexpr#1\relax
\else
\expandafter\zzz\expandafter{%
  \the\numexpr(#1+\ifnum\expandafter`\string#2<"80 1\else \ifnum\expandafter`\string#2>"BF 1 \else 0 \fi\fi
  \expandafter)\expandafter\relax\expandafter}%
\fi}
\begin{document}

\zz{容容}

\zz{abc}

\zz{¢Àïα}

\end{document}
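
For readers wondering why comparing each byte against "80 and "BF is enough, here is a small illustrative fragment (my own summary of the UTF-8 byte layout, not something taken from the code above) that can be dropped into a document body. It tabulates the three byte ranges the macro distinguishes.

```latex
% Illustrative fragment: the byte ranges distinguished by the test in \zzz.
\begin{tabular}{lll}
\hline
Byte (hex) & Role in UTF-8 & Counted \\
\hline
\texttt{00}--\texttt{7F} & single-byte (ASCII) character & yes \\
\texttt{80}--\texttt{BF} & continuation byte & no \\
\texttt{C0}--\texttt{FF} & lead byte (valid UTF-8 uses only \texttt{C2}--\texttt{F4}) & yes \\
\hline
\end{tabular}
```

As a worked case, U+5BB9 (the character in the question) is encoded as the three bytes E5 AE B9: E5 is a lead byte and counts 1, while AE and B9 are continuation bytes and count 0, so each such character contributes exactly 1 to the total.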
David Carlisle
  • 757,742

Just for the sake of variety, here's a LuaLaTeX-based solution.


\documentclass{article}
\newcommand\zz[1]{\directlua{tex.sprint(utf8.len("#1"))}}
\begin{document}
\zz{Hello}, \zz{容容}, \zz{¢Àïα}
\end{document} 

If your TeX distribution is quite old (say, at least 4 years old as of late 2020), replace utf8.len with unicode.utf8.len to get the code to run.
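
If the argument may contain double quotes or backslashes, passing #1 straight into a Lua string literal can break. The sketch below is one possible hardening, assuming a reasonably recent LuaLaTeX: it wraps the argument in \luaescapestring{\unexpanded{...}}. The macro names \charlen and \startbytes are made up for this illustration. \startbytes counts every byte outside the continuation range 0x80 to 0xBF by hand, which is the same rule as the start-byte approach for pdfLaTeX, so the two macros should agree on valid UTF-8 input.

```latex
% Compile with LuaLaTeX.
\documentclass{article}
% \charlen (hypothetical name): code-point count via Lua's utf8.len.
% \luaescapestring + \unexpanded protect quotes and backslashes in #1.
\newcommand\charlen[1]{%
  \directlua{tex.sprint(utf8.len("\luaescapestring{\unexpanded{#1}}"))}}
% \startbytes (hypothetical name): count the bytes that are not
% continuation bytes, i.e. bytes below 128 (0x80) or above 191 (0xBF).
\newcommand\startbytes[1]{%
  \directlua{
    local s = "\luaescapestring{\unexpanded{#1}}"
    local n = 0
    for i = 1, string.len(s) do
      local b = string.byte(s, i)
      if b < 128 or b > 191 then n = n + 1 end
    end
    tex.sprint(n)
  }}
\begin{document}
\charlen{Hello}, \charlen{容容}, \charlen{¢Àïα}

\startbytes{Hello}, \startbytes{容容}, \startbytes{¢Àïα}
\end{document}
```

Both lines should typeset 5, 2, 4. Note that string.len is used rather than Lua's # operator, because # is awkward to type inside a \newcommand body.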

Mico
  • 506,678