I would like to have an expandable macro that extracts the first (and sometimes the second) character of UTF-8/Cyrillic text strings without using additional packages. No simple solutions from TeX or LaTeX work with UTF-8/Cyrillic strings.
Below is an example of a macro that works for ASCII input, partially taken from "Get the first and second character of a macro argument":
\documentclass{article}
\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian]{babel}
\makeatletter
\newcommand{\firstof}[1]{\@car#1\@nil}
\makeatother
\begin{document}
\firstof{Vladimir}
\end{document}
Unfortunately, this example fails with the error "Invalid UTF-8 byte sequence (Ð\par)" for Cyrillic strings such as \firstof{Владимир}.
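To illustrate where the error seems to come from: under pdfLaTeX with inputenc, each Cyrillic letter arrives as two byte tokens (a lead byte and a continuation byte), and \@car keeps only the first of them. The following is a minimal sketch under that assumption; the names \firsttwo and \first@two are hypothetical helpers, not existing LaTeX macros.

```latex
\documentclass{article}
\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian]{babel}
\makeatletter
% Hypothetical helper: keep the first TWO tokens of the argument.
% Assumption: pdfLaTeX, where each Cyrillic letter is two byte tokens.
\newcommand{\firsttwo}[1]{\first@two#1\@nil}
% The first two tokens are kept, everything up to \@nil is discarded.
\def\first@two#1#2#3\@nil{#1#2}
\makeatother
\begin{document}
% Both bytes of the first letter are kept, so this should typeset
% the first Cyrillic letter instead of raising the UTF-8 error.
\firsttwo{Владимир}
\end{document}
```

This only suits purely Cyrillic input of at least two letters: with \firsttwo{Vladimir} the same macro returns the two ASCII letters Vl, because each ASCII letter is a single token.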
I roughly understand that by default TeX is not adapted to manipulating strings with multibyte characters, but this problem is solved in some packages. However, I do not want to use other packages for such a simple problem (as it seems at first glance) and I will be grateful to the community for help and tips.
Ideally, I would like to have an expandable macro like \newcommand{\firstof}[2][1]{.....}, which by default returns the first character of a UTF-8/Cyrillic string: for example, \firstof{Владимир} returns В and \firstof[2]{Владимир} returns Вл. These characters should be usable in \ifx comparisons with other characters and writable to a file using \write.
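One possible direction, given only as a rough sketch: look at the first byte and decide whether it is an ASCII character or the lead byte of a two-byte sequence. The names \firstchar and \first@char are hypothetical. The sketch assumes pdfLaTeX, a non-empty argument that starts with plain text (no braces or control sequences), and only ASCII or two-byte UTF-8 characters, which covers Cyrillic. It has no optional argument, because a macro defined by \newcommand with an optional argument is not expandable.

```latex
\documentclass{article}
\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian]{babel}
\makeatletter
% Hypothetical names: \firstchar (user macro), \first@char (helper).
% \@empty pads the input so that a single ASCII letter still
% supplies a second token for the helper to grab.
\newcommand{\firstchar}[1]{\first@char#1\@empty\@nil}
% Helper arguments: first byte, second token, rest (discarded).
\def\first@char#1#2#3\@nil{%
  \ifnum`#1<128 % byte value below 128: a one-byte (ASCII) character
    \expandafter\@firstoftwo
  \else % otherwise assume the lead byte of a two-byte character
    \expandafter\@secondoftwo
  \fi
  {#1}% ASCII: return the single token
  {#1#2}% two-byte character: return lead and continuation byte
}
\makeatother
\begin{document}
\firstchar{Владимир} \firstchar{Vladimir}
\end{document}
```

The macro works by expansion only, but its result is still a pair of inputenc byte tokens. Whether that pair passes through \edef or \write unchanged depends on how the installed LaTeX release expands those bytes, so that part remains an open assumption. Three- and four-byte sequences (lead bytes from 224 upwards) are not handled.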



char_to_utfviii_bytes:n from expl3. You can learn expl3, right? • "simple problem" remember that TeX's built-in functionality is very limited, and without expl3 you need to do massive data juggling to get useful computation. • It looks like you're not using LuaTeX, want to give it a try? Programming in Lua is much simpler. – user202729 Jan 28 '22 at 22:37
Вл with an \ifx (and get the result you expect) because each of those characters is two tokens (thus four tokens in total) and \ifx only compares two tokens at a time. You could compare with \pdfstrcmp though – Phelype Oleinik Jan 29 '22 at 01:49
xstring. But unfortunately its commands are not expandable and this then causes a lot of problems when concatenating author name strings in loops and then writing them to a file. – Crosfield Jan 29 '22 at 09:15