
For about a week I have been trying to get the unicode-math package working with XeLaTeX, with no result. The PDF is created fine, but all Cyrillic symbols in math mode are skipped. I tried different math fonts, with no progress at all. In the log file I found this low-level error:

Missing character: There is no [cyrillic letter from input] in font cmmi12!

But all fonts used in the document are Unicode fonts.

Here is the file I want to process correctly (it is, of course, UTF-8 encoded):

\documentclass[12pt]{book}
\usepackage{polyglossia}
\setdefaultlanguage[spelling=modern]{russian}
\setotherlanguage{english}
\defaultfontfeatures{Ligatures={TeX}}
\setmainfont{CMU Serif}
\setsansfont{CMU Sans Serif}
\setmonofont{CMU Typewriter Text}  

\usepackage{amsmath, amssymb}
\usepackage[russian]{hyperref}

\usepackage{unicode-math}
\setmathfont{Latin Modern Math}

\frenchspacing

\begin{document}
Просто буквы % Plain letters
$$Память: M_{доп}(n) = \Theta(N)$$ % Memory: M_add(n) = \Theta(n)
\end{document}

Any help is appreciated.

Joseph Wright
  • What makes you think there are glyphs in Latin Modern Math for Cyrillic chars? – Joseph Wright Sep 15 '14 at 11:34
  • @JosephWright Actually, I am trying fonts that the unicode-math documentation mentions as Unicode-compatible. I also tried xits-math, without success. – Lapshin Dmitry Sep 15 '14 at 11:43
  • I think you're slightly missing my point. Unicode fonts don't have every Unicode glyph in them; they have a subset, but in predictable slots. In the case of a math font, this means glyphs with a defined 'mathematical meaning'. As far as I know, there are no such math-mode slots for Cyrillic letters: one is expected to use Roman characters for maths. As such, it's entirely unsurprising that not all of the text appears in your output. – Joseph Wright Sep 15 '14 at 11:46
  • The following is quoted from The Comprehensive LᴬTᴇX Symbol List: For example, the symbol for an impulse train or Tate-Shafarevich group (“Ш”) is actually an uppercase sha in the Cyrillic alphabet. (Cyrillic is supported by the OT2 font encoding, for instance). While a sha can be defined numerically as “{\fontencoding{OT2}\selectfont\char88}” it may be more intuitive to use the OT2 font encoding’s “SH” ligature: “{\fontencoding{OT2}\selectfont SH}”. – Symbol 1 Sep 15 '14 at 11:55
  • @JosephWright Oh, now I see. So, as I understand it, these math fonts are Unicode-based fonts that lack most Unicode symbols, including Cyrillic letters, so I agree that those letters shouldn't appear in the output of the sample above. But if I want Cyrillic and other letters in math mode, could you help? – Lapshin Dmitry Sep 15 '14 at 11:56
  • @Symbol1 This may work, but it's too clumsy. \text is shorter and easier to use. I just want to place Cyrillic (and other) symbols and have them typeset properly. – Lapshin Dmitry Sep 15 '14 at 12:02
  • The English equivalent would be \[\text{Memory: }M_{\textup{Dol}}(n)=\Theta(N)\] Without \text the typesetting would be wrong. Your Память is not math either. – egreg Sep 15 '14 at 12:19
  • @Symbol1 Oh. As I see it, this is actually what I need to do. – Lapshin Dmitry Sep 15 '14 at 12:20
  • @egreg Yes, I know. That is just an example. Actually, I am interested in Cyrillic letters for subscripts and units in physics (meters, tesla and others). – Lapshin Dmitry Sep 15 '14 at 12:23
  • @LapshinDmitry Textual subscripts should be in \textup anyway. – egreg Sep 15 '14 at 12:28
  • Now that this question is resolved, could you adjust the title of this question to be more specific please? It won't archive well in its current form. – Will Robertson Sep 15 '14 at 12:53
  • @WillRobertson Changed. Now that I have understood the problem, I see why it should be titled like this. – Lapshin Dmitry Sep 15 '14 at 16:22

3 Answers


This is not a problem with Cyrillic math characters; if the text were English, the correct input would be

Letters only
\[
\text{Memory: } M_{\textup{add}}(n) = \Theta(N)
\]

because Память and доп are not math. The difference becomes clear when comparing this with the output of

\[
Memory: M_{add}(n) = \Theta(N)
\]


The bottom formula is clearly wrong. Textual subscripts are not math variables, so they should be typeset in the normal (upright) text font, using either \textnormal or \textup (the latter is shorter). Of course, you can define your own command for them.
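A minimal sketch of such a personal command, assuming the same fonts as in the question (CMU Serif, Latin Modern Math); the name \tsub is hypothetical. Because amsmath (through amstext) makes \textnormal size-aware in math mode, the label shrinks to subscript size while staying in the upright text font.

```latex
\documentclass[12pt]{book}
\usepackage{amsmath}
\usepackage{unicode-math}
\usepackage{polyglossia}
\setdefaultlanguage[spelling=modern]{russian}
\setotherlanguage{english}
\setmainfont{CMU Serif}
\setmathfont{Latin Modern Math}

% Hypothetical helper for textual subscripts: \textnormal resets family,
% series and shape, so the label is always upright text, never math italic.
\newcommand{\tsub}[1]{\textnormal{#1}}

\begin{document}
\[
\text{Память: } M_{\tsub{доп}}(n) = \Theta(N)
\]
\end{document}
```

Switching every label to \textup later only requires changing the one definition.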

Here's the complete example:

\documentclass[12pt]{book}

\usepackage{amsmath,amssymb}
\usepackage{unicode-math}
\usepackage{polyglossia}
\setdefaultlanguage[spelling=modern]{russian}
\setotherlanguage{english}

\setmainfont{CMU Serif}
\setsansfont{CMU Sans Serif}
\setmonofont{CMU Typewriter Text}

\usepackage[russian]{hyperref}

\setmathfont{Latin Modern Math}

\frenchspacing

\begin{document}

Просто буквы % Plain letters
\[
\text{Память: } M_{\textnormal{доп}}(n) = \Theta(N)
\]

\end{document}


It would be a different problem if you wanted to use a Cyrillic letter as a math variable, but your case is not this one.

If you need Cyrillic letters as math variables, here's a way to set them up:

\documentclass[12pt]{book}

\usepackage{amsmath,amssymb}
\usepackage{unicode-math}
\usepackage{polyglossia}
\setdefaultlanguage[spelling=modern]{russian}
\setotherlanguage{english}

\setmainfont{CMU Serif}
\setsansfont{CMU Sans Serif}
\setmonofont{CMU Typewriter Text}

\usepackage[russian]{hyperref}

\setmathfont{Latin Modern Math}

\DeclareSymbolFont{cyrletters}{\encodingdefault}{\familydefault}{m}{it}
\newcommand{\makecyrmathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symcyrletters\endcsname\space #1
}
\count255="409
\loop\ifnum\count255<"44F
  \advance\count255 by 1
  \makecyrmathletter{\count255}
\repeat

\begin{document}
\[
(д+ф)^{2}=д^{2}+2дф+ф^{2}
\]
\end{document}


What does \makecyrmathletter do? Let's review it. The idea is that it takes an integer as argument and performs some magic. The loop above calls

\makecyrmathletter{\count255}

once for each value of \count255 from "40A to "44F (the counter starts at "409 and is incremented before each call). Consider the cycle where \count255 has the value "410 (hexadecimal), which corresponds to А U+0410 CYRILLIC CAPITAL LETTER A.

In order to understand the code, I'll assume the explicit value is passed. The first level expansion is then

\begingroup\lccode`a="410\lowercase{\endgroup
\Umathcode`a}="0 \csname symcyrletters\endcsname\space "410

The strange \begingroup construction is used to obtain the letter from the number: we can loop through numbers, not letters. So inside the group, the \lccode of the letter a (the backtick notation is called an “alphabetic constant”) is set to "410. With this setting, \lowercase scans through its argument and changes every character token into its “lowercase” counterpart, which it looks up in the \lccode table. The result is then delivered to be scanned again. Hence we obtain

\endgroup\Umathcode`А="0 \csname symcyrletters\endcsname\space "410

(only the a is changed; control sequences pass through \lowercase unchanged, and the braces that delimited the argument of \lowercase are gone, which is why the \endgroup could sit inside them). The \endgroup does its job, namely to revert \lccode`a to what it was before, and vanishes.
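The trick can be tried in isolation. This minimal sketch (compile with XeLaTeX; the macro name \demochar is hypothetical) uses the same \begingroup/\lccode/\lowercase pattern to turn the number "416 into the character token Ж (U+0416) and store it in a macro; \typeout then writes it to the terminal and the log.

```latex
\documentclass{article}
% Inside the group, the "lowercase" counterpart of a is U+0416 (Ж)
\begingroup
  \lccode`a="416 % the space ends the hexadecimal number
  % \lowercase turns a into Ж; \endgroup, \def and \demochar are untouched.
  % \endgroup runs first, so \def happens outside the group.
  \lowercase{\endgroup\def\demochar{a}}%
\typeout{Character obtained: \demochar}
\begin{document}
\end{document}
```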

Then the \Umathcode assignment is performed. It assigns А a math code, that is, a new interpretation when it is found in math mode. The = should be followed by three numbers. The first one states the type of the object (0 means an ordinary symbol); the second one tells XeTeX from which font family to take it. \csname symcyrletters\endcsname produces the number that was assigned by the previous \DeclareSymbolFont declaration; using a symbolic name, we don't need to know which number is actually assigned. The third number tells XeTeX which slot the character should be taken from, and we obviously choose "410, so a Cyrillic А. The three numbers should be separated by a space, which is explicit in the first case; we need \space in the second case, because a blank space after the control word \endcsname would be skipped when the line is read. Since expansion is performed when looking for numbers, this \space is transformed into an actual space token.
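As confirmed in the comments below, the \lccode detour is not strictly necessary, because \Umathcode also accepts the character as a number. A sketch of the same setup without it, assuming the fonts from the example above, with the loop rewritten so that it covers exactly U+0410 to U+044F:

```latex
\documentclass[12pt]{book}
\usepackage{amsmath}
\usepackage{unicode-math}
\usepackage{polyglossia}
\setdefaultlanguage[spelling=modern]{russian}
\setotherlanguage{english}
\setmainfont{CMU Serif}
\setmathfont{Latin Modern Math}

% Italic Cyrillic letters for math, taken from the main text font
\DeclareSymbolFont{cyrletters}{\encodingdefault}{\familydefault}{m}{it}

% The register is used directly, both as the character whose math code
% is set and as the slot in the cyrletters font: "410 (А) to "44F (я).
\count255="410
\loop
  \Umathcode\count255="0 \symcyrletters\space\count255
  \ifnum\count255<"44F
  \advance\count255 by 1
\repeat

\begin{document}
\[
(д+ф)^{2}=д^{2}+2дф+ф^{2}
\]
\end{document}
```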

A simpler loop can be used with expl3:

\ExplSyntaxOn

\int_step_inline:nnn { "410 } { "44F } { \Umathcode #1 = "0 ~ \use:c{ symcyrletters } ~ #1 }

\ExplSyntaxOff
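Note that both loops stop at U+044F, so Ё (U+0401) and ё (U+0451), which lie outside the contiguous А..я block, are not covered and would presumably still be missing in math mode, as in the question. A sketch of an expl3 variant that adds them explicitly, assuming the \DeclareSymbolFont{cyrletters} declaration from above is already in the preamble:

```latex
\ExplSyntaxOn
% А..я: U+0410..U+044F, as in the answer
\int_step_inline:nnn { "410 } { "44F }
  { \Umathcode #1 = "0 ~ \use:c { symcyrletters } ~ #1 \scan_stop: }
% Ё and ё lie outside that block, so they are listed separately
\clist_map_inline:nn { "401 , "451 }
  { \Umathcode #1 = "0 ~ \use:c { symcyrletters } ~ #1 \scan_stop: }
\ExplSyntaxOff
```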

egreg
  • I would use M_{\mathrm{add}}(n) instead of \textup so that the index cannot change along with the surrounding text font (e.g. if this is in bold). For "Memory:" it depends on the meaning. – Ulrike Fischer Sep 15 '14 at 13:05
  • @UlrikeFischer The choice between \mathrm and \textup is subjective; the safest is, in my opinion, \textnormal, but having a formula in a boldface context should be quite rare (that is, never). Using a personal command relieves you of the burden. The word “Memory”, instead, should inherit the font from the context. – egreg Sep 15 '14 at 13:10
  • Good solution, but not for my problem: I really need Cyrillic as part of math. Still, many thanks! – Lapshin Dmitry Sep 15 '14 at 16:16
  • @LapshinDmitry I've added the method for defining the letters as math (italic) variables. If you don't want italics, change the {it} into {n}. – egreg Sep 15 '14 at 17:15
  • @egreg Oh, that's nice to have too! That is even a little bit closer to my needs. – Lapshin Dmitry Sep 15 '14 at 17:47
  • @LapshinDmitry If you don't say what your needs are, it's just guessing. – egreg Sep 15 '14 at 17:48
  • The code is not readable. I don't understand it, so I can't use it. – facetus Oct 04 '20 at 00:40
  • @facetus What's the unreadable part? The idea in the last code is to loop over the Cyrillic block and assign a suitable math code to each letter. – egreg Oct 04 '20 at 08:42
  • The last piece of the code (the loop) is actually quite transparent. The first part I cannot comprehend, even after a thorough search of various documentation. What is the backtick doing? What is \lccode and why is it needed? How is it that an open curly brace is between \begingroup and \endgroup? Why are you lowercasing \Umathcode? I don't understand what this code is doing and how. I am a professional software engineer with 30 years of experience and I couldn't decipher it. – facetus Oct 04 '20 at 22:45
  • @facetus I added the explanation. Feel free to ask again if something is not clear. TeX is quite arcane in some of its parts. – egreg Oct 04 '20 at 23:08
  • Thank you for the explanation! So a is used just as a placeholder to convert it to #1? Why wouldn't just \Umathcode #1 = work? – facetus Oct 04 '20 at 23:31
  • @facetus Good question! Actually it would, but probably I first wrote different code that required the letter and simply changed a part. – egreg Oct 05 '20 at 07:28
  • Aha! Now I understand. :) Thank you! – facetus Oct 06 '20 at 08:09

There has been some discussion about Cyrillic in math but the issue is still open: https://github.com/wspr/unicode-math/issues/29

As Joseph mentioned, you need fonts that contain the glyphs. In math (if you don't switch to a text font with \text{..}) you also need to set math codes, e.g.:

\documentclass[12pt]{book}
\usepackage{unicode-math}
\setmainfont{Arial Unicode MS} %for text
\setmathfont[]{xits-math.otf}
\ExplSyntaxOn
\makeatletter
\newcommand\addmathletter[1]{%
  \Umathcode  #1="\mathchar@type\mathalpha \csname sym\um_symfont_tl\endcsname #1\relax
}
\int_step_inline:nnnn {1024}{1}{1154}{
  \addmathletter{#1}
}
\ExplSyntaxOff

\begin{document}
Просто буквы % Plain letters
$$Память: M_{доп}(n) = \Theta(N)$$ % Memory: M_add(n) = \Theta(n)
\end{document}


(I copied and adapted the \Umathcode line from an old message on the xetex list, and I don't know what the quote sign at the beginning is doing there ...)
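A comment below reports that this code now fails with newer XeLaTeX versions. A plausible cause (an assumption, not verified here) is that it relies on the internal unicode-math variable \um_symfont_tl, whose name may have changed in later releases. A sketch that avoids internal names by declaring its own symbol font, as egreg's answer does; the text font is an assumption, and the {it} shape is carried over from egreg's answer ({n} gives upright letters):

```latex
\documentclass[12pt]{book}
\usepackage{amsmath}
\usepackage{unicode-math}
\setmainfont{CMU Serif}   % assumption: any installed text font with Cyrillic glyphs
\setmathfont{XITS Math}

% Own symbol font for Cyrillic math letters (italic shape of the main font),
% so no internal unicode-math variable is needed
\DeclareSymbolFont{cyrletters}{\encodingdefault}{\familydefault}{m}{it}

\ExplSyntaxOn
% Same code-point range as the answer: 1024..1154 = U+0400..U+0482.
% Class 0 (ordinary) as in egreg's answer; the answer used class 7 (\mathalpha).
\int_step_inline:nnn { 1024 } { 1154 }
  { \Umathcode #1 = "0 ~ \use:c { symcyrletters } ~ #1 \scan_stop: }
\ExplSyntaxOff

\begin{document}
Просто буквы % Plain letters
\[
Память: M_{доп}(n) = \Theta(N)
\]
\end{document}
```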

Ulrike Fischer
  • The " is the hexadecimal prefix, as usual. – egreg Sep 15 '14 at 12:34
  • @egreg: Yes, I suspected this. But it seems not to make a difference whether it is there or not. OK, looking at it, \mathchar@type\mathalpha is 7 and so "7=7. – Ulrike Fischer Sep 15 '14 at 12:50
  • Since math types are from 0 to 7 it really doesn't make any difference. – egreg Sep 15 '14 at 12:53
  • Exactly the result I wanted. Thanks a lot. I know that non-math text shouldn't be in math mode, but I need Cyrillic, and I got it. – Lapshin Dmitry Sep 15 '14 at 16:16
  • Hmm, a bit late to this answer but I ran into needing this today as well, and the above code with xelatex 0.999992 throws an error on the } after \relax with the log file saying "End of file on the terminal!". Any idea how to fix that? – Mike 'Pomax' Kamermans Jan 10 '21 at 21:02

Just for fun, here is the general case:


MWE

\documentclass[12pt]{article}

\usepackage{amsmath,amssymb}
\usepackage{unicode-math}

\setmainfont{Liberation Serif}

\setmathfont{XITS Math}



%See: https://tex.stackexchange.com/questions/201239/cant-get-unicode-symbols-in-math-mode

\DeclareSymbolFont{cyrletters}{\encodingdefault}{\familydefault}{m}{it}
\newcommand{\makecyrmathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symcyrletters\endcsname\space #1
}
\count255="409
\loop\ifnum\count255<"44F
  \advance\count255 by 1
  \makecyrmathletter{\count255}
\repeat

%-----------
\setmainfont{Noto Serif Armenian}
%\familydefault = \rmdefault
\DeclareSymbolFont{armletters}{\encodingdefault}{NotoSerifArmenian(0)}{m}{n}
\newcommand{\makearmmathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symarmletters\endcsname\space #1
}
\count255="530
\loop\ifnum\count255<"587
  \advance\count255 by 1
  \makearmmathletter{\count255}
\repeat



%-----------
\setmainfont{Noto Serif Georgian}
\DeclareSymbolFont{geoletters}{\encodingdefault}{NotoSerifGeorgian(0)}{m}{n}
\newcommand{\makegeomathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symgeoletters\endcsname\space #1
}
\count255="109F
\loop\ifnum\count255<"10FA
  \advance\count255 by 1
  \makegeomathletter{\count255}
\repeat



%-----------
\setmainfont{Noto Serif Lao}
\DeclareSymbolFont{laoletters}{\encodingdefault}{NotoSerifLao(0)}{m}{n}
\newcommand{\makelaomathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symlaoletters\endcsname\space #1
}
\count255="0E80
\loop\ifnum\count255<"0EDF
  \advance\count255 by 1
  \makelaomathletter{\count255}
\repeat



%-----------
\setmainfont{Noto Sans Egyptian Hieroglyphs}
\DeclareSymbolFont{egyletters}{\encodingdefault}{NotoSansEgyptianHieroglyphs(0)}{m}{n}
\newcommand{\makeegymathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symegyletters\endcsname\space #1
}
%\count255="13000
%\loop\ifnum\count255<"1342E %too many?
%  \advance\count255 by 1
%  \makeegymathletter{\count255}
%\repeat
% has 1000 glyphs
\makeegymathletter{"13000}
\makeegymathletter{"13068}
\makeegymathletter{"1307B}
\makeegymathletter{"130D8}
\makeegymathletter{"131C1}



%-----------
\setmainfont{Noto Serif}


\begin{document}

Cyrillic:

\[
(д+ф)^{2}=д^{2}+2дф+ф^{2}м
\]

Armenian:

\[
(է+թ)^{2}=գ^{2}+2ե+ճդ^2-ա
\]

Georgian:

\[
(დ+ლ)^{2}=შ^{2}+2ლ+დშ^2-ა
\]

Lao:

\[
(ມ+ວ)^{2}=ມ^{2}+2ນ+ສວ^2-ກ
\]

Egyptian Hieroglyphs:

\[
(+)^{2}=^{2}+2+^2-
\]

\end{document}

And, of course, symbols can be combined:


Here is a variant that uses the fontspec package's NFSSFamily= font option to name the font families, rather than relying on the internal auto-generated names. The other font options, such as colour and scale, are then available as well.

MWE

\documentclass[12pt]{article}
\usepackage{xcolor}
\usepackage{amsmath,amssymb}
\usepackage{unicode-math}

\setmainfont{Liberation Serif}

\setmathfont{XITS Math}

\newfontfamily\fcyr{Noto Serif}[Colour=blue,NFSSFamily=mycyr]
\newfontfamily\farm{Noto Serif Armenian}[Colour=red,NFSSFamily=myarm]
\newfontfamily\fgeo{Noto Serif Georgian ExtraBold}[Colour=brown,NFSSFamily=mygeo]
\newfontfamily\flao{Noto Serif Lao}[Scale=1.2,NFSSFamily=mylao]
\newfontfamily\fegy{Noto Sans Egyptian Hieroglyphs}[Colour=blue,Scale=1.1,NFSSFamily=myegy]

%See: https://tex.stackexchange.com/questions/201239/cant-get-unicode-symbols-in-math-mode

\DeclareSymbolFont{cyrletters}{\encodingdefault}{mycyr}{m}{it}
\newcommand{\makecyrmathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symcyrletters\endcsname\space #1
}
\count255="409
\loop\ifnum\count255<"44F
  \advance\count255 by 1
  \makecyrmathletter{\count255}
\repeat

%-----------
\DeclareSymbolFont{armletters}{\encodingdefault}{myarm}{m}{n}
\newcommand{\makearmmathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symarmletters\endcsname\space #1
}
\count255="530
\loop\ifnum\count255<"587
  \advance\count255 by 1
  \makearmmathletter{\count255}
\repeat



%-----------
\DeclareSymbolFont{geoletters}{\encodingdefault}{mygeo}{m}{n}
\newcommand{\makegeomathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symgeoletters\endcsname\space #1
}
\count255="109F
\loop\ifnum\count255<"10FA
  \advance\count255 by 1
  \makegeomathletter{\count255}
\repeat



%-----------
\DeclareSymbolFont{laoletters}{\encodingdefault}{mylao}{m}{n}
\newcommand{\makelaomathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symlaoletters\endcsname\space #1
}
\count255="0E80
\loop\ifnum\count255<"0EDF
  \advance\count255 by 1
  \makelaomathletter{\count255}
\repeat



%-----------
\DeclareSymbolFont{egyletters}{\encodingdefault}{myegy}{m}{n}
\newcommand{\makeegymathletter}[1]{%
  \begingroup\lccode`a=#1\lowercase{\endgroup
  \Umathcode`a}="0 \csname symegyletters\endcsname\space #1
}
%\count255="13000
%\loop\ifnum\count255<"1342E %too many?
%  \advance\count255 by 1
%  \makeegymathletter{\count255}
%\repeat
% has 1000 glyphs
\makeegymathletter{"13000}
\makeegymathletter{"13068}
\makeegymathletter{"1307B}
\makeegymathletter{"130D8}
\makeegymathletter{"131C1}



%-----------


\begin{document}

Cyrillic:

\[
(д+ф)^{2}=д^{2}+2дф+ф^{2}м
\]

Armenian:

\[
(է+թ)^{2}=գ^{2}+2ե+ճդ^2-ա
\]

Georgian:

\[
(დ+ლ)^{2}=შ^{2}+2ლ+დშ^2-ა
\]

Lao:

\[
(ມ+ວ)^{2}=ມ^{2}+2ນ+ສວ^2-ກ
\]

Egyptian Hieroglyphs:

\[
(+)^{2}=^{2}+2+^2-
\]

Combined:

\[
(_{ມ^უ}+)^{2}=^{2}+\frac{2}{է}+ф(Զ^2)-Ⴔ
\]

\end{document}
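The five blocks above differ only in the symbol font name and the code-point range. A sketch of a single helper that could replace them, assuming a LaTeX release from October 2020 or later (where \NewDocumentCommand and expl3 are built in; otherwise load xparse); the command name \setmathletterrange is hypothetical, and only two scripts are shown:

```latex
\documentclass[12pt]{article}
\usepackage{amsmath}
\usepackage{unicode-math}

\setmainfont{Liberation Serif}
\setmathfont{XITS Math}

% Text fonts for the letters, with NFSS family names chosen by us
\newfontfamily\fcyr{Noto Serif}[NFSSFamily=mycyr]
\newfontfamily\farm{Noto Serif Armenian}[NFSSFamily=myarm]

\DeclareSymbolFont{cyrletters}{\encodingdefault}{mycyr}{m}{it}
\DeclareSymbolFont{armletters}{\encodingdefault}{myarm}{m}{n}

\ExplSyntaxOn
% Hypothetical helper: \setmathletterrange{<symbol font>}{<first>}{<last>}
% gives every code point from <first> to <last> (inclusive) an ordinary
% (class "0) math code whose glyph is taken from the named symbol font.
\NewDocumentCommand \setmathletterrange { m m m }
  {
    \int_step_inline:nnn {#2} {#3}
      { \Umathcode ##1 = "0 ~ \use:c { sym #1 } ~ ##1 \scan_stop: }
  }
\ExplSyntaxOff

\setmathletterrange{cyrletters}{"410}{"44F}  % А..я
\setmathletterrange{armletters}{"531}{"587}  % range used in the MWE above

\begin{document}
\[
(д+ф)^{2}=д^{2}+2дф+ф^{2}
\]
\[
(է+թ)^{2}=գ^{2}+2ե+ճդ^2-ա
\]
\end{document}
```

Adding another script then takes one \newfontfamily, one \DeclareSymbolFont and one \setmathletterrange line.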
Cicada