
ucharclasses and Greek

I find that the ucharclasses package does not correctly identify all Unicode Greek characters.

For example, if I use

\usepackage{fontspec}
\usepackage[Greek]{ucharclasses}
\setTransitionsForGreek{\fontspec{Palatino Linotype}}{\fontfamily{lmodern}\selectfont}

and then enter the following in the document:

Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας

the output is

Πέτρος ἀ Ἰῦ Χριστοῦ ἐῖ παρεπιδήμοις διασπορᾶ Πόντου, Γαλατίας, Καππαδοκίας, Ἀ καὶ Βιθυνίας

i.e. some characters are missing.

Even stranger, I found in the documentation that the Greek group actually consists of the blocks Coptic, GreekAndCoptic and GreekExtended. When I set them up individually, more characters are rendered, but some are still missing.

I also tried not using ucharclasses and applying \fontspec{Palatino Linotype} directly to the text, and that works perfectly.

Any idea why this happens? Thanks!

Edit 1

Thanks. I switched to font families, and I also ran a test to show what I mean:

\usepackage{lmodern}
\usepackage{fontspec}
\usepackage[Greek]{ucharclasses}
\newfontfamily\mylatin{Palatino Linotype}
\newfontfamily\mygreek{Palatino Linotype}
\setTransitionsForGreek{@EnterGreek@ \mygreek}{\mylatin @ExitGreek@}

The output is this:

@EnterGreek@Πέτρος@ExitGreek@@EnterGreek@ἀ@ExitGreek@πόστολος@ExitGreek@ @EnterGreek@ Ἰ@ExitGreek@ησο@EnterGreek@ ῦ@ExitGreek@ @EnterGreek@ Χριστο@EnterGreek@ ῦ@ExitGreek@ @EnterGreek@ ἐ@ExitGreek@κλεκτο@EnterGreek@ ῖ@ExitGreek@ς@ExitGreek@ @EnterGreek@ παρεπιδήμοις@ExitGreek@ @EnterGreek@ διασπορ@EnterGreek@ ᾶ@ExitGreek@ς@ExitGreek@ @EnterGreek@ Πόντου, @EnterGreek@ Γαλατίας, @EnterGreek@ Καππαδοκίας, @EnterGreek@ Ἀ@ExitGreek@σίας@ExitGreek@ @EnterGreek@ κα@EnterGreek@ ὶ@ExitGreek@ @EnterGreek@ Βιθυνίας@ExitGreek@

Compare this with the input:

Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας

Focusing on the 2nd word, ἀπόστολος, for example: it "exits Greek" after ἀ and treats πόστολος as Latin.

So I actually think this is a bug in ucharclasses.

Edit 2

I'm not sure whether this is related to a limitation of ucharclasses rather than a bug. From the documentation:

However, be aware that this only “just works” for Unicode blocks. If you are working with typographically overlapping languages, such as combining English and Vietnamese in one document, things get a lot more complex if you want one font for English and another for Vietnamese. Both of these languages use Latin blocks, so it is inherently impossible to tell which language is intended based on which Unicode block a character in a word belongs to.

However, it doesn't mention anything about Greek. Any idea?

Edit 3

Is an MWE like this OK?

\documentclass[a4paper]{article}
\usepackage{fontspec}
\usepackage[Greek]{ucharclasses}
\newfontfamily\mygreek{Palatino Linotype}
\setTransitionsForGreek{@Begin@\mygreek}{\fontfamily{lmodern}\selectfont @End@}
\begin{document}
Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας
\end{document}

The tags @Begin@ and @End@ are there to show the bug I'm talking about.

Edit 4

Thanks for the answer that solved the problem. I hadn't included Latin, and that was the root of the whole problem.

So the correct code looks like this:

The Right Way

\documentclass[a4paper]{article}
\usepackage{fontspec}
\usepackage[Latin, Greek]{ucharclasses}
\newfontfamily\mygreek{CMU Serif}
\setDefaultTransitions{\fontfamily{lmodern}\selectfont}{}
\setTransitionsForGreek{\mygreek}{}
\begin{document}
\fontfamily{lmodern}\selectfont
\section{Testing Greek Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας}
\begin{itemize}
\item Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας
\end{itemize}
testing
\end{document}

But now I have another problem: the Latin text in the section title, "Testing Greek", is not bold.

Suppose I use my old code, which has the bug of not producing the right Greek:

The Old Way

\documentclass[a4paper]{article}
\usepackage{fontspec}
\usepackage[Greek]{ucharclasses}
\newfontfamily\mygreek{CMU Serif}
% \setDefaultTransitions{\fontfamily{lmodern}\selectfont}{}
\setTransitionsForGreek{\mygreek}{\fontfamily{lmodern}\selectfont}
\begin{document}
\fontfamily{lmodern}\selectfont
\section{Testing Greek Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας}
\begin{itemize}
\item Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας
\end{itemize}
testing
\end{document}

Then, although some Greek characters are missing, the Latin text in the section title, "Testing Greek", is bold.

The Control

A control test like this:

\documentclass[a4paper]{article}
% \usepackage{fontspec}
% \usepackage[Greek]{ucharclasses}
% \newfontfamily\mygreek{CMU Serif}
% \setDefaultTransitions{\fontfamily{lmodern}\selectfont}{}
% \setTransitionsForGreek{\mygreek}{\fontfamily{lmodern}\selectfont}
\begin{document}
\fontfamily{lmodern}\selectfont
\section{Testing Greek }% Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας}
\begin{itemize}
\item testing %Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας
\end{itemize}
testing
\end{document}

gives the correct bold section title.

I also noticed other problems with "the Right Way": when the text leaves the Greek Unicode block, the font does not change back to the default, so the next item bullet is set in the Greek font, and so on; i.e. the bullets end up in different fonts. With "the Old Way", where I use \setTransitionsForGreek{\mygreek}{\fontfamily{lmodern}\selectfont}, the exit code always ends with \fontfamily{lmodern}\selectfont, so that problem does not occur.

I would say all of this originates from the mystery that the Greek block ends somewhere in the middle of words, which I still regard as a bug (see the test in the 1st edit). I have a guess about the possible reason. Referring to my original question:

Greek actually called Coptic, GreekAndCoptic and GreekExtended...

Maybe the breaks occur between the Coptic, GreekAndCoptic and GreekExtended blocks.
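One way to test this guess is to print the code point of individual characters and see which Unicode block each one falls in. The sketch below is not from the thread: \showcp is a hypothetical helper, CMU Serif is only an assumption (any installed font with polytonic Greek should do), and it assumes the input uses precomposed characters.

```latex
% Compile with XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage{fontspec}   % also loads expl3, which provides \int_to_Hex:n
\setmainfont{CMU Serif} % assumption: a font with polytonic Greek is installed

\ExplSyntaxOn
% \showcp (hypothetical helper): typeset a character and its code point in hex
\newcommand \showcp [1] { #1 ~ = ~ U+ \int_to_Hex:n { `#1 } \par }
\ExplSyntaxOff

\begin{document}
\showcp{π} % a letter from the middle of ἀπόστολος
\showcp{ἀ} % the letter after which the tagged output "exits Greek"
\showcp{ῦ} % another letter that gets isolated in the tagged output
\end{document}
```

Code points U+0370 to U+03FF belong to the Greek and Coptic block, and U+1F00 to U+1FFF to Greek Extended. If the characters where the tagged output breaks (ἀ, Ἰ, ῦ, ἐ, ῖ, ᾶ, Ἀ, ὶ) report values in the second range while their neighbours are in the first, the breaks sit exactly at block boundaries.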

Edit 5

I think I have more or less found a solution, although I don't understand why it works. If I use "the Right Way" without adding Latin to the ucharclasses options, the Latin text in the section title is bold as usual.

Kolen Cheung
  • Welcome to TeX.SX! You can have a look at our starter guide to familiarize yourself further with our format. – karlkoeller Mar 03 '15 at 08:41
  • You might want to add a real MWE that we can compile and investigate further. – moewe Mar 03 '15 at 08:52
  • You shouldn't use \fontspec to begin with, but rather define a font family in the preamble. The manual should be fixed about that. However, ucharclasses should only be used if just single words that don't need hyphenation are used. In your case, I'd use polyglossia and mark the transitions with the suitable commands or environments. – egreg Mar 03 '15 at 09:13
  • Thanks. I did a test and found a better way to present the problem I was talking about. I edited the main text, so please see it there (it is too long for me to type here in a comment). – Kolen Cheung Mar 05 '15 at 02:36
  • Did you try egreg's suggestion and use polyglossia instead? Note that you still haven't provided an MWE so people can't reproduce. – cfr Mar 05 '15 at 04:00
  • I want to avoid polyglossia for the moment. What I want to achieve is to freely mix English (main), Greek, Hebrew and Traditional Chinese without specifying what I am typing. So far ucharclasses is working well; the only troublemaker is Greek. \fontfamily{cm-unicode}\selectfont – Kolen Cheung Mar 05 '15 at 04:12
  • I have a quick question: how do I select cm-unicode as the default font? I am not very familiar with fontspec. It is confusing to me that for other fonts I use \setmainfont, e.g. \setmainfont{CMU Serif}, but for an "internal" font like lmodern I need to use \fontfamily{lmodern}\selectfont. I then tried \fontfamily{cm-unicode}\selectfont, but it doesn't seem to work. – Kolen Cheung Mar 05 '15 at 04:16
  • Is this MWE?

    \documentclass[a4paper]{article}

    \usepackage{fontspec}

    \usepackage[Greek]{ucharclasses}

    \newfontfamily\mygreek{Palatino Linotype}

    \setTransitionsForGreek{@Begin@\mygreek}{\fontfamily{lmodern}\selectfont @End@}

    \begin{document}

    Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας

    \end{document}

    – Kolen Cheung Mar 05 '15 at 04:23
  • As far as I can see, it works with the following code: \newfontfamily\mygreek{Palatino Linotype} \setTransitionsForGreek{\mygreek}{} – Ludenticus Mar 05 '15 at 05:11
  • In case you want to use CMU Serif or Concrete, download and install the (0.7 version) font from sourceforge: http://sourceforge.net/projects/cm-unicode/files/ – Ludenticus Mar 05 '15 at 05:14
  • Thanks. I see that cm-unicode is included in TeX Live, so shouldn't I be able to use it with something like \fontfamily{cm-unicode}\selectfont? – Kolen Cheung Mar 05 '15 at 06:04
  • Quick summary: 1. Make sure the font you select contains all the glyphs and faces you use (the .log file should tell you if it’s missing any), 2. With ucharclasses, make sure you’re loading all blocks containing Greek characters, not just “Greek and Coptic” (U+0370–U+03FF), which supports only monotonic modern Greek and Coptic. – Davislor May 07 '19 at 19:13
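
egreg's suggestion in the comments above (polyglossia with explicit markup instead of automatic detection) might look roughly like the following. This is a minimal sketch rather than code from the thread; it assumes XeLaTeX, that CMU Serif is installed as an OpenType font, and that the ancient Greek variant is the appropriate one for this text.

```latex
% Compile with XeLaTeX.
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[variant=ancient]{greek}
% polyglossia uses \greekfont for Greek text when it is defined
\newfontfamily\greekfont{CMU Serif} % assumption: CMU Serif is installed

\begin{document}
Testing \textgreek{Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ} testing
\end{document}
```

The trade-off is that every Greek passage has to be marked by hand with \textgreek, which is what the asker wanted to avoid.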

3 Answers


Try the following code:

\documentclass[12pt,a4paper]{article}
\usepackage[Latin,Greek]{ucharclasses}
\usepackage{fontspec}
\newfontfamily\mynormal{Palatino Linotype}
\setDefaultTransitions{\mynormal}{}
\newfontfamily\mygreek{Junicode} 
\setTransitionsForGreek{\mygreek}{}
\usepackage{lipsum}
\begin{document}
    \lipsum[1]
    Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας
\end{document}


Ludenticus
  • Thanks. I feel so stupid: I spent almost 10 hours searching for this and reading the documentation, etc. I can't believe how simple the solution is. The main mistake I made was not including Latin.

    I then used the "tags" to test it again, and it looks like:

    @Begin@Πέτρος@end@ @Begin@ἀ@end@@Begin@πόσ τολος@end...

    Although it still breaks ἀ@end@@Begin@πόσ τολος in a strange way, the output is correct if the tags are omitted.

    Now, should I go back to the original question to tidy things up, or leave it as is? And is there any way to mark the problem as solved?

    Thanks again!

    – Kolen Cheung Mar 05 '15 at 06:03
  • I found another problem using this approach. I'm adding an edit 4 at the end of the question. Thanks. – Kolen Cheung Mar 05 '15 at 07:33
  • It turns out that omitting Latin from the ucharclasses options solves the problem of bold titles. But I don't understand why. @Ludenticus – Kolen Cheung Mar 05 '15 at 08:05
  • I found out that omitting Latin leads to other problems. Now I'm at a dead end. @Ludenticus – Kolen Cheung Mar 05 '15 at 08:40
  • And I found out that it is related to the fact that I used \setDefaultTransitions{\fontfamily{lmodern}\selectfont}{}. If I use something else, like \setDefaultTransitions{CMU Serif}{}, the title is properly bold again. However, that causes me other problems. Dealing with Unicode in LaTeX is really troublesome. @Ludenticus – Kolen Cheung Mar 05 '15 at 08:53
  • Not every TeX Live font is available to XeLaTeX; you need the otf/ttf files. On the other hand, it looks like you are dealing with several variables. I suggest that you try to isolate each problem. Here you are dealing with at least three different issues: first, the fonts as such (how to properly load them); second, the font and documentclass features; third, how to load the ucharclasses package. It is very common (it has happened to me many times) that a «bug» was actually a mistake on my part. – Ludenticus Mar 05 '15 at 14:44
  • Have you tried, for instance with \setmainfont ? – Ludenticus Mar 05 '15 at 14:54
  • The solution I found is \renewcommand\rmdefault{lmr}\renewcommand\sfdefault{lmss}\renewcommand\ttdefault{lmtt} to reset the default. The differences between an internal LaTeX font and an OTF font are confusing to me, and I have just learnt from you that not all TeX Live fonts are available to XeLaTeX. Even worse, I can't find any documentation or guide summarizing what to do in which case. It was quite frustrating. I agree with your breakdown into 3 things. The 1st is what I just described; I don't understand the 2nd; and as for the 3rd, I think I understand ucharclasses, including some of its bugs/limits. – Kolen Cheung Mar 05 '15 at 20:14
  • So, about the 1st: how to load fonts. In XeLaTeX, can I use \fontfamily{lmodern}\selectfont? How about \fontfamily{cm-modern}\selectfont? How do they affect rm/sf/tt (do they change all 3)? After another font is selected, what's the difference between using \fontfamily{lmodern}\selectfont and \renewcommand\rmdefault{lmr}\renewcommand\sfdefault{lmss}\renewcommand\ttdefault{lmtt}? @Ludenticus – Kolen Cheung Mar 05 '15 at 20:18

Solution 1

I finally found the answer in this post:

\documentclass{article}
\usepackage{fontspec}
\usepackage[Greek]{ucharclasses}
\newcommand{\mylatin}{\renewcommand\rmdefault{lmr}\renewcommand\sfdefault{lmss}\renewcommand\ttdefault{lmtt}}
\newfontfamily\mygreek{CMU Serif}
% \setDefaultTransitions{\fontfamily{lmodern}\selectfont}{}
\setTransitionsForGreek{\mygreek}{\mylatin}
\begin{document}
\section{Testing Greek Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας}
\begin{itemize}
\item Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας
\end{itemize}
testing
\end{document}

I don't understand why \renewcommand\rmdefault{lmr}\renewcommand\sfdefault{lmss}\renewcommand\ttdefault{lmtt} works better than \fontfamily{lmodern}\selectfont, but it does.
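
A possible explanation, offered as an assumption rather than something confirmed in this thread: lmodern is the name of a package, not of a font family, while lmr, lmss and lmtt (the names in the \renewcommand line) are the actual Latin Modern families. When an undefined family is requested, LaTeX substitutes a fallback font, and that fallback can drop the bold series, which would match the non-bold section title. A minimal sketch to check this under XeLaTeX:

```latex
% Compile with XeLaTeX; a minimal sketch, not taken from the thread.
\documentclass{article}
\usepackage{fontspec}
\begin{document}
% "lmodern" is not a defined font family, so LaTeX substitutes a fallback;
% check the .log file for a font shape warning on this line.
{\bfseries\fontfamily{lmodern}\selectfont Testing Greek}

% "lmr" is the Latin Modern Roman family, so the bold series should survive.
{\bfseries\fontfamily{lmr}\selectfont Testing Greek}
\end{document}
```

If the first line comes out in medium weight and the second in bold, then switching back with \fontfamily{lmr}\selectfont (or \rmfamily) instead of \fontfamily{lmodern}\selectfont may be enough.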

And I found out that \setDefaultTransitions is not needed if no other Unicode blocks are in use.

I also found out that the Latin option in \usepackage[Latin, Greek]{ucharclasses} is not necessary either, as long as the 2nd argument of \setTransitionsForGreek includes whatever I would otherwise put in \setTransitionsForLatin. I prefer this method, since it makes sure that whenever the text leaves a given Unicode block the font switches right back to the default, so things like item bullets use the same font.

Now it's almost perfect, but there is still a problem handling the transition to Chinese, and I don't understand why. Maybe I should start a new question later.

Lastly, after dozens of hours figuring out how to handle Unicode in LaTeX, I am still left with many mysteries, quirks and unsolved problems (the breaking within Greek words, Chinese font transitions, etc.). So frustrating.

Solution 2

I found that the origin of the problem is that I used \newfontfamily, as suggested by some. I dug into the fontspec documentation, and it says that the \fontspec command is for one-time use. If I define a font with \newfontfamily and let ucharclasses select it, the text then stays in that font for good. To go back, I originally used \fontfamily{lmodern}\selectfont, which somehow didn't work. The resolution above is to use \renewcommand\rmdefault{lmr}\renewcommand\sfdefault{lmss}\renewcommand\ttdefault{lmtt}. However, the simplest way is to use \fontspec from the very beginning, so that I don't have to fight with the "going back to default" process. This is what I mean:

\documentclass{article}
\usepackage{fontspec}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForGreek{\fontspec{CMU Serif}}{}
\begin{document}
\section{Testing Greek Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας}
\begin{itemize}
\item Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας
\end{itemize}
testing
\end{document}

Solution 3

All the problems originate from the fact that I want to go back to the default lmodern, and that lmodern doesn't support Greek. I compared lmodern and cm-unicode and found that the differences are pretty minor. If I don't want to deal with the mess mentioned above, and don't mind using a slightly different font from lmodern, then I can just change the default to cm-unicode:

\documentclass{article}
\usepackage{fontspec}
\setmainfont{CMU Serif}
\setsansfont{CMU Sans Serif}
\setmonofont{CMU Typewriter Text}
\begin{document}
\section{Testing Greek Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας}
\begin{itemize}
\item Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας
\end{itemize}
testing
\end{document}

Kolen Cheung

"Default" means 'minimum required to run the app'; it doesn't mean "the recommended setting". You choose which main font(s) to use as the default for your document. Look in the texmf tree, under fonts: the Latin Modern font families are available as OpenType fonts (under \fonts\opentype\public\lm\). Nowadays, Unicode system fonts are available too, not just local TeX fonts. For example, Noto Serif contains Latin, Greek and Coptic, and Cyrillic, which means you don't have to use transition packages to manage these scripts:


(Greek and Coptic scripts go together as a pair, because of their large overlap.)

MWE

\documentclass[12pt]{article} 
\usepackage{fontspec}
\setmainfont{Noto Serif}
\newfontface\fgla{Fedorovsk Unicode TT}
\begin{document}
\section{Testing Greek} Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας
\begin{itemize}
\item Πέτρος ἀπόστολος Ἰησοῦ Χριστοῦ ἐκλεκτοῖς παρεπιδήμοις διασπορᾶς Πόντου, Γαλατίας, Καππαδοκίας, Ἀσίας καὶ Βιθυνίας
\item Петр, Апостол Иисуса Христа, пришельцам, рассеянным в Понте, Галатии, Каппадокии, Асии и Вифинии, избранным,

from:
1-е послание Петра 1 глава – Библия: https://bible.by/syn/46/1/
\end{itemize}

A Glagolitic font in the Tex distribution:

{\fgla \huge Петр, Апостол Иисуса Христа, пришельцам,}


\end{document}

Question here describes the various ways to get back to the fontspec default settings (including Latin Modern; it used to be CM), and also how to switch font encodings if desired.

Cicada