How to generate compounded diacritical fonts for Sanskrit with XeTeX and LuaTeX?

Question

I'm trying to use diacritical marks for Sanskrit with a font that does not natively support all of them (Minion Pro), so XeTeX has to compose the missing characters somehow. This almost works with:

\documentclass{scrartcl}
\usepackage{xunicode,xltxtra}
\usepackage{fontspec,newunicodechar}
\defaultfontfeatures{Ligatures=TeX}
\setmainfont{Minion Pro}

\newunicodechar{Ṛ}{\d{R}}
\newunicodechar{ṛ}{\d{r}}
\newunicodechar{Ṝ}{\={\d{R}}}
\newunicodechar{ṝ}{\={\d{r}}}
\newunicodechar{Ḷ}{\d{L}}
\newunicodechar{ḷ}{\d{l}}
\newunicodechar{Ḹ}{\={\d{L}}}
\newunicodechar{ḹ}{\={\d{l}}}
\newunicodechar{ṃ}{\d{m}}
\newunicodechar{ḥ}{\d{h}}
\newunicodechar{Ṭ}{\d{T}}
\newunicodechar{ṭ}{\d{t}}
\newunicodechar{Ḍ}{\d{D}}
\newunicodechar{ḍ}{\d{d}}
\newunicodechar{Ṅ}{\.{N}}
\newunicodechar{ṅ}{\.{n}}
\newunicodechar{Ṇ}{\d{N}}
\newunicodechar{ṇ}{\d{n}}
\newunicodechar{Ṣ}{\d{S}}
\newunicodechar{ṣ}{\d{s}}

\begin{document}
a  A

ā  Ā

i  I

ī  Ī

u  U

ū  Ū

ṛ  Ṛ

ṝ  Ṝ

ḷ  Ḷ

ḹ  Ḹ

e  E

ai  Ai

o  O

au  Au

ṃ ḥ

k  K

c  C

ṭ  Ṭ

t  T

p  P

kh  Kh

ch  Ch

ṭh  Ṭh

th  Th

ph  Ph

g  G

j  J

ḍ  Ḍ

d  D

b  B

gh  Gh

jh  Jh

ḍh  Ḍh

dh  Dh

bh  Bh

ṅ  Ṅ

ñ  Ñ

ṇ  Ṇ

n  N

m  M

y  Y

r  R

l  L

v  V

ś  Ś

ṣ  Ṣ

s  S

h  H

\end{document}

But the macrons above Ḹ, ḹ, Ṝ, and ṝ are misplaced and thinner than the ones above ā, ī and so on. What can I do about that?

Second problem: this fails completely in LuaTeX. xltxtra seems to be essential for it to work, but that package is not supported in LuaTeX. I tried the code from https://tex.stackexchange.com/a/20791/19458 instead.

But with that code the dots below are bigger than the dot of the i, which is not the case with the XeTeX solution above. Any idea how to do this better in LuaTeX? Is it possible at all?

You're missing the \UndeclareUTFcomposite declarations and the redefinition of \d. A redefinition of \. is probably necessary as well. — egreg, Nov 27 '12 at 23:47
In XeTeX this works without \UndeclareUTFcomposite. In LuaTeX, even with \UndeclareUTFcomposite, the dots are too big with the solution cited above. How am I supposed to redefine \d and \.? — Psychic Birdy, Nov 29 '12 at 08:46

egreg · Accepted Answer · 2017-03-28T17:26:39.577

Here is something that seems to work:

\documentclass{scrartcl}
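% Current fontspec needs the euenc option for the \UndeclareUTFcomposite lines below
\usepackage[euenc]{fontspec}
\usepackage{newunicodechar}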
\defaultfontfeatures{Ligatures=TeX}
\setmainfont{Minion Pro}

\UndeclareUTFcomposite[\UTFencname]{x1E0C}{\d}{D}
\UndeclareUTFcomposite[\UTFencname]{x1E0D}{\d}{d}
\UndeclareUTFcomposite[\UTFencname]{x1E25}{\d}{h}
\UndeclareUTFcomposite[\UTFencname]{x1E36}{\d}{L}
\UndeclareUTFcomposite[\UTFencname]{x1E37}{\d}{l}
\UndeclareUTFcomposite[\UTFencname]{x1E43}{\d}{m}
\UndeclareUTFcomposite[\UTFencname]{x1E46}{\d}{N}
\UndeclareUTFcomposite[\UTFencname]{x1E47}{\d}{n}
\UndeclareUTFcomposite[\UTFencname]{x1E5A}{\d}{R}
\UndeclareUTFcomposite[\UTFencname]{x1E5B}{\d}{r}
\UndeclareUTFcomposite[\UTFencname]{x1E62}{\d}{S}
\UndeclareUTFcomposite[\UTFencname]{x1E63}{\d}{s}
\UndeclareUTFcomposite[\UTFencname]{x1E6C}{\d}{T}
\UndeclareUTFcomposite[\UTFencname]{x1E6D}{\d}{t}

\UndeclareUTFcomposite[\UTFencname]{x1E44}{\.}{N}
\UndeclareUTFcomposite[\UTFencname]{x1E45}{\.}{n}

\makeatletter
\let\d\relax
\DeclareRobustCommand{\d}[1]
   {\hmode@bgroup
    \o@lign{\relax#1\crcr\hidewidth\ltx@sh@ft{-1ex}.\hidewidth}\egroup}
\let\.\relax
\DeclareRobustCommand{\.}[1]{\accent"02D9#1}
\DeclareRobustCommand{\MACRON}[1]{\accent"AF#1}
\makeatother

\newunicodechar{Ḍ}{\d{D}}
\newunicodechar{ḍ}{\d{d}}
\newunicodechar{ḥ}{\d{h}}
\newunicodechar{Ḷ}{\d{L}}
\newunicodechar{ḷ}{\d{l}}
\newunicodechar{ṃ}{\d{m}}
\newunicodechar{Ṇ}{\d{N}}
\newunicodechar{ṇ}{\d{n}}
\newunicodechar{Ṛ}{\d{R}}
\newunicodechar{ṛ}{\d{r}}
\newunicodechar{Ṣ}{\d{S}}
\newunicodechar{ṣ}{\d{s}}
\newunicodechar{Ṭ}{\d{T}}
\newunicodechar{ṭ}{\d{t}}

\newunicodechar{Ṅ}{\.{N}}
\newunicodechar{ṅ}{\.{n}}

\newunicodechar{Ḹ}{\d{\MACRON{L}}}
\newunicodechar{ḹ}{\d{\MACRON{l}}}
\newunicodechar{Ṝ}{\d{\MACRON{R}}}
\newunicodechar{ṝ}{\d{\MACRON{r}}}

\begin{document}
\parbox{.5\textwidth}{
a  A
ā  Ā
i  I
ī  Ī
u  U
ū  Ū
ṛ  Ṛ
ṝ  Ṝ
ḷ  Ḷ
ḹ  Ḹ
e  E
ai  Ai
o  O
au  Au
ṃ ḥ
k  K
c  C
ṭ  Ṭ
t  T
p  P
kh  Kh
ch  Ch
ṭh  Ṭh
th  Th
ph  Ph
g  G
j  J
ḍ  Ḍ
d  D
b  B
gh  Gh
jh  Jh
ḍh  Ḍh
dh  Dh
bh  Bh
ṅ  Ṅ
ñ  Ñ
ṇ  Ṇ
n  N
m  M
y  Y
r  R
l  L
v  V
ś  Ś
ṣ  Ṣ
s  S
h  H
}
\end{document}


As you can see, it's necessary to undo some of the work done by xunicode (which is loaded automatically by fontspec and need not be loaded explicitly). Some of the standard accents must also be redefined, or they wouldn't use the main document font.

Update 2017

The macros above work provided fontspec is loaded with the euenc option. On the other hand, the new default TU encoding doesn't declare composites with \d or \.N and \.n, so the code is simpler.

\documentclass{scrartcl}
\usepackage{fontspec}
\usepackage{newunicodechar}
\setmainfont{Minion Pro}

\makeatletter
\let\d\relax
\DeclareRobustCommand{\d}[1]
   {\hmode@bgroup
    \o@lign{\relax#1\crcr\hidewidth\ltx@sh@ft{-1ex}.\hidewidth}\egroup}
\let\.\relax
\DeclareRobustCommand{\.}[1]{\accent"02D9#1}
\DeclareRobustCommand{\MACRON}[1]{\accent"AF#1}
\makeatother

\newunicodechar{Ḍ}{\d{D}}
\newunicodechar{ḍ}{\d{d}}
\newunicodechar{ḥ}{\d{h}}
\newunicodechar{Ḷ}{\d{L}}
\newunicodechar{ḷ}{\d{l}}
\newunicodechar{ṃ}{\d{m}}
\newunicodechar{Ṇ}{\d{N}}
\newunicodechar{ṇ}{\d{n}}
\newunicodechar{Ṛ}{\d{R}}
\newunicodechar{ṛ}{\d{r}}
\newunicodechar{Ṣ}{\d{S}}
\newunicodechar{ṣ}{\d{s}}
\newunicodechar{Ṭ}{\d{T}}
\newunicodechar{ṭ}{\d{t}}

\newunicodechar{Ṅ}{\.{N}}
\newunicodechar{ṅ}{\.{n}}

\newunicodechar{Ḹ}{\d{\MACRON{L}}}
\newunicodechar{ḹ}{\d{\MACRON{l}}}
\newunicodechar{Ṝ}{\d{\MACRON{R}}}
\newunicodechar{ṝ}{\d{\MACRON{r}}}

\begin{document}
\parbox{.5\textwidth}{
a  A
ā  Ā
i  I
ī  Ī
u  U
ū  Ū
ṛ  Ṛ
ṝ  Ṝ
ḷ  Ḷ
ḹ  Ḹ
e  E
ai  Ai
o  O
au  Au
ṃ ḥ
k  K
c  C
ṭ  Ṭ
t  T
p  P
kh  Kh
ch  Ch
ṭh  Ṭh
th  Th
ph  Ph
g  G
j  J
ḍ  Ḍ
d  D
b  B
gh  Gh
jh  Jh
ḍh  Ḍh
dh  Dh
bh  Bh
ṅ  Ṅ
ñ  Ñ
ṇ  Ṇ
n  N
m  M
y  Y
r  R
l  L
v  V
ś  Ś
ṣ  Ṣ
s  S
h  H
Ḹ  ḹ
Ṝ  ṝ
}
\end{document}

Thanks a lot, that's a wonderful solution that works in both XeTeX and LuaTeX. One more thing: is there a way to make this searchable and copyable in the PDF? If I copy and paste the text, I only get the separate parts of the compound, not the whole character, e.g. ¯r. instead of ṝ — Psychic Birdy, Nov 29 '12 at 13:35
Oh, and the dots remain a little bit bigger than the i dot which is not the case with my example. You can see this if you zoom in very closely. Well, it's not so important. I just wonder why it is this way. — Psychic Birdy, Nov 29 '12 at 13:42
You may be able to make the PDF file searchable with the help of the accsupp package; the cleanest solution is using a font that has the necessary glyphs. — egreg, Nov 29 '12 at 14:29
This works indeed. After loading the accsupp package, just replace \newunicodechar{Ḍ}{\d{D}} with \newunicodechar{Ḍ}{\BeginAccSupp{method=hex,unicode,ActualText=1E0C}\d{D}\EndAccSupp{}} and so on, looking up the hex code point for each character. With this, the characters are copyable and searchable. — Psychic Birdy, Dec 24 '12 at 13:17
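A minimal sketch of the accsupp approach from the comment above, assuming the TU-encoding setup of the 2017 update. The helper name \sktchar is hypothetical (it is not part of accsupp or of the answer). Only three characters are shown; the others would follow the same pattern with their own code points.

```latex
\documentclass{scrartcl}
\usepackage{fontspec}
\usepackage{newunicodechar}
\usepackage{accsupp}
\setmainfont{Minion Pro}

% Accent redefinitions copied from the answer
\makeatletter
\let\d\relax
\DeclareRobustCommand{\d}[1]
   {\hmode@bgroup
    \o@lign{\relax#1\crcr\hidewidth\ltx@sh@ft{-1ex}.\hidewidth}\egroup}
\DeclareRobustCommand{\MACRON}[1]{\accent"AF#1}
\makeatother

% Hypothetical helper: #1 = hex code point, #2 = composed fallback.
% \leavevmode starts the paragraph first, so that the begin and end
% markers written by accsupp end up in the same horizontal list.
\newcommand*{\sktchar}[2]{%
  \leavevmode
  \BeginAccSupp{method=hex,unicode,ActualText=#1}%
  #2%
  \EndAccSupp{}%
}

\newunicodechar{Ḍ}{\sktchar{1E0C}{\d{D}}}
\newunicodechar{ḍ}{\sktchar{1E0D}{\d{d}}}
\newunicodechar{ṝ}{\sktchar{1E5D}{\d{\MACRON{r}}}}

\begin{document}
Ḍ ḍ ṝ
\end{document}
```

Passing the code point as an argument keeps each declaration on one line. Whether copy and paste then yields the precomposed character may still depend on how the PDF viewer handles ActualText.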
I find that the solution file from @egreg does not seem to compile successfully. I get this error: ! Undefined control sequence. l.6 \UndeclareUTFcomposite [\UTFencname]{x1E0C}{\d}{D}. Can someone throw light on this development? Thanks. — chandra, Mar 28 '17 at 16:50
@chandra This was for a previous version of fontspec; now to get this working you need \usepackage[euenc]{fontspec}. I'll update. — egreg, Mar 28 '17 at 16:55
Thanks, @egreg. Invoking \usepackage[euenc]{fontspec} \usepackage{newunicodechar} seems to do the trick. — chandra, Mar 28 '17 at 17:01
I am trying to get R, r, L, l, all with macron below.
I tried

\DeclareRobustCommand{\MACRONBELOW}[1]{\accent"0331#1}

with \newunicodechar{Ḻ}{\MACRONBELOW{L}} etc.,

but am getting "tofu" in my PDF.

What am I doing wrong? — chandra, Sep 27 '17 at 06:59
@egreg: Is there any other way to get these: I need them for accurate Romanization of Tamil letters according to ISO 15919. I would appreciate knowing. Thanks. — chandra, Sep 27 '17 at 09:12
@chandra Please, make a fresh question with all the details. — egreg, Sep 27 '17 at 09:15
@egreg: New question posted at https://tex.stackexchange.com/questions/393486/generating-underaccents-for-romanized-tamil — chandra, Sep 27 '17 at 09:29
One hint for anyone who, just like me, is looking for Sanskrit diacritics with this particular font (Minion): there is another option. Some time ago, Adobe released a revised version of this font, called "Minion 3", which has the diacritics built in. — Psychic Birdy, Jun 25 '22 at 15:53
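If such a font is available, none of the composite definitions should be needed. The following is only a sketch: it assumes the family is installed under the name "Minion 3", and the \glyphcheck macro is a hypothetical helper that prints a marker for any character the current font lacks.

```latex
\documentclass{scrartcl}
\usepackage{fontspec}
% Assumption: the family name as installed on this system is "Minion 3"
\setmainfont{Minion 3}

% Hypothetical helper: typeset #1 if the current font has a glyph for it,
% otherwise print a visible marker (\iffontchar is an e-TeX primitive
% available in both XeTeX and LuaTeX)
\newcommand*{\glyphcheck}[1]{%
  \iffontchar\font`#1 #1\else[missing]\fi}

\begin{document}
\glyphcheck{ṛ} \glyphcheck{ṝ} \glyphcheck{ḷ} \glyphcheck{ḹ}
\glyphcheck{ṃ} \glyphcheck{ḥ} \glyphcheck{ṭ} \glyphcheck{ḍ}
\glyphcheck{ṅ} \glyphcheck{ṇ} \glyphcheck{ṣ}
\end{document}
```

Because the characters then come straight from the font, copying and searching in the PDF should work without accsupp.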
