Old sanskrit fonts and unicode input

Question

Is it possible to use the old Devanagari fonts from the sanskrit and velthuis packages with modern, direct Devanagari Unicode input? These fonts expect input in Velthuis transliteration, and one needs to run their preprocessors to generate the .tex files from .skt or .dn files respectively. I understand the Velthuis font was used to build the Devanagari range of the FreeSerif font, but with a loss of ligatures.

Examples:

A dn file:

\documentclass{article}
\usepackage{devanagari}
\begin{document}
\begin{verse}
{\dn dharmak.setre kuruk.setre samavetaa yuyutsavaa.h | \\
maamakaa.h paa.n.dava"scaiva kimakurvata sa~njaya || }
\end{verse}
\end{document}

Running the devnag preprocessor on it we generate the following tex file:

\def\DevnagVersion{2.15}\documentclass{article}
\usepackage{devanagari}
\begin{document}
\begin{verse}
{\dn Dm\0\322w\?/\? \7{k}z\322w\?/\? smv\?tA \7{y}\7{y}(svA, . \\
mAmkA, pA\317wXv\396w\4v Ekm\7{k}v\0t s\3D2wy .. }
\end{verse}
\end{document}

With the following pdf output:

In contrast:

\documentclass{article}
\usepackage{fontspec}
\setmainfont[Script=Devanagari]{FreeSerif}
\begin{document}
\begin{verse}
धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः ।\\
मामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय ॥\\
\end{verse}
\end{document}

produces the following pdf:

This is of particular interest for an edition project that started in the 1990s using the velthuis package, whose editors would like the new volumes to look exactly like the old ones.

I assume it would theoretically be possible to create map files for these fonts, but that this would be a huge amount of work, so nobody has dared to do it yet? Or could one convert the fonts from Metafont to TrueType?

I think it's hard to avoid the conclusion that FreeSerif, at least, does not contain a full complement of ligatures. Certainly, your picture of the xelatex output is exactly what I see in my browser. Unfortunately, I know of no TrueType (or whatever standard) version of the velthuis font and I do believe that short of creating one yourself, you'd have to select a different font (Chandas and Sanskrit2003 are good) or do it the velthuis way. I could be wrong and you may not want to take my word for it, I cannot prove this negative, but I'm very doubtful, I'm afraid — Au101, Jun 14 '16 at 20:29
@Au101 Yes, I know the fonts Chandas and Sanskrit 2003, which, being designed specifically for Devanagari, have a richer set of ligatures than FreeSerif. My question was about the old fonts though, which were very good but are not Unicode fonts, and I was wondering what the options would be to make them work with "modern" Unicode input. — muk.li, Jun 14 '16 at 21:13
As you say, there are two alternatives here: (1) convert the Metafont fonts (from the sanskrit and velthuis packages) from the Metafont source (on my system they are in texmf-dist/fonts/source/public/{sanskrit,velthuis}/*.mf) to a modern font format (OpenType), or (2) write something that goes backwards, against the direction of technological progress, and converts input like धर्मक्षेत्रे into input like {\dn Dm\0\322w\?/\?. There were some articles about the former in TUGboat by Karel Píška; the latter is probably easier to do though; just that no one has done it. — ShreevatsaR, Mar 07 '17 at 02:54
Actually it only just occurred to me: it would be very simple to write a Devanagari-to-Velthuis converter (and in fact I've done most of the basic work myself, years ago: here), using which one could convert धर्मक्षेत्रे कुरुक्षेत्रे into {\dn dharmak.setre kuruk.setre}. Then we could pass that file through devnag, and then tex. Would that be acceptable to you? If so, I can probably write a simple wrapper script (detect maximal runs of Devanagari characters, and wrap them in {\dn ...}) and post it as an answer. — ShreevatsaR, Mar 17 '17 at 03:29
@ShreevatsaR It might be an acceptable solution, although with a lot of complications. It would be much more user-friendly if it could be solved by converting the font, but I have no idea how much work that would mean in practice, or whether, as you said, all the old fonts' features could be implemented using OpenType. It should also be mentioned that there is a development version of the FreeSerif font which has much improved support for Devanāgarī. — muk.li, Mar 17 '17 at 14:04
I am brand new here and found this posting. I use the Wikner font for Sanskrit LaTeX documents. I made suggestions to Charles Wikner before he originally released the font; he incorporated some of them and had no space left to add the others. The modern OpenType Siddhanta font is adequate, but the Wikner font does have some aesthetic advantages for me. I really know nothing about font creation, but I would be willing to do the menial work of converting the Wikner font to OpenType and testing it if someone would provide detailed guidance. This would probably best happen offline. — Girish, May 11 '19 at 21:17
@egreg: Wikner's Sanskrit font was a LaTeX package, the proposal to convert the excellent font contained in it to a modern OpenType font and thus make it usable again not only to LaTeX users but also a wider public is highly laudable and in my opinion only slightly off-topic. Girish: I can think of a few people who might be in a better position to get you started with this than I am, feel free to contact me off this site. — muk.li, May 12 '19 at 06:29

ShreevatsaR · Answer 1 · 2017-03-20T07:05:32.343

(There's something of an answer towards the end of this long post!)

The modern way to produce Devanagari output is to use XeTeX/LuaTeX and a modern (OpenType) font, as in the answers to this question and some examples below. (Output may differ between XeTeX and LuaTeX because XeTeX uses system libraries like HarfBuzz for glyph positioning, while LuaTeX tries to avoid external dependencies and implement everything by itself: this is why LuaTeX doesn't support most Indic scripts, although its support for Devanagari has improved recently.)

This question is about two specific old fonts from the early 1990s that use a legacy encoding and are not available in OpenType format. You might ask it because you want either to match those fonts exactly or to use certain features that were present in them but are present in no modern Devanagari font.

These two fonts are:

A font by Frans J. Velthuis, whose Metafont sources are available in texmf-dist/fonts/source/public/velthuis (or here). This font is part of the devanagari package, which involves writing your input not in Unicode (like धर्मक्षेत्रे कुरुक्षेत्रे) but in an ASCII-compatible format specific to that package (like {\dn dharmak.setre kuruk.setre}) in a .dn file, which you then preprocess with the devnag binary into a .tex file (which has lines like {\dn Dm\0\322w\?/\? \7{k}z\322w\?/\?}). Typesetting this .tex file produces Devanagari output.
A font by Charles Wikner, whose Metafont sources are available in texmf-dist/fonts/source/public/sanskrit (or here). This font is part of the sanskrit package, which (again) involves writing your input not in Unicode but in another ASCII-compatible format specific to that package (like {\skt ...}) into a .skt file, which you then preprocess with the skt binary (compiled from source skt.c) into a .tex file (which has lines like {\skt ;Da;mRa;[ea:t3ea k\ZH{-12}{u}+.r8+:[ea:t3ea}). Typesetting this .tex file produces Devanagari output.

You can search on this site for devnag and for skt to see questions about these two fonts / packages.

The question is about how to have input in Unicode (like धर्मक्षेत्रे कुरुक्षेत्रे) but still get output in one of those fonts.

The ideal solution would be if there existed a modern OpenType font that is identical to those fonts. Some steps that have been made in that direction:

The FreeSerif font from the GNU FreeFont project has apparently used the Velthuis font as basis for its glyphs for the Devanagari range of Unicode. However, it does not contain all the consonant clusters (ligatures) needed for Devanagari, as you can see in the question and in the other answer (also below). Its last official release was in 2012. The development version has more ligatures; this repository has a somewhat recent build. (See here and this bug report by the OP.) It still does not seem perfect, though it is getting close. And it doesn't include italics and other style features present in the older fonts.
Karel Píška wrote some articles in TUGboat during 2002–2005 about converting Indic fonts to an outline format. As a result, the fonts have already been converted to PostScript Type 1 format. However, those fonts use a (clever) custom encoding and their glyphs aren't mapped to Unicode codepoints and ligatures thereof—they even include parts of glyphs of real characters. See these screenshots from the Wikner documentation and from opening the Velthuis font in FontForge:

In fact, as the documentation for these packages shows, these fonts have a number of features (customizing specific ligatures or the entire “style”, sizing relative to surrounding Roman characters, changing intra- and inter-character spacing, and all sorts of stuff with Vedic accents that I don't fully understand), and I'm not even sure whether the current font standards of Unicode+OpenType (and TeX's interface to them: fontspec) support all these features.

So, until there are adequate Unicode+OpenType fonts that are equivalent to those, if you need the old fonts the best solution may still be to typeset using the old packages. Fortunately this is still possible in XeTeX: input like {\dn Dm\0\322w\?/\? \7{k}z\322w\?/\?} can still produce the same output. To make this possible with Unicode input, one might be able to reverse-engineer the devnag and skt preprocessors, and write either a TECkit mapping or a pre-processor that does this work. Alternatively, we can use a hack: take the Unicode Devanagari input, transliterate it to the custom input encodings expected by the preprocessors that come with the packages, run the preprocessors on this transliterated text surrounded with {\dn ...} or {\skt ...} respectively, and finally read the result back and use it as input.

Here's one implementation of that hack :-) A TeX macro takes the Unicode Devanagari input and writes it to a file, then calls a Python script which transliterates Unicode Devanagari to the encoding expected by the preprocessors, calls the preprocessors, extracts the output and writes it back to a file, which is read by TeX again. (In the example below, the TeX macro gets धर्मक्षेत्रे कार्त्स्न्यम् विद्भिः and the Python script (using the preprocessor) turns it into {\dn Dm\0\322w\?/\? kA(-\306wy\0\qq{m} EvE\389w, }.)

(This is just a proof of concept and has not been tested much beyond this example file; also, it does not give access to any of the customization (or Vedic accents) that the packages are especially good at: you can edit the Python script for those.)
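The transliteration step of such a script could look roughly like the sketch below. This is not the get-dn.py script itself, which is not reproduced here; the function name to_velthuis and the lookup tables are hypothetical, written from the Velthuis scheme as it appears in the question (for example धर्मक्षेत्रे becoming dharmak.setre). The sketch leaves digits and unknown characters unchanged, and it ignores Vedic accents, nukta forms, and the cases where Velthuis input needs {} to separate ambiguous letter sequences, so treat it as a starting point only.

```python
# Minimal Devanagari-to-Velthuis sketch (hypothetical names, not get-dn.py).
VIRAMA = "\u094D"

CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": '"n',
    "च": "c", "छ": "ch", "ज": "j", "झ": "jh", "ञ": "~n",
    "ट": ".t", "ठ": ".th", "ड": ".d", "ढ": ".dh", "ण": ".n",
    "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
    "य": "y", "र": "r", "ल": "l", "व": "v",
    "श": '"s', "ष": ".s", "स": "s", "ह": "h",
}

# Independent (word-initial) vowels.
VOWELS = {
    "अ": "a", "आ": "aa", "इ": "i", "ई": "ii", "उ": "u", "ऊ": "uu",
    "ऋ": ".r", "ॠ": ".R", "ऌ": ".l",
    "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au",
}

# Dependent vowel signs (matras), given as escapes because they are combining marks.
MATRAS = {
    "\u093E": "aa", "\u093F": "i", "\u0940": "ii",
    "\u0941": "u", "\u0942": "uu",
    "\u0943": ".r", "\u0944": ".R", "\u0962": ".l",
    "\u0947": "e", "\u0948": "ai", "\u094B": "o", "\u094C": "au",
}

# Other signs: candrabindu, anusvara, visarga, avagraha, danda, double danda.
SIGNS = {
    "\u0901": "/", "\u0902": ".m", "\u0903": ".h",
    "\u093D": ".a", "\u0964": "|", "\u0965": "||",
}


def to_velthuis(text):
    """Transliterate Unicode Devanagari to Velthuis ASCII input."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt == VIRAMA:
                i += 1                   # virama: no vowel follows the consonant
            elif nxt in MATRAS:
                out.append(MATRAS[nxt])  # explicit vowel sign
                i += 1
            else:
                out.append("a")          # inherent vowel
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        else:
            out.append(ch)               # spaces, Latin text, digits, unknowns
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_velthuis("धर्मक्षेत्रे कुरुक्षेत्रे"))  # dharmak.setre kuruk.setre
```

The result would still have to be wrapped in {\dn ...} and passed through devnag before TeX can use it. The TeX side of the hack is: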

\documentclass{article}
\usepackage{fontspec}  % For modern OpenType fonts
\usepackage{devanagari}  % For the old Velthuis fonts
\usepackage{skt} % For the old Wikner fonts

\newfontfamily\noto[Script=Devanagari]{Noto Sans Devanagari}
\newfontfamily\chandas[Script=Devanagari]{Chandas}
\newfontfamily\sktwo[Script=Devanagari]{Sanskrit 2003}
\newfontfamily\nakula[Script=Devanagari]{Nakula}
\newfontfamily\freeserif[Script=Devanagari]{FreeSerif}
\newfontfamily\freeserifdev[Path=freefont-svn/,Script=Devanagari]{FreeSerif.otf}

\newwrite\myfile
\newcommand\devpreprocess[2]{%
\immediate\openout\myfile=\jobname.#1.devnagtmp%
\immediate\write\myfile{#2}%
\immediate\closeout\myfile% Seems necessary
\immediate\write18{python get-dn.py #1 \jobname.#1.devnagtmp}%
\input\jobname.#1.devnagtmp.devnagout%
}
\newcommand\prepdn[1]{\devpreprocess{dn}{#1}}
\newcommand\prepskt[1]{\devpreprocess{skt}{#1}}

\begin{document}

In different fonts:

\begin{tabular}{l l}
Noto Sans Devanagari & {\noto धर्मक्षेत्रे कार्त्स्न्यम् विद्भिः} \\
Chandas & {\chandas धर्मक्षेत्रे कार्त्स्न्यम् विद्भिः} \\
Sanskrit 2003 & {\sktwo धर्मक्षेत्रे कार्त्स्न्यम् विद्भिः} \\[1em]
Wikner (preprocessed input) & {\skt ;Da;mRa;[ea:t3ea k+:a;t=+:\ZM{rMolHneHegMi}yRa;m,a ;Y2a;va;Y5a;;d2\ZM{x0Bi0ed0E}\ZS{1}H\ZS{4}} \\
Wikner (Unicode + hack) & \prepskt{धर्मक्षेत्रे कार्त्स्न्यम् विद्भिः} \\[1em]
Free Serif & {\freeserif धर्मक्षेत्रे कार्त्स्न्यम् विद्भिः} \\
Free Serif Dev & {\freeserifdev धर्मक्षेत्रे कार्त्स्न्यम् विद्भिः} \\
Velthuis (preprocessed input) & {\dn Dm\0\322w\?/\? kA(-\306wy\0\qq{m} EvE\389w,} \\
Velthuis (Unicode + hack) & \prepdn{धर्मक्षेत्रे कार्त्स्न्यम् विद्भिः}
\end{tabular}

\end{document}

If you have all those fonts installed, then when you run the above with xelatex -shell-escape (note: using -shell-escape can be dangerous in general; make sure the input is trusted), you get:

where you can see that despite Unicode input, you're getting output from the old Sanskrit packages.
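For whole documents rather than single macro arguments, the idea of detecting maximal runs of Devanagari characters and wrapping them in {\dn ...} could be sketched as follows. The names wrap_runs and DEVANAGARI_RUN are hypothetical, and the transliteration function is passed in as a parameter so that the block stays self-contained. The sketch assumes that every character in the Unicode block U+0900 to U+097F belongs to a run, and that runs may be joined by spaces or tabs but not by line breaks.

```python
import re

# A run is one or more Devanagari words (U+0900-U+097F) joined by spaces or tabs.
DEVANAGARI_RUN = re.compile(r"[\u0900-\u097F]+(?:[ \t]+[\u0900-\u097F]+)*")


def wrap_runs(text, transliterate, macro="dn"):
    """Replace each maximal Devanagari run with {\\macro <transliterated run>}.

    `transliterate` is any callable mapping a Devanagari string to the ASCII
    input expected by the preprocessor; `macro` is "dn" or "skt".
    """
    def repl(match):
        return "{\\" + macro + " " + transliterate(match.group(0)) + "}"

    return DEVANAGARI_RUN.sub(repl, text)


if __name__ == "__main__":
    # A stand-in transliterator, used only to show the wrapping.
    print(wrap_runs("abc धर्म xyz", lambda run: "dharma"))  # abc {\dn dharma} xyz
```

The output of such a wrapper would be saved as a .dn (or .skt) file and run through the usual preprocessor, which avoids -shell-escape at the cost of an extra build step.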

This is a fine contribution, although as you say, not really an answer, but hey, I think it still qualifies to be down here. However, I'd be interested to know in which respects you feel the legacy fonts are superior? As far as I can see the only area is taste (i.e. the look of the glyphs). It's hard to think of a word of Classical Sanskrit that modern fonts can't do? — Au101, Mar 17 '17 at 01:39
(For my future reference: of the two fonts, Wikner's skt/sanskrit is quite a bit “smaller” (in terms of the size of the .mf sources) than velthuis, so for any further investigation may be an easier place to start.) — ShreevatsaR, Sep 02 '19 at 21:31

score 2 · Answer 2 · answered Jun 14 '16 at 20:03


I can't read the script so I find it hard to compare but

\documentclass{article}
\usepackage{fontspec}
\setmainfont[Script=Devanagari]{FreeSerif.ttf}
\begin{document}
\begin{verse}
धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः ।\\
मामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय ॥\\
\end{verse}
\end{document}

Produces noticeably different results in texlive 2016 with xelatex and lualatex. If I can understand the forms at all, xelatex is using more ligatures than luatex.

xelatex at the top, lualatex below:


David Carlisle


actually I think it's the xetex output you show in the question. Perhaps this answer is not useful at all in that case, in which case I'll delete, but I'll leave it here for now, it's hard to judge when you can't read the text. – David Carlisle Jun 14 '16 at 20:08
The lualatex output is hopeless, but I'm sure it's possible to make lualatex do it properly? The xelatex output you give is the same as in the question and is correct (the lualatex output is outright wrong), but yes, it lacks all of the ligatures available with the velthuis package. I believe FreeSerif simply does not have the full range of ligatures unfortunately – Au101 Jun 14 '16 at 20:17
@Au101 there are a couple of projects to use harfbuzz with luatex which would then produce the same output as xetex presumably. on the other hand with luatex's lua font shaping one could in theory add necessary ligatures on the fly (if the target glyphs are in fact in the font) by modifying the lua tables after the font is loaded, but that seems tricky given the starting position of the current rendering. – David Carlisle Jun 14 '16 at 20:25
Yes the xelatex output here is better than lualatex's, but is still not as "proper" as the output from the old non-Unicode fonts. Note that the issue is that the FreeSerif font (probably) does not contain all the ligatures from the old metafont fonts, so it would probably be hard for luatex to do anything here… – ShreevatsaR Mar 07 '17 at 02:58
