
I am trying to get the hyphenation for transliterated Sanskrit provided by polyglossia to work. One admittedly odd problem I have run into is that a string spanning more than one line (actually four compound words glued together by sandhi) doesn't get hyphenated:

\documentclass[12pt]{article}

\usepackage{fontspec}
\usepackage{polyglossia}

\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{TeX Gyre Pagella}

\setotherlanguage{english} 
\newfontfamily\englishfont{TeX Gyre Pagella}

\begin{document}

asmadādiviśeṣaṇaśūnyasyārthasākṣātkāritvamātrasyaivendriyādhīnatvadarśanād anaikāntikatvam asambhavīti cet | yady evam arthasākṣātkāritvamātrasyendiryavadālokādhīnatvam upalabdham iti na santamase paśyeyur ulūkādayaḥ | atha vyabhicāradarśanād ālokasyāvyāpakatvam, vyabhicāraśaṅkayā tarhīndriyasyāpy avyāpakatvam | vyāptyā śaṅkā khaṇḍyata iti cet | śaṅkāsambhavād vyāptir evāsambhavinī yadi prathamata eva vyāptiḥ, vyabhicāro 'pi na dṛśyeta | 


\end{document}


Of course I could use discretionary hyphens here, but I would rather rely on LaTeX (XeLaTeX) to take care of the hyphenation. Why doesn't it work here?

muk.li

1 Answer


You're being very unlucky. If I add \tracingparagraphs=1 to the document, the log file shows the attempts XeTeX makes at line breaking; I also put \hspace*{0pt} at the start of the paragraph, so that the first word can be hyphenated.
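A minimal sketch of where these two additions might go, assuming the preamble from the question at the default 10pt size; the placement right after \begin{document} is an assumption, since any point before the paragraph ends should work.

\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{TeX Gyre Pagella}

\begin{document}

% Write the line-breaking trace of each paragraph to the .log file.
\tracingparagraphs=1

% The zero-width glue lets TeX consider the first word for hyphenation.
\hspace*{0pt}asmadādiviśeṣaṇaśūnyasyārthasākṣātkāritvamātrasyaivendriyādhīnatvadarśanād
anaikāntikatvam asambhavīti cet | yady evam
arthasākṣātkāritvamātrasyendiryavadālokādhīnatvam upalabdham iti na santamase
paśyeyur ulūkādayaḥ | atha vyabhicāradarśanād ālokasyāvyāpakatvam,
vyabhicāraśaṅkayā tarhīndriyasyāpy avyāpakatvam | vyāptyā śaṅkā khaṇḍyata iti
cet | śaṅkāsambhavād vyāptir evāsambhavinī yadi prathamata eva vyāptiḥ,
vyabhicāro 'pi na dṛśyeta |

\end{document}

The trace quoted next comes from the log of such a run: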

@firstpass
@secondpass
[]| \EU1/TeXGyrePagella(0)/m/n/10 a-sma-dā-di-vi-śe-ṣa-ṇa-śū-nya-syā-rtha-sā-kṣ
ā-tkā-ri-tva-mā-tra-syai-ve-ndri-yā-dhīnatvadarśanād 
@ via @@0 b=* p=0 d=*
@@1: line 1.3 t=0 -> @@0
a-nai-kā-nti-ka-tvam a-sa-mbha-vīti cet | yady e-vam a-rtha-sā-kṣā-tkā-ri-tva-m
ā-tra-sye-ndi-
@\discretionary via @@1 b=16 p=50 d=3176
@@2: line 2.3- t=3176 -> @@1
rya-va-dā-lo-kā-dhī-na-tvam u-pa-la-bdham iti na sa-nta-mase pa-śye-yur u-lū-kā
-da-yaḥ | 
@ via @@2 b=3 p=0 d=169
@@3: line 3.2 t=3345 -> @@2
a-tha vya-bhi-cā-ra-da-rśa-nād ā-lo-ka-syā-vyā-pa-ka-tvam, vya-bhi-cā-ra-śa-ṅka
yā ta-rhī-ndri-
@\discretionary via @@3 b=13 p=50 d=3029
@@4: line 4.3- t=6374 -> @@3
ya-syāpy a-vyā-pa-ka-tvam | vyā-ptyā śa-ṅkā kha-ṇḍyata iti cet | śa-ṅkā-sa-mbha
-vād 
@ via @@4 b=3 p=0 d=169
@@5: line 5.2 t=6543 -> @@4
vyā-ptir e-vā-sa-mbha-vinī yadi pra-tha-mata eva vyā-ptiḥ, vya-bhi-cāro 'pi na 
dṛśyeta 
@ via @@5 b=1 p=0 d=121
@@6: line 6.2 t=6664 -> @@5
| 
@\par via @@6 b=0 p=-10000 d=*
@@7: line 7.2- t=6664 -> @@6

What can be seen is that no feasible hyphenation points are found in the final part of the long word

...-yā-dhīnatvadarśanād

and breaking at the last point that was found, after yā, would leave too short a line.

This has to do with the inability of (Xe)TeX to correctly hyphenate words longer than 63 characters; see part 42, “Hyphenation”, in “TeX, the program” (texdoc tex, p. 344ff).

You have to add discretionaries, I'm afraid, or insert \penalty0\hspace{0pt} at appropriate points, so that automatic hyphenation remains possible within the parts of the compound words.
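A minimal sketch of the second option, assuming a helper macro \compoundbreak (a hypothetical name, not part of polyglossia) and a break point between viśeṣaṇa and śūnya chosen purely for illustration. The zero-width glue splits the long string into two words for the purposes of hyphenation, each shorter than 63 characters, so the patterns can apply to both parts. A line break taken at \compoundbreak itself shows no hyphen character.

\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{TeX Gyre Pagella}

% Hypothetical helper: an allowed break point that also ends the current
% "word" as far as TeX's hyphenation routine is concerned.
\newcommand*{\compoundbreak}{\penalty0\hspace{0pt}}

\begin{document}

% \hspace*{0pt} makes the first word of the paragraph eligible for hyphenation.
\hspace*{0pt}asmadādiviśeṣaṇa\compoundbreak
śūnyasyārthasākṣātkāritvamātrasyaivendriyādhīnatvadarśanād
anaikāntikatvam asambhavīti cet |

\end{document}

The % after \compoundbreak is not strictly needed, because TeX ignores spaces after a control word, so the line end after the macro adds no visible space. Whether the resulting breaks are acceptable depends on the text, and the same goes for explicit discretionaries.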

For instance, inserting \- as shown below allows hyphenation, but the line is still overfull; hyphenating between tva and da would be no good either.

\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{TeX Gyre Pagella}

\begin{document}

asmadādiviśeṣaṇaśūnyasyārthasākṣātkāritvamātrasyaivendriyādhīnatvada\-rśanād
anaikāntikatvam asambhavīti cet | yady evam
arthasākṣātkāritvamātrasyendiryavadālokādhīnatvam upalabdham iti na santamase
paśyeyur ulūkādayaḥ | atha vyabhicāradarśanād ālokasyāvyāpakatvam,
vyabhicāraśaṅkayā tarhīndriyasyāpy avyāpakatvam | vyāptyā śaṅkā khaṇḍyata iti
cet | śaṅkāsambhavād vyāptir evāsambhavinī yadi prathamata eva vyāptiḥ,
vyabhicāro 'pi na dṛśyeta |

\end{document}


egreg
  • How hard would it be to change the 63 characters to a higher number, or remove the limit altogether? Searching for strings of more than 63 characters, I get 1823 hits in the corpus of texts I'm working on... – muk.li Mar 15 '16 at 13:12
  • @muk.li Unfortunately, it is built into XeTeX. You might ask on the XeTeX mailing list http://tug.org/mailman/listinfo/xetex – egreg Mar 15 '16 at 13:22