
(Related to my other question here.) It seems that polyglossia's hyphenation rules for transliterated Sanskrit allow hyphenation after a single character by default. In the example below, the word ulūkādayaḥ, which falls between the third and fourth lines of the output, is hyphenated after only one character. How can I change that minimum to, for example, two characters?

\documentclass[12pt]{article}

\usepackage{fontspec}
\usepackage{polyglossia}

\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{TeX Gyre Pagella}

\setotherlanguage{english} 
\newfontfamily\englishfont{TeX Gyre Pagella}

\begin{document}

asmadādiviśeṣaṇaśūnyasyārthasākṣātkāritvamātrasyaivendriyādhīnatvadarśanād anaikāntikatvam asambhavīti cet | yady evam arthasākṣātkāritvamātrasyendiryavadālokādhīnatvam upalabdham iti na santamase paśyeyur ulūkādayaḥ | atha vyabhicāradarśanād ālokasyāvyāpakatvam, vyabhicāraśaṅkayā tarhīndriyasyāpy avyāpakatvam | vyāptyā śaṅkā khaṇḍyata iti cet | śaṅkāsambhavād vyāptir evāsambhavinī yadi prathamata eva vyāptiḥ, vyabhicāro 'pi na dṛśyeta | 


\end{document}


Edit: I still don't fully get it. I'm now trying to reduce the minimum hyphenatable characters on either end to one:

\documentclass[12pt]{article}

\usepackage{fontspec}
\usepackage{polyglossia}
\tracingparagraphs=1
\setdefaultlanguage{sanskrit}
\setotherlanguage{english}

\PolyglossiaSetup{sanskrit}{
  hyphenmins={1,1},
}

\newfontfamily\sanskritfont{TeX Gyre Pagella}
\newfontfamily\englishfont{TeX Gyre Pagella}

\begin{document}

anaikāntikatvam asambhavīti cet | yady evam arthasākṣātkāritvamātrasyendiryavadālokādhīnatvam upalabdham iti na santamase xx paśyeyur ulūkādayaḥ | atha vyabhicāradarśanād ālokasyāvyāpakatvam, vyabhicāraśaṅkayā tarhīndriyasyāpy avyāpakatvam | vyāptyā śaṅkā khaṇḍyata iti cet | śaṅkāsambhavād vyāptir evāsambhavinī yadi prathamata eva vyāptiḥ, vyabhicāro 'pi na dṛśyeta | 


\end{document}

In the resulting output, I am surprised that it doesn't hyphenate the word that produces the overfull \hbox; I would have expected khaṇḍya-ta.

In the log file:

@firstpass
@secondpass
[]\EU1/TeXGyrePagella(0)/m/n/12 anaikāntikatvam a-sa-mbha-vīti cet | yady e-vam
 a-rtha-sā-kṣā-tkā-ri-tva-mā-
@\discretionary via @@0 b=12 p=50 d=2984
@@1: line 1.2- t=2984 -> @@0
tra-sye-ndi-rya-va-dā-lo-kā-dhī-na-tvam u-pa-la-bdham iti na sa-nta-mase xx pa-
śye-
@\discretionary via @@1 b=26 p=50 d=13796
@@2: line 2.1- t=16780 -> @@1
yur u-lū-kā-da-yaḥ | a-tha vya-bhi-cā-ra-da-rśa-nād ā-lo-ka-syā-vyā-pa-ka-tvam,
 vya-
@\discretionary via @@2 b=56 p=50 d=16856
@@3: line 3.1- t=33636 -> @@2
bhi-cā-ra-śa-ṅkayā ta-rhī-ndri-ya-syāpy a-vyā-pa-ka-tvam | vyā-ptyā śa-ṅkā kha-
ṇḍyata 
@ via @@3 b=* p=0 d=*
@@4: line 4.3 t=33636 -> @@3
iti cet | śa-ṅkā-sa-mbha-vād vyā-ptir e-vā-sa-mbha-vinī yadi pra-tha-mata eva v
yā-
@\discretionary via @@4 b=31 p=50 d=4181
@@5: line 5.3- t=37817 -> @@4
ptiḥ, vya-bhi-cāro 'pi na dṛśyeta | 
@\par via @@5 b=0 p=-10000 d=*
@@6: line 6.2- t=37817 -> @@5

I can see that it generally refuses to hyphenate before the last syllable. Why is that?

muk.li
  • The right hyphenation minimum is not set. – egreg Jun 16 '16 at 06:26
  • Apparently, setting the hyphenation minima with \PolyglossiaSetup has no effect. – egreg Jun 16 '16 at 06:32
  • @egreg If I copy gloss-sanskrit.ldf into my working directory and change the hyphenmins line inside it, it works, but that shouldn't be the way to do it, right? – muk.li Jun 16 '16 at 14:36

2 Answers


You can add the following to the preamble:

\PolyglossiaSetup{sanskrit}{
  hyphenmins={2,3},% default is {1,3}
}

\documentclass[12pt]{article}

\usepackage{fontspec}
\usepackage{polyglossia}

\setdefaultlanguage{sanskrit}
\setotherlanguage{english}

\PolyglossiaSetup{sanskrit}{
  hyphenmins={2,3},% default is {1,3}
}

\newfontfamily\sanskritfont{TeX Gyre Pagella}
\newfontfamily\englishfont{TeX Gyre Pagella}

\begin{document}

asmadādiviśeṣaṇaśūnyasyārthasākṣātkāritvamātrasyaivendriyādhīnatvadarśanād anaikāntikatvam asambhavīti cet | yady evam arthasākṣātkāritvamātrasyendiryavadālokādhīnatvam upalabdham iti na santamase paśyeyur ulūkādayaḥ | atha vyabhicāradarśanād ālokasyāvyāpakatvam, vyabhicāraśaṅkayā tarhīndriyasyāpy avyāpakatvam | vyāptyā śaṅkā khaṇḍyata iti cet | śaṅkāsambhavād vyāptir evāsambhavinī yadi prathamata eva vyāptiḥ, vyabhicāro 'pi na dṛśyeta | 


\end{document}


egreg

This post led to a GitHub issue, which was closed as wontfix. However, the ensuing thread mentioned several workarounds:

  1. Call \providehyphenmins{sanskrit}{11} any time before \setdefaultlanguage{sanskrit}.
  2. Set the macro \sanskrithyphenmins to 11 (with \providecommand before \setdefaultlanguage{sanskrit}, or with \renewcommand afterward).
  3. Set
\lefthyphenmin=1
\righthyphenmin=1

after \begin{document} (\AtBeginDocument won't work, but etoolbox's \AfterEndPreamble will). "But this gets overwritten on any language change."
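
A rough sketch of method 1, for comparison. The fonts and the sample text are taken from the question. Placing the call directly after loading polyglossia is an assumption based on the thread's "before \setdefaultlanguage{sanskrit}", and the sketch has not been checked against every polyglossia version:

```latex
\documentclass[12pt]{article}

\usepackage{fontspec}
\usepackage{polyglossia}

% Method 1: provide the minima before the language is set up.
% The second argument is two digits: left minimum, then right minimum.
\providehyphenmins{sanskrit}{11}

\setdefaultlanguage{sanskrit}
\setotherlanguage{english}

\newfontfamily\sanskritfont{TeX Gyre Pagella}
\newfontfamily\englishfont{TeX Gyre Pagella}

\begin{document}

vyāptyā śaṅkā khaṇḍyata iti cet | na santamase paśyeyur ulūkādayaḥ |

\end{document}
```

Adding \tracingparagraphs=1, as in the question, should make it possible to confirm in the log file whether the new minima are actually in effect.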

An example of method 2:

\documentclass{article}

\usepackage[width=4.4in]{geometry}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\renewcommand*{\sanskrithyphenmins}{11}
\setotherlanguage{english}

\PolyglossiaSetup{sanskrit}{} % still necessary

\begin{document}

anaikāntikatvam asambhavīti cet | yady evam arthasākṣātkāritvamātrasyendiryavadālokādhīnatvam upalabdham iti na santamase xx paśyeyur ulūkādayaḥ | atha vyabhicāradarśanād ālokasyāvyāpakatvam, vyabhicāraśaṅkayā tarhīndriyasyāpy avyāpakatvam | vyāptyā śaṅkā khaṇḍyata iti cet | śaṅkāsambhavād vyāptir evāsambhavinī yadi prathamata eva vyāptiḥ, vyabhicāro 'pi na dṛśyeta |

\end{document}

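
Method 3 can be sketched along the same lines. This assumes etoolbox's \AfterEndPreamble hook, as suggested in the thread. Loading etoolbox explicitly and the shortened sample text are choices made here for illustration. Per the caveat quoted in the list, the values set this way are overwritten on any language change:

```latex
\documentclass{article}

\usepackage[width=4.4in]{geometry}
\usepackage{fontspec}
\usepackage{polyglossia}
\usepackage{etoolbox} % provides \AfterEndPreamble

\setdefaultlanguage{sanskrit}
\setotherlanguage{english}

% Method 3: set the TeX primitives directly once \begin{document}
% has finished its own setup. \AtBeginDocument runs too early,
% so the language setup would overwrite the values again.
\AfterEndPreamble{%
  \lefthyphenmin=1
  \righthyphenmin=1
}

\begin{document}

vyāptyā śaṅkā khaṇḍyata iti cet | na santamase paśyeyur ulūkādayaḥ |

\end{document}
```

Because these are plain register assignments rather than part of the language definition, method 1 or 2 is probably the more robust choice for documents that switch languages.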

Teepeemm