3

I am a new user and want to use Amharic as the main language, with English as a second language, in LaTeX. I use macOS 10.13 with TeXShop 4.44. To write the Amharic text I use Abyssinica SIL as the main font. My question is: how do I make (Xe)LaTeX keep the Amharic text in the source code intact and compile it normally? Your help is greatly appreciated. I have tried polyglossia and fontspec, but so far without success, as you can see from the MWE provided:

    \documentclass[12pt]{article}
    \usepackage{fontspec}
    \usepackage{polyglossia}
    %\setmainfont{Abyssinica SIL} or any other is fine


    \setmainlanguage{english}
    \setotherlanguage{amharic}

    \newfontfamily{\amharicfont}[Script=Ethiopic, Scale=1]{Abyssinica SIL}

    \newenvironment{amharic}
      {\amharicfont}
      {}


    \begin{document}

    \section*{Sample in Amharic}

    \begin{amharic}
    እስመ፡አግዚአብሔር፡አምላክ፡ማእምር፡ውእቱ።እግዚአብሔር፡አስተደወ፡መንብሮ።ወአድከመ፡ቅሥተ፡ኀያላን።ወአቅነቶሙ፡ኀይለ፡ለድኩማን።ጽጉማን፡እክል፡ርኅቡ።ወርኁባን፡ጸግቡ።እስመ፡መካን፡ወለደት፡ሰብዐተ፡ወወለድሰ፡ስእነት፡ወሊደ፡እግዚአብሔር፡ይቀትል፡ወየሐዩ።ያወርድኒ፡ውስተ፡ሲእል፡ወየዐርግ።እግዚአብሔር፡ያነዲ፡ወያብዕል።ያኀስርሂ፡ወያከብር፡ዘያነሥኦ፡እምድር፡ለነዳይ።ከመ፡ያንብሮ፡ምስለ፡ዓበይተ።
    \end{amharic}

    Here is english text.

    \end{document}
Mico
  • 506,678
KGG
  • 41
  • 3
  • Please clarify which issue, or issues, you are experiencing. – Mico May 01 '20 at 08:58
  • amharic environment is already defined; you do not need to define it again. Also, see https://tex.stackexchange.com/questions/211172/is-it-possible-to-typeset-the-geez-alphabet-in-latex – Cicada May 01 '20 at 10:40

2 Answers

4

The example only needs a few changes to compile: remove the unnecessary definition of the amharic environment, which polyglossia defined for you when you enabled the Amharic language. Then uncomment the definition of \amharicfont.
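As a minimal sketch of those changes (assuming Abyssinica SIL is installed and the file is compiled with XeLaTeX or LuaLaTeX), the question's example with the custom environment removed could look like this. The Amharic line is borrowed from the sample text further down and is kept short so that it fits on one line:

```latex
\documentclass[12pt]{article}
\usepackage{fontspec}
\usepackage{polyglossia}

\setmainlanguage{english}
\setotherlanguage{amharic}

% polyglossia uses \amharicfont, when it is defined, inside the amharic
% environment, so no \newenvironment{amharic} is needed.
\newfontfamily{\amharicfont}[Script=Ethiopic, Scale=1]{Abyssinica SIL}

\begin{document}

\section*{Sample in Amharic}

\begin{amharic}
በዩኔስኮ፡ተዘጋጅቶ፡በኢትዮጵያ፡ብሄራዊ፡ኮሚሽን፡ተተረጎመ
\end{amharic}

Here is english text.

\end{document}
```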

However, the results you get with the default settings will be terrible, because LaTeX does not recognize the Ethiopic word separator as a word-breaking character. Even if you fix that, the line-breaking algorithm and hyphenation patterns are completely unsuited for Amharic, and you will get incredibly ugly results.

So, fair warning, this is a hack I came up with based on the W3C guidelines at https://www.w3.org/TR/elreq/#ethiopic_hyphenation, despite not knowing any Amharic. I apologize for any errors, and I’d appreciate a native speaker improving this code.

The first part of the trick was to insert a space after all Ethiopic punctuation, and the rest was to set the spacing of the Ethiopic font to be tiny, but extremely stretchy. I also loaded microtype, which, on LuaLaTeX at least, will enable font expansion and should cut down on the amount of hyphenation and extra inter-word spacing Amharic needs. Finally, I turned on \sloppypar to make the inter-word spacing more flexible. If you intend to use it, you probably want to define a new environment that automatically turns \sloppypar on inside a group.
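One way to package the last suggestion is a small wrapper environment. This is only a sketch: the name `amharicpar` is made up for illustration and is not provided by polyglossia. It assumes the preamble has already called `\setotherlanguage{amharic}`, so that the `amharic` environment exists:

```latex
% Hypothetical wrapper (the name amharicpar is an assumption; any unused
% environment name works).  It opens polyglossia's amharic environment and
% then sloppypar, and closes them in the reverse order.
\newenvironment{amharicpar}
  {\begin{amharic}\begin{sloppypar}}
  {\end{sloppypar}\end{amharic}}
```

In the document body, `\begin{amharicpar} ... \end{amharicpar}` would then replace the nested `\begin{amharic}\begin{sloppypar} ... \end{sloppypar}\end{amharic}` pair. Because `sloppypar` applies `\sloppy` only inside its own group, the relaxed line-breaking settings do not leak into the surrounding English text.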

\documentclass[12pt]{article}
\usepackage{polyglossia}
\usepackage{microtype, newunicodechar}
\usepackage[sf, bf, big]{titlesec}

\defaultfontfeatures{Scale=MatchUppercase}
\setmainfont{Abyssinica SIL}[Scale=1]
\setsansfont{Libertinus Sans}

\setmainlanguage{english}
\setotherlanguage{amharic}

% The default hyphenation patterns for Ethiopic script in both polyglossia and
% babel do not properly treat ፡  as a word separator, so the sample you gave
% never hyphenates or line breaks.  Based on the guidelines in
% https://www.w3.org/TR/elreq/#ethiopic_hyphenation this inserts spaces after
% all Ethiopic punctuation.  It then makes the interword space tiny, but very
% stretchy.
\newunicodechar{፡}{፡\ }
\newunicodechar{።}{\@{።} }
\newunicodechar{፣}{፣ }
\newunicodechar{፤}{፤ }
\newunicodechar{፥}{፥ }
\newunicodechar{፦}{፦ }
\newunicodechar{፧}{\@{፧} }
\newunicodechar{፨}{\@{፨} }
\newunicodechar{፠}{\@{፠} }

\newfontfamily{\amharicfont}{Abyssinica SIL}[
  Script=Ethiopic,
  Ligatures=Common,
  WordSpace = {0.1,30.0,1.0}]

\begin{document}

\section*{Sample in Amharic}

This is english text.

\begin{amharic}\begin{sloppypar}
በዩኔስኮ፡ተዘጋጅቶ፡በኢትዮጵያ፡ብሄራዊ፡ኮሚሽን፡ተተረጎመ

የሰው፡ልጅ፡ሁሉ፡ሲወለድ፡ነጻና፡በክብርና፡በመብትም፡እኩልነት፡ያለው፡ነው።፡የተፈጥሮ፡ማስተዋልና፡ሕሊና፡ስላለው፡አንዱ፡ሌላውን፡በወንድማማችነት፡መንፈስ፡መመልከት፡ይገባዋል።

እያንዳንዱ፡ሰው፡የዘር፡የቀለም፡የጾታ፡የቋንቋ፡የሃይማኖት፡የፖለቲካ፡ወይም፡የሌላ፡ዓይነት፡አስተሳሰብ፡የብሔራዊ፡ወይም፡የኀብረተሰብ፡ታሪክ፡የሀብት፡የትውልድ፡ወይም፡የሌላ፡ደረጃ፡ልዩነት፡ሳይኖሩ፡በዚሁ፡ውሳኔ፡የተዘረዘሩት፡መብቶችንና፡ነጻነቶች፡ሁሉ፡እንዲከበሩለት፡ይገባል።

ከዚህም፡በተቀረ፡አንድ፡ሰው፡ከሚኖርበት፡አገር፡ወይም፡ግዛት፡የፖለቲካ፡የአገዛዝ፡ወይም፡የኢንተርናሽናል፡አቋም፡የተነሳ፡አገሩ፡ነጻም፡ሆነ፡በሞግዚትነት፡አስተዳደር፡ወይም፡እራሱን፡ችሎ፡የማይተዳደር፡አገር፡ተወላጅ፡ቢሆንም፡በማንኛውም፡ዓይነት፡ገደብ፡ያለው፡አገዛዝ፡ሥር፡ቢሆንም፡ልዩነት፡አይፈጸምበትም።

እያንዳንዱ፡ሰው፡የመኖር፣፡በነጻነትና፡በሰላም፡የመኖሩ፡መጠበቅ፡መብት፡አለው።
\end{sloppypar}\end{amharic}

\end{document}

Abyssinica SIL sample

If you’d rather have ragged-right paragraphs and not insert any extra spacing, you can tell TeX that it’s allowed to break lines at the end of words by inserting \linebreak[1] or \hspace{0pt} instead of spaces. Or you could turn down the stretchiness (the second number after WordSpace=) to have more hyphenation and fewer lines ending with the same punctuation.
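A rough sketch of the ragged-right variant follows. It is meant to replace the corresponding `\newunicodechar` lines in the preamble of the example and has not been checked by an Amharic reader. The zero-width skip is written with an explicit unit, `\hspace{0pt}`, since LaTeX lengths need one. The value `10.0` in the commented-out alternative is an arbitrary illustration, not a recommendation:

```latex
% Preamble fragment (requires \usepackage{newunicodechar}).
% Allow a line break after each separator without adding visible space.
\newunicodechar{፡}{፡\hspace{0pt}}
\newunicodechar{።}{።\hspace{0pt}}
\newunicodechar{፣}{፣\hspace{0pt}}

% Alternative for justified text: keep the space-inserting definitions and
% lower the stretch factor (the second number) instead, for example
%   WordSpace = {0.1,10.0,1.0}
% in the \newfontfamily{\amharicfont} declaration.

% In the document body, end the last paragraph (blank line) before leaving
% the environment so that \raggedright still applies to it:
%
%   \begin{amharic}\raggedright
%   በዩኔስኮ፡ተዘጋጅቶ፡በኢትዮጵያ፡ብሄራዊ፡ኮሚሽን፡ተተረጎመ
%
%   \end{amharic}
```

Only three of the punctuation marks are shown; the remaining ones from the full example would be handled the same way.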

The text should be an excerpt from the UN Universal Declaration of Human Rights, not that I’d know it from the Generations of Adam. I also took the liberty of redefining the section-header style, since Abyssinica SIL does not come in bold.

Davislor
  • 44,045
  • You've raised an interesting point. I'd like to investigate it and I've opened an issue for babel: https://github.com/latex3/babel/issues/66 . – Javier Bezos May 03 '20 at 11:27
4

According to "Approaches to line breaking when word separators are used", Ethiopic text may break after any character, with the word separators as the exception. This is what the hyphenation rules currently available for Amharic are supposed to do, but for some reason neither xetex nor luatex seems to find many break points in my tests.

EDIT. Found, at least for luatex. The lccode and catcode of the separators are not appropriate. Here is a new preamble using the mechanism already available for South East Asian scripts:

\documentclass[12pt, twocolumn]{article}

\usepackage[english]{babel}

\lccode`፡=`፡  \catcode`፡=11
\lccode`።=`። \catcode`።=11

\babelprovide[import,
  onchar = fonts ids,
  typography/intraspace = 0 .1 0,
  typography/linebreaking = s, 
  characters/ranges = 1200..139F 2D80..2DDF AB00..AB2F,
  ]{amharic}

\babelfont[amharic]{rm}{FreeSerif}

I'll fix the babel style in an upcoming release to incorporate these settings. With this preamble and the body below, I get:

[Image: typeset output]

ORIGINAL POST CONTINUES

I'll investigate what's happening and look for a proper solution (which, I think, will be very close to that for Thai). In the meantime, here is a workaround I prepared as an alternative to Davislor's, based on it but using babel and luatex, in case this is a valid option for you. It uses the babel tool for non-standard hyphenation:

\documentclass[12pt, twocolumn]{article}

\usepackage[english]{babel}

\babelprovide[import,
  onchar = fonts ids,
  hyphenrules = +, % Use empty patterns
  ]{amharic}

\babelposthyphenation{amharic}{([ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖሗመሙሚማሜምሞሟሠሡሢሣሤሥሦሧረሩሪራሬርሮሯሰሱሲሳሴስሶሷሸሹሺሻሼሽሾሿቀቁቂቃቄቅቆቈቊቋቌቍበቡቢባቤብቦቧቨቩቪቫቬቭቮቯተቱቲታቴትቶቷቸቹቺቻቼችቾቿኀኁኂኃኄኅኆኈኊኋኌኍነኑኒናኔንኖኗኘኙኚኛኜኝኞኟአኡኢኣኤእኦኧከኩኪካኬክኮኰኲኳኴኵኸኹኺኻኼኽኾወዉዊዋዌውዎዐዑዒዓዔዕዖዘዙዚዛዜዝዞዟዠዡዢዣዤዥዦዧየዩዪያዬይዮደዱዲዳዴጼጽጾጿፀፁፂፃፄፅፆፈፉፊፋፌፍፎፏፐፑፒፓፔፕፖፗ])}{
 { no = {1}, post = {1} }
}

\babelfont[amharic]{rm}{FreeSerif}

\usepackage{newunicodechar}

\newunicodechar{፡}{፡\hskip0pt plus 3pt\relax}
\newunicodechar{።}{።\hskip0pt plus 3pt\relax}

\begin{document}

This is english text. An Amharic word is በዩኔስኮ and another one is
ተዘጋጅቶ.

\selectlanguage{amharic}

በዩኔስኮ፡ተዘጋጅቶ፡በኢትዮጵያ፡ብሄራዊ፡ኮሚሽን፡ተተረጎመ

የሰው፡ልጅ፡ሁሉ፡ሲወለድ፡ነጻና፡በክብርና፡በመብትም፡እኩልነት፡ያለው፡ነው።፡የተፈጥሮ፡ማስተዋልና፡ሕሊና፡ስላለው፡አንዱ፡ሌላውን፡በወንድማማችነት፡መንፈስ፡መመልከት፡ይገባዋል።

እያንዳንዱ፡ሰው፡የዘር፡የቀለም፡የጾታ፡የቋንቋ፡የሃይማኖት፡የፖለቲካ፡ወይም፡የሌላ፡ዓይነት፡አስተሳሰብ፡የብሔራዊ፡ወይም፡የኀብረተሰብ፡ታሪክ፡የሀብት፡የትውልድ፡ወይም፡የሌላ፡ደረጃ፡ልዩነት፡ሳይኖሩ፡በዚሁ፡ውሳኔ፡የተዘረዘሩት፡መብቶችንና፡ነጻነቶች፡ሁሉ፡እንዲከበሩለት፡ይገባል።

ከዚህም፡በተቀረ፡አንድ፡ሰው፡ከሚኖርበት፡አገር፡ወይም፡ግዛት፡የፖለቲካ፡የአገዛዝ፡ወይም፡የኢንተርናሽናል፡አቋም፡የተነሳ፡አገሩ፡ነጻም፡ሆነ፡በሞግዚትነት፡አስተዳደር፡ወይም፡እራሱን፡ችሎ፡የማይተዳደር፡አገር፡ተወላጅ፡ቢሆንም፡በማንኛውም፡ዓይነት፡ገደብ፡ያለው፡አገዛዝ፡ሥር፡ቢሆንም፡ልዩነት፡አይፈጸምበትም።

እያንዳንዱ፡ሰው፡የመኖር፣፡በነጻነትና፡በሰላም፡የመኖሩ፡መጠበቅ፡መብት፡አለው።

\end{document}

[Image: typeset output]

Javier Bezos
  • 10,003
  • Thanks for addressing the bug! – Davislor May 03 '20 at 19:43
  • Thank you all for your responses and your help, and sorry for the belated reply. (When my question was migrated to this forum, I didn't have permission to comment on the answers until now.) That said, the issue I was experiencing was essentially addressed by Davislor. Working with a Mac (mid 2015) and the latest tex 2015, I was not able to compile a bilingual text (Amharic and English), as all of the Amharic text came out as a bunch of question marks; with Davislor's suggestion that no longer appears to be an issue. – KGG May 06 '20 at 20:46
  • It's true that Amharic has a character-spacing issue, and the "colon"-type character, although no longer used in modern texts, is definitely useful for working with older Amharic texts. One thing I still don't understand is the purpose of \sloppypar in this context. I am also experiencing a new issue, namely a mismatch between TIPA and fontspec; I use the latter to create IPA environments for linguistic fonts. I am aware of the suggestion presented here https://tex.stackexchange.com/questions/249146/more-problems-with-tipa-and-fontspec but it doesn't solve everything – KGG May 06 '20 at 20:59
  • The error I'm getting has something to do with a clash on \sups . It looks like the tipa and fontspec packages both define this command (for different purposes?). When tipa is loaded after fontspec, the error message seems to go away; however, the functionality of tipa remains limited: some characters come out as a bunch of small boxes – KGG May 06 '20 at 21:07
  • I'm currently preparing the new version for Amharic, which by default will follow the modern practice. See https://github.com/latex3/babel/wiki/What's-new-in-babel-3.44 . I'll release it in 1-2 weeks. – Javier Bezos May 07 '20 at 10:19
  • @JavierBezos Thank you; I'll have a look at it – KGG May 08 '20 at 18:43