I'm trying to create a glossary in some LaTeX variant, and I want to insert etymologies like this:
دَفْتَر • (daftar) m (plural دَفَاتِر (dafātir)): From Middle Persian dptl (daftar), from Aramaic דפתרא / ܕܦܬܪܐ, from Ancient Greek διφθέρα (diphthéra).
The raw text looks fine in my browser, in Gedit, and in my terminal; the fonts are all there. However, when I try to follow various instructions for typesetting these languages in LaTeX, I run into problems. I should use XeTeX, which is OK, but then I have to run a bunch of commands like \newfontfamily and \setotherlanguage, and then wrap "language-switching" commands like \textgreek, \texthebrew and \textarabic around every language transition. I can't even find a \textaramaic command; does the script have a different name?
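For reference, the kind of setup those instructions describe looks roughly like the sketch below, which uses polyglossia under XeLaTeX. The font names are assumptions (any installed fonts that cover the respective scripts should do). Treating the two Aramaic spellings as Hebrew-script and Syriac-script text, via \texthebrew and \textsyriac, is my reading of the sample entry rather than a dedicated Aramaic command.

```latex
% Minimal sketch; compile with xelatex.
% All font names below are assumptions: substitute fonts installed on your system.
\documentclass{article}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{arabic}
\setotherlanguage{hebrew}
\setotherlanguage{syriac}
\setotherlanguage[variant=ancient]{greek}

% One font family per script; polyglossia looks these names up.
\newfontfamily\arabicfont[Script=Arabic]{Amiri}
\newfontfamily\hebrewfont[Script=Hebrew]{Ezra SIL}
\newfontfamily\syriacfont[Script=Syriac]{Noto Sans Syriac}
\newfontfamily\greekfont[Script=Greek]{GFS Didot}

\begin{document}
% Every change of script is wrapped by hand, which is the tedious part.
\textarabic{دَفْتَر} (daftar) m (plural \textarabic{دَفَاتِر} (dafātir)):
from Aramaic \texthebrew{דפתרא} / \textsyriac{ܕܦܬܪܐ},
from Ancient Greek \textgreek{διφθέρα} (diphthéra).
\end{document}
```

Each \text... command selects both the font and the writing direction for its argument, which is why every transition between scripts needs its own wrapper.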
I'm not very concerned about appearances; I just want something that prints a formatted glossary where the words in each language are legible. Is there no "plug-and-play" method for handling multilingual Unicode text in TeX-derived typesetting languages?
Maybe the best solution would be to output HTML and print from the browser? I guess the other alternative is to use Perl to insert TeX markup around each detected language, using a bunch of regular expressions: qr/\p{Arabic}/. But that seems cumbersome...
\newcommand\fromgreek[1]{from Ancient Greek \textgreek{#1}} would allow markup like \fromgreek{διφθέρα} in your entries. – David Carlisle Sep 06 '18 at 20:34
ucharclasses and it works but typesets each Arabic word RTL but consecutive words are LTR. This is something that Firefox and Gedit get right without any help (e.g. they don't need Unicode bidi markers to correctly render sequences of RTL words). – Metamorphic Sep 07 '18 at 19:22