
I use the command texindy -L icelandic -M lang/icelandic/utf8 dict_main.idx to create a list of photograph names, their authors and licenses. But the sorting is not correct (for example, the Icelandic alphabet ends with the letters þ, æ, ö, in that order).

MWE:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[]{makeidx}
\usepackage[icelandic, czech]{babel}
\makeindex
\begin{document}
Hello
\index{Þari - Franz Eugen Kohler, Public Domain}
\index{Þistill - ŠARŽÍK František, COPYRIGHT/PD}
\index{Önd - Karney, Lee, PD}
\index{Æðarkóngur - Whitehouse, Laura L., PD}
\index{Avókadó - Forest \& Kim [[p:2684;Starr]], CC-BY}
\index{Auðnutittlingur - Arnstein Rønning, CC BY-SA 3.0}
\index{Asni - Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\index{Á - hvalur.org, CC Unported Licence}
\index{Álft - Bukovský Jiří, COPYRIGHT/CC-BY-NC}
\index{Álka - Jack Spellingbacon from Scotland, CC BY-SA 3.0}

\printindex
\end{document}

Running these commands:

  pdflatex test.tex
  texlua utftexindy.lua -L icelandic test.idx
  pdflatex test.tex

Details of the code in this answer.

chejnik
  • xindy needs input in utf8 encoding, but the idx file produced by pdflatex contains TeX sequences for diacritics. You can try https://github.com/michal-h21/iec2utf#utftexindylua as a workaround – michal.h21 Mar 13 '14 at 09:39
  • I have used texlua utftexindy.lua -L icelandic myfilename.idx and it has sorted þ, æ, ö properly but not the letters with accent (á, é, í, ó, ú, ý). Thank you for that script Michal. – chejnik Mar 13 '14 at 10:46
  • It is strange, could you please provide an MWE? – michal.h21 Mar 13 '14 at 11:01
  • I see, Á should go after A. It seems to be a xindy bug; I get the same result even with xelatex. Maybe it would be best to report this issue on the xindy mailing list – michal.h21 Mar 13 '14 at 13:31
  • I have sent question to xindy mailing list. – chejnik Mar 14 '14 at 07:46

1 Answer


I tried to create the index with lualatex. I turned off several packages, especially babel, because it hard-codes some characters as shorthands and did not work well with UTF-8 encoded characters. You probably need those packages for typesetting, but in some cases they can be omitted just for index generation.
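Instead of commenting packages in and out by hand, the preamble could branch on the engine. This is a minimal sketch only, assuming the iftex package is installed and using fontspec as a stand-in for luatextra/xltxtra; both choices are assumptions and not part of the tested setup:

```latex
% Sketch only: engine-dependent preamble, so the same file runs under
% pdflatex, lualatex and xelatex without editing comments.
\documentclass{article}
\usepackage{iftex}              % provides \ifPDFTeX (assumed installed)
\ifPDFTeX
  % 8-bit engine: needs explicit font and input encodings
  \usepackage[T1]{fontenc}
  \usepackage[utf8]{inputenc}
  \usepackage[icelandic,czech]{babel}
\else
  % LuaTeX or XeTeX: UTF-8 is native; babel is left out here
  \usepackage{fontspec}         % assumption: replaces luatextra / xltxtra
\fi
\usepackage{makeidx}
\makeindex
\begin{document}
Hello
\index{Þari - Franz Eugen Kohler, Public Domain}
\index{Á - hvalur.org, CC Unported Licence}
\printindex
\end{document}
```

Note that dropping babel can change hyphenation and therefore page breaks, so the index should be generated with the same preamble that is used for the final run; otherwise the page references may be off.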

If we check make-rules-alphabets (it is written in Perl), we spot this line in the /alphabets/icelandic/utf8.pl.in file:

['A', ['a','A'],['á','Á']@u{,['ǫ́','Ǫ́']}],

As I understand it, the @u{} part makes the letters equal for some sorting phase(s). It is included in some other lines:

['E', ['e','E']@u{,['ę','Ę']},['ë','Ë'],['é','É']],
['Æ', ['æ','Æ']@u{,['ǽ','Ǽ'],['ę́','Ę́'],['ǿ','Ǿ']},['œ','Œ'],['ä','Ä']],
['Ö', ['ö','Ö'],['ø','Ø']@u{,['ǫ','Ǫ']}],

We may expect the same behavior there, so A and Á, as well as Æ and Ǽ, are treated as equal at some point. If this is not the desired sorting, the xindy community is probably working on a fix, though I am not certain of that. The construct is frequent in alphabets/general/utf8.pl.in, where diacritics are very often ignored for sorting purposes.

I believe there is a small bug or typo, though. Letter groups in an index are commonly headed by capital letters, but the file has a lowercase eth:

I believe this is wrong: ['ð', ['ð','Ð']],
And this should be the correct form: ['Ð', ['ð','Ð']],

The following example shows it too: the letter group is headed by a small eth, not the capital letter. I enclose the TeX code and a preview of page 2. We run these lines:

lualatex mal-icelandic.tex
xindy -M texindy -L icelandic -C utf8 mal-icelandic.idx
lualatex mal-icelandic.tex

%! lualatex mal-icelandic.tex
%! xindy -M texindy -L icelandic -C utf8 mal-icelandic.idx
%! lualatex mal-icelandic.tex
% or with two changes: +xltxtra and -luatextra, we run xelatex
\documentclass{article}
%\usepackage[T1]{fontenc}
%\usepackage[utf8]{inputenc}
%\usepackage[icelandic,czech]{babel}
\usepackage{luatextra} % for lualatex run
%\usepackage{xltxtra} % for xelatex run
\usepackage[colorlinks]{hyperref}%hyperindex=false
\usepackage{makeidx}
\makeindex
\begin{document}
The first paragraph of text\ldots
\index{Þari - Franz Eugen Kohler, Public Domain}
\index{Þistill - ŠARŽÍK František, COPYRIGHT/PD}
\index{Önd - Karney, Lee, PD}
\index{Æðarkóngur - Whitehouse, Laura L., PD}
\index{Avókadó - Forest \& Kim [[p:2684;Starr]], CC-BY}
\index{Auðnutittlingur - Arnstein Rønning, CC BY-SA 3.0}
\index{Asni - Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\index{Á - hvalur.org, CC Unported Licence}
\index{Álft - Bukovský Jiří, COPYRIGHT/CC-BY-NC}
\index{Álka - Jack Spellingbacon from Scotland, CC BY-SA 3.0}
\index{Å - a fake index entry}
% a bug? ['ð',  ['ð','Ð']],
\index{Ð - another fake index entry}
\index{E - a testing index entry}
\index{Ǽ a fake}
\begingroup
\pagestyle{empty}
\def\thispagestyle#1{}
\printindex
\endgroup
\end{document}
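The three-step run can probably also be driven from inside the document. This is a sketch only, assuming the imakeidx package and an unrestricted shell escape (xindy is typically not on the restricted shell-escape list). It has not been tested with this particular index, and it does not change how xindy sorts, so the ordering issue discussed above remains:

```latex
% Sketch only: let imakeidx call xindy during the lualatex run.
% Compile with: lualatex --shell-escape mal-icelandic.tex
\documentclass{article}
\usepackage{fontspec}           % assumption: used instead of luatextra
\usepackage{imakeidx}
% program=truexindy calls xindy itself (program=xindy would call texindy);
% the options mirror the xindy command line used for the manual run.
\makeindex[program=truexindy,
           options={-M texindy -L icelandic -C utf8}]
\begin{document}
The first paragraph of text\ldots
\index{Þari - Franz Eugen Kohler, Public Domain}
\index{Á - hvalur.org, CC Unported Licence}
\index{Asni - Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\printindex
\end{document}
```

Without shell escape, the manual sequence of lualatex, xindy, lualatex is still required.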



Edit 1: An improved, general version (Icelandic sorting + European Western style with many letters with diacritics)

Please download these two files to your working directory (I cannot post the first file directly here as it contains some special characters which TeX.SX doesn't display):

wget http://striz7.fame.utb.cz/tex-sx/is/icelandicmal.xdy
wget http://striz7.fame.utb.cz/tex-sx/is/icelandicmal-test.xdy

I have created a new set of sorting rules for Icelandic, mixed with the general sorting rules for Western Europe. You will therefore find letter groups such as C, Q, W, Z and Å even though they are not in the Icelandic alphabet. Many letters with diacritics are added as well, so words in Czech, Slovak, Polish, German and probably many more languages are taken into account (see general sorting in Xindy).

To get a list of letters (letter groups, order of letters) I use:

lualatex typesetme.tex

I run these lines to get an index:

lualatex mal-icelandicmal.tex
xindy -M texindy -M icelandicmal-test -M mal-style mal-icelandicmal.idx
lualatex mal-icelandicmal.tex

This is a list of letters (code, preview) and an example of an index (code, preview). Please test whether it fits your needs.

%! lualatex typesetme.tex
\documentclass[a4paper]{article}
\pagestyle{empty}
\usepackage{luatextra}
\newenvironment{alphabet}{\begin{tabular}{*{16}{l}}%
   }{\end{tabular}}
\addtolength{\voffset}{-1in}
\addtolength{\textheight}{1in}

\begin{document}
\section{Icelandicmal}
\subsection{Alphabet}
\begin{alphabet}
a\,A\\
á\,Á & à\,À & ă\,Ă & â\,Â & ã\,Ã & ä\,Ä & ą\,Ą\\
b\,B\\
c\,C & č\,Č & ć\,Ć & ĉ\,Ĉ & ç\,Ç\\
d\,D & ď\,Ď\\
ð\,Ð & đ\,Đ\\
e\,E\\
é\,É & è\,È & ě\,Ě & ê\,Ê & ë\,Ë & ę\,Ę\\
f\,F\\
g\,G & ĝ\,Ĝ & ğ\,Ğ\\
h\,H & ĥ\,Ĥ & ı\,I\\
i\,I\\
í\,Í & ì\,Ì & î\,Î & ï\,Ï\\
j\,J & ĵ\,Ĵ\\
k\,K\\
l\,L & ĺ\,Ĺ & ľ\,Ľ & ł\,Ł\\
m\,M\\
n\,N & ń\,Ń & ň\,Ň & ñ\,Ñ\\
o\,O\\
ó\,Ó & ő\,Ő & ò\,Ò\\
p\,P\\
q\,Q\\
r\,R & ŕ\,Ŕ & ř\,Ř\\
s\,S & ś\,Ś & š\,Š & ŝ\,Ŝ & ş\,Ş\\
t\,T & ť\,Ť\\
u\,U\\
ú\,Ú & ù\,Ù & ŭ\,Ŭ & ů\,Ů & û\,Û & ü\,Ü & ű\,Ű\\
v\,V\\
w\,W\\
x\,X\\
y\,Y\\
ý\,Ý & ÿ\,Ÿ\\
z\,Z & ź\,Ź & ż\,Ż & ž\,Ž\\
þ\,Þ\\
æ\,Æ & ǽ\,Ǽ & œ\,Œ\\
ö\,Ö & ø\,Ø & ǿ\,Ǿ & ô\,Ô & õ\,Õ\\
å\,Å
\end{alphabet}
\subsection{Ligatures}
\begin{flushleft}
`ß' is sorted like `s\,s', but \emph{after} it in otherwise equal words.
\end{flushleft}
\subsection{Upper-/lowercase words}
Capitalized or uppercase words are sorted \emph{before} otherwise equal lowercase words.
\subsection{Special characters}
The order of special characters and letters is:
\begin{flushleft}
?\hspace{4mm}!\hspace{4mm}.\hspace{4mm}letters\hspace{4mm}-\hspace{4mm}'
\end{flushleft}
\end{document}


%! lualatex mal-icelandicmal.tex
%! xindy -M texindy -M icelandicmal-test -M mal-style mal-icelandicmal.idx 
%! lualatex mal-icelandicmal.tex
% or with two changes: +xltxtra and -luatextra, we run xelatex
\documentclass{article}
%\usepackage[T1]{fontenc}
%\usepackage[utf8]{inputenc}
%\usepackage[icelandic,czech]{babel}
\usepackage{luatextra} % for lualatex run
%\usepackage{xltxtra} % for xelatex run
\usepackage[colorlinks]{hyperref}%hyperindex=false
\usepackage{makeidx}
\makeindex
\usepackage{filecontents}
\def\mygroup#1{\textbf{#1}}
\begin{filecontents*}{mal-style.xdy}
;; mal-style.xdy
(markup-letter-group :open-head "~n\mygroup{" :close-head "}")
\end{filecontents*}

\begin{document}
The first paragraph of text\ldots
\index{Þari -- Franz Eugen Kohler, Public Domain}
\index{Þistill -- Šaržík František, COPYRIGHT/PD}
\index{Önd -- Karney, Lee, PD}
\index{Æðarkóngur -- Whitehouse, Laura L., PD}
\index{Avókadó -- Forest \& Kim [[p:2684;Starr]], CC-BY}
\index{Auðnutittlingur -- Arnstein Rønning, CC BY-SA 3.0}
\index{Asni -- Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\index{Á -- hvalur.org, CC Unported Licence}
\index{Álft -- Bukovský Jiří, COPYRIGHT/CC-BY-NC}
\index{Álka -- Jack Spellingbacon from Scotland, CC BY-SA 3.0}
\index{Å -- a fake index entry}
\index{Ð -- another fake index entry}
\index{E -- a testing index entry}
\index{Ǽ a fake}
\begingroup
\pagestyle{empty}
\def\thispagestyle#1{}
\printindex
\endgroup
\end{document}



Edit 2: A minimalistic version (Icelandic sorting and its 32 letters only)

Please download two new files:

wget http://striz7.fame.utb.cz/tex-sx/is-min/icelandicmalmin.xdy  
wget http://striz7.fame.utb.cz/tex-sx/is-min/icelandicmalmin-test.xdy  

We run the following four lines:

pdflatex mal-icelandicmalmin.tex  
texlua iec2utf.lua <mal-icelandicmalmin.idx >mal-temp.idx  
xindy -M texindy -M icelandicmalmin-test -M mal-style -o mal-icelandicmalmin.ind mal-temp.idx  
pdflatex mal-icelandicmalmin.tex  

The iec2utf.lua script written by michal.h21 works well. To review the included letters, run:

pdflatex typesetme.tex

I enclose two new files tested with pdflatex, a list of letters in Xindy's style and a preview of sample Icelandic index.

The typesetme.tex file:

%! pdflatex typesetme.tex
\documentclass[a4paper]{article}
\pagestyle{empty}
%\usepackage{luatextra}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\newenvironment{alphabet}{\begin{tabular}{*{16}{l}}%
   }{\end{tabular}}
\addtolength{\voffset}{-1in}
\addtolength{\textheight}{1in}

\begin{document}
\section{Icelandicmalmin}
\subsection{Alphabet}
\begin{alphabet}
a\,A\\
á\,Á\\
b\,B\\
d\,D\\
ð\,Ð\\
e\,E\\
é\,É\\
f\,F\\
g\,G\\
h\,H\\
i\,I\\
í\,Í\\
j\,J\\
k\,K\\
l\,L\\
m\,M\\
n\,N\\
o\,O\\
ó\,Ó\\
p\,P\\
r\,R\\
s\,S\\
t\,T\\
u\,U\\
ú\,Ú\\
v\,V\\
x\,X\\
y\,Y\\
ý\,Ý\\
þ\,Þ\\
æ\,Æ\\
ö\,Ö
\end{alphabet}
%\subsection{Ligatures}
%\begin{flushleft}
%`ß' is sorted like `s\,s', but \emph{after} it in otherwise equal words.
%\end{flushleft}
\subsection{Upper-/lowercase words}
Capitalized or uppercase words are sorted \emph{before} otherwise equal lowercase words.
\subsection{Special characters}
The order of special characters and letters is:
\begin{flushleft}
?\hspace{4mm}!\hspace{4mm}.\hspace{4mm}letters\hspace{4mm}-\hspace{4mm}'
\end{flushleft}
\end{document}

The mal-icelandicmalmin.tex file:

%! pdflatex mal-icelandicmalmin.tex
%! texlua iec2utf.lua <mal-icelandicmalmin.idx >mal-temp.idx
%! xindy -M texindy -M icelandicmalmin-test -M mal-style -o mal-icelandicmalmin.ind mal-temp.idx
%! pdflatex mal-icelandicmalmin.tex
%
% iec2utf.lua <--- https://github.com/michal-h21/iec2utf
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[icelandic,czech,english]{babel}
%\usepackage{luatextra} % for lualatex run
%\usepackage{xltxtra} % for xelatex run
\usepackage[colorlinks]{hyperref}%hyperindex=false
\usepackage{makeidx}
\makeindex
\usepackage{filecontents}
\def\mygroup#1{\textbf{#1}}
\begin{filecontents*}{mal-style.xdy}
;; mal-style.xdy
(markup-letter-group :open-head "~n\mygroup{" :close-head "}")
\end{filecontents*}

\begin{document}
The first paragraph of text\ldots
\index{Þari -- Franz Eugen Kohler, Public Domain}
\index{Þistill -- Šaržík František, COPYRIGHT/PD}
\index{Önd -- Karney, Lee, PD}
\index{Æðarkóngur -- Whitehouse, Laura L., PD}
\index{Avókadó -- Forest \& Kim [[p:2684;Starr]], CC-BY}
\index{Auðnutittlingur -- Arnstein Rønning, CC BY-SA 3.0}
\index{Asni -- Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\index{Á -- hvalur.org, CC Unported Licence}
\index{Álft -- Bukovský Jiří, COPYRIGHT/CC-BY-NC}
\index{Álka -- Jack Spellingbacon from Scotland, CC BY-SA 3.0}
\index{Ð -- another fake index entry}
\index{E -- a testing index entry}
\index{É -- a testing index entry}
\begingroup
\pagestyle{empty}
\def\thispagestyle#1{}
\printindex
\endgroup
\end{document}


Malipivo
  • A and Á (E and É etc.) are separate letters in the Icelandic alphabet. I have contacted the xindy community but I have no idea about the progress. – chejnik Apr 09 '14 at 05:42
  • I see, the letters with diacritics are probably not the new word groups, but they enter sorting phase earlier. If your problem is still active, we could prepare new xdy style. – Malipivo Apr 09 '14 at 05:49
  • Thank you, it would be a great help. The letters with diacritics in active use (namely á, é, í, ó, ú, ý) are new word groups (Wikipedia: http://en.wikipedia.org/wiki/Icelandic_alphabet). – chejnik Apr 09 '14 at 07:25
  • @chejnik I think that there might be request for both versions (similar to small and large version in Slovak in Xindy). The letters with acute accent are used as letter groups in Teach Yourself Icelandic (Jónsdóttir, 2004) and Icelandic for Beginners (Bartoszek, Tran, 1997), and, they are not used in A New Introduction to Old Norse, Part III (Faulkes, 2007); Icelandic (Glendening, 1986) and in Icelandic: Grammar, Texts, Glossary (Einarsson, 1949). – Malipivo Apr 09 '14 at 08:42
  • I see, it is great to consider all options. To complete the theoretical background - all modern Icelandic dictionaries recognize the letters with diacritics as separate letters. – chejnik Apr 09 '14 at 08:51
  • It is an interesting problem, I will post a sample when I am ready. I have noticed you are adding Czech (and maybe Slovak) terms/names into your Icelandic index, so I will add those extra accented letters to the xdy file too. A regular user with only Icelandic terms won't notice it, but it might help you in your work. Well, it might not be needed after all, as the first term in the index is always a name in Icelandic. But that Icelandic name could occur several times, and then the extended version of the xdy file would be useful. – Malipivo Apr 09 '14 at 09:31
  • Is it possible to include also commands for pdflatex (found in question). It would be very helpful. – chejnik Apr 10 '14 at 13:01
  • Try this: comment out line with \index{Ǽ a fake}, Ǽ is not defined in the inputenc package, then comment out \usepackage[utf8]{inputenc}. Run pdflatex, run xindy, uncomment \usepackage[utf8]{inputenc} and run pdflatex. But it is only workaround, otherwise you need to handle the idx file externally due to TeX sequences (-M tex/inputenc/cp1250 doesn't support all letters, it is not going to help you). Btw, I believe that pdflatex is history, try lualatex format, just add \usepackage{luatextra}. – Malipivo Apr 10 '14 at 13:28
  • I have presumed that xindy is able to order properly index according to Icelandic alphabet (even with pdflatex). I will try to collect points to make nice bounty :). Thank you for you help and (future) suggestions. (last advice I have received was to run this command texlua iec2utf.lua <dict_main.idx | xindy -C utf8 -L icelandic -M texindy -o dict_main.ind - with correct xindy) – chejnik Apr 10 '14 at 13:53
  • That's true (well, xindy and pdflatex are two different programs; xindy is an index engine, pdflatex is a typesetting engine), but I added some more letters with diacritics due to your non-Icelandic words/names. I hoped I am helping, if you need a minimalistic version which can handle just Icelandic language, it could be done as well; to be honest, it would be one small step back. – Malipivo Apr 10 '14 at 13:58
  • I would prefer to run pdflatex (running lualatex caused errors in my original file and I am not able to track them now). If you can provide list of commands that I have to run to achieve the correct ordering, it would help me. – chejnik Apr 10 '14 at 14:09
  • It interests me, what kind of errors? I usually only turn off the fontenc and inputenc packages and I add luatextra among other packages in the preamble. The problem might be interaction with the babel package. I believe I already sent a workaround for pdflatex (tested), otherwise I would need to prepare a minimalistic version of the module for Xindy. – Malipivo Apr 10 '14 at 14:16
  • I have tried to comment out \usepackage[utf8]{inputenc}, run pdflatex dict.tex, run xindy -M texindy -M icelandicmal-test -M mal-style dict.idx, uncomment \usepackage[utf8]{inputenc} and run twice pdflatex dict.tex. But without inputenc the generated file has app. 720 pages, with inputenc 630 pages - the index is correctly ordered but refer to wrong pages. Any new suggestions? – chejnik Apr 11 '14 at 05:26
  • Yes, it was expected. I am working on a minimalistic version of the style for Xindy and I will test it with pdflatex. – Malipivo Apr 11 '14 at 05:30
  • This answer is worthy a larger bounty! – chejnik Apr 11 '14 at 17:29
  • My pleasure. I have noticed that @u{} operator in the original make rule file has nothing to do with diacritics and their sorting, it is probably sort of unicode protection of a letter for not creating a ligature with other letters at a Xindy/Perl level. – Malipivo Apr 11 '14 at 18:42
  • @Malipivo Could you take a look at my related question here: http://tex.stackexchange.com/questions/284527/how-can-i-use-dtlsort-with-different-alphabets? It's a similar sorting problem. – Azor Ahai -him- Dec 25 '15 at 02:35
  • @Malipivo Your server seems to be down when I try to retrieve the "general version" icelandicmal.xdy and icelandicmal-test.xdy. The minimalistic version is reshared at https://github.com/paolobrasolin/tex-dicISCZ and works nicely but I need international letters, too. Would you mind rehosting the "general version" somewhere? Thanks a lot! – Florian Nov 06 '18 at 13:07
  • Or if anybody else might have saved the icelandicmal.xdy I would really appreciate if you would reshare it! For some weird reason archive.org has saved icelandicmal-test.xdy but not icelandicmal.xdy... – Florian Nov 07 '18 at 09:45
  • @Florian Temporary solution. My files can be downloaded here, for now: https://uloz.to/!IAZUHKrGZgOW/malipivo-tex-sx-tar – Malipivo May 15 '19 at 15:53
  • @Malipivo so much appreciated! – Florian May 15 '19 at 19:47