Index in Swedish sorting order (...YZÅÄÖ...) with xindy and LaTeX

Question

I still haven't managed to get Swedish letters sorted in the correct order in the index when using the article class.

I've tried specifying a sort key, e.g. \index{g~ra@göra}, but the result is not perfect: there are no blank lines separating the Å, Ä and Ö letter groups. And all that extra markup hardly seems up to date in 2014.

This way is quite cumbersome. Is there a better way?

Xindy should do the work (easily?), but how do I set up xindy to create an index in Swedish sorting order?

My comment: not all texts are written in English. I like the LaTeX system, it's a lot better than Word in my case, but the sorting troubles me.
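The ordering problem can be seen outside LaTeX as well. The following minimal POSIX shell sketch (not part of the original question) sorts a few capitals by raw byte value; `LC_ALL=C` switches off locale-aware collation, so this is plain code-point order. It puts Ä (U+00C4) before Å (U+00C5), whereas Swedish expects ...YZÅÄÖ, which is why a language-specific sort rule such as xindy's Swedish module is needed. The function name `bytesort` is only illustrative.

```shell
#!/usr/bin/env sh
# Illustrative helper: sort stdin lines by raw UTF-8 byte value.
# LC_ALL=C disables locale collation, so this is code-point order,
# NOT Swedish dictionary order.
bytesort() {
  LC_ALL=C sort
}

# Prints Y, Z, Ä, Å, Ö (one per line): Ä wrongly precedes Å.
printf '%s\n' Ö Å Ä Z Y | bytesort
```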

michal.h21 · Answer 1 · score 12 · 2014-01-15T21:40:54.993

The problem is that with inputenc and the utf8 option, non-ASCII characters are written to the index file as \IeC {...} sequences, which xindy cannot process. This sample:

\documentclass[11pt]{article} 
\usepackage[T1]{fontenc}
\usepackage[swedish]{babel}
\usepackage[utf8]{inputenc} 

\usepackage{makeidx}
\makeindex

\begin{document}
This kind of index in text:

\index{Säker plats|textbf}Säker plats
foo\index{Aäö}\index{Aäö}
bar\index{äö}\index{aø}\index{ohne}\index{øæ}
\printindex
\end{document}

produces this .idx file:

\indexentry{S\IeC {\"a}ker plats|textbf}{1}
\indexentry{A\IeC {\"a}\IeC {\"o}}{1}
\indexentry{A\IeC {\"a}\IeC {\"o}}{1}
\indexentry{\IeC {\"a}\IeC {\"o}}{1}
\indexentry{a\IeC {\o }}{1}
\indexentry{ohne}{1}
\indexentry{\IeC {\o }\IeC {\ae }}{1}

You need to convert these sequences back to UTF-8, which my script iec2utf can do. Download the file iec2utf.lua into the directory containing your document and create a shell script swedxindy:

#!/usr/bin/env sh
# Usage: swedxindy file[.tex]; converts file.idx to UTF-8, runs xindy, writes file.ind
base=$(basename "$1" .tex)
texlua iec2utf.lua T1 < "$base.idx" | texindy -i -M lang/swedish/utf8-lang -o "$base.ind"

I don't use TeXShop, so I don't know how to add this script to its menu, but you can call it from the command line. You will have to make it executable first; as on Linux, this should work with the command chmod +x swedxindy.
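To illustrate what the conversion step does, the substitution can be approximated with sed for the handful of letters used in this example. This is a hypothetical sketch, not iec2utf: the function name `iec_to_utf8` is made up, it only knows the eight LICR forms listed (note the trailing space that TeX writes after control words such as `\o` and `\aa`), and every other \IeC sequence passes through unchanged.

```shell
#!/usr/bin/env sh
# Hypothetical stand-in for iec2utf, for illustration only.
# Rewrites the \IeC {...} forms of å ä ö Å Ä Ö ø æ into UTF-8;
# any other \IeC sequence is left untouched.
iec_to_utf8() {
  sed -e 's/\\IeC {\\"a}/ä/g' \
      -e 's/\\IeC {\\"o}/ö/g' \
      -e 's/\\IeC {\\aa }/å/g' \
      -e 's/\\IeC {\\"A}/Ä/g' \
      -e 's/\\IeC {\\"O}/Ö/g' \
      -e 's/\\IeC {\\AA }/Å/g' \
      -e 's/\\IeC {\\o }/ø/g' \
      -e 's/\\IeC {\\ae }/æ/g'
}
```

For the .idx file above, `iec_to_utf8 < sample.idx` turns `\indexentry{S\IeC {\"a}ker plats|textbf}{1}` into `\indexentry{Säker plats|textbf}{1}`. For real documents the general script is the safer choice.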

Edit

I have simplified the process: the iec2utf repository now contains a script called utftexindy, which reduces the procedure described above to:

texlua utftexindy.lua -L swedish sample.idx

Result:

[image of the resulting index]

answered Jan 13 '14 at 20:27 by michal.h21 · edited Jan 15 '14 at 21:40

This should definitely go to CTAN. – egreg Jan 13 '14 at 20:50

@egreg I have quite a long list of packages which I want to post to CTAN, but I am really struggling with documentation. But I plan to do it :) – michal.h21 Jan 13 '14 at 21:02

This really ought to be fixed at xindy level given that so many LaTeX users use it. – daleif Jan 15 '14 at 10:05
Michal: typo in your GitHub repository, an f is missing (it must be @echo off). But the real problem is that there is no ieclib here (Win7 64-bit). – Speravir Jan 15 '14 at 22:09
@Speravir ieclib is included in the iec2utf repository. I updated the documentation; I hope it is clearer now :) – michal.h21 Jan 16 '14 at 09:04
Aargh, how could I overlook this … But it now runs fine here! And the batch file is not needed anymore. BTW, I'm a MiKTeX user with this adjustment: How to use Xindy with MiKTeX? – Speravir Jan 17 '14 at 01:02

score 8 · Answer 2 · edited Apr 13 '17 at 12:35


In his answer, michal.h21 has identified the problem: the way UTF-8 characters are written to the index file when \usepackage[utf8]{inputenc} is used:

\IeC{<LICR>}% LICR = LaTeX Internal Character Representation

Since package inputenc makes the 8-bit bytes active, they can be redefined to expand to themselves instead of to the \IeC form.

Also, \index reads its argument with verbatim category codes. LaTeX does not include the 8-bit characters in its verbatim category codes, because it has to map each UTF-8 byte sequence to a character slot of a font encoding to get the correct character.
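To see why the patch below has to handle the 8-bit range, it helps to look at the bytes involved. This small shell sketch (the helper name `utf8_bytes` is illustrative, not from the answer) prints the decimal byte values of each Swedish letter's UTF-8 encoding: every letter is two bytes, both in the range 128 to 255, which is exactly the range the loop in the patch walks through (`\count@` starts at 127 and is incremented before use).

```shell
#!/usr/bin/env sh
# Illustrative helper: decimal byte values of a string's UTF-8 encoding.
# od -An -tu1 dumps unsigned bytes without offsets; xargs squeezes whitespace.
utf8_bytes() {
  printf '%s' "$1" | od -An -tu1 | xargs
}

# Each letter prints as two bytes, both >= 128 (e.g. "ä: 195 164").
for ch in å ä ö Å Ä Ö; do
  printf '%s: %s\n' "$ch" "$(utf8_bytes "$ch")"
done
```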

The following example, based on michal.h21's example, patches \index so that it no longer writes UTF-8 characters in expanded form:

\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[swedish]{babel}
\usepackage[utf8]{inputenc}

\usepackage{makeidx}
\makeindex

\usepackage{etoolbox}
\makeatletter
\patchcmd\index{\@sanitize}{\@sanitize\index@sanitize}{}{%
  \errmessage{Patching \noexpand\index failed}%
}
\let\index@sanitize\@empty
\begingroup
  \count@=127
  \@whilenum\count@<255 \do{%
    \advance\count@\@ne
    \lccode`\*=\count@
    \lccode`\~=\count@
    \lowercase{%
      \expandafter
      \g@addto@macro\expandafter\index@sanitize\expandafter{%
        % verbatim catcode
        \expandafter\@makeother\csname *\endcsname
        % active character expands to non-expandable itself
        \def~{*}%
      }%
    }%
  }
\endgroup
\makeatother

\begin{document}
This kind of index in text:

\index{Säker plats|textbf}Säker plats
foo\index{Aäö}\index{Aäö}
bar\index{äö}\index{aø}\index{ohne}\index{øæ}

\textbf{foo\index{Aäö}\index{Aäö}}
\printindex
\end{document}

The following .idx file is written:

\indexentry{Säker plats|textbf}{1}
\indexentry{Aäö}{1}
\indexentry{Aäö}{1}
\indexentry{äö}{1}
\indexentry{aø}{1}
\indexentry{ohne}{1}
\indexentry{øæ}{1}
\indexentry{Aäö}{1}
\indexentry{Aäö}{1}
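A quick way to check which situation a given .idx file is in (written with or without the patch) is to look for leftover \IeC sequences. The following is a minimal sketch; the function name `needs_conversion` is hypothetical. It succeeds if the file still contains \IeC, meaning it must be converted to UTF-8 before xindy can sort it.

```shell
#!/usr/bin/env sh
# Hypothetical check: exit status 0 if the .idx file still contains \IeC
# sequences (conversion needed), non-zero otherwise.
needs_conversion() {
  grep -q '\\IeC' "$1"
}
```

For example, `needs_conversion sample.idx && echo "convert sample.idx first"`. With the patched \index above, the check should fail, since the entries are written as plain UTF-8.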

answered May 06 '14 at 11:28 by Heiko Oberdiek (last edited by Community)

Nice, perhaps this ought to be made into a package – daleif May 06 '14 at 11:35
@daleif: I am thinking about making a package. But I haven't a name yet. Something with UTF-8 is too special, because this method can be used with other encodings as well. Perhaps something like "inputencsanitize". – Heiko Oberdiek May 06 '14 at 11:42
Should it only hit \index? – daleif May 06 '14 at 12:01
@daleif: \index is indeed a prominent application, but there are others, try \typeout, for example. It is useful for any application that writes to a file or console. – Heiko Oberdiek May 06 '14 at 12:14
Hmm, it does not work with memoir because the patching fails. – daleif May 06 '14 at 12:18
because \@sanitize is in a different macro in memoir (\@index, in order to be able to specify multiple index files) – daleif May 06 '14 at 12:23
@daleif: A very first pre-alpha version: iecsan.pdf. (The PDF file contains iecsan.dtx as a file attachment. Run it through plain TeX to get the package file.) I have to stop now and continue later with the hard part (package compatibility; perhaps I will add a package option index=(true|false) → \usepackage[index]{iecsan}); documentation; ...). – Heiko Oberdiek May 06 '14 at 13:04
Will have a look, BTW: it seems you are loading etoolbox twice. – daleif May 06 '14 at 13:20

@daleif: Macro \iecsan is implemented, the remaining part of patching index macros will be rewritten anyway. – Heiko Oberdiek May 06 '14 at 13:28
No problem. The code works fine. – daleif May 06 '14 at 13:50

Great work. From first tests, it looks like the package has no effect when using the indexing option of biblatex (which probably uses its own indexing routines). Any plans for that? – Simifilm May 06 '14 at 14:29
The package doesn't seem to be part of the oberdiek bundle. biblatex still does not "listen" to this patch. – koppor Jan 09 '17 at 06:52

score 6 · Answer 3 · answered Jan 15 '14 at 09:27

Here's an alternative (requires at least version 4.02 of glossaries):

% arara: pdflatex
% arara: makeglossaries
% arara: pdflatex
\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[swedish]{babel}
\usepackage[utf8]{inputenc}

\usepackage[index,xindy]{glossaries}

\makeglossaries

\newterm[name={Säker plats}]{Saker plats}
\newterm[name={Aäö}]{Aao}
\newterm[name={äö}]{ao}
\newterm[name={aø}]{aoslash}
\newterm{ohne}
\newterm[name={øæ}]{oae}

\begin{document}
This kind of index in text:

\gls[format=textbf]{Saker plats}
foo\glsadd{Aao}\glsadd{Aao}
bar\glsadd{ao}\glsadd{aoslash}\glsadd{ohne}\glsadd{oae}

\printindex[style=indexgroup]
\end{document}

This produces:

Image of resulting document

This works because the default behaviour of glossaries is to sanitize the sort key before writing it to the external indexing file, so the .idx file for the above looks like:

(indexentry :tkey (("Säker plats" "\\glossentry{Saker plats}") ) :locref "{}{1}" :attr "pagetextbf" )
(indexentry :tkey (("Aäö" "\\glossentry{Aao}") ) :locref "{}{1}" :attr "pageglsnumberformat" )
(indexentry :tkey (("Aäö" "\\glossentry{Aao}") ) :locref "{}{1}" :attr "pageglsnumberformat" )
(indexentry :tkey (("äö" "\\glossentry{ao}") ) :locref "{}{1}" :attr "pageglsnumberformat" )
(indexentry :tkey (("aø" "\\glossentry{aoslash}") ) :locref "{}{1}" :attr "pageglsnumberformat" )
(indexentry :tkey (("ohne" "\\glossentry{ohne}") ) :locref "{}{1}" :attr "pageglsnumberformat" )
(indexentry :tkey (("øæ" "\\glossentry{oae}") ) :locref "{}{1}" :attr "pageglsnumberformat" )

so xindy is able to sort the entries correctly.

The drawback is that each indexed entry must first be defined using \newterm, and the label can't contain non-ASCII characters such as ø. (This is because the label forms the names of control sequences that store the entry's information.)
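If there are many entries, the \newterm definitions could be generated rather than typed. The following is a hypothetical helper, not part of glossaries: `ascii_label` transliterates the Nordic letters to ASCII to form the label, and `newterm_line` prints a matching `\newterm[name={...}]{...}` line. For the names in this answer it reproduces the labels used above (Saker plats, Aao, ao, oae).

```shell
#!/usr/bin/env sh
# Hypothetical helper: ASCII label from an entry name (Nordic letters only).
ascii_label() {
  printf '%s\n' "$1" | sed -e 's/å/a/g' -e 's/ä/a/g' -e 's/ö/o/g' \
                           -e 's/Å/A/g' -e 's/Ä/A/g' -e 's/Ö/O/g' \
                           -e 's/ø/o/g' -e 's/Ø/O/g' \
                           -e 's/æ/ae/g' -e 's/Æ/AE/g'
}

# Print a glossaries \newterm definition with the transliterated label.
newterm_line() {
  printf '\\newterm[name={%s}]{%s}\n' "$1" "$(ascii_label "$1")"
}

for name in 'Säker plats' 'Aäö' 'äö' 'øæ'; do
  newterm_line "$name"
done
```

Note that plain transliteration can produce duplicate labels: "aø" would also become "ao", clashing with the label for "äö". The answer sidesteps this with the hand-picked label aoslash; a real generator would need to detect such collisions.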
