
I am trying to make an index containing Russian words. MWE:

\documentclass{book}
\usepackage[T1, T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[english, russian]{babel}
\usepackage[xindy]{imakeidx}
\makeindex

\begin{document}

\chapter{Первая}

\index{notepad}
\index{apple}
\index{часть}
\index{дерево}
\index{электрон}

\printindex

\end{document}

I use MiKTeX (full installation, up to date) and run: latexmk.exe -pdf file.tex

This gives me file.pdf with the Russian words in the index, but they are sorted in the wrong order. file.idx contains:

\indexentry{notepad}{1}
\indexentry{apple}{1}
\indexentry{\IeC {\cyrch }\IeC {\cyra }\IeC {\cyrs }\IeC {\cyrt }\IeC {\cyrsftsn }}{1}
\indexentry{\IeC {\cyrd }\IeC {\cyre }\IeC {\cyrr }\IeC {\cyre }\IeC {\cyrv }\IeC {\cyro }}{1}
\indexentry{\IeC {\cyrerev }\IeC {\cyrl }\IeC {\cyre }\IeC {\cyrk }\IeC {\cyrt }\IeC {\cyrr }\IeC {\cyro }\IeC {\cyrn }}{1}

I see that the problem is that LaTeX writes the index entries as these {\cyrch}, {\cyre} etc. macros, so they are sorted by macro name. I thought that using utf8 encoding and xindy would solve all the problems with languages. How can I make a Russian index with correct sorting (дерево, часть, электрон in my case)?

Alx
  • I think that this post might contain your answer: http://tex.stackexchange.com/questions/22669/biblatex-sorting-alphabetically-with-non-latin-characters-%C5%9A I think that XeLaTeX has better multiple language support. – A Feldman Feb 11 '16 at 14:07
  • What options do you pass to xindy? – egreg Feb 11 '16 at 14:12

3 Answers

Due to some constraints, the \index command doesn't work very well with UTF-8 characters (something which could only be solved with a brand new version, I'm afraid.)

You can overcome the issue by doing

\documentclass{book}
\usepackage[T1, T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[english, russian]{babel}
\usepackage[xindy]{imakeidx}
\makeindex

\makeatletter
\newcommand{\rindex}[2][\imki@jobname]{%
  \index[#1]{\detokenize{#2}}%
}
\makeatother

\begin{document}

\chapter{Первая}

\rindex{notepad}
\rindex{apple}
\rindex{часть}
\rindex{дерево}
\rindex{электрон}

\printindex

\end{document}

Calling

texindy -L russian -C utf8 <filename>.idx

produces the index as expected.
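If you would rather not call texindy by hand, the same options can probably be handed to imakeidx, which then runs texindy during compilation. The following is only a sketch, assuming that shell escape is enabled (for example the -enable-write18 switch of MiKTeX's pdfLaTeX) and that texindy is on the search path. The only change to the document above is the optional argument of \makeindex.

```latex
\documentclass{book}
\usepackage[T1, T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[english, russian]{babel}
\usepackage[xindy]{imakeidx}
% Assumption: these options are passed unchanged to texindy,
% which needs shell escape to be enabled in order to run.
\makeindex[options=-L russian -C utf8]

\makeatletter
% Same helper as above: \detokenize keeps the UTF-8 text as plain
% characters in the .idx file instead of \IeC{\cyr...} macros.
\newcommand{\rindex}[2][\imki@jobname]{%
  \index[#1]{\detokenize{#2}}%
}
\makeatother

\begin{document}

\chapter{Первая}

\rindex{notepad}
\rindex{apple}
\rindex{часть}
\rindex{дерево}
\rindex{электрон}

\printindex

\end{document}
```

With \rindex, the entries in the .idx file should appear as readable text, e.g. \indexentry{часть}{1}, which is what texindy needs in order to sort them.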

egreg
  • thank you for answer. But, sorry, doesn't work for me. I just used your MWE. – Alx Feb 11 '16 at 14:54
  • @Alx “Doesn't work” in the sense that your computer screen becomes flashy yellow or… – egreg Feb 11 '16 at 15:02
  • thank you for answer. But, sorry, doesn't work for me. I just used your MWE. pdflatex 111.tex gives 111.idx with the same {\cyr} symbols, i.e. not readable (no words in Russian). Then, as you suggested, texindy -L russian -C utf8 111.idx. This gives the error: ERROR: CHAR: index 0 should be less than the length of the string, and a 111.ind of zero length, so that running pdflatex again prints only the first page with the chapter, but no index at all. Is this related to my MiKTeX distribution, or to utf8? I could use another encoding (e.g. cp1251) if that helps ... – Alx Feb 11 '16 at 15:06
  • @Alx I'm very sorry: I left in a line I used for debugging. Try now. – egreg Feb 11 '16 at 15:09
  • OK, thanks, this works now! But, there is some difference in English and Russian parts of Index. I mean each Russian word (group of words) has corresponding first letter in bold before, and English words are printed in common list. Can all parts of Index (all languages) be printed in a similar way? – Alx Feb 11 '16 at 15:27
  • @Alx I don't think Xindy supports more than this. You should make two indices, I guess. – egreg Feb 11 '16 at 15:37
  • many thanks for your help. BTW, I just tried xelatex instead of pdflatex (I saw @AFeldman pointing to similar topic about sorting non-latin characters) -- great, latexmk -xelatex <file>.tex makes the index with the right sorting from my original file (without your \detokenize block). Probably, I should use xelatex if I work with utf8. – Alx Feb 11 '16 at 15:45
  • I really wish that everything could be done in pdflatex, or using lualatex, because generally you can compile all your LaTeX code using lualatex, but not so with xelatex, and microtype does not work well (still?) in xelatex. But it seems there are some things that xelatex is just better suited for, like multiple language support. – A Feldman Feb 11 '16 at 15:50
  • Hi, getting this error with your setup: ERROR: Could not find file "tex/inputenc/utf8.xdy" – Yola Sep 12 '16 at 16:25
  • Ah! It works with this command texindy -M lang/ukrainian/utf8-lang.xdy wrindex.idx – Yola Sep 12 '16 at 16:29
  • How can I troubleshoot the error "ERROR: CHAR: index 0 should be less than the length of the string"? I did a search and, of course, I do not have any empty index entries. Is this tool lacking diagnostics, or is there a method to find exactly what it does not like? – SuperAl Dec 16 '19 at 00:27
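
Regarding egreg's suggestion in the comments above to make two indices: a rough sketch of how that might look with imakeidx follows. The index names en and ru and the titles are made up for illustration, and the sketch assumes shell escape is enabled so that imakeidx can run texindy once per index. Because no default index is declared here, the index name must always be given to \rindex.

```latex
\documentclass{book}
\usepackage[T1, T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[english, russian]{babel}
\usepackage[xindy]{imakeidx}
% Two separate indices, each sorted with its own language module.
% The names "en" and "ru" and both titles are illustrative choices.
\makeindex[name=en, title={Index (English)}, options=-L english -C utf8]
\makeindex[name=ru, title={Index (Russian)}, options=-L russian -C utf8]

\makeatletter
% Helper from the answer above; the optional argument selects the index.
\newcommand{\rindex}[2][\imki@jobname]{%
  \index[#1]{\detokenize{#2}}%
}
\makeatother

\begin{document}

\chapter{Первая}

\rindex[en]{notepad}
\rindex[en]{apple}
\rindex[ru]{часть}
\rindex[ru]{дерево}
\rindex[ru]{электрон}

\printindex[en]
\printindex[ru]

\end{document}
```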

OK, thanks, this works now! But, there is some difference in English and Russian parts of Index. I mean each Russian word (group of words) has corresponding first letter in bold before, and English words are printed in common list. Can all parts of Index (all languages) be printed in a similar way?

I managed this by changing the \makeindex command in egreg's answer to:

\makeindex [options = -L russian -C utf8 -M latin-alph.xdy]

For this command to work, you must pass the -enable-write18 option to pdfLaTeX. Alternatively, you can run texindy manually and pass -M latin-alph.xdy to it.

The latin-alph.xdy file should look like this:

(define-letter-groups
  ("a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
   "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"))

(require
  "rules/latin-tolower.xdy")

(use-rule-set
  :run 0
  :rule-set ("latin-tolower"))

(markup-letter-group
  :open-head "~n~n  \textbf {\Large "
  :close-head "}~n  \nopagebreak"
  :capitalize)

And the result is:

Index

stu003

OK, thanks, this works now! But, there is some difference in English and Russian parts of Index. I mean each Russian word (group of words) has corresponding first letter in bold before, and English words are printed in common list. Can all parts of Index (all languages) be printed in a similar way?

Just one more improvement: you can use this simple solution.

(require "lang/english/utf8.xdy")
(require "lang/russian/utf8.xdy")

(define-sort-rule-orientations (forward backward forward forward))
(use-rule-set   
    :run 0
    :rule-set (
        "en-alphabetize" 
        "ru-alphabetize" 
        "en-ignore-special" 
        "ru-ignore-special"
    )
)   
(use-rule-set 
    :run 1
    :rule-set (
        "en-resolve-diacritics"  
        "ru-resolve-diacritics" 
        "en-ignore-special" 
        "ru-ignore-special"
    )
)
(use-rule-set 
    :run 2
    :rule-set (
        "en-resolve-case" 
        "ru-resolve-case" 
        "en-ignore-special"
        "ru-ignore-special"
    )
)
(use-rule-set 
    :run 3
    :rule-set (
        "en-resolve-special"
        "ru-resolve-special"
    )
)

Save this code as multilingual.xdy and run:

texindy -C utf8 -M multilingual.xdy -o you-paper.ind you-paper.idx

Note: You shouldn't pass a language to texindy (i.e. no -L russian), because multilingual.xdy already contains the description of both languages.

\makeindex [options = -L russian -C utf8 -M latin-alph.xdy]

If you prefer to use imakeidx, the code will be:

\makeindex [options = -C utf8 -M multilingual.xdy]

The result uses a Latin-first sort order (image: result of multilingual.xdy).

By the way, you can change the sort order: just swap the language entries in the rule sets.