
Is there a function in expl3 that detects lexicographic (alphabetic) ordering, e.g. one that returns TRUE for "Abel" precedes "able"? Browsing the interface3 document suggests there is not. If there were, it would save me the trouble of writing a base-26 conversion (which would only work for short words anyway).

Joseph Wright

2 Answers


At present there are no built-in methods for such textual comparisons (there are generic sorting wrappers but one has to supply the comparison code). The reason for this is that sorting is complex: rules are language-specific and need a lot of Unicode data.

For simple sorting in English one can use the fact that the character codes for letters are 'correctly' ordered. This can be exploited using the \pdfstrcmp primitive in pdfTeX, which expands to -1, 0 or 1 depending on whether the first string sorts before, equal to or after the second. It is accessible using a documented-but-internal function in expl3 (i.e. we might change the name here!). This might lead to

\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn
\NewDocumentCommand \printsorted { m }
  { \rn_sort:n {#1} }
\clist_new:N \l__rn_sort_clist
\cs_new_protected:Npn \rn_sort:n #1
  {
    \group_begin:
      \clist_set:Nn \l__rn_sort_clist {#1}
      \clist_sort:Nn \l__rn_sort_clist
        {
          \int_compare:nNnTF { \__str_if_eq:nn {##1} {##2} } < 0
            { \sort_return_same: }
            { \sort_return_swapped: }
        }
      \clist_use:Nn \l__rn_sort_clist { , }
    \group_end:
  }
\ExplSyntaxOff
\begin{document}
\printsorted{Abel, abel}
\printsorted{abel, Abel}
\printsorted{abel,Bee}
\end{document}

Notice that as we are using only character codes we end up with all strings starting with a capital before those starting with a lower case letter: one could make a more sophisticated version with a two-part sort to deal with this.
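
The two-part idea can be sketched outside TeX. The following is a minimal illustration in plain Lua rather than expl3, so that the logic can be checked on its own; the function name two_part_less and the word list are made up for the example. It compares case-insensitively first and only falls back to raw character codes when the two words differ in case alone. It assumes ASCII input, since string.lower only folds ASCII letters in the default "C" locale.

```lua
-- Two-part comparison: ignore case first, then use character codes to break ties.
-- Assumption: ASCII-only input (string.lower does not fold non-ASCII letters
-- in the default "C" locale).
local function two_part_less(a, b)
    local la, lb = string.lower(a), string.lower(b)
    if la ~= lb then
        return la < lb   -- primary key: case-insensitive order
    end
    return a < b         -- tie-break: capitals first, as with plain character codes
end

local words = { "abel", "Bee", "Abel", "able" }
table.sort(words, two_part_less)
print(table.concat(words, ", "))
```

With this comparison "Bee" sorts after "abel" and "able", while "Abel" still comes directly before "abel". The same two-step test could be written inside the \clist_sort:Nn comparison code, at the cost of lower-casing both items for every comparison.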


The reason that \__str_if_eq:nn is internal is that it's basically direct access to \pdfstrcmp but set up to work with pdfTeX, XeTeX, LuaTeX and e-(u)pTeX. In particular, the team only ever use this to test for equality, not for ordering, precisely because of the issues related to language and case.

Joseph Wright

Something similar can be achieved using LuaTeX. Sorting is probably much faster in Lua.

To parse the comma list I use utilities.parsers.settings_to_array from ConTeXt. To do so, I have to import the necessary ConTeXt core files.

As one can see from the example, the sorting is not Unicode-aware. However, Lua’s table.sort takes two arguments: the table to be sorted and an optional comparison function. One could supply one's own comparison function here to sort correctly for a given language.

\documentclass{article}
\usepackage{fontspec}

\directlua{ % Use ConTeXt libraries
dofile(kpse.find_file("l-lpeg.lua"))
dofile(kpse.find_file("util-sto.lua"))
dofile(kpse.find_file("util-prs.lua"))
}

\directlua{
function printsorted(keywords)
    local input = utilities.parsers.settings_to_array(keywords)
    table.sort(input)
    tex.sprint(table.concat(input,", "))
end
}

\newcommand\printsorted[1]{%
  \directlua{printsorted(\unexpanded{[==[#1]==]})}}

\begin{document}
\printsorted{Übermut, Angst, Okay, ungenau, Ökonom, Ärger}
\end{document}
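
As a rough sketch of such a comparison function, the plain Lua below folds a few German letters onto unaccented sort keys before comparing. The table fold and the names sort_key and german_less are hypothetical and written only for this example; this is not an implementation of any collation standard. It assumes the Lua source and the input are UTF-8 encoded. Because the pattern contains backslash escapes, the code would be easier to keep in a separate .lua file than inside \directlua.

```lua
-- Hypothetical folding table: maps accented letters to an unaccented sort key.
-- Only a few German letters are listed; extend it as needed.
local fold = {
    ["Ä"] = "a", ["ä"] = "a",
    ["Ö"] = "o", ["ö"] = "o",
    ["Ü"] = "u", ["ü"] = "u",
    ["ß"] = "ss",
}

-- Matches one UTF-8 encoded character (lead byte plus continuation bytes).
local utf8_char = "[\1-\127\194-\244][\128-\191]*"

-- Build the key used for ordering: fold accents, then ignore ASCII case.
local function sort_key(s)
    -- With a table as replacement, characters without an entry are kept as-is.
    local folded = s:gsub(utf8_char, fold)
    return folded:lower()
end

local function german_less(a, b)
    local ka, kb = sort_key(a), sort_key(b)
    if ka ~= kb then
        return ka < kb
    end
    return a < b   -- equal keys: fall back to byte order for a deterministic result
end

local words = { "Übermut", "Angst", "Okay", "ungenau", "Ökonom", "Ärger" }
table.sort(words, german_less)
print(table.concat(words, ", "))
```

For the word list from the document above this prints Angst, Ärger, Okay, Ökonom, Übermut, ungenau. Inside printsorted one would then call table.sort(input, german_less) instead of table.sort(input). The key is recomputed on every comparison, which is fine for short lists; for long ones it would be worth caching the keys first.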
Henri Menke
  • Perhaps worth noting that LuaTeX is not 'natively' Unicode nor does it have any built-in libraries for locale-based sorting (cf. Perl for example) – Joseph Wright Dec 15 '16 at 09:08
  • @JosephWright Noted + example added. – Henri Menke Dec 15 '16 at 09:26
  • @HenriMenke An interesting alternative, thanks, but I am steering clear of LuaTeX for the time being and will therefore run with expl3 and l3sort. LuaTeX is certainly on my agenda, in this context and a number of others. Also, speed will be an important consideration. – Reinhard Neuwirth Dec 15 '16 at 20:46