I have a list of Chinese strings; for example, like this:

list1 = StringPartition[Import["http://text-share.com/view/c652fa55", "Data"][[-1]], 4]

Now I want to sort it according to Chinese alphabetical order.

There are two equivalent ways, but they differ in efficiency.

The faster way is

AlphabeticSort[list1, Entity["Language", "ChineseMandarin"]]; // AbsoluteTiming
(*{0.00261, Null}*)

A much slower way is

SortBy[list1, Transliterate]; // AbsoluteTiming
(*{0.465786, Null}*)

However, what if my data is like this:

list2 = Transpose[{list1, Range @ Length @ list1}];

I want to sort this list by the Chinese strings in it. SortBy[list2, Transliterate @ #[[1]] &] is definitely slow, and using AlphabeticOrder as the comparison function in Sort is slow as well:

Sort[list2, 
   AlphabeticOrder[#1[[1]], #2[[1]], 
     Entity["Language", "ChineseMandarin"]]>=0 &]; // AbsoluteTiming
(*{0.512303, Null}*)

Is it possible to use AlphabeticSort to sort an arbitrary list by the strings it contains, so that the sort keeps the speed of AlphabeticSort?

J. M.'s missing motivation
matheorem

1 Answer

Spelunking the code of AlphabeticSort[] reveals its mechanism: it generates an index list (similar to Ordering[]) that is then used to sort the original list of strings. Extracting the relevant internal code for constructing this index list, we have:

list1 = StringPartition[Import["http://text-share.com/view/c652fa55", "Data"][[-1]], 4];
list2 = Transpose[{list1, Range @ Length @ list1}];

lang = "ChineseMandarin";
args = Prepend[System`AlphabeticOrderDump`convertOptionsToStringSortArguments[
               "Language" -> lang, "MainHeader" -> AlphabeticSort], 
               System`AlphabeticOrderDump`getCollatorID[lang]];
idx = System`AlphabeticOrderDump`callStringsOrderingFunction[{list2[[All, 1]],
                                                              Sequence @@ args}];
list2[[idx]]
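
If relying on undocumented internals is a concern, a similar effect can probably be had with documented functions only. The following is a minimal sketch, not what AlphabeticSort[] does internally: it groups the rows by their string key, sorts only the distinct keys with a whole-list sorter, and then reads the groups back in sorted order. The helper name sortRowsByKey is hypothetical, and it has not been timed against the index-based code above.

```wolfram
(* Hypothetical helper, not a built-in: sort `rows` by the key that
   `keyFn` extracts from each row, using `sorter`, a function that
   sorts a whole list of keys at once (e.g. AlphabeticSort). *)
sortRowsByKey[rows_List, keyFn_, sorter_] :=
  Module[{groups},
    (* key -> list of rows with that key, in their original order *)
    groups = GroupBy[rows, keyFn];
    (* sort the distinct keys once, then concatenate the groups *)
    Catenate[Lookup[groups, sorter[Keys[groups]]]]
  ]

(* Usage with the data from the question (assumes list2 is defined):
   sortRowsByKey[list2, First,
     AlphabeticSort[#, Entity["Language", "ChineseMandarin"]] &] *)
```

Rows that share the same string stay in their original relative order, so duplicates in the key column are handled without extra work.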
J. M.'s missing motivation
  • Wow, this is so advanced! I am also interested in how you spelunked the code. – matheorem Oct 08 '16 at 03:17
  • I used PrintDefinitions[]. See this. – J. M.'s missing motivation Oct 08 '16 at 03:20
  • Great tool! Thanks. But how do you know those functions are under System`AlphabeticOrderDump`? – matheorem Oct 08 '16 at 03:23
  • You can mouse over a function name in the definition to see its full name with the contexts. – J. M.'s missing motivation Oct 08 '16 at 03:25
  • That is interesting! Thank you so much! : ) – matheorem Oct 08 '16 at 03:27
  • Spelunking is an interesting word I have never seen in English. In Germany we use the word "Spelunke" for a restaurant or inn of dubious reputation. The song title "Honky tonk woman" is sometimes translated into German as "Spelunkenweib". It is of Latin/Greek origin and refers to a cave. – Dr. Wolfgang Hintze Oct 08 '16 at 14:24
  • @Dr. Hintze, that is more or less the context of it; the original meaning was "exploring a cave", and now one can draw parallels between exploring unknown corners of a cave and exploring the deep code underbelly of Mathematica... the English equivalent of your Spelunke would be "hole in the wall"; so, effectively a cave, too! – J. M.'s missing motivation Oct 08 '16 at 14:27
  • @J.M. Hi, J.M. I ran into a problem when I ran this code today. It warns me "Throw::nocatch: Uncaught Throw[AlphabeticOrderUnevaluatedTag,AlphabeticOrderCatchThrowTag] returned to top level." I don't understand why, because several days ago it worked fine. Do you know what is wrong? – matheorem Oct 10 '16 at 04:57
  • Have you tried restarting your Mathematica session, @matheorem? – J. M.'s missing motivation Oct 10 '16 at 05:00
  • @J.M. Yes, I have closed and restarted it. So your Mathematica gives no such warning? – matheorem Oct 10 '16 at 05:09
  • Hmm. Try this, @matheorem: in a fresh session, call AlphabeticSort[list, "ChineseMandarin"] first for some short list with Mandarin strings. Then, run the code that I have. I suspect that this is because these internal functions are not loaded until AlphabeticSort[] is called first. – J. M.'s missing motivation Oct 10 '16 at 05:13
  • @J.M. Great! It works! You are so clever! Thank you! : ) – matheorem Oct 10 '16 at 05:53
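
The workaround from the comments can be folded into the code itself. The following is a minimal sketch under the assumption made there, namely that the System`AlphabeticOrderDump` functions only get their definitions once AlphabeticSort[] has been called; the wrapper name mandarinOrdering and the two warm-up strings are hypothetical. Since these are undocumented internals, they may behave differently or be absent in other versions.

```wolfram
(* Warm-up call on a short list, as suggested in the comments, so that
   the internal definitions are (presumably) loaded before first use. *)
AlphabeticSort[{"乙", "甲"}, "ChineseMandarin"];

(* Hypothetical wrapper around the internal calls from the answer:
   returns the index list that sorts `strings` for language `lang`. *)
mandarinOrdering[strings_List, lang_String : "ChineseMandarin"] :=
  Module[{args},
    args = Prepend[
      System`AlphabeticOrderDump`convertOptionsToStringSortArguments[
        "Language" -> lang, "MainHeader" -> AlphabeticSort],
      System`AlphabeticOrderDump`getCollatorID[lang]];
    System`AlphabeticOrderDump`callStringsOrderingFunction[
      {strings, Sequence @@ args}]
  ]

(* Usage (assumes list2 from the question is defined):
   list2[[mandarinOrdering[list2[[All, 1]]]]] *)
```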