I came across the following situation:

Evaluating

Sort[{".m", ".a", "co"}]

Results in

{".a", "co", ".m"}

I am wondering: what criterion does Mathematica use to produce this output? According to the documentation:

...Sort orders strings as in a dictionary, with uppercase...

After reading this, I expected a dictionary sort like this:

{".a", ".m", "co"}

Or

{"co", ".a", ".m"}

Can anyone explain what criterion Mathematica uses to perform this sort?

Thanks in advance.

5 Answers

According to The Chicago Manual of Style, para. 18.57/18.58, punctuation marks are ignored.

18.57

The letter-by-letter system. In the letter-by-letter system, alphabetizing continues up to the first parenthesis or comma; it then starts again after the punctuation point. Word spaces and all other punctuation marks are ignored. Both open and hyphenated compounds such as New York or self-pity are treated as single words. The order of precedence is one word, word followed by a parenthesis, and word followed by a comma, number, or letters. The index to this manual, in accordance with Chicago’s traditional preference, is arranged letter by letter.

I won't say it's a definitive answer, but it supports Mathematica's behavior to a certain extent.
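
One way to probe this hypothesis is to strip punctuation before comparing and check whether the result matches Sort. The sketch below does that for the list in the question. stripPunctuation is a hypothetical helper, not a built-in, and agreement on one small list does not show that Mathematica actually works this way.

```mathematica
(* Hypothetical helper: remove punctuation characters from a string *)
stripPunctuation[s_String] := StringReplace[s, PunctuationCharacter -> ""]

list = {".m", ".a", "co"};

(* The keys that would be compared if punctuation were ignored *)
stripPunctuation /@ list
(* {"m", "a", "co"} *)

(* Sorting by those keys reproduces the order reported in the question *)
SortBy[list, stripPunctuation]
(* {".a", "co", ".m"} *)
```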

Michael E2

Sort orders strings as in a dictionary, with uppercase versions of letters coming after lowercase ones. Sort places ordinary letters first, followed in order by script, Gothic, double-struck, Greek, and Hebrew. Mathematical operators appear in order of decreasing precedence.

Sort[list, p] applies the function p to pairs of elements in list to determine whether they are in order. The default function p is OrderedQ[{#1, #2}] &.

OrderedQ[h[e1, e2, …]] gives True if the ei are in canonical order, and False otherwise.
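
The quoted documentation can be checked directly on the list from the question. This is only a small sketch: it spells out the documented default ordering function and asks OrderedQ whether the reported result is in canonical order.

```mathematica
list = {".m", ".a", "co"};

(* Writing out the documented default ordering function changes nothing *)
Sort[list, OrderedQ[{#1, #2}] &] === Sort[list]
(* True *)

(* The result from the question is, by definition, in canonical order *)
OrderedQ[{".a", "co", ".m"}]
(* True *)
```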

Some tests


Sort without an ordering function


Let's look at the character codes of sampleList1:

    (* Input 1 *)
    sampleList1 = {"compute", "Tes", ".", "", "Etz", ".a", ".m", "a", "z", "T", ".T", "wha", "{"};
    ToCharacterCode[sampleList1]

    (* Output 1 ==>
    {{99,111,109,112,117,116,101},{84,101,115},{46},{},{69,116,122},{46,97},{46,109},{97},{122},{84},{46,84},{119,104,97},{123}}
    *)

    (* Input 2 *)
    Sort[sampleList1]

    (* Output 2 ==>
    {"", "{", ".", ".a", "a", "compute", "Etz", ".m", ".T", "T", "Tes", "wha", "z"}
    *)

    (* Input 3 *)
    ToCharacterCode[%]

    (* Output 3 ==>
    {{},{123},{46},{46,97},{97},{99,111,109,112,117,116,101},{69,116,122},{46,109},{46,84},{84},{84,101,115},{119,104,97},{122}}
    *)

    (* Input 4 *)
    Total /@ %

    (* Output 4 ==>
    {0,123,46,143,97,765,307,155,130,84,300,320,122}
    *)
    

My thoughts


I do not know whether the default order of Sort is related to character codes.

I guess Mathematica treats some characters as having no order of their own and skips them when sorting strings with Sort.

I think some characters are treated as trivial elements and are put before the alphabet.

There may, however, be a single ordering table for all characters that I don't know about.

My opinion


Firstly, you should show what you consider the correct answer for a variety of lists, as @JonathanShock mentioned in the comments.

Otherwise, each time you come across a result that differs from Python's, you will ask why Mathematica does not work like Python.

I think that is ....

The important thing is how to get the results you expect.

Sort with one function


    (* Input 5 *)
    Sort[sampleList1, ToCharacterCode[#1] &]

    (* Output 5 ==>
    {"compute", "Tes", ".", "", "Etz", ".a", ".m", "a", "z", "T", ".T", "wha", "{"}
    *)

    (* Input 6 *)
    ToCharacterCode[%]

    (* Output 6 ==>
    {{99,111,109,112,117,116,101},{84,101,115},{46},{},{69,116,122},{46,97},{46,109},{97},{122},{84},{46,84},{119,104,97},{123}}
    *)

    (* Input 7 *)
    Total /@ %

    (* Output 7 ==>
    {765,300,46,0,307,143,155,97,122,84,130,320,123}
    *)

    (* Input 8 *)
    First /@ %%

    (* Output 8 ==>
    {99,84,46,First[{}],69,46,46,97,122,84,46,119,123}
    *)
    

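This still does not work well. The second argument of Sort is meant to be a function that compares a pair of elements and says whether they are in order, whereas ToCharacterCode[#1] & returns a list, which is presumably why Output 5 is in the original order. Total is not a correct ordering criterion either, so the totals prove nothing.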
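
For comparison, here is a minimal sketch of a pairwise ordering function of the kind Sort[list, p] expects, applied to the list from the question. byCodes is a hypothetical name. It compares the character-code lists with OrderedQ, so strings of different lengths would still be ordered by length first.

```mathematica
(* Hypothetical pairwise ordering function: True when the character-code
   lists of the two strings are already in canonical order *)
byCodes = OrderedQ[{ToCharacterCode[#1], ToCharacterCode[#2]}] &;

Sort[{".m", ".a", "co"}, byCodes]
(* {".a", ".m", "co"} *)
```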

Is the following what you expected? Note the StringLength of the strings.

    (* Input 9 *)
    FromCharacterCode[Sort[ToCharacterCode[sampleList1]]]

    (* Output 9 ==>
    {"", ".", "T", "a", "z", "{", ".T", ".a", ".m", "Etz", "Tes", "wha", "compute"}
    *)

    (* Input 10 *)
    Sort[ToCharacterCode[sampleList1]]

    (* Output 10 ==>
    {{},{46},{84},{97},{122},{123},{46,84},{46,97},{46,109},{69,116,122},{84,101,115},{119,104,97},{99,111,109,112,117,116,101}}
    *)

    (* Input 11 *)
    First /@ %

    (* Output 11 ==>
    {First[{}],46,84,97,122,123,46,46,46,69,84,119,99}
    *)
    

update

Note: totalList here is neither ascending nor descending.

    (* Input 12 *)
    totalList = Total /@ %%

    (* Output 12 ==>
    {0,46,84,97,122,123,130,143,155,307,300,320,765}
    *)

    (* Input 13 *)
    Transpose[{totalList, Sort[totalList]}]

    (* Output 13 ==>
    {{0,0},{46,46},{84,84},{97,97},{122,122},{123,123},{130,130},{143,143},{155,155},{307,300},{300,307},{320,320},{765,765}}
    *)
    

The result above is the same as the one SortBy gives:

    (* Input 14 *)
    SortBy[sampleList1, ToCharacterCode]

    (* Output 14 ==>
    {"", ".", "T", "a", "z", "{", ".T", ".a", ".m", "Etz", "Tes", "wha", "compute"}
    *)
    

update

As @MichaelE2 mentioned in a comment:

Note that SortBy[sampleList1, ToCharacterCode] effectively orders them by length first. - Michael E2

One method, from Rojo's comment:

    (* Input 15 *)
    yourSort = Max[StringLength[#1]] /. len_ :> SortBy[#1, PadRight[ToCharacterCode[#1], len] &] &;
    
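
To see why the padding matters, the sketch below sorts the same short list twice: once by the raw character-code lists, where shorter lists come first, and once after padding every code list with zeros to a common length. The variable name list and the sample strings are only for illustration.

```mathematica
list = {".m", ".ast", "co"};

(* Raw code lists: the two-character strings come before the four-character one *)
SortBy[list, ToCharacterCode]
(* {".m", "co", ".ast"} *)

(* Zero-padded code lists all have the same length, so they are compared
   element by element from the left *)
list[[Ordering[PadRight[ToCharacterCode[list]]]]]
(* {".ast", ".m", "co"} *)
```

Padding with 0 means a string that is a prefix of another sorts before it, which is usually what a character-code order is expected to do.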

My conclusion


So the conclusion may be that the canonical order in Mathematica differs from the order Python uses.
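
As a small illustration of how the two orders can differ, the sketch below compares Sort with a sort keyed on character codes. It assumes that Python's default string comparison goes by code point, which the character-code key mimics for strings of equal length. The name letters is only for illustration.

```mathematica
letters = {"b", "A", "a", "B"};

(* Canonical order: dictionary-like, lowercase before uppercase *)
Sort[letters]
(* {"a", "A", "b", "B"} *)

(* Character-code order: all uppercase letters before all lowercase ones *)
SortBy[letters, ToCharacterCode]
(* {"A", "B", "a", "b"} *)
```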

HyperGroups
  • Dictionary order is used correctly here: evaluating Sort[{"a", ".", "m", "c"}] results in {".", "a", "c", "m"}. Does it skip some characters? The answer is no! Every element is a string and should be treated as in dictionary order. Mathematica should not ignore any characters. According to the documentation: "...Sort orders strings as in a dictionary, with uppercase...". I think this is a bug. –  Jun 13 '13 at 16:44
  • @PedroR I think some characters are in different orders in different encodings. And we can use a sort function in Sort or SortBy to get any result. – HyperGroups Jun 13 '13 at 16:49
  • @PedroR How to compare the order of { and . ? See one example in my edit. – HyperGroups Jun 13 '13 at 17:00
  • @PedroR, when you say the right and wrong answer, you mean by the criterion which you wish to apply SortBy. – Jonathan Shock Jun 13 '13 at 17:02
  • @PedroR If you want to do that translation, it is easy with one function: SortBy[lista, ToCharacterCode[#] &] – HyperGroups Jun 13 '13 at 17:11
  • @PedroR I got no clue how Python does it. Is this what you want yourSort = Max@StringLength@# /. len_ :> SortBy[#, ToCharacterCode@#~PadRight~len &] &? – Rojo Jun 13 '13 at 19:41
  • It seems from this discussion that precisely what PedroR considers to be the correct ordering for a variety of lists should be added to the question. – Jonathan Shock Jun 13 '13 at 22:19
  • @Silvia thanks, welcome to give me suggestions. – HyperGroups Jun 14 '13 at 06:51
  • Note that SortBy[sampleList1, ToCharacterCode] effectively orders them by length first. – Michael E2 Jun 15 '13 at 04:51
  • @MichaelE2 ha, indeed, So Total/@% is meaningless. I'll edit. One reason why I edit so frequently is I'm practising editing. :) – HyperGroups Jun 15 '13 at 05:26

Do you mean this?

 In[11]:= AlphabeticSort[{".m", ".a", "co"}]

 Out[11]= {".a", ".m", "co"}
yode

Since there is perhaps an implicit question of how to get a sort more along the lines you expect, as I proposed here you might use:

asciisort = #[[Ordering @ PadRight @ ToCharacterCode @ #]] &;

asciisort @ {".m", ".ast", "co"}
{".ast", ".m", "co"}

Or with the default character-wise order:

charsort = #[[Ordering @ PadRight @ Characters @ #]] &;

charsort @ {".m", ".ast", "co"}
{".ast", ".m", "co"}

If you are comfortable with shorter strings being ordered first you can also use:

SortBy[{".ast", ".m", "co"}, Characters]
{".m", "co", ".ast"}
Mr.Wizard
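
(* Assumes the first string in list starts with "."; those strings are ordered
   primarily by the character code of their second character *)
sortby[list : {__String}] := 
  StringJoin /@ SortBy[First@GroupBy[Characters@list, First], 
     ToCharacterCode[#[[2]]] &] ~Join~
   Sort[StringJoin /@ DeleteCases[Characters@list, {".", __}]]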


sortby@{".m", ".at", "monday", ".limp", "dart"}
(* {".at", ".limp", ".m", "dart", "monday"} *)

I think the better way is what @yode mentioned

AlphabeticSort[{".m", ".at", "monday", ".limp", "dart"}]
(* {".at", ".limp", ".m", "dart", "monday"} *)
Ali Hashmi