9

Bug introduced in 7.0 or earlier and fixed in 8.0.4 or earlier


I have come across what appears to be a bug in GatherBy. It appears similar to the problem of using Table[Random[], {1000}] in older versions in that the behavior changes depending on the size of the data.

  • Has this problem been fixed in version 8?
  • Is there a system option that affects this, or another global work-around for version 7?

Examples:

SeedRandom[1];
set = RandomInteger[4, {500, 3}];

DeleteDuplicates[Sort /@ set] ~Partition~ 5 // Column
{{2,4,4},{0,0,1},{0,0,2},{0,2,3},{0,3,4}}
{{1,3,4},{1,2,4},{1,4,4},{0,3,3},{1,1,3}}
{{1,2,3},{0,1,4},{0,2,4},{2,2,3},{1,1,2}}
{{1,3,3},{2,3,4},{2,2,2},{0,1,3},{0,2,2}}
{{3,3,4},{2,2,4},{3,4,4},{0,1,2},{2,3,3}}
{{0,0,3},{0,0,4},{3,3,3},{0,4,4},{1,2,2}}

But this does not agree:

GatherBy[set, Sort][[All, 1]]
{{4, 2, 4}, {0, 1, 0}, {0, 3, 2}, {4, 1, 3}, {3, 1, 1}}

If I change the gather function to something that apparently does not compile, the result agrees with DeleteDuplicates again:

GatherBy[set, ("x"; Sort@#) &][[All, 1]] ~Partition~ 5 // Column
{{4,2,4},{0,1,0},{0,2,0},{0,3,2},{0,3,4}}
{{4,1,3},{4,2,1},{1,4,4},{0,3,3},{3,1,1}}
{{3,2,1},{1,4,0},{0,2,4},{2,3,2},{1,2,1}}
{{3,3,1},{2,4,3},{2,2,2},{0,3,1},{2,0,2}}
{{4,3,3},{4,2,2},{3,4,4},{0,1,2},{2,3,3}}
{{0,3,0},{0,4,0},{3,3,3},{4,0,4},{2,1,2}}
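
The ("x"; Sort@#) & trick can be wrapped for reuse. This is only a sketch: noCompile is a hypothetical helper name, not a built-in, and it rests on the same unconfirmed assumption as above, namely that a pure function with a leading string in a CompoundExpression is not auto-compiled.

```mathematica
(* Hypothetical helper, not a built-in: wraps f in a pure function whose body
   starts with a dummy string, mirroring the ("x"; Sort@#) & form used above.
   Assumption: this form is not auto-compiled in version 7. *)
ClearAll[noCompile];
noCompile[f_] := ("x"; f[#]) &

(* The wrapped function returns the same values as f itself. *)
noCompile[Sort][{3, 1, 2}]
```

It would be used as GatherBy[set, noCompile[Sort]], which keeps the work-around local to one call instead of changing a global system option.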

This problem does not appear to affect smaller sets:

SeedRandom[1];
set = RandomInteger[4, {60, 3}];

GatherBy[set, Sort][[All, 1]]~Partition~5 // Column
{{4,2,4},{0,1,0},{0,2,0},{0,3,2},{0,3,4}}
{{4,1,3},{4,2,1},{1,4,4},{0,3,3},{3,1,1}}
{{3,2,1},{1,4,0},{0,2,4},{2,3,2},{1,2,1}}
{{3,3,1},{2,4,3},{2,2,2},{0,3,1},{2,0,2}}
{{4,3,3},{4,2,2},{3,4,4},{0,1,2},{2,3,3}}
Mr.Wizard

4 Answers

7

We can compare the results of the two forms of GatherBy for varying data set sizes:

SeedRandom[1];
ListPlot @ Table[
  RandomInteger[4,{n,3}] /.
  set_ :> { n
         , Boole @ SameQ[
             GatherBy[set, Sort]
           , GatherBy[set, ("x";Sort@#)&]
           ]
       }
, {n, 1, 300}
]

The x-axis shows the set size, and the y-axis shows 1 where the two GatherBy results match and 0 where they do not. The results for Mathematica 7 show a problem when there are 100 or more elements:

Mathematica 7 chart

Mathematica 8 does not show this problem:

Mathematica 8 chart

Following @ruebenko's suggestion, let's look for a compiler option with the magic number 100:

Cases["CompileOptions" /. SystemOptions[], HoldPattern[_ -> 100]]

{FoldCompileLength->100, MapCompileLength->100, NestCompileLength->100}

Some experimentation demonstrates that MapCompileLength is the culprit:

SetSystemOptions["CompileOptions" -> "FoldCompileLength" -> 100];
SetSystemOptions["CompileOptions" -> "MapCompileLength" -> 50];
SetSystemOptions["CompileOptions" -> "NestCompileLength" -> 100];

Recreating the chart after reducing MapCompileLength to 50 produces:

Mathematica 7 chart revisited

It appears we have a compiler bug involving the compilation of an internal use of Map -- a bug that seems to be fixed in Mathematica 8.

Increasing MapCompileLength to Infinity appears to correct the problem.
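
If changing the option for the whole session is undesirable, the setting can be changed around a single evaluation and then put back. The following is a minimal sketch only: withMapCompileLength is a hypothetical helper name, and the old value is not restored if the evaluation is aborted or throws.

```mathematica
(* Hypothetical helper, not a built-in: evaluates expr with "MapCompileLength"
   temporarily set to len, then restores the previous value.
   HoldRest keeps expr unevaluated until the option has been changed. *)
ClearAll[withMapCompileLength];
SetAttributes[withMapCompileLength, HoldRest];
withMapCompileLength[len_, expr_] :=
  Module[{old, result},
    (* Read the current value using the same option path as above. *)
    old = "MapCompileLength" /.
      ("CompileOptions" /. SystemOptions["CompileOptions" -> "MapCompileLength"]);
    SetSystemOptions["CompileOptions" -> "MapCompileLength" -> len];
    result = expr;  (* expr is evaluated here, under the new setting *)
    SetSystemOptions["CompileOptions" -> "MapCompileLength" -> old];
    result
  ]
```

For example, withMapCompileLength[Infinity, GatherBy[set, Sort][[All, 1]]] would run the problematic call with the work-around applied and leave the global setting as it was.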

WReach
  • +1 for the plots, but out of curiosity did you see my own answer to this question before posting? – Mr.Wizard Feb 04 '12 at 23:51
  • @Spartacus I did, and upvoted it. I thought my answer would be useful as a "consolidation answer", showing the experiments directly and suggesting a workaround. I'm happy to delete my answer if you would prefer just to add the workaround to your answer. – WReach Feb 04 '12 at 23:56
  • No, it's fine, but it was not clear to me that you were drawing on previous answers. I believe the problem is with packed arrays rather than Map specifically, as evidenced by Gather. An acceptable work-around that does not require a global change to MapCompileLength is to change any function given to GatherBy into something that won't compile, like ("x"; Sort@#) &. – Mr.Wizard Feb 04 '12 at 23:58
6

In V8.0.4, both GatherBy[set, Sort][[All, 1]] and GatherBy[set, ("x"; Sort@#) &][[All, 1]] give the same result. As for options, here is a wild guess (no V7 here): try tweaking the

"CompileOptions" /. SystemOptions[]

XYCompileLength ones and see what happens.

5

Warning: not an answer; this just shows what I get on 8.0.4, as it is too long to fit in a comment. Will delete later.

RandomSeed[1];
set = RandomInteger[4, {500, 3}];

DeleteDuplicates[Sort /@ set]~Partition~5 // Column

gives

{{0,0,2},{3,4,4},{0,1,1},{0,0,1},{0,4,4}}
{{0,2,3},{1,2,2},{0,3,3},{0,0,3},{2,2,2}}
{{2,2,4},{0,2,4},{1,2,4},{1,1,3},{2,3,4}}
{{1,1,4},{0,1,2},{1,2,3},{0,1,4},{0,3,4}}
{{0,0,4},{0,2,2},{2,2,3},{0,1,3},{1,1,2}}
{{3,3,4},{0,0,0},{2,3,3},{1,3,3},{1,3,4}}

and

  GatherBy[set, Sort][[All, 1]]

gives

{{0,0,2},{4,3,4},{1,1,0},{0,0,1},{0,4,4},{0,2,3},
{2,2,1},{0,3,3},{0,0,3},{2,2,2},{2,2,4},{0,4,2},{4,2,1},
{1,3,1},{3,2,4},{1,1,4},{0,2,1},{2,1,3},{0,1,4},{4,0,3},
{0,4,0},{0,2,2},{2,3,2},{1,3,0},{1,2,1},{4,3,3},{0,0,0},
{3,3,2},{1,3,3},{4,3,1},{1,1,1},{4,4,2},{4,1,4},{3,3,3}}

edit(1) removed extra un-needed output

Nasser
  • This indicates that it has been fixed. I prefer that you do not delete your answer, but IMO all you need to include in it is the output for GatherBy[set, Sort][[All, 1]] as the rest is predictable. – Mr.Wizard Jan 23 '12 at 13:36
4

The problem appears to be with packed arrays, and it affects both Gather and GatherBy.

SeedRandom[1]
dat = RandomInteger[2, {100, 2}];
sorted = Sort /@ dat;

Developer`PackedArrayQ[sorted]

True

First /@ Gather[sorted]
DeleteDuplicates[sorted]

{{0, 1}, {1, 1}, {1, 1}}

{{0, 1}, {1, 1}, {0, 0}, {0, 2}, {1, 2}, {2, 2}}

unpacked = Developer`FromPackedArray@sorted;

First /@ Gather[unpacked]
DeleteDuplicates[unpacked]

{{0, 1}, {1, 1}, {0, 0}, {0, 2}, {1, 2}, {2, 2}}

{{0, 1}, {1, 1}, {0, 0}, {0, 2}, {1, 2}, {2, 2}}

This further affects GatherBy on lists whose length is at least the "MapCompileLength" value reported by SystemOptions["CompileOptions" -> "MapCompileLength"], because GatherBy then turns unpacked arrays into packed arrays:

First /@ GatherBy[unpacked, Sort]
{{0, 1}, {1, 1}, {1, 1}}
First /@ GatherBy[Most@unpacked, Sort]
{{0, 1}, {1, 1}, {0, 0}, {0, 2}, {1, 2}, {2, 2}}
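
The packing claim can be checked directly. The sketch below assumes that the mapping GatherBy does internally behaves like an ordinary Map, which is not confirmed here; the name limit is introduced only for illustration. It compares the packing status of Sort /@ list for a list of length 100 and one of length 99.

```mathematica
(* Sketch: check whether mapping Sort over an unpacked list yields a packed
   array on either side of the "MapCompileLength" threshold. *)
SeedRandom[1];
unpacked = Developer`FromPackedArray[RandomInteger[2, {100, 2}]];

(* Current threshold (100 by default, per the option list shown earlier). *)
limit = "MapCompileLength" /.
  ("CompileOptions" /. SystemOptions["CompileOptions" -> "MapCompileLength"]);

Developer`PackedArrayQ[unpacked]                (* False: explicitly unpacked *)
Developer`PackedArrayQ[Sort /@ unpacked]        (* list of length 100 *)
Developer`PackedArrayQ[Sort /@ Most[unpacked]]  (* list of length 99 *)
```

If the last two results differ, that would be consistent with the length threshold deciding whether the mapped result is packed before it is gathered.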
Mr.Wizard