The Problem
I have a list myList, which is a 487135 x 3 x 2 array of integers between -1000 and 1000. I want to be able to gather elements from a different list by their indices using elements from myList, like so:
GatherBy[Range[99], f[myList[[#]] ] &]; // AbsoluteTiming
(* {0.000691, Null} *)
but even with a simple f such as OddQ, using more than 99 indices made the time increase dramatically:
GatherBy[Range[100], f[myList[[#]]] &]; // AbsoluteTiming
(* {1.15235, Null} *)
At first, I thought this was an error with GatherBy or with my f, but after experimenting, I discovered the same issue happening with Map itself.
myList[[#]] & /@ Range[99]; // AbsoluteTiming
(* {0.000113, Null} *)
myList[[#]] & /@ Range[100]; // AbsoluteTiming
(* {1.1329, Null} *)
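Because the jump happens exactly between 99 and 100 elements for plain Map too, one setting worth checking is Map's auto-compilation threshold, which Mathematica exposes through SystemOptions and which defaults to 100. The sketch below assumes (without confirmation) that auto-compilation is what changes at 100. It reads the threshold, times the 100-element map, then disables auto-compilation for Map and times it again. `testList` and `oldLength` are hypothetical names; `testList` is a deliberately unpacked stand-in for myList.

```mathematica
(* Hypothetical stand-in for myList: same shape, deliberately unpacked *)
testList = Developer`FromPackedArray[RandomInteger[{-1000, 1000}, {487135, 3, 2}]];

(* Read Map's auto-compilation threshold (100 by default) *)
oldLength = "MapCompileLength" /. ("CompileOptions" /. SystemOptions["CompileOptions"]);

(* Time the 100-element map with the default setting *)
testList[[#]] & /@ Range[100]; // AbsoluteTiming

(* Disable auto-compilation for Map, then time the same map again *)
SetSystemOptions["CompileOptions" -> "MapCompileLength" -> Infinity];
testList[[#]] & /@ Range[100]; // AbsoluteTiming

(* Restore the saved threshold *)
SetSystemOptions["CompileOptions" -> "MapCompileLength" -> oldLength];
```

If the second timing drops back to the 99-element level, that would point to auto-compilation interacting with the unpacked array; if it does not, the value 100 is probably a coincidence.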
I figured out a way to work around the problem, but I have no idea what's causing it, or even how to write test code that fully replicates the problem. I uploaded the first thousand elements of myList to PasteBin here; on my computer and version of Mathematica (11.3 for Linux), that partial list shows the same qualitative behavior as the full myList, though the slowdown is only a factor of ~40.
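A self-contained test case that does not depend on the uploaded data might compare a packed array with an unpacked copy of the same contents, following the packing explanation raised in the comments (an assumption at this point). `packedList` and `unpackedList` are hypothetical names, and the size of 1000 mirrors the partial list:

```mathematica
(* RandomInteger with an integer range returns a packed array *)
packedList = RandomInteger[{-1000, 1000}, {1000, 3, 2}];

(* Hypothetical test case: an unpacked copy with identical contents *)
unpackedList = Developer`FromPackedArray[packedList];

{packedList === unpackedList, Developer`PackedArrayQ /@ {packedList, unpackedList}}
(* {True, {True, False}} *)

ByteCount /@ {packedList, unpackedList}

(* Compare 99 vs 100 indices on each version *)
packedList[[#]] & /@ Range[99]; // AbsoluteTiming
packedList[[#]] & /@ Range[100]; // AbsoluteTiming
unpackedList[[#]] & /@ Range[99]; // AbsoluteTiming
unpackedList[[#]] & /@ Range[100]; // AbsoluteTiming
```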
What I've tried:
Using a different set of indices.
myList[[#]] & /@ Range[1001, 1099]; // AbsoluteTiming
(* {0.000101, Null} *)
myList[[#]] & /@ Range[1001, 1100]; // AbsoluteTiming
(* {1.16167, Null} *)
I tried several more times, including taking numbers at random rather than using Range (in case I had a bizarre issue on every 100th element or something).
Replicating with RandomInteger.
myRandomList = RandomInteger[{-1000, 1000}, {487135, 3, 2}];
myRandomList[[#]] & /@ Range[100]; // AbsoluteTiming
(* {0.000149, Null} *)
This newly-generated array does not have the same problem.
Appending a RandomInteger to the end of myList.
myList2 = Join[myList, RandomInteger[{-1000, 1000}, {1, 3, 2}]];
myList2[[#]] & /@ Range[100]; // AbsoluteTiming
(* {0.000179, Null} *)
This was even more baffling: why would adding something to the end change the behavior of the indices at the beginning?
Appending a RandomInteger to the end of myList, and then deleting it.
myList3 = Drop[Join[myList, RandomInteger[{-1000, 1000}, {1, 3, 2}] ], -1];
myList3 === myList
(* True *)
myList3[[#]] & /@ Range[100]; // AbsoluteTiming
(* {0.000161, Null} *)
myList3 is identical to myList, but does not have the problem.
Appending a non-random element to the end of myList, and then deleting it.
myList4 = Drop[Join[myList, {{{1, 1}, {1, 1}, {1, 1}}}], -1];
myList4 === myList
(* True *)
myList4[[#]] & /@ Range[100]; // AbsoluteTiming
(* {1.13683, Null} *)
myList4 is identical to myList, and does have the problem.
Running ByteCount on the three identical arrays.
ByteCount /@ {myList, myList3, myList4}
(* {163677440, 23382640, 163677440} *)
The arrays that have the problem are the same size, and 6.99996 times bigger than the array that doesn't.
Taking a slice of myList to replicate the problem on a smaller scale.
myList5 = myList[[1 ;; 1000]];
myList5[[#]] & /@ Range[99]; // AbsoluteTiming
(* {0.000131, Null} *)
myList5[[#]] & /@ Range[100]; // AbsoluteTiming
(* {0.005996, Null} *)
Even with only a thousand elements, it still takes many times longer to pull 100 elements than to pull 99.
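The ByteCount figures above can be sanity-checked with a little arithmetic. Assuming a packed integer array stores each entry as an 8-byte machine integer (an assumption, though it matches the measured size closely), myList3 is almost exactly the raw data, while the arrays with the problem cost about 56 bytes per integer:

```latex
\begin{aligned}
487135 \times 3 \times 2 &= 2\,922\,810 \text{ integers} \\
2\,922\,810 \times 8\ \text{bytes} &= 23\,382\,480\ \text{bytes} \quad (\text{measured: } 23\,382\,640) \\
163\,677\,440 / 2\,922\,810 &\approx 56.0\ \text{bytes per integer} \\
163\,677\,440 / 23\,382\,640 &\approx 6.99996
\end{aligned}
```

The remaining 160 bytes in the small array are presumably fixed per-array overhead.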
My Questions
So, I now have a workaround for my actual program: add a random element to the end, and then delete it. But I very much want to figure out:
- Why is myList so much larger than the same array with one element added and then removed?
- What's so special about the number 100 when calling indices?
- Why is the slowdown more than three orders of magnitude?
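If packing is indeed the difference, as the comments below suggest, the append-and-delete trick can likely be replaced by packing the array explicitly with Developer`ToPackedArray. This is a minimal sketch, assuming every entry is a machine-size integer in a full rectangular array (true for integers between -1000 and 1000). `unpackedDemo` and `packedDemo` are hypothetical stand-ins for myList and its packed copy:

```mathematica
(* Hypothetical stand-in for myList: a deliberately unpacked integer array *)
unpackedDemo = Developer`FromPackedArray[RandomInteger[{-1000, 1000}, {487135, 3, 2}]];

(* Pack it directly instead of appending and dropping an element *)
packedDemo = Developer`ToPackedArray[unpackedDemo];

(* Same contents, and the copy is packed *)
{unpackedDemo === packedDemo, Developer`PackedArrayQ[packedDemo]}
(* {True, True} *)

(* Time the 100-index map on the packed copy *)
packedDemo[[#]] & /@ Range[100]; // AbsoluteTiming
```

In the real program, this would amount to running myList = Developer`ToPackedArray[myList] once, right after myList is created or imported.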
myList is not packed (you can check with Developer`PackedArrayQ), in which case most (if not all) of the things you tried ended up packing it. You should also see a difference between mapping over Range[99] and Range[100] if you do On["Packing"] first. – jjc385 Oct 23 '18 at 21:34

myList is packed and myList3 is unpacked according to Developer`PackedArrayQ. Doing On["Packing"] first on the myList[[#]] & /@ Range[99] code gave "Developer`FromPackedArray::unpack: Unpacking array in call to HoldForm.", while for Range[100] it gave "Developer`FromPackedArray::unpack: Unpacking array in call to List". Which seems odd, since myList wasn't packed to begin with. – HiggstonRainbird Oct 23 '18 at 21:51

OddQ on the entire array in {0.683298, Null} *EDIT* this timing is bearing in mind I have a slower system – Teabelly Oct 23 '18 at 22:56

GatherBy[otherList[[i]], Union[Flatten[myList[[{i, #}]], 1]] &], and then select the groups of a specific size. The problem doesn't lend itself (as far as I can tell) to straight mapping, and might not be the most efficient way to do it, but when using packed arrays (as I just learned), or when Length[otherList[[i]]] is less than 100, it works fast enough for my purposes. – HiggstonRainbird Oct 23 '18 at 23:04