
I am applying the Select function to a large list called xyCoordinateCentreCircle.

The dimensions of the list are:

Dimensions[xyCoordinateCentreCircle]
(* {197796, 3} *)

All the other variables used in the code below are single values.

I am using the following code:

gridmean = 
  ParallelTable[
   Mean[Last /@ 
     Select[xyCoordinateCentreCircle, 
      z <= #[[1]] <= z + dz && x <= #[[2]] <= x + dx &]],
   {z, minZ, minZ + dz*(GridSpacingDivisionZ - 1), dz},
   {x, minX, minX + dx*(GridSpacingDivisionX - 1), dx}];

I parallelized the Table to increase speed, but it is still very slow.
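A rough count of the work, assuming the 15 × 16 grid shown by Dimensions[gridmean] further down: each of the 240 table cells runs its own Select over all 197,796 rows, so the test function is evaluated

$$15 \times 16 \times 197\,796 = 47\,471\,040 \approx 4.7 \times 10^{7}$$

times. ParallelTable can at best divide this by the number of subkernels; the count itself only drops if the data is traversed once instead of once per cell.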

I noticed this blog post (see tip 7): http://blog.wolfram.com/2011/12/07/10-tips-for-writing-fast-mathematica-code/

Is the function Last slow for the same reason that AppendTo (see tip 7) is slow?

If so, could I use Sow and Reap instead of Last? Or is there another reason (possibly the use of Select)? An example would be great.
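On the Last question: Last /@ maps over the selected rows once, so it does not have the repeated-copy cost that tip 7 describes for AppendTo; the expensive part is that Select rescans the whole list for every grid cell. Sow and Reap can still help, but as a replacement for the repeated Select rather than for Last. The following is a minimal sketch, assuming hypothetical random data and illustrative grid names (z0, zStep, nz, x0, xStep, nx are not the question's variables) and half-open bins [z, z + zStep): each height is sown once, tagged with its {row, column} bin, and the third argument of Reap takes the mean of each group.

```mathematica
(* Hypothetical test data: columns {z, x, height}, like xyCoordinateCentreCircle *)
SeedRandom[1];
pts = RandomReal[{0, 10}, {10^5, 3}];
{z0, zStep, nz} = {0., 2., 5};  (* illustrative grid, not the question's values *)
{x0, xStep, nx} = {0., 2., 5};

(* One pass over the data: Sow each height under the tag {row, column}.
   The extra braces make the list {i, j} a single tag rather than two tags. *)
groups = Last@Reap[
    Scan[
     Sow[#[[3]], {{1 + Floor[(#[[1]] - z0)/zStep], 1 + Floor[(#[[2]] - x0)/xStep]}}] &,
     pts],
    _,                 (* collect every tag *)
    #1 -> Mean[#2] &]; (* one rule {row, column} -> mean height per tag *)

(* Turn the rules into an nz x nx array;
   empty cells come back as Missing["KeyAbsent", {i, j}] *)
lookup = Association[groups];
gridmean = Table[lookup[{i, j}], {i, nz}, {j, nx}];
```

Applied to the question's data, the same pattern would use minZ, dz, minX and dx in place of the illustrative z0, zStep, x0 and xStep, with nz = 15 and nx = 16 to match the expected {15, 16} shape.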

Following advice from Mike, I've added some numbers:

xyCoordinateCentreCircle (a sample only, since the full list is too long; Z and X vary unevenly but are always increasing; see the dimensions at the top of the post):

xyCoordinateCentreCircle = {{5163, 11872, -1228}, {5163, 12640, -1146}, {5163, 12672, -1144}, {5163, 12800, -1137}, {5163, 12832, -1136}, {5163, 12864, -1139}, {5163, 12896, -1142}, {5163, 12928, -1139}, {5163, 12960, -1137}, {5163, 12992, -1137}, {5163, 13024, -1134}, {5163, 13056, -1129}, << 197772 >>, {20970, 12864, -1050}, {20970, 12896, -1052}, {20970, 12928, -1051}, {20970, 12960, -1050}, {20970, 12992, -1050}, {20970, 13024, -1050}, {20970, 13056, -1048}, {20970, 13088, -1049}, {20970, 13120, -1048}, {20970, 13152, -1049}, {20970, 13184, -1050}, {20970, 13216, -1052}}

Dimensions[gridmean]
(* {15, 16} *)

(* gridmean = {{Mean[{}], Mean[{}], Mean[{}], 
  Mean[{}], -1179.28, -1108.68, -1089.9, -1084.37, -1077.63, \
-1074.41, -1073.45, -1070.65, -1078.52, Mean[{}], Mean[{}], 
  Mean[{}]}, {Mean[{}], Mean[{}], 
  Mean[{}], -1117.45, -1101.15, -1086.7, -1081.24, -1075.97, \
-1071.45, -1068.16, -1063.15, -1059.36, -1055.48, -1063.79, -1070.02, 
  Mean[{}]}, {Mean[{}], 
  Mean[{}], -1123.93, -1097.66, -1089.47, -1083.27, -1079.06, \
-1074.04, -1069.73, -1065.79, -1060.3, -1057.05, -1051.91, -1054., \
-1046.89, -1058.53}, {Mean[{}], -1124.22, -1100.1, -1092.29, \
-1086.75, -1082.1, -1077.02, -1072.23, -1067.28, -1064.27, -1058.62, \
-1054.72, -1048.92, -1046.88, -1041.33, -1042.94}, {-1200.63, \
-1109.53, -1092.68, -1090.71, -1083.95, -1080.54, -1074.62, -1070.63, \
-1065.49, -1062.8, -1057., -1053.41, -1048.23, -1048.28, -1039.07, \
-1035.35}, {-1123.35, -1097.05, -1091.21, -1088.95, -1081.91, \
-1078.57, -1073.07, -1068.8, -1063.64, -1061.76, -1055.51, -1052.07, \
-1046.89, -1046.12, -1037.53, -1032.99}, {-1107.77, -1095.25, \
-1089.34, -1087.53, -1080.67, -1076.12, -1071.44, -1067.36, -1061.88, \
-1060.29, -1054.07, -1050.75, -1045.05, -1044.69, -1036.46, \
-1031.18}, {-1102.98, -1092.99, -1087.24, -1085.36, -1079.45, -1075., \
-1070.57, -1065.88, -1060.48, -1058.94, -1052.8, -1049.19, -1044.1, \
-1042.64, -1035.08, -1029.88}, {-1095.54, -1090.84, -1086.89, \
-1081.99, -1078.71, -1073.36, -1068.81, -1064.28, -1058.86, -1057.36, \
-1051.76, -1047.39, -1043.11, -1040.75, -1033.29, -1027.76}, \
{-1092.51, -1088.1, -1085.65, -1080.34, -1077.43, -1072.06, -1067.56, \
-1063.29, -1057.48, -1055.33, -1050.53, -1045.06, -1042.02, -1038.78, \
-1031.39, -1026.79}, {-1092.52, -1088.05, -1084.71, -1079.42, \
-1076.23, -1070.97, -1066.25, -1062.19, -1055.94, -1053.8, -1048.4, \
-1043.88, -1040.7, -1037.45, -1029.28, -1026.02}, {-1090.27, \
-1088.19, -1081.68, -1078.58, -1073.89, -1069.65, -1064.56, -1061.17, \
-1054.36, -1052.02, -1046.51, -1041.92, -1038.6, -1035.64, -1027.84, \
-1024.28}, {Mean[{}], -1084.31, -1080.94, -1076.38, -1072.75, \
-1068.46, -1063.21, -1060.06, -1053.12, -1050.63, -1045.01, -1040.35, \
-1036.84, -1034.13, -1027.63, -1023.53}, {Mean[{}], 
  Mean[{}], -1079.73, -1075.39, -1072.06, -1067.47, -1062.15, \
-1058.93, -1052.32, -1049.62, -1043.91, -1039., -1035.96, -1033.12, \
-1027.7, Mean[{}]}, {Mean[{}], Mean[{}], 
  Mean[{}], -1074.72, -1070.22, -1066.32, -1060.89, -1057.42, \
-1051.45, -1048.91, -1043.25, -1038.04, -1035.38, -1033.22, Mean[{}], 
  Mean[{}]}} *)

I am OK with the empty means (Mean[{}]), since I replace them later.

Other variables:

minZ
(* 5162.62 *)

dz
(* 1000. *)

GridSpacingDivisionZ
(* 15 *)

minX
(* 4032 *)

dz
(* 1000. *)
(edited by J. M.'s missing motivation; asked by SPIL)
  • You want the mean "height" (3rd coord.) in each rectangular bin (in a 2D array corresponding to the bins)? (Sometimes a description of the objective is helpful and encouraging to others.) – Michael E2 May 04 '16 at 16:27
  • A working MWE will also encourage others. – Michael E2 May 04 '16 at 16:29
  • Yes, you are right: I want the mean "height" (3rd coord.) in each rectangular bin (in a 2D array corresponding to the bins). Does MWE mean "Mathematica Working Example"? – SPIL May 04 '16 at 16:43
  • MWE is net-speak for "minimal working example." Sorry for the jargon. Anyway, I came up with something without it. But with the missing parameters, it's impossible to know the scale of the answer. Sometimes such details affect the choice of strategy. – Michael E2 May 04 '16 at 16:45
  • Hi Michael, I appreciate your advice. I've added a numerical example. – SPIL May 05 '16 at 09:49

1 Answer


Here's a million points processed in half a second:

SeedRandom[0];  (* updated for reproducibility *)
xyCoordinateCentreCircle = RandomReal[1, {1*^6, 3}];

Map[
  Mean[#[[All, -1]]] &,
  BinLists[xyCoordinateCentreCircle, {0, 1, 1/10}, {0, 1, 1/10}, {0, 1, 1}],
  {3}] // AbsoluteTiming

(*
  {0.513721,
   {{{0.506174}, {0.497757},..., {0.50284}},
    {{0.501131}, {0.497317},..., {0.501209}},
    ...,
    {{0.496007}, {0.503033},..., {0.500413}}}
*)

And a procedural method that's a bit faster (updated to handle zero bin counts):

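With[{totalscounts = Compile[{{points, _Real, 2}},
    Module[{totals, counts, i1, i2},
     (* running sums of heights and point counts on a 10 x 10 grid over [0,1) x [0,1) *)
     totals = Table[0., {10}, {10}];
     counts = Table[0., {10}, {10}];
     Do[
      (* bin width 0.1, so the bin index is 1 + Floor[coordinate/0.1] *)
      i1 = 1 + Floor[p[[1]]/0.1];
      i2 = 1 + Floor[p[[2]]/0.1];
      totals[[i1, i2]] += p[[3]];
      counts[[i1, i2]] += 1.,
      {p, points}];
     {totals, counts}
     ],
    RuntimeOptions -> "Speed"
    ]},
 (* mean = total/count; empty bins give 0./0. (Indeterminate), and that message is silenced *)
 bl = Quiet[Divide @@ totalscounts[#], Divide::indet] &
 ];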

bl[xyCoordinateCentreCircle] // AbsoluteTiming
(*  {0.401225, {{0.506174,...}}  *)

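(Note: the innermost braces in the first method's output come from the single bin along the third coordinate. They can be removed there too, e.g. by taking [[All, All, 1]] of the result, or added here, so that both methods return the same shape.)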
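The compiled method above hard-codes a 10 × 10 grid on the unit square. Below is a sketch of the same accumulate-then-divide idea for an arbitrary nz × nx grid, which is closer to the 15 × 16 grid asked about in the comments. The names binTotalsCounts and binMeans are hypothetical; bins are half-open [z0 + (i - 1) zStep, z0 + i zStep), unlike the closed intervals in the question's Select; points outside the grid are skipped rather than indexing out of range; and empty cells become Missing[] instead of Indeterminate.

```mathematica
(* Hypothetical generalization of the compiled method: nz x nx half-open bins
   starting at {z0, x0} with widths {zStep, xStep}; points outside the grid are skipped *)
binTotalsCounts = Compile[{{points, _Real, 2},
    {z0, _Real}, {zStep, _Real}, {nz, _Integer},
    {x0, _Real}, {xStep, _Real}, {nx, _Integer}},
   Module[{totals, counts, i1, i2},
    totals = Table[0., {nz}, {nx}];  (* running sums of the 3rd column *)
    counts = Table[0., {nz}, {nx}];  (* number of points per bin *)
    Do[
     i1 = 1 + Floor[(p[[1]] - z0)/zStep];
     i2 = 1 + Floor[(p[[2]] - x0)/xStep];
     If[1 <= i1 <= nz && 1 <= i2 <= nx,
      totals[[i1, i2]] += p[[3]];
      counts[[i1, i2]] += 1.;
      ],
     {p, points}];
    {totals, counts}],
   RuntimeOptions -> "Speed"];

(* Mean per bin; N[] makes integer input (like the question's data) real *)
binMeans[data_, z0_, zStep_, nz_, x0_, xStep_, nx_] :=
 MapThread[If[#2 > 0, #1/#2, Missing[]] &,
  binTotalsCounts[N[data], z0, zStep, nz, x0, xStep, nx], 2]
```

With the question's variables this would be called as binMeans[xyCoordinateCentreCircle, minZ, dz, 15, minX, dx, 16], assuming dx holds the X bin width; the result then has dimensions {15, 16}. Cells can differ from the Select version wherever a point lies exactly on a bin boundary.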

(answered by Michael E2)
  • Thanks Michael, I think I am very close with the above. Could you please take another look at the numerical example I just added? I need the output dimensions Dimensions[gridmean] = {15, 16}. – SPIL May 05 '16 at 09:50
  • Hi Michael, sorted now, although I get slightly different results. GridMeanOld - GridMeanNew = {0, 0, 0, 0, 0., 0., 0., 0.109692, 0., 0., 0., 0.1213, 0., 0, 0, 0},
    {0, 0, 0, 0.675241, 0., 0., 0., 0.0486794, 0., 0., 0., 0.0379739, 0.,
    0., 0., 0}, {0, 0, 0., 0.220785, 0., 0., 0., 0.0199794, 0., 0., 0.,
    0.0387975, 0., 0., 0., 0.} Is this just numerical error from the different methods?
    – SPIL May 05 '16 at 10:22
  • For rounding error with n points, one would expect (GridMeanOld - GridMeanNew)/GridMeanOld to be within an order of magnitude of Sqrt[n] * 10^-16. Your errors seem a little large based on the snippet of data in the question. – Michael E2 May 05 '16 at 11:50
  • It's a little unclear to me what you did. If the boundaries of the grid shifted, then a point on the old boundary might find itself in a new grid cell. That would change the means of two cells. The errors do seem to be in adjacent cells, so maybe that's the explanation. – Michael E2 May 05 '16 at 11:55
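A related source of boundary discrepancies, assuming the new method uses half-open bins [z, z + dz) as Floor-based binning does: the question's Select uses closed intervals z <= #[[1]] <= z + dz, so a point lying exactly on a shared boundary is counted in both neighbouring cells there, but in only one cell afterwards. A minimal sketch with three hypothetical points, one of them on the boundary z = 1000.:

```mathematica
(* Hypothetical points {z, x, height}; the second lies exactly on z = 1000. *)
pts = {{500., 0.5, 30.}, {1000., 0.5, 10.}, {1500., 0.5, 20.}};

(* Closed intervals, as in the question: the boundary point is in both cells *)
closed = {Mean[Select[pts, 0. <= #[[1]] <= 1000. &][[All, 3]]],
   Mean[Select[pts, 1000. <= #[[1]] <= 2000. &][[All, 3]]]}
(* {20., 15.} *)

(* Half-open intervals: the boundary point is only in the upper cell *)
halfOpen = {Mean[Select[pts, 0. <= #[[1]] < 1000. &][[All, 3]]],
   Mean[Select[pts, 1000. <= #[[1]] < 2000. &][[All, 3]]]}
(* {30., 15.} *)
```

Here only the lower cell's mean changes; with real data, any cell that shares a boundary with such a point can be affected.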