19

I have a set of (x,y,z) data, 45,000 to be precise and I want to bin the z values in 256 equidistant bins based on their (x,y) values. The final array should be a set of 256x256 array with each slot containing an average of binned z values.

Being new to mathematica, I came up with the following code:

 data = RandomReal[{12000, 35000}, {45000, 3}];
data1 = data[[All, {1, 2}]];(*strip the zvalues from the set*)
xValues = data[[All, 1]];
yValues = data[[All, 2]];
zValues = data[[All, 3]];
(*Compute maximum/minimum of x values*)
maxXvalue = Max[xValues];
minXvalue = Min[xValues];

(*Compute maximum/minimum of y values*)
maxYvalue = Max[yValues];
minYvalue = Min [yValues];

(*Compute maximum/minimum of z values*)
maxZvalue = Max[zValues];
minZValue = Min[zValues];

bbx = {Floor[minXvalue], Floor[maxXvalue], 
   Floor[((maxXvalue - minXvalue)/256)]}; (* equidistant x bins*)
bby = {Floor[minYvalue], Floor[maxYvalue], 
   Floor[((maxYvalue - minYvalue)/256)]};(* equidistant y bins*)
bList = BinLists[data1, {bbx}, {bby}];
bCount = BinCounts[data1, {bbx}, {bby}];(*Gives a count of the number of items in \
each bins*)

(*Defining array to contain final z average values*)
meanZValues = Table[0, {Length[bList]}, {Length[bList]}]; 

i = 0; (*initialising loop variables*)
j = 0;
k = 0;

f[x_] := zValues[[x]];(*Defining function to get z values back*)

For[i = 1, i <= Length[bList], i++,
 For [j = 1, j <= Length[bList], j++, m1 = {};    (*Re-empty m1 list*)      
  For [k = 1, k <= Length[bList[[i, j]]], k++,
   AppendTo[m1, Position[data1, bList[[i, j]][[k]]] (*accessing only the x-
    coordinate index of the position on original matrix*)
    ];
   (*Getting the indices of the binned values*)
   indices = Flatten[DeleteDuplicates[Take[m1, All]]]; (*Position command above gives multiple indices if  these values occur more than once, hence deleting the duplicate ones*)

   meanZValues[[i, j]] =  Mean[Map[f,indices]];  (*Compute average values of Z by accessing the original array, getting the z values  *)
   ]
  ]
 ]
meanZValues

It gives an output in a reasonable amount of time for up to couple of thousand values, however, it lags and maybe crashes without any output for 45,000 set of data.

How do I make this code more efficient? Thank you

Mun
  • 377
  • 3
  • 10
  • 2
    related: http://stackoverflow.com/q/8178714/884752 – faysou Jan 14 '13 at 06:51
  • 1
    "bList = BinLists[data1, {bbx}, {bby}]; bCount = BinCounts[data1, {bbx}, {bby}];"should be "bList = BinLists[data1, bbx, bby]; bCount = BinCounts[data1, bbx, bby];"? – kptnw Jan 14 '13 at 08:54

3 Answers3

17

Modifying @ruebenko's answer in the StackOverflow Q/A linked in Faysal's comment (Mathematica fast 2D binning algorithm) to get the means of z-values for each bin (using yet another undocumented setting for the option "TreatRepeatedEntries" that works in version 9 only):

 zvalues = data[[All, 3]];
 epsilon = 1*^-10;
 indexes = 1 + Floor[(1 - epsilon) 256 Rescale[data[[All, {1, 2}]]]];
 System`SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> (Mean[{##}] &)}];
 binmeansZ = SparseArray[indexes -> zvalues];
 System`SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> First}];

A picture:

 MatrixPlot[binmeansZ]

enter image description here

Update: Timings

Mr.Wizards's version 7 settings (also works in versions 8.0.4.0 and 9):

  SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> 2}];
  AbsoluteTiming[binmeans =  Normal[SparseArray[indexes -> zvalues]] /. 
  "List"[x__] :> Mean@{x};] 
  SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> 0}];
  (* {0.086009, Null} *)

Version 9 settings:

  System`SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> (Mean[{##}] &)}]; 
  AbsoluteTiming[binmeansZ = SparseArray[indexes -> zvalues];]
  System`SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> First}];
  (* {0.035003, Null}*)
  binmeansZ == SparseArray[binmeans]
  (* True *)

Update 2: Default settings in versions 8.0.4.0 and 9:

  "TreatRepeatedEntries" /. SystemOptions["SparseArrayOptions"][[1, 2]]
   (* 0          (Version 8.0.4.0) *)
   (* First      (Version 9)   *)
kglr
  • 394,356
  • 18
  • 477
  • 896
  • How did you learn of that option? – Mr.Wizard Jan 14 '13 at 09:20
  • 1
    @Mr.W learned about SparseArrayOptions from ruebenko's answer in the linked Q/A. That the option setting could be a pure function was a wild guess and it worked. – kglr Jan 14 '13 at 09:24
  • This doesn't work on version 7, so sadly I cannot vote for it, but that's fantastic if it works for others. – Mr.Wizard Jan 14 '13 at 09:27
  • @Mr.W, do you get anything that looks similar from SystemOptions["SparseArrayOptions"] in version 7? – kglr Jan 14 '13 at 09:29
  • I think I've got a workaround. I'll post an answer soon, and I hope you'll compare the performance of the two for me. – Mr.Wizard Jan 14 '13 at 09:34
  • Done. Please tell me if the output is the same as yours. – Mr.Wizard Jan 14 '13 at 09:40
  • By the way, is "TreatRepeatedEntries" -> First the new default? It's 0 here and in ruebenko's answer. – Mr.Wizard Jan 14 '13 at 09:42
  • Mr.W, just in case the settings 1 works in version 7: with these settings i used two sparse arrays binsums=SparseArray[indexes->zvalues] and bincounts=SparseArray[indexes->cvalues] where cvalues=ConstantArray[1,45000]. Taking the ratio of the two (wrapped in Quiet and Indeterminate replaced with 0 ) gives the same resultsas in my post. – kglr Jan 14 '13 at 09:45
  • Yes, 1 works. Since you are in the position to do so would you please compare the performance of all three methods? – Mr.Wizard Jan 14 '13 at 09:46
  • @kguler and @ Mr.Wizard, thanks for the quick responses. It looks good but one the possible problem with this rescaling method is that it can round 2 different values to 1 value.And this will introduce error in my final result. My original data are decimal real values and accuracy is important. Is there any other way of doing it by not using Rescale ? Thank you – Mun Jan 14 '13 at 23:24
  • And is the final result average of binned z values? – Mun Jan 15 '13 at 03:54
  • Or where should I use BinLists in this code? – Mun Jan 15 '13 at 03:54
  • @Mun, the final result is the mean of z values in each bin. Re use of Rescale and accuracy --not sure if I understand the concern correctly--, perhaps you can play with $MaxExtraPrecision? – kglr Jan 15 '13 at 04:05
  • @kguler Can you please tell me how the indexes are calculated? i mean how is it similar to binLists? – Mun Jan 15 '13 at 04:11
  • @kguler, I understand that you overloaded the SparseArray function to calculate mean of the repeated entries in the bin.. But what does the last command do ? System`SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> First}]; – Mun Jan 15 '13 at 04:17
  • Does it set the SparseArray function back to normal? – Mun Jan 15 '13 at 04:28
  • @Mun, yes, the last line resets the option to its default setting. (I will try to add an update later on how the indices are calculated.) – kglr Jan 15 '13 at 04:44
  • @kguler thank you – Mun Jan 15 '13 at 05:32
11

kguler's answer looks great but unfortunately it doesn't work on version 7.
However, I was able to find a similar method that does.

data = RandomReal[{12000, 35000}, {45000, 3}];

zvalues = data[[All, 3]];
epsilon = 1*^-10;
indexes = 1 + Floor[(1 - epsilon) 256 Rescale[data[[All, {1, 2}]]]];

SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> 2}];

AbsoluteTiming[
  binmeans = Normal[SparseArray[indexes -> zvalues]] /. "List"[x__] :> Mean@{x};
]

SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> 0}];

MatrixPlot[binmeans]

{0.0300000, Null}

Mathematica graphics

Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
  • The AbsoluteTiming i get is 0.085000 (versus 0.030003 with the setting (Mean[{#}]&).(So, the setting 2 gives "List" of the values?) +1 of course... – kglr Jan 14 '13 at 09:54
  • @kguler Thank you. Setting 2 does give "List"[val1, val2, . . .] which seems like an odd format but still useful. It's not surprising the additional operation slows this down. Do you know if your method works in v8 or only v9? – Mr.Wizard Jan 14 '13 at 09:59
  • 1
    Just checked version 8.0.4.0: the method in my post works only for version 9. – kglr Jan 14 '13 at 10:47
  • @Mr.Wizard, i tried your codes and ran it with my data. I tried with 10,000 data and 100x100 bins. The final array of z values has a lot of zeros (implying empty bins). I ran the same data using the software Gwyddion which binned the data and returned 100x100 z values without a single zero in the array. Is there anything wrong with the code above? Thank you – Mun Jan 16 '13 at 05:41
  • 1
    @Mun if you had the same number of bins as data and no empty bins then there must have been one and only one item in each and every bin... is this realistic? While I wouldn't rule out a mistake, this remarkable coincidence makes me a little suspicious of the result given by Gwyddion. – Oleksandr R. Jan 16 '13 at 10:42
  • @OleksandrR. Sorry I don't quite understand your comment. Gwyddion gave a set of 100x100 z values (all real values) while the code above returned lots of zeros. Is it to do with my bin intervals? I did (max - min)/100 to calculate the interval – Mun Jan 17 '13 at 23:20
  • @Mr.Wizard, just saw this now, yes, I changed the settings in V9 to be more readable and functional. The settings prior to V9 still work, as you found out. This will be documented once I figure out how to make this functionality work with pattern sparse arrays... which may still take some time.... –  Jul 02 '13 at 10:27
2
data = RandomReal[{12000, 35000}, {45000, 3}];
n = 256;
dataT = Transpose@data;
r[x_, m_] :=  IntegerPart@N@Rescale[x, {Min[dataT[[m]]], Max[dataT[[m]]]}, {1, n + 1}]
Timing[(Mean /@ Transpose@#) & /@ GatherBy[
                                 data /. {x_, y_, z_} -> {r[x, 1], r[y, 2], z}
                                   /. {n + 1, x__} -> {n, x} 
                                   /. {x_, n + 1, z_} -> {x, n, z}, {#[[1]], #[[2]]} &]][[1]]

1.188

Dr. belisarius
  • 115,881
  • 13
  • 203
  • 453