I have a 2D dataset of the form {{x1,y1}, {x2,y2}, ...}. I need to smooth/histogram the data so that the binning is over the x-coordinate, while the height of each bin is computed from the y-coordinates. Let me explain.

Take the following input, with bin width 0.2 and height function Mean:

data = {{0.1,1.0}, {0.2,2.0}, {0.3,3.0}, {0.35,3.5}, {0.4,4.0}, {0.5,5.0}};

The output would be:

{(1 + 2)/2, (3 + 3.5 + 4)/3, 5/1}

The average of the y-values of the first two data points goes into the first bin, the following three points form the second bin, and so on.
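One way to write this down, as a sketch of a single possible reading of the example: assume right-closed bins of width w that start at 0, so that x = 0.2 and x = 0.4 fall into the first and second bin respectively. The symbols w, k_i, B_k and h_k are introduced here for illustration and do not appear in the question.

```latex
k_i = \left\lceil \frac{x_i}{w} \right\rceil, \qquad
B_k = \{\, i : k_i = k \,\}, \qquad
h_k = \frac{1}{|B_k|} \sum_{i \in B_k} y_i
```

With w = 0.2 the sample points get k = 1, 1, 2, 2, 2, 3, which gives h = (1.5, 3.5, 5) as in the expected output.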

J. M.'s missing motivation
Petr

4 Answers


Let's start with your sample data:

In[1]:= data = {{0.1, 1.0}, {0.2, 2.0}, {0.3, 3.0}, {0.35, 3.5}, {0.4, 4.0}, {0.5, 5.0}};

First we can use GatherBy to group entries by bin:

In[2]:= GatherBy[data, Ceiling[First[#], 0.2] &]

Out[2]= {{{0.1, 1.}, {0.2, 2.}}, {{0.3, 3.}, {0.35, 3.5}, {0.4, 4.}}, {{0.5, 5.}}}

Then select the second element of each pair (Last) and calculate the means:

In[3]:= Mean[Last /@ #] & /@ %

Out[3]= {1.5, 3.5, 5.}
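
The two steps above could be wrapped into one function, sketched below. The name binMeans is hypothetical, and the SortBy call is an addition not present in the steps above: GatherBy returns groups in order of first appearance, so sorting by x first keeps the bins in ascending order even for unsorted input. Empty bins are simply skipped rather than reported.

```mathematica
(* binMeans is a hypothetical helper name, introduced here for illustration *)
binMeans[data_, w_] := Module[{groups},
  (* sort by x, then group points by the upper edge of the bin containing x *)
  groups = GatherBy[SortBy[data, First], Ceiling[First[#], w] &];
  (* average the y-values within each group *)
  Mean[Last /@ #] & /@ groups
]

binMeans[{{0.1, 1.0}, {0.2, 2.0}, {0.3, 3.0}, {0.35, 3.5}, {0.4, 4.0}, {0.5, 5.0}}, 0.2]
(* {1.5, 3.5, 5.} *)
```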
Szabolcs
  • This method groups elements 0.29 and 0.31 -- shouldn't these be in separate bins? – Mr.Wizard May 07 '12 at 19:35
  • @Mr.Wizard No, because with a bin size of 0.2 everything between 0.2 and 0.4 should be in the same bin. I just realized though that $MachineEpsilon is too small a number to add here if the $x$-values are greater than 1. (I added a note about this.) – Szabolcs May 07 '12 at 19:37
  • If that is the intent couldn't you use Ceiling[#, 0.2]? – Mr.Wizard May 07 '12 at 19:39
  • @Mr.Wizard Yes, you are right, I fixed it. – Szabolcs May 07 '12 at 19:44
  • Voted. If the OP actually wants bins to 0.1, 0.3, 0.5 which was my own reading, this is more complicated I think. – Mr.Wizard May 07 '12 at 19:45
  • @Mr.Wizard that's how I understood the question as well. You could – Heike May 07 '12 at 19:53

I propose:

data = {{0.1, 1.0}, {0.2, 2.0}, {0.3, 3.0}, {0.35, 3.5}, {0.4, 4.0}, {0.5, 5.0}};

Mean /@ Reap[#2 ~Sow~ Ceiling[#, 0.2] & @@@ data][[2]]
{1.5, 3.5, 5.}
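
For readers unfamiliar with tagged Sow/Reap, here is an unpacked sketch of the same idea. Sow[y, tag] files each y-value under a tag (here the upper edge of its bin), and the third argument of Reap is a function applied as f[tag, values] to each group. The variable name binned is an assumption made for this illustration.

```mathematica
data = {{0.1, 1.0}, {0.2, 2.0}, {0.3, 3.0}, {0.35, 3.5}, {0.4, 4.0}, {0.5, 5.0}};

(* Sow each y under the tag Ceiling[x, 0.2]; Reap collects the sown values per tag *)
(* the third argument pairs every tag (bin edge) with the mean of its values *)
binned = Reap[
    Sow[#2, Ceiling[#1, 0.2]] & @@@ data,
    _,
    {#1, Mean[#2]} &
  ][[2]];

(* keep only the means, matching the one-liner above *)
binned[[All, 2]]
(* {1.5, 3.5, 5.} *)
```

Keeping the tag alongside each mean makes it possible to see which bin a value belongs to, which the one-liner discards.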
Mr.Wizard

Another offering making use of BinLists:

data = {{0.1, 1.0}, {0.2, 2.0}, {0.3, 3.0}, {0.35, 3.5}, {0.4, 4.0}, {0.5, 5.0}};

Smoothed[data_, binSize_] := 
 With[{xs = data\[Transpose] // First, ys = data\[Transpose] // Last},
   Mean@ys[[#]] & /@ (Flatten[Map[Position[xs, #] &, #]] & /@ 
     BinLists[xs, binSize + $MachineEpsilon])]

Smoothed[data, 0.2]

(* ->  {1.5, 3.5, 5.}  *)
image_doctor

An alternative way to use BinLists:

Mean /@ (Join @@ BinLists[data, .2 + $MachineEpsilon,
  Max[data[[All, 2]]] + 1][[All, All, All, -1]])

{1.5, 3.5, 5.}

kglr