3

I have a very large set of 2D points:

numberOf2DPoints = 10^6;
pointList = RandomReal[{0, 1000}, {numberOf2DPoints, 2}];

I'd like a fast way to compute, for each point, the number of other points within a distance $r$ of it, so that I can study the distribution of these counts. I'd then like to select the points whose count is at least a lower bound $k_a$ and at most an upper bound $k_b$. Is there a way to use a function like Nearest to accomplish this?

Clarification: the lower bound $k_a$ and upper bound $k_b$ refer strictly to the number of points in a circular disk of radius $r$ centered on a particular point (hopefully this makes sense). So I basically want a simple histogram of this distribution of point counts, and to select the points that satisfy the upper- and lower-bound point count criterion.

RTaylor

2 Answers

5

It's not easy to find in the documentation for Nearest and NearestFunction, but they can return all points within a certain radius.

From tutorial/UsingNearest

Nearest[data, x, {n, r}]
give up to the n nearest elements to x within a radius r

So you can get all points whose distance from a given point (here pointList[[31]]) is between 2 and 3 like so:

numberOf2DPoints = 10^6;
pointList = RandomReal[{0, 1000}, {numberOf2DPoints, 2}];
nf = Nearest[pointList];

Complement[
 nf[pointList[[31]], {Infinity, 3}],
 nf[pointList[[31]], {Infinity, 2}]]

Perhaps there is yet another way to call a NearestFunction that removes the need for Complement.
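For the per-point count the question describes, the same radius form can probably be used directly, without Complement. The following is only a sketch under stated assumptions: the values of r, ka and kb are illustrative, the point set is smaller than in the question so it runs quickly, and the names pts, counts and selected are made up for this example. It also assumes a point should not count itself, hence the subtraction of 1.

```mathematica
(* Smaller point set than in the question, so the sketch runs quickly *)
SeedRandom[1];
pts = RandomReal[{0, 1000}, {10^4, 2}];
nf = Nearest[pts];

(* Illustrative values, not taken from the question *)
r = 20; ka = 10; kb = 15;

(* All points within distance r of each point. The point itself is
   always in that list (distance 0), so subtract 1 to count only
   the other points. *)
counts = (Length[nf[#, {Infinity, r}]] - 1) & /@ pts;

(* Distribution of the per-point counts *)
Histogram[counts]

(* Points whose disk contains at least ka and at most kb other points *)
selected = Pick[pts, (ka <= # <= kb) & /@ counts];
```

If the point itself should be included in the count, drop the `- 1`.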

ssch
  • Awesome, sometimes I have so wanted a Nearest[data, x, {n, r}]. – Michael E2 Oct 27 '13 at 20:16
  • @ssch Fantastic! To clarify something, I meant selecting for points that had an upper and lowerbound count of points, not to count the number of points between an upper or lowerbound Euclidean distance from the point. So the Complement operation isn't necessary. :) – RTaylor Oct 27 '13 at 20:24
2

This is the distribution you're after. Not as fast as one might want, but:

numberOf2DPoints = 10^5;
pointList = RandomReal[{0, 1000}, {numberOf2DPoints, 2}];
f = Nearest[pointList];
leuc = EuclideanDistance[#, f[#, 2][[2]]] & /@ pointList;
h[leuc_, min_, max_] := Length@Select[leuc, min <= # <= max &]
Plot3D[h[leuc, min, max], {min, 0, 7}, {max, 0, 7}, PlotRange -> All]

[Image: Plot3D output showing h[leuc, min, max] over min and max]
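One possible way to speed up h is to replace the Select call with vectorized arithmetic on the whole list. This is a sketch, not a benchmarked result: hSelect and hFast are names introduced here for comparison, and the stand-in data is random rather than the nearest-neighbour distances computed above.

```mathematica
(* Stand-in data for the sketch; in the code above, leuc holds the
   distance from each point to its nearest neighbour *)
SeedRandom[1];
leuc = RandomReal[{0, 7}, 10^5];

(* Original definition, renamed for comparison *)
hSelect[d_, min_, max_] := Length@Select[d, min <= # <= max &]

(* UnitStep[x] is 1 for x >= 0 and 0 otherwise, so the product is 1
   exactly when min <= d <= max; Total then counts those entries *)
hFast[d_, min_, max_] := Total[UnitStep[d - min] UnitStep[max - d]]

(* Both definitions should agree on the same interval *)
hSelect[leuc, 2, 3] == hFast[leuc, 2, 3]
```

Because UnitStep and Total operate on the list as a whole, this form tends to suit packed numeric arrays, though the actual gain would need to be measured with AbsoluteTiming.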

Dr. belisarius
  • Perhaps there is a way for computing h[] faster. Let's see. – Dr. belisarius Oct 27 '13 at 20:17
  • This is also great, but I think I was unclear in my writing, and I apologize. I meant a distribution for the count of the number of points within a disk centered on each point. So it should be a simple 1D curve or histogram. – RTaylor Oct 27 '13 at 20:26
  • @RTaylor "for each point" or "for one point"? – Dr. belisarius Oct 27 '13 at 20:31
  • I mean that we place a disk of radius $r$ on the plane centered at each point. I'm then looking to generate a distribution for the number of points contained in a disk by looking at all disks. – RTaylor Oct 27 '13 at 20:33
  • For the selection part, I then want to select points where their corresponding disks contain at least $k_a$ points and at most $k_b$ points. Is this clearer? Apologies again. – RTaylor Oct 27 '13 at 20:34
  • @RTaylor That's what I've done. For all disks, h[min, max] is the number of points between min and max :) – Dr. belisarius Oct 27 '13 at 20:35
  • Ah I see, I misunderstood your plot. Would it work faster to use the Nearest specification ssch is referring to? It seems like that does some kind of binning steps and works more quickly than computing the distance from one point to every other point. – RTaylor Oct 27 '13 at 20:40