Incresing efficiency of the code:

Question

I have a set of points, representing location of events in a map, and I want to find the closest neighbour for each point. Variable xys is a 2D matrix containing the set of coordinates for each point (column 1 for x- and column 2 for y-coordinate).

xys = Table[RandomReal[{1, 9000}], {rr, 1, 300}, {cc, 1, 2}]

To calculate the distance I come up with two approaches:

Approach 1

Table[ ArcLength[Line[{xys[[jj]], xys[[ii]]}]], {ii, 1, Length[xys]}, {jj, 1, Length[xys]}]

{21.8557, Null}

Approach 2

Table[Sqrt[(xys[[1, 1]] - xys[[ii, 1]])^2 + (xys[[1, 2]] - xys[[ii, 2]])^2], {ii, 1, 1000}];

{0.468003, Null}

Approach 1 quite inefficient: for calculating 300 points, it takes 21 sec. Approach 2 is almost two order of magnitude faster, at 0.4 sec.

What I would like to understand is why is there such a big difference. I thought ArcLength[Line[...]] would be the faster and more efficient since it uses Mathematica functions. Since my real data set ultimately will be 10000-30000 points, the whole process will end up to be quite long even with the second approach. Therefore I would be grateful if anyone could suggest an even more efficient and faster approach.

Thanks,

You may also be interested in Nearest, GeoNearest, NearestNeighborGraph and other built in functionality. — MarcoB, Jun 27 '18 at 13:09
If the desire is to get the geographical distances, fastest is to use Nearest (as noted by others) with Euclidean distance, and postprocess to get geodistances between relevant pairs. This will work correctly if all points are in the same hemisphere. — Daniel Lichtblau, Jun 28 '18 at 15:21

score 9 · Accepted Answer · answered Jun 27 '18 at 13:52

Reading the documentation of ArcLength, it seems to perform an integration to determine the length of an arbitrarily complex arc. It is therefore written to support much more complex shapes than simple point-to-point lines and is simply oversized for your purpose.

The built-in function you are probably looking for is EuclideanDistance, which is about two orders of magnitude faster than ArcLength:

ArcLength[Line[{{0, 0}, {1, 0}}]] // RepeatedTiming // ScientificForm
EuclideanDistance[{0, 0}, {1, 0}] // RepeatedTiming // ScientificForm

{2.0*10^(-4), 1}

{4.97*10^(-6), 1}

As for the faster solutions, they have already been stated in the comments and answers. Nearest is usually the fastest way to do such tasks. By the way, when constructing a DistanceMatrix as suggested by @kglr, it also uses EuclideanDistance as the default DistanceFunction for numerical data.

kglr · Answer 2 · 2018-06-27T13:18:40.343

xys = RandomReal[{1, 9000}, {10, 2}] 
DistanceMatrix[xys] //MatrixForm // TeXForm

$\tiny\left( \begin{array}{cccccccccc} 0. & 2811.96 & 1448.01 & 4793.19 & 4974.56 & 3210.02 & 3812.18 & 4829.63 & 3786.22 & 4540.3 \\ 2811.96 & 0. & 2223.34 & 7489.06 & 2545.11 & 5986.82 & 6588.37 & 4435.52 & 6566.55 & 1979.34 \\ 1448.01 & 2223.34 & 0. & 6082.27 & 3805.6 & 4428.17 & 5014.48 & 5624.91 & 4980.33 & 4194.48 \\ 4793.19 & 7489.06 & 6082.27 & 0. & 9767.75 & 1683.23 & 1176.8 & 7061.33 & 1226.24 & 8869.71 \\ 4974.56 & 2545.11 & 3805.6 & 9767.75 & 0. & 8164.34 & 8762.34 & 6654.91 & 8732.36 & 2983.62 \\ 3210.02 & 5986.82 & 4428.17 & 1683.23 & 8164.34 & 0. & 602.571 & 6362.66 & 579.835 & 7528.06 \\ 3812.18 & 6588.37 & 5014.48 & 1176.8 & 8762.34 & 602.571 & 0. & 6798.88 & 53.2738 & 8113.77 \\ 4829.63 & 4435.52 & 5624.91 & 7061.33 & 6654.91 & 6362.66 & 6798.88 & 0. & 6812.2 & 3916.62 \\ 3786.22 & 6566.55 & 4980.33 & 1226.24 & 8732.36 & 579.835 & 53.2738 & 6812.2 & 0. & 8100.6 \\ 4540.3 & 1979.34 & 4194.48 & 8869.71 & 2983.62 & 7528.06 & 8113.77 & 3916.62 & 8100.6 & 0. \\ \end{array} \right)$

Nearest[xys -> {"Index", "Element", "Distance"}, xys, 2][[All, 2]]

{{3, {4186.12, 3056.38}, 1448.01},
{10, {126.15, 4110.1}, 1979.34},
{1, {4655.43, 4426.22}, 1448.01},
{7, {7903.7, 6421.58}, 1176.8},
{2, {2004.91, 3487.15}, 2545.11},
{9, {7905.61, 6368.34}, 579.835},
{9, {7905.61, 6368.34}, 53.2738},
{10, {126.15, 4110.1}, 3916.62},
{7, {7903.7, 6421.58}, 53.2738},
{2, {2004.91, 3487.15}, 1979.34}}

score 4 · Answer 3 · answered Jun 27 '18 at 16:12

These are the tests and timings of different approaches, and lastly what I think is the best implementation

xys = Table[RandomReal[{1, 9000}], {rr, 1, 1000}, {cc, 1, 2}];

dist = Table[EuclideanDistance[xys[[ii, 1]], xys[[jj, 1]]], {ii, 1, Length[xys]}, {jj, 1, Length[xys]}]; // Timing

dist = DistanceMatrix[xys]; // Timing

{1.38841, Null}

{0.0312002, Null}

The EuclideanDistance is indeed the right function, as Theo Tiger has nicely explained. However, the inbuilt function DistanceMatrix is 1-2 order of magnitude faster. Therefore, DistanceMatrix appears to be the better option for calculating all distance between points and generate a matrix.

Flatten[Table[TakeSmallest[dist[[ii, All]], 11][[2 ;; 11]], {ii, 1, Length[dist]}]]; // Timing

Flatten[Table[Sort[dist[[ii, All]]][[2 ;; 11]], {ii, 1, Length[dist]}]]; // Timing

{0.0780005, Null}

{0.156001, Null}

For finding the first nearest object(s), Sort is a good way, but my test I found that TakeSmallest is an even better approach. In conclusion DistanceMatrix and TakeSmallest are the one I used for my solution and I when I computed to find the 10 closest neighbors' distances for 10 sets of 10000 points each, it took mere seconds

Table[
xys = RandomReal[{1, 9000}, {10000, 2}];

dist[rr] = DistanceMatrix[xys];

nearneighbor[rr] = 
Flatten[Table[ 
  TakeSmallest[dist[rr][[ii, All]], 11][[2 ;; 10]], {ii, 1, 
   Length[dist[rr]]}]];

, {rr, 1, 10}]; // Timing

{3.47882, Null}

Thank you all for your help.

You may be interested in the Difference between Timing and Absolute Timing. — Michael E2, Jun 27 '18 at 23:54
A variation on @kglr's, Nearest[xys -> "Distance", xys, 10], does the last task in under 0.1 sec. on laptop. (Note it returns ten distances, but the span 2 ;; 10in your code will yield only nine. I suspect it's typo and you meant to write 2 ;; 11. You can also get the same effect with Rest@TakeSmallest[..]) — Michael E2, Jun 28 '18 at 00:07

Incresing efficiency of the code:

3 Answers3