Way to measure the similarity/difference of 2D point clouds

Question

I need a way to measure the similarity or difference of two point clouds. The number of points in each point cloud can be different. The point clouds are already aligned. By similarity I mean the similarity of the shapes.

I have already tried the following approaches:

principal component analysis
Hausdorff distance
mean squared distance of the points
mean distance of the points

None of them worked.

I included examples for similar and non similar point clouds.

not similar point clouds

In some cases the results were similar for both similar and non similar point clouds. — similarity problem, May 21 '15 at 07:10
@similarityproblem Then you probably want to tweak your threshold of "similarity". Most of your methods output a number which is up to you to interpret. You could also try to combine some of the metrics into a new number if that works for you. Mean squared distance doesn't look promising to me, unless you only mean the minimum distance from a point in cloud $A$ to any point in cloud $B$. Imagine $A = \{(0,0), (1000,0)\}$ and $B = A + \mathcal N(0,\epsilon)$. MSD would be somewhere around $500000$ while the mean of minimum distances squared would be around $\epsilon^2$. — AlexR, May 21 '15 at 07:17
The pictures are good, but you could add more information. For example, do you consider the following point clouds similar or not? $$\circ~~\cdot~~\circ~~~~~~\cdot~~~~~~\circ~~\cdot$$ You could say they both form a line, so they are, or that the points are too far from each other, so they are not. — Regret, May 21 '15 at 07:20
@AlexR With mean squared distance I meant the mean of the MSD of all points in cloud A. — similarity problem, May 21 '15 at 07:23
@Regret I consider the point clouds in your example fairly similar. They would be even more similar if the start and end points were aligned. In my case similarity means that the lines which you receive if you connect all the points in the clouds are almost the same. — similarity problem, May 21 '15 at 07:27
@similarityproblem Are translated and rotated (congruent) point clouds similar to you? (That would imply that any $1$-point clouds are similar and clouds translated by any arbitrarily large amount would also be considered similar). — AlexR, May 21 '15 at 07:30
@AlexR Yes and no. The point clouds I am working with were aligned via the iterative closest point (ICP) algorithm, so I have already computed the translation and rotation, and therefore no further rigid motion is allowed. But if you find a way that makes the ICP unnecessary, I would also be happy. — similarity problem, May 21 '15 at 07:37
@similarityproblem It's actually nicer to not have to consider rotated and translated variants as similar. This way the geometric distances do have a meaning. I will try to write up a few suggestions. You might want to apply a learning algorithm to determine weights and thresholds for each of the measures for optimal results. — AlexR, May 21 '15 at 07:39
@Regret Not in all cases. For example there are cases where the curve of cloud B is only a part of cloud A — similarity problem, May 21 '15 at 07:40
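
AlexR's two-point example in the comments above can be checked numerically. The sketch below is only an illustration: it replaces the random noise $\mathcal N(0,\epsilon)$ with a fixed offset of size `eps` so the numbers are reproducible, and the function names are made up for this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def mean_all_pairs_sq(A, B):
    """Mean squared distance over every pair (a, b)."""
    return sum(sq_dist(a, b) for a in A for b in B) / (len(A) * len(B))

def mean_min_sq(A, B):
    """Mean over a in A of the squared distance to the nearest point of B."""
    return sum(min(sq_dist(a, b) for b in B) for a in A) / len(A)

eps = 0.1  # assumption: a fixed perturbation standing in for the random noise
A = [(0.0, 0.0), (1000.0, 0.0)]
B = [(eps, 0.0), (1000.0, -eps)]

all_pairs = mean_all_pairs_sq(A, B)  # roughly 5e5, dominated by the far pairs
nearest = mean_min_sq(A, B)          # roughly eps**2
```

The all-pairs mean is dominated by the two pairs that are about 1000 apart, which is why it says little about shape similarity, while the nearest-point version only sees the perturbation.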

AlexR · Answer 1 · 2015-05-21T08:10:20.850

I suggest a family of metrics which may be combined (by weighted means, for example) to yield a number. Then finding a good threshold for this number (i.e. if $\mathrm{mydist}(A,B) \le \epsilon$, the clouds $A$ and $B$ are deemed similar) will complete the algorithm.

$$d_i(A,B) = \frac1{|A|+|B|} \left( \sum_{a\in A} \mathrm{dist}_i^i(a,B) + \sum_{b\in B} \mathrm{dist}_i^i(b,A) \right)$$ where $\mathrm{dist}_i$ is the usual distance from a point to a set using the $\|\cdot\|_i$-norm. The exponentiation is optional. As written, $d_1$ is the average Manhattan distance and $d_2$ is the average squared Euclidean distance.
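
A minimal sketch of $d_i$ in plain Python, assuming each cloud is a list of `(x, y)` tuples. The names `p_norm`, `dist_to_set` and `d` are illustrative, and the brute-force nearest-point search is only meant for small clouds (a few hundred points).

```python
def p_norm(dx, dy, p):
    """p-norm of the 2D vector (dx, dy), for p >= 1."""
    return (abs(dx) ** p + abs(dy) ** p) ** (1.0 / p)

def dist_to_set(point, cloud, p):
    """Distance from a point to the nearest point of the cloud in the p-norm."""
    return min(p_norm(point[0] - q[0], point[1] - q[1], p) for q in cloud)

def d(A, B, p=2, use_power=True):
    """Symmetric average nearest-point distance d_p(A, B).

    With use_power=True each distance is raised to the power p,
    matching the optional exponent in the formula.
    """
    e = p if use_power else 1
    total = sum(dist_to_set(a, B, p) ** e for a in A)
    total += sum(dist_to_set(b, A, p) ** e for b in B)
    return total / (len(A) + len(B))

A = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
B = [(0.0, 1.0), (2.0, 1.0)]
d1 = d(A, B, p=1)  # average Manhattan distance to the other cloud
d2 = d(A, B, p=2)  # average squared Euclidean distance to the other cloud
```

Because both directions are summed, the value does not depend on which cloud is called $A$, and the clouds may have different numbers of points.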

Another option would be $$\tilde d_k (A,B) := \|f_A - f_B\|_k$$ where $f_C$ is an appropriate interpolation of the point cloud as a curve (for example using splines). This is appropriate if you don't really care about the points themselves but rather about the curves they form. Note that this approach must take care to choose the "correct" point sequence for the curve (the order in which the points are traversed). If your cloud has some inherent ordering (for example a measured trajectory), you can use that. If your cloud can resemble a two-dimensional shape (for example a square), this approach will not work properly. In such a case you might want to find the convex hull of the cloud first.
To deal with the "halfway" problem discussed in the comments to @Regret's answer, you can enforce $\|\nabla f_C\|_2 \equiv c$, ensuring a constant-speed traversal of the curve. Choosing $c$ so that the curve is traversed exactly once on the unit interval, i.e. $f_C : [0,1]\to\mathbb R^2$, makes the distances $\tilde d_k$ well-defined and $0$ iff the interpolation curves are geometrically identical.
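
For the simplest interpolation, a piecewise-linear curve through ordered points $p_0, \dots, p_n$ (with consecutive points assumed distinct), the constant-speed condition can be met explicitly. The symbols $s_i$, $L$ and $t_i$ are introduced here only for illustration: $$s_i = \sum_{j=1}^{i} \|p_j - p_{j-1}\|_2, \qquad L = s_n, \qquad t_i = \frac{s_i}{L}.$$ With $f_C(t_i) = p_i$ and linear interpolation in between, $$f_C(t) = p_{i-1} + \frac{t - t_{i-1}}{t_i - t_{i-1}}\,(p_i - p_{i-1}) \quad\text{for } t \in [t_{i-1}, t_i],$$ so that $$\|\nabla f_C(t)\|_2 = \frac{\|p_i - p_{i-1}\|_2}{t_i - t_{i-1}} = L$$ on every segment, i.e. $c = L$. For smoother splines the chord lengths $s_i$ only approximate the true arc length, so the knots obtained this way are a starting guess rather than an exact constant-speed parameterization.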

Alex here seems to know much more about this subject than I do, so it would probably be wise to heed his advice! I will leave my answer up for the comments it has garnered, though. — Regret, May 21 '15 at 08:37
@AlexR Your approaches look quite promising. However I don't know how to implement your second approach yet. — similarity problem, May 21 '15 at 09:06
It's much more difficult to implement than the first one, so I'd try the first one before attempting to even implement the second one. The biggest problem is enforcing $\|\nabla f_C\|_2 \equiv c$ for some constant $c$, because that means you need to know the curve length before choosing the "time" knots $t_i$ such that $f_C(t_i) = p_i$ for the interpolating points. Possibly a two-pass method is simplest: First choose the $t_i$ to get some parameterization, use that to compute the total curve length and the curve lengths between consecutive points, then use this data to choose the $t_i$ appropriately in a second pass. — AlexR, May 21 '15 at 09:11
$$\sum_{a\in A} \mathrm{dist}_i^i(a,B)$$ is the sum of the distances of all points in cloud A to the nearest point in cloud B, is this correct? — similarity problem, May 21 '15 at 09:21
Yes, that's correct. $$\mathrm{dist}_i(a,B) = \min_{b\in B} \|a-b\|_i$$ — AlexR, May 21 '15 at 09:23

Regret · Answer 2 · 2015-05-21T08:00:08.180


Here is one method I can think of. Since your point clouds form a curve, create a function from $[0,1]$ to $\Bbb R^2$ out of both point clouds. In essence, "connect the dots". Let's call the function of the first cloud $A$ and the function of the second cloud $B$. So basically, $A(0)$ would be the first point in cloud $A$ and $A(1)$ would be the final point in cloud $A$. $A(0.5)$ would be halfway along the curve.

To measure the similarity, compute the following integral.

$$\int_0^1\|A(x)-B(x)\|dx$$

A value of $0$ means that $A=B$, and everything greater than $0$ indicates some type of dissimilarity. Note that if two curves are somewhat similar, making them longer will increase this value, although their "similarity" should probably stay the same. To make up for this, you could divide the integral by the sum of the lengths of the curves.
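
A rough sketch of this integral, assuming the points of each cloud are already ordered along the curve and connected by straight segments. Both curves are resampled at equal arc-length steps, so that parameter $0.5$ really is halfway along each curve, and the integral is approximated by a simple mean over the samples. The function names and the default `n=200` are illustrative choices, not part of the answer.

```python
import math

def arc_lengths(points):
    """Cumulative polyline length at each vertex, starting at 0."""
    s = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        s.append(s[-1] + math.hypot(x1 - x0, y1 - y0))
    return s

def resample(points, n):
    """n points equally spaced by arc length along the polyline (n >= 2)."""
    s = arc_lengths(points)
    total = s[-1]
    out, seg = [], 0
    for k in range(n):
        target = total * k / (n - 1)
        # advance to the segment that contains the target length
        while seg < len(points) - 2 and s[seg + 1] < target:
            seg += 1
        span = s[seg + 1] - s[seg]
        w = 0.0 if span == 0 else (target - s[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return out

def curve_distance(A, B, n=200, normalize=True):
    """Approximate the integral of ||A(x) - B(x)|| over [0, 1] by a mean."""
    ra, rb = resample(A, n), resample(B, n)
    integral = sum(math.hypot(ax - bx, ay - by)
                   for (ax, ay), (bx, by) in zip(ra, rb)) / n
    if normalize:
        # divide by the sum of the curve lengths, as suggested above
        integral /= arc_lengths(A)[-1] + arc_lengths(B)[-1]
    return integral
```

For example, `[(0, 0), (1, 0), (2, 0)]` and `[(0, 0), (0.5, 0), (2, 0)]` describe the same segment with differently placed points, and this sketch gives them a distance of essentially zero.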

edited May 21 '15 at 08:00

answered May 21 '15 at 07:41

Regret


If you have any comments/questions/suggestions, let me know. I am aware this is probably not very good, it is just something I quickly came up with. – Regret May 21 '15 at 07:41
This is halfway to a good idea. But if the first point set has a lot of points near the beginning of the curve and the second has lots of points near the end of the same curve, your metric will still give a large dissimilarity. What you should do, once you have a notion of the curves represented by each point set, is to measure the geometrical dissimilarity between the curves themselves, for example using the Hausdorff distance. – May 21 '15 at 07:47
@Rahul: I was thinking of constructing the function based on distance between points rather than the number of the point: in a cloud of $3$ points, $A(0.5)$ is not necessarily the second point, but rather the halfway point of the curve by distance. This way, I don't think it should matter very much where the points are concentrated, as long as the curves formed by them are similar. – Regret May 21 '15 at 07:55
@Regret how would you interpolate the curves if there can be up to 200 points per cloud and I don't know the number of points in advance? – similarity problem May 21 '15 at 08:07
@similarityproblem See my answer. I suggest a spline-like approach. The only thing you need to take care of is that $\|\nabla f_C\|_2$ must stay constant, as outlined there. – AlexR May 21 '15 at 08:11
If the curves are noisy, that could still throw the arc length parametrizations out of sync (imagine one curve is noisier near the beginning and the other is noisier near the end, and you get the same situation as my first comment). Which is why I suggested the Hausdorff distance, which is independent of parametrization. – May 21 '15 at 08:14
@AlexR: Would splines be very computationally expensive? I have no experience in using them, I was just thinking linear interpolation. – Regret May 21 '15 at 08:15
@Regret linear interpolation is a special case (order $1$ splines). Since splines are piecewise polynomial functions of constant degree, they are quite efficient. They are especially widely used as B-splines (or, more generally, NURBS) in CAD & co. There exist several very efficient algorithms to evaluate a spline at a given parametric coordinate using its control knots (the points in the cloud, in an ordered fashion). – AlexR May 21 '15 at 08:18
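
The Hausdorff distance suggested in the comments above can be sketched for finite point sets as follows. This is a brute-force illustration with made-up function names. To compare the curves rather than the raw points, one would presumably apply it to densely resampled curves, which is an assumption and not something spelled out in the thread.

```python
import math

def directed_hausdorff(A, B):
    """Largest distance from a point of A to its nearest point in B."""
    return max(min(math.hypot(ax - bx, ay - by) for bx, by in B)
               for ax, ay in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

A = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0)]  # B covers only part of A

h_ab = directed_hausdorff(A, B)  # A has a point far from every point of B
h_ba = directed_hausdorff(B, A)  # every point of B also lies in A
h = hausdorff(A, B)
```

The directed variant may be of interest when one cloud covers only part of the other: here the direction from `B` to `A` is zero even though the symmetric distance is not.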

Foivos · Answer 3 · 2015-05-29T07:35:25.453

Some ideas that might prove useful: Assuming you've already applied the iterative closest point algorithm, as you mentioned in the comments, I would first fit (least squares / quadratic programming - fast fit in all cases) a smooth curve (B-spline) to each data set, say $\mathbf{C}_1(t)$, $\mathbf{C}_2(t)$, where $t\in [0,1]$ is some affine parameter. It is to be understood that each function represents a set of points. The good thing about these curves is that you can take advantage of their geometric properties (gradient, curvature, even topological properties etc). Clearly I would expect two similar sets of points, and hence their curves, to be similar on many levels.

You can find starting point information on b-splines and fitting here: http://en.wikipedia.org/wiki/B-spline#Curve_fitting

There exist several implementations in several programming languages for your needs.

Now there exist many interesting scalar (i.e. orientation independent) properties that you can use to compare your results. Some examples:

Similar spatial positions: $$d_1 = \int_0^1 \|\mathbf{C}_1(t) - \mathbf{C}_2(t)\| \, dt$$
Similar spatial positions for tangents: $$d_2 = \int_0^1 \|\dot{\mathbf{C}}_1(t) - \dot{\mathbf{C}}_2(t)\| \, dt$$
Minimize angle between tangent vectors: $$d_3 = \int_0^1 |\dot{\mathbf{C}}_1(t) \cdot \dot{\mathbf{C}}_2(t)| \, dt$$
Minimize curvature differences: $$d_4 = \int_0^1 |\kappa_1(t) - \kappa_2(t)| \, dt,$$ $\kappa_i$ being the curvature of each curve.
Minimize angle between acceleration vectors: $$d_5 = \int_0^1 |\ddot{\mathbf{C}}_1(t) \cdot \ddot{\mathbf{C}}_2(t)| \, dt$$
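
As a rough numerical illustration of the first two quantities, the sketch below assumes both fitted curves have already been evaluated at the same uniformly spaced parameter values in $[0,1]$. Tangents are approximated by forward differences and the integrals by plain means, and all function names are hypothetical.

```python
import math

def tangents(samples):
    """Forward-difference derivative w.r.t. t on a uniform grid over [0, 1]."""
    h = 1.0 / (len(samples) - 1)
    return [((x1 - x0) / h, (y1 - y0) / h)
            for (x0, y0), (x1, y1) in zip(samples, samples[1:])]

def mean_norm_diff(U, V):
    """Mean of ||u - v|| over paired vectors, standing in for the integral."""
    return sum(math.hypot(ux - vx, uy - vy)
               for (ux, uy), (vx, vy) in zip(U, V)) / len(U)

def position_and_tangent_distances(S1, S2):
    """Return (d_1, d_2) for two curves sampled at the same parameter values."""
    return mean_norm_diff(S1, S2), mean_norm_diff(tangents(S1), tangents(S2))

ts = [k / 100 for k in range(101)]
line = [(t, 0.0) for t in ts]
shifted = [(t, 1.0) for t in ts]  # same shape, moved up by 1
tilted = [(t, t) for t in ts]     # same start point, different direction

d1_shift, d2_shift = position_and_tangent_distances(line, shifted)
d1_tilt, d2_tilt = position_and_tangent_distances(line, tilted)
```

The shifted copy differs only in position (its tangent distance is zero), while the tilted line differs in both, which shows why the entries carry different information.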

I assume there must exist topological criteria as well (closed / open /interloping curves) that you can use.

I would not try to find a scalar that is some linear combination of these; instead I would work with a vector similarity/fitness function: $$\mathbf{f} = (d_1, \ldots, d_n)$$ and assign different numerical thresholds for each entry. In a software engineering approach, two similar curves would be identified when each entry of the fitness function is smaller than some threshold (different for each slot). The idea of not combining all the above in one scalar comes from multi-objective optimization.
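
A minimal sketch of this per-entry test, with `is_similar` as a hypothetical name and made-up numbers:

```python
def is_similar(f, thresholds):
    """True only if every entry of f is at or below its own threshold."""
    if len(f) != len(thresholds):
        raise ValueError("f and thresholds must have the same length")
    return all(d <= t for d, t in zip(f, thresholds))

thresholds = (0.2, 1.0)            # one threshold per entry of f
close = is_similar((0.1, 0.5), thresholds)
bad_tangents = is_similar((0.1, 1.5), thresholds)
```

Unlike a weighted sum, a very small value in one entry cannot compensate for another entry that exceeds its threshold.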

Important: I would identify the optimal thresholds for each entry from ideal theoretical models (construct points from similar curves with random noise of some order, and establish similarity criteria / threshold).
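
One way such a calibration could look, as a sketch under stated assumptions: the metric is a symmetric mean nearest-point distance, the noise is isotropic Gaussian, the cloud sizes are drawn between 20 and 40 points, and the threshold is taken as the 95% quantile over simulated pairs from the same curve. None of these choices come from the answer itself.

```python
import math
import random

def mean_nearest(A, B):
    """Symmetric mean Euclidean distance to the nearest point of the other cloud."""
    def one_way(P, Q):
        return sum(min(math.hypot(px - qx, py - qy) for qx, qy in Q)
                   for px, py in P)
    return (one_way(A, B) + one_way(B, A)) / (len(A) + len(B))

def noisy_samples(curve, n, sigma, rng):
    """n points on curve(t), t uniform in [0, 1], plus Gaussian noise of scale sigma."""
    pts = []
    for _ in range(n):
        x, y = curve(rng.random())
        pts.append((x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)))
    return pts

def calibrate_threshold(curve, sigma, trials=100, quantile=0.95, seed=0):
    """Metric value that about `quantile` of same-curve noisy pairs stay below."""
    rng = random.Random(seed)
    values = sorted(
        mean_nearest(noisy_samples(curve, rng.randint(20, 40), sigma, rng),
                     noisy_samples(curve, rng.randint(20, 40), sigma, rng))
        for _ in range(trials)
    )
    return values[min(trials - 1, int(quantile * trials))]

def line(t):
    """Ideal model curve: a straight segment of length 10."""
    return (10.0 * t, 0.0)

threshold = calibrate_threshold(line, sigma=0.05)
```

Pairs whose metric value exceeds `threshold` would then be flagged as dissimilar for that entry, and the same procedure can be repeated for each entry of $\mathbf{f}$.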

Hope it helps.
