Some ideas that might prove useful:
Assuming you've already applied the iterative closest point algorithm, as you mentioned in the comments, I would first fit a smooth curve (b-spline) to each data set (least squares / quadratic programming; the fit is fast in all cases), say $\mathbf{C}_1(t)$, $\mathbf{C}_2(t)$, where $t\in [0,1]$ is some affine parameter. It is to be understood that each function represents a set of points. The good thing about these curves is that you can take advantage of their geometric properties (gradient, curvature, even topological properties, etc.). I would expect two similar sets of points, and hence their fitted curves, to be similar on many levels.
You can find starting point information on b-splines and fitting here:
http://en.wikipedia.org/wiki/B-spline#Curve_fitting
There exist several implementations in several programming languages for your needs.
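As one possible starting point, here is a minimal sketch of the fitting step in Python. It assumes SciPy is available and that your points are already ordered along the curve; the helper names `fit_bspline` and `evaluate`, the quarter-circle test data, and the smoothing value are my own illustrative choices, not something from your problem.

```python
import numpy as np
from scipy.interpolate import splprep, splev


def fit_bspline(points, smoothing=None, degree=3):
    """Fit a smoothing B-spline to an ordered (n, dim) array of points.

    Returns the spline representation `tck` understood by scipy's splev.
    """
    points = np.asarray(points, dtype=float)
    # splprep expects one array per coordinate: [x, y, ...]
    tck, _ = splprep(points.T, s=smoothing, k=degree)
    return tck


def evaluate(tck, t, der=0):
    """Evaluate the curve (or its der-th derivative) at parameters t in [0, 1]."""
    return np.column_stack(splev(t, tck, der=der))


# Hypothetical data: a noisy quarter circle, ordered along the curve.
rng = np.random.default_rng(0)
angle = np.linspace(0.0, np.pi / 2, 50)
noisy = np.column_stack([np.cos(angle), np.sin(angle)])
noisy += rng.normal(scale=0.01, size=noisy.shape)

tck = fit_bspline(noisy, smoothing=0.01)
t = np.linspace(0.0, 1.0, 200)
curve = evaluate(tck, t)           # C(t), shape (200, 2)
tangent = evaluate(tck, t, der=1)  # dC/dt, shape (200, 2)
```

By default `splprep` builds the parameter from normalised cumulative chord length, so $t$ runs over $[0,1]$ as above. The smoothing factor `s` controls how closely the curve follows the noise and would need tuning for real data.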
Now there exist many nice scalar (i.e. orientation-independent) properties that you can use to compare your results. Some examples:
Similar spatial positions:
$$d_1 = \int_0^1 ||\mathbf{C}_1(t) - \mathbf{C}_2(t) || dt$$
Similar spatial positions for tangents:
$$d_2 = \int_0^1 ||\dot{\mathbf{C}}_1(t) - \dot{\mathbf{C}}_2(t) || dt$$
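Small angle between tangent vectors (the integrand is $1 - \cos\theta$ for the angle $\theta$ between them, so it vanishes when the tangents are parallel):
$$d_3 = \int_0^1 \left( 1 - \frac{\dot{\mathbf{C}}_1(t) \cdot \dot{\mathbf{C}}_2(t)}{||\dot{\mathbf{C}}_1(t)|| \, ||\dot{\mathbf{C}}_2(t)||} \right) dt$$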
Minimize curvature differences:
$$d_4 = \int_0^1 |\kappa_1(t) - \kappa_2(t)| dt,$$
$\kappa_i$ being the curvature of each curve.
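Small angle between acceleration vectors (the integrand is $1 - \cos\theta$ for the angle $\theta$ between them, defined wherever both accelerations are non-zero):
$$d_5 = \int_0^1 \left( 1 - \frac{\ddot{\mathbf{C}}_1(t) \cdot \ddot{\mathbf{C}}_2(t)}{||\ddot{\mathbf{C}}_1(t)|| \, ||\ddot{\mathbf{C}}_2(t)||} \right) dt$$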
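To make the measures above concrete, here is a rough numerical sketch for planar curves sampled at a common set of parameter values. It covers the position, tangent, tangent-angle, and curvature terms. The assumptions are mine: derivatives are taken by finite differences (`np.gradient`) rather than from the spline itself, integrals use the trapezoidal rule, and the angle term is computed as $1 - \cos\theta$ so that, like the other entries, it is zero for identical curves. The function names are hypothetical.

```python
import numpy as np


def curvature(d1, d2):
    """Curvature of a planar curve from its first and second derivatives."""
    cross = d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0]
    speed = np.linalg.norm(d1, axis=1)
    return np.abs(cross) / speed**3


def trapezoid(y, t):
    """Trapezoidal rule for samples y taken at parameters t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def similarity_vector(c1, c2, t):
    """Return (position, tangent, tangent-angle, curvature) distances.

    c1, c2: arrays of shape (n, 2) sampled at the same t, shape (n,).
    """
    # Finite-difference first and second derivatives with respect to t.
    v1, v2 = np.gradient(c1, t, axis=0), np.gradient(c2, t, axis=0)
    a1, a2 = np.gradient(v1, t, axis=0), np.gradient(v2, t, axis=0)

    d_pos = trapezoid(np.linalg.norm(c1 - c2, axis=1), t)
    d_tan = trapezoid(np.linalg.norm(v1 - v2, axis=1), t)
    cos_angle = np.sum(v1 * v2, axis=1) / (
        np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1)
    )
    d_ang = trapezoid(1.0 - cos_angle, t)
    d_curv = trapezoid(np.abs(curvature(v1, a1) - curvature(v2, a2)), t)
    return np.array([d_pos, d_tan, d_ang, d_curv])


# Hypothetical example: a quarter circle and a copy shifted by 0.1 in x.
t = np.linspace(0.0, 1.0, 201)
angle = 0.5 * np.pi * t
c1 = np.column_stack([np.cos(angle), np.sin(angle)])
c2 = c1 + np.array([0.1, 0.0])
f = similarity_vector(c1, c2, t)
```

For this shifted copy only the position entry is non-zero (about 0.1, up to rounding), while the tangent, angle, and curvature entries vanish. That is the kind of separation between criteria that I would want to keep visible.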
I assume there must exist topological criteria as well (closed / open / interloping curves) that you can use.
I would not try to find a scalar that is some linear combination of these; instead, I would work with a vector similarity/fitness function:
$$\mathbf{f} = (d_1, \ldots, d_n)$$
and assign a different numerical threshold to each entry. From a software engineering point of view, two curves would be identified as similar when each entry of the fitness function is smaller than its threshold (different for each slot). The idea of not combining all of the above into one scalar comes from multi-objective optimization.
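A small sketch of that per-entry test, with made-up numbers, shows why I would avoid a single combined score. The function name `is_similar` and all values below are hypothetical.

```python
def is_similar(f, thresholds):
    """True when every entry of f is below its own threshold."""
    if len(f) != len(thresholds):
        raise ValueError("f and thresholds must have the same length")
    return all(d < limit for d, limit in zip(f, thresholds))


# Hypothetical values for (d_1, d_2, d_3): positions and tangents agree,
# but the third entry is far off.
f = (0.01, 0.02, 0.90)
thresholds = (0.10, 0.10, 0.10)

per_entry = is_similar(f, thresholds)   # fails because of the third entry
mean_score = sum(f) / len(f)            # a scalar combination blurs this
```

Here the equal-weight mean is 0.31, so two good entries partly mask one bad one, and whether the pair passes depends entirely on how the weights and the single cutoff were chosen. The per-entry test rejects the pair regardless.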
Important: I would identify the optimal thresholds for each entry from ideal theoretical models (construct points from similar curves with random noise of some order, and establish the similarity criteria / thresholds from them).
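One way this calibration might look, sketched for the position measure $d_1$ only: generate pairs of noisy copies of a known curve, compute the distance for each pair, and take a high quantile as the threshold. The Gaussian noise model, the 95% quantile, the number of trials, and the function names are all assumptions of mine. For brevity the noisy samples are compared directly instead of being refitted with splines first.

```python
import numpy as np


def position_distance(c1, c2, t):
    """d_1: trapezoidal estimate of the integral of ||C1(t) - C2(t)||."""
    gap = np.linalg.norm(c1 - c2, axis=1)
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(t)))


def calibrate_threshold(curve, t, noise_scale, trials=500, quantile=0.95, seed=0):
    """Estimate a d_1 threshold from pairs of noisy copies of one curve."""
    rng = np.random.default_rng(seed)
    distances = []
    for _ in range(trials):
        # Two independent noisy realisations of the same underlying curve.
        a = curve + rng.normal(scale=noise_scale, size=curve.shape)
        b = curve + rng.normal(scale=noise_scale, size=curve.shape)
        distances.append(position_distance(a, b, t))
    return float(np.quantile(distances, quantile))


# Hypothetical reference curve: a quarter circle sampled at 101 parameters.
t = np.linspace(0.0, 1.0, 101)
angle = 0.5 * np.pi * t
curve = np.column_stack([np.cos(angle), np.sin(angle)])

threshold = calibrate_threshold(curve, t, noise_scale=0.01)
```

The same loop can be repeated for each entry of $\mathbf{f}$ to get one threshold per slot, using a noise level that matches what you expect in your data.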
Hope it helps.
\cdot\circ~~~~\cdot~~\circ\cdot$$ You could say they both form a line, so they are, or that the points are too far from each other, so they are not. – Regret May 21 '15 at 07:20