Bug introduced in 8.0 or earlier and fixed in 11.2.0


CorrelationDistance[{-53.0, 4.3}, {-23.0, 5.4}] is a negative number. It is close to zero, to be sure, but negative. Shouldn't all distance functions guarantee a non-negative result?

J. M.'s missing motivation
Scott Guthery

2 Answers

This is again a situation where one needs a numerically stable formula for the computation.

Let's look at the result of CorrelationDistance[] again for reference:

v1 = {-53.0, 4.3}; v2 = {-23.0, 5.4};

CorrelationDistance[v1, v2] // InputForm
   -2.220446049250313*^-16

Note that CorrelationDistance[] is simply CosineDistance[] applied to the mean-centered vectors:

CosineDistance[v1 - Mean[v1], v2 - Mean[v2]] // InputForm
   -2.220446049250313*^-16

It's the same result. Let's look at the result of using explicit formulae:

c1 = v1 - Mean[v1]; c2 = v2 - Mean[v2];
1 - c1.c2/(Norm[c1] Norm[c2])
   -2.220446049250313*^-16

1 - Normalize[v1 - Mean[v1]].Normalize[v2 - Mean[v2]]
   0.

The second explicit formula, which normalizes each vector before taking the dot product, gives the correct result here.

Still, having to subtract two quantities that are very nearly equal should give one pause. Thus, here is a stable algorithm, derived from work by Velvel Kahan:

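(* cosine distance computed from Norm[n1 - n2]^2 and Norm[n1 + n2]^2; never forms 1 - n1.n2 explicitly *)
cosDistance[v1_?VectorQ, v2_?VectorQ] :=
   Module[{n1 = Normalize[v1], n2 = Normalize[v2], y},
          y = Norm[n1 - n2]^2; 2 y/(Norm[n1 + n2]^2 + y)]

(* correlation distance is the cosine distance of the mean-centered vectors *)
correlationDistance[v1_?VectorQ, v2_?VectorQ] :=
   cosDistance[v1 - Mean[v1], v2 - Mean[v2]]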

and thus

correlationDistance[v1, v2]
   0.
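
For readers who want to check that cosDistance[] agrees with the usual formula, here is a short derivation. It is a sketch that assumes exact arithmetic and unit vectors n1, n2 (the output of Normalize[]); the symbol c for their dot product is introduced here only for illustration.

```latex
\begin{aligned}
c &= n_1 \cdot n_2, \qquad \lVert n_1 \rVert = \lVert n_2 \rVert = 1 \\
y &= \lVert n_1 - n_2 \rVert^2 = 2 - 2c \\
\lVert n_1 + n_2 \rVert^2 &= 2 + 2c \\
\frac{2y}{\lVert n_1 + n_2 \rVert^2 + y} &= \frac{2(2 - 2c)}{(2 + 2c) + (2 - 2c)} = 1 - c
\end{aligned}
```

The final expression is a quotient built from squared norms only, so its computed value cannot drop below zero. Forming 1 - c directly, by contrast, subtracts two nearly equal numbers whenever the vectors are almost parallel.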
J. M.'s missing motivation

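The expected result is exactly zero: two points always lie on a line, so any two 2-element vectors ordered the same way are perfectly correlated.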

Simplify[CorrelationDistance[{x, y} , {p, q}],
 Assumptions -> {Element[{x, y, p, q}, Reals], x < y, p < q}]

0

The result, -2.22045*10^-16, is zero to within machine precision. Use Chop if you like.
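
As a minimal sketch of that suggestion, applied to the vectors from the question: Chop replaces approximate numbers smaller than its default tolerance of 10^-10 with an exact 0. The helper clampedCorrelationDistance is a hypothetical name, not a built-in. It assumes real machine-precision input and only clips at zero, which may be preferable when larger values should stay untouched.

```mathematica
(* Chop turns the tiny negative residue into an exact zero *)
Chop[CorrelationDistance[{-53.0, 4.3}, {-23.0, 5.4}]]
(* 0 *)

(* hypothetical helper (not a built-in): clip negative round-off at zero, assuming real input *)
clampedCorrelationDistance[a_?VectorQ, b_?VectorQ] :=
   Max[0., CorrelationDistance[a, b]]
```

Note that Chop also zeroes small positive distances below its tolerance, while the clipping variant changes only negative values.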

george2079
  • It will cause problems when used with some functions (FindClusters) that expect a strictly non-negative value though. For this reason I would consider it a bug, or at least worth reporting. – Szabolcs Nov 07 '17 at 16:23
  • Another problem is that it may return a Complex result with complex input. The imaginary part is 0., but the head is Complex, which may again cause trouble. – Szabolcs Nov 07 '17 at 16:26
  • Another point of view is that if we were to supply symbolic input, then substitute machine numbers for the symbolic result, then the same thing could happen. There is no way around this. (I would still expect the function to give a strictly non-negative result when given machine precision input.) – Szabolcs Nov 07 '17 at 16:28
  • Obviously there is a performance trade-off if you want to handle the special case. – george2079 Nov 07 '17 at 16:55
  • Isn't that negligible though? Suppose this function is internally optimized for machine numbers. The check for a negative value (and replacing it with 0.0) would likely take less time than a C-language function call (!), not to mention just initiating a Mathematica evaluation. – Szabolcs Nov 07 '17 at 16:58