33

Bug introduced in 8.0 and fixed in 11.1


I found a strange behavior regarding the CDF of the bivariate Normal distribution

CDF[MultinormalDistribution[{0,0},({{1,37/40},{37/40,1}})],{0,0.2}]

gives

0.446357

On the other hand, the direct way

NIntegrate[PDF[MultinormalDistribution[{0,0},({{1,37/40},{37/40,1}})],{x, y}],{x,-\[Infinity],0},{y,-\[Infinity],0.2}]

gives

0.470073

which is right, confirmed by R software and other programs.

What the hell is going on here?

Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
JHT
  • 1,005
  • 8
  • 16
  • 4
    Please do not add the [tag:bugs] tag when you post a new question. It is a special tag that is meant to be added by someone else than the original asker. – Szabolcs Oct 29 '16 at 12:53
  • 1
    Problem is that .2 Try CDF[MultinormalDistribution[{0, 0}, ({{1, 37/40}, {37/40, 1}})], {0, .200000000000000000000}], yields 0.4700730326142416330 on my machine. – bobbym Oct 29 '16 at 12:59
  • Not on my system. I'm using Version 8.0. – JHT Oct 29 '16 at 13:01
  • What does this get for you N[CDF[MultinormalDistribution[{0, 0}, ({{1, 37/40}, {37/40, 1}})], {0, 1/5}],20] – bobbym Oct 29 '16 at 13:06
  • Thanks, surprisingly 0.47007303261424163299, but why is that? – JHT Oct 29 '16 at 13:08
  • Somebody better at teaching in here than me can write a nice answer why it is not surprising. My one liner ain't good enough to be an answer. – bobbym Oct 29 '16 at 13:13
  • 1
    It appears that when calculations are done with machine precision, something goes very wrong. When they are done in arbitrary precision, all is fine, even if the precision is low. I suspect that entirely different methods are used in the two cases, and I don't think that this is due to precision loss. – Szabolcs Oct 29 '16 at 13:16
  • 3
    If you look at this plot, those cusps shouldn't be there. They should be "round". ContourPlot[ CDF[d, {x, y}], {x, -3, 3}, {y, -3, 3}, PlotRange -> All, MaxRecursion -> 3 ]. Does look like a bug. Please do report this to Wolfram Support. – Szabolcs Oct 29 '16 at 13:17
  • @Szabolcs that integral might be chewing up lots of digits. Numerical integration can be tricky. Sorry, did not see your added comments in time. – bobbym Oct 29 '16 at 13:18
  • 1
    @bobbym This is not a hard integral numerically and I'm not even sure it internally uses simple integration for this ... also, Block[{$MaxExtraPrecision = 0}, N[CDF[d, {0, 1/2}], 6]] gives the correct result and it only has 6 digits to work with, much less than the 15 digits machine precision has. – Szabolcs Oct 29 '16 at 13:21
  • 1
    My concern is that many scientists like me used these very basic functions, their results might be corrupted. – JHT Oct 29 '16 at 13:24
  • @Szabolcs nevertheless it is always a good starting place when a argument is entered like .2 to start right there and try other ideas. I know you know that already so I am just making the point in passing. – bobbym Oct 29 '16 at 13:26
  • 5
    @fwgb Hamming said the purpose of computing is insight not numbers. You have to learn to mistrust computers too. – bobbym Oct 29 '16 at 13:30
  • @bobbym Good quote. You never stop learning. I'm goint to contact Support. – JHT Oct 29 '16 at 13:35
  • 1
    @bobbym He also said, many years later, "The purpose of computing numbers is not yet in sight." – evanb Oct 29 '16 at 13:41
  • 1
    Also Probability[ x <= 0 \[And] y <= 0.2, {x, y} \[Distributed] MultinormalDistribution[{0, 0}, {{1, 37/40}, {37/40, 1}}]] and NProbability[ x <= 0 \[And] y <= 0.2, {x, y} \[Distributed] MultinormalDistribution[{0, 0}, {{1, 37/40}, {37/40, 1}}]] give different results. – Karsten7 Oct 29 '16 at 13:45
  • 6
    People who are voting to close as "simple mistake": what exactly is the mistake here? I don't see why this is the OP's mistake. It looks like a bug to me. A very disturbing bug that could invalidate results, as the OP said. I don't see good evidence that this is really a precision loss issue. Maybe it is, but I am not convinced. – Szabolcs Oct 29 '16 at 15:07
  • So the problem is that 'CDF[Multinormal...]' can go completely wrong at 'MachinePrecision'. Not really a simple mistake... – JHT Oct 29 '16 at 15:15
  • @Szabolcs Unfortunately I don't have access to any Mathematica licence for a few months (sniff) so I can't try it but won't the argument of the $N$ simplify to something algebraic in your example? Then $MaxExtraPrecision would only apply to the latter, which is a different problem. – The Vee Oct 29 '16 at 15:31
  • @TheVee No, it doesn't in this particular case. (In other cases it does, not in this particular example.) – Szabolcs Oct 29 '16 at 15:35
  • 1
    I read "Bug introduced in 8.0 or earlier..." but the word EARLIER surprises me very much. Earlier than 8.0 it was 7.0.1. Today I have Mathematica 7.0.1 at hand, on Windows 7x64, I check the subject problem CDF[MultinormalDistribution[{0,0},({{1,37/40},{37/40,1}})],{0,0.2}] and get the correct result 0.470073, the same as from using NIntegrate. Therefore the statement about "earlier" seems to be incorrect. – innaiz Nov 01 '16 at 11:56
  • 1
    Thank you! That was added by an admin. Good to know that only 8 and younger are affected. – JHT Nov 01 '16 at 22:15
  • 4
    @fwgb Thank you for the reporting the issue. It has now been addressed in the sources and the fix will become available in the next release. The bug affects machine precision computation for $\rho^2 > 0.95^2$. Please use computation at $MachinePrecision as a temporary work-around. – Sasha Nov 17 '16 at 14:59

2 Answers2

29

I almost believe the precision argument. But not quite.

dist = MultinormalDistribution[{0, 0}, ({{1, 37/40}, {37/40, 1}})];
CDF[dist, {0, 0.2`3}]
Precision[0.2]
$MachinePrecision
CDF[dist, {0, 0.2}]
(*  0.47  *)
(*  MachinePrecision  *)
(*  15.9546  *)
(*  0.446357  *)

So with only three digits of initial and intermediate precision, we get the right answer, but with nearly 16 (initially), we do not. Even assuming less than one digit of precision in the input, the correct output cannot be produced by the MachinePrecision computation. (The function is monotonic over the interval used.)

NMinimize[{CDF[dist, {0, y}], 0.0 <= y <= 0.4}, y]
NMaximize[{CDF[dist, {0, y}], 0.0 <= y <= 0.4}, y]
(*  {0.437977, {y -> 0.}}  *)
(*  {0.465287, {y -> 0.4}}  *)

The discrepancy between the correct CDF (blue) and the MachinePrecision CDF (yellow) can be quite large.

Plot[{
    NIntegrate[ PDF[ MultinormalDistribution[
      {0, 0}, ({{1, 37/40}, {37/40, 1}})], {x, y}], 
      {x, -∞, 0}, {y, -∞, u}],
    CDF[dist, {0, u}]}, 
  {u, 0, 1}]

Mathematica graphics

(I've weakly checked that the large discrepancy is not a result of numerical integration. Taking $m$ to be the off-diagonal element of the covariance matrix and assuming $0 < m < 1$, either the $x$ integral or the $y$ integral can be performed by Integrate, giving $\frac{1}{2 \sqrt{2 \pi}} \mathrm{e}^{-\frac{y^2}{2}} \text{erfc}\left(\frac{m y}{\sqrt{2-2 m^2}}\right)$ and $\frac{1}{2 \sqrt{2 \pi }} \mathrm{e}^{-\frac{x^2}{2}} \left(\text{erf}\left(\frac{u-m x}{\sqrt{2-2 m^2}}\right)+1\right)$, respectively. Replacing the double numerical integral with the single numerical integral of either of these does not visibly change the graph.)

Also, Karsten 7. is correct. This discrepancy suddenly turns on for a critical value of the off-diagonal covariance elements near 0.925.

Plot[
  CDF[ MultinormalDistribution[
      {0, 0}, ({{1, SetPrecision[x, 15]}, {SetPrecision[x, 15], 1}})], 
    {0, 0.2`15}] - 
  CDF[MultinormalDistribution[{0, 0}, ({{1, x}, {x, 1}})], 
    {0, 0.2}], 
  {x, 0.8, 1}, PlotRange -> All]

Mathematica graphics

This last is very strong evidence of a method switch introducing error, not precision loss.

Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
Eric Towers
  • 3,040
  • 12
  • 14
  • Thanks a lot. I was not aware that a different method is used for diagonal values larger than 0.925. Now the gap makes sense and only this method is corrupted. – JHT Oct 30 '16 at 09:33
11
dist = MultinormalDistribution[{0, 0}, ({{1, 37/40}, {37/40, 1}})];

MachinePrecision: "Machine-precision numbers (often called simply 'machine numbers') always contain a fixed number of digits and maintain no information about precision. ... Machine-precision computations are typically performed using native floating-point unit and low-level numeric library operations that are typically very fast (particularly so in matrix arithmetic), but provide no tracking of precision loss that may occur due to numerical round-off and other factors during a computation. As a result, machine arithmetic gives fast but numerically unvalidated results that may differ substantially from correct values."

cdf1 = CDF[dist, {0, .2}]

(*  0.446357  *)

Arbitrary-Precision Numbers "When you do calculations with arbitrary-precision numbers, the Wolfram Language keeps track of precision at all points. In general, the Wolfram Language tries to give you results which have the highest possible precision, given the precision of the input you provided."

Changing 0.2 to an arbitrary-precision number

cdf2 = CDF[dist, {0, .2`15}]

(*  0.47007303261424  *)

Similarly, using WorkingPrecision when plotting

ContourPlot[
 CDF[dist, {x, y}], {x, -3, 3}, {y, -3, 3},
 PlotRange -> All,
 MaxRecursion -> 3,
 WorkingPrecision -> 15]

enter image description here

Bob Hanlon
  • 157,611
  • 7
  • 77
  • 198
  • Good explanation, but changing 37/40 to 37./40. makes the above plot wrong again. How to aviod that? – JHT Oct 29 '16 at 14:58
  • @fwgb Make sure you do not introduce any machine precision numbers. 37. is machine precision. MachinePrecision is contagious and will turn everything else it touches into machine precision. SetPrecision can convert to arbitrary precision. Rationalize can convert to exact. – Szabolcs Oct 29 '16 at 15:05
  • @fwgb I think you might find this useful: http://mathematica.stackexchange.com/q/6421/121 – Mr.Wizard Oct 29 '16 at 15:18
  • 1
    I don't think that is the full story here. There is something else going wrong. I suspect that the implementation switches to a buggy implementation, when one parameter has MachinePrecision. – Karsten7 Oct 29 '16 at 17:49
  • 1
    @Karsten7. Yes you're right. The funny thing is that for <= 36/40 everything is fine. For >= 37/40 the problem occurs. – JHT Oct 29 '16 at 18:36
  • 7
    @fwgb Glancing over the code, it looks like Statistics`MultinormalDistributionsDump`GenzDreznerWeselowskyBvnlCDF gets called, which contains an If[ar < 0.925, ...]. Test: Plot[CDF[MultinormalDistribution[{0, 0}, ({{1, SetPrecision[x, 15]}, {SetPrecision[x, 15], 1}})], {0, 0.2`15}] - CDF[MultinormalDistribution[{0, 0}, ({{1, x}, {x, 1}})], {0, 0.2}], {x, 0.8, 1}, PlotRange -> All] – Karsten7 Oct 29 '16 at 20:01