9

So I questioned why there are differences between Wolfram Cloud and Windows 11 when evaluating $\int_{0}^{\pi/2} \cos^a (x)\sin(ax)\, dx$ using Mathematica:

E^(-((4 I \[Pi])/5)) Beta[-1, -(6/5), 11/5] // N

I applied FullSimplify, as that gets better results (even though it should not; this is school math, and the difference just happens somehow when -0.8090169943749475/0.587785252292473 changes to -0.8090169943749473/0.5877852522924732 for no reason whatsoever, but that is not what this question is about):

E^(-((4 I \[Pi])/5)) Beta[-1, -(6/5), 11/5] //FullSimplify//N

So, the result on Windows is -6.6974 - 8.88178*10^-16 I (the actual value is -6.697403197294095 - 8.881784197001252*10^-16 I) vs. the result on the Linux cloud, which is -6.6974 - 1.33227*10^-15 I.

Why do you think that happens?

Well, I did trace as here: https://mathematica.stackexchange.com/a/271955/82985

and got the following result after 20 minutes of debugging. Apparently it gets to

1/(-1.)^1.2, which not only lowers accuracy (why?) and thus loses a 1 in the imaginary part (on both Windows and Linux), but also makes a mistake in the real part (only on Linux).

That is why. You are aware this exists, right? https://core-math.gitlabpages.inria.fr/

Libm comparisons also exist: https://members.loria.fr/PZimmermann/papers/accuracy.pdf
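For comparison, the same expression can be reproduced in plain double precision outside Mathematica. A minimal Python sketch (Python also delegates `**` with a negative base and fractional exponent to the platform's complex pow, so the last digit can likewise vary by OS and libm):

```python
import cmath

# 1/(-1.)^1.2 with machine doubles; the exact value is e^{-1.2 i pi}.
z = (-1.0) ** 1.2                   # principal value of the power
w = 1.0 / z
ref = cmath.exp(complex(0.0, -1.2 * cmath.pi))  # direct reference value

print(w)
print(abs(w - ref) / abs(ref))      # relative error, on the order of 1e-16
```

The relative error stays near one ulp, but which of the last digits you get depends on the C math library underneath.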

Trace[-(-1)^(1/5) Beta[-1, -(6/5), 11/5] // N, TraceInternal -> True, 
 TraceDepth -> 10]

will give this on my Windows 11: https://pastebin.com/h9espi8G (Wolfram Cloud: https://pastebin.com/DGHQzaQX).

P.S. So there are 3 bugs here. First, on Linux the rounding at machine precision (which cannot be controlled by N, as in 1/(-1.)^1.2) is horrible; even on Windows the error is off by one in the last digit, maybe because of the AMD/Microsoft libm. Still an error. The 2nd bug is that the general solver somehow has variable precision, not equal to the machine default, and yet the precision of the imaginary part is one digit lower (or it just makes an error and sees 0, not 1). The 3rd bug is that on Windows it can sometimes copy values through Trace incorrectly for no reason (maybe it is some precision autotracking, I do not know).

P.S.2 Opened a thread in community: https://community.wolfram.com/groups/-/m/t/2851685

  • You can specify the numerical precision when using N. For instance, N[E^(-((4 I \[Pi])/5)) Beta[-1, -(6/5), 11/5], 100] gives it to 100 decimal places. – bill s Mar 14 '23 at 21:05
  • And btw, it is not going to work with 1/(-1.)^1.2, only with N[1/(-1)^(12/10), 50], yet there is an error on linux inside the trace of the calculation. – Валерий Заподовников Mar 14 '23 at 22:09
  • (1) On "13.2.0 for Mac OS X ARM (64-bit) (November 18, 2022)" versus "13.2.0 for Linux x86 (64-bit) (December 12, 2022)" I get a 1 ulp difference in numericizing one factor N[ Beta[-1, -(6/5), 11/5]] and no difference in numericizing the -(-1)^(1/5) factor from the full-simplified expression. This accounts for the results on Macos and Linux. (2) Different rounding errors from different expressions should be expected (it's a chapter in most num. anal. books), so if there is a point to comparing simplified and unsimplified expressions, it needs to be made clearer. – Michael E2 Mar 15 '23 at 21:58
  • (2) We are not comparing FullSimplified and not FullSimplified expressions. At least I do not. We compare only FullSimplified expressions, but on Linux (Cloud) and Windows. – Валерий Заподовников Mar 15 '23 at 22:00
  • BTW, you can get the perfect result by doing -1. (0.8090169943749475 + 0.587785252292473 I) (5.418313004792032 - 3.936634828025925 I). – Валерий Заподовников Mar 15 '23 at 22:55
  • My (2) was a reaction to your claim "with no reason whatsoever," which is not true. Of course, you go on to say, "that is not what this question is about." So I understand it is not related to the main question. (I don't know why you want to obfuscate the issue with FullSimplify, if the question is really about numericizing -(-1)^(1/5) Beta[-1, -(6/5), 11/5]. The result of FullSimplify is the same on all systems, isn't it?) – Michael E2 Mar 16 '23 at 02:37
  • This seems a better beta, at least on the above example: myBeta[x_, a_, b_] = MeijerGReduce[Beta[x, a, b], x] // Activate; – Michael E2 Mar 16 '23 at 02:40
  • After FullSimplify there is less of an error in both windows and linux. – Валерий Заподовников Mar 16 '23 at 03:17
  • "less of an error": so you are comparing simplified and non-simplified expressions. -- That aside (it seems irrelevant to the other part of the question), it's still unclear to me whether you are interested in evaluating the original FullSimplified expression with Beta[] in the body of the question or just the expression 1/(-1.)^1.2 in the title? – Michael E2 Mar 16 '23 at 13:51
  • From Trace you can see the biggest error in original FullSimplified expression happens in 1/(-1.)^1.2. – Валерий Заподовников Mar 16 '23 at 20:53

3 Answers

10

This made me curious.

I work under macOS on Apple Silicon. I get these results:

a = E^(-((4 I \[Pi])/5)) Beta[-1, -(6/5), 11/5] // N
b = E^(-((4 I \[Pi])/5)) Beta[-1, -(6/5), 11/5] // FullSimplify // N
Abs[Re[a] - Re[b]]/Abs[Re[b]]
Abs[Im[a] - Im[b]]/Abs[Im[b]]
Abs[a - b]/Abs[b]

-6.697403197294095 - 1.7763568394002505*^-15 I

-6.697403197294095 - 8.881784197001252*^-16 I

0.

1.

1.3261534262398443*^-16

So it looks as if we had a 100% relative error in the imaginary part. But as a complex number, the relative error is almost as low as it can be. I agree that in a perfect world this should not happen, as complex double numbers have separate mantissas and exponents for the real and imaginary parts.
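The distinction between componentwise and norm-wise relative error can be checked directly. An illustrative Python sketch, where the two literals are the values quoted above for the unsimplified (a) and simplified (b) results:

```python
# a: unsimplified result, b: FullSimplify'd result, as printed above.
a = complex(-6.697403197294095, -1.7763568394002505e-15)
b = complex(-6.697403197294095, -8.881784197001252e-16)

rel_im   = abs(a.imag - b.imag) / abs(b.imag)  # componentwise: 100% error
rel_norm = abs(a - b) / abs(b)                 # norm-wise: ~1.3e-16

print(rel_im, rel_norm)
```

The imaginary parts are 2^-49 and 2^-50, so they differ by a factor of two (hence the 100% componentwise error), while norm-wise everything is down at the ulp level.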

Let's see where this error might come from; let's check the "complicated" functions first:

a1 = E^(-((4 I \[Pi])/5)) // N
b1 = E^(-((4 I \[Pi])/5)) // FullSimplify // N
Abs[Re[a1] - Re[b1]]/Abs[Re[b1]]
Abs[Im[a1] - Im[b1]]/Abs[Im[b1]]
Abs[a1 - b1]/Abs[b1]

-0.8090169943749473 - 0.5877852522924732 I

-0.8090169943749475 - 0.5877852522924731 I

1.3723111286221163*^-16

1.8888242266968722*^-16

1.5700924586837752*^-16

So both results are basically as good as they can get with double precision.

a2 = Beta[-1, -(6/5), 11/5] // N
b2 = Beta[-1, -(6/5), 11/5] // FullSimplify // N
Abs[Re[a2] - Re[b2]]/Abs[Re[b2]]
Abs[Im[a2] - Im[b2]]/Abs[Im[b2]]
Abs[a2 - b2]/Abs[b2]

5.418313004792032 - 3.936634828025925 I

5.418313004792032 - 3.936634828025925 I

Of course, FullSimplify cannot do anything here.

Okay, but we have

a - a1 a2
b - b1 b2

0.+0.I

0.+0.I

So where the heck does the error in a - b come from? Just from the multiplication? Indeed! The simple "naive" way to compute the product would be

Re[a] == Re[a1] Re[a2] - Im[a1] Im[a2]
Im[a] == Re[a1] Im[a2] + Im[a1] Re[a2]

And apparently, that's what Mathematica does:

Re[a1] Re[a2] - Im[a1] Im[a2] - Re[a]
Re[a1] Im[a2] + Im[a1] Re[a2] - Im[a]
Re[b1] Re[b2] - Im[b1] Im[b2] - Re[b]
Re[b1] Im[b2] + Im[b1] Re[b2] - Im[b]

The problem with this approach is catastrophic cancellation in the imaginary parts:

Re[a1] Im[a2]
Im[a1] Re[a2]

3.1848044765212715

-3.1848044765212733

Re[b1] Im[b2]
Im[b1] Re[b2]

3.184804476521272

-3.184804476521273

As you can see, the two summands are of almost equal magnitude, but with opposite signs.
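The cancellation is easy to reproduce with the naive product formula. A Python sketch using the factor values quoted earlier in this answer:

```python
# Factors as numericized above:
# a1 = N[E^(-((4 I Pi)/5))], a2 = N[Beta[-1, -(6/5), 11/5]].
a1 = complex(-0.8090169943749473, -0.5877852522924732)
a2 = complex(5.418313004792032, -3.936634828025925)

re = a1.real * a2.real - a1.imag * a2.imag   # fine: both terms are negative
im = a1.real * a2.imag + a1.imag * a2.real   # two ~3.18 terms nearly cancel

print(re)                                    # about -6.697403197294095
print(a1.real * a2.imag, a1.imag * a2.real)  # the two cancelling summands
print(im)                                    # only the rounding residue survives
```

Each summand of the imaginary part carries a rounding error of roughly one ulp of 3.18, i.e. about 4e-16; after the cancellation, that residue is all that remains.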

So it seems that this can be blamed solely on Mathematica's product routine for complex numbers. But I am pretty sure this would also happen in C or C++ with the -Ofast or -ffast-math flags...

Henrik Schumacher
  • "I work under macos on Apple Silicon. I get these results:" Do you use ARM native compilation? Same as in Windows 11 up to the last digit in hidden floats. That means the bug comes from the difference "-0.8090169943749475/0.587785252292473 changes to -0.8090169943749473/0.5877852522924732 for no reason whatsoever". You need to use Trace to understand where the error comes from and to see this error. The trace is very small in this case. – Валерий Заподовников Mar 15 '23 at 13:41
  • I do use ARM compilation, yes. The problem lies solely in the product; the factors are computed accurately. I come more and more to the conclusion that component-wise precision in complex multiplication cannot be guaranteed; only norm-wise precision. The actual exact imaginary part of your number is 0. So computing the relative error would always entail a division by 0 (or by a very small number). – Henrik Schumacher Mar 15 '23 at 15:54
  • That is not the main problem I am concerned with here. That error is small and improved with FullSimplify; we all know how slow FullSimplify is, so it cannot just be applied automatically. It makes the order of operations very different. Again, operations inside Trace have variable precision, because there is autotracking of it... What you showed here is not interesting, because all the interesting things happen in C code, not even inside Trace. Lower level, x86_64 assembly. I can reproduce my problem with Linux vs. Windows even without Trace, though. See: https://community.wolfram.com/groups/-/m/t/2851685 – Валерий Заподовников Mar 15 '23 at 16:10
  • I didn't see much of a question in the original post. As for the basic example, some of the difference can be seen comparing In[282]:= InputForm[N[1/(-1)^(6/5)]] Out[282]//InputForm= -0.8090169943749473 + 0.5877852522924732*I, which is computed under the hood as In[281]:= InputForm[N@1/Exp[I*Pi*6/5]] Out[281]//InputForm= -0.8090169943749473 + 0.5877852522924732*I, with In[283]:= InputForm[1/(-1)^N[(6/5)]] Out[283]//InputForm= -0.8090169943749475 + 0.587785252292473*I. Once an approximate value is in that power, chances for a to-the-last-ULP computation drop considerably. – Daniel Lichtblau Mar 15 '23 at 17:21
  • "Why do you think that happens?" That is the question. Again you are all jumping on the change after FullSimplify, while the question is about Linux (Wolfram Cloud) bug vs windows 11, both after FullSimplify. And again, that difference is not important because the bug happens on one of the operations only, as you can see in Trace. – Валерий Заподовников Mar 15 '23 at 19:14
  • "Again, operations inside Trace have variable precision, because there is autotracking of it..." I don't get what you mean by that. When you apply N, then everything should be converted to doubles (64bit floating point). And IIRC variable precision tracking should be deactivated quite early in the process for performance reasons. So we are in the floating point realm. – Henrik Schumacher Mar 15 '23 at 19:26
  • Did you check what Trace prints? It does not have 0.5877852522924731. It loses one digit and prints 0.587785252292473. Please check the trace! Why can't you? See the 1/(-1.)^1.2 part; yet 1/(-1.)^1.2 just typed in has a 1 at the end. So that is why I think it has some autotracking, as it sees you do not need all the precision. Trace[-(-1)^(1/5) Beta[-1, -(6/5), 11/5] // N, TraceInternal -> True, TraceDepth -> 10] – Валерий Заподовников Mar 15 '23 at 19:32
  • How am I supposed to run and see the trace on your system? – Henrik Schumacher Mar 15 '23 at 19:39
  • No, 1/(-1.)^1.2 won't have autotracking. Whenever one operand is floating point, Mathematica converts the other operands also to floating point representation. That is, Mathematica first computes (-1.)^1.2 and gets -0.8090169943749475` -0.587785252292473` I as result. Then it computes the inverse of this number, which it computes to be -0.8090169943749476` + 0.5877852522924731` I. Which seems to be correct, too (up to floating point errors). – Henrik Schumacher Mar 15 '23 at 19:52
  • Okay, on Wolfram Cloud it gets -0.8090169943749476 - 0.587785252292473` I for (-1.`)^1.2. So what? This involves an exp or a pow function. You push floating point numbers in -- and you have to expect to get floating point numbers out. With the rounding errors from the C math library used underneath. Which is system dependent. (Observe that all but the last digit are correct.) – Henrik Schumacher Mar 15 '23 at 20:05
  • "That result is wrong then, it has one less digit in complex part than in real and it is not the same as in normal execution." Yes, we have a floating point error in the very last place: 1 is replaced by 0. But this is allowed, no? – Henrik Schumacher Mar 15 '23 at 20:08
  • "Which is system dependent." Oh, so you use glibc on Linux and the AMD libm on Windows? Okay, that is the answer. Thanks. "But this is allowed, no?" What is? Lowering precision? I do not know; you say there is no autotracking here. – Валерий Заподовников Mar 15 '23 at 20:08
  • It does not lower the precision. It just does not show you the last digit. Because it is 0. And yes, that is what I am saying: The floating point math libraries are system dependent, i.e., they depend on (i) the architecture, (ii) the OS, and even (iii) the compiler used. It can even depend on further external libraries that are linked. For example, Mathematica uses Intel MKL quite a lot, e.g. for vectorized floating point calculation. But MKL does not run natively on ARM. So one is forced to link to another library. The best to expect is low relative error. – Henrik Schumacher Mar 15 '23 at 20:13
  • "It does not lower the precision. It just does not show you the last digit. Because it is 0." But it is not 0. 1/(-1.)^1.2 will give you 1 there in normal execution and it is still 0 in Trace. Moreover, -0.8090169943749476 or ..75 are both wrong, it must be -0.8090169943749474. – Валерий Заподовников Mar 15 '23 at 20:18
  • Yes, and no. You are right: a = SetPrecision[N[Power[(-1), 6/5], 100], $MachinePrecision] would show you -0.8090169943749474... - 0.5877852522924731... I. But b = -0.8090169943749476 - 0.587785252292473` I is not wrong either because the relative error Abs[a-b]/Abs[a] is 1.5700924586837752`*^-16. This is not less than 0.5 $MachineEpsilon, but closeby. It's definitely less than $MachineEpsilon and thus people would agree, it's (almost) as good as you can get with double precision. – Henrik Schumacher Mar 15 '23 at 20:24
  • And 0.5 $MachineEpsilon is required by IEEE 754 only for elementary arithmetic operations like +, -, *, /, and sqrt. Not for exp or pow. – Henrik Schumacher Mar 15 '23 at 20:25
  • "required by IEEE 754 only for elementary arithmetic" exp and pow are implemented in software. Even sincos is nowadays done in software because of a bug in Intel CPUs' fsin instruction. https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/ – Валерий Заподовников Mar 15 '23 at 20:32
  • BTW, 1.5700924586837752^-16 is 1.570092458683775`^-16 on windows 11. So some differences with ARM too. Trace from Windows 11: https://pastebin.com/HF4ARp1Z – Валерий Заподовников Mar 23 '23 at 23:17
5

I'm not sure I understand the problem, despite attempts to get clarification.

So far I'm not sure there is much more to this than the following, this-is-school-math behavior:

Sqrt[3. (4./3. - 1.) - 1.]
(*  0. + 1.49012*10^-8 I  *)

But in this instance, this behavior is predicted. In the OP's case, there is the added variability of different systems using different libraries on different CPUs.
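The same school-math example reproduces in any IEEE-754 double environment, not just Mathematica. A Python sketch (assuming the default round-to-nearest mode): 4/3 rounds down by one ulp, and the error survives the remaining exact steps, leaving a tiny negative number under the square root.

```python
import cmath

# Exactly zero in school math, but not in doubles:
v = 3.0 * (4.0 / 3.0 - 1.0) - 1.0
print(v)              # a tiny negative number, about -2.2e-16

# Hence a small purely imaginary square root, ~1.49e-8 i:
print(cmath.sqrt(v))
```

The multiplication and subtractions here are exact; the entire residue comes from the single rounding of 4.0/3.0.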

About the variability, my Mac M1 Pro and the online Linux compute different values for $1/z$ for some complex values of $z$, and neither produces a correctly rounded result all the time. So far, all I've witnessed is incorrect rounding (<1ulp error).

If we consider complicated expressions (that is, involving more than one arithmetic operation), such as the OP's 1/(-1.)^1.2 that arises in Trace[N@Beta[-1, -(6/5), 11/5], TraceInternal -> True], we have to consider error propagation and additional rounding errors at each step. Even if operations are correctly rounded, the errors get propagated, and subtractive cancellation may lead to large relative error. One may also have correlated errors that cancel each other out. Different formulas have different numerical properties, even if they are school-math equivalent. For instance (-1.)^(-1.2) is a better way to compute 1/(-1.)^1.2, if you have a good Power function. (Possibly the Beta[] algorithm should be criticized for this.)

Now, interesting things happen in evaluating 1/(-1.)^1.2 on the Mac M1 and on Linux. The Mac computes z1 = (-1.)^1.2 correctly rounded and Linux does not; and neither the Mac nor Linux compute 1/z1 correctly rounded; however, on Linux, the two rounding errors cancel each other out, so that the result of 1/(-1.)^1.2 ends up correctly rounded. One might criticize the complex arithmetic of both systems for division not being correctly rounded; however, if complex division is done in real double-precision, then it is a complicated operation (in the above sense) and entails multiple roundings and error propagation.
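The difference between the two school-math-equivalent formulas mentioned above can also be seen outside Mathematica. A Python sketch (Python's `**` with a negative base and fractional exponent likewise goes through the platform's complex pow):

```python
z = (-1.0) ** 1.2          # principal value e^{1.2 i pi}, one pow rounding
via_division = 1.0 / z     # adds the rounding of a complex division
direct = (-1.0) ** (-1.2)  # a single pow call, one rounding step

# The two results agree to within a couple of ulps, but the last digit may
# differ, and which one is correctly rounded is platform-dependent.
print(via_division)
print(direct)
```

With a good pow, the single-call form accumulates fewer roundings, which is the sense in which (-1.)^(-1.2) is the better formula.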

That said, neither system computes 1/(-1)^(6/5) correctly rounded. That should be no surprise, since 6/5 is not the same as 1.2: in binary double precision, 1.2 equals exactly 5404319552844595/4503599627370496. (This applies to 11/5 and 2.2 in the Beta[] calculation; indeed, the error there is 4 times as great, due to the way the bits line up in the binary expansions of the fractions. This rounding error is then propagated through the computation of Beta[].)
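Both claims can be verified with exact rational arithmetic; a Python sketch using the standard fractions module (Fraction(float) converts the double's exact binary value):

```python
from fractions import Fraction

# The double literal 1.2 is exactly this dyadic rational:
assert Fraction(1.2) == Fraction(5404319552844595, 4503599627370496)

err12 = abs(Fraction(1.2) - Fraction(6, 5))    # representation error of 1.2
err22 = abs(Fraction(2.2) - Fraction(11, 5))   # representation error of 2.2

print(err12, err22, err22 / err12)             # the 2.2 error is exactly 4x larger
```

The errors work out to 1/(5*2^52) for 1.2 and 4/(5*2^52) for 2.2, which is the factor-of-4 gap described above.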

It is not made clear in the OP what is so horrible, but I infer it is that the imaginary part of a real result is nonzero in the first expression in the OP, whether simplified or not. I have not looked into how expensive it is to make complex arithmetic correctly rounded, and that seems to be a significant source of error. (The error in the Beta[] cannot be completely traced.) The sources cited seem to consider real arithmetic and functions only. IEEE 754 does not treat complex functions (e.g. "$pow(x, y)$ signals the invalid operation exception for finite $x < 0$ and finite non-integer $y$"). I think most applications can tolerate errors of a couple of ulps. This seems inevitable to me in computations that involve several operations, such as 1/(-1.)^1.2 or the Beta[] one (unless it is possible to write a special routine for your problem). Mathematica provides arbitrary precision for applications in which such small errors are too large.

Finally, I will point out that it is easy to write a numerically better routine in Mathematica for the OP's initial problem:

myBeta[x_, a_, b_] = 
  MeijerGReduce[Beta[x, a, b], x] // Activate // FunctionExpand;

Block[{Beta = myBeta}, E^(-((4 I π)/5)) Beta[-1, -(6/5), 11/5] // FullSimplify // N ]

(* -6.6974 *)

Appendix

What $1.2$ and $2.2$ equal in binary floating-point

Table of bit expansions of 1.2, 2.2

Binary expansions of 1.2, 2.2. The colored squares indicate a 1, the white squares indicate a 0, and the gray squares indicate a nonexistent or insignificant bit.

Michael E2
  • "such as the OP's 1/(-1.)^1.2, which arises in Trace[N@Beta[-1, -(6/5), 11/5], TraceInternal -> True]" Nice! Yes, apparently the issue is that it cannot see that this is just 1: N[1/(-1.)^1.2*E^(-((4 I \[Pi])/5))], even if you increase the second parameter of N. Yet if you do FullSimplify on the second part, it sees it is 1. + 0. I. – Валерий Заподовников Mar 16 '23 at 23:44
  • "It is not made clear in the OP what is so horrible, but I infer it is that the imaginary part of a real result is nonzero in the first expression in the OP, whether simplified or not." That too, but I mostly do not like different behaviour on different OS. Is MeijerGReduce faster than out of the box or slower? – Валерий Заподовников Mar 16 '23 at 23:52
  • (Note to poster, not Michael E2): Different behaviors on different operating systems are generally due to (1) different libraries, (2) different arithmetic processors, and (3) different compilers. But you knew this. As indicated in the responses, one can use software arithmetic instead. It's slower by far, but will generally not show such system dependencies. As for those dependencies, I really do not see a way to avoid them. – Daniel Lichtblau Mar 17 '23 at 00:05
  • So you use Intel libm from OneAPI on windows? Cause it has complex.h. Ideally you would use that on linux too. https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-0/use-the-intel-oneapi-dpc-c-compiler-math-library.html – Валерий Заподовников Mar 17 '23 at 00:24
  • I have only one computer to use, plus whatever is on WolframCloud. I don't have access to Windows (on either Intel or AMD). (Or maybe you're asking Daniel?) MeijerGReduce converts Beta to ${}_2F_1$, so it probably is of similar speed since the trace shows Beta calls ${}_2F_1$. – Michael E2 Mar 17 '23 at 00:47
  • @MichaelE2 Yep, I asked Daniel. I also noticed it uses Hypergeometric2F1. Can the same be done to remove the complex part in cases like 1/4 (-PolyGamma[0, 1/2 - I Sqrt[2]] + PolyGamma[0, 1 - I Sqrt[2]] - PolyGamma[0, 1/2 + I Sqrt[2]] + PolyGamma[0, 1 + I Sqrt[2]]) // N – Валерий Заподовников Mar 17 '23 at 01:11
  • @ВалерийЗаподовников Re different behavior: That was the initial impetus for IEEE 754 on real floating-point math. The variability was both expensive and life-threatening, so the cost of not standardizing real FP was significant. I suppose the IEEE and industry don't feel a complex FP standard is needed (at this time). Re PolyGamma: I don't know how to prevent the imaginary part, or convert it to an expression that yields a real number. The input is complex, and I think that leads to a complex output. I use something like result /. z_Complex /; Im[z] == 0 :> Re[z] to convert. – Michael E2 Mar 17 '23 at 10:47
  • Only sqrt() is mandated to be 0.500 ulp by IEEE 754 (and is as such in practice, see accuracy.pdf, and sqrts are also correctly rounded, not just 0.5 ulp), back then perfect algorithms did not exist for other functions. Nowadays they do, and you can bruteforce all values in single precision to be sure. – Валерий Заподовников Mar 17 '23 at 12:40
  • @ВалерийЗаподовников As far as different behavior goes, MKL on Intel Core hardware can produce different results when using the vectorized and non-vectorized version of the same function on the same computer: see this chat message. – Michael E2 Mar 18 '23 at 13:19
  • Intel libm is not part of MKL, it is part of Intel compiler, legacy one. New one is based on llvm and llvm has all functions correctly rounded. – Валерий Заподовников Mar 18 '23 at 20:56
  • @ВалерийЗаподовников Are you saying Mathematica no longer uses MKL on Intel processors? My point was simply that the same software (Mathematica) on the same hardware (Mac/Intel) produces different results for seemingly equivalent calculations. It had nothing to do with libm, as far as I can see. You were concerned about different results on different systems. This is different results on the same system. – Michael E2 Mar 18 '23 at 22:41
  • OneMKL is not used in M1/M2 macs, that is for sure. Feast, Pardiso, FGMRES, etc. Libm is not part of OneMKL. Libm is C standard library, implementation of math.h and complex.h prototypes. It is part of OneAPI as a whole. Mac should be using https://github.com/libxsmm/libxsmm Intel is dumping some of the worse algorithms of OneMKL there. – Валерий Заподовников Mar 19 '23 at 09:08
  • "OneMKL is not used in M1/M2 macs" -- the chat date was in 2017, which was well before M1. – Michael E2 Mar 19 '23 at 14:26
  • @MichaelE2 BTW, using the actual float representations, can you calculate which is more correct, Linux or Windows, for 1/(-1.)^1.2 (where the two float numbers are not exactly equal to -1 and 6/5)? I said in the header that it is Linux that is less correct, but now I have doubts. – Валерий Заподовников Mar 22 '23 at 16:49
0

Okay, so the answer is that on Windows it is possible to use the Intel C/C++ math.h libm, which is much better than glibc: indeed, the accuracy.pdf paper above shows that pow has a maximum error of 0.817 ulp in glibc (and gcc has its own pow; it will not use glibc's) compared to 0.515 ulp for the Intel C++ compiler library version 2023.0. And again, here we have complex numbers, which that paper does not actually check. Both are worse than core-math, which is the ideal 0.5 ulp.

I would love to see this clarified and to have https://gitlab.inria.fr/core-math/core-math/-/blob/master/src/binary32/pow/powf.c used by Mathematica on Linux.