
I am attempting to understand how to speed up the evaluation of numerical expressions. Specifically, following the first point of this blog post, Mathematica is much quicker at floating-point arithmetic than at arbitrary-precision arithmetic:

N@Det@Table[1/(1 + Abs[i - j]), {i, 1, 150}, {j, 1, 150}] // RepeatedTiming
Det@Table[1/(1. + Abs[i - j]), {i, 1., 150.}, {j, 1., 150.}] // RepeatedTiming
(* Out: {0.591, 9.30311*10^-21} *)
(* Out: {0.0050, 9.30311*10^-21} *)

What puzzles me is that when I ask for a precision other than machine precision, Mathematica becomes almost as slow as in the exact computation above, even when the requested precision is lower than machine precision:

Det@Table[1/(1.`10 + Abs[i - j]), {i, 1.`10, 150.`10}, {j, 1.`10, 150.`10}] // RepeatedTiming
(* Out: {0.47, 9.30310687*10^-21} *)

Why is that?


Edit: Thanks to the comments, I realised that using "`10" implies that Mathematica will use arbitrary precision instead of machine precision, hence the loss in speed. In that case, is there a way to have Mathematica do the calculations using machine precision, but increase the number of digits after the decimal point it takes into account? I do not want it to do any of the additional things it does in arbitrary precision (like tracking precision/error, etc.).
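(For what it's worth, the point raised in the comments can be checked directly with built-ins: the backtick syntax produces an arbitrary-precision number, not a machine real.)

```mathematica
(* a number entered with `10 is a bignum, not a machine real *)
MachineNumberQ[1.`10]  (* False: arbitrary precision *)
MachineNumberQ[1.]     (* True: hardware double *)
Precision[1.`10]       (* 10. *)
Precision[1.]          (* MachinePrecision *)
```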

Patrick.B
    Whether greater, equal to or less than machine precision, it is still using arbitrary precision arithmetic with all the overhead that entails. – Daniel Lichtblau Sep 14 '22 at 18:22
    For some additional context: even with very low precision numbers, arbitrary precision arithmetic does additional things like precision tracking, which naturally makes it slower than machine precision. – eyorble Sep 15 '22 at 00:19
    Machine precision operations are implemented in dedicated hardware of your CPU, that can add and multiply two numbers in a very few clock cycles. Arbitrary precision operations are implemented in software. See this Mathematica tutorial for example, or this old post. – user293787 Sep 15 '22 at 02:34
  • It is also probably worth mentioning that computing a determinant of a floating point matrix is one of those things which has highly polished algorithms which have been under intense scrutiny of supercomputing/HPC folks for at least half a century, and which tends to have efficient library primitives available. Modern computer microarchitectures are also specifically designed to be very efficient on these kind of operations (specifically, vectored multiply-accumulate operations and other vector processing support), while variable-length number computing misses most of these benefits. – kirma Sep 15 '22 at 07:02
  • Re your last question: "Machine precision" means using the native floating point representation of your hardware, which is very fast to work with. This cannot be changed. It is a property of your hardware, not of Mathematica. You seem to be asking whether Mathematica can use fixed-precision arithmetic, and whether this would give any performance benefits. No, it does not have fixed-precision arithmetic. If it did, but the precision were still adjustable (which is what you seem to be looking for), I doubt if it would be significantly faster than it already is. – Szabolcs Sep 15 '22 at 10:16
  • Actually much of the bignum linear algebra uses fixed precision, that is, no explicit precision tracking (condition number approximations then in some cases are used to give approximated error). But it is still implemented in software and so is slower than hardware floating point by an order of magnitude or two. – Daniel Lichtblau Sep 15 '22 at 15:10
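(The precision tracking mentioned in the comments above is easy to observe directly: subtracting two nearly equal bignums visibly erodes the reported precision, while machine arithmetic carries no such bookkeeping. A small illustration using only built-ins; the exact reported precision will vary.)

```mathematica
a = N[Pi, 20];
b = a + 10^-15;           (* two nearly equal 20-digit bignums *)
Precision[a - b]          (* far below 20: the cancellation is tracked *)
Precision[2. - 1.999999]  (* MachinePrecision: no tracking at all *)
```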

1 Answer


Here's what I think I know. My impression is that I've "known" it for such a long time, I can't recall whether I "learned" it from documentation, Wolfram developers, or my own experimentation. Some of it is probably slightly inaccurate, but I doubt any of it is far off.

Arbitrary precision uses a variable-length array to represent the mantissa, probably an array of unsigned ints in a base 2^64 representation. I don't recall if I know anything about the exponent other than what is implied by $MaxNumber. The precision seems to be kept as a machine real. It is meant to be a bound on the error ("uncertainty" in the docs). It is approximate, but it's unlikely to be too small.
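(The claim that the precision is kept as a machine real can be sanity-checked directly; this is just a probe, not documentation of internals:)

```mathematica
x = SetPrecision[2, 17.5];    (* the precision need not be an integer *)
Precision[x]                  (* 17.5 *)
MachineNumberQ[Precision[x]]  (* True: the precision itself is a machine real *)
```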

The mantissa is longer than necessary, using the extra bits as guard bits to ensure that the actual round-off error is less than that implied by the rules of error propagation. The minimum mantissa length seems to be 64 bits, longer than the 53-bit mantissa of a double-precision machine real.
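(One can glimpse the guard bits with SetPrecision, which exposes digits of the stored mantissa rather than recomputing the number. The exact output varies, so none is shown:)

```mathematica
x = N[1/3, 10];
SetPrecision[x, 30]
(* shows correct digits well past the nominal 10, until the stored
   64-bit-or-longer mantissa runs out and binary padding takes over *)
```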

The length of the mantissa is not a function of the precision alone. The mantissa is extended and truncated by the internal algorithms according to their own rules, whatever they are. Also, when N[] creates an arbitrary-precision number, the length of the mantissa depends on the algorithm used to compute the number. These two 100-digit approximations to Pi differ internally:

ClearSystemCache[];
pi1 = N[Pi, 100];
pi2 = N[N[Pi, 1000], 100];
ByteCount /@ {pi1, pi2}
pi1 - pi2
SetPrecision[pi1, 120] - SetPrecision[pi2, 120]
(*
{120, 128}
0.*10^-100
5.91530794*10^-111
*)

N[] caches results. If we recompute, we get a different result for N[Pi, 100], the same as we get for pi2.

pi3 = N[Pi, 100];
pi4 = N[N[Pi, 1000], 100];
ByteCount /@ {pi3, pi4}
SetPrecision[pi3, 200] - SetPrecision[pi4, 200]
(*
{128, 128}
0.*10^-200
*)

One can see where the mantissa is extended, assuming ByteCount is an accurate indirect measure. We can also see that where the mantissa is extended depends on which number N[] is approximating:

ListLinePlot[{
  Table[ByteCount@N[Sqrt[2], wp], {wp, 100}],
  Table[ByteCount@N[2, wp], {wp, 100}],
  Table[ByteCount@N[E, wp], {wp, 100}],
  Table[ByteCount@N[Pi, wp], {wp, 100}]},
 PlotStyle -> Reverse@Table[AbsoluteThickness[2 t], {t, 4}],
 GridLines -> {10 Range[12], None},
 PlotLegends -> {Sqrt[2], 2, E, Pi},
 AxesLabel -> {"Precision", "Bytes"}]

(Plot: ByteCount of N[x, wp] against working precision wp for x = Sqrt[2], 2, E, Pi, showing the byte count stepping up at different precisions for each number.)

We see that the smallest data structure to keep track of the mantissa, exponent, precision and whatever else is 80 bytes, much bigger than an 8-byte machine real. As others have mentioned, bignums are computed in software and machine numbers in hardware, which accounts for the principal difference in speed.
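(The speed gap shows up in elementwise arithmetic alone, independent of Det. A minimal comparison; the timings are machine-dependent, so no figures are quoted:)

```mathematica
machine = RandomReal[1, 10^6];
bignum = SetPrecision[machine, 20];  (* same values as software bignums *)
Total[machine]; // RepeatedTiming    (* hardware floating point *)
Total[bignum]; // RepeatedTiming     (* software arithmetic: orders of magnitude slower *)
```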

You can speed up arbitrary-precision computation by setting $MaxPrecision = $MinPrecision. I think this turns off precision tracking. I've never understood why people use the setting WorkingPrecision -> 10 (slower and it pretends less precision: lose-lose, no?), and I often comment when I see it on site. It occurs in the docs, imho, because it produces output that is short and readable, not because it's a good value to use. It indirectly affects PrecisionGoal and AccuracyGoal, but if that's one's purpose, they should be set explicitly, not indirectly. But at the risk of propagating a bad practice:

data = N[Range@100000, 10];
Total@data; // RepeatedTiming
Block[{$MaxPrecision, $MinPrecision},
 $MaxPrecision = $MinPrecision = Precision@data;
 (Total@data; // RepeatedTiming)
 ]
(*
{0.0136405, Null}
{0.00927122, Null}
*)
Michael E2
  • One reason to use WorkingPrecision->10 is when machine precision goes horribly wrong, and you are only interested in the first few digits. – Carl Woll Sep 15 '22 at 16:31
  • To expand on the comment by @CarlWoll, for functions such as FindRoot this will typically decrease the XXXGoal values so the function does not work too hard trying to find more digits. So it's still (slow) software arithmetic, but stopping criteria kick in sooner. – Daniel Lichtblau Sep 15 '22 at 22:27
  • @CarlWoll, Daniel: Isn't one better off (or just as well off) with WorkingPrecision -> 16, PrecisionGoal -> 5? That is what I meant by "It indirectly affects PrecisionGoal and AccuracyGoal, but if that's one's purpose, they should be set explicitly, not indirectly." But maybe there's something about WorkingPrecision -> 10 I don't appreciate. – Michael E2 Sep 16 '22 at 00:24
  • One can indeed use WorkingPrecision -> 16, PrecisionGoal -> 5. WorkingPrecision -> 10 offers no real speed advantage here, other than saving the trouble of explicitly setting the XXXGoal values. – Daniel Lichtblau Sep 16 '22 at 15:39
  • @DanielLichtblau Thanks for the reply. I guess I'm thinking that if you have a poorly conditioned problem that fails at machine precision, running it at precision 16 is more likely to succeed than at precision 10. Simple example: mat = N[HilbertMatrix[8], 10]; Norm[Inverse[mat] . mat] fails but mat = N[HilbertMatrix[8], 16]; Norm[Inverse[mat] . mat] succeeds. As I understand bignums, the floating-point computations at precisions 10 and 16 will be more or less the same in speed and numerically (except for different precisions) until the precision falls to zero or less. – Michael E2 Sep 16 '22 at 17:41