
I have a simple question that is really hard to Google (besides the canonical What Every Computer Scientist Should Know About Floating-Point Arithmetic paper).

When should functions such as log1p or expm1 be used instead of log and exp? When should they not be used? How do different implementations of those functions differ in terms of their usage?

Tim
  • Welcome to Scicomp.SE! That's a very reasonable question, but it would be easier to answer if you explained a bit which log1p you're referring to (especially how it's implemented, so we don't have to guess). – Christian Clason Sep 02 '15 at 14:53
  • For real-valued arguments, log1p($x$) and expm1($x$) should be used when $x$ is small, e.g., when $1 + x = 1$ in floating point accuracy. See, e.g., http://docs.scipy.org/doc/numpy/reference/generated/numpy.expm1.html and http://docs.scipy.org/doc/numpy/reference/generated/numpy.log1p.html. – GoHokies Sep 02 '15 at 14:58
  • @ChristianClason thanks, I am referring mostly to C++ std or R, but as you ask, I start to think that learning about the differences between implementations would also be very interesting. – Tim Sep 02 '15 at 15:04
  • Here's an example: http://scicomp.stackexchange.com/questions/8371/compute-ex-1-x-near-x-0/8374#8374 – Juan M. Bello-Rivas Sep 02 '15 at 16:01
  • Thanks for the examples, but the problem is that while there are multiple examples available, it is hard to find information about the general rules and properties of those functions. Examples are not always easy to extrapolate. – Tim Sep 03 '15 at 07:26
  • @user2186862 "when $x$ is small" is correct, but not only "when $1+x=1$ in floating point accuracy" (which happens for $x\approx 10^{-16}$ in the usual double precision arithmetic). The documentation pages you linked show that they are useful already for $x\approx 10^{-10}$, for instance. – Federico Poloni Sep 03 '15 at 08:25
  • @FedericoPoloni yes, the $1+x == 1$ case was just an example. – GoHokies Sep 03 '15 at 09:22
  • Very interesting and useful answer. Thanks a lot, I was looking for this too. So just to confirm: if we plot log1p() or log() in Python we will get the same shape of outcome, but a more accurate version if we use log1p()? My data has a lot of x<<1 values, so it makes sense to use log1p(). – Matt G Nov 10 '16 at 09:34

2 Answers


We all know that \begin{equation} \exp(x) = \sum_{n=0}^\infty \frac{x^n}{n!} = 1 + x + \frac12 x^2 + \dots \end{equation} implies that for $|x| \ll 1$ we have $\exp(x) \approx 1 + x$. This means that if we evaluate $\exp(x) - 1$ in floating point for $|x| \ll 1$, catastrophic cancellation can occur: $\exp(x)$ is rounded to a number close to $1$, so most digits of $x$ are already lost before the subtraction takes place.

This can be easily demonstrated in Python:

>>> from math import exp, expm1

>>> x = 1e-8
>>> exp(x) - 1
9.99999993922529e-09
>>> expm1(x)
1.0000000050000001e-08

>>> x = 1e-22
>>> exp(x) - 1
0.0
>>> expm1(x)
1e-22

Exact values are \begin{align} \exp(10^{-8}) -1 &= 0.000000010000000050000000166666667083333334166666668 \dots \\ \exp(10^{-22})-1 &= 0.000000000000000000000100000000000000000000005000000 \dots \end{align}

In general, an "accurate" implementation of exp and expm1 should have an error of at most 1 ULP (one unit in the last place). However, since attaining this accuracy results in "slow" code, a faster, less accurate implementation is sometimes available. For example, CUDA provides expf and expm1f, the single-precision (float) variants as in the standard C math library, plus fast intrinsics such as __expf. According to the CUDA C Programming Guide, Appendix D, expf has a maximum error of 2 ULP.

If you do not care about errors on the order of a few ULPs, different implementations of the exponential function are usually equivalent, but beware that bugs may be hidden somewhere... (Remember the Pentium FDIV bug?)

So it is pretty clear that expm1 should be used to compute $\exp(x)-1$ for small $x$. Using it for general $x$ is not harmful, since expm1 is expected to be accurate over its full range:

>>> exp(200)-1 == exp(200) == expm1(200)
True

(In the above example $1$ is well below 1 ULP of $\exp(200)$, so all three expressions return exactly the same floating point number.)
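This is easy to check directly, assuming a Python version recent enough (3.9+) to provide math.ulp:

>>> import math
>>> math.ulp(math.exp(200)) > 1   # the gap between adjacent doubles near exp(200) is about 1e71
True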

A similar discussion holds for the inverse functions log and log1p, since $\log(1+x) \approx x$ for $|x| \ll 1$: forming $1 + x$ in floating point rounds away the low-order digits of a small $x$ before the logarithm is even taken.
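Repeating the demonstration above for the logarithm (a minimal sketch in the same Python session):

>>> from math import log, log1p
>>> x = 1e-22
>>> log(1 + x)   # 1 + x rounds to exactly 1.0, so every digit of x is lost
0.0
>>> log1p(x)
1e-22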

Stefano M
  • This answer was already contained in the comments to the OP's question. However, I felt it useful to give a longer (although basic) account just for clarity, in the hope it will be useful to some readers. – Stefano M Sep 03 '15 at 09:44
  • OK, but then one can simply conclude "so I can always use expm1 instead of exp"... – Tim Sep 03 '15 at 10:03
  • @Tim your conclusion is wrong: you can always use expm1(x) instead of exp(x) - 1. Of course exp(x) == exp(x) - 1 does not hold in general. – Stefano M Sep 03 '15 at 10:13
  • OK, that is clear. And are there any clear cut criteria for $x \ll 1$? – Tim Sep 03 '15 at 13:13
  • @Tim there is no clear cut threshold, and the answer depends on the precision of the floating point implementation (and the problem being solved). While expm1(x) should be accurate to 1 ULP over the entire range $0\leq x \leq 1$, exp(x) - 1 progressively loses precision, from a few ULPs when $x\approx 1$ to a complete breakdown when $x < \epsilon$, where $\epsilon$ is machine epsilon; see the sketch after these comments. – Stefano M Sep 03 '15 at 20:17
  • "where f stands for fast" -- No, it stands for float as in the standard C math functions. The intrinsics are named like __expf and can also be accessed using fast math flags. – Jed Brown Dec 30 '23 at 06:57
  • @JedBrown: thanks for pointing out my error. (Yes, my C/C++ skills are a little bit rusty...) – Stefano M Dec 30 '23 at 16:30
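A rough sketch of that progressive loss of precision, taking expm1 as the reference value (the sample points below are arbitrary):

>>> from math import exp, expm1
>>> for x in [1e-4, 1e-8, 1e-12, 1e-16]:
...     rel_err = abs((exp(x) - 1.0) - expm1(x)) / expm1(x)
...     print(f"x = {x:.0e}   relative error of exp(x)-1: {rel_err:.1e}")
...

The relative error grows from roughly $10^{-12}$ at $x = 10^{-4}$ to $100\%$ once $\exp(x)$ rounds to exactly $1$.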

To expand on the difference between log and log1p, it might help to recall the graph of the logarithm:

[Figure: graph of the natural logarithm]

If your data contains zeroes, then you probably don't want to use log since it's undefined at zero. And as $x$ approaches $0$, the value of $\ln(x)$ approaches $- \infty$. So if your $x$ values are close to $0$, then the value of $\ln(x)$ is potentially a large negative number. For example $\ln(\frac{1}{e}) = -1$ and $\ln(\frac{1}{e^{10}}) = -10$ and so on. This can be useful, but it can also distort your data towards large negative numbers, especially if your dataset also contains numbers much larger than zero.

On the other hand, as $x$ approaches $0$, the value of $\ln(x+1)$ approaches $0$ from the positive direction. For example, $\ln(1+\frac{1}{e}) \approx 0.31$ and $\ln(1+\frac{1}{e^{10}}) \approx 0.000045$. So for nonnegative data, log1p produces only nonnegative values and removes the 'danger' of large negative numbers. This generally ensures a more homogeneous distribution when a dataset contains numbers close to zero.

In short, if the dataset is all greater than $1$, then log is usually fine. But if the dataset has numbers between $0$ and $1$, then log1p is usually better.
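For instance, with Python's math module (the dataset here is made up purely for illustration):

>>> from math import log, log1p
>>> log1p(0.0)   # log1p is well defined at zero
0.0
>>> all(log1p(v) >= 0 for v in [0.0, 0.003, 0.5, 2.0])   # nonnegative data stays nonnegative
True
>>> log(0.0)     # plain log fails on a zero entry
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: math domain error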

sfmiller940
  • I fully disagree with these arguments. There is no "danger" with negative values. If a problem asks you to compute the log of $10^{-24}$, this is what you have to do, full stop. And you speak as if one could trade log for log1p and conversely without compensating elsewhere in the process. – Yves Daoust Jan 22 '20 at 12:46
  • @YvesDaoust I'm not talking about doing a homework problem. I'm talking about how to analyze data in a useful way. In the real world there are many data sets where having zeros prohibits logarithmic analysis, or having numbers close to zero distorts it. In these situations it's standard practice to use log1p if adding one to the data is not qualitatively significant. That's a major reason why log1p exists in the first place. – sfmiller940 Jan 22 '20 at 18:11
  • Not at all. I wonder where you found this nonsense. The reason is avoidance of a catastrophic cancellation. – Yves Daoust Jan 22 '20 at 18:26
  • @YvesDaoust Again it's standard data analysis. Perhaps the "avoidance of catastrophic cancellation" is another reason for log1p. But it's basically sidestepping unwanted zeros/poles either way. – sfmiller940 Jan 22 '20 at 19:10
  • This is used for $\log x$ close to $x=1$. Read the accepted answer and accept it. – Yves Daoust Jan 23 '20 at 07:19
  • @YvesDaoust There's no contradiction between the accepted answer and mine. I'm just expanding on the logarithm aspect. Also, the accepted answer doesn't say that log1p is for $x$ close to $1$; it says for $x \ll 1$, which means close to $x=0$. Thus log1p is largely for avoiding zeros/poles, as I previously stated. – sfmiller940 Jan 23 '20 at 17:56
  • It seems that you don't want to hear my comment. I specified $\log x$ close to $x=1$ to avoid any ambiguity, which is obviously $\text{log1p }x$ close to $x=0$, but you skipped that. – Yves Daoust Jan 23 '20 at 18:14
  • @YvesDaoust TBH it's hard to make sense of your nonsense complaints. For the third time the main point is handling the zero/pole. Thanks for agreeing. – sfmiller940 Jan 23 '20 at 21:54
  • Moving away from the singularity by translation means that you don't care about accuracy, as the modified values are not the true ones. In this case you use $\log(x+1)$, which is good enough. If you do care about accuracy and are in the vicinity of $x=0$, use log1p($x$). – Yves Daoust Jan 23 '20 at 22:00
  • There is still hope that someday you will understand catastrophic cancellation. – Yves Daoust Jan 23 '20 at 22:04
  • @YvesDaoust Obviously $\text{log1p} \neq \log$. There is still a chance that you'll stop wasting people's time with trolling pointless complaints. – sfmiller940 Jan 23 '20 at 22:04
  • Late to the party. Very clear and practical explanation. Thank you. – Ronie Sep 24 '20 at 19:05
  • This post lacks context. I understand that, e.g., in a machine learning pipeline it is common to use log1p as a transformer, but this type of use is clearly very specific. Nobody would introduce an intrinsic function in C or C++ only for this type of use, since log(x + 1) would be sufficient to shift the argument away from the “dangerous” zone. IMHO log1p(x) is about reducing the rounding error for x close to zero, not about providing a handy transformation. – Stefano M Dec 30 '23 at 18:40