I'll try to clear up one possible source of confusion. If picking each sample value from a single fixed distribution feels "not random enough", let's try to make things "more random" by adding another layer of randomness, and see that the attempt is futile.
Imagine that the noise for each sample comes from a distribution that is itself randomly selected for that sample from a list of possible distributions, each with its own probability of occurrence and its own list of probabilities for the possible sample values. To keep it simple, take just three distributions and four possible sample values:
$$\begin{array}{l|llll}&\rlap{\text{Sample value and its prob-}}\\
\text{Probability}&\rlap{\text{ability in the distribution}}\\
\text{of distribution}&-2&-1&0&1\\
\hline
\color{blue}{0.3}&0.4&0.2&0.3&0.1\\
\color{blue}{0.2}&0.5&0.1&0.2&0.2\\
\color{blue}{0.5}&0.1&0.4&0.4&0.1\end{array}$$
Here we actually have a distribution over distributions. But there is still a single distribution that says everything about the probabilities of the values for that sample:
$$\begin{array}{llll}\rlap{\text{Sample value and}}\\
\rlap{\text{its total probability}}\\
-2&-1&0&1\\
\hline
0.27&0.28&0.33&0.12
\end{array}$$
Each total probability was obtained by summing, over the possible distributions, the conditional probability of the sample value multiplied by the probability of that distribution:
$$0.4\times\color{blue}{0.3} + 0.5\times\color{blue}{0.2} + 0.1\times\color{blue}{0.5} = 0.27\\
0.2\times\color{blue}{0.3} + 0.1\times\color{blue}{0.2} + 0.4\times\color{blue}{0.5} = 0.28\\
0.3\times\color{blue}{0.3} + 0.2\times\color{blue}{0.2} + 0.4\times\color{blue}{0.5} = 0.33\\
0.1\times\color{blue}{0.3} + 0.2\times\color{blue}{0.2} + 0.1\times\color{blue}{0.5} = 0.12$$
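The same calculation can be sketched in a few lines of Python; the variable names are mine, and the numbers are simply those from the tables above:

```python
# Mixture weights P(B_j) for the three distributions (the blue column).
weights = [0.3, 0.2, 0.5]

# Conditional probabilities P(A_i | B_j): one row per distribution,
# columns correspond to the sample values -2, -1, 0, 1.
conditional = [
    [0.4, 0.2, 0.3, 0.1],
    [0.5, 0.1, 0.2, 0.2],
    [0.1, 0.4, 0.4, 0.1],
]

# Law of total probability: P(A_i) = sum_j P(A_i | B_j) * P(B_j).
marginal = [
    sum(conditional[j][i] * weights[j] for j in range(len(weights)))
    for i in range(4)
]
print(marginal)  # close to [0.27, 0.28, 0.33, 0.12], up to floating-point rounding
```

Note that the result again sums to 1, as a probability distribution must.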
The laws of probability that were applied:
$$P(A_i\cap B_j) = P(A_i|B_j)\color{blue}{P(B_j)}\quad\text{conditional probability}$$
$$P(A_i) = \sum_jP(A_i\cap B_j)\quad\text{total probability}$$
where $A_i$ are the events of the $i\text{th}$ sample value occurring, and $B_j$ are mutually exclusive and exhaustive events of choosing the $j\text{th}$ distribution.
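As a sanity check, one can simulate the two-stage process (first pick a distribution, then pick a value from it) and see that the empirical frequencies match the single marginal distribution. A minimal sketch in Python, assuming the tables above:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

values = [-2, -1, 0, 1]
weights = [0.3, 0.2, 0.5]            # P(B_j)
conditional = [
    [0.4, 0.2, 0.3, 0.1],
    [0.5, 0.1, 0.2, 0.2],
    [0.1, 0.4, 0.4, 0.1],
]                                     # P(A_i | B_j)
marginal = [0.27, 0.28, 0.33, 0.12]  # the single equivalent distribution

n = 100_000
counts = {v: 0 for v in values}
for _ in range(n):
    # Stage 1: pick a distribution B_j with probability P(B_j).
    j = random.choices(range(3), weights=weights)[0]
    # Stage 2: draw the sample value from the chosen distribution.
    v = random.choices(values, weights=conditional[j])[0]
    counts[v] += 1

for v, p in zip(values, marginal):
    print(v, counts[v] / n, p)  # empirical frequency vs. total probability
```

The empirical frequencies agree with the marginal probabilities to within the usual Monte Carlo error, which is the point: no experiment on the samples can distinguish the two-stage scheme from drawing directly from the single combined distribution.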
With continuous distributions the same thing happens, because a continuous distribution can be modeled as the limit of discrete distributions as the number of possible events approaches infinity.
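In symbols, the continuous analogue just replaces the sum over distributions with an integral. Writing $\Theta$ for a (here hypothetical) continuous label of the selected distribution and $f$ for densities:
$$f_X(x) = \int f_{X\mid\Theta}(x\mid\theta)\,f_\Theta(\theta)\,d\theta,$$
which is the law of total probability for densities, mirroring $P(A_i)=\sum_j P(A_i\mid B_j)\color{blue}{P(B_j)}$ term by term.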