Why do we divide by standard deviation when standardizing a normal distribution?

Question

We use the random variable $Y = \frac{X - \mu}{\sigma}$ to convert a normal distribution $N(\mu, \sigma)$ into $N(0, 1)$. Subtracting $\mu$ is quite intuitive: it shifts all values along the $x$-axis and thus moves the mean $\mu$ to the origin. But dividing by the standard deviation does not seem intuitive.

This question is basically the same as this one; however, I have not understood the answers, specifically this:

$$E[Y] = \frac{E[X] - \mu}{\sigma} = \frac{\mu-\mu}{\sigma} = 0.$$

$$\text{Var}(Y) = \frac{1}{\sigma^2}\text{Var}(X) = \frac{1}{\sigma^2}\sigma^2 = 1.$$

And I would also like to get an intuitive answer.

Dividing by standard deviation squeezes it into a distribution with standard deviation 1. — Neal, Jun 30 '19 at 22:35
Viktor Glombik, Oh true. So the expected value in any normal distribution is equal to the mean? — ESCM, Jun 30 '19 at 22:43
Mean = expected value? Or which terminology are you working with? — ViktorStein, Jun 30 '19 at 22:50

ViktorStein · Accepted Answer · 2019-07-04T22:09:22.930

The first of the formulas uses the linearity of the expected value. To be more specific: for $a,b \in \mathbb{R}$ and a random variable $X$ we have $$ \mathbb{E}[aX] = a \mathbb{E}[X] \qquad \text{and} \qquad \mathbb{E}[X + b] = \mathbb{E}[X] + b $$ In particular, this implies $\mathbb{E}[a] = a$ for every constant $a \in \mathbb{R}$. Since $\mathbb{E}[X]$ is a constant, this implies $$\mathbb{E}[\mathbb{E}[X]] = \mathbb{E}[X].$$

In your case (loosely speaking) $a = \frac{1}{\sigma}$ and $b = - \mu$.
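Spelled out for this case, as a sketch that uses only the two linearity rules above together with $\mathbb{E}[X] = \mu$: $$\mathbb{E}[Y] = \mathbb{E}\left[\frac{1}{\sigma}(X - \mu)\right] = \frac{1}{\sigma}\,\mathbb{E}[X - \mu] = \frac{1}{\sigma}\left(\mathbb{E}[X] - \mu\right) = \frac{1}{\sigma}(\mu - \mu) = 0.$$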

For the second formula: using the definition of the variance and the result $\mathbb{E}[Y] = 0$ from the first formula, we obtain \begin{align} \text{Var}(Y) & \overset{\textrm{Def.}}{\underset{(\star)}{=}} \mathbb{E}[(Y - \mathbb{E}[Y])^2] = \mathbb{E}[Y^2] = \mathbb{E}\left[ \left(\frac{X - \mu}{\sigma}\right)^2 \right] \overset{\textrm{L}}{=} \frac{1}{\sigma^2}\mathbb{E}\left[ \left(X - \mu\right)^2 \right] \\ & = \frac{1}{\sigma^2}\mathbb{E}\left[X^2 - 2 \mu X + \mu^2 \right] \overset{\textrm{L}}{=} \frac{1}{\sigma^2} \left(\mathbb{E}[X^2]- 2 \mu \mathbb{E}[X] + \mu^2 \right) \\ & = \frac{1}{\sigma^2} \left(\mathbb{E}[X^2]- \mu^2 \right) = \frac{1}{\sigma^2} \left(\mu^2 + \sigma^2 - \mu^2 \right) = 1, \end{align} where $\textrm{L}$ marks a use of linearity, and in the last step we use $\mathbb{E}[X^2] = \mu^2 + \sigma^2$, which follows from $\text{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = \sigma^2$ and $\mathbb{E}[X] = \mu$.

The alternative definition of the variance $(\star)$ can be obtained by expanding the square and using linearity: \begin{align} \mathbb{E}[(X - \mathbb{E}[X])^2] & \overset{\textrm{(L)}}{=} \mathbb{E}[X^2] - 2 \mathbb{E}[X \mathbb{E}[X]] + \mathbb{E}[(\mathbb{E}[X])^2] \overset{\textrm{(L)}}{=} \mathbb{E}[X^2] - 2 \mathbb{E}[X] \cdot \mathbb{E}[X] + \mathbb{E}[X]^2 \\ & = \mathbb{E}[X^2] - 2 \mathbb{E}[X]^2 + \mathbb{E}[X]^2 = \mathbb{E}[X^2] - \mathbb{E}[X]^2. \end{align}
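If a numerical check helps, here is a minimal simulation sketch of the two results above. The parameter values, the seed, and the sample size are arbitrary choices made for illustration, not part of the derivation; it uses only Python's standard library.

```python
import random
import statistics

# Hypothetical parameters, chosen only for illustration.
MU, SIGMA = 20.0, 3.0

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.gauss(MU, SIGMA) for _ in range(100_000)]

# Standardize each draw: subtract the mean, then divide by the standard deviation.
ys = [(x - MU) / SIGMA for x in xs]

mean_y = statistics.fmean(ys)     # should be close to 0
var_y = statistics.pvariance(ys)  # should be close to 1

print(f"sample mean of Y:     {mean_y:.3f}")
print(f"sample variance of Y: {var_y:.3f}")
```

The sample values will not be exactly 0 and 1, since they are estimates from finitely many draws, but they should land close to them.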

score 2 · Answer 2 · answered Dec 11 '20 at 06:34

Another way to see it is to look at the reverse operation.

Say the times taken by students to get up in the morning can be modelled by a normal distribution with mean 20.0 minutes and standard deviation 3.0 minutes.

Suppose we label those who take more than 1.5 standard deviations above the mean as "lazy". So, how late is "lazy" exactly?

                 (1.5 x 3) + 20 = 24.5 minutes

You get it? You just reversed the standardization. So in order to get back to the z-score, you need to subtract the mean and divide by the standard deviation.
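The round trip can be sketched in a few lines of Python. The function names `from_z` and `to_z` are made up for this illustration; the numbers are the ones from the example above.

```python
# Values from the example above: mean 20.0 minutes, standard deviation 3.0 minutes.
MEAN, SD = 20.0, 3.0

def from_z(z, mean=MEAN, sd=SD):
    """Convert a z-score back to the original units (minutes)."""
    return z * sd + mean

def to_z(x, mean=MEAN, sd=SD):
    """Standardize: subtract the mean, then divide by the standard deviation."""
    return (x - mean) / sd

cutoff = from_z(1.5)
print(cutoff)        # 24.5
print(to_z(cutoff))  # 1.5
```

Each function undoes the other, which is why the standardizing direction has to subtract first and divide second.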

Hope this helps. :D

score 1 · Answer 3 · answered Jun 30 '19 at 22:41

As you mentioned, you would like to go from $N(\mu, \sigma)$, which has a standard deviation of $\sigma$, to $N(0, 1)$, which has a standard deviation of $1$.

Now, the standard deviation is a measure of spread, so if you divide your shifted data $X-\mu$ by its standard deviation, you get data $$Z=\frac {X-\mu}{\sigma}$$ with a standard deviation of $\sigma /{\sigma}=1$.
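One way to see why the spread scales like this, sketched for any random variable $W$ with finite variance and any constant $a \neq 0$ (both symbols are introduced here only for illustration): \begin{align} \text{Var}(aW) & = \mathbb{E}[(aW - \mathbb{E}[aW])^2] = \mathbb{E}[a^2 (W - \mathbb{E}[W])^2] = a^2\, \text{Var}(W). \end{align} Taking $W = X - \mu$, whose variance is still $\sigma^2$ because shifting does not change the spread, and $a = \frac{1}{\sigma}$ gives a variance of $\frac{1}{\sigma^2}\sigma^2 = 1$, hence a standard deviation of $1$.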

Mohammad Riazi-Kermani

Thanks for your answer, can you please explain $\text{Var}(Y) = \frac{1}{\sigma^2}\text{Var}(X) = \frac{1}{\sigma^2}\sigma^2 = 1.$ – ESCM Jun 30 '19 at 22:45

Camus · Answer 4 · 2020-12-11T06:28:41.217

The way I see it, the normal variable Z with which mathematicians have built the Z-table isn't very special at all. It's only a matter of convenience that it has parameters Z~N(0,1).

Now, to answer your question. Consider a normal distribution X with mean μ and σ=10. On the x-axis, every point can be represented as the number of σ from the mean, as it usually is. We now want to CONVERT the values of X to the normal distribution Z. To do that, we have to translate the graph by μ units to the left, hence the new variable becomes: $$X - \mu$$

Now, I assume you can intuitively see why this works. Next, you have to rescale the x-axis so that σ equals 1. This is done by dividing by X's σ, which changes the values from -30, -20, -10, 0, 10, 20, 30 to -30/10, -20/10, -10/10, 0, 10/10, 20/10, 30/10, that is, -3, -2, -1, 0, 1, 2, 3. In other words, the values have all been converted by a constant multiplier. That multiplier is 1/σ because we want our values to match the Z random variable in order to use the table.

Hence becoming:

                    (X-μ)/σ

It could have been any value, really. Z could've had a σ of 5 and we would have converted the values accordingly by the constant multiplier of 1/2 in this example.
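Here is a small Python sketch of that rescaling, assuming the mean has already been subtracted. The helper `rescale` and its parameter names are hypothetical, made up for this illustration.

```python
# Illustrative values from above: X has sigma = 10, and the mean is already subtracted.
SIGMA_X = 10.0
shifted = [-30.0, -20.0, -10.0, 0.0, 10.0, 20.0, 30.0]

def rescale(values, sigma_from, sigma_to=1.0):
    """Scale every value by the constant factor sigma_to / sigma_from."""
    return [v * sigma_to / sigma_from for v in values]

z_values = rescale(shifted, SIGMA_X)         # target sigma of 1: multiplier 1/10
alt_values = rescale(shifted, SIGMA_X, 5.0)  # hypothetical target sigma of 5: multiplier 1/2

print(z_values)
print(alt_values)
```

With a target σ of 1, the points that sat 1, 2 and 3 standard deviations from the mean land on 1, 2 and 3, which is what makes the Z-table usable.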

I hope this helps. I'm no mathematician but this is how I visualise it.
