
There is a correct and an incorrect proof of the Chain Rule going around (see below). The problem with the incorrect proof is that $g(x)-g(a)$ might be $0$ for $x$ arbitrarily close to $a$, creating a division by zero.

Question

I can't get my head around why the correct proof solves the problem of the incorrect proof. Why can we just define a function $E$ and suddenly all our problems disappear?

I just don't really get what actually happens in the correct proof. It just didn't "click" in my brain yet. Any help would be much appreciated.

By the way: is my "correct proof" below indeed correct?

Incorrect proof:

$$\lim \limits_{x \to a}\frac{f(g(x))-f(g(a))}{x-a}=\lim \limits_{x \to a}\frac{f(g(x))-f(g(a))}{g(x)-g(a)}\times\frac{g(x)-g(a)}{x-a}=f'(g(x))g'(x)$$

Correct proof:

We first define a function $E$

$$E(0)=0$$ $$E(g(x)-g(a))=\frac{f(g(x))-f(g(a))}{g(x)-g(a)}-f'(g(x))$$

In any case: $$f(g(x))-f(g(a))=(E(g(x)-g(a))+f'(g(x)))\times(g(x)-g(a))$$

Dividing by $x-a$ and taking the limit we get:

$$\begin{align} \frac{d}{dx}f(g(x))&=\lim \limits_{x \to a}\frac{f(g(x))-f(g(a))}{x-a}\\ &=\lim \limits_{x \to a}(E(g(x)-g(a))+f'(g(x)))\times\frac{g(x)-g(a)}{x-a}\\&=f'(g(x))g'(x) \end{align}$$


EDIT: In other words: we basically state that when $g(x)=g(a)$:

$$\frac{f(g(x))-f(g(a))}{g(x)-g(a)}-f'(g(x))=0$$

But why can we state that? As I understand it, this is true for the limit, but why are we allowed to also state it for the actual value?

  • You cannot "define a function $E$" by declaring $E\bigl(g(x)-g(a)\bigr):=\ldots$. – Christian Blatter Oct 26 '17 at 09:47
  • Isn't the problem with the first proof eliminated if $g(x)\neq g(a)$ in a neighborhood of $a$? And if that isn't the case, if $g$ is differentiable, it must be constant around $a$. And the locally constant case is trivially true. – Jack M Oct 26 '17 at 09:50
  • @JackM: Consider the function $g(x):=x^2\sin{1\over x}$, $g(0):=0$. – Christian Blatter Oct 26 '17 at 10:06
  • @JackM Alas it's just not true that in that case $g$ must be constant near $a$, as Christian points out. It would be true if we were talking about complex differentiability for holomorphic functions - I once heard Rudin remark that this is one of the nice things about complex analysis: The traditional wrong proof of the chain rule becomes correct. – David C. Ullrich Oct 26 '17 at 16:07
  • @JackM : The problem is indeed eliminated in cases where $g(x)\ne g(a)$ for $x$ in some neighborhood of $a.$ But the other case does not always involve $g$ being constant near $a. \qquad$ – Michael Hardy Nov 22 '17 at 19:51
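The counterexample from the comments can be checked numerically. Below is a Python sketch (the test points are my own illustrative choices, not part of any proof): $g(x)=x^2\sin(1/x)$, $g(0)=0$ vanishes at points arbitrarily close to $0$, is not constant near $0$, and still has $g'(0)=0$.

```python
import math

# g(x) = x^2 sin(1/x) for x != 0, g(0) = 0 (the comment's example).
def g(x):
    return x**2 * math.sin(1.0 / x) if x != 0 else 0.0

# g vanishes at x_n = 1/(n*pi) -> 0, so every neighborhood of 0
# contains points where g(x) = g(0) = 0 (up to floating-point error)...
zeros = [1.0 / (n * math.pi) for n in range(1, 6)]
assert all(abs(g(x)) < 1e-15 for x in zeros)

# ...yet g is not constant near 0: it is nonzero between those zeros...
peaks = [2.0 / ((4 * n + 1) * math.pi) for n in range(1, 6)]
assert all(g(x) != 0 for x in peaks)

# ...and g'(0) = lim g(h)/h = lim h*sin(1/h) = 0, since |g(h)/h| <= |h|.
for h in (1e-3, 1e-5, 1e-7):
    assert abs(g(h) / h) <= abs(h)
```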

5 Answers


You can avoid the "correct" proof this way:

Case 1: $g'(a) \ne 0.$ Here the "fake proof" works! That's simply because $(g(x) - g(a))/(x-a)$ is nonzero for $x$ close to, but not equal to, $a.$ For such $x,$ we have $g(x)\ne g(a),$ and now the fake news is actually news.

Case 2: $g'(a) = 0:$ Because $f'(g(a))$ exists, there exists a constant $c>0$ and a $\delta > 0$ such that

$$\tag 1 |f(y)-f(g(a))|\le c|y-g(a)|\, \text { for } y\in (g(a)-\delta, g(a)+\delta).$$

Now $g$ is continuous at $a,$ so there exists $\gamma > 0$ such that $x\in (a-\gamma, a + \gamma)$ implies $g(x) \in (g(a)-\delta, g(a)+\delta).$ For such $x$ we can use $(1)$ to see

$$|f(g(x))-f(g(a))| \le c |g(x)-g(a)|.$$

Now divide by $|x-a|$ and let $x\to a.$ On the right we get limit $0$ because $g'(a)=0.$ Therefore the limit on the left is $0,$ which is exactly the same as saying $(f\circ g)'(a) = 0.$ That is the desired conclusion in this case.
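Case 2 can be illustrated numerically. The following Python sketch uses the hypothetical choices $a=0$, $g(x)=x^2\sin(1/x)$ with $g(0)=0$ (so $g'(0)=0$ and $g(x)=g(0)$ at points arbitrarily close to $0$) and $f=\sin$; these are illustrative examples, not part of the proof:

```python
import math

# Illustrative choices (not from the proof): a = 0,
# g(x) = x^2 sin(1/x) with g(0) = 0, so g'(0) = 0 (Case 2),
# and f(y) = sin(y), differentiable at g(0) = 0.
def g(x):
    return x**2 * math.sin(1.0 / x) if x != 0 else 0.0

def f(y):
    return math.sin(y)

# The difference quotient of f∘g at 0 tends to (f∘g)'(0) = 0, even
# though g(x) = g(0) at points arbitrarily close to 0: here
# |f(g(h)) - f(g(0))| <= |g(h)| <= h^2, so the quotient is O(h).
for h in (1e-2, 1e-4, 1e-6):
    quotient = (f(g(h)) - f(g(0))) / h
    assert abs(quotient) <= 2 * h
```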

zhw.
  • +1 my answer is based on the same idea as in your answer but presented in a slightly different way. If you wish I can delete my answer. – Paramanand Singh Oct 27 '17 at 22:37
  • @ParamanandSingh No, don't worry about it. – zhw. Oct 27 '17 at 23:47
  • Maybe it can be useful to say that $(1)$ comes from the fact that $\lim\limits_{y\to g(a)}\frac{f(y)-f(g(a))}{y-g(a)}$ exists and is equal to $f'(g(a))$. Thus, the quotient $\frac{f(y)-f(g(a))}{y-g(a)}$ is bounded in a punctured neighborhood $A=(g(a)-\delta,g(a)+\delta)\setminus\{g(a)\}$. Then, if $y\in A$ we have that $\left| \frac{f(y)-f(g(a))}{y-g(a)}\right|\leq C$ and therefore, $|f(y)-f(g(a))|\leq C|y-g(a)|$. But this inequality is also true in all of the neighborhood $(g(a)-\delta,g(a)+\delta)$. – Carlos Jiménez Nov 08 '23 at 05:28
  • In Case 1, isn't there also the problem of showing that $\lim_{x \to a}\frac{f(g(x))-f(g(a))}{g(x)-g(a)}=f'(g(x))$? – user182601 Jan 08 '24 at 07:20

There are two things wrong with your original proof, and the "EDIT" section of the original post is also wrong.

First problem: To define a function $E$, you have to say how to apply $E$ to an arbitrary number $h$. You haven't done that. Here is a better definition of $E$: $E(0) = 0$, and if $h \ne 0$ then \begin{equation} E(h) = \frac{f(g(a)+h) - f(g(a))}{h} - f'(g(a)). \end{equation} For $h \ne 0$, the formula defining $E(h)$ can be rearranged to read: \begin{equation} (E(h) + f'(g(a))) \times h = f(g(a)+h) - f(g(a)). \end{equation} But notice that this last equation is also true if $h=0$, since both sides are $0$, so the equation is true for all values of $h$. Plugging in $g(x)-g(a)$ for $h$, we get \begin{equation} (E(g(x)-g(a))+f'(g(a))) \times (g(x)-g(a)) = f(g(x))-f(g(a)). \end{equation} This is (almost) the same as your "in any case" equation.

Second problem: In your final calculation, you are mixing up the derivative with the value of the derivative at a particular point. The limit \begin{equation} \lim_{x \to a} \frac{f(g(x))-f(g(a))}{x-a} \end{equation} doesn't give you the derivative, it gives you the value of the derivative at $a$. So the proof should end like this: \begin{align} \left.\frac{d}{dx}f(g(x))\right|_{x=a} &= \lim_{x \to a} \frac{f(g(x))-f(g(a))}{x-a}\\ &= \lim_{x \to a} (E(g(x)-g(a))+f'(g(a))) \times \frac{g(x) - g(a)}{x-a}\\ &= f'(g(a))g'(a). \end{align}

There is a subtle point in the last step that you may be missing. Since $g$ is differentiable at $a$, it is continuous at $a$, so $\lim_{x \to a} (g(x) - g(a)) = g(a)-g(a) = 0$. But why does it follow that $\lim_{x \to a}E(g(x)-g(a)) = E(0) = 0$? The answer is: because $E$ is continuous at $0$. (Look in your calculus book in the section on continuous functions. You will find a theorem that says that if $\lim_{x \to a} f(x) = L$ and $g$ is continuous at $L$, then $\lim_{x \to a} g(f(x)) = g(L)$. That theorem is being used in this step.) So to have a complete proof, you need to verify that $E$ is continuous at $0$. To verify that, check that $\lim_{h \to 0} E(h) = 0 = E(0)$. In this limit, $h$ is approaching $0$ but it is not equal to $0$, so we can use the formula for $E(h)$ when $h \ne 0$: \begin{equation} \lim_{h \to 0} E(h) = \lim_{h \to 0} \left(\frac{f(g(a)+h)-f(g(a))}{h} - f'(g(a))\right) = f'(g(a))-f'(g(a)) = 0. \end{equation}

Finally, the problem with the "EDIT" section of the original post: You seem to think that by defining $E$, we are somehow changing the meaning of the expression \begin{equation} \frac{f(g(x))-f(g(a))}{g(x)-g(a)}. \end{equation} We are not. That expression still means what it meant before, so it is undefined when $g(x) = g(a)$. All we're doing is defining a new function $E$, and it is only formulas involving the letter $E$ whose meaning is affected by that definition. No justification is needed for this--you can define a new function however you want.

Dan Velleman

Here is a "correct" proof:

From the usual definition of the derivative one immediately deduces the following

Lemma. A function $f$ is differentiable at the point $a$ with $f'(a)=A$ iff there is a function $m_{f,a}=:m$, continuous at $a$ with $m(a)=A$, such that for all $x$ one has $$f(x)-f(a)=m(x)(x-a)\ .$$

Under the hypotheses of the chain rule one therefore has $$f\bigl(g(x)\bigr)-f\bigl(g(a)\bigr)=m_{f,g(a)}\bigl(g(x)\bigr)\bigl(g(x)-g(a)\bigr)=m_{f,g(a)}\bigl(g(x)\bigr)m_{g,a}(x)(x-a)\ .$$ Since $g$ is continuous at $a$ the product $x\mapsto m_{f,g(a)}\bigl(g(x)\bigr)m_{g,a}(x)$ is continuous at $a$ as well, and takes the value $f'\bigl(g(a)\bigr)g'(a)$ there. By the reverse direction of the Lemma the chain rule follows.
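The Lemma can be checked numerically; the following Python sketch uses the illustrative choices $f=\sin$, $a=1$ (my own examples), constructing the slope function $m_{f,a}$ and verifying the factorization and the continuity at $a$:

```python
import math

# Illustrative choices: f = sin, a = 1, so f'(a) = cos(1).
f, fprime, a = math.sin, math.cos, 1.0

def m(x):
    """The slope function m_{f,a}: continuous at a, with m(a) = f'(a)."""
    if x == a:
        return fprime(a)
    return (f(x) - f(a)) / (x - a)

# f(x) - f(a) = m(x) * (x - a) holds for all x, including x = a.
for x in (a, 0.0, 2.0, a + 1e-5):
    assert abs((f(x) - f(a)) - m(x) * (x - a)) < 1e-12

# m is continuous at a: m(x) -> m(a) = f'(a) as x -> a.
assert abs(m(a + 1e-7) - fprime(a)) < 1e-6
```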

  • Right! Heh, didn't read it carefully - I imagine it's correct, and assuming it's correct this is absolutely imo the "right" way to do it, rephrasing differentiability so as to avoid division. – David C. Ullrich Oct 26 '17 at 16:04

Here is one proof which does not require you to have any special definition for the difference quotient. Consider the ratio $$\frac{f(g(x)) - f(g(a))} {x-a} \tag{1}$$ It can be written as $$\frac{f(g(x))-f(g(a))}{g(x)-g(a)}\cdot \frac{g(x) - g(a)} {x-a} \tag{2}$$ provided $g(x) - g(a) \neq 0$ for all $x$ in some deleted neighborhood of $a$. Under this assumption the usual proof works and we get the result $(f\circ g) '(a) =f' (g(a)) g'(a) $ by taking limit as $x\to a$ in equation $(2)$.

Let's see what happens when this assumption does not hold. It means that every deleted neighborhood of $a$ contains some $x$ for which $g(x) =g(a) $. It is easy to prove that in this case we have $g'(a) =0$ (prove this and let me know if you need help; you can start by assuming $g'(a) >0$ and deriving a contradiction, and handle $g'(a) <0$ similarly). Now we can see that if $g(x) =g(a) $ then the difference quotient in $(1)$ is $0$. And if $g(x) \neq g(a) $ then the difference quotient can be written as in $(2)$, where the first factor is bounded (because $f'(g(a)) $ exists) and the second factor tends to $0$, so that the overall product also tends to $0$ as $x\to a$ and thus $(f\circ g) '(a) =0$. The reasoning in the last sentence can be formalized with the definition of limit as shown below.


Let $\epsilon >0$ be arbitrary. There exists an $\epsilon' >0$ such that $$\left|\frac{f(y) - f(g(a))} {y-g(a)} - f'(g(a)) \right|<1$$ for all $y$ with $0<|y-g(a)|<\epsilon '$. Therefore $$\left|\frac{f(y) - f(g(a))} {y-g(a)} \right|<|f' (g(a)) |+1=K\text{ (say)}\tag{3}$$ whenever $0<|y-g(a)|<\epsilon '$. Next note that $g$ is continuous at $a$ (because it is differentiable at $a$), so we have a $\delta_{1}>0$ such that $$|g(x) - g(a) |<\epsilon' \tag{4}$$ whenever $|x-a|<\delta_{1}$. Further, since $g'(a) =0$, there is a $\delta_{2}>0$ such that $$\left|\frac{g(x) - g(a)} {x-a} \right|<\frac{\epsilon} {K}\tag{5} $$ whenever $0<|x-a|<\delta_{2}$. Let $\delta=\min(\delta_{1},\delta_{2})$. If $0<|x-a|<\delta$ then both the inequalities $(4)$ and $(5)$ hold. Further, if $g(x) =g(a)$ then the difference quotient in $(1)$ is $0$, and if $g(x)\neq g(a) $ then by combining $(3)$, $(4)$ and $(5)$ we can see that the difference quotient in $(1)$ is less than $\epsilon$ in absolute value. In other words we have $$\left|\frac{f(g(x)) - f(g(a))} {x-a} \right|<\epsilon$$ whenever $0<|x-a|<\delta$. Thus $(f\circ g) '(a) =0$.


The above proof is taken from Hardy's A Course of Pure Mathematics, and it avoids the trick used by Spivak (defining the first factor in $(2)$ in a continuous manner when $g(x) =g(a) $). The essential idea of the proof is easy to understand, and the last part of the proof dealing with $\epsilon, \delta$ is necessary only to satisfy those who insist on full rigor.


Note that we get into trouble with $\frac{f(g(x))-f(g(a))}{g(x)-g(a)}$ when $g(x) = g(a)$. However, as a function of $x$, it has a well-defined limit at all those points, namely $f'(g(a))$. So what they do when introducing $E$ is simply "filling in" those holes so that we get an expression that is valid for all $x$. We could just as well have said

Consider the expression which is $$ \frac{f(g(x))-f(g(a))}{g(x)-g(a)}\times\frac{g(x)-g(a)}{x-a} $$ when $g(x) \neq g(a)$, and $$ f'(g(a))\times \frac{g(x)-g(a)}{x-a} $$ when $g(x) = g(a)$, and take its limit when $x\to a$.

and this would've been more or less the exact same thing.
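Here is a Python sketch of that piecewise expression, with the illustrative choices $a=0$, $g(x)=x^2\sin(1/x)$ with $g(0)=0$, and $f=\sin$ (chosen, as in the comments above, so that $g(x)=g(a)$ occurs arbitrarily close to $a$; these particular functions are examples, not part of the answer):

```python
import math

# Illustrative choices: a = 0, g(x) = x^2 sin(1/x) with g(0) = 0,
# and f = sin, so g(x) = g(a) at points arbitrarily close to a.
f, fprime, a = math.sin, math.cos, 0.0

def g(x):
    return x**2 * math.sin(1.0 / x) if x != 0 else 0.0

def expr(x):
    """The hole-filled piecewise expression, defined for every x != a."""
    if g(x) != g(a):
        return ((f(g(x)) - f(g(a))) / (g(x) - g(a))) * ((g(x) - g(a)) / (x - a))
    # the "filled-in" branch: f'(g(a)) times the remaining quotient
    return fprime(g(a)) * ((g(x) - g(a)) / (x - a))

# The limit as x -> a is f'(g(a)) * g'(a) = cos(0) * 0 = 0; indeed the
# values shrink like |x|, since |g(x)| <= x^2 and |sin t| <= |t|.
for x in (1e-2, 1e-4, 1e-6):
    assert abs(expr(x)) <= 2 * abs(x)
```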

Arthur
  • But what is $f'(g(a))$ if $x=a$? Is that even defined? – GambitSquared Oct 26 '17 at 09:45
  • @GambitSquared $f'(g(a))$ when $x = a$ is very much defined. Just take the derivative of $f$, and insert $g(a)$ in there. (At that point, $E = 0$.) However, $\frac{g(x)-g(a)}{x-a}$ is not defined there, although it has the limit $g'(a)$ as $x\to a$. – Arthur Oct 26 '17 at 09:55
  • But the definition of $f'(g(a))$ involves dividing by $g(x)-g(a)$ right? Which is zero if $x=a$ – GambitSquared Oct 26 '17 at 10:24
  • @GambitSquared No, the definition of $f'(g(a))$ involves finding the function $f'(x)$ (which does involve a limit, yes, but an unproblematic one), and then inserting the number $g(a)$ as the input to that function. Nothing chain-ruly or circular going on there, and nothing about $g(x) - g(a)$. Finally, the entire point of this is to find the limit of the above expression as $x\to a$, so while we may have to be careful with $g(x) = g(a)$, the case $x = a$ is completely uninteresting. – Arthur Oct 26 '17 at 10:26
  • Why can we "fill the holes"? How can we be sure that if the limit goes to $f'(g(x))$ that then the actual value is also $f'(g(x))$? – GambitSquared Oct 26 '17 at 15:00
  • @GambitSquared There is no "actual value". That's the whole point. The original expression is undefined at those points. However, since the limit as we close in on those points is $f'(g(a))$, we can define a new expression that is exactly the old one at the unproblematic points, and equal to $f'(g(a))$ at the problematic points, and because we know that the limit of the original expression is $f'(g(a))$, the new expression is continuous and undefined only for $x = a$, instead of $g(x) = g(a)$. – Arthur Oct 26 '17 at 15:27