I am trying to understand the proof of the implicit function theorem for multivariable functions. If I have a function $F(x,y,z) = 0$ with the assumption that $z = f(x,y)$ and we want to find $\frac{\partial z}{\partial x}$, then taking the derivative with respect to x (partial derivative must be used because F is a multivariable function) on both sides gives: $$\frac{\partial F}{\partial x} = 0 \tag{1}\label{1}$$ The left side can be evaluated using the chain rule: $$\frac{\partial F}{\partial x} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial x} = 0 \tag{2}\label{2}$$ The thing that I am having trouble understanding is that the term $\frac{\partial F}{\partial x}$ appears in both equations (1) and (2), so if I write something like $$\frac{\partial F}{\partial x} = \frac{\partial F}{\partial x} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial x} = 0 \tag{3}\label{3}$$ Subtracting $\frac{\partial F}{\partial x}$ from each side gives: $$\frac{\partial F}{\partial z} \frac{\partial z}{\partial x} = 0$$ which is incorrect. Which step am I messing up?
In the context of the implicit function theorem especially, the Leibniz notation for partial derivatives is absolutely horrible, and confusing at best when you are first learning. One needs to be very careful about the distinction between a function and its values at a given point.
Recall that a function is a "rule", $F$, with a certain domain and a certain target space. In your case, it seems like $F : \Bbb{R}^3 \to \Bbb{R}$, in other words $F$ has $\Bbb{R}^3$ (or an open subset thereof) as its domain, and has $\Bbb{R}$ as its target space. What this means is that if $(x,y,z) \in \Bbb{R}^3$ is a 3-tuple of real numbers, then $F(x,y,z) \in \Bbb{R}$ denotes the value of the function $F$ when evaluated at the point $(x,y,z)$.
Next, for partial derivatives, I suggest you don't write $\dfrac{\partial F}{\partial x}$ or something similar (at least until you know exactly what the notation means). Rather, I think it is better to use notation like $\partial_1F, \partial_2F, \partial_3F$ to denote the partial derivatives of the function $F$. Notice that since $F$ has the domain $\Bbb{R}^3$, we can take three partial derivatives, one with respect to each direction. Now, $\partial_iF$ is once again a function with domain $\Bbb{R}^3$ and target space $\Bbb{R}$; in short, $\partial_iF : \Bbb{R}^3 \to \Bbb{R}$. If we want to talk about the value of this function at a particular point $(x,y,z)$ of its domain, we can use the notation $(\partial_iF)(x,y,z)$ or $(\partial_iF)_{(x,y,z)}$ (putting it in a subscript just makes some formulas look neater).
Ok, so now, let's address your question directly. Originally, we have a function $F: \Bbb{R}^3 \to \Bbb{R}$. Next, we have a function $f: \Bbb{R}^2 \to \Bbb{R}$ such that for every $(x,y) \in \Bbb{R}^2$, we have \begin{align} F(x,y,f(x,y)) &= 0. \tag{$*$} \end{align} Once again, be very careful to distinguish a function from its values. $(x,y) \in \Bbb{R}^2$ simply means $x$ and $y$ are real numbers; $f: \Bbb{R}^2 \to \Bbb{R}$ means $f(x,y)$ is a particular real number. So $(x,y,f(x,y))$ is a 3-tuple of real numbers, hence we can evaluate the function $F$ on this element of its domain to get a certain real number; $(*)$ is saying that the real number obtained by this procedure is equal to $0$.
Hopefully that is clear enough. One final piece of notation: let's define a function $g : \Bbb{R}^2 \to \Bbb{R}$ by the rule: \begin{align} g(x,y) &:= F(x,y,f(x,y)). \end{align} Now, what $(*)$ is telling us is that $g$ is the constant zero function. Hence, all its partial derivatives vanish identically on all of $\Bbb{R}^2$; in particular, for all $(x,y) \in \Bbb{R}^2$, we have \begin{align} (\partial_1g)_{(x,y)} &= 0. \end{align} Notice how the notation goes: you have to first calculate the partial derivative function $\partial_1g$, and only after that, you have to evaluate this function on the point $(x,y)$. Now, let's apply the chain rule: \begin{align} 0 &= (\partial_1g)_{(x,y)} \\ &= \left( \partial_1F\right)_{(x,y,f(x,y))} \cdot 1 + \left( \partial_2F\right)_{(x,y,f(x,y))} \cdot 0 + \left( \partial_3F\right)_{(x,y,f(x,y))} \cdot \left( \partial_1f\right)_{(x,y)} \\ &= \left( \partial_1F\right)_{(x,y,f(x,y))} + \left( \partial_3F\right)_{(x,y,f(x,y))} \cdot \left( \partial_1f\right)_{(x,y)} \end{align} Hence, \begin{align} \left( \partial_1f\right)_{(x,y)} &= - \dfrac{\left( \partial_1F\right)_{(x,y,f(x,y))}}{\left( \partial_3F\right)_{(x,y,f(x,y))}}. \end{align}
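To see this computation in action, here is a small numerical sketch. It assumes a hypothetical example that is not from the question, $F(x,y,z) = x^2+y^2+z^2-1$ with $f(x,y) = \sqrt{1-x^2-y^2}$ (the upper unit hemisphere), and approximates every partial derivative by a central difference. The helper name `partial` and the step size `H` are illustrative choices only.

```python
import math

# Hypothetical example (not from the question): F(x, y, z) = x^2 + y^2 + z^2 - 1,
# solved on the upper hemisphere by f(x, y) = sqrt(1 - x^2 - y^2).
def F(x, y, z):
    return x**2 + y**2 + z**2 - 1

def f(x, y):
    return math.sqrt(1 - x**2 - y**2)

def g(x, y):
    # g := F(x, y, f(x, y)); identically zero on the open unit disk
    return F(x, y, f(x, y))

H = 1e-6  # step size for central differences

def partial(func, i, point):
    """Central-difference approximation of the i-th partial derivative (0-based i)."""
    plus = list(point)
    minus = list(point)
    plus[i] += H
    minus[i] -= H
    return (func(*plus) - func(*minus)) / (2 * H)

x, y = 0.3, 0.4
p = (x, y, f(x, y))              # point where the partials of F are evaluated

d1g = partial(g, 0, (x, y))      # (∂1 g)(x, y)
d1F = partial(F, 0, p)           # (∂1 F)(x, y, f(x, y))
d3F = partial(F, 2, p)           # (∂3 F)(x, y, f(x, y))
d1f = partial(f, 0, (x, y))      # (∂1 f)(x, y)

print(d1g)                       # approximately 0
print(d1F + d3F * d1f)           # chain-rule sum, approximately 0
print(d1f, -d1F / d3F)           # both approximately -x / f(x, y)
```

Note that `d1F` is roughly $2x = 0.6$, not zero; only the full chain-rule sum vanishes.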
This would be the precise way of carrying out the computation, with all the derivatives being treated carefully, and all the points of evaluation being made very explicit. The trouble you're having is that in your equation $(1)$, when you wrote \begin{align} \dfrac{\partial F}{\partial x} &= 0 \end{align} what was really meant is that $\partial_1g = 0$. Notice that $F$ and $g$ are completely different functions! One has $\Bbb{R}^3$ as its domain while the other has $\Bbb{R}^2$ as its domain. The whole confusion of "subtracting $\dfrac{\partial F}{\partial x}$ from both sides" is because one used the same letter $F$ for two completely different functions.
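A quick numerical sketch can make the two different meanings of "$\dfrac{\partial F}{\partial x}$" concrete. Assuming the hypothetical example $F(x,y,z)=x^2+y^2+z^2-1$ and $f(x,y)=\sqrt{1-x^2-y^2}$ (chosen only for illustration), the $x$-derivative of $g$ is zero, while the first partial of $F$ at the corresponding point is $2x$:

```python
import math

# Hypothetical illustration: F(x, y, z) = x^2 + y^2 + z^2 - 1 and
# f(x, y) = sqrt(1 - x^2 - y^2), so g(x, y) = F(x, y, f(x, y)) = 0.
def F(x, y, z):
    return x**2 + y**2 + z**2 - 1

def f(x, y):
    return math.sqrt(1 - x**2 - y**2)

def g(x, y):
    return F(x, y, f(x, y))

H = 1e-6
x, y = 0.3, 0.4
z = f(x, y)

# "dF/dx" as used in equation (1): derivative of the two-variable function g
lhs_1 = (g(x + H, y) - g(x - H, y)) / (2 * H)
# "dF/dx" as used in equation (2): first partial of the three-variable F, z held fixed
term_2 = (F(x + H, y, z) - F(x - H, y, z)) / (2 * H)

print(lhs_1)   # approximately 0
print(term_2)  # approximately 2 * x = 0.6, clearly not 0
```

Since the two quantities are different numbers, "subtracting $\dfrac{\partial F}{\partial x}$ from both sides" subtracts two different things.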
Note that this kind of notational abuse occurs all the time in mathematics; even in single variable calculus: for example, what does the chain rule $\dfrac{df}{dt} = \dfrac{df}{dx} \cdot \dfrac{dx}{dt}$ even mean? How can $f$ be a function of $t$ on the LHS, while on the RHS $f$ is a function of $x$? This is of course utter nonsense if you take it literally. The issue here again is that the $f$ on the LHS means something completely different to the $f$ on the RHS. Of course, the clearest way to write the chain rule is to say $(f \circ x)'(t) = f'(x(t)) \cdot x'(t)$.
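As a sanity check on the unambiguous form $(f \circ x)'(t) = f'(x(t)) \cdot x'(t)$, here is a sketch with the hypothetical choices $f = \sin$ and $x(t) = t^2$, comparing a central-difference derivative of the composite against the right-hand side:

```python
import math

# Hypothetical choices for illustration: f(u) = sin(u), x(t) = t^2
def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def x(t):
    return t**2

def x_prime(t):
    return 2 * t

def derivative(func, t, h=1e-6):
    """Central-difference approximation of func'(t)."""
    return (func(t + h) - func(t - h)) / (2 * h)

t = 0.7
lhs = derivative(lambda s: f(x(s)), t)   # (f ∘ x)'(t), numerically
rhs = f_prime(x(t)) * x_prime(t)         # f'(x(t)) · x'(t)
print(lhs, rhs)                          # the two agree closely
```

Here $f$ and $f \circ x$ are kept as separately named functions, which is exactly what the Leibniz form hides.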
Anyway, if you want to see the same calculation done in Leibniz notation, here's how I'd write it: \begin{align} 0 &= \dfrac{\partial g}{\partial x}\bigg|_{(x,y)} \\ &= \dfrac{\partial F}{\partial x}\bigg|_{(x,y, f(x,y))} \cdot 1 + \dfrac{\partial F}{\partial y}\bigg|_{(x,y, f(x,y))} \cdot 0 + \dfrac{\partial F}{\partial z}\bigg|_{(x,y, f(x,y))} \cdot \dfrac{\partial f}{\partial x}\bigg|_{(x,y)} \\ &= \dfrac{\partial F}{\partial x}\bigg|_{(x,y, f(x,y))}+ \dfrac{\partial F}{\partial z}\bigg|_{(x,y, f(x,y))} \cdot \dfrac{\partial f}{\partial x}\bigg|_{(x,y)} \end{align} Hence, \begin{align} \dfrac{\partial f}{\partial x}\bigg|_{(x,y)} &=- \dfrac{\frac{\partial F}{\partial x}\bigg|_{(x,y, f(x,y))}}{\frac{\partial F}{\partial z}\bigg|_{(x,y, f(x,y))}}. \end{align}
The usual way this formula is written is \begin{align} \dfrac{\partial z}{\partial x} &= - \dfrac{\dfrac{\partial F}{\partial x}}{\dfrac{\partial F}{\partial z}} \end{align} but of course such compact notation suppresses everything and reuses the same letters for different purposes, so you should only use it once you really understand what's going on.
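The practical point of the formula is that it needs only $F$ and a point $(x,y,z)$ with $F(x,y,z)=0$, not a closed form for $f$. The sketch below assumes a hypothetical $F(x,y,z) = z^3 + z + xy - 3$; since $\partial_3 F = 3z^2 + 1 > 0$, each $(x,y)$ has exactly one solution $z$, found here by bisection (an illustrative choice, with an assumed bracket $[-10, 10]$). The value of $-\partial_1F/\partial_3F$ is then compared against a finite difference of the numerically solved $f$:

```python
# Hypothetical example: F(x, y, z) = z^3 + z + x*y - 3. Since dF/dz = 3z^2 + 1 > 0,
# each (x, y) has exactly one solution z = f(x, y); no closed form is needed.
def F(x, y, z):
    return z**3 + z + x * y - 3

def solve_z(x, y, lo=-10.0, hi=10.0, iters=200):
    """Find f(x, y) by bisection, assuming the root lies in [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if F(x, y, mid) > 0:   # F is increasing in z, so the root is below mid
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def implicit_dz_dx(F, x, y, z, h=1e-6):
    """-(dF/dx)/(dF/dz), both partials of F evaluated at the point (x, y, z)."""
    Fx = (F(x + h, y, z) - F(x - h, y, z)) / (2 * h)
    Fz = (F(x, y, z + h) - F(x, y, z - h)) / (2 * h)
    return -Fx / Fz

x, y = 1.0, 2.0
z = solve_z(x, y)                              # z = f(x, y)
formula = implicit_dz_dx(F, x, y, z)           # uses only F and the point
h = 1e-5
direct = (solve_z(x + h, y) - solve_z(x - h, y)) / (2 * h)   # (∂1 f)(x, y)
print(formula, direct)                         # the two agree closely
```

Note that `implicit_dz_dx` evaluates both partials of $F$ at the point $(x, y, f(x,y))$, which is exactly what the compact formula leaves implicit.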
Here's a similar question which I addressed a while back; it contains an explicit example calculation.
@eagle123 No we don't. You have to use the chain rule in the form I presented above. – peek-a-boo May 24 '20 at 05:16
Sorry, the original comment needed editing. I was hoping to get a bit more clarification on one point. You defined a new function $$g(x,y) = F(x,y,f(x,y))$$ and then took the partial derivative of $g$ with respect to its first argument. If we take the partial derivative of both sides of this equation, we get $$(\partial_1g)_{(x,y)} = (\partial_1F)_{(x,y,f(x,y))}.$$ Using the chain rule on the right side seems to take me back to my original question: $$(\partial_1F)_{(x,y,f(x,y))} = (\partial_1F)_{(x,y,f(x,y))} + (\partial_3F)_{(x,y,f(x,y))} \cdot (\partial_1f)_{(x,y)},$$ where the first term cancels. – eagle123 May 24 '20 at 05:26
Like I mentioned in the answer, $\partial_1F$ is a function. This means it can be evaluated at a point of its domain. $(\partial_1F)_{(x,y,f(x,y))}$ is a number! There is no more differentiation to be done. The place where you're going wrong is that $(\partial_1g)_{(x,y)}$ is NOT equal to $(\partial_1F)_{(x,y,f(x,y))}$. Your third line is completely wrong. The only correct statement is that $(\partial_1F)_{(x,y,f(x,y))} = (\partial_1F)_{(x,y,f(x,y))}$ (obviously). That's it. There is no contradiction here. – peek-a-boo May 24 '20 at 05:33
It seems you're still having trouble with the distinction between a function and its value when you evaluate it at a point in its domain. This is a very, very important distinction which needs to be straightened out before proceeding any further. The only other way I can think of to describe it is to introduce $h: \Bbb{R}^2 \to \Bbb{R}^3$ by $h(x,y) = (x,y, f(x,y))$. Then $g(x,y) = F(x,y,f(x,y)) = (F \circ h)(x,y)$. So, $g = F \circ h$. Thus, $\partial_1g = \partial_1(F \circ h)$. This is a true equality of functions. – peek-a-boo May 24 '20 at 05:34
If you want to write this pointwise, then $(\partial_1g)_{(x,y)} = [\partial_1(F \circ h)]_{(x,y)} = \sum_{a=1}^3 (\partial_aF)_{h(x,y)} \cdot [\partial_1(h_a)]_{(x,y)}$. In this equation, $h_1, h_2, h_3$ are the 3 component functions of $h$ (because the target space is $\Bbb{R}^3$). $\partial_1(h_a)$ means the partial derivative of the function $h_a: \Bbb{R}^2 \to \Bbb{R}$ with respect to its first coordinate. Then finally, $[\partial_1(h_a)]_{(x,y)}$ denotes the value of this function at the point $(x,y) \in \Bbb{R}^2$. – peek-a-boo May 24 '20 at 05:40
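The pointwise formula $(\partial_1g)_{(x,y)} = \sum_{a=1}^3 (\partial_aF)_{h(x,y)} \cdot [\partial_1(h_a)]_{(x,y)}$ can be checked numerically. This is a sketch with hypothetical smooth functions $F(u_1,u_2,u_3) = u_1u_3 + \sin(u_2u_3)$ and $f(x,y) = x^2 - y$; the chain rule itself does not need $F \circ h = 0$, so the check works for any such choice:

```python
import math

# Hypothetical smooth choices, only to check the chain-rule sum numerically:
# F : R^3 -> R and f : R^2 -> R (F(x, y, f(x, y)) = 0 is not needed here).
def F(u1, u2, u3):
    return u1 * u3 + math.sin(u2 * u3)

def f(x, y):
    return x**2 - y

def h(x, y):
    # h : R^2 -> R^3, h(x, y) = (x, y, f(x, y)); components h_1, h_2, h_3
    return (x, y, f(x, y))

def g(x, y):
    return F(*h(x, y))            # g = F ∘ h

EPS = 1e-6

def partial(func, i, point):
    """Central-difference i-th partial (0-based) of a scalar function."""
    plus, minus = list(point), list(point)
    plus[i] += EPS
    minus[i] -= EPS
    return (func(*plus) - func(*minus)) / (2 * EPS)

x, y = 0.5, -0.2
p = h(x, y)
# [∂1(h_a)](x, y) for a = 1, 2, 3 (0-based index a in the code)
d1h = [partial(lambda s, t, a=a: h(s, t)[a], 0, (x, y)) for a in range(3)]
chain_sum = sum(partial(F, a, p) * d1h[a] for a in range(3))
print(partial(g, 0, (x, y)), chain_sum)   # should agree closely
```

The list `d1h` comes out as approximately $[1, 0, 2x]$, matching the factors $1$, $0$ and $(\partial_1 f)_{(x,y)}$ in the answer's chain-rule computation.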
The equation (1) actually means $$ \frac{\partial}{\partial x}F(x,y,f(x,y))=0. $$ If it were to be understood literally, it would mean that $F$ does not depend on $x$, which makes no sense in the context of the question.
For example, let $F(x,y,z)=x^2+y^2+e^z-1$. The solution to the equation $F(x,y,z)=0$ is the function $$\tag{4} z=\ln(1-x^2-y^2)=f(x,y), $$ defined for $x^2+y^2<1$. This means that substituting $z=f(x,y)$ into the expression $F(x,y,z)$, we get the constant $0$: $$ F(x,y,f(x,y))=x^2+y^2+e^{f(x,y)}-1=x^2+y^2+1-x^2-y^2-1=0. $$ This constant does not depend on $x$, therefore its partial derivative with respect to $x$ is equal to zero, which gives us (2). For our example, (2) has the form (pretend we do not know $f(x,y)$) $$ \frac{\partial}{\partial x}(x^2+y^2+e^{f(x,y)}-1)=\overbrace{2x}^{\frac{\partial F}{\partial x}}+\overbrace{e^{f(x,y)}}^{\frac{\partial F}{\partial z}}\frac{\partial f}{\partial x}=0. $$ This gives us the derivative $$ \frac{\partial z}{\partial x}= \frac{\partial f}{\partial x}=-2xe^{-f(x,y)}=-2xe^{-z}, $$ which coincides with the expression that we would get by differentiating (4): $$ \frac{\partial z}{\partial x}= \frac{-2x}{1-x^2-y^2}=-2x\cdot e^{-\ln(1-x^2-y^2)}. $$
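The numbers in this example can be checked directly. The following sketch evaluates $F(x,y,f(x,y))$, the implicit formula $-2xe^{-z}$, the explicit derivative of (4), and a central-difference estimate, all at one arbitrarily chosen point with $x^2+y^2<1$:

```python
import math

# The example from this answer: F(x, y, z) = x^2 + y^2 + e^z - 1,
# solved by f(x, y) = ln(1 - x^2 - y^2), valid where x^2 + y^2 < 1.
def F(x, y, z):
    return x**2 + y**2 + math.exp(z) - 1

def f(x, y):
    return math.log(1 - x**2 - y**2)

x, y = 0.2, 0.5                        # arbitrary point with x^2 + y^2 < 1
z = f(x, y)

implicit = -2 * x * math.exp(-z)       # -(∂F/∂x)/(∂F/∂z) = -2x / e^z
explicit = -2 * x / (1 - x**2 - y**2)  # from differentiating (4) directly
h = 1e-6
numeric = (f(x + h, y) - f(x - h, y)) / (2 * h)

print(F(x, y, z))                      # approximately 0
print(implicit, explicit, numeric)     # all three agree closely
```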
Maybe the issue I am having is that I don't understand the notation. Are you saying that there is a difference between $\frac{\partial F}{\partial x}$ and $\frac{\partial}{\partial x}F(x,y,f(x,y))$? – eagle123 May 06 '20 at 08:03
@eagle123 Yes. More precisely, $F(x,y,z)$ and $F(x,y,f(x,y))$ are not the same. Returning to my example, $F(x,y,z)=x^2+y^2+e^{z}-1$ and $F(x,y,f(x,y))=x^2+y^2+e^{\ln(1-x^2-y^2)}-1=0$. – AVK May 06 '20 at 08:38