My understanding of derivatives is that:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

Where the limit is defined with the usual $\epsilon$-$\delta$ style statement with first order logic.

And so $\frac{df(x)}{dx} = f'(x)$, as per usual.

This doesn't work so well when people start talking about $\frac{df'(x)}{df(x)}$.

In the special case where $y = f(x)$ is invertible, we can rephrase this with the chain rule:

Let $g(y) = f^{-1}(y) = x$, then $\frac{df'(x)}{df(x)}$ is:

$$\begin{align*}
\frac{d}{dy}\left( f'(g(y)) \right) &= f''(g(y)) \cdot \frac{d}{dy}\left( g(y) \right) \\
&= f''(x) \cdot g'(y) \\
&= f''(x) \cdot g'(f(x)) \\
&= f''(x) \cdot \frac{d}{dx}\left( g(f(x)) \right) \cdot \frac{1}{f'(x)} \quad \text{by the chain rule } g'(h(x)) = \frac{d}{dx}\left( g(h(x)) \right) \cdot \frac{1}{h'(x)} \\
&= f''(x) \cdot \frac{d}{dx}\left( x \right) \cdot \frac{1}{f'(x)} \quad \text{since } g(f(x)) = f^{-1}(f(x)) = x \\
&= \frac{f''(x)}{f'(x)}
\end{align*}$$
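As a numerical sanity check (a sketch, not a proof) of this invertible case: the function $f(x) = x^3 + x$ below is an arbitrary choice of mine, picked because $f'(x) = 3x^2 + 1 > 0$ makes it invertible on all of $\mathbb{R}$.

```python
# Finite-difference check of the claimed identity df'(x)/df(x) = f''(x)/f'(x)
# for the invertible function f(x) = x^3 + x (illustrative choice only).

def f(x):
    return x**3 + x

def fp(x):       # f'(x) = 3x^2 + 1
    return 3 * x**2 + 1

def fpp(x):      # f''(x) = 6x
    return 6 * x

x0, h = 0.5, 1e-6

# Difference quotient of f' taken against increments of f:
# (f'(x0 + h) - f'(x0)) / (f(x0 + h) - f(x0)).
ratio = (fp(x0 + h) - fp(x0)) / (f(x0 + h) - f(x0))
predicted = fpp(x0) / fp(x0)

print(abs(ratio - predicted) < 1e-4)  # True
```

The agreement is of order $h$, consistent with the quotient of two first-order difference approximations.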

It is asserted by many on MSE that: $\frac{df'(x)}{df(x)} = \frac{f''(x)}{f'(x)}$ in general.

However, I can't seem to make sense of this in terms of the usual $\epsilon$-$\delta$ definitions.


This leads me to think that there are multiple notions of derivatives:

  • The ordinary derivative defined with $\epsilon$-$\delta$.
  • The notion of a differential, which builds on top of the ordinary derivative.
    • Here $df_x(t) = f'(x) \cdot t$, where $d$ operates on a function. This makes the identity above trivial: $$\frac{df'_x(t)}{df_x(t)} = \frac{f''(x) \cdot t}{f'(x) \cdot t} = \frac{f''(x)}{f'(x)}$$

    • I suspect this is what many of the answers are using, and this is what people mean when they say "using Leibniz notation"?


My question is the following:

  • Is it possible to prove the general case without using the notion of differentials?
  • Am I wrong in thinking that there are multiple notions of derivatives, and that "differentiating with respect to a function" is not the same thing as the ordinary $\epsilon$-$\delta$-based derivative?

Edit: Here are some MSE answers which claim this is true in general:

  1. Why is $\frac{dy'}{dy}$ zero, since y' depends on y?
  2. Simplifying $\frac{dy'}{dy}$ where $y=f(x)$
  3. Derivative of a function with respect to another function.
  4. differentiate with respect to a function
  5. What is $\frac{d}{dx}\left(\frac{dx}{dt}\right)$?
  6. Circular Motion
  7. Showing $\ddot{x} = \frac{\mathrm{d}}{\mathrm{d}x}(\frac{1}{2} \dot{x}^2)$
  8. Is there a way to rigorously define "taking the derivative with respect to a function"
  9. Derivative with respect to another function
  10. Taking a derivative of a function with respect to another function
  • Not an answer at all but two comments as you flag this "notation". You should not write $(f(x))'$, the correct notation is $f'(x)$. You differentiate functions, not values of functions. And secondly you don't differentiate with respect to a function $f(x)$, this is an abuse of notation, and one that often leads to confusion. – ancient mathematician May 12 '23 at 06:53
  • @ancientmathematician Right, I'm not sure where/how I picked it up, but I'd been using $( ... )' = \frac{d}{dx}( ... )$ or $( ... )'_y = \frac{d}{dy}( ... )$. I realise this is unconventional & nonstandard, as stated in these answers, so I'll try to use a different notation.

    I agree that differentiating with respect to $f$ makes no sense. I would be interested in clarification as to what's going on, as the result does seem to be consistent to some extent.

    – Some Dinosaur May 12 '23 at 08:03
  • What you want to prove "in general" makes no sense in general. The equality is proved at points where you can write the derivative as a function of the function: for example, at points where the function has nonzero derivative (and is therefore locally bijective) – Mariano Suárez-Álvarez May 12 '23 at 08:11
  • If $f$ is a function like $x\mapsto x^2$ at the origin, then it is simply not true that the value of the derivative is determined by the value of the function. – Mariano Suárez-Álvarez May 12 '23 at 08:12
  • A suggestion: if you are trying to study calculus, I'd suggest you stop worrying about whether the definition of the derivative can be formalized in first order logic and the rest of that... – Mariano Suárez-Álvarez May 12 '23 at 08:16
  • @MarianoSuárez-Álvarez I totally agree with you; however, my math prof has been trying to convince me otherwise (see this MSE post; I've included more posts in the OP). I figured I couldn't make sense of it, so I've presumed there's something fancier going on. I imagine that if I hadn't worried about formalising the derivative, I would've been able to accept this abuse of notation much more easily, as "intuitive". – Some Dinosaur May 12 '23 at 08:22
  • If you want to stick to hard-nosed epsilon-delta standard notation for functions then I think that you are trying to evaluate $(f'\circ g)'$ [with $g$ the inverse function of $f$], and this is $(f''\circ g)g'$, and we have from $f\circ g=id$ that $g'=y\mapsto \frac{1}{(f'\circ g)(y)}$. – ancient mathematician May 12 '23 at 08:55
  • @ancientmathematician So is the proof for the invertible case presented in the original post valid? In fact, I feel that the only case where this could be valid is when $f$ is invertible? Although someone has presented an answer proving otherwise. – Some Dinosaur May 12 '23 at 11:54
  • @SomeDinosaur I think that most of the posts you reference are to be understood "in the generous spirit of applied mathematics". On the whole they don't distinguish between functions and expressions, and use meaningless conversational fillers like "independent variable". Valuable no doubt in their place, but not a place I want to go to. – ancient mathematician May 12 '23 at 13:35
  • @ancientmathematician I agree; my current view is that "differentiating with respect to a function" is nonsense under the definition of a derivative. However, regarding the claim that this is "common practice" in differential geometry (which sounds believable), I'm assuming they're using differentials defined via differential forms or something else fancy, which is essentially an "extended derivative". – Some Dinosaur May 12 '23 at 14:44

3 Answers


Let us operate with the usual $\epsilon$-$\delta$ definition of the derivative. It is standard within this framework to prove (without any handwaving) the usual "rules of differentiation": Linearity, Product Rule, Chain Rule.

Suppose then we have open intervals $X,Y$ and twice-differentiable functions $f:X\to Y$, $g:Y\to X$ satisfying $f\circ g=I_Y$ and $g\circ f=I_X$ where $I_X, I_Y$ are the identity maps on $X$, $Y$. Suppose that $f'\not=0$ on $X$ and $g'\not=0$ on $Y$.

We then have by applying the Chain Rule to $f\circ g=I_Y$ that $$ (f'\circ g) g'=1, \tag{*} $$ where $1$ denotes the constant function whose value is always $1$.

From $(*)$ we get using the Product Rule that $$ (f'\circ g)'g'+(f'\circ g)g''=0 $$ which we can rewrite using $(*)$ as $$ (f'\circ g)'=-\frac{g''}{(g')^2}.\tag{1} $$

But we also have from $(*)$ using the Chain Rule and the Product Rule that $$ (f''\circ g) (g')^2+(f'\circ g)g''=0 $$ which we can rewrite as $$ \frac{g''}{(g')^2}=-\frac{(f''\circ g)}{(f'\circ g)}.\tag{2} $$

From (1) and (2) we then have $$ (f'\circ g)'=\frac{(f''\circ g)}{(f'\circ g)}. \tag{3} $$

Let us now re-write this in old-fashioned language, evaluating each side at the point $y\in Y$, and writing $x=g(y)$ (so that $y=f(x)$).

The right hand side is clearly just $\frac{f''(x)}{f'(x)}$.

The left hand side can be re-written, if we allow ourselves to abuse notation, as follows. The outermost derivative is with respect to $y=f(x)$: so write it as $\frac{d}{df}$. The innermost derivative is with respect to $x$, so let's continue to write it as $f'$. Putting this together we have $$ \frac{d f'(x)}{df}=\frac{f''(x)}{f'(x)}. $$
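As a quick closed-form check of $(3)$ (my own worked example, not part of the argument above): take $f = \exp$ and $g = \log$ on $Y = (0, \infty)$, where $f' = f'' = \exp$ and hence $f' \circ g = f'' \circ g = I_Y$.

```latex
% Check of (3) with f = exp, g = log on Y = (0, infinity):
% f' o g = exp o log = I_Y, so the left hand side is the constant 1,
% and (f'' o g)(y) = (f' o g)(y) = y, so the right hand side is y/y = 1.
(f'\circ g)'(y) = \frac{d}{dy}\, y = 1
\qquad\text{and}\qquad
\frac{(f''\circ g)(y)}{(f'\circ g)(y)} = \frac{y}{y} = 1.
```

Both sides agree, as $(3)$ predicts.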

ancient mathematician
  • 14,102
  • 2
  • 16
  • 31

In the first place, what does $\frac{\mathrm{d} f' (x)}{\mathrm{d} f (x)}$ mean? Clearly, it suffices to define what $\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)}$ means: once we know that, we can simply substitute $f'$ for $g$. We should also define it in such a way that when $f$ is the identity function – i.e. $f (x) = x$ – then $\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)}$ has the same meaning as $\frac{\mathrm{d} g (x)}{\mathrm{d} x}$. There is an obvious choice: $$\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)} = \lim_{h \to 0} \frac{g (x + h) - g (x)}{f (x + h) - f (x)}$$

Since $f (x + h) - f (x)$ could be $0$ for $h \ne 0$ we should be a little bit more careful, so let us say that the value of $\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)}$ at $x = x_0$ is $M$ if, for all $\epsilon > 0$, there exists $\delta > 0$ such that for all $h$ such that $0 < \left| h \right| < \delta$, $$\left| g (x_0 + h) - g (x_0) - M \cdot (f (x_0 + h) - f (x_0)) \right| < \epsilon \cdot \left| f (x_0 + h) - f (x_0) \right|$$

(This definition generalises straightforwardly to the vector-valued multivariable case, provided we understand $M$ needs to be a matrix of the appropriate dimensions.) Since both the left and right hand side are non-negative, if $\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)}$ has a value at $x = x_0$, then there exists $\delta > 0$ such that for all $h$ such that $0 < \left| h \right| < \delta$, $\left| f (x_0 + h) - f (x_0) \right| > 0$, i.e. $f$ is not constant on any neighbourhood of $x_0$.

Now, with all that preamble out of the way, let me state:

Theorem. If $f (x)$ and $g (x)$ are differentiable at $x = x_0$, with $f' (x_0)$ and $g' (x_0)$ as the values of $\frac{\mathrm{d} f (x)}{\mathrm{d} x}$ and $\frac{\mathrm{d} g (x)}{\mathrm{d} x}$ at $x = x_0$ respectively, and $f' (x_0) \ne 0$, then $\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)}$ has value $\frac{g' (x_0)}{f' (x_0)}$ at $x = x_0$.

Proof. Let $0 < \epsilon < 1$. By hypothesis, there exists $\delta_1 > 0$ such that for all $h$ such that $0 < \left| h \right| < \delta_1$, $$\left| f (x_0 + h) - f (x_0) - f' (x_0) \cdot h \right| < \frac{1}{3} \epsilon \cdot \left| h \right| \cdot \frac{\min \left\{ \left| f' (x_0) \right|, \left| f' (x_0) \right|^2 \right\}}{\max \left\{ 1, \left| g' (x_0) \right| \right\}}$$ (Replace $\epsilon$ with $\frac{1}{3} \epsilon \cdot \frac{\min \left\{ \left| f' (x_0) \right|, \left| f' (x_0) \right|^2 \right\}}{\max \left\{ 1, \left| g' (x_0) \right| \right\}}$ in the definition.) We then have: $$\left| f (x_0 + h) - f (x_0) - f' (x_0) \cdot h \right| < \frac{1}{3} \left| f' (x_0) \cdot h \right|$$ $$\left| \frac{g' (x_0)}{f' (x_0)} \right| \cdot \left| f (x_0 + h) - f (x_0) - f' (x_0) \cdot h \right| < \frac{1}{3} \epsilon \cdot \left| f' (x_0) \cdot h \right|$$

Similarly, by hypothesis, there exists $\delta_2 > 0$ such that for all $h$ such that $0 < \left| h \right| < \delta_2$, $$\left| g (x_0 + h) - g (x_0) - g' (x_0) \cdot h \right| < \frac{1}{3} \epsilon \cdot \left| f' (x_0) \cdot h \right|$$ (Replace $\epsilon$ with $\frac{1}{3} \epsilon \cdot \left| f' (x_0) \right|$ in the definition.)

Let $\delta = \min \{ \delta_1, \delta_2, 1 \}$. Then, for all $h$ such that $0 < \left| h \right| < \delta$: $$\begin{multline} \left| g (x_0 + h) - g (x_0) - \frac{g' (x_0)}{f' (x_0)} \cdot ( f (x_0 + h) - f (x_0) ) \right| \\ \le \left| g (x_0 + h) - g (x_0) - \frac{g' (x_0)}{f' (x_0)} \cdot f' (x_0) \cdot h \right| + \left| \frac{g' (x_0)}{f' (x_0)} \cdot ( f (x_0 + h) - f (x_0) - f' (x_0) \cdot h ) \right| \end{multline}$$ The first term is $< \frac{1}{3} \epsilon \cdot \left| f' (x_0) \cdot h \right|$. The second term is also $< \frac{1}{3} \epsilon \cdot \left| f' (x_0) \cdot h \right|$. Thus the LHS is $< \frac{2}{3} \epsilon \cdot \left| f' (x_0) \cdot h \right|$. But, $$\begin{multline} \left| f' (x_0) \cdot h \right| \le \left| f (x_0 + h) - f (x_0) \right| + \left| f (x_0 + h) - f (x_0) - f' (x_0) \cdot h \right| \\ < \left| f (x_0 + h) - f (x_0) \right| + \frac{1}{3} \left| f' (x_0) \cdot h \right| \end{multline}$$ so $\left| f' (x_0) \cdot h \right| < \frac{3}{2} \left| f (x_0 + h) - f (x_0) \right|$. Therefore, $$\left| g (x_0 + h) - g (x_0) - \frac{g' (x_0)}{f' (x_0)} \cdot ( f (x_0 + h) - f (x_0) ) \right| < \epsilon \cdot \left| f (x_0 + h) - f (x_0) \right|$$ as required. ◼
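A numerical illustration of the theorem (my own choice of functions, purely a sketch): take $f(x) = x^2$, which is not globally invertible on $\mathbb{R}$, and $g = \sin$, at $x_0 = 1$ where $f'(x_0) = 2 \ne 0$, which is all the theorem requires.

```python
import math

# The difference quotient (g(x0+h) - g(x0)) / (f(x0+h) - f(x0)) should
# converge to g'(x0)/f'(x0), even though f(x) = x^2 is not injective on R.

f = lambda x: x**2           # f'(x) = 2x
g = math.sin                 # g'(x) = cos(x)

x0 = 1.0
predicted = math.cos(x0) / (2 * x0)   # g'(x0)/f'(x0)

errors = []
for h in (1e-3, 1e-5, 1e-7):
    ratio = (g(x0 + h) - g(x0)) / (f(x0 + h) - f(x0))
    errors.append(abs(ratio - predicted))

print(errors[-1] < 1e-6)  # True: the quotient approaches g'(x0)/f'(x0)
```

The error shrinks roughly linearly in $h$, as one would expect from the first-order remainders in the proof.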


I also beg to differ with those who say that things like "independent variable" are meaningless conversational filler. Pure mathematicians – some of us anyway – know how to make this rigorous. The trick is to recognise the concept of context and make it a concrete thing. In probability theory, this is the purpose of the sample space. We can do the same for basic analysis... but this kind of formalisation is usually not helpful for early students, so we do not teach it.

Zhen Lin
  • That's pretty neat; after following through the proof, we've basically "formalised" what would've been "abuse of notation". This answers my questions, though this does mean $\frac{df(x)}{dg(x)}$ is not defined by "default" for derivatives, I presume?

    Also it seems to have entirely removed the invertible condition which is interesting. In fact we haven't assumed anything other than differentiable and $f'(x_0) ≠ 0$, which seems pretty powerful?

    – Some Dinosaur May 14 '23 at 16:03
  • Indeed, $\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)}$ is not normally defined. This is not the only possible definition but it is perhaps the simplest. – Zhen Lin May 14 '23 at 22:06

If $(u,v)$ are functions of $x$, we can even bring in another new independent variable $t$:

$$\frac{du}{dv} = \frac{du/dt}{dv/dt}$$

or continue, as Leibniz notation permits, with the old independent variable $x$ itself by letting

$$ u=f'(x), v= f(x) $$

$$\frac{df'(x)}{df(x)} = \frac{\dfrac{df'(x)}{dt}}{\dfrac{df(x)}{dt}} = \frac{\dfrac{df'(x)}{dx}}{\dfrac{df(x)}{dx}}=\frac{f''(x)}{f'(x)} $$

You can even introduce a third independent variable if you wish to.
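At points where $\frac{dv}{dt} \neq 0$, the ratio identity above reduces to the chain rule and can be checked numerically; the choices $f = \sin$ and the reparametrisation $x = t^2$ below are illustrative assumptions only.

```python
import math

# Sketch of the ratio-of-derivatives identity at a single point t0,
# assuming dx/dt != 0 and dv/dt != 0 there:
#
#   u(t) = f'(x(t)) = cos(t^2),  v(t) = f(x(t)) = sin(t^2),  x = t^2
#
# Claim at x0 = t0^2:  (du/dt)/(dv/dt) = f''(x0)/f'(x0) = -tan(x0)

t0, h = 0.7, 1e-6
x0 = t0**2

u = lambda t: math.cos(t**2)      # u(t) = f'(x(t))
v = lambda t: math.sin(t**2)      # v(t) = f(x(t))

du_dt = (u(t0 + h) - u(t0)) / h   # forward-difference approximations
dv_dt = (v(t0 + h) - v(t0)) / h

print(abs(du_dt / dv_dt - (-math.tan(x0))) < 1e-4)  # True
```

This only verifies the identity pointwise where the denominators are nonzero; it says nothing about the disputed general validity of the manoeuvre.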

Btw, by the L'Hôpital Rule or the Quotient Rule, if the LHS is a constant then the RHS would also be that same constant.

Narasimham
  • I've never seen this technique of "bringing in another new independent variable"? I'm not really sure what it means and why it should work? Are we multiplying by the ratio of differentials $\frac{dt}{dt}$, which I presume equals 1? Is this an abuse of notation? – Some Dinosaur May 12 '23 at 11:56
  • Yes, this makes no difference to the old derivative while accommodating any new arbitrary independent variable $t$ – Narasimham May 12 '23 at 12:01
  • It is a standard, valid, non-abusive procedure/trick/artifice to derive relationships among/between differentials in differential geometry. This trick often helps to build new ODEs by computing ratios of segments... as if they are algebraic segment lengths, agreeing with such earliest considerations by Leibniz, Newton. – Narasimham May 12 '23 at 12:13
  • The "derivative" shown here seems to be different from the very basic $\epsilon-\delta$ limit based derivative, with only a single variable. Is the trick here using the concept of differentials? Could this same trick be done without Leibniz notation or differential operators? – Some Dinosaur May 12 '23 at 12:17
  • Bringing in a new independent variable is just meaningless and not valid. Nor is this a valid move in differential geometry. – Ted Shifrin May 13 '23 at 04:30