Using differentials to optimize a function

Question

In a paper by Tevian Dray I read about an alternative way to solve optimization problems by manipulating "differentials". Here is an example of how it works (quoted from the paper).

Consider the problem of minimizing the length of a piecewise straight path connecting two fixed points with a given line, as shown in Figure 1. For instance, the line could represent a river along which a single pumping station is to be built to serve two towns. The distances $C$, $D$, and $S = a + b$ are specified; the goal is to determine $a$ and/or $b$ so that $ℓ = p + q$ is minimized.

The standard solution to this problem involves expressing $a, b, p, q$, and hence $ℓ$, in terms of a single variable, typically $a$, then minimizing $ℓ$ by computing $\frac {dℓ} {da}$ and setting it equal to zero. This computation is straightforward, but involves the derivatives of square roots and some messy algebra.

Consider instead the following solution, using differentials. First, write down what you know: $$a + b = S$$ $$a^2 + C^2 = p^2$$ $$b^2 + D^2 = q^2$$ $$p + q = ℓ$$

where $S, C, D$ are known constants. Next, take the differential of each equation: $$da + db = 0$$ $$2a\ da = 2p\ dp$$ $$2b\ db = 2q\ dq$$ $$dp + dq = dℓ$$

We are trying to minimize $ℓ$, so we set $dℓ = 0$ to obtain

$$0 = dℓ = dp + dq = \frac{a}{p} da + \frac{b}{q} db = \left ( \frac{a}{p} - \frac{b}{q} \right ) da $$

so that

$$\frac{b^2}{a^2}=\frac{q^2}{p^2}=\frac{b^2+D^2}{a^2+C^2}$$

which (since lengths must be positive) quickly yields

$$\frac{b}{a}=\frac{D}{C}$$

so that

$$a =\frac{CS}{C + D}$$ $$b =\frac{DS}{C + D}$$ and it is straightforward to verify that these values do in fact minimize $ℓ$.
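That final verification can be sketched numerically. The snippet below is only an illustration: the values of `S`, `C`, `D` and the helper names `path_length` and `optimal_split` are assumptions chosen for the example, not taken from the paper. It compares the closed-form result with a brute-force scan over $a$.

```python
import math

def path_length(a, S, C, D):
    """Total length l = p + q when the first leg spans a and the second spans b = S - a."""
    b = S - a
    return math.hypot(a, C) + math.hypot(b, D)

def optimal_split(S, C, D):
    """Closed-form result of the differential argument: a = CS/(C+D), b = DS/(C+D)."""
    return C * S / (C + D), D * S / (C + D)

# Illustrative values (assumed for this sketch, not from the paper)
S, C, D = 10.0, 3.0, 2.0

a_star, b_star = optimal_split(S, C, D)
l_star = path_length(a_star, S, C, D)

# Brute-force scan of a over [0, S] as an independent check
grid = [S * i / 10000 for i in range(10001)]
a_grid = min(grid, key=lambda a: path_length(a, S, C, D))

print(a_star, b_star, l_star, a_grid)
```

For these values the scan should land on the same $a$ as the formula, up to the grid spacing, and the minimal length should agree with the reflection argument $ℓ = \sqrt{S^2 + (C + D)^2}$.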

I've tried to use this method on some problems; sometimes it works and sometimes it doesn't. (When I have to optimize a function $f(x,y,z)$ subject to two constraints $g_1(x,y,z)=c_1$ and $g_2(x,y,z)=c_2$, I generally have to use Lagrange multipliers, because this manipulation of differentials doesn't lead to the correct answer.) I've also seen this method being used in some thermodynamics textbooks.

So, my questions are: Why can this manipulation of differentials solve optimization problems? When does it lead to a wrong answer? Where (a book, video, etc.) can I learn more about this method, so that I can use it systematically?

Note: I've also asked this question on Mathematics Stack Exchange

As much as I personally like this question I should tell you someone will probably comment here saying that cross-posting between sites is frowned upon. — ChrisM, Mar 04 '15 at 20:26
I'm voting to close this question as off-topic because it is a math question. (No migrate vote as it already exists on math) — ACuriousMind, Mar 05 '15 at 00:33
possible duplicate of How to treat differentials and infinitesimals? — seldon, Mar 05 '15 at 06:21

score 1 · Accepted Answer · answered Mar 04 '15 at 22:23

You should be able to use it to solve constrained optimization. Let me give you sort of the big-picture overview.

So you know that if $du$ is small compared to $u$ then you can linearize about $u$. When you want to generalize to more variables, you have to do partial derivatives: for example $$ f(x + dx, y + dy, z + dz) \approx f(x,y,z) + \left(\frac{\partial f}{\partial x}\right)_{y,z} dx + \left(\frac{\partial f}{\partial y}\right)_{x,z} dy + \left(\frac{\partial f}{\partial z} \right)_{x,y} dz. $$ There is a nice shorthand for this expression using the dot product of the vector $d\vec r = (dx, dy, dz)$ as $$ f(\vec r + d\vec r) \approx f(\vec r) + \nabla f \cdot d\vec r.$$ If you calculate it out, the last term here is the "differential" approach that you get above. Your constraints will generate fixed conditions for $d\vec r$, $$\nabla g_1 \cdot d\vec r = \nabla g_2 \cdot d\vec r = 0.$$
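As a rough sanity check of the linearization, here is a small sketch. The function `f`, the point `r` and the direction are made up for illustration. It measures how far $f(\vec r + d\vec r) - f(\vec r)$ is from $\nabla f \cdot d\vec r$ as the step shrinks.

```python
def f(x, y, z):
    # Hypothetical example function, chosen only for this sketch
    return x * y + z ** 2

def grad_f(x, y, z):
    # Partial derivatives of f with the other two variables held fixed
    return (y, x, 2 * z)

def linearization_error(r, dr):
    """|f(r + dr) - f(r) - grad f(r) . dr| for a displacement dr."""
    shifted = tuple(ri + di for ri, di in zip(r, dr))
    exact = f(*shifted) - f(*r)
    linear = sum(g * d for g, d in zip(grad_f(*r), dr))
    return abs(exact - linear)

r = (1.0, 2.0, 3.0)
direction = (1.0, -1.0, 0.5)

# Shrink the step by a factor of 10 each time
errors = [
    linearization_error(r, tuple(h * d for d in direction))
    for h in (1e-1, 1e-2, 1e-3)
]
print(errors)
```

Each tenfold reduction of the step should cut the error by roughly a factor of 100, which is what "first-order accurate" means here: the neglected terms are quadratic in $d\vec r$.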

Now we want to find the points $\vec r$ such that, if $d\vec r$ obeys the constraints, then $\nabla f \cdot d\vec r = 0$. There is no reason that I see that this should lead to a wrong answer. In fact, it leads to a matrix equation $$\left[\begin{array}{c}(\nabla f)^T\\(\nabla g_1)^T\\(\nabla g_2)^T\end{array}\right] \left[\begin{array}{c}dx \\ dy \\ dz\end{array}\right] = \left[\begin{array}{c} 0 \\ 0 \\ 0\end{array}\right]$$ Looking for a solution with $d\vec r \neq 0$ amounts to looking for a nontrivial kernel of that matrix, which amounts to saying that its rows are linearly dependent. Provided $\nabla g_1$ and $\nabla g_2$ are themselves linearly independent, that says $\nabla f + \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2 = \nabla (f + \lambda_1 g_1 + \lambda_2 g_2) = 0$ for some constants $\lambda_1, \lambda_2$, which gives you back exactly the method of Lagrange multipliers! So these are totally equivalent approaches if you do them correctly.
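To make the equivalence concrete, here is a minimal sketch on a made-up problem (the choice of $f$, $g_1$, $g_2$ and all helper names are assumptions for illustration): minimize $f = x^2 + y^2 + z^2$ subject to $x + y + z = 1$ and $x + 2y + 3z = 6$. It solves the problem the "differential" way, by demanding $\nabla f \cdot d\vec r = 0$ for the only allowed direction of $d\vec r$, and then checks that multipliers exist at that point.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def grad_f(r):
    # f = x^2 + y^2 + z^2 (hypothetical objective)
    return tuple(2.0 * c for c in r)

# Linear constraints, so their gradients are constant:
grad_g1 = (1.0, 1.0, 1.0)   # g1 = x + y + z = 1
grad_g2 = (1.0, 2.0, 3.0)   # g2 = x + 2y + 3z = 6

# Allowed displacements obey grad_g1 . dr = grad_g2 . dr = 0,
# so dr must be parallel to the cross product of the two gradients.
t = cross(grad_g1, grad_g2)

# One point that satisfies both constraints; the feasible set is r0 + s*t.
r0 = (-4.0, 5.0, 0.0)

# Differential condition: grad_f(r0 + s*t) . t = 0, which is linear in s here.
s = -dot(r0, t) / dot(t, t)
r_star = tuple(a + s * b for a, b in zip(r0, t))

# Lagrange check: solve grad_f + lam1*grad_g1 + lam2*grad_g2 = 0 using the
# x and y components (Cramer's rule), then test the z component as well.
gf = grad_f(r_star)
det2 = grad_g1[0] * grad_g2[1] - grad_g2[0] * grad_g1[1]
lam1 = (-gf[0] * grad_g2[1] + gf[1] * grad_g2[0]) / det2
lam2 = (-gf[1] * grad_g1[0] + gf[0] * grad_g1[1]) / det2
residual = tuple(gf[i] + lam1 * grad_g1[i] + lam2 * grad_g2[i] for i in range(3))

print(r_star, lam1, lam2, residual)
```

The residual should vanish in all three components even though only two of them were used to determine the multipliers. That is the linear dependence of the three gradients showing up numerically.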

I guess my basic question is, "are you sure that you've been doing partial derivatives correctly when doing constrained optimization?" Because the general idea of trying to do a simultaneous-solution of differential-based equations seems pretty solid to me.

Irrespective of whether this is a good answer, I believe general policy is that if a question is obviously either off-topic or cross-posted, one shouldn't answer until those issues are resolved. I suggest you consult the FAQs and policies on this matter to know what you should do in future on these matters. — Stan Shunpike, Mar 05 '15 at 02:03
