
Given an $n\times 3$ matrix $A$ and a unit vector $\hat{b}\in \mathbb{R}^3$, is there a closed-form solution for the $x\in \mathbb{R}^3$ that minimizes the Euclidean norm $\Vert Ax \Vert$ subject to $\hat{b}^Tx=0$ and $\Vert x \Vert =1$?

I tried to solve it using Lagrange multipliers, but to no avail.

This was my approach: Define the following Lagrangian function

$L(x,\lambda_1,\lambda_2) = x^TA^TAx-2\lambda_1\hat{b}^Tx-\lambda_2(x^Tx-1)$

Differentiating with respect to each variable and setting the derivatives to zero gives:

(1) $\frac{\partial L}{\partial x} = 2A^TAx-2\lambda_1\hat{b}-2\lambda_2x=0$

(2) $\frac{\partial L}{\partial \lambda_1} = \hat{b}^Tx=0$

(3) $\frac{\partial L}{\partial \lambda_2} = x^Tx-1=0$

Then, left-multiplying both sides of Eq (1) by $\hat{b}^T$ and using Eq (2) and $\hat{b}^T\hat{b}=1$ leads to

(4) $\lambda_1=\hat{b}^TA^TAx$

So, Eq (1) becomes

(5) $A^TAx-\hat{b}\hat{b}^TA^TAx-\lambda_2 x = 0$

Left-multiplying both sides of Eq (5) by $x^T$ and using Eqs (2) and (3) leads to

(6) $\lambda_2=x^TA^TAx$

This is basically where I'm stuck now.
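
For reference, here is a small numerical sketch of the problem (assuming NumPy; the $A$ and $\hat{b}$ below are just random test data). It brute-forces the constrained minimum by sampling the unit circle in the plane orthogonal to $\hat{b}$, so any candidate closed-form solution can be checked against it:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # arbitrary n x 3 test matrix (n = 5)
b = rng.standard_normal(3)
b /= np.linalg.norm(b)            # unit vector b-hat

# Orthonormal basis {u, v} of the plane orthogonal to b: the last two
# columns of U from the SVD of b viewed as a 3x1 matrix.
U, _, _ = np.linalg.svd(b.reshape(3, 1))
u, v = U[:, 1], U[:, 2]

# Every feasible x is cos(t) u + sin(t) v; since x and -x give the same
# norm, sampling t over [0, pi) is enough.
t = np.linspace(0.0, np.pi, 10001)
xs = np.cos(t)[:, None] * u + np.sin(t)[:, None] * v
norms = np.linalg.norm(xs @ A.T, axis=1)
i = int(np.argmin(norms))
print("brute-force min ||Ax|| =", norms[i])
print("at x =", xs[i])
```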

2 Answers

First, we note that the minimum will be non-zero if and only if $Ax \neq 0$ for all non-zero $x \in b^\perp$. We will therefore take it as an assumption that $Ax \neq 0$ for all non-zero $x \in b^\perp$.

Let $P$ denote the operator $P = I - bb^T$; this is the orthogonal projection onto $b^{\perp}$, which is to say that $Py \in b^\perp$ for all $y$ and $Py = y$ whenever $y \in b^\perp$. Thus, we have $$ \min \{\|Ax\| : b^Tx = 0, \|x\| = 1\} = \min\{\|APy\| : y \in b^\perp, \|y\|=1\}. $$ Of course, we know that $(AP)b = 0$. By our assumption on $A$, $b$ spans the kernel of $AP$, which is to say that it is the unique (up to sign) right-singular vector $v_3 \in \Bbb R^3$ for the singular value $\sigma_3 = 0$ of $AP$.

We thus note that $$ \min\{\|APy\| : y \in b^\perp, \|y\|=1\} = \min\{\|APy\| : y \in v_3^\perp, \|y\|=1\} = \sigma_2(AP), $$ since for a matrix $M$ with $n$ columns and right-singular vectors $v_1,\dots,v_n$, we have $$ \sigma_{n-k}(M) = \min\{\|Mx\| : \|x\| = 1, x \in \{v_{n-k+1},\dots,v_n\}^\perp\}. $$ This is, strictly speaking, a weaker result than the min-max theorem for singular values, but it can be proven as a consequence of it. The minimizing $x$ is the corresponding right-singular vector $v_2$ of $AP$.
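
To make the recipe concrete, here is a minimal computational sketch (assuming NumPy; the $A$ and $b$ below are placeholder test data): the minimum is $\sigma_2(AP)$ and the minimizer is the corresponding right-singular vector of $AP$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))        # placeholder n x 3 matrix
b = rng.standard_normal(3)
b /= np.linalg.norm(b)                 # unit vector b

P = np.eye(3) - np.outer(b, b)         # orthogonal projection onto b-perp

# np.linalg.svd returns singular values in descending order, so sigma_2
# is index 1 and its right-singular vector is the second row of Vt.
_, s, Vt = np.linalg.svd(A @ P)
x = Vt[1]                              # minimizer: unit norm, orthogonal to b

print("sigma_2(AP) =", s[1])           # minimum value of ||Ax||
print("||Ax||      =", np.linalg.norm(A @ x))
print("b^T x       =", b @ x)          # ~0: constraint satisfied
```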

Ben Grossmann
  • I don't have my textbooks on hand, but I believe that Horn and Johnson's Matrix Analysis should have this result somewhere in it, if you're looking for something to cite. – Ben Grossmann Oct 20 '18 at 22:37
  • Thanks a lot for your explanation! I was also able to derive the same solution using your orthogonal projection trick :) – Seong Hun Lee Oct 21 '18 at 03:12

EDIT: I found the solution here

====================================

ORIGINAL:

Thanks to Omnomnomnom's answer, I was able to derive the solution using Lagrange multipliers. I post this in case someone is interested in solving this problem that way:

  1. Assume that $\Vert A\mathbf{x}\Vert > 0$ for every unit vector $\mathbf{x}$ with $\hat{\mathbf{b}}^T\mathbf{x}=0$. Otherwise, the minimum is zero, and one needs to solve $A\mathbf{x}= \mathbf{0}$ using a different method which will not be discussed here.

  2. Let $\mathbf{y} = \mathbf{x}+s\hat{\mathbf{b}}$ for an arbitrary scalar $s$.

  3. Then $\mathbf{x}$ can be recovered from $\mathbf{y}$ regardless of $s$, so we can change variables from the constrained $\mathbf{x}$ to the free $\mathbf{y}$. We do this by subtracting from $\mathbf{y}$ its projection onto $\hat{\mathbf{b}}$, which enforces the orthogonality to $\hat{\mathbf{b}}$:

$$\mathbf{x} = \mathbf{y}-\hat{\mathbf{b}}(\hat{\mathbf{b}}\cdot\mathbf{y})=\mathbf{y}-\hat{\mathbf{b}}\hat{\mathbf{b}}^T\mathbf{y}=({I}-\hat{\mathbf{b}}\hat{\mathbf{b}}^T)\mathbf{y} \perp \hat{\mathbf{b}}$$

  4. Therefore, the original problem can be reformulated as follows:

$$\text{Minimize} \quad \Vert A({I}-\hat{\mathbf{b}}\hat{\mathbf{b}}^T)\mathbf{y} \Vert \quad \text{s.t.} \quad \Vert ({I}-\hat{\mathbf{b}}\hat{\mathbf{b}}^T)\mathbf{y} \Vert=1$$

  5. Let ${P}={I}-\hat{\mathbf{b}}\hat{\mathbf{b}}^T$.

  6. Define the following Lagrange function:

$$L(\mathbf{y},\lambda)=\mathbf{y}^TP^TA^TAP\mathbf{y}-\lambda(\mathbf{y}^TP^TP\mathbf{y}-1)$$

  7. Differentiating it and setting the derivatives to zero gives:

$$\frac{\partial L}{\partial \mathbf{y}}=2P^TA^TAP\mathbf{y}-2\lambda P^TP\mathbf{y}=0 \quad\therefore\quad P^TA^TAP\mathbf{y}-\lambda P^TP\mathbf{y}=0 \tag{1}$$ $$\frac{\partial L}{\partial \lambda}=\mathbf{y}^TP^TP\mathbf{y}-1=0 \quad\therefore\quad \mathbf{y}^TP^TP\mathbf{y} = 1 \tag{2}$$

  8. Left-multiplying Eq (1) by $\mathbf{y}^T$ and using Eq (2) gives: $$\lambda = \mathbf{y}^TP^TA^TAP\mathbf{y}=\Vert AP\mathbf{y} \Vert^2,$$ which is exactly the quantity we are trying to minimize.

  9. Note that $P^T = P$, and $$P^TP = P^2 = I-\hat{\mathbf{b}}\hat{\mathbf{b}}^T-\hat{\mathbf{b}}\hat{\mathbf{b}}^T+\hat{\mathbf{b}}\hat{\mathbf{b}}^T\hat{\mathbf{b}}\hat{\mathbf{b}}^T=I-\hat{\mathbf{b}}\hat{\mathbf{b}}^T=P,$$ since $\hat{\mathbf{b}}^T\hat{\mathbf{b}}=1$. So $$P=P^T=P^TP = PP^T$$

  10. Therefore, Eq (1) can be written as $$PP^TA^TAP\mathbf{y}-\lambda P \mathbf{y}=0$$

  11. Since $\mathbf{x}=P\mathbf{y}$, $$PP^TA^TA\mathbf{x}-\lambda \mathbf{x}=0$$

  12. This means that $\lambda$ is an eigenvalue of $PP^TA^TA$. Specifically, it is the smallest non-zero eigenvalue (non-zero by Step 1 and smallest by Step 8).

  13. For square matrices $X$ and $Y$, the eigenvalues of $XY$ are the same as the eigenvalues of $YX$ (see here). Therefore, the smallest non-zero eigenvalue of $PP^TA^TA$ is the same as the smallest non-zero eigenvalue of $P^TA^TAP=(AP)^T(AP)$, which is the square of the smallest non-zero singular value of $AP=A(I-\hat{\mathbf{b}}\hat{\mathbf{b}}^T)$. The optimal $\mathbf{x}$ is the corresponding eigenvector.
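
As a quick numerical cross-check of this recipe (a minimal sketch, assuming NumPy; the $A$ and $\hat{\mathbf{b}}$ below are arbitrary test data): build $(AP)^T(AP)$, take its smallest non-zero eigenvalue and eigenvector, and compare against $\sigma_2(AP)^2$ from the SVD route in the other answer. The two should agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))        # arbitrary n x 3 test matrix
b = rng.standard_normal(3)
b /= np.linalg.norm(b)                 # unit vector b-hat

P = np.eye(3) - np.outer(b, b)         # P = P^T = P^2
M = P @ A.T @ A @ P                    # (AP)^T (AP), symmetric PSD

# eigh returns eigenvalues in ascending order; the smallest is ~0 with
# eigenvector ~ b-hat, so the smallest non-zero eigenvalue is index 1.
w, V = np.linalg.eigh(M)
lam, x = w[1], V[:, 1]                 # lambda and the optimal x

print("lambda          =", lam)        # = min ||Ax||^2
print("||Ax||^2        =", np.linalg.norm(A @ x) ** 2)
print("sigma_2(AP)^2   =", np.linalg.svd(A @ P, compute_uv=False)[1] ** 2)
```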