
Given an $n\times 3$ matrix $A$ and a unit vector $\hat{b}\in \mathbb{R}^3$, is there a closed-form solution for the $x\in \mathbb{R}^3$ that minimizes the Euclidean norm $\Vert Ax \Vert$ subject to $\hat{b}^Tx=0$ and $\Vert x \Vert =1$?

I tried to solve it using Lagrange multipliers, but to no avail.

This was my approach: Define the following Lagrangian function

$L(x,\lambda_1,\lambda_2) = x^TA^TAx-2\lambda_1\hat{b}^Tx-\lambda_2(x^Tx-1)$

Differentiating with respect to each variable and setting the derivatives to zero gives:

(1) $\frac{\partial L}{\partial x} = 2A^TAx-2\lambda_1\hat{b}-2\lambda_2x=0$

(2) $\frac{\partial L}{\partial \lambda_1} = \hat{b}^Tx=0$

(3) $\frac{\partial L}{\partial \lambda_2} = x^Tx-1=0$

Then, left-multiplying both sides of Eq (1) by $\hat{b}^T$ and using Eq (2) and $\hat{b}^T\hat{b}=1$ leads to

(4) $\lambda_1=\hat{b}^TA^TAx$

So, Eq (1) becomes

(5) $A^TAx-\hat{b}\hat{b}^TA^TAx-\lambda_2 x = 0$

Left-multiplying both sides of Eq (5) by $x^T$ and using Eqs (2) and (3) leads to

(6) $\lambda_2=x^TA^TAx$

This is basically where I'm stuck now.
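
For reference, here is a small numerical sketch of the problem (assuming NumPy; the $A$ and $\hat{b}$ below are just random test data). It brute-forces the constrained minimum by sampling the unit circle in the plane orthogonal to $\hat{b}$, so any candidate closed-form solution can be checked against it:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # arbitrary n x 3 test matrix (n = 5)
b = rng.standard_normal(3)
b /= np.linalg.norm(b)            # unit vector b-hat

# Orthonormal basis {u, v} of the plane orthogonal to b: the last two
# columns of U from the SVD of b viewed as a 3x1 matrix.
U, _, _ = np.linalg.svd(b.reshape(3, 1))
u, v = U[:, 1], U[:, 2]

# Every feasible x is cos(t) u + sin(t) v; since x and -x give the same
# norm, sampling t over [0, pi) is enough.
t = np.linspace(0.0, np.pi, 10001)
xs = np.cos(t)[:, None] * u + np.sin(t)[:, None] * v
norms = np.linalg.norm(xs @ A.T, axis=1)
i = int(np.argmin(norms))
print("brute-force min ||Ax|| =", norms[i])
print("at x =", xs[i])
```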

2 Answers

First, we note that the minimum will be non-zero if and only if $Ax \neq 0$ for all non-zero $x \in b^\perp$. We will therefore take it as an assumption that $Ax \neq 0$ for all non-zero $x \in b^\perp$.

Let $P$ denote the operator $P = I - bb^T$; this is the orthogonal projection onto $b^{\perp}$, which is to say that $Py \in b^\perp$ for all $y$ and $Py = y$ whenever $y \in b^\perp$. Thus, we have $$ \min \{\|Ax\| : b^Tx = 0, \|x\| = 1\} = \min\{\|APy\| : y \in b^\perp, \|y\|=1\}. $$ Of course, we know that $(AP)b = 0$. By our assumption on $A$, $b$ spans the kernel of $AP$, which is to say that it is the unique (up to sign) right-singular vector $v_3 \in \Bbb R^3$ for the singular value $\sigma_3 = 0$ of $AP$.

We thus note that $$ \min\{\|APy\| : y \in b^\perp, \|y\|=1\} = \min\{\|APy\| : y \in v_3^\perp, \|y\|=1\} = \sigma_2(AP), $$ since for a matrix $M$ with $n$ columns and right-singular vectors $v_1,\dots,v_n$, we have $$ \sigma_{n-k}(M) = \min\{\|Mx\| : \|x\| = 1, x \in \{v_{n-k+1},\dots,v_n\}^\perp\}. $$ This is, strictly speaking, a weaker result than the min-max theorem for singular values, but it can be proven as a consequence of it. The minimizing $x$ is the corresponding right-singular vector $v_2$ of $AP$.
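
To make the recipe concrete, here is a minimal computational sketch (assuming NumPy; the $A$ and $b$ below are placeholder test data): the minimum is $\sigma_2(AP)$ and the minimizer is the corresponding right-singular vector of $AP$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))        # placeholder n x 3 matrix
b = rng.standard_normal(3)
b /= np.linalg.norm(b)                 # unit vector b

P = np.eye(3) - np.outer(b, b)         # orthogonal projection onto b-perp

# np.linalg.svd returns singular values in descending order, so sigma_2
# is index 1 and its right-singular vector is the second row of Vt.
_, s, Vt = np.linalg.svd(A @ P)
x = Vt[1]                              # minimizer: unit norm, orthogonal to b

print("sigma_2(AP) =", s[1])           # minimum value of ||Ax||
print("||Ax||      =", np.linalg.norm(A @ x))
print("b^T x       =", b @ x)          # ~0: constraint satisfied
```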

Ben Grossmann
  • I don't have my textbooks on hand, but I believe that Horn and Johnson's Matrix Analysis should have this result somewhere in it, if you're looking for something to cite. – Ben Grossmann Oct 20 '18 at 22:37
  • Thanks a lot for your explanation! I was also able to derive the same solution using your orthogonal projection trick :) – Seong Hun Lee Oct 21 '18 at 03:12

EDIT: I found the solution here

====================================

ORIGINAL:

Thanks to Omnomnomnom's answer, I was able to derive the solution using Lagrange multipliers. I post this in case someone is interested in solving this problem that way:

  1. Assume that $\Vert A\mathbf{x}\Vert > 0$ for every unit vector $\mathbf{x}$ with $\hat{\mathbf{b}}^T\mathbf{x}=0$. Otherwise, the minimum is zero, and one needs to solve $A\mathbf{x}= \mathbf{0}$ using a different method which will not be discussed here.

  2. Let $\mathbf{y} = \mathbf{x}+s\hat{\mathbf{b}}$ for an arbitrary scalar $s$.

  3. Then $\mathbf{x}$ can be recovered from $\mathbf{y}$ regardless of $s$, so we can change variables from the constrained $\mathbf{x}$ to the free $\mathbf{y}$. We do this by subtracting from $\mathbf{y}$ its projection onto $\hat{\mathbf{b}}$, which enforces the orthogonality to $\hat{\mathbf{b}}$:

$$\mathbf{x} = \mathbf{y}-\hat{\mathbf{b}}(\hat{\mathbf{b}}\cdot\mathbf{y})=\mathbf{y}-\hat{\mathbf{b}}\hat{\mathbf{b}}^T\mathbf{y}=({I}-\hat{\mathbf{b}}\hat{\mathbf{b}}^T)\mathbf{y} \perp \hat{\mathbf{b}}$$

  4. Therefore, the original problem can be reformulated as follows:

$$\text{Minimize} \quad \Vert A({I}-\hat{\mathbf{b}}\hat{\mathbf{b}}^T)\mathbf{y} \Vert \quad \text{s.t.} \quad \Vert ({I}-\hat{\mathbf{b}}\hat{\mathbf{b}}^T)\mathbf{y} \Vert=1$$

  5. Let ${P}={I}-\hat{\mathbf{b}}\hat{\mathbf{b}}^T$.

  6. Define the following Lagrange function:

$$L(\mathbf{y},\lambda)=\mathbf{y}^TP^TA^TAP\mathbf{y}-\lambda(\mathbf{y}^TP^TP\mathbf{y}-1)$$

  7. Differentiating it and setting the derivatives to zero gives:

$$\frac{\partial L}{\partial \mathbf{y}}=2P^TA^TAP\mathbf{y}-2\lambda P^TP\mathbf{y}=0 \quad\therefore\quad P^TA^TAP\mathbf{y}-\lambda P^TP\mathbf{y}=0 \tag{1}$$ $$\frac{\partial L}{\partial \lambda}=\mathbf{y}^TP^TP\mathbf{y}-1=0 \quad\therefore\quad \mathbf{y}^TP^TP\mathbf{y} = 1 \tag{2}$$

  8. Left-multiplying Eq (1) by $\mathbf{y}^T$ and using Eq (2) gives: $$\lambda = \mathbf{y}^TP^TA^TAP\mathbf{y}=\Vert AP\mathbf{y} \Vert^2,$$ which is exactly the quantity we are trying to minimize.

  9. Note that $P^T = P$, and $$P^TP = P^2 = I-\hat{\mathbf{b}}\hat{\mathbf{b}}^T-\hat{\mathbf{b}}\hat{\mathbf{b}}^T+\hat{\mathbf{b}}\hat{\mathbf{b}}^T\hat{\mathbf{b}}\hat{\mathbf{b}}^T=I-\hat{\mathbf{b}}\hat{\mathbf{b}}^T=P,$$ since $\hat{\mathbf{b}}^T\hat{\mathbf{b}}=1$. So $$P=P^T=P^TP = PP^T$$

  10. Therefore, Eq (1) can be written as $$PP^TA^TAP\mathbf{y}-\lambda P \mathbf{y}=0$$

  11. Since $\mathbf{x}=P\mathbf{y}$, $$PP^TA^TA\mathbf{x}-\lambda \mathbf{x}=0$$

  12. This means that $\lambda$ is an eigenvalue of $PP^TA^TA$. Specifically, it is the smallest non-zero eigenvalue (non-zero by Step 1 and smallest by Step 8).

  13. For square matrices $X$ and $Y$, the eigenvalues of $XY$ are the same as the eigenvalues of $YX$ (see here). Therefore, the smallest non-zero eigenvalue of $PP^TA^TA$ is the same as the smallest non-zero eigenvalue of $P^TA^TAP=(AP)^T(AP)$, which is the square of the smallest non-zero singular value of $AP=A(I-\hat{\mathbf{b}}\hat{\mathbf{b}}^T)$. The optimal $\mathbf{x}$ is the corresponding eigenvector.
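
As a quick numerical cross-check of this recipe (a minimal sketch, assuming NumPy; the $A$ and $\hat{\mathbf{b}}$ below are arbitrary test data): build $(AP)^T(AP)$, take its smallest non-zero eigenvalue and eigenvector, and compare against $\sigma_2(AP)^2$ from the SVD route in the other answer. The two should agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))        # arbitrary n x 3 test matrix
b = rng.standard_normal(3)
b /= np.linalg.norm(b)                 # unit vector b-hat

P = np.eye(3) - np.outer(b, b)         # P = P^T = P^2
M = P @ A.T @ A @ P                    # (AP)^T (AP), symmetric PSD

# eigh returns eigenvalues in ascending order; the smallest is ~0 with
# eigenvector ~ b-hat, so the smallest non-zero eigenvalue is index 1.
w, V = np.linalg.eigh(M)
lam, x = w[1], V[:, 1]                 # lambda and the optimal x

print("lambda          =", lam)        # = min ||Ax||^2
print("||Ax||^2        =", np.linalg.norm(A @ x) ** 2)
print("sigma_2(AP)^2   =", np.linalg.svd(A @ P, compute_uv=False)[1] ** 2)
```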