
Given $A, B \in \mathbb{R}^{n \times k}$ and a symmetric positive definite $C \in \mathbb{R}^{n \times n}$, I would like to find an analytical solution for the matrix $X \in \mathbb{R}^{n \times n}$ that minimizes \begin{align} \lVert X A - B \rVert^2_{F} \end{align} subject to the hard constraint $$ C X = X C. $$

Given the eigendecomposition $C = R \Lambda R^T$, it appears from Nearest commuting matrix that a reasonable projection operator onto the space of matrices commuting with $C$ is $P_C(X) = R P_\Lambda(R^T X R) R^T$ with $$ [P_\Lambda(X)]_{ij} = \begin{cases} X_{ij}, & \lambda_i = \lambda_j \\ 0, & \textrm{otherwise.} \end{cases} $$
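For concreteness, here is a minimal NumPy sketch of this projection (the function name, the eigenvalue tolerance `tol`, and the use of `eigh` are my own choices, not part of the referenced question):

```python
import numpy as np

def project_onto_commutant(X, C, tol=1e-10):
    """Compute P_C(X) = R P_Lambda(R^T X R) R^T for a symmetric C."""
    lam, R = np.linalg.eigh(C)                          # C = R diag(lam) R^T
    Y = R.T @ X @ R                                     # move to the eigenbasis of C
    keep = np.abs(lam[:, None] - lam[None, :]) < tol    # entries with lambda_i = lambda_j
    return R @ (Y * keep) @ R.T
```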

With this in mind, my guess at solving this would be to apply this projection operator to the minimal unconstrained solution of the least squares system, i.e. $$ X = P_C(B A^{\dagger}) $$ with $A^{\dagger}$ denoting the Moore–Penrose pseudoinverse of $A$. Alternatively, the solution $$ X = P_C(B A^T) $$ also seems reasonable (similar to the orthogonal Procrustes solution). That said, I don't know which of these (if either) is the minimal solution that commutes with $C$.

EDIT:

Building on top of user1551's excellent answer below, the solution to the case where $X$ is also constrained to be orthogonal can be expressed concisely as $$ X = R U V^T R^T, $$ where $U$ and $V$ are obtained from the SVD $$ U \Sigma V^T = P_{\Lambda}(R^T B A^T R). $$ Pretty nifty.
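A direct NumPy sketch of this orthogonal, commuting solution (the function name and eigenvalue tolerance are mine; it assumes the generic case in which the SVD of the masked matrix respects the eigenblocks of $C$):

```python
import numpy as np

def commuting_orthogonal_fit(A, B, C, tol=1e-10):
    """X = R U V^T R^T with U S V^T = P_Lambda(R^T B A^T R)."""
    lam, R = np.linalg.eigh(C)
    M = R.T @ B @ A.T @ R                               # R^T B A^T R
    keep = np.abs(lam[:, None] - lam[None, :]) < tol    # P_Lambda mask
    U, _, Vt = np.linalg.svd(M * keep)
    return R @ U @ Vt @ R.T
```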

tommym
  • Generally, you can't solve the unconstrained problem and project it onto the constraints. You may do so in the framework of Projected Gradient Descent. – Royi Mar 23 '24 at 10:29

1 Answer


Let $\lambda_1,\lambda_2,\ldots,\lambda_k$ be the distinct eigenvalues of $C$ and $\Pi_j=I-(C-\lambda_jI)(C-\lambda_jI)^+$ be the orthogonal projector onto the eigenspace for $\lambda_j$. Then $C=\sum_j\lambda_j\Pi_j$ and every $X$ that commutes with $C$ must satisfy $X=\sum_j\Pi_jX\Pi_j$. Therefore \begin{align*} \|XA-B\|_F^2 &=\big\|\sum_j\Pi_jX\Pi_jA-B\big\|_F^2\\ &=\big\|\sum_j\Pi_j(X\Pi_jA-B)\big\|_F^2\\ &=\sum_j\|\Pi_j(X\Pi_jA-B)\|_F^2\\ &=\sum_j\|(\Pi_jX\Pi_j)(\Pi_jA)-\Pi_jB\|_F^2. \end{align*} Hence a global minimiser $X$ is given by $\sum_j(\Pi_jB)(\Pi_jA)^+$.
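In code, a minimal NumPy sketch of this closed form (grouping numerically equal eigenvalues via a tolerance is an implementation choice, not part of the derivation):

```python
import numpy as np

def commuting_least_squares(A, B, C, tol=1e-10):
    """Return X = sum_j (Pi_j B)(Pi_j A)^+, minimising ||XA - B||_F over {X : CX = XC}."""
    lam, R = np.linalg.eigh(C)   # eigenvalues are returned sorted, so equal ones are contiguous
    n = C.shape[0]
    X = np.zeros((n, n))
    start = 0
    while start < n:
        stop = start
        while stop < n and abs(lam[stop] - lam[start]) < tol:
            stop += 1
        Pi = R[:, start:stop] @ R[:, start:stop].T   # orthogonal projector for this eigenvalue
        X += (Pi @ B) @ np.linalg.pinv(Pi @ A)
        start = stop
    return X
```

One can then check that `C @ X - X @ C` vanishes to machine precision and that the residual $\|XA-B\|_F$ is no larger than that of the projected candidates from the question.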

user1551
  • @tommym For your first question, in general, since $\Pi_i\Pi_j=0$ whenever $i\ne j$, we have \begin{align} \Big\|\sum_j\Pi_jM_j\Big\|_F^2 &=\operatorname{tr}\big[(\sum_j\Pi_jM_j)^T(\sum_j\Pi_jM_j)\big]\\ &=\operatorname{tr}\big[(\sum_jM_j^T\Pi_j)(\sum_j\Pi_jM_j)\big]\\ &=\sum_j\operatorname{tr}(M_j^T\Pi_j\Pi_jM_j)\\ &=\sum_j\|\Pi_jM_j\|_F^2. \end{align} – user1551 Mar 23 '24 at 19:07
  • For your second question. Let $C=Q(\lambda_1 I_{m_1}\oplus\cdots\oplus\lambda_kI_{m_k})Q^T$ be an orthogonal diagonalisation, where $m_j$ denotes the dimension of the eigenspace for $\lambda_j$. Partition $Q^TA$ and $Q^TB$ as $\pmatrix{A_1\\ \vdots\\ A_k}$ and $\pmatrix{B_1\\ \vdots\\ B_k}$ respectively, where each of $A_j$ and $B_j$ has $m_j$ rows. Every $X$ that commutes with $C$ is in the form of $Q(X_1\oplus\cdots\oplus X_k)Q^T$ where each $X_j$ is $m_j\times m_j$. Hence $\|XA-B\|_F^2=\sum_j\|X_jA_j-B_j\|_F^2$. ... – user1551 Mar 23 '24 at 19:08
  • ...For $X$ to be orthogonal, each diagonal sub-block $X_j$ is orthogonal. Now this becomes the usual orthogonal Procrustes problem. An optimal $X_j$ is given by $V_jU_j^T$, where $U_jS_jV_j^T$ is a singular value decomposition of $A_jB_j^T$. – user1551 Mar 23 '24 at 19:08
  • Thank you, this is exactly what I was looking for. From the last equality and your proposed solution, is the minimal solution to $\sum_i \lVert X A_i - B_i \rVert^2$ given by $X = \sum B_i A_i^{\dagger}$ in general, or are you using specific properties of $\Pi_j$? – tommym Mar 23 '24 at 19:14
  • Ah wonderful, thanks! I deleted my previous comment in haste as I thought my first question was trivial, but turns out it wasn't. – tommym Mar 23 '24 at 19:16
  • For any subsequent readers, I asked (1) how the sum was brought outside the norm when moving from step 2 to 3, and (2) how the solution would change if we wanted to additionally constrain $X$ to be orthogonal, $X^T X = I$. – tommym Mar 23 '24 at 19:22
  • @tommym In terms of block matrices, the $X$ in my answer is $Q(X_1\oplus\cdots\oplus X_k)Q^T$ with $X_j=B_jA_j^+$, where $Q, X_j, A_j$ and $B_j$ are defined as in my comments above. – user1551 Mar 23 '24 at 19:42
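Putting the comments together, here is a block-wise NumPy sketch (the variable names, eigenvalue tolerance, and `orthogonal` switch are my own additions) covering both the least-squares case $X_j = B_jA_j^+$ and the orthogonal Procrustes case $X_j = V_jU_j^T$:

```python
import numpy as np
from scipy.linalg import block_diag

def blockwise_commuting_fit(A, B, C, orthogonal=False, tol=1e-10):
    """X = Q @ block_diag(X_1, ..., X_k) @ Q.T, one block per distinct eigenvalue of C."""
    lam, Q = np.linalg.eigh(C)
    QA, QB = Q.T @ A, Q.T @ B
    blocks, start = [], 0
    while start < len(lam):
        stop = start
        while stop < len(lam) and abs(lam[stop] - lam[start]) < tol:
            stop += 1
        Aj, Bj = QA[start:stop], QB[start:stop]      # the m_j rows for this eigenvalue
        if orthogonal:
            U, _, Vt = np.linalg.svd(Aj @ Bj.T)      # Procrustes: X_j = V_j U_j^T
            blocks.append(Vt.T @ U.T)
        else:
            blocks.append(Bj @ np.linalg.pinv(Aj))   # least squares: X_j = B_j A_j^+
        start = stop
    return Q @ block_diag(*blocks) @ Q.T
```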