Solving regularized least squares problem using black-box computation of $\mathbf{A}\mathbf{x}$ and $\mathbf{A}^T\mathbf{x}$

Question

Let $\mathbf{A} \in \mathbb{R}^{n \times n}$. I'm working on a problem where I have a black-box algorithm that computes the products $\mathbf{A}\mathbf{x}$ and $\mathbf{A}^T \mathbf{x}$ given an input $\mathbf{x} \in \mathbb{R}^n$. Given that I can't access the entries of $\mathbf{A}$ directly, what are some examples of algorithms for solving the regularized linear least squares problem

\begin{align} \min_{\mathbf{x}} \frac{1}{2} \| \mathbf{y} - \mathbf{A} \mathbf{x}\|^2 + \lambda \phi(\mathbf{x}) \end{align}

that rely only on the computation of $\mathbf{A}\mathbf{x}$ and $\mathbf{A}^T\mathbf{x}$? Here, $\lambda >0$ is a given regularization parameter. I'm more interested in the updates for the likelihood term $\frac12 \|\mathbf{y} - \mathbf{A} \mathbf{x}\|^2$, so we can assume pretty much anything we want for $\phi$ for the sake of this question. If it makes the problem more concrete, you can assume that $\phi$ is convex and lower semicontinuous, so that $\text{prox}_{\lambda\phi}$ exists and is single-valued.

For instance, if $\phi$ is differentiable, given a stepsize $\mu>0$, the gradient descent update \begin{align} \mathbf{x}^{i+1} &= \mathbf{x^i} - \mu \mathbf{A}^T(\mathbf{A}\mathbf{x^i} - \mathbf{y}) - \mu \lambda \nabla \phi(\mathbf{x^i}) \end{align} can be implemented in my system. What are some other examples?
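To make the gradient descent update above concrete, here is a minimal sketch with the two black-box products passed as callables. The names `power_iteration` and `gradient_descent`, the dense stand-in matrix `M`, and the choice $\phi(\mathbf{x}) = \frac12\|\mathbf{x}\|^2$ are all assumptions for illustration, not part of my actual system. The power iteration estimates $\|\mathbf{A}\|_2^2$ using only $\mathbf{A}$ and $\mathbf{A}^T$, which gives a reasonable guess for the stepsize $\mu$.

```python
import numpy as np

def power_iteration(A, AT, n, iters=100, seed=0):
    """Estimate the largest eigenvalue of A^T A using only A(v) and AT(v)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    L = 0.0
    for _ in range(iters):
        w = AT(A(v))            # one application of A^T A
        L = np.linalg.norm(w)   # converges to ||A||_2^2 from below
        v = w / L
    return L

def gradient_descent(A, AT, y, grad_phi, lam, mu, x0, iters=1000):
    """Iterate x <- x - mu * A^T (A x - y) - mu * lam * grad_phi(x)."""
    x = x0.copy()
    for _ in range(iters):
        x = x - mu * AT(A(x) - y) - mu * lam * grad_phi(x)
    return x

# Usage with a dense stand-in for the black box (hypothetical data).
rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
A = lambda v: M @ v        # stands in for the black-box product A x
AT = lambda v: M.T @ v     # stands in for the black-box product A^T x
y = rng.standard_normal(n)
lam = 1.0
grad_phi = lambda x: x     # assumed phi(x) = 0.5 * ||x||^2
mu = 1.0 / (power_iteration(A, AT, n) + lam)
x_gd = gradient_descent(A, AT, y, grad_phi, lam, mu, np.zeros(n), iters=5000)
```

For this quadratic $\phi$ the result can be checked against the closed-form solution $(\mathbf{A}^T\mathbf{A} + \lambda \mathbf{I})^{-1}\mathbf{A}^T\mathbf{y}$, which is only available here because the stand-in matrix is explicit.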

EDIT: If it makes it more concrete, the prior $\phi$ of my problem is the total variation function for 2d signals.

You will get better answers at https://or.stackexchange.com/ — mhdadk, Apr 08 '23 at 20:49
@Royi $\phi$ is the total variation function in my problem, so I believe a method that works for generic convex lsc prior would work. — mlbj, Apr 09 '23 at 12:21
@mlbj, I gave you the answers for the general cases. Now that we know the regularization things are easier. Could you add that to the question? Is your problem about 2D signals or 1D signals? — Royi, Apr 09 '23 at 12:24

Royi · Accepted Answer · 2023-04-09T08:43:31.227

If the problem is only given by:

$$ \arg \min_{\boldsymbol{x}} \frac{1}{2} {\left\| A \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} $$

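Then you can use the Conjugate Gradient method (or, even better, Preconditioned Conjugate Gradient) applied to the normal equations $ {A}^{T} A \boldsymbol{x} = {A}^{T} \boldsymbol{y} $, which needs only products with $ A $ and $ {A}^{T} $.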

For instance, if $ A $ is a convolution matrix, in practice all you need is convolution and correlation.
See my answer and MATLAB code at Automatic Image Enhancement of Images of Scanned Documents (Auto Whitening).
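As a rough illustration of the conjugate gradient idea in a matrix-free setting, below is a minimal sketch of CG on the normal equations (a variant often called CGLS or CGNR). The function name `cgls`, the stopping rule, and the dense test matrix are assumptions made for the example; only the two callables `A` and `AT` are needed by the solver itself.

```python
import numpy as np

def cgls(A, AT, y, x0, iters=100, tol=1e-12):
    """Minimize 0.5 * ||A x - y||^2 using only the products A(v) and AT(v)."""
    x = x0.copy()
    r = y - A(x)          # residual in the data domain
    s = AT(r)             # negative gradient, i.e. normal-equations residual
    p = s.copy()          # search direction
    gamma = s @ s
    for _ in range(iters):
        if np.sqrt(gamma) < tol:
            break         # gradient is (numerically) zero
        q = A(p)
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = AT(r)
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

# Usage with a dense, well-conditioned stand-in for the black box.
rng = np.random.default_rng(1)
n = 30
M = 3.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
y = M @ x_true
x_hat = cgls(lambda v: M @ v, lambda v: M.T @ v, y, np.zeros(n))
```

Each iteration costs one product with $ A $ and one with $ {A}^{T} $. A preconditioner is not included here, since a good one depends on the structure of $ A $.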

For the regularized problem we need to know more about $ \phi \left ( \cdot \right) $.
If it corresponds to a projection onto a convex set, you may use the alternating projection method. If it is a quadratic function of $ \boldsymbol{x} $, you may use specialized solvers.

I think ADMM can also work in some cases.
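One way ADMM can stay matrix-free, sketched under assumptions: with the splitting $ \boldsymbol{x} = \boldsymbol{z} $, the $ \boldsymbol{x} $ update requires solving $ \left( {A}^{T} A + \rho I \right) \boldsymbol{x} = {A}^{T} \boldsymbol{y} + \rho \left( \boldsymbol{z} - \boldsymbol{u} \right) $, and that linear system can be solved approximately by an inner conjugate gradient loop that only applies $ A $ and $ {A}^{T} $. The names `cg`, `admm`, `soft_threshold` and the parameter `rho` are hypothetical, and the $ {L}_{1} $ prox is only a stand-in for a prox with a closed form. For total variation it would have to be replaced by an iterative TV denoiser, or the splitting changed.

```python
import numpy as np

def cg(apply_H, b, x0, iters=50, tol=1e-10):
    """Conjugate gradient for H x = b, with H symmetric positive definite."""
    x = x0.copy()
    r = b - apply_H(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        if np.sqrt(rs) < tol:
            break
        Hp = apply_H(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def soft_threshold(v, t):
    """Prox of t * ||.||_1 (stand-in for a regularizer with a cheap prox)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm(A, AT, y, prox, lam, rho, n, iters=200):
    """Scaled-form ADMM for 0.5 * ||A x - y||^2 + lam * phi(z), s.t. x = z."""
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)
    ATy = AT(y)
    apply_H = lambda v: AT(A(v)) + rho * v      # v -> (A^T A + rho I) v
    for _ in range(iters):
        x = cg(apply_H, ATy + rho * (z - u), x)  # warm-started inner solve
        z = prox(x + u, lam / rho)               # prox of (lam / rho) * phi
        u = u + x - z                            # scaled dual update
    return z

# Usage: with A = identity the minimizer is soft_threshold(y, lam).
y_demo = np.array([2.0, 0.3, -1.0, -0.2])
z_demo = admm(lambda v: v, lambda v: v, y_demo, soft_threshold, 0.5, 1.0, 4)
```

The inner solve does not need to be exact. Warm-starting CG from the previous $ \boldsymbol{x} $ usually keeps the number of inner iterations small.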

edited Apr 09 '23 at 08:43

answered Apr 09 '23 at 08:37

Royi


I wrote down the ADMM iterations, but then I get a term that needs the calculation of $(A^TA + \mu I)^{-1}$, where $I$ is the identity matrix, and I am not able to compute it, since I only know how to apply $A$ and $A^T$ to a vector $x$. – mlbj Apr 09 '23 at 12:25
$A$ is related to 2d convolution, but it is not exactly circulant nor doubly block circulant. Actually, if you want specifics, $A$ is a sum of several diagonal times doubly block circulant factors. I wrote a question in https://math.stackexchange.com/questions/4675207/diagonalization-of-a-combination-of-circulant-matrices with the complete specification, but couldn't get anything yet. – mlbj Apr 09 '23 at 12:31
I am not after $ A $ just wanted to know if this is a 1D or 2D total variation problem. – Royi Apr 09 '23 at 13:51
it is a 2d problem written in vector notation – mlbj Apr 09 '23 at 13:51
@mlbj, in my ADMM derivation, you need to be able to calculate ${\left({A}^{T} A + \mu {D}^{T} D \right)}^{-1}$, where $D$ is the finite differences matrix. Are you sure you get something else? – Royi Apr 09 '23 at 18:26
In my case, I derived the ADMM using three updates per iteration to minimize the augmented Lagrangian $\frac12 \|y-Ax\|_2^2 + \lambda \phi(z) + \frac{1}{2\mu}\| x- z + m\|_2^2$, where $m$ is the dual variable and $z=x$ is an auxiliary variable. Is this what you did? – mlbj Apr 09 '23 at 18:36
You can do it, but then you won't have a closed form solution for the iteration of $z$ since it won't be just the ${L}_{1}$ norm. – Royi Apr 09 '23 at 19:02
could you share your solution as an answer? – mlbj Apr 09 '23 at 19:32
The answer to your question depends on whether you have an efficient prox() for $ \phi \left( \boldsymbol{x} \right) $. Note that we don't have an efficient prox() for $ \phi \left( \boldsymbol{x} \right) = {\left\| D x \right\|}_{1} $. If we did, your question would be easy. – Royi Apr 09 '23 at 19:46
@mlbj, You may find this interesting: https://dsp.stackexchange.com/questions/87500. – Royi Apr 13 '23 at 06:10
that's actually very enlightening. Now I see why that $D^T D$ popped out in your ADMM solution. Thank you very much again! – mlbj Apr 13 '23 at 21:14
@mlbj, You may also have a look at https://dsp.stackexchange.com/questions/87542. – Royi Apr 16 '23 at 07:35
