9

The highest voted answer to this question suggests that to denoise a signal while preserving sharp transitions one should

minimize the objective function:

$$ |x-y|^2 + b|f(y)| $$

where $x$ is the noisy signal, $y$ is the denoised signal, $b$ is the regularization parameter, and $|f(y)|$ is some L1 norm penalty. Denoising is accomplished by finding the solution $y$ to this optimization problem, and $b$ depends on the noise level.

However, there is no indication of how one might accomplish this in practice, since it is an optimization problem in a very high-dimensional space, especially if the signal is, e.g., 10 million samples long. How is this sort of problem solved computationally for large signals in practice?

John Robertson
  • Are you concerned with run time? Otherwise, the literature on how to minimize a function is quite extensive (Levenberg-Marquardt, Nelder-Mead, etc. come to mind). There are even some modified versions that are made specifically for this. – thang Feb 02 '13 at 01:09
  • Actually, I have a question for the people answering below. Besides being slow, what is wrong with just something like Levenberg-Marquardt or Nelder-Mead? These are generalized optimizers, so you can even numerically approximate $f$. – thang Feb 02 '13 at 01:13
  • Yes, I am concerned with run time, but thanks for pointing these methods out. – John Robertson Feb 04 '13 at 22:06

4 Answers

7

Boyd has A Matlab Solver for Large-Scale ℓ1-Regularized Least Squares Problems. The problem formulation in there is slightly different, but the method can be adapted to this problem.

The classical majorization-minimization (MM) approach also works well. It amounts to iteratively performing soft-thresholding (or, for the TV penalty, clipping); a minimal sketch of this iteration follows.
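
For concreteness, here is a rough NumPy sketch (my own illustration, not from the linked references; the function names are mine) of the iterative soft-thresholding idea, i.e. ISTA, for the $\ell_1$-regularized least-squares problem $\min_x \|Ax-b\|_2^2 + \lambda\|x\|_1$. For a TV penalty the soft-threshold step would be replaced by the proximal operator of TV:

```python
import numpy as np

def soft_threshold(v, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, n_iter=500):
    """Iterative soft-thresholding (a majorization-minimization scheme)
    for  min_x ||A x - b||_2^2 + lam * ||x||_1."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - b)             # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)  # majorize, then threshold
    return x
```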

Details of these solutions can be found in the links above. Beyond them, the extensive optimization literature offers many other methods for minimizing such functionals.

PS: As mentioned in other comments, FISTA will work well. Another 'very fast' family is primal-dual algorithms; see the interesting paper of Chambolle for an example, though there is a plethora of research papers on primal-dual methods for linear inverse problem formulations.

Deniz
  • What does 'primal-dual' refer to exactly? – Spacey Oct 04 '12 at 17:28
  • Mohammad, I did not implement any primal-dual algorithm for inverse problems. However, you can see an example from the link I mentioned in the answer: the paper of Chambolle. From this paper, you can see what a primal-dual algorithm means precisely. These methods provide just another (and fast-converging) solution to inverse problems. – Deniz Oct 05 '12 at 13:41
  • I thought primal-dual was combinatorial optimization? How can you transform this problem generically (for a generic $f$) into that framework? – thang Feb 02 '13 at 01:12
  • thang, as I mentioned before, I am not an expert on this domain. You can see the paper of Chambolle and see how primal-dual methods can be used to solve problems like $\ell_1$ or TV regularization. – Deniz Feb 03 '13 at 00:04
5

To solve optimization problems with a TV penalty, we use a recently proposed algorithm, FISTA (from the paper Fast Gradient-Based Algorithms for Constrained Total Variation Image Denoising and Deblurring Problems), which has a better convergence rate than conventional iterative methods such as ASD-POCS. A rough sketch of the FISTA iteration is given below.
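
As an illustration, here is a minimal NumPy sketch of the FISTA iteration (my own sketch, not from the paper; the function names are mine), written for the $\ell_1$ penalty. The TV case replaces the soft-threshold step with the proximal operator of the TV penalty, which the paper computes with a fast inner solver:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(A, b, lam, n_iter=500):
    """FISTA for  min_x ||A x - b||_2^2 + lam * ||x||_1.
    Identical to plain ISTA except for the extrapolation (momentum) step,
    which improves the objective convergence rate from O(1/k) to O(1/k^2)."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum / extrapolation
        x, t = x_new, t_new
    return x
```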

chaohuang
4

In the particular case where $f(y)=\|y\|_1$, the objective function can be written as

$$ \|x-y\|^2 + b\|y\|_1 = \sum_i(x_i - y_i)^2 + b\sum_i |y_i|, $$

and minimizing it reduces to minimizing each term of the sum independently:

$$ \hat{y}_i = \arg\min_{y_i} \left\{ (x_i-y_i)^2 + b|y_i| \right\} $$

Using subdifferentials it is possible to show that the minimizer is the soft-thresholding operator, here with threshold $b/2$ (or $b$ when the quadratic term carries a factor of $\tfrac{1}{2}$). That's the method proposed by Donoho and Johnstone for signal denoising; see their paper Ideal spatial adaptation by wavelet shrinkage for more details.

So in this case, I think that you don't need a more sophisticated solver to estimate your signal.
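
To illustrate how cheap this special case is even for very long signals, here is a minimal NumPy sketch (my own illustration, not from the answer or the paper; the function name is mine). The threshold $b/2$ matches the objective as written above:

```python
import numpy as np

def denoise_l1(x, b):
    """Closed-form minimizer of  sum_i (x_i - y_i)^2 + b * sum_i |y_i|:
    each sample is soft-thresholded independently with threshold b/2."""
    return np.sign(x) * np.maximum(np.abs(x) - b / 2.0, 0.0)

# Even a 10-million-sample signal is handled in a single vectorized pass.
rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 0.5, 10_000_000)   # hypothetical noisy signal
denoised = denoise_l1(noisy, b=2.0)
```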

Alejandro
  • You have an $L^1$ norm penalty $\vert y_i\vert$ rather than a total variation penalty $\vert y_{i+1}-y_i\vert$. Is that a typo? – John Robertson Oct 04 '12 at 16:35
  • In the question it says: "and |f(y)| is some L1 norm penalty", so I just plugged in the $\ell_1$ norm, which is the classic case in signal denoising. But maybe I'm misunderstanding the question. – Alejandro Oct 04 '12 at 16:49
  • Yeah, that could have been clearer. In that quote $f$ is a function of the entire signal, not necessarily a function applied to each sample separately; that is, $f$ can combine different signal samples, e.g. $f(x_0,x_1,\ldots)=(x_1-x_0,x_2-x_1,\ldots)$ is perfectly legitimate. – John Robertson Oct 04 '12 at 17:05
  • I see. I will add that my answer is for the particular case where $f(y)$ is the $\ell_1$ norm. – Alejandro Oct 04 '12 at 17:32
3

Added: if $f(x) = \ell_1(x) = \sum_i |x_i|$, the terms are all independent; as @Alejandro points out, you can just minimize each term by itself. It's more interesting to minimize
$$ \| Ax - b \|_2^2 + \lambda \|x\|_1 $$
where $\|x\|_1$ instead of $\|x\|_2$ is intended to push many $x_i$ to 0.
The following notes are for this case. (I call the variables $x$, not $y$.)


(A year later) for the case where $f(x)$ is the $\ell_1$ norm, this problem is also known as the LASSO; adding an $\ell_2$ penalty as well gives Elastic net regularization.
Hastie et al., Elements of Statistical Learning p. 661 ff. discuss this for classification.

A fast, simple way to get an approximate solution with many $x_i = 0$ is to alternate two steps:

  1. minimize $\|Ax - b\|$ by plain least squares
  2. shrink a.k.a. soft-threshold: set small $x_i = 0$.

This is a form of iteratively reweighted least squares, with weights 0 or 1. I'd expect the methods in the papers cited in the other answers to give better results; this one is just simple. A rough sketch is given below.
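
Here is a minimal NumPy sketch of one way to implement this alternation (my own reading of the 0/1-weight IRLS idea, and the function name is mine): each least-squares solve is restricted to the entries that survived the previous shrink.

```python
import numpy as np

def alternate_lstsq_shrink(A, b, threshold, n_iter=10):
    """Alternate (1) least squares on the currently active entries with
    (2) shrinking: entries whose magnitude falls below `threshold` are set
    to 0 and dropped from the next solve (IRLS with 0/1 weights)."""
    x = np.zeros(A.shape[1])
    active = np.ones(A.shape[1], dtype=bool)
    for _ in range(n_iter):
        x[:] = 0.0
        x[active] = np.linalg.lstsq(A[:, active], b, rcond=None)[0]  # step 1
        active &= np.abs(x) >= threshold                             # step 2
        x[~active] = 0.0
    return x
```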

(When minimizing a sum $f() + \lambda g()$, it's a good idea to plot $f()$ and $\lambda g()$ on a log-log scale over iterations 1, 2, 3, ... Otherwise one term may swamp the other and you won't even notice, especially when the two terms scale differently.)

denis