9

The highest voted answer to this question suggests that to denoise a signal while preserving sharp transitions one should

minimize the objective function:

$$ |x-y|^2 + b|f(y)| $$

where $x$ is the noisy signal, $y$ is the denoised signal, $b$ is the regularization parameter, and $|f(y)|$ is some L1 norm penalty. Denoising is accomplished by finding the solution $y$ to this optimization problem, and $b$ depends on the noise level.

However, there is no indication of how one might accomplish this in practice, since it is an optimization problem in a very high-dimensional space, especially if the signal is, e.g., 10 million samples long. How is this sort of problem solved computationally for large signals in practice?

John Robertson
  • Are you concerned with run time? Otherwise, the literature on how to minimize a function is quite extensive (Levenberg-Marquardt, Nelder-Mead, etc. come to mind). There are even some modified versions that are made specifically for this. – thang Feb 02 '13 at 01:09
  • Actually, I have a question for the people answering below. Besides being slow, what is wrong with just something like Levenberg-Marquardt or Nelder-Mead? These are generalized optimizers, so you can even numerically approximate $f$. – thang Feb 02 '13 at 01:13
  • Yes, I am concerned with run time, but thanks for pointing these methods out. – John Robertson Feb 04 '13 at 22:06

4 Answers

7

Boyd has A Matlab Solver for Large-Scale ℓ1-Regularized Least Squares Problems. The problem formulation in there is slightly different, but the method can be adapted to this problem.

The classical majorization-minimization (MM) approach also works well. It amounts to iteratively performing soft-thresholding (or, for the TV penalty, clipping); a minimal sketch of this iteration follows.
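
For concreteness, here is a rough NumPy sketch (my own illustration, not from the linked references; the function names are mine) of the iterative soft-thresholding idea, i.e. ISTA, for the $\ell_1$-regularized least-squares problem $\min_x \|Ax-b\|_2^2 + \lambda\|x\|_1$. For a TV penalty the soft-threshold step would be replaced by the proximal operator of TV:

```python
import numpy as np

def soft_threshold(v, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, n_iter=500):
    """Iterative soft-thresholding (a majorization-minimization scheme)
    for  min_x ||A x - b||_2^2 + lam * ||x||_1."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - b)             # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)  # majorize, then threshold
    return x
```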

Details of these solutions can be found in the links above. Beyond them, the extensive optimization literature offers many other methods for minimizing such functionals.

PS: As mentioned in other comments, FISTA will work well. Another 'very fast' family is primal-dual algorithms; see the interesting paper of Chambolle for an example, though there is a plethora of research papers on primal-dual methods for linear inverse problem formulations.

Deniz
  • What does 'primal-dual' refer to exactly? – Spacey Oct 04 '12 at 17:28
  • Mohammad, I did not implement any primal-dual algorithm for inverse problems. However, you can see an example from the link I mentioned in the answer: the paper of Chambolle. From this paper, you can see what a primal-dual algorithm means precisely. These methods provide just another (and fast-converging) solution to inverse problems. – Deniz Oct 05 '12 at 13:41
  • I thought primal-dual was combinatorial optimization? How can you transform this problem generically (for a generic $f$) into that framework? – thang Feb 02 '13 at 01:12
  • thang, as I mentioned before, I am not an expert on this domain. You can see the paper of Chambolle and see how primal-dual methods can be used to solve problems like $\ell_1$ or TV regularization. – Deniz Feb 03 '13 at 00:04
5

To solve optimization problems with a TV penalty, we use a recently proposed algorithm, FISTA (from the paper Fast Gradient-Based Algorithms for Constrained Total Variation Image Denoising and Deblurring Problems), which has a better convergence rate than conventional iterative methods such as ASD-POCS. A rough sketch of the FISTA iteration is given below.
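
As an illustration, here is a minimal NumPy sketch of the FISTA iteration (my own sketch, not from the paper; the function names are mine), written for the $\ell_1$ penalty. The TV case replaces the soft-threshold step with the proximal operator of the TV penalty, which the paper computes with a fast inner solver:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(A, b, lam, n_iter=500):
    """FISTA for  min_x ||A x - b||_2^2 + lam * ||x||_1.
    Identical to plain ISTA except for the extrapolation (momentum) step,
    which improves the objective convergence rate from O(1/k) to O(1/k^2)."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum / extrapolation
        x, t = x_new, t_new
    return x
```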

chaohuang
4

In the particular case where $f(y)=\|y\|_1$, the objective function can be written as

$$ \|x-y\|^2 + b\|y\|_1 = \sum_i(x_i - y_i)^2 + b\sum_i |y_i|, $$

and minimizing it reduces to minimizing each term of the sum independently:

$$ \hat{y}_i = \arg\min_{y_i} \left\{ (x_i-y_i)^2 + b|y_i| \right\} $$

Using subdifferentials it is possible to show that the minimizer is the soft-thresholding operator, here with threshold $b/2$ (or $b$ when the quadratic term carries a factor of $\tfrac{1}{2}$). That's the method proposed by Donoho and Johnstone for signal denoising; see their paper Ideal spatial adaptation by wavelet shrinkage for more details.

So in this case, I think that you don't need a more sophisticated solver to estimate your signal.
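
To illustrate how cheap this special case is even for very long signals, here is a minimal NumPy sketch (my own illustration, not from the answer or the paper; the function name is mine). The threshold $b/2$ matches the objective as written above:

```python
import numpy as np

def denoise_l1(x, b):
    """Closed-form minimizer of  sum_i (x_i - y_i)^2 + b * sum_i |y_i|:
    each sample is soft-thresholded independently with threshold b/2."""
    return np.sign(x) * np.maximum(np.abs(x) - b / 2.0, 0.0)

# Even a 10-million-sample signal is handled in a single vectorized pass.
rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 0.5, 10_000_000)   # hypothetical noisy signal
denoised = denoise_l1(noisy, b=2.0)
```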

Alejandro
  • You have an $L^1$ norm penalty $\vert y_i\vert$ rather than a total variation penalty $\vert y_{i+1}-y_i\vert$. Is that a typo? – John Robertson Oct 04 '12 at 16:35
  • In the question it says: "and |f(y)| is some L1 norm penalty", so I just plugged in the $\ell_1$ norm, which is the classic case in signal denoising. But maybe I'm misunderstanding the question. – Alejandro Oct 04 '12 at 16:49
  • Yeah, that could have been clearer. In that quote $f$ is a function of the entire signal, not necessarily a function applied to each sample separately; that is, $f$ can combine different signal samples, e.g. $f(x_0,x_1,\ldots)=(x_1-x_0,x_2-x_1,\ldots)$ is perfectly legitimate. – John Robertson Oct 04 '12 at 17:05
  • I see. I will add that my answer is for the particular case where $f(y)$ is the $\ell_1$ norm. – Alejandro Oct 04 '12 at 17:32
3

Added: if $f(x) = \ell_1(x) = \sum_i |x_i|$, the terms are all independent; as @Alejandro points out, you can just minimize each term by itself. It's more interesting to minimize
$$ \| Ax - b \|_2^2 + \lambda \|x\|_1 $$
where $\|x\|_1$ instead of $\|x\|_2$ is intended to push many $x_i$ to 0.
The following notes are for this case. (I call the variables $x$, not $y$.)


(A year later) for the case where $f(x)$ is the $\ell_1$ norm, this problem is also known as the LASSO; adding an $\ell_2$ penalty as well gives Elastic net regularization.
Hastie et al., Elements of Statistical Learning p. 661 ff. discuss this for classification.

A fast, simple way to get an approximate solution with many $x_i = 0$ is to alternate two steps:

  1. minimize $\|Ax - b\|$ by plain least squares
  2. shrink a.k.a. soft-threshold: set small $x_i = 0$.

This is a form of iteratively reweighted least squares, with weights 0 or 1. I'd expect the methods in the papers cited in the other answers to give better results; this one is just simple. A rough sketch is given below.
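
Here is a minimal NumPy sketch of one way to implement this alternation (my own reading of the 0/1-weight IRLS idea, and the function name is mine): each least-squares solve is restricted to the entries that survived the previous shrink.

```python
import numpy as np

def alternate_lstsq_shrink(A, b, threshold, n_iter=10):
    """Alternate (1) least squares on the currently active entries with
    (2) shrinking: entries whose magnitude falls below `threshold` are set
    to 0 and dropped from the next solve (IRLS with 0/1 weights)."""
    x = np.zeros(A.shape[1])
    active = np.ones(A.shape[1], dtype=bool)
    for _ in range(n_iter):
        x[:] = 0.0
        x[active] = np.linalg.lstsq(A[:, active], b, rcond=None)[0]  # step 1
        active &= np.abs(x) >= threshold                             # step 2
        x[~active] = 0.0
    return x
```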

(When minimizing a sum $f() + \lambda g()$, it's a good idea to plot $f()$ and $\lambda g()$ on a log-log scale over iterations 1, 2, 3, ... Otherwise one term may swamp the other and you won't even notice, especially when the two terms scale differently.)

denis