I'm trying to implement the classical algorithm described in the paper "Image and Video Upscaling from Local Self-Examples" (and its accompanying presentation) to serve as a baseline for comparison with AI/NN-based approaches to the same problem. My issue is that I'm having trouble with the notation, and I believe some information is missing (or only implied) for someone trying to implement the method just by reading the paper.
In particular, on page 6/11 of the paper, in equations (1) and (2), it's not clear to me how the convolution between the image and the filter is supposed to happen, since the filters described in the paper (and listed in the appendix) are, to my understanding, 1-dimensional. Do they simply convolve each row with the filter and then each column with the same filter coefficients, i.e. a separable convolution (as described in Jörg Ritter, "Wavelet transforms on images", Chapter 2)? Or something else? Another thing that strikes me as odd is the notation in equation (4), page 6/11: if the filters are 1-dimensional, then indexing into one should yield a scalar, so why is a dot product used in the equation?
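To make my question concrete: if separable filtering is indeed what's meant, my current reading of equations (1) and (2) would amount to something like the sketch below (the function name is my own, and I'm using `scipy.ndimage.convolve1d` purely for illustration; this is my interpretation, not something the paper states explicitly):

```python
import numpy as np
from scipy.ndimage import convolve1d

def separable_conv2d(image, kernel_1d):
    """Convolve a 2-D image with a 1-D kernel along rows, then columns.

    For a symmetric 1-D kernel k, this is equivalent to a full 2-D
    convolution with the outer product np.outer(k, k).
    """
    tmp = convolve1d(image, kernel_1d, axis=1, mode='reflect')  # along rows
    return convolve1d(tmp, kernel_1d, axis=0, mode='reflect')   # along columns

# Tiny sanity check with a 3-tap box filter
img = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([1/3, 1/3, 1/3])
out = separable_conv2d(img, k)
print(out.shape)  # same shape as the input
```

Is this (rows then columns, same coefficients both times) the intended reading, or do the authors mean something different by the convolution in (1) and (2)?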
I'd really appreciate any help here. I've been reading related material (e.g. on the discrete wavelet transform) to see whether there's a common convention I'm missing that could provide some insight, but with no result so far. Of course, if anyone is aware of an available implementation, I'd be very grateful for a pointer to it. Many thanks in advance for any help!