I'm trying to implement the classical algorithm described in the paper "Image and Video Upscaling from Local Self-Examples" (and its accompanying presentation) to serve as a baseline for comparison with AI/NN-based approaches to the same problem. My issue is that I'm having trouble with the notation, and I believe some information is missing (or only implied) for someone trying to implement the method just by reading the paper.
In particular, on page 6/11 of the paper, in equations (1) and (2), it's not clear to me how the convolution between the image and the filter is supposed to happen, since the filters described in the paper (and listed in the appendix) are, to my understanding, 1-dimensional. Do they simply convolve each row with the filter and then each column with the same filter coefficients, i.e. a separable convolution (as described in Jörg Ritter, "Wavelet transforms on images", Chapter 2)? Or something else? Another thing that strikes me as odd is the notation in equation (4), page 6/11: if the filters are 1-dimensional, then indexing into one should yield a scalar, so why is a dot product used in the equation?
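To make my question concrete: if separable filtering is indeed what's meant, my current reading of equations (1) and (2) would amount to something like the sketch below (the function name is my own, and I'm using `scipy.ndimage.convolve1d` purely for illustration; this is my interpretation, not something the paper states explicitly):

```python
import numpy as np
from scipy.ndimage import convolve1d

def separable_conv2d(image, kernel_1d):
    """Convolve a 2-D image with a 1-D kernel along rows, then columns.

    For a symmetric 1-D kernel k, this is equivalent to a full 2-D
    convolution with the outer product np.outer(k, k).
    """
    tmp = convolve1d(image, kernel_1d, axis=1, mode='reflect')  # along rows
    return convolve1d(tmp, kernel_1d, axis=0, mode='reflect')   # along columns

# Tiny sanity check with a 3-tap box filter
img = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([1/3, 1/3, 1/3])
out = separable_conv2d(img, k)
print(out.shape)  # same shape as the input
```

Is this (rows then columns, same coefficients both times) the intended reading, or do the authors mean something different by the convolution in (1) and (2)?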
I'd really appreciate any help here. I've been reading related material (e.g. on the discrete wavelet transform) to see whether there's a common convention I'm missing that could provide some insight, but with no result so far. Of course, if anyone is aware of an available implementation, I'd be very grateful for a pointer to it. Many thanks in advance for any help!