
I am trying to implement a smooth, zoomable audio waveform but am puzzled about the correct approach to implementing zoom. I searched the internet, but there is very little or no information available.

So here is what I have done:

  1. Read audio samples from the file and compute waveform points with samplesPerPixel = 10, 20, 40, 80, ..., 10240. Store the data points for each scale (11 in total here). Min and max are also stored along with the points for each samplesPerPixel.

  2. When zooming, switch to the closest dataset. So if samplesPerPixel at the current width is 70, then use the dataset corresponding to samplesPerPixel = 80. The correct dataset index is easily found using log2(samplesPerPixel).

  3. Use subsampling of the dataset to draw the waveform points. So if samplesPerPixel = 41 and we are using the dataset for zoom level 80, then we subsample with the scaling factor 80/41:

     let scaleFactor = 80.0 / 41.0
     let x = waveformPointX[Int(Double(i) * scaleFactor)]
    

I have yet to find a better approach and am not too sure whether the above subsampling approach is correct, but this approach certainly consumes a lot of memory and is also slow to load data at the start. How do audio editors implement zooming in waveforms? Is there an efficient approach?
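
For reference, a minimal MATLAB sketch of steps 1 and 2 above (variable names are illustrative; samples stands in for the decoded audio):

     % Build min/max datasets for samplesPerPixel = 10, 20, 40, ..., 10240.
     samples = 2*rand(44100*60, 1) - 1;           % stand-in for real audio samples
     spp = 10 * 2.^(0:10);                        % the 11 zoom levels
     levels = cell(numel(spp), 1);
     for k = 1:numel(spp)
         n = spp(k);
         m = floor(numel(samples)/n);
         blocks = reshape(samples(1:m*n), n, m);  % one column of n samples per pixel
         levels{k} = [min(blocks); max(blocks)];  % 2 x m: min and max per pixel
     end

     % Step 2: pick the closest dataset for the current zoom via log2.
     samplesPerPixel = 70;
     k = min(max(ceil(log2(samplesPerPixel/10)) + 1, 1), numel(spp));  % spp(k) = 80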

  • The python package pyqtgraph has an example for loading h5 files and being able to zoom quickly. The resampling code for that example can be found here. – Ash Mar 09 '22 at 18:55
  • I can't exactly figure out from the code how it zooms/scales. Is the code reading the data for the visible portion from the file after scaling and then computing the waveform? If so, such an approach is too slow on mobile platforms such as iOS. – Deepak Sharma Mar 09 '22 at 19:47
  • Some ideas here too: https://www.bbc.co.uk/rd/blog/2013-10-audio-waveforms – audionuma Mar 10 '22 at 19:46
  • @audionuma Good info, thanks a lot. Really helpful. – Deepak Sharma Mar 11 '22 at 11:26
  • @audionuma It looks like Audacity is also storing tables at a number of zoom levels, starting from a block size of 256 all the way up to 64K. – Deepak Sharma Mar 11 '22 at 19:38
  • @audionuma I don't understand the algorithm used in WaveformRescaler::rescale() (= Sequence::GetWaveDisplay() in Audacity)... do you have any idea what's going on in that function? – Deepak Sharma Mar 11 '22 at 20:22
  • OK, min/max computation is fast. What is not fast is taking average values. – Deepak Sharma Mar 12 '22 at 09:23

1 Answer


This seems related to audio sample rate conversion and D/A conversion. You can think of it as two subproblems:

  1. How do I maximize the knowledge about the source waveform?
  2. How do I convey the maximum visual information through a discrete pixel display?

1 is particularly important when zooming in (source-limited), while 2 is particularly important when zooming out (display-limited). I shall mostly cover 1 here.

What about directly fitting analytic sinc functions to a local neighbourhood of samples and discretizing them directly onto the target pixel grid? Or some practical approximation to the sinc (windowed? splines?)

This is similar to the standard visualization of the reconstruction of analog waveforms, as in this post: Sampling and reconstruction of signal in Matlab

At some point, your waveform will be oversampled to a degree where simple linear interpolation should be sufficient to upsample further. I don't think that having a precalculated vector of 10,000 points per sample is really needed.

Edit: plain code to show my thoughts

N = 10;                        % number of input samples
x = 2*rand(N,1)-1;             % random samples in [-1, 1]
us = 50;                       % upsampling factor (grid points per sample interval)
grd = 1:(1/us):N;              % target grid spanning the N samples
y = x.*sinc(grd-(1:N)');       % one shifted, weighted sinc per sample (N rows)
z = sum(y);                    % sum the sincs -> reconstructed waveform
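
Once z is oversampled like this, any further upsampling can indeed be plain linear interpolation; for example (grd2 being an arbitrary finer grid):

grd2 = 1:(1/(4*us)):N;                   % a finer grid, 4x the previous density
zi = interp1(grd, z, grd2, 'linear');    % cheap linear upsampling of the oversampled z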

Note that the summed waveform will be inaccurate along the edges. To improve that, you could sum up contributions from samples beyond the displayed region, and/or try windowing the sinc to limit its reach (there will be some compromise).

[figure: the reconstructed waveform z]
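
As a rough sketch of the windowing idea (reusing x, grd, and N from the code above; the half-width L here is an arbitrary choice):

L = 4;                                        % window half-width, in sample intervals
d = grd - (1:N)';                             % distance from each sample to each grid point
w = 0.5*(1 + cos(pi*d/L)) .* (abs(d) <= L);   % Hann window, zero outside +/- L
zw = sum(x .* sinc(d) .* w);                  % windowed-sinc reconstruction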

Even with a discrete waveform that is generated from an analytically probed function, you might want to improve the rendering. If a sample hits approximately between two pixels in the vertical dimension, you might want to distribute it across those two pixels (or more) rather than picking the nearest neighbour.

I would guess that this solution scales decently to, say, 100 samples on a display of 1000 or 3000 pixels in each dimension. If you are showing many more samples than that in one crop, you likely won't be able to appreciate much visual benefit, while the computational cost will increase. That kind of view is perhaps better created with a simple static resampling and more effort put into the rendering ("combating the limitations of the display rather than those of the source data").

The solution above relies upon MATLAB's plot rendering for anti-aliasing. A crude attempt at rendering is shown below, where the vertical offset is picked by nearest neighbour. That results in visible staircasing:

% For each column, light up the single pixel whose vertical coordinate
% is nearest to the waveform value (nearest neighbour in y).
vgrid = linspace(-2, 2, N*us);                      % vertical pixel coordinates
[~, mvi] = min(abs(z-vgrid'), [], 1, 'linear');     % linear index of nearest row per column
M = zeros(length(vgrid), length(grd));
M(mvi) = 1;
imwrite(M, 'test.bmp')

[figure: nearest-neighbour rendering with visible staircasing]
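
As a sketch of the two-pixel distribution mentioned above, here is the same rendering with linear (tent) weights between the two nearest rows (reusing vgrid, z, and grd):

% Distribute each column's value over its two nearest vertical pixels.
M2 = zeros(length(vgrid), length(grd));
dv = vgrid(2) - vgrid(1);                  % vertical pixel spacing
for j = 1:length(grd)
    p = (z(j) - vgrid(1))/dv + 1;          % fractional row index of the waveform value
    p = min(max(p, 1), length(vgrid) - 1); % clamp to stay inside the bitmap
    p0 = floor(p); frac = p - p0;
    M2(p0, j)   = 1 - frac;                % weight for the lower row
    M2(p0+1, j) = frac;                    % weight for the upper row
end
imwrite(M2, 'test_aa.bmp')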

I am no graphics guy, but I assume that MATLAB and its OpenGL(?)-based renderer have a notion of "line thickness", and that a discrete vector is plotted as a continuous waveform using anti-aliasing to distribute local weight over a discrete neighbourhood of pixels.

Knut Inge
  • Please excuse me for being a newbie at this, but I am simply drawing the amplitudes of samples in the waveform. Are you referring to drawing the signal in the frequency domain? – Deepak Sharma Mar 09 '22 at 19:07
  • No, time domain. For a rendering of 10 samples on a 100x100-pixel display, cue up 10 sincs (ideally a few more), offset them so their peaks fall on each of the input samples, sample them 10 times per input sample, and sum them. – Knut Inge Mar 09 '22 at 19:17
  • Knut is, indeed, correct. This is about interpolation. There are sample-rate conversion kernels in which the interpolated waveform is not guaranteed to pass through the original sample points (the brickwall LPF would be derived from firpm() or firls()). These might be good-sounding interpolation kernels, but for visual display, you don't want these. What you want is a windowed $\operatorname{sinc}(\cdot)$ kernel for interpolation. That guarantees that the interpolated waveform passes through the original sample points. – robert bristow-johnson Mar 09 '22 at 21:43
  • So what you are saying is: instead of processing a lot of samples to draw, take only a few sample points and draw them using sinc interpolation. The interpolated waveform may not pass through all sample points, but it will be accurate enough. Do I understand that correctly? – Deepak Sharma Mar 10 '22 at 07:20
  • If you use a sinc interpolator, the interpolated waveform will pass through all sample points, but it will be an inaccurate rendering of the continuous waveform close to the edges. – Knut Inge Mar 10 '22 at 12:15
  • My problem is that I have a lot of samples and little space to draw. For 10 minutes of 44.1 kHz audio, the number of samples is 44100x600, and the space to draw is, say, 1000x100 pixels. I have more samples than pixels. But as I zoom in, the drawing width increases exponentially. I am trying to understand how interpolation fits into this scenario. Are you saying I should choose very few samples out of the 44100x600 and interpolate using sinc? – Deepak Sharma Mar 10 '22 at 14:11
  • I would think that when plotting 10 minutes of audio in a 1000-pixel window, the problem is not so much how to maximize the knowledge about the input vector, but how to maximize the "robust information transfer" of your display. It is not an "upsampling problem" but a "downsampling problem". Perhaps computer graphics people know that problem better than audio DSP people? Font rendering, splines, anti-aliasing in two dimensions, where our eyes will be the judge. – Knut Inge Mar 11 '22 at 08:32