Removing Extra Points When Finding an Audio Envelope

Question

I am trying to reproduce Camtasia's way of displaying audio signals, which is a strictly positive envelope of the signal and makes it very easy for me to identify some sounds visually. These are displays from Audacity and Camtasia (green) of a recording of someone saying the word "brown", sampled at 44.1 kHz. The Audacity display goes negative and oscillates a lot (it is spiky).

I read the audio file into Mathematica (using Audio[filename]) and extract channel 1 of the audio data into a List (First[AudioData[audio]]). My ultimate goal is to call the new Wolfram (Mathematica) Engine API from Python to get the outline, and then plot the points using a Python package.

Wolfram provides this example for envelope detection: https://www.wolfram.com/mathematica/new-in-10/enhanced-sound-and-signal-processing/envelope-estimation.html

but it doesn't work very well for my purposes, as the result is still far too spiky.

After a lot of experimenting in Mathematica, I found some functions and settings that seem to give a good approximation to Camtasia's display. Here I zoom in on the word "brown". The segment is about 5000 samples long and has had a lowpass filter applied at 11 kHz.

(* Compute the peaks once and reuse them for both the line and the markers *)
peaks = FindPeaks[Abs[tfad[9070, 14500, fadlp]], 10];
ListLinePlot[peaks,
  Epilog -> {PointSize[Small], Red, Point[peaks]},
  AspectRatio -> 1/2, ImageSize -> Medium, PlotRange -> Full,
  Filling -> Axis,
  FillingStyle -> Blue
]

Here, tfad[9070, 14500, fadlp] is just a 1-D list of the audio samples. The problem is that only 100 or so of the points define the outline, while about three times as many other points are not part of it. Plotting all those extraneous points would be really inefficient, but I don't know how to get rid of them. How can I remove the extra points, or calculate the envelope in a better way (with an eye towards displaying it in Python)?

EDIT:

Thank you for the suggestions! I don't have any place to upload the mp3, but I was able to put the mp4 of the entire phrase ("quick brown fox ...") on Vimeo: https://vimeo.com/343760131

I applied a very low lowpass filter (500 Hz) and resampled to 8 kHz with:

AudioResample[ LowpassFilter[a,  Quantity[500, "Hertz"]], "Telephone"]

I tried AudioLocalMeasurements with RMS, Loudness and Power, and here is the result for Loudness:

It might look good if it had finer resolution.

Can you provide a link to the audio file you were using? It would be better to have it in order to compare the results. — halirutan, Jun 21 '19 at 03:02
There are four or five reasonable answers to the problem of finding envelopes here: https://mathematica.stackexchange.com/q/94770/1783 — bill s, Jun 21 '19 at 23:23

score 0 · Accepted Answer · answered Jun 21 '19 at 03:19

I would advise that you look into AudioLocalMeasurements. Of course, I'm not sure what exactly Camtasia does, but I assume it's simply a plot of the amplitude. Look here:

audio = ExampleData[{"Audio", "Bird"}];
AudioLocalMeasurements[audio, "RMSAmplitude"]

That gives you a time-series with only 81 points. And then it's simply a matter of plotting it:

green1 = RGBColor[{0.39, 0.71, 0.31}];
green2 = RGBColor[{0.23, 0.56, 0.16}];
ListLinePlot[
 AudioLocalMeasurements[audio, "RMSAmplitude"],
 InterpolationOrder -> 0,
 Filling -> Axis,
 FillingStyle -> green1,
 PlotStyle -> green1,
 Background -> green2,
 PlotRange -> {Automatic, {0, .2}},
 Axes -> False
 ]
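
Since the eventual goal is to plot in Python, the same idea can be sketched there without Mathematica: compute a windowed RMS amplitude, which yields one non-negative point per window instead of one per sample. This is only a minimal sketch using the standard library. The function name rms_envelope, the window and hop sizes, and the synthetic test signal are illustrative assumptions, and nothing here claims to match what AudioLocalMeasurements or Camtasia do internally.

```python
import math


def rms_envelope(samples, window, hop):
    """Return (center_index, rms) pairs for a 1-D list of samples.

    One pair is produced per block of `window` samples, and consecutive
    blocks start `hop` samples apart. Both sizes are illustrative choices.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    points = []
    for start in range(0, len(samples) - window + 1, hop):
        block = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in block) / window)
        points.append((start + window // 2, rms))
    return points


# Synthetic stand-in for real audio (an assumption, not the "brown" clip):
# one second of a 440 Hz tone at 8 kHz with a linear fade-in.
rate = 8000
signal = [(n / rate) * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]

# 50 ms windows that advance by 25 ms.
envelope = rms_envelope(signal, window=400, hop=200)
print(len(envelope), "envelope points from", len(signal), "samples")
```

Shrinking window and hop gives a finer but spikier outline, and enlarging them gives a smoother, coarser one. The resulting pairs can be passed directly to any Python plotting package as a filled line.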

answered Jun 21 '19 at 03:19

halirutan


Thank you for the answer! I tried this and I edited my original question and placed an mp4 link (couldn't do mp3) and picture of function output. – Gene Jun 21 '19 at 23:15
Using PartitionGranularity I was able to make the output much nicer. Thank you! – Gene Jun 21 '19 at 23:59
@Gene Yes, I figured once you have a start, you can look up the options yourself. Nice job! – halirutan Jun 22 '19 at 02:10
