Audio sample analysis - Comparing two audio samples, the original sample with a reproduction capture sample

Question

I already asked this on sound.stackexchange.com, but one of the commenters suggested that it would be better suited here.

Introduction

I'm currently working on a project where I'm implementing a basic software phone, which connects two users. Since I'm implementing both the server-side service and the client applications, I can control the codec and bitrate used for the audio communication. At the moment, I'm focused on testing the quality of the audio received by a user.

To do this, I play a test audio sample (Harvard Sample - Female voice) into a virtual audio device, which the sender client captures and sends to the backend. The backend then forwards the audio to the receiving client, which promptly plays it into a virtual audio device, where it is captured by a recording application (Audacity in this case). At the end of this procedure, I have two audio samples: the original sample and the captured sample.

My Objective

My objective is to know how degraded the captured sample is relative to the original sample. For this, I would like to have a "magic number" from 0 to 100, where 0 means completely degraded and 100 means an exact match.

My Question

Unfortunately, I do not have a background in audio engineering, so I would like to ask the community whether my objective is achievable and, if so, what kind of audio analysis I need to do to achieve it. If this is not possible, or if there are better ways to compare quality, I am open to other suggestions. I'm also looking for open source libraries that can help with the audio analysis, preferably in Python.

@DanBoschen It's a start, but that only gives me an answer regarding noise. I could also have a situation where packets are lost and audio chunks are missing from the captured audio sample.
But thank you for your suggestion. I believe that a weighted sum of normalized results might deliver what I'm looking for. I just need to know what other analyses I can do to achieve my goal. If you have suggestions for other analyses, please let me know. — Carlos Ferreira, Nov 23 '21 at 17:49
What's your application? Voice only? What do you care about: intelligibility, spectral fidelity, distortions & artifacts, pre/post ringing, naturalness, etc.? Your quality criteria should be based on what really matters to your application. — Hilmar, Nov 23 '21 at 17:51
@Carlos It needn't be "noise" as you may be thinking of noise but all variations (distortion) compared to the original signal. We can refer to any deviation from the original signal as "noise" in this regard. You may be interested in weighting such deviations differently (a total absence would carry a significant weight on its own) but the process would be similar once you establish what "good" means to you. For this purpose the correlation coefficient would be a useful metric to compare two audio samples as to which one matches the original better. — Dan Boschen, Nov 23 '21 at 17:55
@Hilmar I'm currently only considering voice. I care about perceptivity of speech, meaning I care about how well a receiver will be able to quickly understand what the sender is saying. Anything that affects this criterion is to be considered a negative effect. I'm sorry that I cannot provide more information, but I'm limited by my own knowledge of audio signal processing. — Carlos Ferreira, Nov 23 '21 at 18:27
@DanBoschen I see. That is actually helpful, since I can establish a link between a correlation coefficient result and an audio codec configuration. Then, I would compare the results with the original sample and see which audio codec configuration yields a better value. — Carlos Ferreira, Nov 23 '21 at 18:31
Exactly, and then you can expand on that and weigh different effects by weighing the deviation within the correlation computation differently (such as imposing a max deviation etc) but I suspect the straight calculation will be of immediate value to you. Further spectral processing can reveal harmonic distortion etc which also may be of interest so you may want to look into that as well. — Dan Boschen, Nov 23 '21 at 18:34
@DanBoschen Okay, Thank you for the information. I will attempt to build something based on this knowledge. — Carlos Ferreira, Nov 23 '21 at 19:22
