
There is a recording that contains a very noisy voice (a voice with a low signal-to-noise ratio).

It is a female voice of a stationary speaker (right in front of the microphone) and there are many other recordings of the same speaker with much better quality.

What can be done to denoise the audio?

From the other recordings, it might be possible to extract statistical values that allow some model to filter the speaker's voice.
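One concrete form of such a model, assumed here rather than taken from the question, is a static Wiener gain. You estimate the speaker's long-term power spectrum S(f) from a clean recording and the noise spectrum N(f) from a speech-free stretch of the noisy one, then scale each frequency bin by S/(S+N). Below is a minimal sketch with NumPy/SciPy. `wiener_denoise` is a hypothetical helper name. Both recordings are assumed to be mono float arrays at the same sample rate and at comparable levels. The gain is fixed over time, so at best it attenuates bands where noise dominates on average. It cannot restore bands that are already missing.

```python
import numpy as np
from scipy.signal import stft, istft, welch

def wiener_denoise(noisy, clean_ref, noise_only, fs, nperseg=512):
    """Static Wiener gain S / (S + N) per frequency bin (hypothetical helper).

    S: long-term power spectrum of the speaker, from a clean recording.
    N: power spectrum of the noise, from a speech-free part of the noisy one.
    """
    _, s_psd = welch(clean_ref, fs=fs, nperseg=nperseg)
    _, n_psd = welch(noise_only, fs=fs, nperseg=nperseg)
    gain = s_psd / (s_psd + n_psd + 1e-12)
    # Apply the same gain to every STFT frame of the noisy signal
    _, _, Z = stft(noisy, fs=fs, nperseg=nperseg)
    _, y = istft(Z * gain[:, None], fs=fs, nperseg=nperseg)
    # istft may return a few padding samples; trim to the input length
    return y[: len(noisy)]
```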

Existing questions only address snoring, have non-random noise, or want to identify a specific artifact or a moving speaker. The question "Is it possible to filter out a person's voice out of 100 of other voices?" is similar, but the answer there lacks details and is more about distinguishing voices than extracting one.

Some of the tapes are direct copies. Others are copies of copies, which were also recorded at a lower speed setting (4.75 cm/s). This affected the quality prior to digitization.

serv-inc
  • The distorted recording sounds more like the speaker has a large obstacle between themselves and the microphone. The noise is relatively low level. I'm wondering if it's more of a recording problem (e.g. bad tape bias?). – Peter K. Sep 01 '15 at 11:46
  • @PeterK.: All recordings were digitized with the same setup, if that is what you mean. If you refer to the bias of the original tape, sure, that may well be. How would that affect the solution? – serv-inc Sep 01 '15 at 15:47
  • Well, it might mean you are looking for a non-stochastic solution. A simpler "de-biasing" filter might do (though this might exacerbate the noise). You might want to compare the spectrum of the noisy speech with the spectrum of the non-noisy speech and see if you can infer what the "de-biasing" filter's frequency response might look like. – Peter K. Sep 01 '15 at 15:51
  • For clarity: you still might want to do some stochastic filtering to remove the random noise. – Peter K. Sep 01 '15 at 15:51
  • By "bias" I mean magnetic / electromagnetic bias in the recording medium. The way that would affect the recording is to filter it. It sounds like a low-pass filter operation, so you may need to invert that (make a high-pass filter?). – Peter K. Sep 01 '15 at 16:03
  • In general, a good method of removing noise would be useful to a lot of people. The examples in this question aren't good examples of that use case, though. The recording that most needs help isn't just noisy. It has actually had a lot of its content filtered out before being digitized. I also strongly suspect it was originally sampled at only 8 bits, so there's not much chance of rooting some remnant of the signal out of the noise floor. – JRE Sep 02 '15 at 12:52
  • @PeterK.: you were right. The recording does have some DC bias, so normalization as a first step helps. – serv-inc Sep 02 '15 at 14:30
  • @JRE: I even tried to sample it at 32 bits. It was layman's equipment, though, so maybe that played a part. Some of these magnetic tapes were lying around for decades. Some are also copies of copies. Yet they were all digitized using the same equipment. What would be good examples? – serv-inc Sep 02 '15 at 14:34
  • You couldn't have sampled it at 32 bits with normal PC components. Consumer sound cards only sample at 16 and sometimes 8 bits. You need professional equipment for 24 bit. I don't know of any equipment that samples at 32 bits. A lot of programs use 32 bits internally to process audio, and there are also 32 bit .wav formats as well. – JRE Sep 02 '15 at 14:38
  • Audacity was set to record at 32 bits. The sound card probably had the usual 16 bits (I would probably have recognized 8 bits ;-) – serv-inc Sep 02 '15 at 14:41
  • Yup. Audacity lets you choose 32 bits, but it has to convert to that from the 16 bits that your sound card delivers. – JRE Sep 02 '15 at 14:44
  • Your file better_quality_1_15_446_to_2_01_954.wav is a good example for using spectral subtraction - Audacity does a fair job of reducing the noise without mangling the speech. – JRE Sep 02 '15 at 15:04
  • Is there a chance you could play the tape you got noisy_00_41_718_to_01_04_287.wav from and see whether it sounds bad already or if maybe something happened during the transfer? – JRE Sep 02 '15 at 15:52
  • @JRE: The question has been edited to address the tape quality. Some of the tapes were copies of copies with a slower tape speed than the others. – serv-inc Sep 04 '15 at 08:14
  • @JRE: Additionally, the tape owner replied that some of the tapes are indeed of such quality as the recording. (Please forgive the late reply) – serv-inc Sep 26 '15 at 12:06
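Peter K.'s suggestion in the comments can be sketched as follows: compare the spectrum of the noisy speech with that of the clean speech and infer the corrective ("de-biasing") filter's frequency response. This illustration is not something posted in the thread. `estimate_eq_filter` is a hypothetical name, and both recordings are assumed to be mono float arrays at the same sample rate. The 20 dB cap on the boost is an arbitrary choice that limits the noise amplification Peter K. warns about.

```python
import numpy as np
from scipy.signal import welch, firwin2

def estimate_eq_filter(noisy, clean, fs, numtaps=255, max_gain_db=20.0, nperseg=1024):
    """Hypothetical sketch: FIR whose magnitude is sqrt(P_clean / P_noisy)."""
    f, p_noisy = welch(noisy, fs=fs, nperseg=nperseg)
    _, p_clean = welch(clean, fs=fs, nperseg=nperseg)
    # Normalise total power so the overall recording level does not matter
    p_noisy = p_noisy / p_noisy.sum()
    p_clean = p_clean / p_clean.sum()
    gain = np.sqrt(p_clean / (p_noisy + 1e-20))
    # Cap the boost: bands that are mostly noise would otherwise be amplified hugely
    gain = np.minimum(gain, 10 ** (max_gain_db / 20))
    # numtaps is odd, so a nonzero gain at fs/2 is allowed by firwin2
    return firwin2(numtaps, f, gain, fs=fs)
```

The filter would then be applied with `scipy.signal.lfilter(h, 1.0, noisy)`. Because the boost is capped, bands that contain only noise are raised by at most 20 dB rather than by the full ratio.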

4 Answers


I am afraid there's not much you can do. The voice part seems to have gone through the equivalent of a low-pass filter with a cutoff of around 1000 Hz. Basically, all of the speech components above 1000 Hz are gone.

The filtering action may not have been an intentional filter, but may have been due to the improper biasing of the tape during recording. If it is an old tape, it may simply have deteriorated over time. Also, the playback head may have needed degaussing.

Running it through a high-pass filter flattens the frequency response, but doing so pushes the level down so far that the signal drowns in the noise. No help there.

The best result I got was from using a very steep low-pass with a cutoff of 1000 Hz together with a very steep high-pass with a cutoff of 160 Hz. That gets rid of the noise by only passing what is left of the actual speech, but obviously it can't recover what was lost.
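Here is a sketch of the band-limiting step described above. The answer does not say which filters were used, so three choices here are assumptions: the Butterworth design, its order, and zero-phase application with `sosfiltfilt`. `speech_band` is a hypothetical helper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def speech_band(x, fs, low_hz=160.0, high_hz=1000.0, order=8):
    """Keep only the band where the surviving speech lies (hypothetical helper)."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering: no phase distortion, doubled attenuation
    return sosfiltfilt(sos, x)
```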

Your real problem is not the noise, it is the lost frequency range.


This is the spectrum of the bad recording:

[Figure: spectrum of the bad recording]

This is the spectrum of the good recording:

[Figure: spectrum of the good recording]

As you can see, there's a lot of stuff missing from the bad recording. So, it isn't simply a problem of removing noise. The problem is that there's stuff that's just GONE.

Look at the range from 1000 Hz to 7000 Hz. There's lots of stuff there in the good recording, but in the bad one it is just a flat spectrum a good 30 dB below the voice peaks around 400 Hz.

Some of what's missing might be buried in the noise, but recovering it would cause artifacts that are worse than the noise and muffled sound.


Looking at just the noise, it doesn't seem like there would be much to recover out of it. Its spectrum looks just like that of the portions with speech (except between 160 Hz and 1000 Hz), so anything that is in there is going to be buried really deep.

[Figure: spectrum of a noise-only portion]
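The comparison between the speech and noise-only portions can also be put into numbers. The sketch below is an assumption, not JRE's procedure. It compares the average power spectrum of a hand-picked stretch with speech against a speech-free stretch of the same file, band by band. The speech stretch also contains noise, so the result is a (speech + noise)/noise ratio. Values near 0 dB mean a band holds essentially nothing but noise. `band_level_ratio_db` is a hypothetical name.

```python
import numpy as np
from scipy.signal import welch

def band_level_ratio_db(speech_seg, noise_seg, fs, bands, nperseg=1024):
    """(speech + noise) / noise power ratio in dB for each (low_hz, high_hz) band."""
    f, p_speech = welch(speech_seg, fs=fs, nperseg=nperseg)
    _, p_noise = welch(noise_seg, fs=fs, nperseg=nperseg)
    ratios = {}
    for lo, hi in bands:
        in_band = (f >= lo) & (f < hi)
        ratios[(lo, hi)] = 10 * np.log10(p_speech[in_band].sum() / p_noise[in_band].sum())
    return ratios
```

For example, `band_level_ratio_db(speech, noise, fs, [(160, 1000), (1000, 7000)])` would check the two regions discussed above, provided the file is sampled at 14 kHz or more.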

JRE
  • Yes, that sounds right. – Peter K. Sep 02 '15 at 14:42
  • Would it be theoretically possible to extract the phonemes from the good recording, identify them on the bad one, and replace the lacking frequencies? – serv-inc Sep 04 '15 at 08:15
  • That would be a lot of manual work. I've often thought that it might be possible to use voice synthesis software to analyse a large body of recordings from one person and then use that to synthesize recordings of the person speaking new texts. That'd be a lot of work too, though. Depending on how many bad recordings you've got, it might be cheaper or easier to find a voice actor (German: Synchronsprecher) and have them re-read the texts. – JRE Sep 04 '15 at 08:21
  • Yes, you are right. Yet, the noisy recording contains at least some parts of the speech which might make it possible to "fit" the phonemes into the original. Some people need the original, and not a voice actor. (Which is an otherwise good suggestion). – serv-inc Sep 04 '15 at 08:32
  • Might be easier to just copy words and/or phonemes from the good recordings and "splice" the sentences together to replace the bad recordings entirely. Matching the frequencies and phases together to "fix" the existing recordings would be really tough. – JRE Sep 04 '15 at 08:54
  • Just had a really silly idea. I'll see if I can find time to try it out and post back later, maybe this weekend. – JRE Sep 04 '15 at 08:54
  • Silly ideas are often the most fun. :-) – Peter K. Sep 04 '15 at 12:07
  • Maybe bandwidth extension techniques can help to recover the part above 1 kHz. – Brian Sep 04 '15 at 12:43
  • Silly idea didn't pan out. I used a cleaned-up copy of the good recording to create an FIR filter that passes components of the woman's voice. Then I pushed white noise through the filter and modulated the output with the envelope of the bad recording (after first filtering out the noise). The spectrum looked good, but it just sounded like a bunch of noise bursts. Oh, well. It was fun to try. – JRE Sep 06 '15 at 12:51
  • @Brian: Interesting idea. Are there some good pointers besides Wikipedia? – serv-inc Sep 26 '15 at 12:20
  • @user I have no experience with bandwidth extension, only know that the technique exists... I also know that there are some books and IEEE certainly has many papers – Brian Sep 26 '15 at 15:30
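JRE's experiment from the comments roughly corresponds to the sketch below. It is a reconstruction, not his code, and several details are assumptions:

- The FIR filter is designed from the good recording's Welch spectrum with `firwin2`.
- The noise is removed from the bad recording with a 160 to 1000 Hz band-pass.
- The envelope is the magnitude of the analytic signal, smoothed with a 30 Hz low-pass.
- `envelope_noise_resynth` is a hypothetical name.

As the comment reports, this kind of resynthesis can produce a plausible spectrum and still sound like noise bursts.

```python
import numpy as np
from scipy.signal import welch, firwin2, lfilter, hilbert, butter, sosfiltfilt

def envelope_noise_resynth(bad, good, fs, numtaps=255, env_cutoff_hz=30.0, seed=0):
    """Hypothetical reconstruction: spectrally shaped white noise,
    amplitude-modulated by the envelope of the band-limited bad recording."""
    # 1. FIR filter whose magnitude follows the good recording's average spectrum
    f, p_good = welch(good, fs=fs, nperseg=1024)
    h = firwin2(numtaps, f, np.sqrt(p_good / p_good.max()), fs=fs)
    # 2. Push white noise through that filter
    rng = np.random.default_rng(seed)
    shaped = lfilter(h, 1.0, rng.standard_normal(len(bad)))
    # 3. Remove the noise from the bad recording by keeping only 160-1000 Hz
    band_sos = butter(8, [160.0, 1000.0], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(band_sos, bad)
    # 4. Envelope = magnitude of the analytic signal, smoothed by a low-pass
    env_sos = butter(2, env_cutoff_hz, btype="lowpass", fs=fs, output="sos")
    env = np.maximum(sosfiltfilt(env_sos, np.abs(hilbert(band))), 0.0)
    # 5. Modulate the shaped noise with the envelope and normalise to +/-1
    out = shaped * env
    return out / (np.max(np.abs(out)) + 1e-12)
```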

The noise sounds very stationary, so I guess spectral subtraction should work well. Note, however, that most implementations have quite a few parameters to tweak. Spectral subtraction can sound either very good or completely useless, depending on whether the parameters are chosen well for the given problem. If you search for Matlab implementations you will find several. One of them is this one, which you could try to get some idea of what spectral subtraction can do.
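For reference, a basic power spectral subtraction looks roughly like this. It is a generic sketch. It is not the Matlab implementation the answer links to, nor Audacity's noise removal effect. `spectral_subtract`, the over-subtraction factor `alpha` and the spectral `floor` are assumptions. They are exactly the kind of parameters that decide whether the result sounds clean or full of "musical noise" artifacts. `noise_clip` should be a speech-free stretch of the same recording.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(x, noise_clip, fs, nperseg=512, alpha=2.0, floor=0.02):
    """Power spectral subtraction with over-subtraction and a spectral floor."""
    # Average noise power per frequency bin, from a speech-free clip
    _, _, N = stft(noise_clip, fs=fs, nperseg=nperseg)
    noise_pow = np.mean(np.abs(N) ** 2, axis=1, keepdims=True)
    # STFT of the noisy signal
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    pow_x = np.abs(X) ** 2
    # Subtract alpha times the noise estimate, never going below floor * original power
    clean_pow = np.maximum(pow_x - alpha * noise_pow, floor * pow_x)
    # Rebuild with the noisy phase and return to the time domain
    Y = np.sqrt(clean_pow) * np.exp(1j * np.angle(X))
    _, y = istft(Y, fs=fs, nperseg=nperseg)
    return y[: len(x)]
```

Raising `alpha` removes more noise but also eats into weak speech. Raising `floor` leaves more residual noise but reduces the warbling artifacts.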

Matt L.
  • First attempts at using spectral subtraction (via the "Remove Noise" function of audacity) did not work well, leaving artifacts. Maybe I did not tweak the parameters enough. – serv-inc Sep 01 '15 at 15:48

On file: noisy_00_41_718_to_01_04_287.wav, I tried spectral subtraction and then some high-pass filtering to taste. You can download the snippet here.

There are definite artifacts, but I fear the source audio is too degraded. Noise aside, the speaker is very muffled, and it is hard (especially when not speaking Schwiizertüütsch) to make out anything clearly. High-pass filtering brought out some detail, but again, not enough to be useful.

ruoho ruotsi

Thank you very much to all who provided solutions.

Summary

This is a summary of what has been proposed, with an example of what it does to the noisy soundfile when combined. If you like it, please do upvote the originals.

  1. (Kudos to @PeterK) The recording has a DC bias. You can see that most of the waveform is centered below zero:

    [Figure: DC bias]

    This can for example be removed via Audacity's "Normalize..." with default settings.

  2. (Kudos to @MattL) Once this is normalised, spectral subtraction can be used. I got the same results as @ruohoruotsi when I first tried it, which led me to abandon it. However, if you tweak the parameters as suggested, for example setting Sensitivity to 5 dB, you get very few or no artifacts. The waveform looks different:

    [Figure: waveform after spectral subtraction]

  3. (Great Kudos to @JRE) This can be further filtered with a high pass as proposed to get some more of the noise out. Yet, as @JRE also said, there is very little information in the signal above 1000Hz. Thus, for example the section between 12 and 14 seconds remains very hard to hear.
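Steps 1 and 3 can also be reproduced outside Audacity, for example with NumPy/SciPy. This sketch assumes a mono float signal. Removing the DC offset amounts to subtracting the signal's mean. The 160 Hz high-pass cutoff comes from JRE's answer, while the Butterworth design and filter order are assumptions. Step 2 is Audacity's own noise reduction and is not reproduced here. `remove_dc` and `highpass` are hypothetical helper names.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def remove_dc(x):
    """Step 1: subtract the mean so the waveform is centred on zero."""
    return x - np.mean(x)

def highpass(x, fs, cutoff_hz=160.0, order=6):
    """Step 3: zero-phase Butterworth high-pass (cutoff from JRE's answer; order assumed)."""
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)
```

Applied in order, this would be `highpass(denoised, fs)`, where `denoised` stands for the output of step 2 run on `remove_dc(x)`.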

You can listen to the result of all three approaches combined.

serv-inc