The steps for computing the Mel-Frequency Cepstral Coefficients (MFCCs) are:

Frame blocking -> Windowing -> abs(DFT) -> Mel filter bank -> Sum coefficients for each filter -> Logarithm -> DCT
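
The chain above can be sketched in NumPy roughly as follows. This is only a minimal illustration, not a reference implementation: the frame length, hop size, FFT size, number of filters, the Hamming window, the mel formula and the helper names (`mel_filter_bank`, `dct_ii`, `mfcc`) are all assumptions picked for the example.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, fs):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct_ii(x, n_out):
    # Unnormalised DCT-II along the last axis, keeping the first n_out coefficients.
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, fs, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    # Frame blocking: one row per frame.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    # Windowing (Hamming window is an assumption).
    frames = signal[idx] * np.hamming(frame_len)
    # abs(DFT)
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # Mel filter bank + sum of weighted coefficients for each filter.
    energies = mag @ mel_filter_bank(n_filters, n_fft, fs).T
    # Logarithm (small offset avoids log(0)).
    log_energies = np.log(energies + 1e-10)
    # DCT
    return dct_ii(log_energies, n_ceps)

# Usage: one second of white noise at 16 kHz as a stand-in for speech.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
coeffs = mfcc(signal, fs=16000)
print(coeffs.shape)  # (number of frames, number of cepstral coefficients)
```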

But what is the purpose of the logarithm step?

Morten

1 Answer

The logarithm serves to transform a multiplication into an addition. It is part of the computation of the cepstrum. The basic idea is as follows:

Assume a source signal $x$ is convolved with some impulse response $h$. The magnitude spectrum of the resulting signal $y$ is

$$|Y(\omega)| = |X(\omega)||H(\omega)|$$

By applying the logarithm we get

$$\log |Y(\omega)| = \log |X(\omega)| + \log |H(\omega)|\tag{1}$$

If we want to equalize, i.e. undo the effect of filtering by $H(\omega)$, we can hope that the task becomes easier once the convolution has been turned into an additive disturbance. This is exactly what taking the logarithm achieves in (1).
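
As a quick numerical check of (1), here is a small sketch. The signal `x`, the impulse response `h` and the FFT size are made up for the illustration; the FFT size is chosen at least as long as the convolved signal so that the DFT product matches the linear convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)           # hypothetical source signal
h = np.array([1.0, 0.6, 0.3, 0.1])     # hypothetical impulse response
y = np.convolve(x, h)                  # filtered signal, length 256 + 4 - 1

n_fft = 512                            # >= len(y), so no circular wrap-around
X = np.fft.rfft(x, n_fft)
H = np.fft.rfft(h, n_fft)
Y = np.fft.rfft(y, n_fft)

# Left-hand and right-hand side of equation (1).
log_Y = np.log(np.abs(Y))
log_sum = np.log(np.abs(X)) + np.log(np.abs(H))

max_err = np.max(np.abs(log_Y - log_sum))
print(max_err)  # difference is at the level of floating-point rounding
```

In the log-magnitude domain the filter therefore shows up as a fixed offset `np.log(np.abs(H))` added to every spectrum of the source.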

Matt L.
  • Hi Matt, thanks for your answer. So the advantage is that I can remove the impulse response h by subtraction? Should that subtraction be performed after the DCT is computed? – Morten Apr 26 '13 at 11:23
  • Yes, it is usually performed after the DCT, i.e. on the cepstral coefficients. It is a standard method used in speech recognition front ends, called 'Cepstral Mean Subtraction (CMS)' (or 'Cepstral Mean Normalisation', CMN). – Matt L. Apr 26 '13 at 12:53
  • So the DCT and mean subtraction commute. But if you do not perform mean subtraction, could the log (theoretically) be left out, or does it have other useful effects, e.g. smoothing the data? – Morten Apr 27 '13 at 12:58
  • Yes, in principle they commute, but normally there are fewer DCT coefficients than filter bank outputs, so it's more efficient to do CMS after the DCT. The log operation also serves to compress the data in a way similar to the human auditory system. – Matt L. Apr 27 '13 at 13:38
  • @MattL. Hi. I am trying to follow your answer. As I understand it, the cepstrum lets us remove the impulse response from the resulting signal by subtraction, so that we can recover $x$. Can you please elaborate on CMS a little bit? I don't follow why it is better to apply CMS after the DCT. When you say mean, what is this mean calculated from? MFCC has more than one coefficient. Is it the mean of each coefficient over time? I am struggling to understand it. Thanks. – Celdor Mar 10 '15 at 11:29
  • @ZikO: The mean is computed over time (i.e. over several frames), and there is one mean value per DCT coefficient. The assumption is that the channel is stationary, so by subtracting the mean you can eliminate the influence of the channel. The mean could also be subtracted before the DCT, but since you normally have fewer DCT coefficients than filterbank outputs, that would be less efficient. – Matt L. Mar 10 '15 at 19:35
  • @MattL. OK, I've got this part. Just one question regarding the mean. I am calculating the mean directly from the numbers, aren't I? These are log values. Usually one would be expected to take the anti-log first, calculate the mean, and then convert back to log. I hope it doesn't sound silly :) – Celdor Mar 11 '15 at 14:30
  • @ZikO: You need the mean of the log-values. That's the whole idea, because only in the log-domain does the channel become an additive stationary noise component, which can be removed by mean subtraction. – Matt L. Mar 11 '15 at 14:32
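
The mean subtraction discussed in these comments can be illustrated with a small sketch. Everything here is synthetic and assumed for the example: the random "log filter-bank outputs", the fixed per-band channel offset, the sizes, and the helper names `dct_ii` and `cms`. It also assumes the idealised model in which a stationary channel adds the same offset to every frame in the log domain.

```python
import numpy as np

def dct_ii(x, n_out):
    # Unnormalised DCT-II along the last axis, keeping the first n_out coefficients.
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def cms(ceps):
    # One mean per cepstral coefficient, computed over time (over all frames).
    return ceps - ceps.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
n_frames, n_filters, n_ceps = 200, 26, 13

log_source = rng.standard_normal((n_frames, n_filters))  # hypothetical clean log filter-bank outputs
log_channel = rng.standard_normal(n_filters)             # hypothetical stationary channel, same in every frame
log_observed = log_source + log_channel                  # additive in the log domain

clean = cms(dct_ii(log_source, n_ceps))
observed = cms(dct_ii(log_observed, n_ceps))

# After CMS the channel offset is gone: both feature sets coincide.
channel_removed = np.allclose(clean, observed)

# Subtracting the mean before the DCT gives the same result (the operations
# commute because the DCT is linear), but works on 26 values per frame
# instead of 13.
before = dct_ii(log_observed - log_observed.mean(axis=0, keepdims=True), n_ceps)
commutes = np.allclose(before, observed)

print(channel_removed, commutes)
```

Note that CMS also removes the long-term mean of the source itself, so what is recovered is the mean-free version of the clean features, not the clean features as such.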