
I am reading this paper here and I wanted to check my understanding regarding the salience function: $$ \hat{s}(\tau)=\sum_{m=1}^M g(\tau, m) \max_{k\in \kappa_{\tau, m}}\lvert Y(k)\rvert $$

where the set $\kappa_{\tau, m}$ defines a range of frequency bins in the vicinity of the $m^{\rm th}$ overtone partial of the F0 candidate.

We want to find the maximum of $\lvert Y(k)\rvert$ over those $k$. But $k$ indexes a frequency bin, and $Y$ is, according to Klapuri, the discrete STFT of the signal. I don't understand what the STFT at a given frequency is.

  • The STFT is a 2D array (time by frequency), so $Y(k)$ must be a vector, right?
  • How do we find the max of a list of vectors?
  • Is something in my understanding wrong?

So let's say we are testing the salience of the period 0.002. We start with $m=1$, and get $g(\tau, m)$. Then what?

Gilles
pavlos163

1 Answer


You are correct that the STFT is a function of two variables (usually the x-axis is time and the y-axis is frequency). The author is a bit careless in calling $Y(k)$ the STFT. What he really means is that $Y(k)$ is a single column (i.e. a fixed time slice) of the STFT, so for each bin $k$ the magnitude $\lvert Y(k)\rvert$ is a scalar and the max is taken over ordinary numbers. The salience function is defined more carefully by the same author in the paper titled "Multipitch analysis of polyphonic music and speech signals using an auditory model", Eq. 13, where he uses a subscript $t$ to denote a fixed time slice of the STFT.
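To make the "one frame at a time" reading concrete, here is a minimal sketch of the sum for a single time slice. It is not the paper's exact algorithm: the weight `1/m` is only a placeholder for $g(\tau, m)$, the bin range $\kappa_{\tau, m}$ is approximated by a fixed `half_width` around the bin nearest the $m$-th partial, and the names `salience`, `frame_mag`, `n_partials` and `half_width` are my own. It uses the question's example period of 0.002 s (500 Hz), assuming a sampling rate of 8000 Hz.

```python
import numpy as np

def salience(frame_mag, tau, n_fft, n_partials=5, half_width=1, weight=None):
    """Salience of a candidate period tau (in samples) for ONE STFT frame.

    frame_mag is a 1-D array holding |Y(k)| for k = 0 .. n_fft // 2.
    """
    if weight is None:
        # Placeholder weighting (assumption), NOT the g(tau, m) from the paper.
        weight = lambda tau, m: 1.0 / m
    total = 0.0
    for m in range(1, n_partials + 1):
        # Bin closest to the m-th partial of F0 = fs / tau.
        center = int(round(m * n_fft / tau))
        lo = max(center - half_width, 0)
        hi = min(center + half_width, len(frame_mag) - 1)
        if lo > hi:
            break  # partial lies above the highest available bin
        # Max over scalars |Y(k)| for the bins near this partial.
        total += weight(tau, m) * frame_mag[lo:hi + 1].max()
    return total

# Synthetic test tone: 500 Hz fundamental (period 0.002 s) with 3 partials.
fs = 8000            # assumed sampling rate in Hz
n_fft = 1024
t = np.arange(n_fft) / fs
x = sum(np.sin(2 * np.pi * 500 * h * t) / h for h in (1, 2, 3))

# One windowed frame -> one column of the STFT -> a 1-D magnitude spectrum.
frame_mag = np.abs(np.fft.rfft(x * np.hanning(n_fft)))

tau_true = fs * 0.002    # 0.002 s expressed in samples
tau_other = fs / 700     # a wrong candidate (700 Hz)
s_true = salience(frame_mag, tau_true, n_fft)
s_other = salience(frame_mag, tau_other, n_fft)
```

So for the period 0.002 and $m=1$, you compute the weight, look up the few bins around the first partial in that one frame, take the largest magnitude among them, multiply, and then repeat for $m = 2, \dots, M$ and add everything up. In this sketch `s_true` comes out larger than `s_other` because the bins around 500, 1000 and 1500 Hz carry the energy.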

Atul Ingle