Is this the correct way?
This may be technically correct but won't give you any useful information. The reasons for this are actually quite simple. The phase between two signals can be computed
accurately only if your signals contain pure sine tones. You basically have only 2 scenarios where computing the phase difference between 2 signals makes sense.
a) the 2 signals both contain the same frequency component but have
a phase shift between them
b) the 2 signals contain each a
different frequency component
For case #a, the phase difference will remain the same throughout the entire length of the signals, whereas in #b, the phase difference will change as you traverse the length of the signals (it's assumed that the frequency in signal #2 is not a multiple of the frequency in signal #1 and the signals don't start in phase)
When you have a signal that contains multiple frequency components, the phase calculated by an FFT routine (including librosa.stft, which computes dft on multiple signal frames) is pretty
much random (you can think of it as being the average of all frequency components contributing to a particular frequency bin, including noise i.e. undesirable components).
This means that computing the phase difference for the purpose of finding the true phase difference between 2 random signals is not possible and is completely meaningless.
Computing the phase difference (within a single signal) can still be useful in other applications where it can be used to more accurately estimate the "true" bin frequencies but here it's the relative phases that we're dealing with so it's different.
In conclusion, if your signals don't fit either case a# or case b#, you won't be able to do what you want and it doesn't even make sense to try to do that in the first place.
but pure STFT's phase still isn't "random, an FFT procedure does produce random phase in the sense it doesn't contain the true phase shifts of the original signal components being analyzed. Extracting the true phase even for a single component signal can be tricky (as explained here https://www.gaussianwaves.com/2015/11/interpreting-fft-results-obtaining-magnitude-and-phase-information/) and you can't realistically expect to extract the true phase shifts for a complex signal – dsp_user Jun 24 '21 at 09:56