
The question might be very simple, but I get a strange result from the Kalman filter. Consider the simplest state-space model, the random walk plus noise: $$ y_{t} = x_{t} + \varepsilon_{t}\\ x_{t} = x_{t-1} + \mu_{t-1} $$ Let us generate $N_{data} = 3000$ data points with parameters $Var(\mu_{t}) = 1$ (the transition variance) and $Var(\varepsilon_{t}) = 1$ (the observation variance).
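In compact form, the simulation is just a cumulative sum of transition noise plus observation noise (a minimal sketch; the full code is at the end of the post):

import numpy as np

np.random.seed(0)                            # fixed seed, for reproducibility
N_data = 3000
noise_trs = np.random.normal(size=N_data)    # transition noise, Var = 1
noise_obs = np.random.normal(size=N_data)    # observation noise, Var = 1
x = 3.0 + np.concatenate(([0.0], np.cumsum(noise_trs[1:])))  # x_0 = 3
y = x + noise_obs                            # observations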

Using the EM algorithm in pykalman (Python), I try to estimate the transition and observation variances.

First, I would like to estimate "observation_covariance" and I assume that "transition_covariance" is known:

kf = KalmanFilter(
    transition_matrices=[1.], observation_matrices=Data['x'],
    transition_covariance=[1.], initial_state_mean=3,
    initial_state_covariance=[1.],
    em_vars=['observation_covariance']
)

kf = kf.em(Data['y'])

The result is $0.79$.

Second, I do the opposite, i.e. I estimate "transition_covariance" and I assume that "observation_covariance" is known:

kf = KalmanFilter(
    transition_matrices=[1.], observation_matrices=Data['x'],
    observation_covariance=[1.], initial_state_mean=3,
    initial_state_covariance=[1.],
    em_vars=['transition_covariance']
)

kf = kf.em(Data['y'])

The result is $0.001$, which is far from $1$!

Next, I assume that both "transition_covariance" and "observation_covariance" are unknown:

kf = KalmanFilter(
    transition_matrices=[1.], observation_matrices=Data['x'],
    initial_state_mean=3, initial_state_covariance=[1.],
    em_vars=['transition_covariance', 'observation_covariance']
)

kf = kf.em(Data['y'])

the result is "observation_covariance" = 0.77 and "transition_covariance" = 0.0002.

What do I do wrong? Shouldn't those estimated variances be close to $1$?

The full code I use is below:

#%%
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pykalman import KalmanFilter

N_data = 3000
Data = pd.DataFrame(columns=['NoiseTrs', 'NoiseObs', 'x', 'y'], index=range(N_data))
Data['NoiseTrs'] = np.random.normal(loc=0.0, scale=1.0, size=N_data)
Data['NoiseObs'] = np.random.normal(loc=0.0, scale=1.0, size=N_data)

# random walk

x_f_v = 3 # first value for the simulation
for i in range(N_data):
    if i == 0:
        Data.loc[i, 'x'] = x_f_v
    else:
        Data.loc[i, 'x'] = Data.loc[i - 1, 'x'] + Data.loc[i, 'NoiseTrs']

# plt.plot(Data['x'])
# plt.show()

Data['y'] = Data['x'] + Data['NoiseObs']

# plt.plot(Data[['x', 'y']])
# plt.show()

F = [1.]                                               # transition matrix
H = Data['x'].values.reshape(N_data,1,1).astype(float) # observation matrix
Q = [1.]                                               # transition noise
R = [1.]                                               # observation noise

#%%
init_state_mean = [3.]
init_state_cov = [1.]

kf = KalmanFilter(
    transition_matrices=F,
    observation_matrices=H,
    # transition_covariance=Q,
    # observation_covariance=R,
    initial_state_mean=init_state_mean,
    initial_state_covariance=init_state_cov,
    em_vars=['transition_covariance', 'observation_covariance']
    # em_vars=['observation_covariance']
    # em_vars=['transition_covariance']
)

kf = kf.em(Data['y'].values.astype(float))

print("observation_covariance R :", kf.observation_covariance)
print("transition_covariance Q :", kf.transition_covariance)



  • I'm not familiar with Python enough to understand your code. My experience is that usually the variance estimates are obtained using a prediction error likelihood decomposition, and then, given those estimates, the KF is run. But maybe Python is able to do both at the same time? See Harvey's blue book (structural models and the KF) for what I mean by prediction error likelihood decomposition; a sketch of that approach appears after this comment thread. Also, the Data.loc vector update is not consistent time-lag-wise with what you wrote for the random walk plus noise (i and i-1 instead of i-1 and i-1), but that shouldn't make a big difference. – mark leeds Oct 29 '19 at 14:07
  • Also, if you're familiar with R, there's a very nice package called DLM (it uses notation consistent with the Harrison and West yellow book) that you could use to check whether you get what you'd expect. – mark leeds Oct 29 '19 at 14:11
  • Thank you! I am totally confused because, from what I understand, EM is a consistent algorithm, so the estimates shouldn't be so far off. My guess is that in this example the errors just add up, which makes the variances indistinguishable. Though even when I try with a very small observation covariance, the estimate of the transition covariance is still far away. – ABK Oct 29 '19 at 14:15
  • Hi: you didn't say what EM was? Is that the EM algorithm you're referring to? If so, there's a MUCH, MUCH easier way to estimate them. If you have Harvey's blue book, he explains it quite clearly in there. It's a dense book with a lot of info, but I highly recommend it for the time series viewpoint of the KF. EM is overkill for estimating the variances as long as you're willing to estimate them first and then run the KF. I'm still unsure what kf.em does, but it doesn't sound necessary. – mark leeds Oct 29 '19 at 20:10
  • Read below and just want to emphasize that, at least for random walk plus noise, it's not a hard problem as long as you do the estimation first and then, given those estimated variances, run the standard KF. If you can't get your hands on Harvey, maybe google for prediction error decomposition. I don't know of another text besides Harvey's that describes it. – mark leeds Oct 29 '19 at 20:12
  • If you can't get the book, this isn't bad, but the book is better: http://www.stat.yale.edu/~lc436/papers/Harvey_Peters1990.pdf Also, do you know how to refer to a link in a comment so that you point to the actual link instead of the paper itself? I know how to do that in answers but not in comments. Thanks. – mark leeds Oct 29 '19 at 20:16
  • Dear mark leeds, of course there are other methods to deal with this simple problem. The idea is to investigate Kalman EM in Python. – ABK Oct 30 '19 at 15:26
  • ABK: That's fine if you want to stick with EM. It's none of my business. I'm just pointing out that EM is extreme overkill for estimating the variances in a random walk plus noise model. EM also has many pitfalls and complexities. The original paper on using EM in the KF is by Stoffer, but I don't know the name of it. Good luck. – mark leeds Oct 30 '19 at 15:48
  • In case you're interested, I think this is the original: https://www.stat.pitt.edu/stoffer/dss_files/em.pdf – mark leeds Oct 30 '19 at 15:57
  • I think if you are trying to estimate both the measurement and state covariances, the solution to the problem is not unique. In the simple scalar KF you can see that the KF takes a weighted average of the state prediction and the current measurement; the weights are determined by the covariances, so if you scaled the measurement and state covariances by the same factor they would still be consistent. – David Jun 27 '22 at 18:00

2 Answers


Actually, I have found something. Estimating the transition and observation covariances is indeed a known, difficult problem.

Wikipedia (https://en.wikipedia.org/wiki/Kalman_filter#Estimation_of_the_noise_covariances_Qk_and_Rk) says:

Practical implementation of the Kalman Filter is often difficult due to the difficulty of getting a good estimate of the noise covariance matrices $Q_{k}$ and $R_{k}$. Extensive research has been done in this field to estimate these covariances from data. One practical approach to do this is the autocovariance least-squares (ALS) technique that uses the time-lagged autocovariances of routine operating data to estimate the covariances. The GNU Octave and Matlab code used to calculate the noise covariance matrices using the ALS technique is available online under the GNU General Public License.
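In the same autocovariance spirit, though far simpler than full ALS, the random walk plus noise model admits a closed-form moment estimator: the differenced observations $\Delta y_{t} = \mu_{t-1} + \varepsilon_{t} - \varepsilon_{t-1}$ form an MA(1) process with $\gamma(0) = q + 2r$ and $\gamma(1) = -r$, so two sample autocovariances recover both variances. A minimal sketch (y is the observation series from the question):

import numpy as np

dy = np.diff(np.asarray(y, dtype=float))   # differenced observations, an MA(1)
dy = dy - dy.mean()
gamma0 = np.mean(dy * dy)                  # lag-0 sample autocovariance
gamma1 = np.mean(dy[1:] * dy[:-1])         # lag-1 sample autocovariance
r_hat = -gamma1                            # observation variance estimate
q_hat = gamma0 - 2.0 * r_hat               # transition variance estimate
print("q_hat:", q_hat, "r_hat:", r_hat)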

Also, similar things are discussed here:

Kalman Filter Covariance

and here

https://quant.stackexchange.com/questions/8501/how-to-tune-kalman-filters-parameter

  • I think this is referring to something different than the original question. In order to run a KF the user needs to supply the state and measurement covariances. That is different from estimating them from data. – David Jun 27 '22 at 17:57

I think there is a bug in your code. In KalmanFilter(), observation_matrices=H is probably not what we want. According to the reference, https://pykalman.github.io/#pykalman.KalmanFilter, observation_matrices is the coefficient $C$ in the observation equation, which is $1.0$ in our case; it is not the data series. Therefore, setting observation_matrices=[1.0] should fix the problem. I obtained

observation_covariance R : [[1.03979567]], transition_covariance Q : [[0.98697125]]
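Concretely, the corrected constructor call from the question becomes:

kf = KalmanFilter(
    transition_matrices=[1.],
    observation_matrices=[1.],   # the coefficient C, not the data
    initial_state_mean=3,
    initial_state_covariance=[1.],
    em_vars=['transition_covariance', 'observation_covariance']
)
kf = kf.em(Data['y'].values.astype(float))

With this change, the EM estimates land close to the true values of $1$, consistent with the output above.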

  • +1 Yes, good catch. The code looks like it's using the noiseless data for $C$ instead of just $1.0$ as you say. – Peter K. Jun 27 '22 at 15:48