
I have one particular question on Gaussian processes.

A Gaussian process is fully characterized by $\mu$ and $\Sigma$. However, I do not understand how we can sample a (random) function from the Gaussian process so defined.

For example, in a tutorial here, how can we generate the figures on page 6?

Mou

1 Answer


One of the usual procedures for sampling from a multivariate Gaussian distribution is as follows.

Let $\boldsymbol{X}$ have an $n$-dimensional Gaussian distribution $N(\mu,\Sigma)$. We wish to generate a sample from $\boldsymbol{X}$.

  1. First off, you need to find a matrix $A$ such that $\Sigma = AA^T$. One way to obtain it is the Cholesky decomposition, and $A$ is then called the Cholesky factor of $\Sigma$.

  2. Generate a vector $\boldsymbol{Z}=(Z_1,\ldots,Z_n)^T$ of independent, standard normal variables.

  3. Let $\boldsymbol{X} = \mu + A\boldsymbol{Z}$.

The $\boldsymbol{X}$ in step 3 will be the sample you are looking for.
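As a sketch only, the three steps above written out in Python/NumPy for a zero-mean process over a discretization grid. The squared-exponential covariance, the grid, and the tiny jitter on the diagonal are illustrative assumptions, not part of the recipe itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretize the input interval and build a covariance matrix from a
# squared-exponential kernel (an assumed, illustrative choice).
x = np.linspace(0.0, 5.0, 100)
Sigma = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
Sigma += 1e-10 * np.eye(len(x))   # jitter: guards against round-off

A = np.linalg.cholesky(Sigma)     # step 1: Sigma = A A^T
Z = rng.standard_normal(len(x))   # step 2: independent N(0,1) draws
X = A @ Z                         # step 3: X = mu + A Z, with mu = 0
```

The jitter term is a standard numerical safeguard: round-off can make an otherwise positive definite $\Sigma$ fail the Cholesky factorization.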

About the Cholesky factor: I don't know which program you use for simulation, but almost all statistical packages have a built-in function for computing the Cholesky factor; e.g., in both R and Matlab you can use the chol() function, and SAS also has a routine for finding the $A$ matrix.

Remark: It is a prerequisite for this method that $\Sigma$ be positive definite (instead of just positive semi-definite). Should you happen upon a degenerate case, you can reduce the dimension of your problem to a smaller space, since the distribution corresponding to a degenerate $\Sigma$ is itself degenerate.
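As a hedged sketch of one practical alternative for the semi-definite case, without explicitly reducing the dimension: an eigendecomposition $\Sigma = Q\,\mathrm{diag}(w)\,Q^T$ also yields a factor $A = Q\,\mathrm{diag}(\sqrt{w})$ with $\Sigma = AA^T$, and it works even when some eigenvalues are zero. The rank-1 example matrix below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# A rank-deficient (positive semi-definite) 3x3 covariance matrix.
v = np.array([1.0, 2.0, 3.0])
Sigma = np.outer(v, v)  # rank 1, so Cholesky would not apply directly

# Factor via the eigendecomposition; clip tiny negative eigenvalues
# that may appear from round-off.
w, Q = np.linalg.eigh(Sigma)
A = Q * np.sqrt(np.clip(w, 0.0, None))  # scales each column of Q

X = A @ rng.standard_normal(3)  # a sample with covariance Sigma
```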

Mankind
  • Thank you so much for replying. To generate a (random) function like those beautiful figures, we first take discrete points in an interval, then sample, and then interpolate. Am I right? – Mou Apr 03 '15 at 15:44
  • @Mou yes, agreed – Mankind Apr 03 '15 at 16:35
  • Great! Thanks again. – Mou Apr 03 '15 at 16:50
  • now I have a further question. I think intuitively, interpolating is only valid when the kernel is smooth, that is to say, x and y are strongly correlated (almost surely the same) if x and y are close enough. So is it true? or do we have “non-smooth” kernels for a GP? – Mou Apr 04 '15 at 07:29
  • @Mou I suppose there are no theoretical problems with a non-differentiable or even discontinuous kernel - but if you use a kernel which is not nice in some sense, this has an immediate effect on your covariance structure. Choosing, for instance, a discontinuous kernel will cause crazy discrete jumps in the covariance structure, which I don't imagine are very useful. So this is a vague answer on my part. :) You can have "non-smooth" kernels, but you had better make sure that they do what you intended. – Mankind Apr 04 '15 at 21:56
  • Just a comment: you can have a perfectly smooth kernel (covariance) and have samples that are continuous but nowhere differentiable with probability 1, e.g. the Ornstein-Uhlenbeck exponential kernel $K(x,y) = \exp{(-|x-y|/\xi)}$ – lcv Apr 19 '18 at 03:08
  • I suppose this is a separate question, but what is the physical significance of taking the Cholesky factor of a covariance matrix? Mathematically, the decomposition expresses the covariance matrix as the product of a lower triangular matrix and its conjugate transpose. But what is the physical significance of this lower triangular matrix? Why does multiplying it by a vector of independent, standard normal variables result in a 'sample' of the Gaussian process? Why would multiplying the original covariance matrix by this vector not work? – Mathews24 Jul 02 '19 at 03:18
  • This is the best description I have found, but simply seeking additional intuition. – Mathews24 Jul 02 '19 at 03:27
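The Ornstein-Uhlenbeck kernel mentioned in the comments can be plugged straight into the same Cholesky recipe; a sketch, with the grid and the correlation length $\xi$ as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Ornstein-Uhlenbeck kernel K(x, y) = exp(-|x - y| / xi); its samples
# are continuous but rough (nowhere differentiable in the limit).
x = np.linspace(0.0, 1.0, 200)
xi = 0.1
Sigma = np.exp(-np.abs(x[:, None] - x[None, :]) / xi)
Sigma += 1e-10 * np.eye(len(x))  # jitter against round-off

A = np.linalg.cholesky(Sigma)
sample = A @ rng.standard_normal(len(x))  # one rough GP path
```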