7

There is a question that was asked on Stack Overflow that sounds simple at first, but I think it's a lot harder than it sounds.

Suppose we have a stationary random process that generates a sequence of random variables x[i] where each individual random variable has a Bernoulli distribution with probability p, but the correlation between any two of the random variables x[m] and x[n] is $\alpha^{|m-n|}$.

How is it possible to generate such a process? The textbook examples of a Bernoulli process (right distribution, but independent variables) and a discrete-time IID Gaussian process passed through a low-pass filter (right correlation, but wrong distribution) are very simple by themselves, but cannot be combined in this way... can they? Or am I missing something obvious? If you take a Bernoulli process and pass it through a low-pass filter, you no longer have a discrete-valued process.

(I can't create tags, so please retag as appropriate... stochastic-process?)

Jason S
  • 653
  • you could try to take a process $x(t)$ like an Ornstein-Uhlenbeck, that has a correlation structure that decreases exponentially, and then define $B_n = 1_{x(n) > \alpha}$ where $\alpha$ is a well-chosen threshold - I have not done the computations, but I have the feeling that the correlation between these Bernoulli random variables also decreases exponentially.

    Do you really need the correlation to be equal to $\alpha^{|m-n|}$ ? Would an exponentially decreasing correlation be enough for your particular purpose ?

    – Alekk Mar 15 '10 at 14:59
  • thx for the suggestion... I'm posting this on behalf of someone else (see the link in the 1st sentence) so I do not know the stringency of their requirements. The problem seemed simple enough to state that I felt I could translate into a "proper" problem statement for mathoverflow. – Jason S Mar 15 '10 at 15:20
  • ...and I had kind of the same hunch (make a continuous-value process, then use a threshold to produce a binary-value output) but don't quite know how to go about characterizing the output process w/r/t correlation, other than an empirical calculation on the computer. – Jason S Mar 15 '10 at 15:56
  • By the way, the SO problem is not $\alpha^{|m-n|}$, but $c|m-n|^{-\alpha}$. – Douglas Zare Mar 15 '10 at 18:09
  • Yes, that was pointed out to me... but I am suspicious + wondering if the OP meant alpha ^ |m-n|. Using the c |m-n| ^ (-alpha) formula, correlation is undefined for m=n. – Jason S Mar 15 '10 at 19:15
  • Looks like the OP did in fact mean c |m-n| ^ (-alpha) for m != n. Oh well, this is still an interesting question to me. :-) – Jason S Mar 16 '10 at 14:05
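
The thresholding hunch from the comments above can be checked empirically. Here is a minimal sketch, assuming the Ornstein-Uhlenbeck process is sampled at integer times (which gives a Gaussian AR(1) sequence) and assuming a threshold of zero, so that $p = 1/2$. The function names and the parameter `c` (the threshold, renamed to avoid a clash with $\alpha$) are made up for illustration. For a zero threshold, the classical arcsine formula for bivariate Gaussians gives the lag-$k$ correlation of the indicators as $\frac{2}{\pi}\arcsin(\rho^k)$, which the code compares against.

```python
import math
import random


def thresholded_ar1(n, rho, c=0.0, seed=None):
    """Binary sequence B_i = 1 if g_i > c, where g is a stationary Gaussian
    AR(1) process with unit variance and lag-one correlation rho."""
    rng = random.Random(seed)
    g = rng.gauss(0.0, 1.0)              # start from the stationary law
    scale = math.sqrt(1.0 - rho * rho)   # keeps the variance equal to 1
    out = []
    for _ in range(n):
        out.append(1 if g > c else 0)
        g = rho * g + scale * rng.gauss(0.0, 1.0)
    return out


def lag_correlation(xs, lag):
    """Sample Pearson correlation between xs[i] and xs[i + lag]."""
    a, b = xs[:len(xs) - lag], xs[lag:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / math.sqrt(va * vb)


def arcsine_prediction(rho, lag):
    """Indicator correlation at the given lag, valid only for c = 0."""
    return 2.0 / math.pi * math.asin(rho ** lag)
```

Comparing `lag_correlation(thresholded_ar1(200000, 0.8, seed=2), k)` with `arcsine_prediction(0.8, k)` for a few lags suggests that the correlation does decay roughly exponentially for large lags (since $\arcsin x \approx x$ for small $x$), but it is not exactly of the form $\alpha^{|m-n|}$.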

4 Answers

11

Here is a construction.

  • Let $\{Y_i\}$ be independent Bernoulli random variables with probability $p$.
  • Let $N(t)$ be a Poisson process chosen so that $P(N(1)=0)=\alpha$.
  • Let $X_i = Y_{N(i)}$.

In words, we have some radioactive decay which tells us when to flip a new (biased) coin. $X_n$ is the last coin flipped at time $n$. The correlation between $X_m$ and $X_n$ comes from the possibility that there are no decays between time $m$ and time $n$, which happens with probability $\alpha^{|m-n|}$.

The conditional correlation between $X_m$ and $X_n$, given the process $N$, is $1$ if $N(m) = N(n)$, and $0$ if $N(m)\ne N(n)$, so $\text{Cor}(X_n,X_m) = P(N(m)=N(n)) = \alpha^{|m-n|}.$

You can simplify this by saying that $N(i) = \sum_{t=1}^i B_t$ where $\{B_t\}$ are independent Bernoulli random variables which are $0$ with probability $\alpha$.
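
The simplified version of the construction can be sketched in a few lines of Python. This is only an illustrative sketch under the definitions above, not a tuned implementation; the function names `correlated_bernoulli` and `empirical_correlation` are made up here, and only the standard library is assumed.

```python
import random


def correlated_bernoulli(n, p, alpha, seed=None):
    """Sample X_1..X_n with X_i = Y_{N(i)}, N(i) = B_1 + ... + B_i.

    At each step the previous coin is kept with probability alpha (B_t = 0);
    otherwise (B_t = 1) a fresh Bernoulli(p) coin is flipped.
    """
    rng = random.Random(seed)
    x = 1 if rng.random() < p else 0          # the initial coin Y_0
    out = []
    for _ in range(n):
        if rng.random() >= alpha:             # B_t = 1: a "decay" occurred
            x = 1 if rng.random() < p else 0  # flip the next coin
        out.append(x)
    return out


def empirical_correlation(xs, lag):
    """Sample Pearson correlation between xs[i] and xs[i + lag]."""
    a, b = xs[:len(xs) - lag], xs[lag:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / (va * vb) ** 0.5
```

For a long sample, say `xs = correlated_bernoulli(200000, 0.3, 0.8, seed=1)`, the mean of `xs` should be close to $p$ and `empirical_correlation(xs, k)` should be close to $\alpha^k$, up to sampling noise.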

Douglas Zare
  • 27,806
  • 1
    fascinating! I think I understand... thanks! – Jason S Mar 15 '10 at 19:13
  • Brilliant answer – David Bar Moshe Mar 16 '10 at 09:44
  • 1
    Phrasing it in terms of a Poisson process seems overly complicated; the properties of Poisson processes aren't actually used. Couldn't one just phrase it as follows?

    Let $$X_{i+1} = \begin{cases} X_i & \text{with probability }\alpha; \\ \text{a new Bernoulli trial independent of }X_i & \text{with probability }1-\alpha. \end{cases} $$

    – Michael Hardy Jun 02 '10 at 20:36
6

In other words:

Start with a random variable $X_0$ Bernoulli with parameter $p$, random variables $Y_n$ Bernoulli with parameter $\alpha$, random variables $Z_n$ Bernoulli with parameter $p$, and assume that all these are independent. Define recursively the sequence $(X_n)_{n\ge0}$ by setting $X_{n+1}=Y_nX_n+(1-Y_n)Z_n$ for every $n\ge0$.

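Then, conditionally on the sequence $(Y_i)$, $X_n$ and $X_{n+k}$ are correlated if and only if $Y_i=1$ for every $i$ from $n$ to $n+k-1$, in which case $X_{n+k}=X_n$. This happens with probability $\alpha^k$, hence the correlation between $X_n$ and $X_{n+k}$ is $\alpha^k$ and you are done.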

This is Douglas Zare's idea, but with no Poisson process.
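
For readers who want the last step spelled out, here is a worked version under the definitions above, assuming $0<p<1$ so that the correlation is defined. The event $A$ and the index $J$ are notation introduced here for illustration. Let $A=\{Y_n=\cdots=Y_{n+k-1}=1\}$, so $P(A)=\alpha^k$ and $A$ is independent of $X_n$. On $A$, $X_{n+k}=X_n$. Off $A$, $X_{n+k}=Z_J$, where $J$ is the last index in $\{n,\dots,n+k-1\}$ with $Y_J=0$, and $Z_J$ is Bernoulli with parameter $p$ and independent of $X_n$. Each $X_n$ is itself Bernoulli with parameter $p$ (by induction, since $X_{n+1}$ is either $X_n$ or $Z_n$, with the choice made independently of both), so

$$\begin{aligned} E[X_nX_{n+k}] &= \alpha^k\,E[X_n^2] + (1-\alpha^k)\,E[X_n]\,E[Z_J] = \alpha^k p + (1-\alpha^k)p^2, \\ \text{Cov}(X_n,X_{n+k}) &= \alpha^k p + (1-\alpha^k)p^2 - p^2 = \alpha^k\,p(1-p), \\ \text{Cor}(X_n,X_{n+k}) &= \frac{\alpha^k\,p(1-p)}{p(1-p)} = \alpha^k. \end{aligned}$$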

Did
  • 5,701
3

I also suggest looking at the paper: Generating spike-trains with specified correlations, by Jakob Macke, Philipp Berens, et al. (Max Planck Institute for Biological Cybernetics).

Generating spike-trains with specified correlations

They also offer a MATLAB package for 'Sampling from multivariate correlated binary and Poisson random variables', which is also available at MATLAB Central:

Sampling from multivariate correlated binary and poisson random variables

Also look at the page link

3

The above solution is very nice, but relies on the very special structure of the desired process. In a much more general framework, I think that one could use a perfect simulation algorithm as described in:

Processes with long memory: Regenerative construction and perfect simulation, Francis Comets, Roberto Fernández, and Pablo A. Ferrari, Ann. Appl. Probab. 12, Number 3 (2002), 921-943.