4

I am doing some simulations for simple random walks on directed random graphs. From a graph of n vertices, I get a n by n transition probability matrix transit, with 2 n non-zero entries. I would like to compute the stationary distribution pi (a vector of n non-negative entries which sum to one), such that pi * transit == pi.

Currently, I found two ways to do this. The first is to use NullSpace.

eigenVector=First[NullSpace[N[transit]-IdentityMatrix[n]]];
pi=eigenVector/Total[eigenVector]

The second is to define a Markov Process directly on the underline graph g and use StationaryDistribution as follows.

n=VertexCount[g];
mp=DiscreteMarkovProcess[1,g];
stationary=StationaryDistribution[mp];
pi=NProbability[x==#,x\[Distributed]stationary]&/@Range[n]

Both methods can handle about n = 2000 on my laptop. But I would really like to compute this for a bit higher n. Any suggestions?


Update: For example g and transit with 1000, 2000, 3000 and 4000 nodes, see this link.

NonalcoholicBeer
  • 1,453
  • 7
  • 11

2 Answers2

7

You may run into memory problems because you are constructing a full matrix with IdentityMatrix[n] instead of a sparse one with IdentityMatrix[n, SparseArray]. By using a sparse identity matrix you can do 4000x4000 in about 8 seconds, without excessive memory use:

Dimensions[transit]
(*    {4000, 4000}    *)

eigenVector = First[NullSpace[ N[transit - IdentityMatrix[Dimensions[transit], SparseArray]]]]; // AbsoluteTiming // First (* 8.32465 *)

Just make sure you keep all matrices sparse and never construct a full ("normal") matrix.

Roman
  • 47,322
  • 2
  • 55
  • 121
  • Hi, thanks for pointing this out. With this improvement, I can now deal with 4000x4000 matrix, though with as much as 60 seconds time. I guess the big time difference is because there is a of problem with the examples I provided. I have updated them now. Can you try the 4000x4000 one again? – NonalcoholicBeer Nov 14 '20 at 12:42
  • 1
    Even after your update it still takes 8 seconds on my laptop. Maybe your computer is just slow. – Roman Nov 14 '20 at 14:43
  • Yeah, it could just be my laptop is bit slow. Thanks. I am happy with the current performance anyway. – NonalcoholicBeer Nov 14 '20 at 14:58
6

edit: it's even simpler since this is a discrete-time Markov chain

I didn't know Roman's trick with SparseArray. Here's another trick: use Method->"Arnoldi", asking for only the eigenvector corresponding to the largest eigenvalue (approx. one) using 1, which is the stationary distribution.

evNull=First[NullSpace[N[transit]-IdentityMatrix[4000,SparseArray]]];//AbsoluteTiming
(* 15.6608 -- I must need a faster computer! *)

evArnoldi=Eigenvectors[N@transit,1,Method->{"Arnoldi"}][[1]];//AbsoluteTiming (* 0.024535 *)

So there's another couple of orders-of-magnitude speedup for you. I'm not patient enough to wait for StationaryDistribution to finish!

The eigenvectors look the same when plotted BTW.

Chris K
  • 20,207
  • 3
  • 39
  • 74
  • 2
    In this case it would be advantageous to point the Arnoldi algorithm towards the "right" eigenvalue by using a Shift operation: see this answer. In the present example you got lucky to catch the right eigenvalue. – Roman Nov 14 '20 at 16:46
  • 1
    @Roman True in general, but in this case I don't think it's luck, because all eigenvalues should be nonpositive except the one associated with the stationary distribution, which should be zero (conservation of probability). See https://cims.nyu.edu/~holmes/teaching/asa19/handout_Lecture4_2019.pdf for that last point. At least I hope this is true, because we're using this technique in a paper now :) – Chris K Nov 14 '20 at 17:17
  • I should add, I'm a bit unclear on OP's actual problem, whether it is a continuous- or discrete-time process and the role of the -IdentityMatrix[n] part here. – Chris K Nov 14 '20 at 17:32
  • Agreed, you're right that if the matrix is constructed correctly, it will have the right spectral property. – Roman Nov 14 '20 at 17:35
  • 1
    @Roman Upon closer reading of the problem, it's clear that it's a discrete-time Markov chain, so the solution is even easier (and doesn't require IdentityMatrix at all). – Chris K Nov 14 '20 at 17:40
  • @ChrisK Yes. This is very fast. Thanks. – NonalcoholicBeer Nov 14 '20 at 18:07