
I experimented with the PageRank algorithm. When the number of pages is large, I ran into a situation where one formula for re-normalizing a vector (so that the sum of its components equals 1; the elements of the vector are guaranteed to be positive) works fine, while the other stops working: using it causes the iterations to diverge (the difference between the old and new r keeps growing) after a while.

    r = r./sum(r)                  # this does not work
    r = r + (1-sum(r))/N * ones(N) # this works
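
A quick sanity check (my own illustration, with made-up values): the two formulas only agree when `sum(r)` is already 1; otherwise they produce different vectors.

```julia
# The two formulas agree only when sum(r) is already 1.
r = [0.1, 0.3]                      # sums to 0.4, not 1
N = length(r)

r_scaled  = r ./ sum(r)             # multiplicative rescaling: ≈ (0.25, 0.75)
r_shifted = r .+ (1 - sum(r)) / N   # additive shift:           ≈ (0.40, 0.60)
```

Both results sum to 1, but they are different vectors, so the two lines cannot be interchangeable in general.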

My PageRank algorithm looks like this (Julia):

    # M is an NxN sparse transition matrix;
    # it may contain all-zero columns (sinks)
    N = size(M, 1)
    beta = 0.8
    epsilon = 1e-6

    r = fill(1/N, N)   # start from the uniform distribution
    rtm1 = ones(N)
    while sum(abs.(r - rtm1)) > epsilon
        rtm1 = r
        r = beta*M*r
        ######
        # renormalization
        # pick one or the other
        # r = r ./ sum(r)
        # r = r + (1 - sum(r))/N * ones(N)
        ######
    end
    # now r holds the result
# now r holds the result

I am guessing that for a large number of pages the elements of r can get very small (like 1.0e-7 or smaller), and the first normalization formula then does not work very well. I would like to hear an explanation of why that is from somebody with experience in numerical computations.

user7610

1 Answer


The two normalization formulas result in two different algorithms; the fact that they both "normalize" a vector is not so relevant.

As an example, consider the following transition matrix: $$M = \begin{pmatrix}0&1\\1&0\end{pmatrix}.$$ Starting with $r=(1,0)$, consecutive normalized vectors $r$ will be $(0,1)$ and $(1,0)$, so the algorithm clearly will not ever converge. In this case the transition matrix is not regular, and this is not a numerical issue.

With the second formula, $r$ will instead be $(\frac{1-\beta}{2},\frac{1+\beta}{2})$ after the first iteration, and $$ \tfrac12(1+(-\beta)^k, 1-(-\beta)^k) $$ after $k$ iterations. So this is computing a fixed point of a completely different function, and it clearly converges ($\beta<1$) where the first algorithm does not.
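
To make this concrete, here is a small Julia sketch (my own, not part of the original answer; `run_rule`, `rescale`, and `teleport` are hypothetical names) running both update rules on this $M$:

```julia
# Apply `normalize_rule` after each multiplication by beta*M, `steps` times.
function run_rule(normalize_rule, r, M, beta, steps)
    for _ in 1:steps
        r = normalize_rule(beta * M * r)
    end
    return r
end

M = [0.0 1.0; 1.0 0.0]   # the 2-cycle: no power of M has all positive entries
beta = 0.8

rescale(v) = v ./ sum(v)                      # first formula
teleport(v) = v .+ (1 - sum(v)) / length(v)   # second formula

r1 = run_rule(rescale, [1.0, 0.0], M, beta, 101)   # still oscillating: (0, 1)
r2 = run_rule(teleport, [1.0, 0.0], M, beta, 101)  # ≈ (1/2, 1/2)
```

The first rule keeps swapping the two components forever, while the second settles on the uniform stationary distribution, matching the closed form above.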

This behaviour is much easier to understand if you think of it in terms of a random walk on the underlying graph; this $M$ corresponds to the graph $1\rightleftharpoons2$. The first algorithm's random walk simply walks along the graph: the probability distribution of this walk is not guaranteed to converge to the stationary distribution. The second algorithm walks along the graph with probability $\beta$, and jumps to a random node with probability $1-\beta$: this walk is guaranteed to converge to a stationary distribution. The key property here is that, for the first algorithm to converge, some power of $M$ must have all positive entries.
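
That last condition (some power of $M$ entrywise positive) can be checked directly for small examples. A rough sketch, with `is_regular` being my own name and the $N^2$ power bound chosen generously rather than tightly:

```julia
# True if some power M^k (k <= maxpow) has all positive entries,
# i.e. the walk mixes and the plain power iteration can converge.
function is_regular(M; maxpow = size(M, 1)^2)
    P = copy(M)
    for _ in 1:maxpow
        all(p -> p > 0, P) && return true
        P = P * M
    end
    return false
end

is_regular([0.0 1.0; 1.0 0.0])   # false: powers alternate between M and I
is_regular([0.5 0.5; 1.0 0.0])   # true: M^2 is entrywise positive
```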

Kirill