Google Pagerank and Reduced-Order Modelling

A thesis submitted for the degree of
BACHELOR OF SCIENCE
in
APPLIED MATHEMATICS

by

HUGO DE LOOIJ

Delft, Netherlands
June 2013
Contents

2 Preliminaries
5 Numerical Experiments
  5.1 Data
  5.2 Convergence criteria
  5.3 Numerical Results
  5.4 Discussion
8 References
A Appendix
  A.1 Constructing the hyperlink/Google matrices
    A.1.1 surfer.m
    A.1.2 loadCaliforniaMatrix.m
    A.1.3 loadStanfordMatrix.m
    A.1.4 loadSBMatrix.m
    A.1.5 DanglingNodeVector.m
    A.1.6 GoogleMatrix.m
  A.2 Numerical Algorithms
    A.2.1 PowerMethod.m
    A.2.2 JacobiMethodS.m
    A.2.3 JacobiMethodH.m
    A.2.4 OptimizedJacobiMethodH.m
    A.2.5 OptimalBeta.m
    A.2.6 Compare.m
  A.3 Numerical Methods for the expected PageRank vector
    A.3.1 OptimizedPowerMethod.m
    A.3.2 Arnoldi.m
    A.3.3 ReducedOrderMethod.m
    A.3.4 Compare2.m
    A.3.5 PlotResidual.m
1 Introduction
The world wide web consists of almost 50 billion web pages. From this chaos of websites,
a user typically wants to visit only a handful of sites about a certain subject. But how could a user
possibly know which of these billions of web pages to visit?
This is where search engines come into play. Not only do search engines filter the world wide
web for a specific search query, they also sort the web pages by importance. In this way, a search
engine can suggest certain web pages to the user. But how can a web page be rated by
importance? Since the world wide web is so huge, this cannot be done manually.
Hence algorithms need to be developed so that this job can be done automatically.
Google was the first search engine that managed to do this effectively. The secret behind
Google is the PageRank method. This method was developed in 1996 by the founders of
Google, Larry Page and Sergey Brin. Every page on the world wide web is given a score of
importance (also called the PageRank value). Whenever a user makes a search request,
Google sorts all web pages which satisfy the request by their PageRank value, and the
user receives the search results in this order.
The Google PageRank method is a completely objective way to rate a web page. It is based
on a mathematical model which only uses the structure of the internet. In this paper, this
method is discussed. We will see that the PageRank vector (the vector containing each
PageRank value) is defined as an eigenvector of the so-called Google matrix.
The Google matrix, however, is extremely large; it is almost 50 billion by 50 billion in
size. Therefore, calculating an eigenvector is far from an easy task. In chapter 4 we will
develop and discuss numerical methods to approximate this eigenvector as fast as possible.
These numerical methods are then put to the test, using several small Google matrices
corresponding to subsets of web pages in the world wide web. The tests and results
can be found in chapter 5.
In the final part of this thesis we make an adjustment to the model. Up to that point, one of the
parameters in the model is treated as a deterministic value. Things get much more
interesting if this parameter is considered to be stochastic. The PageRank vector depends
on the value of this parameter, and our goal will be to calculate the expected PageRank
vector. We will develop an algorithm which uses the shift-invariance of the Krylov space of
the hyperlink matrix. The expected PageRank can then be approximated using this algorithm.
This reduced-order algorithm will turn out to be far more efficient than the 'standard'
approximations.
It is worth noting that the whole theory in this paper is based on the original paper of Google
PageRank by Brin and Page in 1998, see [1]. It is almost certain that Google has further
developed this model. The current model is kept secret, and thus the theory and results
might not be completely up to date. It has been stated by Google, however, that the current
algorithm used by Google is still based on the original PageRank model.
2 Preliminaries
Thus, every eigenvalue of A is contained in the union of the disks with center $a_{ii}$ and radius $\sum_{j=1,\, j\neq i}^{n} |a_{ij}|$.
Theorem 6 (Geometric series for matrices). Suppose A is an n × n matrix such that
its spectral radius satisfies ρ(A) < 1 (i.e. |λ| < 1 for all eigenvalues λ of A). Then $\sum_{n=0}^{\infty} A^n$ converges
and
$$(I - A)^{-1} = \sum_{n=0}^{\infty} A^n$$
For the PageRank model, we will also discuss Markov chains. The transition matrix of a
Markov chain is a (sub)-stochastic matrix, and it is important to look at some properties of
these matrices.
Definition 7. Let A be an n × n (sub-)stochastic matrix. A is said to be irreducible if for
each pair (i, j) there exists some n ∈ N such that $(A^n)_{ij} > 0$. If this property does not hold, A
is called reducible.
3 The PageRank Method
The PageRank method is an algorithm designed by the founders of Google, Sergey Brin and
Larry Page. This algorithm is the main reason why Google turned out to be the most popular
search engine. Every page on the internet is given a score of importance. The larger this
number, the earlier this page can be found in the search results. In the following paragraphs
we will discuss how this score is being determined.
PageRank is based on the hyperlink structure of the world wide web. We can view the
internet as a directed graph. The nodes should be interpreted as the web pages of the world
wide web and an arc from node i to node j should be seen as a hyperlink on webpage i to
webpage j. See figure 1 for a small example. We will use this example to illustrate a couple
of definitions.
The goal of Brin and Page, the founders of Google, was to construct a method to rank each
page on the internet. They invented the PageRank method, which uses only the structure
of the internet. If many different webpages all link to the same page (such as node 2 in our
example), it makes sense to view this page as important.
However, the number of links to a certain page is easily manipulated and does not by itself give
a clear idea of how important a page is. Brin and Page were the first to also look at the
quality of a link. If an important page links to an external webpage, this webpage
should be marked as important as well. This idea, that not only the quantity but also the quality
of incoming links is relevant for the PageRank value of a webpage, is the main reason why
Google PageRank is such a successful model.
The PageRank method is based on a random surfer. Suppose we are surfing on the internet,
clicking links randomly. This can be interpreted as a Markov chain where the transition
probability $p_{ij}$ is uniformly distributed over the set of pages that i links to. We repeat
this process infinitely often. The fraction of the time the surfer visits a certain
website can be seen as the PageRank of this site. Clearly, this fraction is greater for a
webpage if more pages link to it. In our example (see figure 1) node 2 will have a
greater PageRank. However, every time the random surfer visits this node, it will also visit
node 1. Hence node 1 will have a good PageRank value as well, even though it has only one
incoming link. This illustrates why the quality of links also matters.
These fractions - if they exist - form a stationary distribution corresponding to the Markov
chain of the random surfer. Denote this distribution by p. We know from Markov theory
that this distribution satisfies the linear system pT = pT A, with A the transition matrix of
the Markov chain. This equality allows us to define the PageRank vector in a more rigorous
way using only linear algebra. Because a stationary distribution does not need to exist nor
does it need to be unique, some modifications to the model need to be made before the
PageRank vector can be defined.
3.3 Ranking

A ranking of the n web pages is a vector p such that

• $p_i \geq 0$ for all $1 \leq i \leq n$,

• $\sum_{i=1}^{n} p_i = 1$,

• $p_i = \sum_{j \in I(i)} \frac{p_j}{|U(j)|}$ for each page i.

Here I(i) is the collection of pages which link to node i and |U(j)| is the number of outgoing
links on page j. This is well defined, because if |U(j)| = 0, page j does not link to
any other page, hence $j \notin I(i)$.
In our example of figure 1, these conditions yield the linear system
$$p_1 = p_2, \qquad p_2 = \tfrac{1}{2}p_1 + p_3 + \tfrac{1}{2}p_4 + p_5, \qquad p_3 = 0, \qquad p_4 = \tfrac{1}{2}p_1, \qquad p_5 = \tfrac{1}{2}p_4$$
A solution of this system such that $\sum_{i=1}^{5} p_i = 1$ and $p_j \geq 0$ for all $1 \leq j \leq 5$ is the vector
$$p = \begin{pmatrix} 4/11 \\ 4/11 \\ 0 \\ 2/11 \\ 1/11 \end{pmatrix}$$
One can easily show that for any other solution q of this linear system, we have q = kp for
some k ∈ R. The property $\sum_{i=1}^{5} p_i = 1$ guarantees the uniqueness of the ranking p. The
larger the number $p_i$, the more important page i. In our example, pages 1 and 2 should appear
as the top results, followed by pages 4, 5 and 3 respectively.
3.4 The hyperlink matrix H
We have seen that p is a ranking if it satisfies a linear system. Therefore, we will construct a
matrix so that this definition can be stated more conveniently. Define the (Boolean) variable
$L_{ij}$ by $L_{ij} = 1$ if page i links to page j, and $L_{ij} = 0$ otherwise. Let n be the number of web
pages.
Definition 11. The out degree of a page i is defined by $c_i = \sum_{j=1}^{n} L_{ij}$.

The out degree of a page is simply the number of pages that it links to. On the world wide
web, this number is usually between 0 and 20, but it can vary greatly. As we will see, the case $c_i = 0$ will
require some caution.
Definition 12. A node i is called a dangling node if its out degree ci = 0.
Dangling nodes are those pages that do not link to any other pages. See figure 2. In this
example, page 6 is a dangling node. The world wide web consists mainly of dangling nodes:
images, scripts, pdf files and other common files found on the internet often do not link to
any other page. We will see in the next paragraph that dangling nodes require us to make
an adjustment to the model of the random surfer.
Definition 13. We define the n × n hyperlink matrix H by
$$H_{ij} = \begin{cases} \dfrac{1}{c_i} & \text{if } c_i \neq 0 \text{ and } L_{ij} = 1 \\[4pt] 0 & \text{otherwise} \end{cases}$$
The following matrix is the hyperlink matrix corresponding to the web of this figure. Note
that page 6 is a dangling node; in the matrix H, this corresponds to a zero row.
$$H = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 & 0 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 1/3 & 1/3 & 0 & 0 & 1/3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
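As an illustration, the hyperlink matrix can be assembled directly from the 0/1 link variables $L_{ij}$. The following Matlab fragment is only a sketch (the crawler and loader routines in appendix A.1 work differently); it reproduces the matrix above for this six-page example web.

% Sketch: build the sparse hyperlink matrix H of Definition 13 from a 0/1 link matrix.
L = sparse([0 1 1 0 0 0;    % row i lists the pages that page i links to
            0 0 1 0 1 0;
            0 1 0 0 0 0;
            0 0 0 0 0 1;
            0 1 1 0 0 1;
            0 0 0 0 0 0]);
n = size(L, 1);
c = full(sum(L, 2));                      % out degrees c_i (Definition 11)
H = spdiags(1 ./ max(c, 1), 0, n, n) * L; % divide each row by c_i; zero rows stay zero
full(H)                                   % reproduces the hyperlink matrix above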
Theorem 14. Let p be a ranking and let H be the corresponding hyperlink matrix. Then
$p^T H = p^T$.

Proof: For the ith component of $p^T$ we have
$$(p^T)_i = p_i = \sum_{j \in I(i)} \frac{1}{|U(j)|} p_j = \sum_{j \in I(i)} \frac{1}{c_j} p_j = \sum_{j \in I(i)} H_{ji} p_j = \sum_{j=1}^{n} H_{ji} p_j = (H^T p)_i = (p^T H)_i$$
Note that by taking the transpose of this equality, we get the following more convenient
equality
$$H^T p = p$$
• Generally, H is reducible. This is because the world wide web most likely consists of
multiple disjoint subsets.
• Generally, H is periodic: it is enough that a single node has period greater than or equal
to 2, which happens, for example, if two web pages link to each other but to no other node.
• H is a very sparse matrix. The world wide web consists of multiple billions of pages,
but every page links to only a very few other pages. So most components Hij are
equal to zero. An advantage of this is that it requires much less memory to store and
calculate this matrix. Also, computations with this matrix are much faster. We will
use these properties of H to minimise computation time and memory storage issues
for the methods to calculate the PageRank value.
3.5 The stochastic matrix S

Definition 15. A personalisation vector v is an n × 1 vector such that $v_i > 0$ for all i and $\sum_{i=1}^{n} v_i = 1$.

We can also denote the second property by $e^T v = 1$. Here e is the n × 1 uniform vector:
every element of e is equal to 1. This notation will turn out to be more convenient in the
proofs.

Just like the name suggests, the personalisation vector can be different for different users. A
component $v_i$ for a specific person should be seen as the probability that he or she goes to page
i. If a person very commonly searches for information about sport, then $v_i$ will generally be
larger for a page which is about sport. The personalisation vector can therefore be chosen
to adapt the PageRank vector to the interests of a person.
Definition 16. The n × 1 dangling node vector d is given by
$$d_i = \begin{cases} 1 & \text{if } c_i = 0 \\ 0 & \text{if } c_i \neq 0 \end{cases}$$

Definition 17. The stochastic matrix S is defined by
$$S = H + dv^T$$
The matrix S is essentially equal to H, but every zero row is replaced by the personalisation
vector. The interpretation is that whenever the random surfer visits a dangling node,
it jumps to another web page randomly according to the personalisation (probability) vector.
Usually, we suppose that v = e/n, i.e. the transition probability $p_{ij}$ is uniformly distributed
if node i is a dangling node. This is the 'democratic' personalisation vector. In this case,
our example in figure 2 yields the stochastic matrix
$$S = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 & 0 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 1/3 & 1/3 & 0 & 0 & 1/3 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{pmatrix}$$
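Continuing the small example, the dangling node vector and the matrix S can be formed as follows. This is again only a sketch (for a realistic web one keeps H, d and v separate instead of forming the dense outer product); it assumes the variables H and n from the previous fragment.

% Sketch: dangling node vector d and stochastic matrix S = H + d*v' for the example.
d = double(full(sum(H, 2)) == 0);   % d_i = 1 exactly when row i of H is a zero row
v = ones(n, 1) / n;                 % 'democratic' personalisation vector v = e/n
S = H + d * v';
sum(S, 2)'                          % every row sum equals 1, so S is stochastic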
We will consider a couple of important properties of S.
• The matrix S is stochastic; every element of S is non-negative and the sum of every
row of S is exactly equal to 1. This is easy to see:

1. If $c_i \neq 0$, then $d_i = 0$ and $\sum_{j=1}^{n} H_{ij} = 1$. It follows that $\sum_{j=1}^{n} S_{ij} = \sum_{j=1}^{n} H_{ij} + 0 = 1$.

2. If $c_i = 0$, then $d_i = 1$ and $\sum_{j=1}^{n} H_{ij} = 0$. Since v is a personalisation vector, we
have $\sum_{j=1}^{n} v_j = 1$. It follows that $\sum_{j=1}^{n} S_{ij} = \sum_{j=1}^{n} H_{ij} + d_i \sum_{j=1}^{n} v_j = 0 + 1 \cdot 1 = 1$.
The last step of Brin and Page was to force the matrix to become irreducible and aperiodic.
This guarantees the uniqueness of the PageRank vector. We introduce the possibility of
teleportation for the random surfer model.
Definition 18. The teleportation parameter α is the probability that the random
surfer will follow a link. We require that 0 ≤ α < 1.
This parameter α represents the probability that a link is being clicked. The random surfer
will with a probability of 1 − α go to a random page on the internet according to the
distribution of the personalisation vector. This kind of teleportation happens when, for
example, a user enters a different link in the address bar.
For small values of α, the random surfer will almost always teleport to another page. The
random surfer will then rarely click links, and thus the structure of the web will hardly affect
the PageRank vector. If, on the other hand, α is close to one, the numerical methods
to approximate the PageRank vector converge very slowly, as we will see in chapters 4 and 5.
Therefore, Google [1] used α = 0.85 for their model. In chapter 6 we will discuss what
happens if α is considered to be a stochastic variable instead of a deterministic value.
Definition 19. The Google matrix based on the personalisation vector v and teleportation
parameter α is defined as
G = αS + (1 − α)evT
Thus, this matrix corresponds to the transition matrix of the Markov chain with teleporta-
tion. This matrix has some very important properties.
• G is stochastic. This follows from the fact that both S and evT are stochastic. G is
a convex combination of the two and therefore is stochastic as well.
• The matrix G is nowhere sparse. Because v > 0 and α < 1, every element satisfies
$G_{ij} \geq (1 - \alpha)v_j > 0$. This seems a very undesirable property for computations, but as we
will see, it causes no problems: one never needs to store G, and matrix-vector
products with G can be computed very efficiently using definition 19.

• The matrix G is irreducible. This follows from the fact that each element $G_{ij}$ is larger
than zero. For each combination of nodes i and j, the probability that the surfer will
visit node j immediately after leaving node i is at least equal to $(1 - \alpha)v_j > 0$.
• G is aperiodic. This is implied by the fact that Gii > 0 for all nodes i.
• Therefore, G is primitive.
3.7 PageRank
The matrix G is stochastic and primitive. Hence, we will define the PageRank vector as the
unique stationary distribution of the corresponding Markov chain.
Definition 20. (PageRank). The PageRank vector π is the unique vector which satisfies:
• $G^T \pi = \pi$,

• $\sum_{i=1}^{n} \pi_i = 1$. We can also write this as $\pi^T e = 1$.
To prove that this vector is uniquely defined, we will show that 1 is an eigenvalue of GT with
an algebraic multiplicity of one. For this, we will use the Perron-Frobenius theorem. See [5]
for a proof of this theorem.
Theorem 21 (Perron-Frobenius). Let A be an n × n primitive matrix. Then there
exists an eigenvalue λ which is strictly greater in absolute value than all other eigenvalues.
Furthermore, all elements of an eigenvector corresponding to this eigenvalue have the same
sign.
We will prove a few lemmas. These lemmas will be used to prove some properties of the
spectrum of the matrix $G^T$. Furthermore, these lemmas will be reused in chapter 4 to prove
some other important theorems.
Lemma 22. For each λ ∈ C with |λ| > α, the matrix λI − αS T is invertible.
Proof: Let λ ∈ C such that |λ| > α. We will use Gershgorin's Circle Theorem to show that 0
is not an eigenvalue of M := λI − αS, which implies that M is invertible. Let 1 ≤ i ≤ n be
arbitrary. Note that the diagonal element $M_{ii}$ is equal to $M_{ii} = \lambda - \alpha S_{ii}$. Furthermore, we
have
$$\sum_{j \neq i} |M_{ij}| = \sum_{j \neq i} \alpha S_{ij} = \alpha\left(\sum_{j=1}^{n} S_{ij} - S_{ii}\right) = \alpha(1 - S_{ii}) = \alpha - \alpha S_{ii}$$
Since |λ| > α, we have $|M_{ii}| = |\lambda - \alpha S_{ii}| \geq |\lambda| - \alpha S_{ii} > \alpha - \alpha S_{ii}$. In other words, 0 does not lie in the disk with center $M_{ii}$ and radius $\sum_{j \neq i} |M_{ij}|$. Since i was
arbitrary, the union of these disks does not contain 0. Hence, by Gershgorin's Theorem, zero
is not an eigenvalue of M. Finally, by Lemma 2, 0 is not an eigenvalue of $\lambda I - \alpha S^T$ either,
which proves the Lemma.
Lemma 23. Suppose p is an eigenvector of $G^T$ corresponding to an eigenvalue λ with |λ| > α. Then
$$p = (1 - \alpha)(e^T p)(\lambda I - \alpha S^T)^{-1} v$$
Proof: We have
$$\lambda p = G^T p = (\alpha S + (1 - \alpha)e v^T)^T p = (\alpha S^T + (1 - \alpha)v e^T)p = \alpha S^T p + (1 - \alpha)v(e^T p)$$
$$\Longrightarrow \quad (\lambda I - \alpha S^T)p = (1 - \alpha)(e^T p)v$$
Note that the matrix $\lambda I - \alpha S^T$ is invertible by Lemma 22, since we have assumed that
|λ| > α. This yields
$$p = (1 - \alpha)(e^T p)(\lambda I - \alpha S^T)^{-1} v$$
which completes the proof.
Lemma 24. Suppose λ is an eigenvalue of $G^T$ and |λ| > α. Then $(1-\alpha)v^T(\lambda I - \alpha S)^{-1}e = 1$.

Proof: Let p be an eigenvector corresponding to such an eigenvalue λ with |λ| > α. Note that if $e^T p = 0$ were to hold, Lemma 23 would yield p = 0, which is
not an eigenvector. Thus, we can assume that $e^T p \neq 0$. Define $q = \frac{p}{e^T p}$. This is still an
eigenvector of $G^T$ corresponding to the same eigenvalue λ. Since $e^T q = 1$, it follows from
Lemma 23 that
$$q = (1 - \alpha)(\lambda I - \alpha S^T)^{-1} v$$
It follows that
$$(1 - \alpha)v^T(\lambda I - \alpha S)^{-1}e = (1 - \alpha)e^T\left((\lambda I - \alpha S)^{-1}\right)^T v = (1 - \alpha)e^T\left((\lambda I - \alpha S)^T\right)^{-1} v = e^T(1 - \alpha)(\lambda I - \alpha S^T)^{-1} v = e^T q = 1$$

This lemma can be used to give an upper bound for each eigenvalue.
Theorem 25 (Eigenvalues of the Google matrix). Let G be the Google matrix and
denote λ1 , λ2 , . . . , λn as the eigenvalues of GT in descending absolute value. Then:
1. λ1 = 1, and
2. |λ2 | ≤ α.
Proof: The first property is fairly straightforward. Since G is a stochastic matrix, we have
Ge = e by Lemma 5. Furthermore, by the same Lemma, any eigenvalue λ of G satisfies
|λ| ≤ 1. So 1 is the largest eigenvalue of G. The spectrum of the transpose of a matrix is
equal to the spectrum of the matrix itself, hence λ1 = 1.
We will use Lemma 24 to prove the second statement. Suppose λ is an eigenvalue of $G^T$ with
|λ| > α. It follows that
$$(1 - \alpha)v^T(\lambda I - \alpha S)^{-1}e = \frac{1 - \alpha}{\lambda}\, v^T\left(I - \frac{\alpha}{\lambda}S\right)^{-1}e = \frac{1 - \alpha}{\lambda}\, v^T \sum_{n=0}^{\infty}\left(\frac{\alpha}{\lambda}\right)^n S^n e = \frac{1 - \alpha}{\lambda} \sum_{n=0}^{\infty}\left[\left(\frac{\alpha}{\lambda}\right)^n v^T S^n e\right]$$
$$= \frac{1 - \alpha}{\lambda} \sum_{n=0}^{\infty}\left[\left(\frac{\alpha}{\lambda}\right)^n v^T e\right] = \frac{1 - \alpha}{\lambda} \sum_{n=0}^{\infty}\left(\frac{\alpha}{\lambda}\right)^n = \frac{1 - \alpha}{\lambda} \cdot \frac{1}{1 - \alpha/\lambda} = \frac{1 - \alpha}{\lambda - \alpha}$$
The sum $\sum_{n=0}^{\infty} \left(\frac{\alpha}{\lambda} S\right)^n$ is convergent by Theorem 6, since $\rho(\frac{\alpha}{\lambda}S) = \frac{\alpha}{|\lambda|}\rho(S) < \rho(S) \leq 1$. The last inequality
follows from Lemma 5. Since S is stochastic, Se = e; applying this repeatedly gives
$S^n e = e$ for any n ∈ N, so that $v^T S^n e = v^T e = 1$. Lemma 24 yields that $(1 - \alpha)v^T(\lambda I - \alpha S)^{-1}e = 1$, hence
$\frac{1-\alpha}{\lambda - \alpha} = 1$ must hold. So λ = 1.
Figure 3: The spectrum of a Google matrix.
Figure 3 illustrates the last theorem. The figure contains all eigenvalues (in blue) of the
5000 × 5000 (mathworks.com) Google matrix (see also paragraph 5.1). All eigenvalues but
one are contained in the red disk of radius α. The other eigenvalue is exactly equal to 1.
Furthermore, if G has at least two irreducible closed subsets, the second eigenvalue λ2 is
exactly equal to α. For a proof, see [6].
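Theorem 25 can also be checked numerically on a small web. The fragment below is only a sketch using the 6-page example matrices S and v from chapter 3 (not the mathworks.com matrix of the figure), with alpha chosen as 0.85.

% Sketch: verify lambda_1 = 1 and |lambda_2| <= alpha for a small Google matrix.
alpha = 0.85;
G = alpha * S + (1 - alpha) * ones(n, 1) * v';   % G = alpha*S + (1-alpha)*e*v'
lam = sort(abs(eig(full(G))), 'descend');
[lam(1), lam(2), alpha]                          % lam(1) = 1 up to rounding, lam(2) <= alpha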
By the Perron-Frobenius Theorem (Theorem 21), each component of the eigenvector corresponding to eigenvalue 1 has
the same sign, so we can scale this vector (denote it by π) such that $\pi^T e = 1$. Then π is
(strictly) positive, so it is indeed a probability vector. Thus, the PageRank vector is well
defined. The fact that the second eigenvalue satisfies $|\lambda_2| \leq \alpha$ has some important consequences.
For example, the convergence speed of the Power Method depends on the second eigenvalue
of $G^T$. We will discuss this in the next chapter.
4 Calculating the PageRank vector
The PageRank vector is the unique vector π such that GT π = π and π T e = 1. The goal of
Google was to be able to rank all webpages using the PageRank model. This corresponds
with calculating the eigenvector π. However, the world wide web consists of tens of billions
of pages, so calculating an eigenvector is not an easy task. In this chapter we will look at
some numerical methods to calculate this vector as fast as possible. First, we will discuss
the Power Method. Secondly, we will see that the PageRank vector satisfies a simple linear
system, which we will solve with the Jacobi Method.
The Power Method starts from a vector $x^{(0)}$ and repeatedly multiplies by $G^T$: it computes $x^{(k+1)} = G^T x^{(k)}$ for $k = 0, 1, 2, \ldots$

Theorem 27. The Power Method always converges to the PageRank vector π, assuming
that the starting vector $x^{(0)}$ satisfies $e^T x^{(0)} = 1$.

Proof: Suppose that the eigenvectors $v_1, v_2, \ldots, v_n$ of $G^T$ form a basis of $\mathbb{R}^n$. Then we can
write the starting vector as $x^{(0)} = \sum_{i=1}^{n} c_i v_i$ for some coefficients $c_1, \ldots, c_n \in \mathbb{R}$. It follows
that
$$x^{(1)} = G^T x^{(0)} = G^T \sum_{i=1}^{n} c_i v_i = \sum_{i=1}^{n} c_i G^T v_i = \sum_{i=1}^{n} c_i \lambda_i v_i$$
$$x^{(2)} = G^T x^{(1)} = \sum_{i=1}^{n} c_i \lambda_i G^T v_i = \sum_{i=1}^{n} c_i \lambda_i^2 v_i$$
$$\vdots$$
$$x^{(k)} = G^T x^{(k-1)} = \sum_{i=1}^{n} c_i \lambda_i^{k-1} G^T v_i = \sum_{i=1}^{n} c_i \lambda_i^k v_i$$
By Theorem 25, we know that $\lambda_1 = 1$ and $|\lambda_i| \leq \alpha$ for all $2 \leq i \leq n$. So $\lambda_i^k$ goes to zero for $i \geq 2$ as
k tends to infinity. We get
$$\lim_{k \to \infty} x^{(k)} = c_1 v_1$$
We assumed that $e^T x^{(0)} = 1$ holds. Suppose $e^T x^{(i)} = 1$ for some i ∈ N. Then we also have
$e^T x^{(i+1)} = e^T G^T x^{(i)} = (Ge)^T x^{(i)} = e^T x^{(i)} = 1$, because G is a stochastic matrix. Hence
$e^T x^{(i)} = 1$ for all i ∈ N. So this is also true for the limit: $e^T(c_1 v_1) = 1$. Since $c_1 v_1$ is an
eigenvector of $G^T$ corresponding to the eigenvalue 1 whose entries sum to one, it is the PageRank vector π.
Obviously, it might happen that we cannot write $x^{(0)}$ as a linear combination of eigenvectors
(i.e. $G^T$ is not diagonalisable). In this case, we can write $G^T = PJP^{-1}$, where J is the Jordan
form of $G^T$ and P is the corresponding matrix containing the generalized eigenvectors. Each
block $J_m$ of J can be written as $\lambda I_m + N$, where N is the matrix of all zeros except on its
superdiagonal. This matrix N is nilpotent and if |λ| < 1, then $J_m^k \to 0$ as k → ∞. There is only
one eigenvalue of $G^T$ which does not satisfy |λ| < 1, namely $\lambda_1 = 1$. So the Power Method
will converge to the eigenvector corresponding to this eigenvalue, which is the PageRank
vector.
Theorem 28. Suppose that the starting vector satisfies $e^T x^{(0)} = 1$. Then the error of the Power
Method satisfies $\|x^{(k)} - \pi\| = O(\alpha^k)$.

Proof: Once again, assume that $x^{(0)} = \sum_{i=1}^{n} c_i v_i$ for some constants $c_i$, with $c_1 v_1 = \pi$. Let
$M = \max\{\|c_i v_i\| : 2 \leq i \leq n\}$. By Theorem 25, $\lambda_1 = 1$ and $|\lambda_i| \leq \alpha$ for all $2 \leq i \leq n$. We find that
$$\|x^{(k)} - \pi\| = \left\|\pi + \sum_{i=2}^{n} c_i \lambda_i^k v_i - \pi\right\| = \left\|\sum_{i=2}^{n} c_i \lambda_i^k v_i\right\| \leq \sum_{i=2}^{n} \|c_i \lambda_i^k v_i\| = \sum_{i=2}^{n} |\lambda_i|^k \|c_i v_i\| \leq \sum_{i=2}^{n} \alpha^k M = (n-1)M\alpha^k = O(\alpha^k)$$
For the Power Method, a starting vector $x^{(0)}$ is needed. Clearly, if $x^{(0)}$ is close to the
PageRank vector π, the error is already small after the first iteration, and the Power Method
converges faster. It makes sense to use $x^{(0)} = v$ as the starting vector, since the random
surfer prioritises some web pages over others according to the personalisation vector.
Another option is to use $x^{(0)} = e/n$.
The Power Method is a very simple numerical method to calculate the PageRank vector.
This method only requires the repetitive calculation of matrix-vector products. However,
this needs to be done with the matrix GT , which is nowhere sparse as we have stated before.
To prevent storage issues and slow computation time, it is preferred not to use the matrix
G but the very sparse matrix H. The matrix G is by definition equal to
$$G = \alpha S + (1 - \alpha)ev^T = \alpha(H + dv^T) + (1 - \alpha)ev^T = \alpha H + (\alpha d + (1 - \alpha)e)v^T$$
Now, it follows that
$$G^T x = \left(\alpha H + (\alpha d + (1 - \alpha)e)v^T\right)^T x = \alpha H^T x + v(\alpha d^T + (1 - \alpha)e^T)x = \alpha H^T x + \alpha v(d^T x) + (1 - \alpha)v(e^T x) = \alpha H^T x + \left(1 - \alpha + \alpha(d^T x)\right)v$$
Here we used the fact that $e^T x = 1$, which holds because the starting vector, and therefore also
every subsequent iterate, satisfies this equality. This gives us an easy and fast method
to approximate the PageRank vector using just the very sparse hyperlink matrix H and
the dangling node vector d.
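The following Matlab fragment sketches the resulting iteration, in the spirit of PowerMethod.m in appendix A.2.1 (the appendix code may differ in details). The tolerance, the iteration cap and the variable names are illustrative choices; H, d, v and alpha are assumed as constructed above.

% Sketch: Power Method using only the sparse matrix H and the dangling node vector d.
alpha = 0.85;  tol = 1e-5;
x = v;                                    % starting vector, e'*x = 1
for k = 1:1000                            % safety cap on the number of iterations
    xnew = alpha * (H' * x) + (1 - alpha + alpha * (d' * x)) * v;   % G'*x without forming G
    if norm(xnew - x, 1) < tol            % one-norm stopping test (cf. chapter 5)
        break
    end
    x = xnew;
end
pi_power = xnew;                          % approximation of the PageRank vector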
In the proof of Lemma 23, we have seen that the PageRank vector can also be calculated in
a different way. We will further discuss this method now.
Theorem 29. The PageRank vector π is equal to
$$\pi = (1 - \alpha)(I - \alpha S^T)^{-1} v$$
Proof: Since $e^T \pi = 1$ holds by definition of the PageRank vector, this theorem is a direct
consequence of Lemma 23 applied with λ = 1.
The matrix I − αS T is an important matrix with a lot of useful properties. The following
properties are worth noting.
• All eigenvalues of I − αS T lie in the disk with center 1 and radius α. This follows
directly from Gershgorin’s Theorem.
• I − αS T is invertible, as stated in Lemma 22.
• $I - \alpha S^T$ is an M-matrix: its off-diagonal elements are non-positive, and by the computation
in the proof of Lemma 22 (with λ = 1) the matrix $I - \alpha S$ is strictly diagonally dominant by rows,
so $I - \alpha S^T$ is strictly diagonally dominant by columns.

• The column sums of $I - \alpha S^T$ are exactly 1 − α, since $e^T(I - \alpha S^T) = e^T - \alpha(Se)^T = (1 - \alpha)e^T$.
Furthermore, we can simplify Theorem 29 even more.

Theorem 30. Define $x = (I - \alpha H^T)^{-1}v$. The PageRank vector π is equal to
$$\pi = \frac{x}{e^T x}$$
Proof: First, note that $I - \alpha H^T$ is indeed invertible (we can apply the same proof as in Lemma
22 to the sub-stochastic matrix H). Theorem 29 yields the equality $(I - \alpha S^T)\pi = (1 - \alpha)v$.
Note that the sum of row i of the matrix H is equal to zero if i is a dangling node, and 1
otherwise. Hence $He = e - d$. Using $d^T = e^T - e^T H^T$, $(I - \alpha H^T)x = v$ and $e^T v = 1$, we find
$$G^T x = \alpha H^T x + \alpha v d^T x + (1 - \alpha)v e^T x = \alpha H^T x + \alpha v(e^T - e^T H^T)x + (1 - \alpha)v e^T x$$
$$= \alpha H^T x - \alpha v e^T H^T x + v e^T x = \alpha H^T x + v e^T(I - \alpha H^T)x = \alpha H^T x + v e^T v = \alpha H^T x + v = \alpha H^T x + (I - \alpha H^T)x = x$$
So $G^T \frac{x}{e^T x} = \frac{1}{e^T x} G^T x = \frac{x}{e^T x}$. But of course we also have $e^T \frac{x}{e^T x} = 1$. Since the PageRank
vector is unique, it must be equal to $\pi = \frac{x}{e^T x}$.
The matrix $I - \alpha H^T$ has many of the same properties as $I - \alpha S^T$:

1. All eigenvalues of $I - \alpha H^T$ lie in the disk with center 1 and radius α.
2. $I - \alpha H^T$ is invertible.
3. $I - \alpha H^T$ is an M-matrix.
4. The column sums of $I - \alpha H^T$ are either 1 or 1 − α.

The last two theorems give us new ways to calculate the PageRank vector. We will use
both theorems together with the Jacobi method to approximate the PageRank. This method makes
use of the fact that
$$(I - \alpha S^T)^{-1} = \sum_{n=0}^{\infty} (\alpha S^T)^n$$
This sum converges because $\rho(\alpha S^T) = \alpha\rho(S^T) \leq \alpha \cdot 1 < 1$. We approximate this series by
a partial sum; computing the powers directly would require multiple matrix-matrix products, which is
not desirable, and the Jacobi Method is an efficient way of computing this partial sum using only matrix-vector products.
Algorithm 31. The Jacobi method (applied to the matrix S) is:
k←0
x(0) ← (1 − α)v
while convergence not reached do
x(k+1) ← αS T x(k) + (1 − α)v
k ←k+1
end while
π ← x(k)
Theorem 32. The Jacobi method converges to the PageRank vector π. Furthermore, the
error after k iterations is of order $O(\alpha^k)$.

Proof: By induction, it is easy to see that $x^{(k)} = (1 - \alpha)\sum_{n=0}^{k} (\alpha S^T)^n v$. This is clearly true
for k = 0: $x^{(0)} = (1 - \alpha)v = (1 - \alpha)\sum_{n=0}^{0} (\alpha S^T)^n v$. Assume that the equality holds for some
k ∈ N. Then we also have
$$x^{(k+1)} = \alpha S^T x^{(k)} + (1 - \alpha)v = \alpha S^T (1 - \alpha)\sum_{n=0}^{k} (\alpha S^T)^n v + (1 - \alpha)v = (1 - \alpha)\sum_{n=0}^{k+1} (\alpha S^T)^n v$$
which completes the induction. By Theorem 6 and Theorem 29, these partial sums converge to
$(1 - \alpha)(I - \alpha S^T)^{-1}v = \pi$ as k tends to infinity, and the truncated tail is of order $O(\alpha^k)$, which
proves the error bound. Additionally, the largest eigenvalue of $\alpha H^T$ is important for the speed
of this method.
Note that we can also apply the Jacobi method on the matrix αH T to approximate x =
(I − αH T )−1 v. This is sufficient, since by Theorem 30, the PageRank vector is equal to
π = x/(eT x). Hence the following algorithm can be used as well:
Algorithm 33. The Jacobi method (applied to the matrix H) is:
k←0
x(0) ← v
while convergence not reached do
x(k+1) ← αH T x(k) + v
k ←k+1
end while
π ← x(k) /(eT x(k) )
By the same argument as in the proof of Theorem 32, this algorithm also converges (after normalisation)
to π, and the error after k iterations is of order $O(\alpha^k)$. The advantage of this method is that H is much sparser than
S, so the computation of the matrix-vector product $H^T x^{(k)}$ is faster.
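The fragment below sketches this variant (compare JacobiMethodH.m in appendix A.2.3, which may differ in details). It reuses the residual of the linear system $(I - \alpha H^T)x = v$ both as stopping criterion and as update, and assumes H, v and alpha as in the earlier sketches.

% Sketch: Jacobi method applied to H (Algorithm 33).
tol = 1e-5;
x = v;
r = v - x + alpha * (H' * x);            % residual of (I - alpha*H')*x = v
while norm(r, 1) >= tol * sum(x)         % relative test; e'*x = sum(x)
    x = x + r;                           % identical to the update x <- alpha*H'*x + v
    r = v - x + alpha * (H' * x);
end
pi_jacobi = x / sum(x);                  % pi = x / (e'*x)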
It is worth checking whether we can optimise these methods with the following shift-and-scale approach.
We have seen that to calculate π, it is sufficient to solve the linear
system $(I - \alpha H^T)x = v$. Multiply this equation by a constant β and shift it back and forth:
$$(I - \alpha H^T)x = v \;\Longrightarrow\; \beta(I - \alpha H^T)x = \beta v \;\Longrightarrow\; \left(I - (I - \beta(I - \alpha H^T))\right)x = \beta v$$
Define $Q_\beta := I - \beta(I - \alpha H^T)$. Then x satisfies the equality $(I - Q_\beta)x = \beta v$. Thus, we could
also apply the Jacobi algorithm to this matrix $Q_\beta$ for a certain value of β. It might happen
that the largest eigenvalue of $Q_\beta$ is smaller (in absolute value) than the largest eigenvalue of $\alpha H^T$,
which would imply that the Jacobi method converges faster. Therefore, our goal is
to find the value of β for which $\rho(Q_\beta)$ is minimal.

Suppose $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $\alpha H^T$ with corresponding eigenvectors $p_1, \ldots, p_n$.
Then for each 1 ≤ i ≤ n we have
$$Q_\beta p_i = \left(I - \beta(I - \alpha H^T)\right)p_i = \left(1 - \beta(1 - \lambda_i)\right)p_i$$
so the eigenvalues of $Q_\beta$ are $1 - \beta(1 - \lambda_i)$.
Figure 4: Eigenvalues of the matrix αH T for α = 0.85.
See figure 4 for an example of what the spectrum of $\alpha H^T$ can look like (here we have used an
induced subgraph of the mathworks.com web) for α = 0.85. Since H is sub-stochastic, all
eigenvalues lie in the red disk with center 0 and radius 0.85 by Gershgorin's Theorem. In this
example, there is an eigenvalue $\lambda_1$ exactly equal to 0.85, while the left-most eigenvalue is
$\lambda_2 \approx -0.3234$; there are no eigenvalues with real part smaller than $\lambda_2$. Thus, we can make the
spectral radius smaller by shifting all eigenvalues to the left. It is easy to see that it is optimal to shift
the eigenvalues to the left until the right-most and left-most eigenvalues $\lambda_1'$ and $\lambda_2'$ of $Q_\beta$ have
the same distance r from zero, i.e.
$$\lambda_1' = 1 - \beta(1 - \lambda_1) = r, \qquad \lambda_2' = 1 - \beta(1 - \lambda_2) = -r$$
By solving this system, we find that the optimal β is approximately β ≈ 1.357. For this value, the
spectral radius is r ≈ 0.79. This coincides with figure 5, which shows that the eigenvalues
of $Q_\beta$ for β = 1.357 lie in the disk with center 0 and radius 0.79.
So it is better to apply the Jacobi Method to $Q_\beta$ for β = 1.357. The error after k iterations
is of order $O(0.79^k)$ for this method, while the error of the regular Jacobi Method is of order
$O(0.85^k)$. It might not seem like a huge difference, but after 50 iterations the error of the
Optimized Jacobi Method is roughly $(0.85/0.79)^{50} \approx 40$ times as small. Thus, far fewer iterations
are needed and the PageRank vector can be approximated much faster.
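The optimal shift can be computed directly from the two extreme eigenvalues quoted above. The fragment below is a small sketch under the assumption that the extreme eigenvalues of αH^T are real and dominate the spectral radius, as in this example (compare OptimalBeta.m in appendix A.2.5):

% Sketch: the eigenvalues of Q_beta are 1 - beta*(1 - lambda_i), so we equate the
% moduli of the images of the right-most and left-most eigenvalues of alpha*H'.
lam_right = 0.85;  lam_left = -0.3234;           % extreme eigenvalues quoted above
beta = 2 / (2 - lam_right - lam_left);           % from 1-beta*(1-lam_right) = -(1-beta*(1-lam_left))
r = 1 - beta * (1 - lam_right);                  % resulting spectral radius
[beta, r]                                        % approximately 1.357 and 0.796 (the 0.79 above)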
Figure 5: Eigenvalues of the matrix Qβ for α = 0.85 and β = 1.357.
The Optimized Jacobi Method has two disadvantages. The first is that it is very hard to
find the optimal value of β. In our example we simply calculated every eigenvalue, which
makes it easy to find the value of β for which the Jacobi Method is optimal. For very large matrices,
however, it is not practical to calculate the eigenvalues. We do know an upper bound for these
eigenvalues: by Gershgorin's Theorem, all eigenvalues of $Q_\beta$ lie in the disk with center 1 − β and
radius αβ. Hence any eigenvalue λ of $Q_\beta$ satisfies $|\lambda| \leq |1 - \beta| + \alpha\beta$. One can easily check that
this upper bound is minimal if and only if β = 1, in which case $Q_\beta = \alpha H^T$. Based on this bound
alone, we therefore cannot optimise the Jacobi method.

The other problem is that the eigenvalues of $\alpha H^T$ usually fill the whole Gershgorin disk of
radius α. In that case a shift-and-scale approach cannot decrease the spectral radius of the matrix,
which means that it is often not even possible to optimise the Jacobi Method in this way.
5 Numerical Experiments
In the previous chapter we discussed multiple algorithms to approximate the PageRank
vector π, namely the Power Method and the Jacobi method (applied to two different
matrices). The error of each of these methods is of order $O(\alpha^k)$, where k is the number of iterations.
Furthermore, in some cases the Jacobi Method can be optimised further. In this chapter
we apply these algorithms to see whether there are differences in computation time. We
will compare both the computation time and the number of iterations needed.
5.1 Data
Before we can test the algorithms, we need data to construct the link matrix H. Clearly,
we do not know the complete graph of the world wide web, as the web consists of billions
of pages and links. Thus, we construct smaller link matrices in order to test
the algorithms. We have done this in a couple of different ways. Each option yields
a completely different matrix (in size or structure). All algorithms will be applied to each
matrix to see which method benefits most from a certain property of the web.
• The first method of obtaining a link matrix is by getting a small subset of the web.
Just like Google, we use a surfer to obtain this graph. Starting in any webpage, the
surfer analyses the source code of this page and determines where this page links to.
We add these links to the link matrix. Then, the surfer opens one of the new pages.
This process is repeated until n pages have been found and all pages have been checked
for links. This will construct a very realistic web. We do have an option to filter certain
pages, such as images or JavaScript files. These are files which a user generally does
not want to see as search results. It is, however, also interesting to allow these pages.
This will yield a matrix with a large amount of dangling nodes, and we will discuss
why this matters for the algorithms.
The disadvantage of using a surfer is that it is very slow. It generally takes more than
a day to obtain a link matrix with more than 10 thousand nodes. See A.1.1 for the
Matlab code of the surfer we have made and used.
• Luckily for us, many other people have made their own surfers to obtain even larger
link matrices. These matrices are posted online, so that anyone can use them for
testing purposes. One of these matrices forms a collection of websites about
California. It consists of roughly 10 thousand pages, and about 50 percent of these are
dangling nodes. We thank Jon Kleinberg for this dataset [13].
We have also used two link matrices which are far larger in size. The first one is a
collection of all (almost 300 thousand) web pages in the Stanford University website.
The second one contains all web pages in the whole Stanford-Berkeley web and has
almost 700 thousand pages and more than 7 million links. We thank Sep Kamvar for
these datasets [14].
It is worth noting that both of these larger matrices have only a relatively small number
of dangling nodes. This will have consequences for the iteration time of each numerical
method.
• The last and most obvious method is by simply generating a link matrix randomly.
Figure 6: A plot of the Stanford-Berkeley Web link matrix. A pixel represents a link.
This allows us to construct a large variety of link matrices. For example, one could
easily construct a link matrix with more than a million nodes. It is also possible to
choose the average amount of dangling nodes on the web or the average amount of
links.
However, a randomly generated web might not be very realistic; there need not be any
structure in it, and the link matrix can even be aperiodic. A consequence of this
is that all eigenvalues of H turned out to be much smaller (in absolute value) than
those of a 'realistic' link matrix. Since ρ(H) becomes smaller, the numerical
methods converge a lot faster. Because such results are not realistic, we will
not use these matrices.

In summary, we have used the following link matrices to test the algorithms. See also the
Matlab files A.1.1-A.1.4. As one can see, there is a large variety in the number of nodes,
dangling nodes, and links.
Name Pages Links Dangling nodes Description
A 5000 1430936 244 mathworks.com (from our own crawler)
B 9664 16150 5027 Sites about California
C 281903 2312497 172 Stanford University Web
D 683446 7583376 4735 Stanford-Berkeley Web
5.2 Convergence criteria

The numerical methods we discussed are iterative. The iterations need to stop once a
satisfactory approximation of the PageRank vector has been found. But when is this the
case? Let ‖·‖ be a norm. We say that an approximation x of π is good enough if the relative
residual is small enough, i.e.
$$\frac{\|G^T x - x\|}{\|x\|} < \varepsilon$$
for some small number ε > 0. We picked $\varepsilon = 10^{-5}$ and let ‖·‖ be the one-norm. The reason
for this choice is that the order of the PageRank values is important rather than the exact
PageRank values themselves. If the sup-norm were used, for example, the ordering of the PageRank
vector might come out completely different.
• Let $x^{(i)}$ be the approximation of π obtained with the Power Method after i iterations. The
Power Method is repeated until $\|x^{(i+1)} - x^{(i)}\|/\|x^{(i)}\| < \varepsilon$, i.e. $e^T|x^{(i+1)} - x^{(i)}| < 10^{-5}$,
since $e^T x^{(j)} = 1$ for all j.

• The Jacobi method (applied to the matrix S) after i iterations yields an approximation
$x^{(i)}$ such that $(I - \alpha S^T)x^{(i)} \approx (1 - \alpha)v$. The iteration is repeated until the residual
satisfies $\|(I - \alpha S^T)x^{(i)} - (1 - \alpha)v\| < \varepsilon$, i.e. $e^T|(I - \alpha S^T)x^{(i)} - (1 - \alpha)v| < 10^{-5}$.
To prevent unnecessary computations, we use the fact that $\alpha S^T x^{(i)} = \alpha(H^T x^{(i)} + v(d^T x^{(i)}))$;
this vector can then be reused to calculate the residual (see the sketch after this list and Matlab code
A.2.2 for the implementation).

• For the Jacobi method applied to H, one needs to normalise $x^{(i)}$ to get an approximation
of π. Thus, the iteration is repeated until $\|(I - \alpha H^T)x^{(i)} - v\|/(e^T x^{(i)}) < \varepsilon$, i.e.
$e^T|(I - \alpha H^T)x^{(i)} - v| < 10^{-5}\, e^T x^{(i)}$.
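For the Jacobi method applied to S, the reuse mentioned in the second bullet can be sketched as follows (compare JacobiMethodS.m in appendix A.2.2, which may differ in details; H, d, v and alpha as in the earlier sketches):

% Sketch: Jacobi method applied to S, reusing the matrix-vector product for the residual.
tol = 1e-5;
x = (1 - alpha) * v;                          % starting vector of Algorithm 31
y = alpha * ((H' * x) + v * (d' * x));        % y = alpha*S'*x without forming S
while norm(x - y - (1 - alpha) * v, 1) >= tol % residual of (I - alpha*S')*x = (1-alpha)*v
    x = y + (1 - alpha) * v;                  % next Jacobi iterate
    y = alpha * ((H' * x) + v * (d' * x));
end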
5.3 Numerical Results

The Power Method and both Jacobi methods (applied to different matrices) will now be
applied to these four matrices. We have seen that the value of α is important for the speed
of each method. Furthermore, we will use two personalisation vectors to see if there is any
difference: the uniform vector $w_1 = e/n$ and a random probability vector $w_2$.
Table 1: The numerical methods applied to the 5000 × 5000 (Mathworks) link matrix A.
Note that the results are the average out of 100 simulations.
We picked β = 1.17 for the Optimized Jacobi Method, as this value appeared to require
the fewest iterations (see also Matlab code A.2.5). As expected, the Optimized
Jacobi Method is always faster than the regular Jacobi Method applied to H, and it is better
to apply the Jacobi Method to H than to S. The difference in computation time
between the three methods is relatively small. Interestingly, the Power Method
becomes more favourable if α is close to 1.
There does not seem to be any difference in computation time regarding the choice of the
personalisation vector. Thus, for the next experiments, we will only look at the uniform
personalisation vector v = e/n.
Table 2: The numerical methods applied to the 9664 × 9664 (California) link matrix B.
When applying the numerical methods to the second matrix B, the results are somewhat
different. There was no value of β ≠ 1 that decreased the number of iterations needed before
the Optimized Jacobi Method converges, hence the Jacobi Method could not be made faster.
It is remarkable that for this matrix, the Jacobi Method applied to H is faster than the
Power Method. One reason for this is that about 48% of the nodes of this web are dangling
nodes. The cost per iteration of the Jacobi Method applied to H does not depend on the number of
dangling nodes, hence the difference in iteration time.
Table 3: The numerical methods applied to the 281903 × 281903 (Stanford University) link
matrix C.
For the large Stanford University matrix, the Power Method is only slightly faster than the
Jacobi Method applied on H. It is interesting to see that even though this web contains
almost 300 thousand nodes, each method will return an approximation of the PageRank
vector in less than 5 seconds.
Table 4: The numerical methods applied to the 683446 × 683446 (Stanford-Berkeley) link
matrix D.
For the largest link matrix, the results are the same as before. Note that less than 1% of the
nodes are dangling nodes, and thus the speed of the Power Method is not much affected by
these nodes.
Another way to compare the numerical methods is by first fixing a number of iterations.
All methods are iterated this many times, and the residual is calculated afterwards.
This comparison might be fairer, since the residual then need not be calculated after each
iteration. Figure 7 shows how large the residual of the approximation of π is, using the
Stanford-Berkeley link matrix D. The figure shows that the Power Method is initially only slightly
faster than the Jacobi Method applied to H; after a while, however, the Jacobi Method becomes
faster.
Figure 7: The three numerical methods compared using the link matrix D.
5.4 Discussion
As we have seen in the previous chapter, each numerical method has a convergence rate of
O(αk ). Therefore, it is not too unexpected to see that the speed of the three methods is
about the same. In most cases, the Power Method is the fastest algorithm. However, the
difference between this method and the Jacobi Method applied to H is very small, and in some
cases, for example when the number of dangling nodes in a web is large, the latter converges
faster.
We were only able to decrease the spectral radius of the mathworks.com link matrix A. Thus,
the Optimized Jacobi Method is of limited use.
The Jacobi method is based on the equality (I − αS T )π = (1 − α)v. As we will see in the
next chapter, using this equality will have an additional advantage.
6 Random Teleportation Parameter
The teleportation parameter α is the probability that the random surfer will click on a link.
The random surfer will, with a probability of 1 − α, go to any page on the web. The choice
of this page is random and distributed according to the personalisation vector. We have seen
that we can calculate the PageRank vector much faster if α is small. However, for a small
α, the structure of the web will hardly matter.
Google has stated that they used α = 0.85. But why did they pick this exact value? Apart
from the fact that α should not be close to 0 or 1, there is no mathematical reason behind the
choice α = 0.85. Some researchers have proposed to use α = 0.5 instead [8]. We will look
at this problem in a different way. The probability that a user enters a different link in the
address bar obviously depends on the user. It therefore makes sense to view α as a distribution
of teleportation parameters. Gleich et al. [9] investigated this parameter: based on a large
amount of data, they concluded that it follows a certain Beta distribution, as shown
in the following figure.
Since the PageRank depends on the value of α, we will from now on denote the PageRank
vector corresponding to a certain value of α by π(α). Suppose that we want a single vector
that corresponds to the PageRank vector of everyone. The most straightforward choice is
π(E[α]): the PageRank vector calculated for the mean of all choices of α. However, this
does not yield a satisfactory vector. Instead, we will look at the expected PageRank vector.
Definition 34. Let $f : [0, 1] \to \mathbb{R}^+$ be the probability density function of the stochastic
parameter α. We define the expected PageRank vector ⟨π⟩ by
$$\langle\pi\rangle = \int_0^1 \pi(\alpha) f(\alpha)\, d\alpha$$
We assume that this density function of α is very small if α is close to 1, i.e. f (α) ≈ 0 for
α ≈ 1. This is because we originally assumed that 0 ≤ α < 1 should hold; if α = 1, there is
no teleportation, hence there is no guarantee that a unique PageRank vector exists.
It is easy to see that the expected PageRank is a probability vector as well. Since
$e^T \pi(\alpha) = 1$ for each α, it follows that
$$e^T\langle\pi\rangle = e^T\int_0^1 \pi(\alpha) f(\alpha)\, d\alpha = \int_0^1 e^T\pi(\alpha) f(\alpha)\, d\alpha = \int_0^1 f(\alpha)\, d\alpha = 1$$
By Theorem 29 and the geometric series of Theorem 6, we can write $\pi(\alpha) = (1 - \alpha)\sum_{n=0}^{\infty} (\alpha S^T)^n v$.
Additionally, since S, v and f are all non-negative, the integral and the summation can be interchanged;
this is a direct consequence of the Monotone Convergence Theorem, see [12, p. 82]. Thus,
$$\langle\pi\rangle = \sum_{n=0}^{\infty} \int_0^1 (1 - \alpha)(\alpha S^T)^n v f(\alpha)\, d\alpha$$
Suppose that α is uniformly distributed between 0 and r for some value r < 1, i.e. $f(\alpha) = \frac{1}{r}$
for 0 ≤ α ≤ r and $f(\alpha) = 0$ otherwise. In this case, it is possible to calculate this integral:
$$\langle\pi\rangle = \sum_{n=0}^{\infty} \int_0^1 (1 - \alpha)(\alpha S^T)^n v f(\alpha)\, d\alpha = \sum_{n=0}^{\infty}\left[\int_0^r (1 - \alpha)\alpha^n \cdot \frac{1}{r}\, d\alpha\right](S^T)^n v = \frac{1}{r}\sum_{n=0}^{\infty}\left[\int_0^r (\alpha^n - \alpha^{n+1})\, d\alpha\right](S^T)^n v$$
$$= \frac{1}{r}\sum_{n=0}^{\infty}\left[\frac{\alpha^{n+1}}{n+1} - \frac{\alpha^{n+2}}{n+2}\right]_0^r (S^T)^n v = \frac{1}{r}\sum_{n=0}^{\infty}\left[\frac{r^{n+1}}{n+1} - \frac{r^{n+2}}{n+2}\right](S^T)^n v$$
However, for general density functions f there is no practical way of calculating ⟨π⟩ exactly. Instead,
we will try to approximate it. An obvious way to do so is by calculating a Riemann sum.
Suppose that we have picked numbers $0 = x_0 < x_1 < \ldots < x_k = 1$ and mesh points
$\alpha_i \in [x_{i-1}, x_i]$ for all 1 ≤ i ≤ k. The expected PageRank vector is then approximately
$$\langle\pi\rangle \approx \sum_{i=1}^{k} \pi(\alpha_i) f(\alpha_i) \cdot (x_i - x_{i-1})$$
This requires the computation of the PageRank vector for k different values of α. In general,
k must be large if we want a good approximation of ⟨π⟩. We could approximate π(α) for each
value of α with one of the numerical methods we have discussed, but since the approximation
of a single PageRank vector for one specific value of α already takes a long time,
this is not attractive. The Power Method, however, has the advantage that it can easily be
optimised so that the computation becomes faster.
As we have seen, the convergence speed of the Power Method is of order $O(\alpha^k)$. An advantage
of the Power Method over the Jacobi Method is that one can pick the starting vector. Clearly,
if this starting vector is close to the true PageRank vector, fewer iterations are needed until a
satisfactory approximation is reached.
In paragraph 4.1 we proposed to use the personalisation vector v as the starting vector,
since this is the most sensible initial guess we can make. However, if the PageRank
vector needs to be computed for many different values of α, we already have an idea of what
the PageRank vector looks like. Thus, we can use the PageRank vector corresponding
to a nearby value of α as our initial guess. Suppose that π(α) should be computed for
$\alpha_1, \ldots, \alpha_m$, where $0 \leq \alpha_1 < \ldots < \alpha_m < 1$. We can apply the following algorithm:
Algorithm 35.
π(α1 ) ← PowerMethod(v)
for k = 2 to m do
π(αk ) = PowerMethod(π(αk−1 ))
end for
Here PowerMethod(w) means that the Power Method is applied with starting vector w.
Note that $\pi(\alpha_1)$ can also be computed using a different numerical method, but
for k = 2, . . . , m the Power Method needs to be applied.
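The fragment below sketches how Algorithm 35 can be combined with the Riemann sum for ⟨π⟩. It is only a sketch: the density f used here is a simple placeholder (Gleich et al. [9] fit a Beta distribution), and H, d and v are assumed as in the earlier sketches.

% Sketch: expected PageRank <pi> via a Riemann sum, warm-starting the Power Method
% with the previous pi(alpha) as in Algorithm 35.
f = @(a) 2 * (1 - a);                    % placeholder density on [0,1] (integrates to 1)
alphas = 0.00:0.01:0.90;  dx = 0.01;  tol = 1e-5;
x = v;                                   % starting vector for the first mesh point
expected_pi = zeros(size(v));
for k = 1:numel(alphas)
    a = alphas(k);
    res = inf;
    while res >= tol                     % Power Method for alpha = a, warm start x
        xnew = a * (H' * x) + (1 - a + a * (d' * x)) * v;
        res = norm(xnew - x, 1);
        x = xnew;
    end
    expected_pi = expected_pi + x * f(a) * dx;   % contribution of this mesh point
end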
This algorithm is based on the idea that π(α) is approximately equal to π(α') if α is close
to α'. In general, this is true. However, by Theorem 29 the PageRank vector is also the explicit solution
$$\pi(\alpha) = (1 - \alpha)(I - \alpha S^T)^{-1} v$$
of the linear system $(I - \alpha S^T)\pi(\alpha) = (1 - \alpha)v$. Furthermore, we have shown that it is
sufficient to solve the system $(I - \alpha H^T)x(\alpha) = v$. Our goal is to approximate this solution
x(α) (and thus π(α)) for many different values of α. To do so, we will make use of the
shift-invariance of the so-called Krylov space. The following approach is based on [10].
Definition 36. The Krylov space $K_m(A, w)$ of a matrix A and a vector w is defined by
$$K_m(A, w) = \mathrm{span}\{w, Aw, A^2 w, \ldots, A^{m-1}w\}$$
Since the Krylov space $K_m(A, w)$ is a vector space, we have $K_m(A, w) = K_m(\beta A, w)$ for
any β ≠ 0. Because $w \in K_m(A, w)$, we can also shift the matrix without changing the space, so
$K_m(A, w) = K_m(I - \beta A, w)$ for any β ≠ 0.
We will look at the Krylov space of the matrix H T with respect to the personalisation vector
v. Then we will try to find a vector x in this space such that (I − αH T )x approximates
v. If x is then normalised with respect to its 1-norm, this vector approximates π(α). The
following theorem is important; this guarantees that the algorithm will converge.
Theorem 37. The PageRank vector π satisfies $\pi \in K_n(H^T, v)$.

Proof: First, note that $K_n(H^T, v) = K_n(I - \alpha H^T, v)$. Let the characteristic polynomial of
$I - \alpha H^T$ be equal to $p(\lambda) = c_0 + c_1\lambda + c_2\lambda^2 + \ldots + \lambda^n$. By the Cayley-Hamilton theorem,
$I - \alpha H^T$ satisfies
$$0 = p(I - \alpha H^T) = c_0 I + c_1(I - \alpha H^T) + c_2(I - \alpha H^T)^2 + \ldots + (I - \alpha H^T)^n$$
Since $I - \alpha H^T$ is invertible, $c_0 = \pm\det(I - \alpha H^T) \neq 0$, so we can solve for the inverse:
$$(I - \alpha H^T)^{-1} = -\frac{1}{c_0}\left(c_1 I + c_2(I - \alpha H^T) + \ldots + (I - \alpha H^T)^{n-1}\right)$$
Hence $x = (I - \alpha H^T)^{-1}v$ is a linear combination of the vectors $v, (I - \alpha H^T)v, \ldots, (I - \alpha H^T)^{n-1}v$,
i.e. $x \in K_n(I - \alpha H^T, v) = K_n(H^T, v)$. Since $\pi = x/(e^T x)$ is a scalar multiple of x, it lies in this space as well.

The last theorem shows why it makes sense to look for a solution of the equation $(I - \alpha H^T)x = v$
in the space $K_m(H^T, v)$. To use this space, we will construct an orthonormal basis by
applying the Arnoldi algorithm to the vectors $v, H^T v, (H^T)^2 v, \ldots, (H^T)^{n-1}v$. This is
essentially a modified version of the Gram-Schmidt algorithm.
Algorithm 38. The Arnoldi Algorithm applied to $w_1 = v/\|v\|_2$ is:
for k = 1 to m do
    $w_{k+1} \leftarrow H^T w_k$
    for j = 1 to k do
        $u_{j,k} \leftarrow \langle w_j, w_{k+1} \rangle$
        $w_{k+1} \leftarrow w_{k+1} - u_{j,k} w_j$
    end for
    $u_{k+1,k} \leftarrow \|w_{k+1}\|_2$
    if $u_{k+1,k} = 0$ then
        Stop;
    end if
    $w_{k+1} \leftarrow w_{k+1}/u_{k+1,k}$
end for
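A Matlab sketch of Algorithm 38 is given below (the appendix routine Arnoldi.m may differ in details). It stores the Arnoldi vectors in W, with m+1 columns, and the coefficients in an (m+1) × m array U; the choice m = 80 is only an example.

% Sketch: Arnoldi algorithm applied to H' with starting vector v.
m = 80;
n = size(H, 1);
W = zeros(n, m + 1);
U = zeros(m + 1, m);
W(:, 1) = v / norm(v, 2);                % w_1 = v / ||v||_2
for k = 1:m
    w = H' * W(:, k);                    % w_{k+1} <- H' * w_k
    for j = 1:k                          % modified Gram-Schmidt orthogonalisation
        U(j, k) = W(:, j)' * w;
        w = w - U(j, k) * W(:, j);
    end
    U(k + 1, k) = norm(w, 2);
    if U(k + 1, k) == 0                  % breakdown: the Krylov space is maximal (Theorem 40)
        break
    end
    W(:, k + 1) = w / U(k + 1, k);
end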
Theorem 39. Assuming the Arnoldi Algorithm does not stop before m iterations, it creates
an orthonormal basis of $K_m(H^T, v)$.

Proof: By induction, we show that $w_k = p_{k-1}(H^T)v$ for some polynomial $p_{k-1}$ of degree
exactly k − 1. This is trivially true for k = 1, since $w_1 = v/\|v\|_2$. Let the induction
hypothesis hold for some k ∈ N. Note that
$$w_{k+1} = \frac{1}{u_{k+1,k}}\left(H^T w_k - \sum_{j=1}^{k} u_{j,k} w_j\right) = \frac{1}{u_{k+1,k}}\left(H^T p_{k-1}(H^T)v - \sum_{j=1}^{k} u_{j,k}\, p_{j-1}(H^T)v\right)$$
This shows that $w_{k+1} = p_k(H^T)v$, where the polynomial $p_k$ is defined as $p_k(x) = \left(x\, p_{k-1}(x) - \sum_{j=1}^{k} u_{j,k}\, p_{j-1}(x)\right)/u_{k+1,k}$. Since $p_{k-1}$ is assumed to have degree k − 1, the degree of $p_k$ is
exactly k. Furthermore, the Arnoldi vectors are orthonormal by construction, hence they
form an orthonormal basis of $K_m(H^T, v)$.
But what happens if the Arnoldi Algorithm breaks down before creating m Arnoldi vectors?
Theorem 40 shows that this should be seen as a good thing.
Theorem 40. Suppose that the Arnoldi Algorithm stops after k < m iterations. Then the
dimension of the Krylov space is maximal, i.e. $K_k(H^T, v) = K_l(H^T, v)$ for all $l \geq k$.

Proof: If the Arnoldi Algorithm breaks down at iteration k, we must have $u_{k+1,k} = 0$ and
therefore $w_{k+1} = 0$. Thus, we also have
$$H^T w_k = \sum_{j=1}^{k} u_{j,k} w_j$$
Let $z \in K_k(H^T, v)$ be arbitrary. By Theorem 39, we can write z as a linear combination
of Arnoldi vectors: $z = \sum_{j=1}^{k} c_j w_j$ for some coefficients $c_j$. It follows that
$$H^T z = \sum_{j=1}^{k} c_j H^T w_j = \sum_{j=1}^{k-1} c_j H^T w_j + c_k H^T w_k = \sum_{j=1}^{k-1} c_j H^T w_j + c_k\sum_{j=1}^{k} u_{j,k} w_j$$
For $j \leq k-1$ the Arnoldi recursion gives $H^T w_j \in \mathrm{span}\{w_1, \ldots, w_{j+1}\} \subseteq K_k(H^T, v)$, and the last sum also lies in $K_k(H^T, v)$. Hence $H^T z \in K_k(H^T, v)$, so $K_k(H^T, v)$ is invariant under $H^T$ and $K_l(H^T, v) = K_k(H^T, v)$ for all $l \geq k$.

The consequence of this theorem is that if the algorithm stops at iteration k, we have already constructed a
basis of $K_k(H^T, v) = K_n(H^T, v)$. By Theorem 37, this space contains the PageRank vector,
so there is no reason to expand it further.
Let W be the n × m matrix containing the Arnoldi vector $w_i$ as its ith column, and let U be
the m × m matrix with coefficients $u_{i,j}$. Note that U is an upper Hessenberg matrix (i.e.
$U_{ij} \neq 0$ implies $i \leq j + 1$). Then it is possible [11, p. 161] to write the Arnoldi Algorithm in a
single equation as
$$H^T W = W U + u_{m+1,m}\, w_{m+1} e_m^T$$
Here $e_m$ is the standard mth unit m × 1 vector. To approximate the PageRank
vector we look for an approximation $u \in K_m(H^T, v)$ such that $(I - \alpha H^T)u \approx v$. Since
$u \in K_m(H^T, v)$, we can write $u = a_1 w_1 + a_2 w_2 + \ldots + a_m w_m$, or in shorter notation $u = Wa$,
where $a = [a_1, a_2, \ldots, a_m]^T$. Our goal is to find good coefficients, i.e. a vector a such that
the residual is very small. By definition, this residual r is equal to
$$r = v - (I - \alpha H^T)u = v - u + \alpha H^T u$$
Since $v \in K_m(H^T, v)$ and $u \in K_m(H^T, v)$, it follows that $r \in K_{m+1}(H^T, v)$. Hence we
can write $r = b_1 w_1 + \ldots + b_{m+1} w_{m+1}$ for some coefficients $b_i$. Because the vectors $w_i$ are
orthonormal, the 2-norm of this residual is simply equal to $\left(\sum_{i=1}^{m+1} |b_i|^2\right)^{1/2}$. Thus, the best
approximation u would be obtained by minimising this sum. We will take a slightly different
approach: we pick the vector a in such a way that $b_i = 0$ for all $1 \leq i \leq m$.
This can be seen as a projection of x(α) onto $K_m(H^T, v)$. To do so, note that
$$r = v - (I - \alpha H^T)Wa = v - Wa + \alpha H^T Wa = v - Wa + \alpha\left(WU + u_{m+1,m}\, w_{m+1} e_m^T\right)a = v - W(I - \alpha U)a + \alpha u_{m+1,m}\, w_{m+1}(e_m^T a)$$
Requiring $b_i = 0$ for $1 \leq i \leq m$ means $W^T r = 0$. Since $v = \|v\|_2\, W e_1$, $W^T W = I$ and $W^T w_{m+1} = 0$, this amounts to solving the small m × m linear system $(I - \alpha U)a = \|v\|_2\, e_1$; the factor $\|v\|_2$ cancels in the final normalisation, so one may equally solve $(I - \alpha U)a = e_1$.

Algorithm 41. (Reduced-Order Method) Given the Arnoldi basis W and the Hessenberg matrix U, solve the m × m system $(I - \alpha U)a = e_1$, set $x = Wa$, and return $\pi(\alpha) \approx x/(e^T x)$.
It is important to note that the Arnoldi algorithm is independent of the value of α. Thus,
the Arnoldi algorithm only needs to be applied once. After we have done this, we can
approximate π(α) very efficiently for many different values of α. But how large does m need
to be? We expect the residual to decrease as m increases, since the dimension of the Krylov
space grows and the first m coefficients $b_i$ of the residual are set to 0. The following
numerical results confirm this hypothesis and illustrate how large m needs to be.
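Given the output W and U of the Arnoldi sketch above, the reduced-order approximation for a single value of α then requires only an m × m solve. The fragment below is a sketch along the lines of ReducedOrderMethod.m in appendix A.3.3, not necessarily identical to it.

% Sketch: reduced-order approximation of pi(alpha) from the Arnoldi basis.
Wm = W(:, 1:m);  Um = U(1:m, 1:m);       % the n x m and m x m matrices of the derivation
alpha = 0.85;
a = (eye(m) - alpha * Um) \ (norm(v, 2) * eye(m, 1));   % (I - alpha*U) a = ||v||_2 * e_1
x = Wm * a;                              % approximate solution of (I - alpha*H') x = v
pi_red = x / sum(x);                     % pi(alpha) ~ x / (e'*x)
res = norm(v - x + alpha * (H' * x), 1); % residual, small when m is large enough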
Figure 9: The residual of the approximation of π for different values of (α, m). This has
been calculated using the 5000 × 5000 matrix A, see section 5.1. Note that the residual is
approximately equal to the machine precision if m is large or α is small. The red line corresponds
to a residual of $10^{-5}$.
Figure 9 shows the relative residual of the approximation of the PageRank vector for many
different values of m and α. As one can see, the residual gets smaller if either α gets smaller
or m gets larger. Suppose that we require the residual to be less than $10^{-5}$. Then π(α) can
be approximated for any combination of (α, m) below the red line such that the residual is
less than $10^{-5}$. Suppose that we want to calculate π(α) for α = 0.01, 0.02, . . . , 0.95. The
figure shows that m = 80 should give sufficient results.
Of course, it is easy to see how large m needs to be after the algorithm has been applied; in
general, one needs to pick m beforehand. A good way to do so is to create the space
$K_m(H^T, v)$ for a certain initial guess of m using the Arnoldi algorithm. Then the approximation
of the PageRank vector corresponding to the largest value of α is calculated using Algorithm 41.
If the residual of this approximation is larger than requested, m is too small; one can then expand
$K_m(H^T, v)$ by simply continuing the Arnoldi algorithm. We repeat this until the residual is
small enough. In the next section we will test this algorithm numerically.

We will compare this Reduced-Order Algorithm with the Optimized Power Method (Algorithm 35).
Furthermore, to make the differences in computation time clearer, we will also test the normal Power
Method and the Jacobi Method applied to H.
The goal of each algorithm is to approximate the expected PageRank vector ⟨π⟩. As stated
before, we do this by calculating a Riemann sum. This requires mesh points for α.
For each value of α, π(α) is calculated using one of the numerical methods. The speed of
each method depends on the choice of the mesh points for α. We will try multiple sets of mesh points
(for example, many different values of α and/or values close to 1). The numerical
methods are applied to the matrices A, B, C and D described in paragraph 5.1.

We have assumed that f(α) is small if α is close to one. Therefore, we can choose not to
calculate π(α) for such values of α without the total error becoming too large. Suppose for
example that we have mesh points α = 0.00, 0.01, . . . , 0.90. Table 5 shows the computation
time of the expected PageRank vector using these numerical methods.
Table 5: Computation times for the expected PageRank hπi using mesh points α =
0.00, 0.01, . . . , 0.90
It appeared that for m = 40 (or even m = 25 for matrix B) the Krylov space $K_m(H^T, v)$ is
large enough to approximate π(α) for all mesh points (i.e. for each mesh point, the residual
is less than $10^{-5}$). The Reduced-Order algorithm is clearly the fastest method, usually being
around 10 times as fast as any other algorithm. However, the real power of the Reduced-Order
algorithm becomes even more noticeable if a more precise approximation of ⟨π⟩ is
requested. In this case, more mesh points are needed.
Table 6: Computation times for the expected PageRank ⟨π⟩ using mesh points α =
0.000, 0.001, . . . , 0.899, 0.900
Table 6 shows the computation time of each algorithm when ⟨π⟩ is calculated using this
finer grid α = 0, 0.001, . . . , 0.90. Clearly, the difference between the numerical methods is
huge; sometimes the Reduced-Order algorithm is more than 200 times faster than any other
algorithm. This is because this algorithm only needs to solve an m × m linear system for
each value of α. The other algorithms require the computation of multiple matrix-vector
products with n elements for each value of α. Since n is much larger than m, this computation
takes much longer.
The Reduced-Order algorithm has one disadvantage: large values of α. Even though f(α) ≈ 0
if α ≈ 1, the contribution of π(0.99) to the expected PageRank ⟨π⟩ might still be significant.
In that case a mesh that extends closer to 1 is required, and as we have seen in the previous
paragraph, the Krylov space must then be expanded to prevent the residual of these
approximations from becoming too large.
Table 7: Computation times for the expected PageRank ⟨π⟩ using mesh points α =
0.00, 0.01, . . . , 0.99
Suppose that we use α = 0, 0.01, . . . , 0.99 as our mesh. The Reduced-Order algorithm then
needs a larger Krylov space to keep the residual below 10^{-5}; for matrix D, for example, this
space has to be expanded from m = 40 to m = 190. By [11, p. 165], the Arnoldi algorithm
is computationally very expensive: the total number of flops is of the order O(nm^2), since
each new vector has to be made orthogonal to all previously computed basis vectors. This
explains why the Reduced-Order algorithm can become slightly slower than the Optimized
Power Method.
Table 8: Computation times for the expected PageRank ⟨π⟩ using mesh points α =
0.000, 0.001, . . . , 0.989, 0.990
However, if we want a better approximation of ⟨π⟩, the number of mesh points has to be
increased. As we have seen before, the Reduced-Order method is then by far the fastest
way of calculating π(α) for these mesh points.
6.5 Discussion
To approximate the expected PageRank vector ⟨π⟩, one needs to calculate π(α) for many
different values of α. The Reduced-Order algorithm we have discussed is a very efficient way
of doing so: in some of our experiments, this algorithm was as much as 200 times faster than
any other method we have discussed.
The power of the Reduced-Order algorithm lies in the fact that the Arnoldi algorithm only
needs to be applied once. After that, we only need to solve the simple m × m system
(I − αU)x = e_1. We have seen that m can usually be very small compared to n (in our
examples m = 200 was sufficient), so this system can be solved very quickly.
If π(α) has to be calculated for α close to 1 using this Reduced-Order algorithm, the Krylov
space has to be made larger. We have seen that this is computationally very expensive, and
in some cases (when using a small number of mesh points but allowing values close to 1)
this might not be worth it. Another option, which we have not discussed, is to keep m small
but apply some iterations of the (Optimized) Power Method to the approximations of π(α)
for values of α close to one; a sketch of this idea is given below.
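A minimal sketch of this idea, assuming the routines ReducedOrderMethod.m and PowerMethod.m from the appendix; the mesh, the Krylov dimension and the threshold 0.9 are illustrative choices. Because the Power Method is started from the reduced-order approximation, only a few extra iterations are needed for each large α.

%Keep m small and refine the approximations for large alpha with the Power
%Method (mesh, m and the threshold are illustrative choices).
alphas = 0:0.01:0.99;
m = 40;
PI = ReducedOrderMethod(H, alphas, m, v);
for i = 1:length(alphas)
    if (alphas(i) > 0.9)
        %restart the Power Method from the reduced-order approximation
        PI(:,i) = PowerMethod(H, alphas(i), v, PI(:,i));
    end
end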
7 Conclusion & Discussion
In this paper we have discussed the PageRank model. We have illustrated the idea behind
PageRank with the help of a random surfer. To make sure the PageRank vector is uniquely
determined, modifications to this model have been made: artificial transition probabilities
for dangling nodes have been introduced, as well as a probability to teleport to any web page
on the internet.
The PageRank vector was originally defined as the unique stationary distribution of a Markov
chain; we have defined it in a more rigorous way with the help of some linear algebra: the
PageRank vector is an eigenvector of the Google matrix corresponding to eigenvalue 1. The
problem was to find an efficient way to calculate this eigenvector, and we have discussed
several numerical methods for doing so.
The first algorithm we have discussed is the well-known Power Method. This algorithm is
often applied to the PageRank problem. The error of this algorithm is of the order O(α^k), where
α is the teleportation parameter and k stands for the number of iterations. Furthermore, we
have shown that the PageRank vector π is equal to

π = (1 − α)(I − αS^T)^{-1} v.

We have also considered the expected PageRank vector ⟨π⟩, obtained by averaging π(α) over
a probability distribution for α:

⟨π⟩ = ∫₀¹ f(α) π(α) dα.

Here f corresponds to the probability density function of α. To approximate this vector,
π(α) should be calculated for many different values of α. We have introduced a reduced-order
algorithm that can do this very efficiently. This algorithm makes use of the shift-invariance
of the Krylov space K_m(H^T, v). An orthonormal basis of this space can be found by applying
the Arnoldi algorithm. We have shown that the residual of the linear system (I − αH^T)x = v
lies in the Krylov space K_{m+1}(H^T, v) for any vector x ∈ K_m(H^T, v), and by choosing
this vector x carefully, one can make the residual of the linear system very small. The vector x can
then be used to approximate π(α).
The only disadvantage of the Reduced-Order algorithm is that one needs to expand the
Krylov space if π(α) is requested for values of α close to 1. Expanding this space is computa-
tionally very expensive.
The power of the Reduced-Order algorithm lies in the fact that the basis of K_m(H^T, v)
only needs to be constructed once, using the Arnoldi algorithm. One can then find an approximation
of π(α) for each value of α by simply solving a system with only m variables, where
m is usually much smaller than n. The numerical experiments in paragraph 6.4 showed that
this algorithm is extraordinarily effective: instead of applying the (Optimized) Power Method
or Jacobi Method for every mesh point, one can make the computation of the expected PageRank
vector ⟨π⟩ up to 200 times faster by applying this Reduced-Order algorithm.
8 References
[1] L. Page, S. Brin, R. Motwani and T. Winograd, The PageRank Citation Ranking: Bring-
ing Order to the Web. 1998
[2] A. Langville and C. Meyer, Google’s PageRank and Beyond: The Science of Search
Engine Rankings. Princeton University Press, New Jersey, 2006
[3] C. Moler, The world's largest matrix computation. MATLAB News and Notes, 2002
[4] A. Langville and C. Meyer, Deeper Inside PageRank. Internet Mathematics, 2004, 1.3:
335-380.
[5] D. Armstrong, The Perron-Frobenius theorem. http://www.math.miami.edu/
~armstrong/685fa12/sternberg_perron_frobenius.pdf
[6] T. Haveliwala and S. Kamvar, The Second Eigenvalue of the Google Matrix. Stanford
University Technical Report, 2003
[7] S. Kamvar and T. Haveliwala, The condition number of the PageRank problem. 2003
[8] K. Avrachenkov, N. Litvak, and K.S. Pham, A singular perturbation approach for choos-
ing the PageRank damping factor. Internet Mathematics 5.1-2: 47-69, 2008
[9] D. Gleich, A. Flaxman, P. Constantine and A. Gunawardana, Tracking the random
surfer: empirically measured teleportation parameters in PageRank. Proceedings of the
19th international conference on World wide web. ACM, 2010
[10] N. Budko, and R. Remis, Electromagnetic inversion using a reduced-order three-
dimensional homogeneous model. Inverse problems 20.6 (S17), 2004
[11] Y. Saad, Iterative methods for sparse linear systems. Society for Industrial and Applied
Mathematics, 2003.
[12] A. Zaanen, Continuity, integration and Fourier theory. Vol. 2. Berlin: Springer-Verlag,
1989.
[13] J. Kleinberg, California hyperlink matrix. http://www.cs.cornell.edu/Courses/
cs685/2002fa/
[14] S. Kamvar, Stanford University hyperlink matrices. http://www.kamvar.org/
personalized_search
A Appendix
A.1 Constructing the hyperlink/Google matrices
The following Matlab files have been created to construct hyperlink matrices. Furthermore,
the Google matrix can be computed with these scripts.
A.1.1 surfer.m
function [H, names] = surfer(root, n)
% Starting at the url root, the surfer visits web pages and determines
% where they link to. This process is repeated until n different web pages
% have been found. The urls of the web pages and the link matrix H are
% returned.
names = cell(n,1);
names{1} = root;
m = 1;                               %current number of urls
H = logical(sparse(n,n));
%Substrings used to truncate urls and to skip certain links
%(assumed example values; not shown in the original listing).
truncate = {'#', '?'};
banned = {'.gif', '.jpg', '.pdf', '.css'};
for j = 1:n
    try
        page = urlread(names{j});
        %Reading the file and determining links
        for f = strfind(page, 'href="http:')
            link = page(f+6:f+4+min(strfind(page(f+6:end), '"')));
            %truncate urls to prevent duplicates
            for i = 1:length(truncate)
                pos = min(strfind(link, truncate{i}));
                if (~isempty(pos))
                    link = link(1:pos-1);
                end
            end
            %Checking if the url is already known
            known = false;
            for i = 1:m
                if (strcmpi(names{i}, link))
                    %The url is known!
                    known = true;
                    H(j,i) = 1;
                end
            end
            %If the url is not known, adding this link
            if (~known && m ~= n)
                %Is the url allowed?
                skip = false;
                for str = banned
                    if (~isempty(strfind(link, str{1})))
                        skip = true;
                        break;
                    end
                end
                if (~skip)
                    m = m + 1;
                    names{m} = link;
                    H(j,m) = 1;      %page j links to the new page m
                end
            end
        end
    catch
        %We couldn't open the url. Continuing.
    end
end
end
A.1.2 loadCaliforniaMatrix.m
function [H, names] = loadCaliforniaMatrix
% Computes the hyperlink matrix of the California web (see
% http://www.cs.cornell.edu/Courses/cs685/2002fa/data/gr0.California).
fid = fopen('gr0.California');
tline = fgets(fid);
%Adding the nodes
names = {};
while ischar(tline) && tline(1) == 'n'
    pos = strfind(tline, ' ');
    i = tline(pos(1)+1:pos(2)-1);
    j = tline(pos(2)+1:end);
    names = [names; cellstr(j)];
    tline = fgets(fid);
end
%Adding the arcs
i = []; j = [];
while ischar(tline) && tline(1) == 'e'
    pos = strfind(tline, ' ');
    i = [i; str2num(tline(pos(1)+1:pos(2)-1))];
    j = [j; str2num(tline(pos(2)+1:end))];
    tline = fgets(fid);
end
i = i + 1;
j = j + 1;
n = length(names);
fclose(fid);
%Creating the (sparse) hyperlink matrix
H = sparse(i, j, 1, n, n);
H = spdiags(1./max(1,sum(H,2)), 0, n, n) * H;
end
A.1.3 loadStanfordMatrix.m
function H = loadStanfordMatrix
% Computes the hyperlink matrix of the Stanford University web (see
% http://www.kamvar.org/assets/data/stanford-web.tar.gz).
A.1.4 loadSBMatrix.m
function [H, rooturls] = loadSBMatrix
% Computes the hyperlink matrix of the Stanford-Berkeley Web (see
% http://www.kamvar.org/assets/data/stanford-berkeley-web.tar.gz).
n = 683446;
load stanford-berkeley-bool-sorted.dat;
H = spconvert(stanford_berkeley_bool_sorted);
%make the matrix square
H(n,n) = 0;
H = H(1:n, 1:n);
%normalize the rows to sum to 1
H = spdiags(1./max(1,sum(H,2)), 0, n, n) * H;
load stanford-berkeley-sorted-roots.dat;
indices = stanford_berkeley_sorted_roots;
indices = indices(find(indices < n));
rooturls = textread('rooturls.txt', '%s');
rooturls = rooturls(1:max(size(indices)));
end
A.1.5 DanglingNodeVector.m
function d = DanglingNodeVector(H)
%Returns d with d(i) = 1 if page i is a dangling node (row i of H is zero).
d = (sum(H,2) == 0);
end
A.1.6 GoogleMatrix.m
function G = GoogleMatrix(H, alpha, v)
%Constructs the Google matrix G from the hyperlink matrix H, the
%teleportation parameter alpha and the personalization vector v.
n = length(H);
if (nargin < 3)
    v = ones(n,1)/n;
end
if (nargin < 2)
    alpha = 0.85;
end
d = DanglingNodeVector(H);
G = alpha*(H + d*v') + (1-alpha)*ones(n,1)*v';
end
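As a hypothetical usage example, the scripts above can be combined as follows; note that G is a dense n × n matrix, so forming it explicitly is only feasible for small webs.

%Hypothetical usage of the scripts above (only sensible for small n).
[H, names] = loadCaliforniaMatrix;   %hyperlink matrix of the California web
G = GoogleMatrix(H, 0.85);           %Google matrix with alpha = 0.85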
A.2 Numerical Algorithms
The next Matlab files implement the numerical methods we have discussed in chapter 4.
A.2.1 PowerMethod.m
function [pi, iter] = PowerMethod(H, alpha, v, startvector, error)
%Approximates the PageRank vector with the Power Method; returns the
%approximation pi and the number of iterations iter.
n = length(H);
e = ones(n,1);
d = DanglingNodeVector(H);
norm = @(x) sum(abs(x)); %1-norm
if (nargin < 5)
    error = 1e-5;
end
if (nargin < 3)
    v = e / n;
end
if (nargin < 4)
    startvector = v;
end
if (nargin < 2)
    alpha = 0.85;
end
K = alpha*H';
pi = startvector;    %initial guess
max = 300;           %maximum number of iterations
for iter = 2:max
    piprevious = pi;
    pi = K*pi + (1-alpha+alpha*sum(d.*pi))*v;
    res = norm(pi - piprevious);
    if (res < error)
        break;
    end
end
end
A.2.2 JacobiMethodS.m
function [pi, iter] = JacobiMethodS(H, alpha, v, error)
%Approximates the PageRank vector by applying the Jacobi method to the
%system (I - alpha*S^T)pi = (1-alpha)v.
n = length(H);
e = ones(n,1);
d = DanglingNodeVector(H);
norm = @(x) sum(abs(x)); %1-norm
if (nargin < 4)
    error = 1e-5;
end
if (nargin < 3)
    v = e / n;
end
if (nargin < 2)
    alpha = 0.85;
end
K = alpha*H';
pi = (1-alpha)*v;
max = 300;           %maximum number of iterations
alphaspi = K*pi + sum(d.*pi)*alpha*v;
for iter = 2:max
    pi = alphaspi + (1-alpha)*v;
    alphaspi = K*pi + sum(d.*pi)*alpha*v;
    res = norm(pi - alphaspi - (1-alpha)*v);
    if (res < error)
        break;
    end
end
end
A.2.3 JacobiMethodH.m
function [pi, iter] = JacobiMethodH(H, alpha, v, error)
%Approximates the PageRank vector by applying the Jacobi method to the
%system (I - alpha*H^T)x = v and normalizing the result.
n = length(H);
e = ones(n,1);
norm = @(x) sum(abs(x)); %1-norm
if (nargin < 4)
    error = 1e-5;
end
if (nargin < 3)
    v = e / n;
end
if (nargin < 2)
    alpha = 0.85;
end
K = alpha*H';
pi = v;
max = 300;           %maximum number of iterations
alphahpi = K*pi;
for iter = 2:max
    pi = alphahpi + v;
    alphahpi = K*pi;
    res = norm(pi - alphahpi - v);
    if (res < error*sum(pi))
        break;
    end
end
pi = pi / sum(pi);
end
A.2.4 OptimizedJacobiMethodH.m
function [pi, iter] = OptimizedJacobiMethodH(H, beta, alpha, v, error)
%Approximates the PageRank vector with a relaxed Jacobi-type iteration on
%H, using relaxation parameter beta; the result is normalized.
n = length(H);
e = ones(n,1);
norm = @(x) sum(abs(x)); %1-norm
if (nargin < 5)
    error = 1e-5;
end
if (nargin < 4)
    v = e / n;
end
if (nargin < 3)
    alpha = 0.85;
end
K = H';
max = 300;           %maximum number of iterations
pi = beta*(1-alpha)*v;
Qpi = (1-beta)*pi + (beta*alpha)*(K*pi);
for iter = 2:max
    pi = Qpi + v;
    Qpi = (1-beta)*pi + (beta*alpha)*(K*pi);
    res = norm(pi - Qpi - v);
    if (res < error*sum(pi))
        break;
    end
end
pi = pi/sum(pi);
end
A.2.5 OptimalBeta.m
function optbeta = OptimalBeta(H, betas, alpha, v, error)
%Returns the value of beta among betas for which OptimizedJacobiMethodH
%needs the fewest iterations.
n = length(H);
if (nargin < 5)
    error = 1e-5;
end
if (nargin < 4)
    v = ones(n,1)/n;
end
if (nargin < 3)
    alpha = 0.85;
end
opt = Inf;
optbeta = NaN;
for beta = betas
    [~, iter] = OptimizedJacobiMethodH(H, beta, alpha, v, error);
    if (iter < opt)
        opt = iter;
        optbeta = beta;
    end
end
end
A.2.6 Compare.m
function Compare(H, alpha, v, beta)
%Prints the computation times and iteration counts of the Power Method and
%the Jacobi methods for the hyperlink matrix H. If beta ~= 1, the Optimized
%Jacobi Method is included as well.
n = length(H);
if (nargin < 4)
    beta = 1;    %the Optimized Jacobi Method will not be used
end
if (nargin < 3)
    v = ones(n,1)/n;
end
if (nargin < 2)
    alpha = 0.85;
end
%Power Method:
tic; [~, iter] = PowerMethod(H, alpha, v);
disp(['Power Method: ', num2str(toc), 's (', num2str(iter), ' iterations)']);
%Jacobi S
tic; [~, iter] = JacobiMethodS(H, alpha, v);
disp(['Jacobi Method (S): ', num2str(toc), 's (', num2str(iter), ' iterations)']);
%Jacobi H
tic; [~, iter] = JacobiMethodH(H, alpha, v);
disp(['Jacobi Method (H): ', num2str(toc), 's (', num2str(iter), ' iterations)']);
if (beta ~= 1)
    %Optimized Jacobi H
    tic; [~, iter] = OptimizedJacobiMethodH(H, beta, alpha, v);
    disp(['Optimized Jacobi Method (beta=', num2str(beta), '): ', ...
        num2str(toc), 's (', num2str(iter), ' iterations)']);
end
end
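A hypothetical call of Compare.m, using the California matrix from A.1.2; the values of alpha and beta are illustrative.

%Hypothetical usage: compare the methods on the California web.
H = loadCaliforniaMatrix;
Compare(H, 0.85, ones(length(H),1)/length(H), 0.5);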
A.3 Numerical Methods for the expected PageRank vector
The next Matlab files have been used to approximate the expected PageRank vector ⟨π⟩.
A.3.1 OptimizedPowerMethod.m
function [pi, iter] = OptimizedPowerMethod(H, alphas, v)
%Approximates pi(alpha) for every value in alphas with the Power Method,
%using the approximation for the previous alpha as starting vector; iter is
%the total number of iterations.
n = length(H);
if (nargin < 3)
    v = ones(n,1)/n;
end
m = length(alphas);
iter = 0;
pi = zeros(n,m);
pi(:,1) = PowerMethod(H, alphas(1), v);
for i = 2:length(alphas)
    [pi(:,i), k] = PowerMethod(H, alphas(i), v, pi(:,i-1));
    iter = iter + k;
end
end
A.3.2 Arnoldi.m
function [W, U] = Arnoldi(A, m, v)
%Applies m steps of the Arnoldi algorithm to A with starting vector v. The
%columns of W form an orthonormal basis of the Krylov space K_m(A, v) and
%U is the corresponding m-by-m upper Hessenberg matrix.
n = length(A);
if (nargin < 3)
    v = ones(n,1)/n;
end
W = zeros(n,m);
norm = @(x) sqrt(sum(abs(x.^2)));
W(:,1) = v / norm(v);
U = zeros(m,m);
for k = 1:m
    z = A*W(:,k);
    for j = 1:k
        U(j,k) = sum(W(:,j).*z);
        z = z - U(j,k)*W(:,j);
    end
    if (k == m)
        break;
    end
    U(k+1,k) = norm(z);
    if (U(k+1,k) == 0)
        %The Arnoldi Method broke down!
        disp('The Arnoldi Method broke down!');
        m = k;
        break;
    end
    W(:,k+1) = z / U(k+1,k);
end
%Rescaling W and U in case the Arnoldi Algorithm broke down.
W = W(:,1:m);
U = U(1:m,1:m);
end
A.3.3 ReducedOrderMethod.m
function pi = ReducedOrderMethod(H, alphas, m, v)
%Approximates pi(alpha) for every value in alphas using a Krylov space of
%dimension m (Reduced-Order algorithm).
n = length(H);
if (nargin < 4)
    v = ones(n,1)/n;
end
e1 = [1; zeros(m-1,1)];
pi = zeros(n, length(alphas));
%First, we construct an orthonormal basis using the Arnoldi Algorithm.
[W, U] = Arnoldi(H', m, v);
%Now, we calculate the approximation of pi(alphas) in the Krylov space for
%each value of alpha.
for i = 1:length(alphas)
    x = W*((eye(m) - alphas(i)*U)\e1);
    pi(:,i) = x/sum(x);
end
end
A.3.4 Compare2.m
function Compare2(H, alphas, m, v)
% Approximates pi(alpha) for the given values of alpha by using the
% Optimized Power Method, the Jacobi Method applied on H and S, and
% finally by using the Reduced Order Algorithm. m stands for the size of
% the Krylov space. Note that for storage issues, no PageRank vector will
% be stored.
n = length(H);
if (nargin < 4)
    v = ones(n,1)/n;
end
e1 = [1; zeros(m-1,1)];
iter = 0;
disp('Optimal Power Method');
tic;
pi = v;
for i = 1:length(alphas)
    [pi, k] = PowerMethod(H, alphas(i), v, pi);   %previous pi as starting vector
    iter = iter + k;
end
toc; iter
iter = 0;
disp('Normal Power Method');
tic;
for i = 1:length(alphas)
    [pi, k] = PowerMethod(H, alphas(i));
    iter = iter + k;
end
toc; iter
iter = 0;
disp('Normal Jacobi H Method');
tic;
for i = 1:length(alphas)
    [pi, k] = JacobiMethodH(H, alphas(i));
    iter = iter + k;
end
toc; iter
%Jacobi S and Reduced Order timings (reconstructed along the same lines;
%the original listing is truncated here).
iter = 0;
disp('Normal Jacobi S Method');
tic;
for i = 1:length(alphas)
    [pi, k] = JacobiMethodS(H, alphas(i));
    iter = iter + k;
end
toc; iter
disp('Reduced Order Method');
tic;
[W, U] = Arnoldi(H', m, v);
for i = 1:length(alphas)
    x = W*((eye(m) - alphas(i)*U)\e1);
    pi = x/sum(x);
end
toc;
end
A.3.5 PlotResidual.m
function PlotResidual(H, ms, alphas, v)
% Using the Krylov space of dimension m, this function plots the residual
% of the approximation of pi(alpha) for each value of alpha.
n = length(H);
if (nargin < 4)
    v = ones(n,1)/n;
end
norm = @(x) sqrt(sum(abs(x.^2)));
[m, alpha] = meshgrid(ms, alphas);
residual = zeros(size(m));
d = DanglingNodeVector(H);
%For each value of m, we approximate pi(alpha) for all alpha.
for k = 1:length(ms)
    pi = ReducedOrderMethod(H, alphas, ms(k), v);
    for j = 1:length(alphas)
        residual(j,k) = norm(pi(:,j) - (alphas(j)*(H'*pi(:,j)) ...
            + sum(d.*pi(:,j))*alphas(j)*v) - (1-alphas(j))*v) / norm(v);
    end
end
%Plotting the residual
figure;
surf(m, alpha, residual);
xlabel('m');
ylabel('alpha');
zlabel('residual');
set(gca, 'zscale', 'log');
end