
Delft University of Technology

Faculty Electrical Engineering, Mathematics and Computer Science


Delft Institute of Applied Mathematics

Google PageRank and Reduced-Order Modelling

Report for the


Delft Institute of Applied Mathematics
as part of

the degree of

BACHELOR OF SCIENCE
in
APPLIED MATHEMATICS

by

HUGO DE LOOIJ

Delft, Netherlands
June 2013

Copyright © 2013 by Hugo de Looij. All rights reserved.

BSc report APPLIED MATHEMATICS

“Google PageRank and Reduced-Order Modelling”

HUGO DE LOOIJ

Delft University of Technology

Thesis advisor

Dr. N.V. Budko

Other members of the graduation committee

Prof.dr.ir. C. Vuik
Dr. J.A.M. de Groot

Dr. J.G. Spandaw

Delft, June 2013


Contents
1 Introduction 7

2 Preliminaries 8

3 The PageRank Method 10


3.1 The basic model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Random surfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 Ranking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4 The hyperlink matrix H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.5 The stochastic matrix S . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.6 The Google matrix G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.7 PageRank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4 Calculating the PageRank vector 20


4.1 The Power Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Implementation of the Power Method . . . . . . . . . . . . . . . . . . . . . . 21
4.3 A direct method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.4 The Jacobi method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.5 Optimized Jacobi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

5 Numerical Experiments 27
5.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.2 Convergence criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.3 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

6 Random Teleportation Parameter 32


6.1 Expected PageRank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.2 Optimized Power Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.3 Reduced-order modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.4 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

7 Conclusion & Discussion 42

8 References 44

A Appendix 45
A.1 Constructing the hyperlink/Google matrices . . . . . . . . . . . . . . . . . . . 45
A.1.1 surfer.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
A.1.2 loadCaliforniaMatrix.m . . . . . . . . . . . . . . . . . . . . . . . . . . 46
A.1.3 loadStanfordMatrix.m . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
A.1.4 loadSBMatrix.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
A.1.5 DanglingNodeVector.m . . . . . . . . . . . . . . . . . . . . . . . . . . 48
A.1.6 GoogleMatrix.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
A.2 Numerical Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
A.2.1 PowerMethod.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
A.2.2 JacobiMethodS.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
A.2.3 JacobiMethodH.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

A.2.4 OptimizedJacobiMethodH.m . . . . . . . . . . . . . . . . . . . . . . . 50
A.2.5 OptimalBeta.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
A.2.6 Compare.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
A.3 Numerical Methods for the expected PageRank vector . . . . . . . . . . . . . 53
A.3.1 OptimizedPowerMethod.m . . . . . . . . . . . . . . . . . . . . . . . . 53
A.3.2 Arnoldi.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
A.3.3 ReducedOrderMethod.m . . . . . . . . . . . . . . . . . . . . . . . . . . 54
A.3.4 Compare2.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
A.3.5 PlotResidual.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

1 Introduction

The world wide web consists of almost 50 billion web pages. From this chaos of websites, a user typically wants to visit a handful of sites about a certain subject. But how could a user possibly know which of these billions of web pages to visit?
This is where search engines come into play. Not only do search engines filter the world wide web for a specific search query, they also sort the results by importance. In this way, a search engine can suggest which web pages a user should visit. But how can a web page be rated by importance? Since the world wide web is so huge, this cannot be done manually. Hence, algorithms need to be developed so that this job can be done automatically.
Google was the first search engine that managed to do this effectively. The secret behind Google is the PageRank method. This method was developed in 1996 by the founders of Google, Larry Page and Sergey Brin. Every page on the world wide web is given a score of importance (also called the PageRank value). Whenever a user makes a search request, Google sorts all web pages which satisfy the request by their PageRank value and returns the search results in this order.
The Google PageRank method is a completely objective way to rate a web page. It is based on a mathematical model which only uses the structure of the internet. In this paper, this method is discussed. We will see that the PageRank vector (the vector containing all PageRank values) is defined as an eigenvector of the so-called Google matrix.
The Google matrix, however, is extremely large; it is almost 50 billion by 50 billion in size. Therefore, calculating an eigenvector is far from an easy task. In chapter 4 we develop and discuss numerical methods to approximate this eigenvector as fast as possible. These numerical methods are then put to the test, using several smaller Google matrices corresponding to subsets of web pages in the world wide web. The tests and results can be found in chapter 5.
In chapter 6 we make an adjustment to the model. Up to that point, one of the parameters in the model is treated as a deterministic value. Things get much more interesting if this parameter is considered to be stochastic. The PageRank vector depends on the value of this parameter, and our goal will be to calculate the expected PageRank vector. We will develop an algorithm which uses the shift-invariance of the Krylov space of the hyperlink matrix. The expected PageRank can then be approximated using this algorithm. This reduced-order algorithm will turn out to be far more efficient than the ‘standard’ approximations.
It is worth noting that the theory in this paper is based on the original paper on Google PageRank by Brin and Page from 1998, see [1]. It is almost certain that Google has further developed this model. The current model is kept secret, and thus the theory and results might not be completely up to date. Google has stated, however, that its current algorithm is still based on the original PageRank model.

2 Preliminaries

Theorem 1 (Gershgorin’s circle theorem). Let A be an n × n matrix. Each eigenvalue λ of A satisfies
\[
\lambda \in \bigcup_{i=1}^{n} \left\{ z \in \mathbb{C} : |z - a_{ii}| \le \sum_{j=1,\, j \ne i}^{n} |a_{ij}| \right\}
\]
Thus, every eigenvalue of A is contained in the union of disks with center a_{ii} and radius \sum_{j=1, j \ne i}^{n} |a_{ij}|.

Lemma 2. Let A be an n × n matrix. λ ∈ C is an eigenvalue of A if and only if λ is an


eigenvalue of AT .

Definition 3. An n × n matrix A is called (row-)stochastic if the following properties hold:
1. a_{ij} ∈ R and a_{ij} ≥ 0 for all 1 ≤ i, j ≤ n.
2. \sum_{j=1}^{n} a_{ij} = 1 for all 1 ≤ i ≤ n.

Definition 4. An n × n matrix A is called sub-stochastic if the following properties hold:
1. a_{ij} ∈ R and a_{ij} ≥ 0 for all 1 ≤ i, j ≤ n.
2. \sum_{j=1}^{n} a_{ij} ≤ 1 for all 1 ≤ i ≤ n.

Lemma 5. Let A be an n × n stochastic matrix. Then 1 is an eigenvalue of A. Furthermore, this is the largest eigenvalue (i.e. if λ is an eigenvalue of A, then |λ| ≤ 1).
Proof: Let e = (1, 1, . . . , 1)^T be the n × 1 vector where each component is 1. Then
\[
A e = \begin{pmatrix} \sum_{j=1}^{n} a_{1j} \cdot 1 \\ \sum_{j=1}^{n} a_{2j} \cdot 1 \\ \vdots \\ \sum_{j=1}^{n} a_{nj} \cdot 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} = 1 \cdot e
\]
Thus, 1 is indeed an eigenvalue of A. By applying Gershgorin’s Theorem to A, we find that for each eigenvalue z of A there exists some 1 ≤ i ≤ n such that |z − a_{ii}| ≤ \sum_{j \ne i} |a_{ij}|. This yields
\[
|z| = |z - a_{ii} + a_{ii}| \le |z - a_{ii}| + |a_{ii}| \le \sum_{j \ne i} |a_{ij}| + |a_{ii}| = \sum_{j=1}^{n} |a_{ij}| = \sum_{j=1}^{n} a_{ij} = 1
\]
In other words, 1 is the largest eigenvalue of A.

Theorem 6 (Geometric series for matrices). Suppose A is an n × n matrix such that its spectral radius ρ(A) < 1 (i.e. |λ| < 1 for all eigenvalues λ of A). Then \sum_{n=0}^{\infty} A^n converges and
\[
(I - A)^{-1} = \sum_{n=0}^{\infty} A^n
\]

For the PageRank model, we will also discuss Markov chains. The transition matrix of a
Markov chain is a (sub)-stochastic matrix, and it is important to look at some properties of
these matrices.
Definition 7. Let A be an n × n sub-stochastic matrix. A is said to be irreducible if for each pair (i, j) there exists some n ∈ N such that (A^n)_{ij} > 0. If this property does not hold, A is called reducible.

Definition 8. An n × n matrix A is said to be aperiodic if for each 1 ≤ i ≤ n one has gcd{n ∈ N_{≥1} : (A^n)_{ii} > 0} = 1. In particular, if A_{ii} > 0 for each i, then A is aperiodic. A is called periodic if and only if A is not aperiodic.

Definition 9. A matrix A is primitive if and only if it is irreducible and aperiodic.

3 The PageRank Method

The PageRank method is an algorithm designed by the founders of Google, Sergey Brin and
Larry Page. This algorithm is the main reason why Google turned out to be the most popular
search engine. Every page on the internet is given a score of importance. The larger this
number, the earlier this page can be found in the search results. In the following paragraphs
we will discuss how this score is determined.

3.1 The basic model

PageRank is based on the hyperlink structure of the world wide web. We can view the
internet as a directed graph. The nodes should be interpreted as the web pages of the world
wide web and an arc from node i to node j should be seen as a hyperlink on webpage i to
webpage j. See figure 1 for a small example. We will use this example to illustrate a couple
of definitions.

Figure 1: A small network with 5 webpages.

The goal of Brin and Page, the founders of Google, was to construct a method to rank each
page on the internet. They invented the PageRank method, which uses only the structure
of the internet. If many different webpages all link to the same page (such as node 2 in our
example), it makes sense to view this page as important.
However, the number of links to a certain page is very easily manipulated and does not give a clear idea of how important a page really is. Brin and Page were the first to also take the quality of a link into account: if an important page links to an external webpage, this webpage should be marked as important as well. This idea, that the quality of incoming links matters in addition to their quantity, is the main reason why Google PageRank is such a successful model.

3.2 Random surfer

The PageRank method is based on a random surfer. Suppose we are surfing the internet, clicking links at random. This can be interpreted as a Markov chain where the transition probability p_{ij} is uniformly distributed over the set of pages that i links to. We repeat this process an infinite number of times. The fraction of the time the surfer visits a certain website can be seen as the PageRank of this site. Clearly, this fraction is greater for a certain webpage if more pages link to this webpage. In our example (see figure 1) node 2 will have a greater PageRank. However, every time the random surfer visits this node, it will also visit

node 1. Hence node 1 will have a good PageRank value as well, even though it has only 1
incoming link. This illustrates why the quality of links also matters.
These fractions - if they exist - form a stationary distribution corresponding to the Markov
chain of the random surfer. Denote this distribution by p. We know from Markov theory
that this distribution satisfies the linear system pT = pT A, with A the transition matrix of
the Markov chain. This equality allows us to define the PageRank vector in a more rigorous
way using only linear algebra. Because a stationary distribution does not need to exist nor
does it need to be unique, some modifications to the model need to be made before the
PageRank vector can be defined.

3.3 Ranking

Definition 10. Let p be an n × 1 vector. We call p a ranking if it has the following properties:
• p_i = \sum_{j \in I(i)} \frac{1}{|U(j)|}\, p_j for all 1 ≤ i ≤ n,
• p_i ≥ 0 for all 1 ≤ i ≤ n,
• \sum_{i=1}^{n} p_i = 1.

Here I(i) is the collection of pages which link to node i and |U(j)| is the number of outgoing links on page j. This is well defined, because if |U(j)| = 0, page j does not link to any other page and hence j ∉ I(i).

Therefore, a stationary distribution of the Markov chain is a ranking. To illustrate this definition, a vector p is a ranking of our example in figure 1 if it satisfies
\[
\begin{cases}
p_1 = p_2 \\
p_2 = \tfrac{1}{2}p_1 + p_3 + \tfrac{1}{2}p_4 + p_5 \\
p_3 = 0 \\
p_4 = \tfrac{1}{2}p_1 \\
p_5 = \tfrac{1}{2}p_4
\end{cases}
\]
A solution of this system such that \sum_{i=1}^{5} p_i = 1 and p_j ≥ 0 for all 1 ≤ j ≤ 5 is the vector
\[
p = \begin{pmatrix} 4/11 \\ 4/11 \\ 0 \\ 2/11 \\ 1/11 \end{pmatrix}
\]
One can easily show that for any other solution q of this linear system, we have q = kp for some k ∈ R. The property \sum_{i=1}^{5} p_i = 1 guarantees the uniqueness of the ranking p. The larger the number p_i, the more important page i is. In our example, pages 1 and 2 should appear as the top results, followed by pages 4, 5, and 3 respectively.
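As a small illustration (not part of the original text), this ranking can be checked numerically. The Matlab sketch below builds a hyperlink matrix consistent with the linear system above (the adjacency is reconstructed from that system, so it is an assumption about figure 1) and verifies that p^T H = p^T.

% Sketch: verify the ranking of the 5-node example.
% The link structure is reconstructed from the linear system above.
H = zeros(5);
H(1,[2 4]) = 1/2;   % page 1 links to pages 2 and 4
H(2,1)     = 1;     % page 2 links to page 1
H(3,2)     = 1;     % page 3 links to page 2
H(4,[2 5]) = 1/2;   % page 4 links to pages 2 and 5
H(5,2)     = 1;     % page 5 links to page 2
p = [4; 4; 0; 2; 1] / 11;
disp(norm(H'*p - p, 1))   % should be (numerically) zero
disp(sum(p))              % should be 1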

3.4 The hyperlink matrix H

We have seen that p is a ranking if it satisfies a linear system. Therefore, we will construct a matrix that makes this definition more convenient. Define the (Boolean) variable L_{ij} by L_{ij} = 1 if page i links to page j, and L_{ij} = 0 otherwise. Let n be the number of web pages.

Definition 11. The out degree of a page i is defined by c_i = \sum_{j=1}^{n} L_{ij}.

The out degree of a page is simply the number of pages that it links to. On the world wide web, this number is usually between 0 and 20, but it can vary greatly. As we will see, the case c_i = 0 will require some caution.
Definition 12. A node i is called a dangling node if its out degree ci = 0.

Dangling nodes are those pages that do not link to any other pages. See figure 2. In this
example, page 6 is a dangling node. The world wide web consists mainly of dangling nodes:
images, scripts, pdf files and other common files found on the internet often do not link to
any other page. We will see in the next paragraph that dangling nodes require us to make
an adjustment to the model of the random surfer.
Definition 13. We define the n × n hyperlink matrix H by
\[
H_{ij} = \begin{cases} \dfrac{1}{c_i} & \text{if } c_i \ne 0 \text{ and } L_{ij} = 1 \\[4pt] 0 & \text{otherwise} \end{cases}
\]
See figure 2 for an example.

Figure 2: An example of a network with a dangling node.

The following matrix is the hyperlink matrix corresponding to the web of this figure. Note that page 6 is a dangling node; in the matrix H, this corresponds to a zero row.
\[
H = \begin{pmatrix}
0 & 1/2 & 1/2 & 0 & 0 & 0 \\
0 & 0 & 1/2 & 0 & 1/2 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1/3 & 1/3 & 0 & 0 & 1/3 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]
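As an aside, H can be built directly from the Boolean link matrix L by dividing every non-zero row by its out degree. The following Matlab sketch is illustrative only (the thesis’s own construction routines are in appendix A.1); it reproduces the matrix above while keeping zero rows for dangling nodes.

% Sketch: build H from a Boolean link matrix L (rows = pages).
L = sparse(6,6);
L(1,[2 3]) = 1;  L(2,[3 5]) = 1;  L(3,2) = 1;   % links of figure 2
L(4,6) = 1;      L(5,[2 3 6]) = 1;              % page 6 has no outgoing links
c = full(sum(L,2));                       % out degrees c_i
H = spdiags(1./max(c,1), 0, 6, 6) * L;    % divide each non-empty row by c_i
% rows with c_i = 0 stay zero, so dangling nodes give zero rows
full(sum(H,2))                            % row sums: 1 for linking pages, 0 for page 6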

Theorem 14. Let p be a ranking and let H be the corresponding hyperlink matrix. Then p^T H = p^T.
Proof: For the i-th component of p^T we have
\[
(p^T)_i = p_i = \sum_{j \in I(i)} \frac{1}{|U(j)|}\, p_j = \sum_{j \in I(i)} \frac{1}{c_j}\, p_j = \sum_{j \in I(i)} H_{ji}\, p_j = \sum_{j=1}^{n} H_{ji}\, p_j = (H^T p)_i = (p^T H)_i
\]
Note that by taking the transpose of this equality, we get the following more convenient equality
\[
H^T p = p
\]

This means that a ranking p is a right eigenvector of matrix H T corresponding to eigen-


value 1. We will look at some properties of this matrix H.
• The hyperlink matrix H is sub-stochastic, i.e. all elements are non-negative and the sum of each row is at most 1. After all, let 1 ≤ i ≤ n be arbitrary. There are two possibilities:
1. c_i = 0. Hence i is a dangling node. Since this page does not link to any other page, we have \sum_{j=1}^{n} H_{ij} = 0 ≤ 1.
2. c_i ≠ 0. Then
\[
\sum_{j=1}^{n} H_{ij} = \sum_{j : L_{ij}=1} \frac{1}{c_i} = \frac{1}{c_i} \sum_{j : L_{ij}=1} 1 = \frac{c_i}{c_i} = 1.
\]

• Generally, H is reducible. This is because the world wide web most likely consists of multiple disjoint subsets.
• Generally, H is periodic. There only needs to be one node with period greater than or equal to 2. This can happen if two web pages link to each other but to no other node.
• H is a very sparse matrix. The world wide web consists of many billions of pages, but every page links to only a very few other pages. So most components H_{ij} are equal to zero. An advantage of this is that it requires much less memory to store this matrix. Also, computations with this matrix are much faster. We will use these properties of H to minimise computation time and memory usage in the methods that calculate the PageRank value.

3.5 The stochastic matrix S

We have seen that p is a ranking if it satisfies the equality H T p = p. However, there is no


guarantee that such a p exists, nor does it need to be unique. Brin and Page have modified
this matrix in such a way that it became stochastic and primitive. The Perron-Frobenius
Theorem then guarantees the existence and uniqueness of this solution. This will then allow
us to define the PageRank vector.
Note that H is not a stochastic matrix. The random surfer will have no idea what to do
when it ends up on a dangling node. We will therefore construct an artificial transition
probability for dangling nodes. For this, we will need to define a personalisation vector.
Definition 15. An n × 1 vector v is called a personalisation vector if it satisfies the following two properties:
• v_i > 0 for all 1 ≤ i ≤ n, and
• \sum_{i=1}^{n} v_i = 1.

We can also denote the second property by eT v = 1. Here e is the n × 1 uniform vector:
every element of e is equal to 1. This notation will turn out to be more convenient in the
proofs.
Just like the name suggests, the personalisation vector can be different for different users. A component v_i for a specific person should be seen as the probability that he or she goes to page i. If a person very commonly searches for information about sport, then v_i will generally be larger for a page about sport. The personalisation vector can therefore be chosen to adapt the PageRank vector to the interests of a person.
Definition 16. The n × 1 dangling node vector d is given by
\[
d_i = \begin{cases} 1 & \text{if } c_i = 0 \\ 0 & \text{if } c_i \ne 0 \end{cases}
\]

Definition 17. The n × n matrix S is defined as

S = H + dvT

The matrix S is essentially equal to H, but every zero row is replaced by the personalisation vector. The interpretation is that whenever the random surfer visits a dangling node, it will visit another web page randomly according to the personalisation (probability) vector. Usually, we suppose that v = e/n, i.e. the transition probability p_{ij} is uniformly distributed if node i is a dangling node. This is the ‘democratic’ personalisation vector. In this case,
our example in figure 2 yields the stochastic matrix

 
\[
S = \begin{pmatrix}
0 & 1/2 & 1/2 & 0 & 0 & 0 \\
0 & 0 & 1/2 & 0 & 1/2 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1/3 & 1/3 & 0 & 0 & 1/3 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6
\end{pmatrix}
\]

We will consider a couple of important properties of S.
• The matrix S is stochastic; every element of S is non-negative and the sum of every
row of S is exactly equal to 1. This is easy to see:
1. If c_i ≠ 0, then d_i = 0 and \sum_{j=1}^{n} H_{ij} = 1. It follows that \sum_{j=1}^{n} S_{ij} = \sum_{j=1}^{n} H_{ij} + 0 = 1.
2. If c_i = 0, then d_i = 1 and \sum_{j=1}^{n} H_{ij} = 0. Since v is a personalisation vector, we have \sum_{j=1}^{n} v_j = 1. It follows that \sum_{j=1}^{n} S_{ij} = \sum_{j=1}^{n} H_{ij} + d_i \sum_{j=1}^{n} v_j = 0 + 1 · 1 = 1.

• Generally, S is periodic and reducible.


• S is a sparse matrix, but much less sparse than H. This is because a relatively large number of pages on the world wide web are dangling nodes. The consequence is that all the corresponding rows of S are completely non-zero, since the personalisation vector is by definition strictly positive. However, this is not a very big problem, since S is a rank-one update of the very sparse matrix H. We will see this in chapter 4.

3.6 The Google matrix G

The last step of Brin and Page was to force the matrix to become irreducible and aperiodic.
This guarantees the uniqueness of the PageRank vector. We introduce the possibility of
teleportation for the random surfer model.
Definition 18. The teleportation parameter α is the probability that the random surfer will follow a link. We require that 0 ≤ α < 1.

This parameter α represents the probability that a link is clicked. The random surfer will, with probability 1 − α, go to a random page on the internet according to the distribution of the personalisation vector. This kind of teleportation happens when, for example, a user enters a different link in the address bar.
For small values of α, the random surfer will almost always teleport to another page. Therefore, the random surfer will rarely click links and thus the structure of the web will not affect the PageRank vector much. If, on the other hand, α is close to one, the numerical methods to approximate the PageRank vector will converge very slowly. We will see this in chapter 4. Therefore, Google [1] used α = 0.85 for their model. In chapter 6 we will discuss what happens if α is considered to be a stochastic variable instead of a deterministic value.
Definition 19. The Google matrix based on the personalisation vector v and teleportation
parameter α is defined as
G = αS + (1 − α)evT

Thus, this matrix corresponds to the transition matrix of the Markov chain with teleporta-
tion. This matrix has some very important properties.
• G is stochastic. This follows from the fact that both S and evT are stochastic. G is
a convex combination of the two and therefore is stochastic as well.

• The matrix G is nowhere sparse. Because v > 0 and α < 1, it follows that for any
element Gij we have Gij ≥ (1 − α)vj > 0. This is a very bad property, but as we
will see, this will not cause any problems. One does not need to store G and the
matrix-vector products with G can be computed very efficiently using definition 19.
• The matrix G is irreducible. This follows from the fact that each element G_{ij} is larger than zero. For each combination of nodes i and j, the probability that the surfer will visit node j immediately after leaving node i is at least equal to (1 − α)v_j > 0.
• G is aperiodic. This is implied by the fact that G_{ii} > 0 for all nodes i.
• Therefore, G is primitive.
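To make Definition 19 concrete, the following Matlab sketch (an illustration under the uniform personalisation vector, not code from the thesis) forms S and G for the six-page example of figure 2 and checks two of the facts above: every row of G sums to 1 and every entry is at least (1 − α)v_j > 0.

% Sketch: the Google matrix of the figure-2 example with v = e/n.
alpha = 0.85;  n = 6;
v = ones(n,1)/n;  e = ones(n,1);
H = [0 1/2 1/2 0 0 0; 0 0 1/2 0 1/2 0; 0 1 0 0 0 0; ...
     0 0 0 0 0 1; 0 1/3 1/3 0 0 1/3; 0 0 0 0 0 0];
d = double(sum(H,2) == 0);      % dangling node vector (page 6)
S = H + d*v';                   % replace zero rows by v'
G = alpha*S + (1-alpha)*e*v';   % teleportation with probability 1-alpha
max(abs(sum(G,2) - 1))          % row sums are 1, so G is stochastic
all(G(:) >= (1-alpha)/n)        % every entry is at least (1-alpha)*v_j > 0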

3.7 PageRank

The matrix G is stochastic and primitive. Hence, we will define the PageRank vector as the
unique stationary distribution of the corresponding Markov chain.
Definition 20 (PageRank). The PageRank vector π is the unique vector which satisfies:
• G^T π = π,
• \sum_{i=1}^{n} π_i = 1 (we can also write this as π^T e = 1), and
• π_i > 0 for all 1 ≤ i ≤ n.

To prove that this vector is uniquely defined, we will show that 1 is an eigenvalue of GT with
an algebraic multiplicity of one. For this, we will use the Perron-Frobenius theorem. See [5]
for a proof of this theorem.
Theorem 21 (Perron-Frobenius). Let A be an n × n primitive matrix. Then there
exists an eigenvalue λ which is strictly greater in absolute value than all other eigenvalues.
Furthermore, all elements of an eigenvector corresponding to this eigenvalue have the same
sign.

We will prove a few lemmas. These lemmas will be used to prove some properties of the spectrum of the matrix G^T. Furthermore, these lemmas will be reused in chapter 4 to prove some other very important theorems.
Lemma 22. For each λ ∈ C with |λ| > α, the matrix λI − αS^T is invertible.
Proof: Let λ ∈ C such that |λ| > α. We will use Gershgorin’s circle theorem to show that 0 is not an eigenvalue of M := λI − αS, which implies that M is invertible. Let 1 ≤ i ≤ n be arbitrary. Note that the diagonal element M_{ii} is equal to M_{ii} = λ − αS_{ii}. Furthermore, we have
\[
\sum_{j \ne i} |M_{ij}| = \sum_{j \ne i} \alpha S_{ij} = \alpha \left( \sum_{j=1}^{n} S_{ij} - S_{ii} \right) = \alpha(1 - S_{ii}) = \alpha - \alpha S_{ii}
\]
since S is a stochastic matrix. Thus, by the reverse triangle inequality, we get
\[
|M_{ii} - 0| = |\lambda - \alpha S_{ii}| \ge \big|\, |\lambda| - |\alpha S_{ii}| \,\big| = \big|\, |\lambda| - \alpha S_{ii} \,\big| \ge |\lambda| - \alpha S_{ii} > \alpha - \alpha S_{ii} = \sum_{j \ne i} |M_{ij}|
\]
In other words, 0 does not lie in the disk with center M_{ii} and radius \sum_{j \ne i} |M_{ij}|. Since i was arbitrary, the union of these disks does not contain 0. Hence, by Gershgorin’s Theorem, zero is not an eigenvalue of M. Finally, by Lemma 2, 0 is not an eigenvalue of λI − αS^T either, which proves the lemma.

Lemma 23. Let p be an eigenvector of G^T corresponding to an eigenvalue λ. Suppose that |λ| > α holds. Then p = (1 − α)(e^T p)(λI − αS^T)^{-1} v.
Proof: By writing out the definition of the Google matrix and by using the fact that G^T p = λp, we get
\[
\lambda p = G^T p = (\alpha S + (1-\alpha)e v^T)^T p = (\alpha S^T + (1-\alpha)v e^T)p = \alpha S^T p + (1-\alpha)v(e^T p)
\;\Longrightarrow\; (\lambda I - \alpha S^T)p = (1-\alpha)(e^T p)v
\]
Note that the matrix (λI − αS^T) is invertible by Lemma 22, since we have assumed that |λ| > α. This yields
\[
p = (1-\alpha)(e^T p)(\lambda I - \alpha S^T)^{-1} v
\]
which completes the proof.

Lemma 24. Suppose λ is an eigenvalue of G^T and |λ| > α. Then (1 − α)v^T(λI − αS)^{-1}e = 1.
Proof: Let p be an eigenvector corresponding to the eigenvalue λ and suppose furthermore that |λ| > α. Note that if e^T p = 0 were to hold, Lemma 23 would yield p = 0, which is not an eigenvector. Thus, we can assume that e^T p ≠ 0. Define q = p/(e^T p). This is still an eigenvector of G^T corresponding to the same eigenvalue λ. Since e^T q = 1, it follows from Lemma 23 that
\[
q = (1-\alpha)(\lambda I - \alpha S^T)^{-1} v
\]
It follows that
\[
(1-\alpha)v^T(\lambda I - \alpha S)^{-1}e = (1-\alpha)e^T\big((\lambda I - \alpha S)^{-1}\big)^T v = (1-\alpha)e^T\big((\lambda I - \alpha S)^T\big)^{-1} v = e^T (1-\alpha)(\lambda I - \alpha S^T)^{-1} v = e^T q = 1
\]

This lemma can be used to give an upper bound for each eigenvalue.

Theorem 25 (Eigenvalues of the Google matrix). Let G be the Google matrix and denote by λ_1, λ_2, . . . , λ_n the eigenvalues of G^T in descending absolute value. Then:
1. λ_1 = 1, and
2. |λ_2| ≤ α.
Proof: The first property is fairly straightforward. Since G is a stochastic matrix, we have Ge = e by Lemma 5. Furthermore, by the same lemma, any eigenvalue λ of G satisfies |λ| ≤ 1. So 1 is the largest eigenvalue of G. The spectrum of the transpose of a matrix is equal to the spectrum of the matrix itself, hence λ_1 = 1.
We will use Lemma 24 to prove the second statement. Suppose λ is an eigenvalue of G^T and |λ| > α. It follows that
\[
\begin{aligned}
(1-\alpha)v^T(\lambda I - \alpha S)^{-1}e
&= \frac{1-\alpha}{\lambda}\, v^T \Big(I - \frac{\alpha}{\lambda}S\Big)^{-1} e
 = \frac{1-\alpha}{\lambda}\, v^T \sum_{n=0}^{\infty} \Big(\frac{\alpha}{\lambda}S\Big)^n e \\
&= \frac{1-\alpha}{\lambda} \sum_{n=0}^{\infty} \Big(\frac{\alpha}{\lambda}\Big)^n \big[v^T S^n e\big]
 = \frac{1-\alpha}{\lambda} \sum_{n=0}^{\infty} \Big(\frac{\alpha}{\lambda}\Big)^n \big[v^T e\big] \\
&= \frac{1-\alpha}{\lambda} \sum_{n=0}^{\infty} \Big(\frac{\alpha}{\lambda}\Big)^n
 = \frac{1-\alpha}{\lambda} \cdot \frac{1}{1-\alpha/\lambda} = \frac{1-\alpha}{\lambda-\alpha}
\end{aligned}
\]
The sum \sum_{n=0}^{\infty} (\frac{\alpha}{\lambda}S)^n is convergent since ρ(\frac{\alpha}{\lambda}S) = \frac{\alpha}{|\lambda|}ρ(S) < ρ(S) ≤ 1. The last inequality follows from Lemma 5. Since S is stochastic, Se = e. By applying this multiple times, we see that S^n e = e for any n ∈ N. Lemma 24 yields that (1 − α)v^T(λI − αS)^{-1}e = 1, hence (1 − α)/(λ − α) = 1 must hold. So λ = 1.
However, λ = 1 is the largest eigenvalue of G^T and G^T is primitive. Thus, this eigenvalue must have algebraic multiplicity one by the Perron-Frobenius theorem, i.e. it is not possible to have λ_2 = λ_1 = 1. It follows that |λ_i| ≤ α must hold for all i = 2, 3, . . . , n.

Figure 3: The spectrum of a Google matrix.

Figure 3 illustrates the last theorem. The figure contains all eigenvalues (in blue) of the 5000 × 5000 (mathworks.com) Google matrix (see also paragraph 5.1). All eigenvalues but one are contained in the red disk of radius α. The remaining eigenvalue is exactly equal to 1. Furthermore, if G has at least two irreducible closed subsets, the second eigenvalue λ_2 is exactly equal to α. For a proof, see [6].
By Perron’s Theorem, each component of the eigenvector corresponding to eigenvalue 1 has the same sign, so we can scale this vector (denote it by π) such that π^T e = 1. π is (strictly) positive, so it is indeed a probability vector. Thus, the PageRank vector is well defined. The fact that the second eigenvalue satisfies |λ_2| ≤ α has some important consequences. For example, the convergence speed of the Power Method depends on the second eigenvalue of G^T. We will discuss this in the next chapter.
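The claim |λ_2| ≤ α is easy to check numerically on a small example. The sketch below is illustrative only (it is not the code used for figure 3); it computes the spectrum of the Google matrix of the figure-2 example and compares the second largest absolute eigenvalue with α.

% Sketch: check lambda_1 = 1 and |lambda_2| <= alpha on the small example.
alpha = 0.85;  n = 6;  v = ones(n,1)/n;  e = ones(n,1);
H = [0 1/2 1/2 0 0 0; 0 0 1/2 0 1/2 0; 0 1 0 0 0 0; ...
     0 0 0 0 0 1; 0 1/3 1/3 0 0 1/3; 0 0 0 0 0 0];
d = double(sum(H,2) == 0);
G = alpha*(H + d*v') + (1-alpha)*e*v';
lambda = sort(abs(eig(G)), 'descend');
[lambda(1), lambda(2), alpha]   % expect lambda(1) = 1 and lambda(2) <= alpha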

4 Calculating the PageRank vector

The PageRank vector is the unique vector π such that GT π = π and π T e = 1. The goal of
Google was to be able to rank all webpages using the PageRank model. This corresponds
with calculating the eigenvector π. However, the world wide web consists of tens of billions
of pages, so calculating an eigenvector is not an easy task. In this chapter we will look at
some numerical methods to calculate this vector as fast as possible. First, we will discuss
the Power Method. Secondly, we will see that the PageRank vector satisfies a simple linear
system, which we will solve with the Jacobi Method.

4.1 The Power Method

The Power Method is a well known numerical algorithm to calculate an eigenvector of a


matrix corresponding to the largest eigenvalue. It is known to be extremely slow; its conver-
gence speed depends on the difference between the first and second largest eigenvalue of the
corresponding matrix, which is usually very small. However, as we have seen in Theorem
25, the second eigenvalue of GT is smaller than or equal to α. Thus, the Power Method is
very effective for our problem.
Brin and Page have applied the Power Method to approximate the PageRank vector. We
will discuss this algorithm here.
Algorithm 26. Let x(0) be a starting vector. The Power Method is the following iterative
process:
k←0
while convergence not reached do
x(k+1) ← GT x(k)
k ←k+1
end while
π = x(k)

Theorem 27. The Power Method will always converge to the PageRank vector π, assuming
that the starting vector x(0) satisfies eT x(0) = 1.
Proof: Suppose that the eigenvectors v_1, v_2, . . . , v_n of G^T form a basis of R^n. Then we can write the starting vector as x^{(0)} = \sum_{i=1}^{n} c_i v_i for some coefficients c_1, . . . , c_n ∈ R. It follows that
\[
\begin{aligned}
x^{(1)} &= G^T x^{(0)} = G^T \sum_{i=1}^{n} c_i v_i = \sum_{i=1}^{n} c_i G^T v_i = \sum_{i=1}^{n} c_i \lambda_i v_i \\
x^{(2)} &= G^T x^{(1)} = \sum_{i=1}^{n} c_i \lambda_i G^T v_i = \sum_{i=1}^{n} c_i \lambda_i^2 v_i \\
&\;\;\vdots \\
x^{(k)} &= G^T x^{(k-1)} = \sum_{i=1}^{n} c_i \lambda_i^{k-1} G^T v_i = \sum_{i=1}^{n} c_i \lambda_i^k v_i
\end{aligned}
\]
By Theorem 25, we know that λ_1 = 1 and |λ_i| ≤ α for all 2 ≤ i ≤ n. So λ_i^k goes to zero as k tends to infinity. We get
\[
\lim_{k\to\infty} x^{(k)} = \lim_{k\to\infty} c_1 \lambda_1^k v_1 = \lim_{k\to\infty} c_1 1^k \pi = c_1 \pi
\]
We assumed that e^T x^{(0)} = 1 holds. Suppose e^T x^{(i)} = 1 for some i ∈ N. Then we also have e^T x^{(i+1)} = e^T G^T x^{(i)} = (Ge)^T x^{(i)} = e^T x^{(i)} = 1, because G is a stochastic matrix. Hence e^T x^{(i)} = 1 for all i ∈ N. So this is also true for the limit:
\[
1 = \lim_{k\to\infty} e^T x^{(k)} = e^T \lim_{k\to\infty} x^{(k)} = c_1 e^T \pi = c_1
\]
So the Power Method does indeed converge to the PageRank vector.

Obviously, it might happen that we cannot write x^{(0)} as a linear combination of eigenvectors (i.e. G^T is not diagonalisable). In this case, we can write G^T = PJP^{-1}, where J is the Jordan form of G^T and P is the corresponding matrix containing the generalized eigenvectors. Each block J_m of J can be written as λI_m + N, where N is the matrix of all zeros except on its superdiagonal. This matrix N is nilpotent, and if |λ| < 1, then J_m^k → 0 as k → ∞. There is only one eigenvalue of G^T which does not satisfy |λ| < 1, namely λ_1 = 1. So the Power Method will converge to the eigenvector corresponding to this eigenvalue, which is the PageRank vector.
Theorem 28. Suppose that the starting vector satisfies e^T x^{(0)} = 1. The error of the Power Method satisfies ‖x^{(k)} − π‖ = O(α^k).
Proof: Once again, assume that x^{(0)} = \sum_{i=1}^{n} c_i v_i for some constants c_i. Let M = max{‖c_i v_i‖ : 2 ≤ i ≤ n}. By Theorem 25, λ_1 = 1 and |λ_i| ≤ α for all 2 ≤ i ≤ n. We find that
\[
\|x^{(k)} - \pi\| = \Big\|\pi + \sum_{i=2}^{n} c_i \lambda_i^k v_i - \pi\Big\| = \Big\|\sum_{i=2}^{n} c_i \lambda_i^k v_i\Big\| \le \sum_{i=2}^{n}\|c_i \lambda_i^k v_i\| = \sum_{i=2}^{n}|\lambda_i|^k\|c_i v_i\| \le \sum_{i=2}^{n}\alpha^k M = (n-1)M\alpha^k = O(\alpha^k)
\]

For the Power Method, a starting vector x(0) is needed. Clearly, if x(0) is close to the
PageRank vector π, the error will be smaller on iteration one. Thus, the Power Method
will converge faster. It makes sense to use x(0) = v as the starting vector, since the random
surfer will prioritize some web pages more than others according to the personalisation vector.
Another option is to use x(0) = e/n.

4.2 Implementation of the Power Method

The Power Method is a very simple numerical method to calculate the PageRank vector.
This method only requires the repetitive calculation of matrix-vector products. However,
this needs to be done with the matrix GT , which is nowhere sparse as we have stated before.
To prevent storage issues and slow computation time, it is preferred not to use the matrix
G but the very sparse matrix H. G is by definition equal to:

\[
G = \alpha S + (1-\alpha)e v^T = \alpha(H + d v^T) + (1-\alpha)e v^T = \alpha H + (\alpha d + (1-\alpha)e)v^T
\]
Now, it follows that
\[
\begin{aligned}
G^T x &= (\alpha H + (\alpha d + (1-\alpha)e)v^T)^T x = \alpha H^T x + v(\alpha d^T + (1-\alpha)e^T)x \\
&= \alpha H^T x + \alpha v(d^T x) + (1-\alpha)v(e^T x) = \alpha H^T x + \big(1 - \alpha + \alpha(d^T x)\big)v
\end{aligned}
\]
Here we used the fact that e^T x = 1. This follows from the fact that the starting vector, and therefore also all subsequent iterates, satisfy this equality. This gives us an easy and fast way to approximate the PageRank vector using just the very sparse hyperlink matrix H and the dangling node vector d.
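A minimal Matlab sketch of this idea is given below. It is not the implementation from appendix A.2.1, just an illustration of the update x ← αH^T x + (1 − α + α d^T x)v, assuming H is stored as a sparse matrix and using a simple one-norm stopping test on successive iterates (cf. paragraph 5.2).

% Sketch: Power Method using only the sparse matrix H and the vector d.
function x = pagerank_power(H, v, alpha, tol)
    d = double(sum(H,2) == 0);        % dangling node indicator
    x = v;                            % starting vector, e'*x = 1
    while true
        xnew = alpha*(H'*x) + (1 - alpha + alpha*(d'*x))*v;
        if norm(xnew - x, 1) < tol    % stop when successive iterates agree
            x = xnew;
            return
        end
        x = xnew;
    end
end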

4.3 A direct method

In the proof of Lemma 23, we have seen that the PageRank vector can also be calculated in
a different way. We will further discuss this method now.
Theorem 29. The PageRank vector π is equal to
π = (1 − α)(I − αS T )−1 v

Proof: Since eT π = 1 holds by definition of the PageRank vector, this theorem is a direct
consequence of Lemma 23.

The matrix I − αS T is an important matrix with a lot of useful properties. The following
properties are worth noting.
• All eigenvalues of I − αS T lie in the disk with center 1 and radius α. This follows
directly from Gershgorin’s Theorem.
• I − αS T is invertible, as stated in Lemma 22.
• I − αS^T is strictly diagonally dominant: for each row i the diagonal element is larger than the sum of the absolute values of the off-diagonal elements, as we have seen in the proof of Lemma 22. Since the off-diagonal elements are non-positive, I − αS^T is an M-matrix.
• The column sums of I − αS^T (equivalently, the row sums of I − αS) are exactly 1 − α.
Furthermore, we can simplify Theorem 29 even more:
Theorem 30. Define x = (I − αH^T)^{-1} v. The PageRank vector π is equal to
\[
\pi = \frac{x}{e^T x}
\]
Proof: First, note that I − αH^T is indeed invertible (we can apply the same proof as in Lemma 22 to the sub-stochastic matrix H). Theorem 29 yields the equality (I − αS^T)π = (1 − α)v. Note that the sum of row i of the matrix H is equal to zero if i is a dangling node, and 1 otherwise. Hence He = e − d. We find
\[
\begin{aligned}
G^T x &= \alpha H^T x + \alpha v(d^T x) + (1-\alpha)v(e^T x) \\
&= \alpha H^T x + \alpha v(e^T - e^T H^T)x + (1-\alpha)v(e^T x) \\
&= \alpha H^T x - \alpha v e^T H^T x + v e^T x = \alpha H^T x + v e^T(I - \alpha H^T)x \\
&= \alpha H^T x + v e^T v = \alpha H^T x + v = \alpha H^T x + (I - \alpha H^T)x = x
\end{aligned}
\]
So G^T \frac{x}{e^T x} = \frac{1}{e^T x} G^T x = \frac{x}{e^T x}. But of course we also have e^T \frac{x}{e^T x} = 1. Since the PageRank vector is unique, it must be equal to π = \frac{x}{e^T x}.

The matrix I − αH T satisfies many of the same properties as I − αS T :
1. All eigenvalues of I − αH T lie in the disk with center 1 and radius α.
2. I − αH T is invertible.
3. I − αH T is an M -matrix.
4. The column sums of I − αH^T (equivalently, the row sums of I − αH) are either 1 or 1 − α.
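For link matrices of modest size, Theorem 30 can be used directly with a sparse solve. A hedged sketch (not one of the appendix routines, and feasible only when a sparse factorisation of I − αH^T fits in memory) is the following, assuming the sparse matrix H, the scalar alpha and the vector v are already in the workspace.

% Sketch: direct computation of the PageRank vector via Theorem 30.
n  = size(H,1);
x  = (speye(n) - alpha*H') \ v;   % solve (I - alpha*H^T) x = v
pr = x / sum(x);                  % normalise: pi = x / (e^T x)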

4.4 The Jacobi method

The last two theorems give us new ways to calculate the PageRank vector. We will use both theorems and the Jacobi method to approximate the PageRank. This method makes use of the fact that
\[
(I - \alpha S^T)^{-1} = \sum_{n=0}^{\infty} (\alpha S^T)^n
\]
This sum converges because ρ(αS^T) = αρ(S^T) ≤ α · 1 < 1. We approximate this series by a partial sum. Computing this partial sum directly would require multiple matrix-matrix products, which is not preferred. The Jacobi method is an efficient way of computing this partial sum applied to the vector (1 − α)v.
Algorithm 31. The Jacobi method (applied to the matrix S) is:
k←0
x(0) ← (1 − α)v
while convergence not reached do
x(k+1) ← αS T x(k) + (1 − α)v
k ←k+1
end while
π ← x(k)

Theorem 32. The Jacobi method converges to the PageRank vector π. Furthermore, the error after k iterations is of order O(α^k).
Proof: By induction, it is easy to see that x^{(k)} = (1 − α)\sum_{n=0}^{k}(αS^T)^n v. This is clearly true for k = 0: x^{(0)} = (1 − α)v = (1 − α)\sum_{n=0}^{0}(αS^T)^n v. Assume that the equality holds for some k ∈ N. Then we also have
\[
x^{(k+1)} = \alpha S^T x^{(k)} + (1-\alpha)v = \alpha S^T (1-\alpha)\sum_{n=0}^{k}(\alpha S^T)^n v + (1-\alpha)v = (1-\alpha)\sum_{n=1}^{k+1}(\alpha S^T)^n v + (1-\alpha)v = (1-\alpha)\sum_{n=0}^{k+1}(\alpha S^T)^n v
\]
Thus, the equality is true for all k ∈ N. As k tends to infinity, (1 − α)\sum_{n=0}^{k}(αS^T)^n v tends to (1 − α)\sum_{n=0}^{\infty}(αS^T)^n v = (1 − α)(I − αS^T)^{-1} v. By Theorem 29, this value is equal to π.
Note that after k iterations, the error of x is equal to
\[
\begin{aligned}
\|\pi - x^{(k)}\| &= \Big\|(1-\alpha)(I-\alpha S^T)^{-1}v - (1-\alpha)\sum_{n=0}^{k}(\alpha S^T)^n v\Big\|
 = (1-\alpha)\Big\|\sum_{n=0}^{\infty}(\alpha S^T)^n v - \sum_{n=0}^{k}(\alpha S^T)^n v\Big\| \\
&= (1-\alpha)\Big\|\sum_{n=k+1}^{\infty}(\alpha S^T)^n v\Big\|
 \le (1-\alpha)\sum_{n=k+1}^{\infty}\big\|(\alpha S^T)^n v\big\|
 \le (1-\alpha)\sum_{n=k+1}^{\infty}\alpha^n \|S^T\|^n \|v\| \\
&\le (1-\alpha)\sum_{n=k}^{\infty}\alpha^n \|v\|
 = (1-\alpha)\,\frac{\|v\|\,\alpha^k}{1-\alpha} = \|v\|\,\alpha^k = O(\alpha^k)
\end{aligned}
\]
as k tends to infinity. Additionally, the largest eigenvalue of αH^T is important for the speed of this method.
Note that we can also apply the Jacobi method on the matrix αH T to approximate x =
(I − αH T )−1 v. This is sufficient, since by Theorem 30, the PageRank vector is equal to
π = x/(eT x). Hence the following algorithm can be used as well:
Algorithm 33. The Jacobi method (applied to the matrix H) is:
k←0
x(0) ← v
while convergence not reached do
x(k+1) ← αH T x(k) + v
k ←k+1
end while
π ← x(k) /(eT x(k) )

By the same argument as in Theorem 32, this algorithm also converges to π and the error after k iterations is of order O(α^k). The advantage of this method is that H is much sparser than S, so the computation of the matrix-vector product H^T x^{(k)} will be faster.
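A short Matlab sketch of Algorithm 33 is shown below; it is an illustration rather than the appendix code A.2.3, and the residual test follows the criteria of paragraph 5.2.

% Sketch: Jacobi method applied to H, cf. Algorithm 33.
function p = pagerank_jacobi_H(H, v, alpha, tol)
    x = v;
    while true
        xnew = alpha*(H'*x) + v;              % x^(k+1) = alpha*H'*x^(k) + v
        if norm(xnew - x, 1) < tol*sum(xnew)  % relative residual, one-norm
            break
        end
        x = xnew;
    end
    p = xnew / sum(xnew);                     % pi = x / (e^T x)
end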

4.5 Optimized Jacobi

It is worth checking if we can optimize these methods. We can try the following shift-and-scale approach. We have seen that if we want to calculate π, it is sufficient to solve the linear system (I − αH^T)x = v. Multiply this equation by a certain constant β and then shift it back and forth. We get
\[
(I - \alpha H^T)x = v \;\Longrightarrow\; \beta(I - \alpha H^T)x = \beta v \;\Longrightarrow\; \big(I - (I - \beta(I - \alpha H^T))\big)x = \beta v
\]
Define Q_β := I − β(I − αH^T). Then x satisfies the equality (I − Q_β)x = βv. Thus, we could also apply the Jacobi algorithm to this matrix Q_β for a certain value of β. It might happen that the largest eigenvalue of Q_β is smaller (in absolute value) than the largest eigenvalue of αH^T, which implies that the Jacobi method will converge faster. Therefore, our goal is to find the value of β such that ρ(Q_β) is minimal.
Suppose λ_1, λ_2, . . . , λ_n are the eigenvalues of αH^T with corresponding eigenvectors p_1, . . . , p_n. Then for each 1 ≤ i ≤ n, we have
\[
Q_\beta p_i = (I - \beta(I - \alpha H^T))p_i = p_i - \beta(I - \alpha H^T)p_i = p_i - \beta p_i + \beta(\alpha H^T)p_i = p_i - \beta p_i + \beta\lambda_i p_i = (1 - \beta + \beta\lambda_i)p_i
\]
Thus, the eigenvalues of Q_β are 1 − β + βλ_1, 1 − β + βλ_2, . . . , 1 − β + βλ_n. So the spectrum of αH^T gets multiplied by a factor β and then gets shifted by 1 − β.

Figure 4: Eigenvalues of the matrix αH T for α = 0.85.

See figure 4 for an example of how the spectrum of αH^T can look (here we have used an induced subgraph of the mathworks.com web) for α = 0.85. Since H is sub-stochastic, all eigenvalues lie in the red disk with center 0 and radius 0.85 by Gershgorin’s Theorem. In this example, there is an eigenvalue λ_1 exactly equal to 0.85, but there are no eigenvalues with real part less than the eigenvalue λ_2 ≈ −0.3234. Thus, we can make the largest eigenvalue smaller by shifting all eigenvalues to the left. It is easy to see that it is optimal to shift these eigenvalues to the left until the right-most and left-most eigenvalues have the same distance r from zero, i.e.
\[
1 - \beta + \beta \cdot (-0.3234) = -r, \qquad 1 - \beta + \beta \cdot 0.85 = r.
\]
By solving this system, we find that the optimal β is equal to β ≈ 1.357. For this value, the spectral radius is r ≈ 0.79. This coincides with figure 5, which shows that the eigenvalues of Q_β for β = 1.357 lie in the disk with center 0 and radius 0.79.
So it is better to apply the Jacobi Method to Q_β for β = 1.357. The error after k iterations is of order O(0.79^k) for this method, while the error of the regular Jacobi Method is of order O(0.85^k). It might not seem like a huge difference, but after 50 iterations, the error of the Optimized Jacobi Method is roughly (0.85/0.79)^{50} ≈ 40 times as small. Thus, far fewer iterations will be needed, and we will be able to approximate the PageRank vector much faster.

Figure 5: Eigenvalues of the matrix Qβ for α = 0.85 and β = 1.357.
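In code, the only change with respect to the plain Jacobi method is that each step is over-relaxed by the factor β. The sketch below is illustrative (β is supplied by the user, as with the optimal-β search in appendix A.2.5); it rewrites the update x^{(k+1)} = Q_β x^{(k)} + βv as x^{(k)} + β(αH^T x^{(k)} + v − x^{(k)}).

% Sketch: Jacobi iteration on Q_beta = I - beta*(I - alpha*H').
function p = pagerank_jacobi_opt(H, v, alpha, beta, tol)
    x = v;
    while true
        step = alpha*(H'*x) + v - x;      % plain Jacobi correction
        xnew = x + beta*step;             % over-relaxed (shift-and-scale) step
        if norm(step, 1) < tol*sum(x)     % same residual test as before
            break
        end
        x = xnew;
    end
    p = xnew / sum(xnew);
end

With β = 1 this reduces exactly to the plain Jacobi method of Algorithm 33.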

The Optimized Jacobi Method has two disadvantages. The first is that it is very hard to find the optimal value of β. In our example, we simply calculated every eigenvalue, which makes it easy to find the value of β for which the Jacobi Method is optimal. However, for very large matrices, it is not practical to calculate the eigenvalues. We do know, however, an upper bound for these eigenvalues. By Gershgorin’s Theorem, all eigenvalues of Q_β lie in the disk with center 1 − β and radius αβ. Hence any eigenvalue λ of Q_β satisfies |λ| ≤ max{|(1 − β) + αβ|, |(1 − β) − αβ|}. One can easily check that this upper bound is minimal if and only if β = 1. For β = 1 we have Q_β = αH^T, which means that, based on this bound alone, we cannot generally optimize the Jacobi method.
The other problem is the fact that the eigenvalues of αH^T usually fill the whole Gershgorin disk of radius α. In that case, a shift-and-scale approach cannot decrease the spectral radius of the matrix, which means that it is often not even possible to optimize the Jacobi Method in this way.

5 Numerical Experiments

In the previous chapter we have discussed multiple algorithms to approximate the PageR-
ank vector π, namely the Power Method and the Jacobi method (applied to two different
matrices). The error of both methods is of order O(αk ), where k is the number of iterations.
Furthermore, in some cases the Jacobi Method can be further optimised. In this chapter we will apply these algorithms to see whether there are differences in performance. We will compare both the computation time and the number of iterations needed.

5.1 Data

Before we can test the algorithms, we need data to calculate the link matrix H. Clearly, we do not know the complete graph of the world wide web, as the web consists of many billions of pages and links. Thus, we will construct smaller link matrices to be able to test the algorithms. We have done this with a couple of different methods. Each option yields a completely different matrix (in size or structure). All algorithms will be applied to each matrix to see which method benefits most from a certain property of the web.
• The first method of obtaining a link matrix is by getting a small subset of the web.
Just like Google, we use a surfer to obtain this graph. Starting in any webpage, the
surfer analyses the source code of this page and determines where this page links to.
We add these links to the link matrix. Then, the surfer opens one of the new pages.
This process is repeated until n pages have been found and all pages have been checked
for links. This will construct a very realistic web. We do have an option to filter certain
pages, such as images or JavaScript files. These are files which a user generally does
not want to see as search results. It is, however, also interesting to allow these pages.
This will yield a matrix with a large amount of dangling nodes, and we will discuss
why this matters for the algorithms.
The disadvantage of using a surfer is that it is very slow. It generally takes more than
a day to obtain a link matrix with more than 10 thousand nodes. See A.1.1 for the
Matlab code of the surfer we have made and used.
• Luckily for us, many other people have made their own surfers to obtain even larger link matrices. These matrices are posted online, so that anyone can use them for testing purposes. One of these matrices forms a collection of websites about California. It consists of roughly 10 thousand pages, and about 50 percent of these are dangling nodes. We thank Jon Kleinberg for this dataset [13].
We have also used two link matrices which are far larger in size. The first one is a collection of all (almost 300 thousand) web pages in the Stanford University website. The second one contains all web pages in the whole Stanford-Berkeley web and has almost 700 thousand pages and more than 7 million links. We thank Sep Kamvar for these datasets [14].
It is worth noting that both of these larger matrices have only a relatively small number of dangling nodes. This will have consequences for the iteration time of each numerical method.
Figure 6: A plot of the Stanford-Berkeley Web link matrix. A pixel represents a link.

• The last and most obvious method is by simply generating a link matrix randomly.
This allows us to construct a large variety of link matrices. For example, one could
easily construct a link matrix with more than a million nodes. It is also possible to
choose the average amount of dangling nodes on the web or the average amount of
links.
However, a randomly generated web might not be very realistic; there does not need to be any structure and the link matrix can even be aperiodic. A consequence of this is that all eigenvalues of H appeared to be much smaller (in absolute value) compared with those of a ‘realistic’ link matrix. Since ρ(H) becomes smaller, the numerical methods will converge a lot faster. Thus, since such results are not realistic, we will not use these matrices.
In summary, we have used the following link matrices to test the algorithms. See also the Matlab files A.1.1−A.1.4. As one can see, there is a large variety in the number of nodes, dangling nodes, and links.
Name Pages Links Dangling nodes Description
A 5000 1430936 244 mathworks.com (from our own crawler)
B 9664 16150 5027 Sites about California
C 281903 2312497 172 Stanford University Web
D 683446 7583376 4735 Stanford-Berkeley Web

5.2 Convergence criteria

The numerical methods we discussed are iterative. The iterations need to stop when a satisfactory approximation of the PageRank vector has been found. But when is this the case? Let ‖·‖ be a norm. We say that an approximation x of π is good enough if the relative residual is small enough, i.e.
\[
\frac{\|G^T x - x\|}{\|x\|} < \varepsilon
\]
for some small number ε > 0. We picked ε = 10^{-5} and let ‖·‖ be the one-norm. The reason for this choice is that the order of the PageRank values is important and not the exact PageRank values themselves. If the sup norm is used, for example, the ordered PageRank vector might be completely different.
• Let x^{(i)} be the approximation of π obtained by using the Power Method for i iterations. The Power Method is repeated until ‖x^{(i+1)} − x^{(i)}‖/‖x^{(i)}‖ < ε, i.e. e^T|x^{(i+1)} − x^{(i)}| < 10^{-5}, since e^T x^{(j)} = 1 for all j.
• The Jacobi method (applied to the matrix S) after i iterations yields an approximation x^{(i)} such that (I − αS^T)x^{(i)} ≈ (1 − α)v. The iteration is repeated until the residual satisfies ‖(I − αS^T)x^{(i)} − (1 − α)v‖ < ε, i.e. e^T|(I − αS^T)x^{(i)} − (1 − α)v| < 10^{-5}.
To prevent unnecessary computations, we make use of the fact that αS^T x^{(i)} = α(H^T x^{(i)} + v(d^T x^{(i)})). This vector can then be reused to calculate the residual. See Matlab code A.2.2 for the implementation.
• For the Jacobi method applied to H, one needs to normalise x^{(i)} to get an approximation of π. Thus, the iteration is repeated until ‖(I − αH^T)x^{(i)} − v‖/(e^T x^{(i)}) < ε, i.e. e^T|(I − αH^T)x^{(i)} − v| < 10^{-5} e^T x^{(i)}.
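The point about reusing the matrix-vector product in the second bullet can be made concrete with a short fragment (a sketch, not the code of A.2.2), assuming H, d, v, alpha and the current iterate x are in the workspace: the product αS^T x^{(i)} is formed once via H and d, used for the update, and reused for the residual.

% Sketch: one Jacobi(S) step with the residual computed from the same product.
Sx   = alpha*(H'*x) + alpha*v*(d'*x);    % alpha*S'*x, using only H and d
xnew = Sx + (1-alpha)*v;                 % Jacobi update
res  = norm(x - Sx - (1-alpha)*v, 1);    % ||(I - alpha*S')x - (1-alpha)v||_1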

5.3 Numerical Results

The Power Method and both Jacobi methods (applied to different matrices) will now be
applied to these four matrices. We have seen that the value of α is important for the speed
of each method. Furthermore, we will use two personalisation vectors to see if there is any
difference; the uniform vector w1 = e/n and a random probability vector w2 .

α v Power Method Jacobi (S) Jacobi (H) Optimized Jacobi


0.50 w1 0.0685s 0.0739s 0.0687s 0.0592s
(13 iterations) (16 iterations) (15 iterations) (13 iterations)
0.50 w2 0.0662s 0.0724s 0.0683s 0.0583s
(13 iterations) (16 iterations) (15 iterations) (13 iterations)
0.85 w1 0.1365s 0.1656s 0.1532s 0.1298s
(45 iterations) (60 iterations) (55 iterations) (48 iterations)
0.85 w2 0.1341s 0.1678s 0.1524s 0.1303s
(44 iterations) (60 iterations) (55 iterations) (48 iterations)
0.95 w1 0.2704s 0.3856s 0.3418s 0.2977s
(110 iterations) (167 iterations) (152 iterations) (134 iterations)
0.95 w2 0.2682s 0.3862s 0.3443s 0.2984s
(110 iterations) (167 iterations) (152 iterations) (134 iterations)

Table 1: The numerical methods applied to the 5000 × 5000 (Mathworks) link matrix A.
Note that the results are the average out of 100 simulations.

We picked β = 1.17 for the Optimized Jacobi Method, as this value seemed to require the fewest iterations (see also Matlab code A.2.5). As expected, the Optimized Jacobi Method is always faster than the regular Jacobi Method applied to H, and it is better to apply the Jacobi Method to H than to apply it to S. The difference in computation time between the three methods is relatively small. Interestingly, the Power Method becomes more favourable if α is close to 1.

There does not seem to be any difference in computation time regarding the choice of the
personalisation vector. Thus, for the next experiments, we will only look at the uniform
personalisation vector v = e/n.

α Power Method Jacobi (S) Jacobi (H)


0.50 0.0034s 0.0047s 0.0021s
(12 iterations) (16 iterations) (11 iterations)
0.85 0.0111s 0.0154s 0.0067s
(47 iterations) (60 iterations) (42 iterations)
0.95 0.0330s 0.0432s 0.0201s
(142 iterations) (167 iterations) (128 iterations)

Table 2: The numerical methods applied to the 9664 × 9664 (California) link matrix B.

When applying the numerical methods to the second matrix B, the results are somewhat different. There was no value of β ≠ 1 that decreased the number of iterations needed before the Optimized Jacobi Method converges, hence the Jacobi Method could not be made faster. It is remarkable that for this matrix, the Jacobi Method applied to H is better than the Power Method. One reason for this is that about 48% of the nodes of this web are dangling nodes. The Jacobi Method applied to H does not depend on the number of dangling nodes, hence the difference in iteration time.

α Power Method Jacobi (S) Jacobi (H)


0.50 0.3802s 0.4907s 0.4240s
(14 iterations) (16 iterations) (16 iterations)
0.85 0.9921s 1.3807s 1.1580s
(50 iterations) (60 iterations) (60 iterations)
0.95 2.7790s 3.8045s 3.0155s
(150 iterations) (167 iterations) (166 iterations)

Table 3: The numerical methods applied to the 281903 × 281903 (Stanford University) link
matrix C.

For the large Stanford University matrix, the Power Method is only slightly faster than the
Jacobi Method applied on H. It is interesting to see that even though this web contains
almost 300 thousand nodes, each method will return an approximation of the PageRank
vector in less than 5 seconds.

α Power Method Jacobi (S) Jacobi (H)


0.50 0.5609s 0.7747s 0.6011s
(14 iterations) (16 iterations) (16 iterations)
0.85 1.5586s 2.3120s 1.6927s
(52 iterations) (60 iterations) (59 iterations)
0.95 4.2054s 6.0926s 4.4134s
(154 iterations) (167 iterations) (164 iterations)

Table 4: The numerical methods applied to the 683446 × 683446 (Stanford-Berkeley) link
matrix D.

For the largest link matrix, the results are the same as before. Note that less than 1% of the nodes are dangling nodes, and thus the speed of the Power Method is not much affected by these nodes.
Another way to compare the numerical methods is by first fixing a number of iterations. All methods are then iterated this many times, and the residual can be calculated afterwards. This comparison might be fairer, since the residual then does not have to be calculated after each iteration. Figure 7 shows how large the residual of the approximation of π is, using the Stanford-Berkeley link matrix D. The figure shows that the Power Method is initially only slightly faster than the Jacobi Method applied to H. However, after a while, the Jacobi Method becomes faster.

Figure 7: The three numerical methods compared using the link matrix D.

5.4 Discussion

As we have seen in the previous chapter, each numerical method has a convergence rate of O(α^k). Therefore, it is not too unexpected that the speed of the three methods is about the same. In most cases, the Power Method is the fastest algorithm. However, the difference between this method and the Jacobi Method applied to H is very small. In some cases, for example if the number of dangling nodes in a web is large, the latter will converge faster.
We were only able to decrease the spectral radius of the mathworks.com link matrix A. Thus,
the Optimized Jacobi Method is of limited use.
The Jacobi method is based on the equality (I − αS T )π = (1 − α)v. As we will see in the
next chapter, using this equality will have an additional advantage.

6 Random Teleportation Parameter

The teleportation parameter α is the probability that the random surfer will click on a link.
The random surfer will, with a probability of 1 − α, go to any page on the web. The choice
of this page is random and distributed according to the personalisation vector. We have seen
that we can calculate the PageRank vector much faster if α is small. However, for a small
α, the structure of the web will hardly matter.
Google has stated that they used α = 0.85. But why did they pick this exact value? Apart from the fact that α should not be close to 0 or 1, there is no mathematical reason behind the choice α = 0.85. Some researchers [8] have proposed to use α = 0.5 instead. We will look at this problem in a different way. The probability that a user enters a different link in the address bar obviously depends on the user. It therefore makes sense to view α as a random variable with its own distribution of teleportation parameters. Gleich et al. [9] studied this parameter. Using a large amount of data, they concluded that this parameter fits a certain Beta distribution, as we can see in the following figure.

Figure 8: The distribution of the teleportation parameter α (Source: [9]).

6.1 Expected PageRank

Since the PageRank depends on the value of α, we will from now on denote the PageRank
vector corresponding to a certain value of α by π(α). Suppose that we want a single vector
that corresponds to the PageRank vector of everyone. The most straightforward choice is
π(E[α]): the PageRank vector calculated for the mean of all choices of α. However, this
does not yield a satisfactory vector. Instead, we will look at the expected PageRank vector.
Definition 34. Let f : [0, 1] → R^+ be the probability density function of the stochastic parameter α. We define the expected PageRank vector ⟨π⟩ by
\[
\langle\pi\rangle = \int_0^1 \pi(\alpha) f(\alpha)\,d\alpha
\]

We assume that this density function of α is very small if α is close to 1, i.e. f (α) ≈ 0 for
α ≈ 1. This is because we originally assumed that 0 ≤ α < 1 should hold; if α = 1, there is
no teleportation, hence there is no guarantee that a unique PageRank vector exists.

It is easy to see that the expected PageRank is a probability vector as well. Since for each α we have e^T π(α) = 1, it follows that
\[
e^T \langle\pi\rangle = e^T \int_0^1 \pi(\alpha) f(\alpha)\,d\alpha = \int_0^1 e^T \pi(\alpha) f(\alpha)\,d\alpha = \int_0^1 1 \cdot f(\alpha)\,d\alpha = 1
\]

Note that by Theorem 30, the expected PageRank vector satisfies
$$\langle \pi \rangle = \int_0^1 (1-\alpha)(I - \alpha S^T)^{-1} v\, f(\alpha)\, d\alpha = \int_0^1 \sum_{n=0}^{\infty} (1-\alpha)(\alpha S^T)^n v\, f(\alpha)\, d\alpha$$
Additionally, since S, v and f are all non-negative, the integral and summation can be interchanged; this is a direct consequence of the Monotone Convergence Theorem, see [12, p. 82]. Thus,
$$\langle \pi \rangle = \sum_{n=0}^{\infty} \int_0^1 (1-\alpha)(\alpha S^T)^n v\, f(\alpha)\, d\alpha$$

Suppose that α is uniformly distributed between 0 and r for some value 0 < r < 1, i.e. f(α) = 1/r for 0 ≤ α ≤ r and f(α) = 0 otherwise. In this case, it is possible to calculate this integral:
$$\langle \pi \rangle = \sum_{n=0}^{\infty} \int_0^1 (1-\alpha)(\alpha S^T)^n v\, f(\alpha)\, d\alpha = \sum_{n=0}^{\infty} \left( \int_0^r (1-\alpha)\alpha^n \cdot \frac{1}{r}\, d\alpha \right) (S^T)^n v$$
$$= \frac{1}{r} \sum_{n=0}^{\infty} \left( \int_0^r (\alpha^n - \alpha^{n+1})\, d\alpha \right) (S^T)^n v = \frac{1}{r} \sum_{n=0}^{\infty} \left[ \frac{\alpha^{n+1}}{n+1} - \frac{\alpha^{n+2}}{n+2} \right]_0^r (S^T)^n v = \frac{1}{r} \sum_{n=0}^{\infty} \left( \frac{r^{n+1}}{n+1} - \frac{r^{n+2}}{n+2} \right) (S^T)^n v$$
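For illustration, the series above can be truncated after a finite number of terms to obtain a computable approximation of ⟨π⟩ in this uniform case. The following MATLAB sketch does this; the choices r = 0.9, N = 200 terms and the uniform personalisation vector are only examples, and it relies on the DanglingNodeVector routine from Appendix A.1.5 to apply S^T via the sparse matrix H.

% Sketch: truncated series for <pi> when alpha is uniform on [0, r].
% r, N and the personalisation vector v are example choices.
n = length(H);
v = ones(n,1)/n;
d = DanglingNodeVector(H);
r = 0.9;
N = 200;                                   % number of terms kept in the series
expectedPi = zeros(n,1);
w = v;                                     % w holds (S^T)^k v, starting at k = 0
for k = 0:N
    coeff = (r^(k+1)/(k+1) - r^(k+2)/(k+2)) / r;
    expectedPi = expectedPi + coeff * w;
    w = H'*w + sum(d.*w)*v;                % (S^T)^(k+1) v, since S = H + d v^T
end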

However, for general density functions f there is no practical way of calculating ⟨π⟩ exactly. Instead, we will try to approximate it. An obvious way to do so is by calculating a Riemann sum. Suppose that we have picked points 0 = x_0 < x_1 < . . . < x_k = 1 and mesh points α_i ∈ [x_{i−1}, x_i] for all 1 ≤ i ≤ k. The expected PageRank vector is then approximately
$$\langle \pi \rangle \approx \sum_{i=1}^{k} \pi(\alpha_i) f(\alpha_i)\, (x_i - x_{i-1})$$

This requires the computation of the PageRank vector for k different values of α. In general, k must be large if we want a good approximation of ⟨π⟩. We could approximate π(α) for each value of α with one of the numerical methods we have discussed, but since the approximation of a single PageRank vector for one specific value of α already takes a long time, this approach is unattractive. However, the Power Method has the advantage that it can easily be optimised so that this computation becomes faster.
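Before turning to that optimisation, the following MATLAB sketch illustrates the plain Riemann-sum approach. It assumes the PowerMethod routine from Appendix A.2.1; the density handle f and the mesh are placeholders for whatever distribution of α is used.

% Sketch: Riemann-sum approximation of the expected PageRank vector.
% The density f and the mesh are example choices; PowerMethod is the
% routine from Appendix A.2.1.
n = length(H);
v = ones(n,1)/n;                     % uniform personalisation vector
f = @(a) (a <= 0.9) / 0.9;           % placeholder density: uniform on [0, 0.9]
dx = 0.01;                           % width of each cell [x_{i-1}, x_i]
alphas = dx/2 : dx : 0.9 - dx/2;     % midpoints of the cells
expectedPi = zeros(n,1);
for i = 1:length(alphas)
    expectedPi = expectedPi + PowerMethod(H, alphas(i), v) * f(alphas(i)) * dx;
end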

6.2 Optimized Power Method

As we have seen, the convergence speed of the Power Method is of order O(α^k). An advantage of the Power Method over the Jacobi Method is that one can freely choose its starting vector. Clearly, if this starting vector is close to the true PageRank vector, fewer iterations will be needed until a satisfactory approximation is obtained.

In paragraph 4.1, we proposed to use the personalisation vector v as the starting vector, since this may be the most sensible initial guess we can make. However, if the PageRank vector needs to be computed for many different values of α, we already have an idea of what the PageRank vector should look like. Thus, we can use the PageRank vector corresponding to a different value of α as our initial guess. Suppose that π(α) should be computed for α_1, . . . , α_m, where 0 ≤ α_1 < . . . < α_m < 1. We can apply the following algorithm:
Algorithm 35.
π(α_1) ← PowerMethod(v)
for k = 2 to m do
    π(α_k) ← PowerMethod(π(α_{k−1}))
end for

Here PowerMethod(w) means that the Power Method is applied with starting vector w. Note that π(α_1) can also be computed using a different numerical method, but for k = 2, . . . , m the Power Method needs to be applied.
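Algorithm 35 corresponds to the routine OptimizedPowerMethod.m listed in Appendix A.3.1. A minimal usage sketch could look as follows; the increasing grid of α values and the uniform personalisation vector are only examples.

% Example call of Algorithm 35 via the routine in Appendix A.3.1.
% The grid of teleportation parameters is an arbitrary example.
alphas = 0.05:0.05:0.90;
v = ones(length(H),1) / length(H);
[pis, totalIter] = OptimizedPowerMethod(H, alphas, v);
% pis(:,k) approximates pi(alphas(k)); totalIter counts the Power Method
% iterations accumulated over all values of alpha after the first.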
This algorithm is based on the idea that π(α) is approximately equal to π(α_0) if α is close to α_0. In general, this is true. However, note that
$$\pi = (1-\alpha)(I - \alpha S^T)^{-1} v$$
The smallest eigenvalue of I − αS^T is exactly 1 − α. If α is close to 1, this eigenvalue approaches 0. To be more precise, the 1-norm condition number of I − αS^T is equal to (1 + α)/(1 − α), see [7]. Note that as α tends to 1, the condition number tends to infinity. The consequence is that a small relative change in I − αS^T (i.e. a small change in α) can produce a large change in (I − αS^T)^{−1}. Therefore, the PageRank vector can differ greatly for two nearby values α ≈ α_0 close to 1. So the starting error will increase as α_k increases, and thus more iterations may be needed until the Optimized Power Method converges.
In the next paragraph we will discuss a much more efficient (reduced-order) method that is
able to approximate the PageRank vector for all values of α.

6.3 Reduced-order modelling

By Theorem 29, the PageRank vector corresponding to a certain value of α is the solution of the linear system (I − αS^T)π(α) = (1 − α)v. Furthermore, we have shown that it is sufficient to solve the system (I − αH^T)x(α) = v. Our goal is to approximate this solution x(α) (and thus π(α)) for many different values of α. To do so, we will make use of the shift-invariance of the so-called Krylov space. The following approach is based on [10].
Definition 36. The Krylov space K_m is defined by:
$$\mathcal{K}_m(A, w) = \mathrm{span}(w, Aw, A^2 w, \ldots, A^{m-1} w)$$
Since the Krylov space K_m(A, w) is a vector space, we have K_m(A, w) = K_m(βA, w) for any β ≠ 0. Because w ∈ K_m(A, w), we can also shift this space without changing it, so K_m(A, w) = K_m(I − βA, w) for any β ≠ 0.

We will look at the Krylov space of the matrix H^T with respect to the personalisation vector v. Then we will try to find a vector x in this space such that (I − αH^T)x approximates v. If x is then normalised with respect to its 1-norm, this vector approximates π(α). The following theorem is important, since it guarantees that the algorithm will converge.
Theorem 37. The PageRank vector π satisfies π ∈ K_n(H^T, v).
Proof: First, note that K_m(H^T, v) = K_m(I − αH^T, v). Let the characteristic polynomial of I − αH^T be p(λ) = c_0 + c_1λ + c_2λ² + . . . + λⁿ. By the Cayley–Hamilton theorem, I − αH^T satisfies
$$0 = p(I - \alpha H^T) = c_0 I + c_1(I - \alpha H^T) + c_2(I - \alpha H^T)^2 + \ldots + (I - \alpha H^T)^n$$
Since I − αH^T is invertible, 0 is not one of its eigenvalues, hence c_0 ≠ 0. By dividing by c_0, subtracting I and finally multiplying by (I − αH^T)^{−1}, we get
$$(I - \alpha H^T)^{-1} = -\frac{1}{c_0}\left(c_1 I + c_2(I - \alpha H^T) + \ldots + (I - \alpha H^T)^{n-1}\right)$$
Hence,
$$x = (I - \alpha H^T)^{-1} v = -\frac{1}{c_0}\left(c_1 v + c_2(I - \alpha H^T)v + \ldots + (I - \alpha H^T)^{n-1}v\right) \in \mathcal{K}_n(I - \alpha H^T, v)$$
Thus, π = x/(e^T x) ∈ K_n(I − αH^T, v) = K_n(H^T, v).

The last theorem shows why it makes sense to look for a solution of the equation (I − αH^T)x = v in the space K_m(H^T, v). To use this space, we will construct an orthonormal basis by applying the Arnoldi algorithm to the vectors v, H^T v, (H^T)²v, . . . , (H^T)^{n−1}v. This is essentially a modified version of the Gram–Schmidt algorithm.
Algorithm 38. The Arnoldi Algorithm applied to w_1 = v/‖v‖_2 is:
for k = 1 to m do
    w_{k+1} ← H^T w_k
    for j = 1 to k do
        u_{j,k} ← ⟨w_j, w_{k+1}⟩
        w_{k+1} ← w_{k+1} − u_{j,k} w_j
    end for
    u_{k+1,k} ← ‖w_{k+1}‖_2
    if u_{k+1,k} = 0 then
        Stop;
    end if
    w_{k+1} ← w_{k+1}/u_{k+1,k}
end for

Theorem 39. Assuming the Arnoldi Algorithm does not stop before m iterations, it creates
an orthonormal basis of Km (H T , v).
Proof: By induction, we show that w_k = p_{k−1}(H^T)v for some polynomial p_{k−1} of degree k − 1. This is trivial for k = 1, since w_1 = v/‖v‖_2. Let the induction hypothesis hold for some k ∈ N. Note that
$$w_{k+1} = \frac{1}{u_{k+1,k}}\left(H^T w_k - \sum_{j=1}^{k} u_{j,k} w_j\right) = \frac{1}{u_{k+1,k}}\left(H^T p_{k-1}(H^T)v - \sum_{j=1}^{k} u_{j,k}\, p_{j-1}(H^T)v\right)$$
This shows that w_{k+1} = p_k(H^T)v, where the polynomial p_k is defined as $p_k(x) = \big(x\,p_{k-1}(x) - \sum_{j=1}^{k} u_{j,k}\, p_{j-1}(x)\big)/u_{k+1,k}$. Since p_{k−1} is assumed to have degree k − 1, the degree of p_k is exactly k. Furthermore, the Arnoldi vectors are orthonormal by construction, hence they form an orthonormal basis of K_m(H^T, v).

But what happens if the Arnoldi Algorithm breaks down before creating m Arnoldi vectors?
Theorem 40 shows that this should be seen as a good thing.
Theorem 40. Suppose that the Arnoldi Algorithm stops after k < m iterations. Then the
dimension of the Krylov space is maximal, i.e. Kk (H T , v) = Kl (H T , v) for all l ≥ k.
Proof: If the Arnoldi Algorithm breaks down at iteration k, we must have u_{k+1,k} = 0 and therefore w_{k+1} = 0. Thus, we also have
$$H^T w_k = \sum_{j=1}^{k} u_{j,k} w_j$$
Let z ∈ K_k(H^T, v) be arbitrary. By Theorem 39, we can write z as a linear combination of Arnoldi vectors: $z = \sum_{j=1}^{k} c_j w_j$ for some coefficients c_j. It follows that
$$H^T z = H^T \sum_{j=1}^{k} c_j w_j = \sum_{j=1}^{k} c_j H^T w_j = \sum_{j=1}^{k-1} c_j H^T w_j + c_k H^T w_k = \sum_{j=1}^{k-1} c_j H^T w_j + c_k \sum_{j=1}^{k} u_{j,k} w_j$$
which is an element of K_k(H^T, v). Thus, K_{k+1}(H^T, v) ⊆ K_k(H^T, v). The inclusion ⊇ is trivial, hence K_k(H^T, v) = K_{k+1}(H^T, v). By induction, the theorem now follows.

The consequence of this theorem is that if the algorithm stops at iteration k, we have constructed a basis of K_k(H^T, v) = K_n(H^T, v). By Theorem 37, this space contains the PageRank vector, so there is no reason to expand it further.
Let W be the n × m matrix containing the Arnoldi vector w_i in its i-th column, and let U be the m × m matrix with coefficients u_{i,j}. Note that U is an upper Hessenberg matrix (i.e. U_{ij} ≠ 0 implies i ≤ j + 1). Then it is possible [11, p. 161] to write the Arnoldi Algorithm in a single equation as
$$H^T W = W U + u_{m+1,m}\, w_{m+1} e_m^T$$
Here e_m is the m-th standard unit vector of size m × 1. To be able to approximate the PageRank vector we will look for an approximation u ∈ K_m(H^T, v) such that (I − αH^T)u ≈ v. Since u ∈ K_m(H^T, v), we can write u = a_1 w_1 + a_2 w_2 + . . . + a_m w_m, or in shorter notation u = Wa, where a = [a_1, a_2, . . . , a_m]^T. Our goal is to find good coefficients, i.e. a vector a such that the residual is very small. By definition, this residual r is equal to
$$r = v - (I - \alpha H^T)u = v - u + \alpha H^T u$$

Since v ∈ K_m(H^T, v) and u ∈ K_m(H^T, v), it follows that r ∈ K_{m+1}(H^T, v). Hence we can write r = b_1 w_1 + . . . + b_{m+1} w_{m+1} for some coefficients b_i. Because the vectors w_i are orthonormal, the 2-norm of this residual is simply equal to $\big(\sum_{i=1}^{m+1} |b_i|^2\big)^{1/2}$. Thus, the best approximation u can be achieved by minimising this sum. We will take a slightly different approach: instead, we pick the vector a in such a way that b_i = 0 for all 1 ≤ i ≤ m. This can be seen as a projection of x(α) onto K_m(H^T, v). To do so, note that
$$r = v - (I - \alpha H^T)Wa = v - Wa + \alpha H^T Wa$$
$$= v - Wa + \alpha(WU + u_{m+1,m} w_{m+1} e_m^T)a = v - W(I - \alpha U)a + \alpha u_{m+1,m} w_{m+1} (e_m^T a)$$
Since w_1 = v/‖v‖_2, we have ‖v‖_2 W e_1 = v. Therefore
$$r = W\big[\|v\|_2 e_1 - (I - \alpha U)a\big] + \alpha u_{m+1,m} (e_m^T a)\, w_{m+1}$$
Define a = ‖v‖_2 (I − αU)^{−1} e_1. We find that
$$r = \alpha u_{m+1,m} \|v\|_2\, e_m^T (I - \alpha U)^{-1} e_1\, w_{m+1}$$
So in other words, b_1 = b_2 = . . . = b_m = 0. The only nonzero coefficient of the residual is b_{m+1}, which is equal to b_{m+1} = α u_{m+1,m} ‖v‖_2 e_m^T(I − αU)^{−1} e_1 = α u_{m+1,m} ‖v‖_2 c(α), where c(α) is the bottom-left element of the matrix (I − αU)^{−1}. Thus we are able to approximate x(α) (and therefore π(α) as well) by simply calculating the coefficient vector a. This requires solving a linear system with the matrix I − αU. The main advantage of this method is that this matrix is only m × m, where m is usually much smaller than n; hence this is called a reduced-order method. Solving an m × m system is generally much faster than computing many matrix-vector products of size n.
In short, if we want to calculate π(α) for multiple values of α, we can apply the following
algorithm:

Algorithm 41.

1. Pick m large enough.

2. Calculate the orthonormal matrix W and the upper Hessenberg matrix U by applying the Arnoldi algorithm to H^T and v for m iterations.

3. For each value of α, approximate x(α) by
$$x(\alpha) \approx W a_m = \|v\|_2\, W (I - \alpha U)^{-1} e_1$$

4. Finally, normalise this vector to get an approximation of the PageRank vector:
$$\pi(\alpha) = \frac{x(\alpha)}{e^T x(\alpha)}$$
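A minimal MATLAB sketch of steps 2-4, using the Arnoldi routine listed in Appendix A.3.2, could look as follows; the dimension m = 40 and the grid of α values are only example choices.

% Sketch of Algorithm 41: one Arnoldi factorisation, many values of alpha.
% m and the grid of alphas are examples; Arnoldi is the routine from
% Appendix A.3.2.
n = length(H);
v = ones(n,1)/n;
m = 40;
alphas = 0.01:0.01:0.90;
[W, U] = Arnoldi(H', m, v);      % W is n x m orthonormal, U is m x m upper Hessenberg
e1 = [1; zeros(m-1,1)];
pis = zeros(n, length(alphas));
for i = 1:length(alphas)
    x = W * ((eye(m) - alphas(i)*U) \ (norm(v)*e1));   % x(alpha) = ||v||_2 W (I - alpha U)^{-1} e1
    pis(:,i) = x / sum(x);                             % normalise: pi(alpha) = x / (e^T x)
end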

It is important to note that the Arnoldi algorithm is independent of the value of α. Thus, the Arnoldi algorithm only needs to be applied once. After we have done this, we can approximate π(α) very efficiently for many different values of α. But how large does m need to be? We expect the residual to decrease as m increases, since the dimension of the Krylov space grows and the first m coefficients b_i of the residual are made 0. The following numerical results confirm this expectation and illustrate how large m needs to be.

Figure 9: The residual of the approximation of π for different values of (α, m). This has been calculated using the 5000 × 5000 matrix A, see section 5.1. Note that the residual is approximately equal to the machine precision if m is large or α is small. The red line corresponds to a residual of 10^{-5}.

Figure 9 shows the relative residual of the approximation of the PageRank vector for many different values of m and α. As one can see, the residual gets smaller if either α gets smaller or m gets larger. Suppose that we require the residual to be less than 10^{-5}. Then π(α) can be approximated for any combination of (α, m) below the red line. Suppose, for example, that we want to calculate π(α) for α = 0.01, 0.02, . . . , 0.95; the figure shows that m = 80 should give sufficient results.
Clearly, it is easy to see how large m needs to be after we have applied the algorithm, but in general one needs to pick m beforehand. A good way to do so is to create the space K_m(H^T, v) for a certain initial guess of m by using the Arnoldi algorithm. Then the approximation of the PageRank vector corresponding to the largest value of α should be calculated using Algorithm 41. If the residual of this approximation is larger than requested, m is too small; one can then expand K_m(H^T, v) by simply applying the Arnoldi algorithm further. We repeat this until the residual is small enough; a sketch of such an adaptive loop is given below. In the next section we will test this algorithm numerically.
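The following MATLAB sketch illustrates this adaptive choice of m. For simplicity it rebuilds the Arnoldi basis from scratch whenever m is enlarged (extending the existing basis would be cheaper); the tolerance, the initial m, the step size and the largest α on the mesh are example choices, and the residual is measured as in Appendix A.3.5.

% Sketch: enlarge the Krylov space until the residual for the largest
% alpha on the mesh is small enough. All numerical values are examples.
tol = 1e-5;
alphaMax = 0.99;
m = 40;
n = length(H);
v = ones(n,1)/n;
d = DanglingNodeVector(H);
while true
    [W, U] = Arnoldi(H', m, v);
    k = size(U,1);                          % may be smaller than m if Arnoldi broke down
    x = W * ((eye(k) - alphaMax*U) \ [1; zeros(k-1,1)]);
    x = x / sum(x);
    % relative residual of (I - alpha*S^T)x = (1 - alpha)v
    res = norm(x - (alphaMax*(H'*x) + sum(d.*x)*alphaMax*v) - (1-alphaMax)*v) / norm(v);
    if res < tol
        break;
    end
    m = m + 20;                             % enlarge the Krylov space and try again
end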

6.4 Numerical experiments

We will compare this Reduced-Order Algorithm with the Optimized Power Method. Furthermore, to make the differences in computation time clearer, we will also test the normal Power Method and the Jacobi Method applied to H.

The goal of each algorithm is to approximate the expected PageRank vector ⟨π⟩. As we have stated before, we do this by calculating a Riemann sum, which requires mesh points for α. For each value of α, π(α) is calculated using one of the numerical methods. The speed of each method depends on the values of the mesh points for α, so we will try multiple sets of mesh points (for example, many different values of α and/or values close to 1). The numerical methods will be applied to the matrices A, B, C and D described in paragraph 5.1.
We have assumed that f(α) is small if α is close to one. Therefore, we can choose not to calculate π(α) for such values of α without the total error becoming too large. Suppose for example that we have mesh points α = 0.00, 0.01, . . . , 0.90. Table 5 shows the computation time of the expected PageRank vector using these numerical methods.

Matrix | Reduced Order | Optimized Power M. | Power Method | Jacobi (H)
A | 0.154s (m = 40) | 6.646s (1038 iterations) | 7.250s (1511 iterations) | 7.735s (1782 iterations)
B | 0.044s (m = 25) | 0.207s (623 iterations) | 0.431s (1520 iterations) | 0.263s (1339 iterations)
C | 3.338s (m = 40) | 39.02s (1299 iterations) | 45.88s (1660 iterations) | 52.00s (1906 iterations)
D | 7.459s (m = 40) | 53.47s (1307 iterations) | 63.79s (1690 iterations) | 68.78s (1888 iterations)

Table 5: Computation times for the expected PageRank ⟨π⟩ using mesh points α = 0.00, 0.01, . . . , 0.90

It appeared that for m = 40 (or even m = 25 for matrix B) the Krylov space K_m(H^T, v) is large enough to approximate π(α) for all mesh points (i.e. for each mesh point, the residual is less than 10^{-5}). The Reduced-Order algorithm is clearly the fastest method, usually being around 10 times as fast as any other algorithm. However, the real power of the Reduced-Order algorithm becomes even more noticeable if a more precise approximation of ⟨π⟩ is requested. In this case, more mesh points are needed.

Matrix | Reduced Order | Optimized Power M. | Power Method | Jacobi (H)
A | 0.264s (m = 40) | 60.46s (9768 iterations) | 71.62s (14799 iterations) | 78.96s (17473 iterations)
B | 0.181s (m = 25) | 1.014s (2641 iterations) | 4.012s (14874 iterations) | 2.579s (13064 iterations)
C | 12.31s (m = 40) | 314.0s (8613 iterations) | 461.8s (16234 iterations) | 505.2s (18635 iterations)
D | 30.77s (m = 40) | 426.2s (9003 iterations) | 625.3s (16532 iterations) | 685.2s (18527 iterations)

Table 6: Computation times for the expected PageRank ⟨π⟩ using mesh points α = 0.00, 0.001, . . . , 0.899, 0.90

Table 6 shows the computation time of each algorithm when ⟨π⟩ is calculated using the finer grid α = 0, 0.001, . . . , 0.90. Clearly, the difference between the numerical methods is huge; sometimes the Reduced-Order algorithm is more than 200 times faster than any other algorithm. This is because this algorithm only needs to solve an m × m linear system for each value of α, whereas the other algorithms require the computation of multiple matrix-vector products with n elements for each value of α. Since n is much larger than m, this computation takes much longer.
The Reduced-Order algorithm has one disadvantage: it becomes expensive for large values of α. Even though f(α) ≈ 0 if α ≈ 1, the contribution of π(0.99) to the expected PageRank ⟨π⟩ might still be significant, so a mesh grid extending closer to 1 is required. As we have seen in the previous paragraph, one then needs to expand the Krylov space to prevent the residual of this approximation from becoming too large.

Matrix | Reduced Order | Optimized Power M. | Power Method | Jacobi (H)
A | 0.408s (m = 100) | 10.36s (2477 iterations) | 11.25s (2891 iterations) | 12.18s (3390 iterations)
B | 0.073s (m = 40) | 0.234s (641 iterations) | 0.869s (3513 iterations) | 0.519s (2788 iterations)
C | 34.48s (m = 180) | 60.19s (2470 iterations) | 83.06s (3736 iterations) | 81.73s (3606 iterations)
D | 103.0s (m = 190) | 90.33s (2538 iterations) | 125.5s (3849 iterations) | 115.0s (3572 iterations)

Table 7: Computation times for the expected PageRank ⟨π⟩ using mesh points α = 0.00, 0.01, . . . , 0.99

Suppose that we use α = 0, 0.01, . . . , 0.99 as our mesh grid. The Reduced Order algorithm will then need a larger Krylov space to keep the residual below 10^{-5}; for matrix D, for example, this space should be expanded from m = 40 to m = 190. By [11, p. 165], the Arnoldi algorithm is computationally expensive: the total number of flops is of the order O(nm²), because each new vector is made orthogonal to all previous vectors. This explains why the Reduced Order algorithm can become slightly slower than the Optimized Power Method.

Matrix | Reduced Order | Optimized Power M. | Power Method | Jacobi (H)
A | 0.687s (m = 100) | 78.59s (14888 iterations) | 107.7s (26658 iterations) | 119.6s (32608 iterations)
B | 0.284s (m = 40) | 1.140s (2821 iterations) | 8.205s (31461 iterations) | 4.997s (26427 iterations)
C | 74.99s (m = 180) | 400.6s (12173 iterations) | 768.4s (33575 iterations) | 812.3s (34727 iterations)
D | 213.6s (m = 190) | 582.9s (12485 iterations) | 1181s (34522 iterations) | 1185s (34447 iterations)

Table 8: Computation times for the expected PageRank ⟨π⟩ using mesh points α = 0.00, 0.001, . . . , 0.989, 0.99

However, if we want a better approximation of ⟨π⟩, the number of mesh points should be increased. As Table 8 shows, the Reduced Order method is then again by far the fastest way of calculating π(α) for these mesh points.

6.5 Discussion

To approximate the expected PageRank vector ⟨π⟩, one needs to calculate π(α) for many different values of α. The Reduced Order algorithm we have discussed is a very efficient way of doing so: in some of our experiments, this algorithm is as much as 200 times faster than any other method we have discussed.
The power of the Reduced Order algorithm lies in the fact that the Arnoldi algorithm only needs to be applied once. After that, we only need to solve the small m × m system (I − αU)a = ‖v‖_2 e_1 for each value of α. We have seen that m can usually be very small compared to n (in our examples m = 200 was sufficient), thus this system can be solved very quickly.
If π(α) should be calculated for α close to 1 using this Reduced Order algorithm, the Krylov space has to be made larger. We have seen that this is computationally very expensive, and in some cases (when using a small number of mesh points but allowing values close to 1) this might not be worth it. Another option, which we have not explored, is to keep m small but apply some iterations of the (Optimized) Power Method to the approximations of π(α) for values of α close to one.

7 Conclusion & Discussion

In this paper we have discussed the PageRank model. We have illustrated the idea behind
PageRank with the help of a random surfer. To make sure this vector is uniquely determined,
modifications to this model have been made; artificial transition probabilities for dangling
nodes have been implemented, as well as a probability to teleport to any web page on the
internet.
While the PageRank vector was originally defined as the unique stationary distribution of a Markov chain, we defined it in a more rigorous way with the help of some linear algebra: the PageRank vector is the eigenvector of the Google matrix corresponding to eigenvalue 1. The problem was then to find an efficient way to calculate this eigenvector, and we have discussed several numerical methods for this.
The first algorithm we have discussed is the well-known Power Method. This algorithm is often applied to the PageRank problem. The error of this algorithm is of the order O(α^k), where α is the teleportation parameter and k stands for the number of iterations. Furthermore, we have shown that the PageRank vector π is equal to
$$\pi = (1-\alpha)(I - \alpha S^T)^{-1} v$$
By noting that $(I - \alpha S^T)^{-1} = \sum_{n=0}^{\infty} (\alpha S^T)^n$, we proposed to approximate this series (and thus the PageRank vector) by a partial sum. The Jacobi Method is an efficient way of doing so. Moreover, we have shown that it is sufficient to solve the system (I − αH^T)x = v; the PageRank vector can then be calculated by normalising this solution. Thus it is also possible to approximate the PageRank vector by applying the Jacobi Method to the matrix H^T.
Additionally, we have shown that in some cases the Jacobi Method can be optimized. This
depends on the spectrum of the hyperlink matrix. By a shift-and-scale approach, one can
apply the Jacobi Method on a different matrix with a smaller spectral radius. However, the
algorithm turned out to be of limited use. In most cases the regular Jacobi Method was
already optimal.
In chapter 5 we have compared these algorithms numerically. For this, we have used 4 different hyperlink matrices, varying greatly in size and structure. One of these has been computed by using a surfer script. For larger-scale testing purposes we have also used hyperlink matrices corresponding to a subset of the web with almost 700 thousand web pages. The three numerical methods turned out to have roughly the same speed. The Jacobi Method applied to S is always the slowest, and the Power Method and the Jacobi Method applied to H often have the same speed. In some cases it is better to apply the Jacobi Method, for example when the number of dangling nodes is relatively large.
The PageRank model has two important parameters: the teleportation parameter α and the personalisation vector v. For each person, the personalisation vector can be chosen such that the PageRank vector better suits the interests of that person. The parameter α corresponds to the probability that a user clicks a link; this value can also depend on the user. Hence, in chapter 6 we have modified the PageRank model and assumed α to be a stochastic variable. The PageRank vector then becomes a function of α. The goal was to approximate the expected PageRank:
$$\langle \pi \rangle = \int_0^1 \pi(\alpha) f(\alpha)\, d\alpha$$

Here f corresponds to the probability density function of α. To approximate this vector,
π(α) should be calculated for many different values of α. We have introduced a reduced-order
algorithm that can do that very efficiently. This algorithm makes use of the shift-invariance
of the Krylov space Km (H T , v). An orthonormal basis of this space can be found by applying
the Arnoldi algorithm. We have shown that the residual of the linear system (I −αH T )x = v
lives in the Krylov space Km+1 (H T , v) for any vector x ∈ Km (H T , v), and by smartly picking
this vector x, one can make the residual of the linear system very small. The vector x can
then be used to approximate π(α).
The only disadvantage of the Reduced Order algorithm is that one needs to expand the
Krylov space if π(α) is requested for values α close to 1. Expanding this space is computa-
tionally very expensive.
The power of the Reduced Order algorithm lies in the fact that the basis of K_m(H^T, v) needs to be constructed only once, using the Arnoldi algorithm. One can then find an approximation of π(α) for each value of α by simply solving multiple systems with only m variables, where m is usually much smaller than n. The numerical experiments in paragraph 6.4 showed that this algorithm is extraordinarily effective. Instead of using the (Optimized) Power Method or Jacobi Method iteratively, one can make the computation of the expected PageRank vector ⟨π⟩ up to 200 times faster by applying this Reduced Order algorithm.

8 References


[1] L. Page, S. Brin, R. Motwani and T. Winograd, The PageRank Citation Ranking: Bring-
ing Order to the Web. 1998
[2] A. Langville and C. Meyer, Google’s PageRank and Beyond: The Science of Search
Engine Rankings. Princeton University Press, New Jersey, 2006
[3] C. Moler, The world's largest matrix computation. MATLAB News and Notes, 2002
[4] A. Langville and C. Meyer, Deeper Inside PageRank. Internet Mathematics, 2004, 1.3:
335-380.
[5] D. Armstrong, The Perron-Frobenius theorem. http://www.math.miami.edu/
~armstrong/685fa12/sternberg_perron_frobenius.pdf
[6] T. Haveliwala and S. Kamvar, The Second Eigenvalue of the Google Matrix. Stanford
University Technical Report, 2003
[7] S. Kamvar, and T. Haveliwala, The condition number of the PageRank problem. 2003
[8] K. Avrachenkov, N. Litvak, and K.S. Pham, A singular perturbation approach for choos-
ing the PageRank damping factor. Internet Mathematics 5.1-2: 47-69, 2008
[9] D. Gleich, A. Flaxman, P. Constantine and A. Gunawardana, Tracking the random
surfer: empirically measured teleportation parameters in PageRank. Proceedings of the
19th international conference on World wide web. ACM, 2010
[10] N. Budko, and R. Remis, Electromagnetic inversion using a reduced-order three-
dimensional homogeneous model. Inverse problems 20.6 (S17), 2004
[11] Y. Saad, Iterative methods for sparse linear systems. Society for Industrial and Applied
Mathematics, 2003.
[12] A. Zaanen, Continuity, integration and Fourier theory. Vol. 2. Berlin; Springer-Verlag,
1989.
[13] J. Kleinberg, California hyperlink matrix. http://www.cs.cornell.edu/Courses/
cs685/2002fa/
[14] S. Kamvar, Stanford University hyperlink matrices. http://www.kamvar.org/
personalized_search

A Appendix

A.1 Constructing the hyperlink/Google matrices

The following Matlab files have been created to be able to construct link matrices. Further-
more, the Google matrix can be computed with these scripts.

A.1.1 surfer.m

function [ H, names ] = surfer( root, n )
% Starting at the url root, the surfer will visit web pages and determine
% where they link to. This process is repeated until n different web pages
% have been found. The url's of the web pages and the link matrix H are returned.

names = cell(n,1);
names{1} = root;
m = 1; %current amount of url's
H = logical(sparse(n,n));

%banned files and url parts
banned = {'.jpg','.jpeg','.png','.bmp','.tif','.gif','.ico','.css','.js', ...
    '.cgi','.pdf','.doc','.pps','.ppt','.odt','.rar','.tar','.dat','.exe', ...
    '.jar','.xml','lmscadsi','cybernet','w3.org','google','yahoo','scripts', ...
    'netscape','shockwave','webex','fansonly','doubleclick','#','"'};
truncate = {'?','#'};

for j = 1:n
    try
        page = urlread(names{j});
        %Reading the file and determining links
        for f = strfind(page, 'href="http:')
            link = page(f+6:f+4+min(strfind(page(f+6:end), '"')));
            %truncate urls to prevent duplicates
            for i = 1:length(truncate)
                pos = min(strfind(link, truncate{i}));
                if (~isempty(pos))
                    link = link(1:pos-1);
                end
            end

            %Checking if the url is already known
            known = false;
            for i = 1:m
                if (strcmpi(names{i}, link))
                    %The url is known!
                    known = true;
                    H(j,i) = 1;
                end
            end

            %If the url is not known, adding this link
            if (~known && m ~= n)
                %Is the url allowed?
                skip = false;
                for str = banned
                    if (~isempty(strfind(link, str{1})))
                        skip = true;
                        break;
                    end
                end
                if (~skip)
                    m = m + 1;
                    names{m} = link;
                    H(j,m) = 1; %page j links to the newly found page m
                end
            end
        end
    catch
        %We couldn't open the url. Continuing.
    end
end
end

A.1.2 loadCaliforniaMatrix.m

function [ H, names ] = loadCaliforniaMatrix
% Computes the hyperlink matrix of the California web (see
% http://www.cs.cornell.edu/Courses/cs685/2002fa/data/gr0.California).

fid = fopen('gr0.California');
tline = fgets(fid);

%Adding the nodes
names = {};
while ischar(tline) && tline(1)=='n'
    pos = strfind(tline, ' ');
    i = tline(pos(1)+1:pos(2)-1);
    j = tline(pos(2)+1:end);
    names = [names; cellstr(j)];
    tline = fgets(fid);
end

%Adding the arcs
i = []; j = [];
while ischar(tline) && tline(1)=='e'
    pos = strfind(tline, ' ');
    i = [i; str2num(tline(pos(1)+1:pos(2)-1))];
    j = [j; str2num(tline(pos(2)+1:end))];
    tline = fgets(fid);
end

i = i+1;
j = j+1;
n = length(names);
fclose(fid);

%Creating the (sparse) hyperlink matrix
H = sparse(i, j, 1, n, n);
H = spdiags(1./max(1,sum(H,2)), 0, n, n) * H;
end

A.1.3 loadStanfordMatrix.m

function H = loadStanfordMatrix
% Computes the hyperlink matrix of the Stanford University web (see
% http://www.kamvar.org/assets/data/stanford-web.tar.gz).

load stanford-web.dat;
H = spconvert(stanford_web);
end

A.1.4 loadSBMatrix.m

function [ H, rooturls ] = loadSBMatrix2
% Computes the hyperlink matrix of the Stanford-Berkeley Web (see
% http://www.kamvar.org/assets/data/stanford-berkeley-web.tar.gz).

n = 683446;
load stanford-berkeley-bool-sorted.dat;
H = spconvert(stanford_berkeley_bool_sorted);
% make the matrix square
H(n,n) = 0;
H = H(1:n,1:n);
% normalize the rows to sum to 1
H = spdiags(1./max(1,sum(H,2)), 0, n, n) * H;
load stanford-berkeley-sorted-roots.dat;
indices = stanford_berkeley_sorted_roots;
indices = indices(find(indices<n));
rooturls = textread('rooturls.txt', '%s');
rooturls = rooturls(1:max(size(indices)));
end

A.1.5 DanglingNodeVector.m

function d = DanglingNodeVector(H)
% Computes the Dangling Node vector of the corresponding link matrix H.

d = (sum(H,2) == 0);
end

A.1.6 GoogleMatrix.m

function G = GoogleMatrix(H, alpha, v)
% Computes the Google Matrix corresponding to H, alpha and v.
% Note that G is not a sparse matrix; hence this method cannot be used for
% large link matrices.

n = length(H);
if (nargin<3)
    v = ones(n,1)/n;
end
if (nargin<2)
    alpha = 0.85;
end
d = DanglingNodeVector(H);
G = alpha*(H + d*v') + (1-alpha)*ones(n,1)*v';
end
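The routines above can be combined as in the following sketch; the root URL, the number of pages and the value of α are placeholders chosen only for illustration.

% Hypothetical example: crawl a small web, row-normalise the raw link
% matrix returned by surfer, and build the corresponding Google matrix.
n = 100;
[H, names] = surfer('http://www.tudelft.nl', n);
H = spdiags(1./max(1,sum(double(H),2)), 0, n, n) * double(H);
G = GoogleMatrix(H, 0.85);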

A.2 Numerical Algorithms

The next Matlab files are the numerical methods we have discussed in chapter 4.

A.2.1 PowerMethod.m

function [ pi, iter ] = PowerMethod(H, alpha, v, startvector, error)
% Approximates the PageRank vector such that the residual
% is less than error. This uses the Power Method with startvector as
% initial guess.

n = length(H);
e = ones(n,1);
d = DanglingNodeVector(H);
norm = @(x) sum(abs(x)); %1-norm
if (nargin < 5)
    error = 1e-5;
end
if (nargin < 3)
    v = e / n;
end
if (nargin < 4)
    startvector = v;
end
if (nargin < 2)
    alpha = 0.85;
end

K = alpha*H';
pi = startvector; %initial guess
max = 300; %maximum amount of iterations
for iter = 2:max
    piprevious = pi;
    pi = K*pi + (1-alpha+alpha*sum(d.*pi))*v;
    res = norm(pi-piprevious);
    if (res < error)
        break;
    end
end
end

A.2.2 JacobiMethodS.m

function [ pi, iter ] = JacobiMethodS(H, alpha, v, error)
% Approximates the PageRank vector such that the residual
% is less than error. This uses the Jacobi method applied
% to S.

n = length(H);
e = ones(n,1);
d = DanglingNodeVector(H);
norm = @(x) sum(abs(x)); %1-norm
if (nargin < 4)
    error = 1e-5;
end
if (nargin < 3)
    v = e / n;
end
if (nargin < 2)
    alpha = 0.85;
end

K = alpha*H';
pi = (1-alpha)*v;
max = 300; %maximum amount of iterations
alphaspi = K*pi + sum(d.*pi)*alpha*v;
for iter = 2:max
    pi = alphaspi + (1-alpha)*v;
    alphaspi = K*pi + sum(d.*pi)*alpha*v;
    res = norm(pi-alphaspi - (1-alpha)*v);
    if (res < error)
        break;
    end
end
end

A.2.3 JacobiMethodH.m

function [ pi, iter ] = JacobiMethodH(H, alpha, v, error)
% Approximates the PageRank vector such that the residual
% is less than error. This uses the Jacobi method applied
% to H.

n = length(H);
e = ones(n,1);
norm = @(x) sum(abs(x)); %1-norm
if (nargin < 4)
    error = 1e-5;
end
if (nargin < 3)
    v = e / n;
end
if (nargin < 2)
    alpha = 0.85;
end

K = alpha*H';
pi = v;
max = 300; %maximum amount of iterations
alphahpi = K*pi;
for iter = 2:max
    pi = alphahpi + v;
    alphahpi = K*pi;
    res = norm(pi-alphahpi - v);
    if (res < error*sum(pi))
        break;
    end
end
pi = pi / sum(pi);
end

A.2.4 OptimizedJacobiMethodH.m

function [ pi, iter ] = OptimizedJacobiMethodH(H, beta, alpha, v, error)
% Approximates the PageRank vector such that the residual
% is less than error. This uses the Optimized Jacobi method applied
% to H, with (required) shifting parameter beta.

n = length(H);
e = ones(n,1);
norm = @(x) sum(abs(x)); %1-norm
if (nargin < 5)
    error = 1e-5;
end
if (nargin < 4)
    v = e / n;
end
if (nargin < 3)
    alpha = 0.85;
end

K = H';
max = 300; %maximum amount of iterations
pi = beta*(1-alpha)*v;
Qpi = (1-beta)*pi + (beta*alpha)*(K*pi);
for iter = 2:max
    pi = Qpi + v;
    Qpi = (1-beta)*pi + (beta*alpha)*(K*pi);
    res = norm(pi-Qpi - v);
    if (res < error*sum(pi))
        break;
    end
end
pi = pi/sum(pi);
end

A.2.5 OptimalBeta.m

function optbeta = BestBeta(H, betas, alpha, v, error)
% For each value of beta, the PageRank is computed using the Optimized
% Jacobi Method. Returns the value of beta such that the total amount
% of iterations needed is minimal.

n = length(H);
if (nargin < 5)
    error = 1e-5;
end
if (nargin < 4)
    v = ones(n,1) / n;
end
if (nargin < 3)
    alpha = 0.85;
end

opt = Inf;
optbeta = NaN;
for beta = betas
    [~, iter] = OptimizedJacobiMethodH(H, beta, alpha, v, error);
    if (iter<opt)
        opt = iter;
        optbeta = beta;
    end
end
end

A.2.6 Compare.m

function Compare(H, alpha, v, beta)
% Compares the numerical methods discussed in chapter 4. If no value of
% beta is given (or beta=1), the Optimized Jacobi Method will not be used.

n = length(H);
if (nargin < 4)
    beta = 1; %the Optimized Jacobi Method will not be used
end
if (nargin < 3)
    v = ones(n,1)/n;
end
if (nargin < 2)
    alpha = 0.85;
end

%Power Method:
tic; [~, iter] = PowerMethod(H, alpha, v);
disp(['Power Method: ', num2str(toc), 's (', num2str(iter), ' iterations)']);
%Jacobi S
tic; [~, iter] = JacobiMethodS(H, alpha, v);
disp(['Jacobi Method (S): ', num2str(toc), 's (', num2str(iter), ' iterations)']);
%Jacobi H
tic; [~, iter] = JacobiMethodH(H, alpha, v);
disp(['Jacobi Method (H): ', num2str(toc), 's (', num2str(iter), ' iterations)']);
if (beta~=1)
    %Optimized Jacobi H
    tic; [~, iter] = OptimizedJacobiMethodH(H, beta, alpha, v);
    disp(['Optimized Jacobi Method (beta=', num2str(beta), '): ', num2str(toc), 's (', num2str(iter), ' iterations)']);
end
end
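A possible way to run these routines on one of the smaller test webs; the value α = 0.85 is just an example.

% Example session: load the California web and compare the numerical
% methods for alpha = 0.85.
[H, names] = loadCaliforniaMatrix;
Compare(H, 0.85);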

A.3 Numerical Methods for the expected PageRank vector

The next Matlab files have been made and used to approximate the expected PageRank
vector.

A.3.1 OptimizedPowerMethod.m

function [ pi, iter ] = OptimizedPowerMethod(H, alphas, v)
% Approximates pi(alphas) for the given values of alpha by using the
% Optimized Power Method. For the best results, we assume that alphas is
% strictly increasing.

n = length(H);
if (nargin<3)
    v = ones(n,1)/n;
end
m = length(alphas);
iter = 0;

pi = zeros(n,m);
pi(:,1) = PowerMethod(H, alphas(1), v);
for i = 2:length(alphas)
    [pi(:,i), k] = PowerMethod(H, alphas(i), v, pi(:,i-1));
    iter = iter + k;
end
end

A.3.2 Arnoldi.m

function [W,U] = Arnoldi(A, m, v)
% Applies the Arnoldi Algorithm on the matrix A. W is the constructed
% orthogonal matrix, U the corresponding upper Hessenberg matrix.

n = length(A);
if (nargin<3)
    v = ones(n,1)/n;
end
W = zeros(n,m);
norm = @(x) sqrt(sum(abs(x.^2)));
W(:,1) = v / norm(v);
U = zeros(m,m);

for k = 1:m
    z = A*W(:,k);
    for j = 1:k
        U(j,k) = sum(W(:,j).*z);
        z = z - U(j,k)*W(:,j);
    end
    if (j==m)
        break;
    end
    U(k+1,k) = norm(z);
    if (U(k+1,k)==0)
        %The Arnoldi Method broke down!
        disp('The Arnoldi Method broke down!');
        m = k;
        break;
    end
    W(:,k+1) = z / U(k+1,k);
end
%Rescaling W and U in case the Arnoldi Algorithm broke down.
W = W(:,1:m);
U = U(1:m,1:m);
end

A.3.3 ReducedOrderMethod.m

function pi = ReducedOrderMethod(H, alphas, m, v)
% Approximates pi(alpha) for the given values of alpha by using the
% Reduced Order method with a Krylov space of dimension m.

n = length(H);
if (nargin<4)
    v = ones(n,1)/n;
end
e1 = [1; zeros(m-1,1)];
pi = zeros(n, length(alphas));

%First, we construct an orthonormal basis using the Arnoldi Algorithm.
[W,U] = Arnoldi(H', m, v);

%Now, we calculate the approximation of pi(alphas) in the Krylov space for
%each value of alpha.
for i = 1:length(alphas)
    x = W*((eye(m)-alphas(i)*U)\e1);
    pi(:,i) = x/sum(x);
end
end

A.3.4 Compare2.m

function Compare2(H, alphas, m, v)
% Approximates pi(alpha) for the given values of alpha by using the
% Reduced Order Algorithm, the Optimized Power Method, the normal Power
% Method and the Jacobi Method applied on H. m stands for the size of the
% Krylov space. Note that for storage issues, no PageRank vector will be
% stored.

n = length(H);
if (nargin<4)
    v = ones(n,1)/n;
end
e1 = [1; zeros(m-1,1)];

disp('Reduced Order Method:');
tic;
%Calculating W and U using H'
[W,U] = Arnoldi(H', m);
for i=1:length(alphas)
    pi = W*((eye(m)-alphas(i)*U)\e1);
    pi = pi / sum(pi);
end
toc;

iter = 0;
disp('Optimal Power Method');
tic;
pi = v;
for i=1:length(alphas)
    [pi, k] = PowerMethod(H, alphas(i), v, pi); %previous pi as starting vector
    iter = iter+k;
end
toc; iter

iter = 0;
disp('Normal Power Method');
tic;
for i=1:length(alphas)
    [pi, k] = PowerMethod(H, alphas(i));
    iter = iter + k;
end
toc; iter

iter = 0;
disp('Normal Jacobi H Method');
tic;
for i=1:length(alphas)
    [pi, k] = JacobiMethodH(H, alphas(i));
    iter = iter + k;
end
toc; iter
end

A.3.5 PlotResidual.m

function PlotResidual(H, ms, alphas, v)
% Using the Krylov space of dimension m, this function plots the residual
% of the approximation of pi(alpha) for each value of alpha.

n = length(H);
if (nargin<4)
    v = ones(n,1)/n;
end
norm = @(x) sqrt(sum(abs(x.^2)));
[m, alpha] = meshgrid(ms, alphas);
residual = zeros(size(m));
d = DanglingNodeVector(H);

%For each value of m, we approximate pi(alpha) for all alpha.
for k = 1:length(ms)
    pi = ReducedOrderMethod(H, alphas, ms(k), v);
    for j = 1:length(alphas)
        residual(j,k) = norm(pi(:,j) - (alphas(j)*H'*pi(:,j) + sum(d.*pi(:,j))*alphas(j)*v) - (1-alphas(j))*v) / norm(v);
    end
end

%Plotting the residual
figure;
surf(m, alpha, residual);
xlabel('m');
ylabel('alpha');
zlabel('residual');
set(gca, 'zscale', 'log')
end
