GIUSEPPE CALAFIORE AND LAURENT EL GHAOUI
OPTIMIZATION MODELS
SOLUTIONS MANUAL
CAMBRIDGE
Ver. 0.1 - Oct. 2014
DISCLAIMER
This is the first draft of the Solution Manual
for exercises in the book "Optimization Models"
by Calafiore & El Ghaoui.
This draft is under construction. It is still
incomplete and it is very likely to contain
errors.
This material is offered "as is," noncommercially, for personal use of instructors.
Comments and corrections are very welcome.
Contents
2. Vectors
3. Matrices
4. Symmetric matrices
5. Singular Value Decomposition
6. Linear Equations
7. Matrix Algorithms
8. Convexity
9. Linear, Quadratic and Geometric Models
10. Second-Order Cone and Robust Models
11. Semidefinite Models
12. Introduction to Algorithms
13. Learning from Data
14. Computational Finance
15. Control Problems
16. Engineering Design
2. Vectors
Exercise 2.1 (Subspaces and dimensions) Consider the set S of points
such that
x1 + 2x2 + 3x3 = 0, 3x1 + 2x2 + x3 = 0.
Show that S is a subspace. Determine its dimension, and find a basis
for it.
Solution 2.1 The set S is a subspace, as can be checked directly: if
x, y ∈ S , then for every λ, µ ∈ R, we have λx + µy ∈ S . To find
the dimension, we solve the two equations and find that any solution is of the form x1 = −x2/2, x3 = −x2/2, where x2 is free. Hence the dimension of S is 1, and a basis for S is the vector (−1/2, 1, −1/2).
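A quick numerical check of this solution (not part of the original text) is sketched below with NumPy: the solution set is the nullspace of the 2 × 3 coefficient matrix, so its dimension and a basis can be read off an SVD.

```python
# Sketch: verify Solution 2.1 numerically (assumes NumPy is available).
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0]])

# Right singular vectors associated with zero singular values span N(A).
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[np.sum(s > 1e-10):]
print(null_basis.shape[0])              # dimension of S: 1

# The basis vector found above, (-1/2, 1, -1/2), indeed satisfies both equations.
x = np.array([-0.5, 1.0, -0.5])
print(np.allclose(A @ x, 0))            # True
```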
Exercise 2.2 (Affine sets and projections) Consider the set in R3 , de-
fined by the equation
P = { x ∈ R3 : x1 + 2x2 + 3x3 = 1 }.
1. Show that the set P is an affine set of dimension 2. To this end,
express it as x (0) + span( x (1) , x (2) ), where x (0) ∈ P , and x (1) , x (2)
are linearly independent vectors.
2. Find the minimum Euclidean distance from 0 to the set P , and a
point that achieves the minimum distance.
Solution 2.2
1. We can express any vector x ∈ P as x = ( x1 , x2 , 1/3 − x1 /3 −
2x2 /3), where x1 , x2 are arbitrary. Thus
x = x(0) + x1 x(1) + x2 x(2),

where

x(0) = (0, 0, 1/3),   x(1) = (1, 0, −1/3),   x(2) = (0, 1, −2/3).
Since x (1) and x (2) are linearly independent, P is of dimension 2.
2. The set P is defined by a single linear equation a^T x = b, with a^T = [1 2 3] and b = 1, i.e., P is a hyperplane. The minimum Euclidean distance from 0 to P is the ℓ2 norm of the projection of 0 onto P, which can be determined as discussed in Section 2.3.2.2. That is, the projection x* of 0 onto P is such that x* ∈ P and x* is orthogonal to the subspace generating P (which coincides with the span of a), that is x* = αa. Hence, it must be that a^T x* = 1, thus α||a||_2^2 = 1, and α = 1/||a||_2^2. We thus have that

x* = a / ||a||_2^2,

and the distance we are seeking is ||x*||_2 = 1/||a||_2 = 1/√14.
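The projection and the distance can be checked numerically; the snippet below is a minimal sketch of the computation just described.

```python
# Sketch: projection of the origin onto the hyperplane a^T x = 1 (Solution 2.2).
import numpy as np

a = np.array([1.0, 2.0, 3.0])
x_star = a / (a @ a)                    # x* = a / ||a||_2^2
print(np.isclose(a @ x_star, 1.0))      # x* lies on P: True
print(np.linalg.norm(x_star))           # distance = 1/||a||_2 = 1/sqrt(14) ≈ 0.267
```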
Exercise 2.3 (Angles, lines and projections)
1. Find the projection z of the vector x = (2, 1) on the line that passes
through x0 = (1, 2) and with direction given by vector u = (1, 1).
2. Determine the angle between the following two vectors:
x = (1, 2, 3),   y = (3, 2, 1).
Are these vectors linearly independent?
Solution 2.3
1. We can observe directly that u^T(x − x0) = 0, hence the projection of x is the same as that of x0, which is z = x0 itself.
Alternatively, as seen in Section 2.3.2.1, the projection is

z = x0 + (u^T(x − x0) / (u^T u)) u,

which gives z = x0.
Another method consists in solving

min_t ||x0 + tu − x||_2^2 = min_t  t^2 u^T u − 2t u^T(x − x0) + ||x − x0||_2^2
                          = min_t (u^T u)(t − t0)^2 + constant,

where t0 = (x − x0)^T u / (u^T u). This leads to the optimal t* = t0, and provides the same result as before.
2. The angle cosine is given by

cos θ = x^T y / (||x||_2 ||y||_2) = 10/14,

which gives θ ≈ 44.4°.
The vectors are linearly independent, since λx + µy = 0 for λ, µ ∈ R implies that λ = µ = 0. Another way to prove this is to observe that the angle is neither 0° nor 180°.
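Both parts can be reproduced numerically; the following short NumPy sketch (not part of the original solution) mirrors the computations above.

```python
# Sketch: numerical check of Solution 2.3.
import numpy as np

# Part 1: projection of x onto the line {x0 + t u}.
x, x0, u = np.array([2.0, 1.0]), np.array([1.0, 2.0]), np.array([1.0, 1.0])
t_star = (x - x0) @ u / (u @ u)
z = x0 + t_star * u
print(z)                                  # [1. 2.], i.e., z = x0 since u^T (x - x0) = 0

# Part 2: angle between (1, 2, 3) and (3, 2, 1).
x, y = np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0])
cos_theta = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))   # ≈ 44.4 degrees
```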
Exercise 2.4 (Inner product) Let x, y ∈ Rn . Under which condition
on α ∈ Rn does the function

f(x, y) = Σ_{k=1}^n α_k x_k y_k
define an inner product on Rn ?
Solution 2.4 The axioms of 2.2 are all satisfied for any α ∈ Rn, except the conditions

f(x, x) ≥ 0;
f(x, x) = 0 if and only if x = 0.

These properties hold if and only if α_k > 0, k = 1, . . . , n. Indeed, if the latter is true, then the above two conditions hold. Conversely, if there exists k such that α_k ≤ 0, setting x = e_k (the k-th unit vector in Rn) produces f(e_k, e_k) ≤ 0; this contradicts one of the two above conditions.
Exercise 2.5 (Orthogonality) Let x, y ∈ Rn be two unit-norm vectors, that is, such that ||x||_2 = ||y||_2 = 1. Show that the vectors x − y and x + y are orthogonal. Use this to find an orthogonal basis for the subspace spanned by x and y.
Solution 2.5 When x, y are both unit-norm, we have

(x − y)^T(x + y) = x^T x − y^T y − y^T x + x^T y = x^T x − y^T y = 0,
as claimed.
We can express any vector z ∈ span(x, y) as z = λx + µy, for some λ, µ ∈ R. Letting u = x + y and v = x − y, we have z = αu + βv, where

α = (λ + µ)/2,   β = (λ − µ)/2.

Hence z ∈ span(u, v). The converse is also true for similar reasons. Thus, (u, v) is an orthogonal basis for span(x, y). We finish by normalizing u, v, replacing them with (u/||u||_2, v/||v||_2). The desired orthogonal basis is thus given by ((x − y)/||x − y||_2, (x + y)/||x + y||_2).
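The orthogonality of x − y and x + y, and the resulting basis, can be verified numerically; the sketch below uses two random unit-norm vectors.

```python
# Sketch: check of Solution 2.5 on random unit-norm vectors.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4); x /= np.linalg.norm(x)
y = rng.standard_normal(4); y /= np.linalg.norm(y)

u, v = x + y, x - y
print(np.isclose(u @ v, 0))                  # u and v are orthogonal: True

q1, q2 = u / np.linalg.norm(u), v / np.linalg.norm(v)
# x lies in span(q1, q2): projecting onto the orthonormal pair recovers it.
print(np.allclose(q1 * (q1 @ x) + q2 * (q2 @ x), x))   # True
```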
Exercise 2.6 (Norm inequalities)
1. Show that the following inequalities hold for any vector x ∈ Rn:

(1/√n) ||x||_2 ≤ ||x||_∞ ≤ ||x||_2 ≤ ||x||_1 ≤ √n ||x||_2 ≤ n ||x||_∞.

Hint: use the Cauchy-Schwartz inequality.
2. Show that for any non-zero vector x,
card(x) ≥ ||x||_1^2 / ||x||_2^2,
where card( x ) is the cardinality of the vector x, defined as the num-
ber of non-zero elements in x. Find vectors x for which the lower
bound is attained.
Solution 2.6
1. We have

||x||_2^2 = Σ_{i=1}^n x_i^2 ≤ n · max_i x_i^2 = n ||x||_∞^2.

Also, ||x||_∞ ≤ √(x_1^2 + . . . + x_n^2) = ||x||_2.
The inequality ||x||_2 ≤ ||x||_1 is obtained after squaring both sides, and checking that

Σ_{i=1}^n x_i^2 ≤ Σ_{i=1}^n x_i^2 + Σ_{i≠j} |x_i x_j| = (Σ_{i=1}^n |x_i|)^2 = ||x||_1^2.

Finally, the condition ||x||_1 ≤ √n ||x||_2 is due to the Cauchy-Schwartz inequality

|z^T y| ≤ ||y||_2 · ||z||_2,

applied to the two vectors y = (1, . . . , 1) and z = |x| = (|x_1|, . . . , |x_n|).
2. Let us apply the Cauchy-Schwartz inequality with z = |x| again, and with y a vector with y_i = 1 if x_i ≠ 0, and y_i = 0 otherwise. We have ||y||_2 = √k, with k = card(x). Hence

|z^T y| = ||x||_1 ≤ ||y||_2 · ||z||_2 = √k · ||x||_2,

which proves the result. The bound is attained for vectors with k non-zero elements, all with the same magnitude.
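The chain of inequalities and the cardinality bound can be spot-checked numerically; a minimal sketch follows (the test vectors are arbitrary).

```python
# Sketch: numerical check of Exercise 2.6.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(7)
n = x.size
n1, n2, ninf = np.linalg.norm(x, 1), np.linalg.norm(x, 2), np.linalg.norm(x, np.inf)
print(n2 / np.sqrt(n) <= ninf <= n2 <= n1 <= np.sqrt(n) * n2 <= n * ninf)   # True

# Cardinality bound, with equality when all non-zero entries have equal magnitude.
x_eq = np.array([2.0, -2.0, 0.0, 2.0])
bound = np.linalg.norm(x_eq, 1) ** 2 / np.linalg.norm(x_eq, 2) ** 2
print(np.count_nonzero(x_eq), bound)        # 3, 3.0: bound attained
```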
Exercise 2.7 (Hölder inequality) Prove Hölder’s inequality (2.4). Hint:
consider the normalized vectors u = x/||x||_p, v = y/||y||_q, and observe that

|x^T y| = ||x||_p ||y||_q · |u^T v| ≤ ||x||_p ||y||_q Σ_k |u_k v_k|.

Then, apply Young’s inequality (see Example 8.10) to the products |u_k v_k| = |u_k| |v_k|.
Solution 2.7 The inequality is trivial if one of the vectors x, y is zero.
We henceforth assume that none is, which allows us to define the
normalized vectors u, v. We need to show that

Σ_k |u_k v_k| ≤ 1.

Using the hint given, we apply Young’s inequality, which states that for any given numbers a, b ≥ 0 and p, q > 0 such that

1/p + 1/q = 1,

it holds that

ab ≤ (1/p) a^p + (1/q) b^q.

We thus have, with a = |u_k| and b = |v_k|, and summing over k:

Σ_k |u_k v_k| ≤ (1/p) Σ_k |u_k|^p + (1/q) Σ_k |v_k|^q
             = (1/p) ||u||_p^p + (1/q) ||v||_q^q
             = 1/p + 1/q = 1,

where we have used the fact that ||u||_p = ||v||_q = 1.
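Hölder’s inequality itself is easy to spot-check numerically for a few conjugate pairs (p, q); the sketch below uses arbitrary random vectors.

```python
# Sketch: numerical check of Hölder's inequality |x^T y| <= ||x||_p ||y||_q, 1/p + 1/q = 1.
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.standard_normal(6), rng.standard_normal(6)
for p in (1.5, 2.0, 3.0):
    q = p / (p - 1.0)                       # conjugate exponent
    print(p, abs(x @ y) <= np.linalg.norm(x, p) * np.linalg.norm(y, q))   # True
```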
Exercise 2.8 (Linear functions)
1. For an n-vector x, with n = 2m − 1 odd, we define the median of x as the scalar value x_a such that exactly m of the values in x are ≤ x_a and m are ≥ x_a (i.e., x_a leaves half of the values in x to its left, and half to its right). Now consider the function f : Rn → R, with values f(x) = x_a − (1/n) Σ_{i=1}^n x_i. Express f as a scalar product, that is, find a ∈ Rn such that f(x) = a^T x for every x. Find a basis for the set of points x such that f(x) = 0.
2. For α ∈ R2, we consider the “power law” function f : R2_++ → R, with values f(x) = x_1^{α_1} x_2^{α_2}. Justify the statement: “the coefficients α_i provide the ratio between the relative error in f to a relative error in x_i”.
Solution 2.8 (Linear functions) TBD
Exercise 2.9 (Bound on a polynomial’s derivative) In this exercise,
you derive a bound on the largest absolute value of the derivative
of a polynomial of a given order, in terms of the size of the coefficients. (See the discussion on regularization in Section 13.2.3 for an application of this result.) For w ∈ R^{k+1}, we define the polynomial p_w, with values

p_w(x) = w_1 + w_2 x + . . . + w_{k+1} x^k.

Show that, for any p ≥ 1,

∀ x ∈ [−1, 1] :  |dp_w(x)/dx| ≤ C(k, p) ||v||_p,

where v = (w_2, . . . , w_{k+1}) ∈ Rk, and

C(k, p) = k            if p = 1,
          k^{3/2}      if p = 2,
          k(k + 1)/2   if p = ∞.
Hint: you may use Hölder’s inequality (2.4) or the results from Exer-
cise 2.6.
Solution 2.9 (Bound on a polynomial’s derivative) We have, with z = (1, 2, . . . , k), and using Hölder’s inequality:

|dp_w(x)/dx| = |w_2 + 2w_3 x + . . . + k w_{k+1} x^{k−1}|
             ≤ |w_2| + 2|w_3| + . . . + k|w_{k+1}|
             = z^T |v|
             ≤ ||v||_p · ||z||_q.

When p = 1, we have

||z||_q = ||z||_∞ = k.

When p = 2, we have

||z||_q = ||z||_2 = √(1 + 4 + . . . + k^2) ≤ √(k · k^2) = k^{3/2}.

When p = ∞, we have

||z||_q = ||z||_1 = 1 + 2 + . . . + k = k(k + 1)/2.
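As a sanity check (not in the original text), the bound can be tested numerically on a random polynomial by evaluating the derivative on a fine grid of [−1, 1].

```python
# Sketch: check |p_w'(x)| <= C(k, p) ||v||_p on [-1, 1] for a random coefficient vector.
import numpy as np

rng = np.random.default_rng(3)
k = 5
w = rng.standard_normal(k + 1)              # w_1, ..., w_{k+1}
v = w[1:]                                   # v = (w_2, ..., w_{k+1})

xs = np.linspace(-1.0, 1.0, 2001)
j = np.arange(1, k + 1)                     # p_w'(x) = sum_j j * w_{j+1} x^{j-1}
deriv = (j * v) @ xs[None, :] ** (j[:, None] - 1)
max_abs = np.max(np.abs(deriv))

C = {1: k, 2: k ** 1.5, np.inf: k * (k + 1) / 2}
for p, Ckp in C.items():
    print(p, max_abs <= Ckp * np.linalg.norm(v, p))     # True for p = 1, 2, inf
```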
3. Matrices
Exercise 3.1 (Derivatives of composite functions)
1. Let f : Rm → Rk and g : Rn → Rm be two maps. Let h : Rn → Rk
be the composite map h = f ◦ g, with values h( x ) = f ( g( x )) for
x ∈ Rn . Show that the derivatives of h can be expressed via a
matrix-matrix product, as J_h(x) = J_f(g(x)) · J_g(x), where J_h(x) is the Jacobian matrix of h at x, i.e., the matrix whose (i, j) element is ∂h_i(x)/∂x_j.
2. Let g be an affine map of the form g( x ) = Ax + b, for A ∈ Rm,n ,
b ∈ Rm . Show that the Jacobian of h( x ) = f ( g( x )) is
Jh ( x ) = J f ( g( x )) · A.
3. Let g be an affine map as in the previous point, let f : Rm → R (a scalar-valued function), and let h(x) = f(g(x)). Show that

∇_x h(x) = A^T ∇_g f(g(x)),
∇_x^2 h(x) = A^T ∇_g^2 f(g(x)) A.
Solution 3.1
1. We have, by the composition rule for derivatives:

[J_h(x)]_{i,j} = ∂h_i(x)/∂x_j = Σ_{l=1}^m (∂f_i/∂g_l)(g(x)) · (∂g_l/∂x_j)(x)
              = Σ_{l=1}^m [J_f(g(x))]_{i,l} [J_g(x)]_{l,j},
which proves the result.
2. Since g_i(x) = Σ_{k=1}^n a_{ik} x_k + b_i, i = 1, . . . , m, we have that the (i, j)-th element of the Jacobian of g is

[J_g(x)]_{ij} = ∂g_i(x)/∂x_j = a_{ij},
hence Jg ( x ) = A, and the desired result follows from applying
point 1. of this exercise.
3. For a scalar-valued function, the gradient coincides with the transpose of the Jacobian, hence the expression for the gradient of h w.r.t. x follows by applying the previous point. For the Hessian,
we have instead

[∇_x^2 h(x)]_{ij} = ∂^2 h(x)/∂x_i ∂x_j = ∂/∂x_j ( ∂h(x)/∂x_i ) = ∂/∂x_j ( a_i^T ∇_g f(g(x)) )
                  = ∂/∂x_j Σ_{k=1}^m a_{ik} ∂f(g(x))/∂g_k = Σ_{k=1}^m a_{ik} ∂/∂x_j ( ∂f(g(x))/∂g_k )
                  = Σ_{k=1}^m a_{ik} Σ_{p=1}^m ( ∂^2 f(g(x))/∂g_p ∂g_k ) ( ∂g_p(x)/∂x_j )
                  = Σ_{k=1}^m a_{ik} Σ_{p=1}^m ( ∂^2 f(g(x))/∂g_p ∂g_k ) a_{pj}
                  = a_i^T ∇_g^2 f(g(x)) a_j,
which proves the statement.
Exercise 3.2 (Permutation matrices) A matrix P ∈ Rn,n is a permu-
tation matrix if its columns are a permutation of the columns of the
n × n identity matrix.
1. For an n × n matrix A, we consider the products PA and AP. De-
scribe in simple terms what these matrices look like with respect
to the original matrix A.
2. Show that P is orthogonal.
Solution 3.2
1. Given the matrix A, the product PA is the matrix obtained by per-
muting the rows of A; AP corresponds to permuting the columns.
2. Every pair of columns (p_k, p_l) of P is of the form (e_k, e_l), where e_k, e_l are the k-th and the l-th standard basis vectors in Rn. Thus, ||p_k||_2 = 1, and p_k^T p_l = 0 if k ≠ l, as claimed.
Exercise 3.3 (Linear maps) Let f : Rn → Rm be a linear map. Show
how to compute the (unique) matrix A such that f ( x ) = Ax for every
x ∈ Rn , in terms of the values of f at appropriate vectors, which you
will determine.
Solution 3.3 For i = 1, . . . , n, let ei be the i-th unit vector in Rn . We
have
f (ei ) = Aei = ai ,
where ai is the i-th column of A. Hence we can compute the matrix
A column-wise, by evaluating f at the points e1 , . . . , en .
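A minimal sketch of this construction is given below; the linear map f used here is a hypothetical example, not one taken from the text.

```python
# Sketch: recover the matrix of a linear map column-by-column from f(e_i) (Solution 3.3).
import numpy as np

def f(x):
    # Hypothetical linear map R^3 -> R^2, used only for illustration.
    return np.array([x[0] + 2 * x[1] + 3 * x[2],
                     3 * x[0] + 2 * x[1] + x[2]])

n = 3
A = np.column_stack([f(e) for e in np.eye(n)])   # i-th column is f(e_i)
x = np.array([1.0, -2.0, 0.5])
print(np.allclose(A @ x, f(x)))                  # True: A represents f
```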
Exercise 3.4 (Linear dynamical systems) Linear dynamical systems
are a common way to (approximately) model the behavior of physical
phenomena, via recurrence equations of the form (such models are the focus of Chapter 15)

x(t + 1) = Ax(t) + Bu(t),   y(t) = Cx(t),   t = 0, 1, 2, . . . ,
where t is the (discrete) time, x (t) ∈ Rn describes the state of the
system at time t, u(t) ∈ R p is the input vector, and y(t) ∈ Rm is the
output vector. Here, matrices A, B, C, are given.
1. Assuming that the system has initial condition x (0) = 0, express
the output vector at time T as a linear function of u(0), . . . , u( T −
1); that is, determine a matrix H such that y(T) = H U(T), where

U(T) = (u(0), . . . , u(T − 1))

is the vector obtained by stacking all the inputs up to and including time T − 1.
2. What is the interpretation of the range of H?
Solution 3.4
1. We have
x (1) = Bu(0)
x (2) = Ax (1) + Bu(1) = ABu(0) + Bu(1)
x (3) = Ax (2) + Bu(2) = A2 Bu(0) + ABu(1) + Bu(2).
We now prove by induction that, for T ≥ 1:

x(T) = Σ_{k=0}^{T−1} A^k B u(T − 1 − k) = [A^{T−1}B  · · ·  AB  B] U(T).
The formula is correct for T = 1. Let T ≥ 2. Assume the formula
is correct for T − 1; we have
x(T) = Ax(T − 1) + Bu(T − 1) = A ( Σ_{k=0}^{T−2} A^k B u(T − 2 − k) ) + Bu(T − 1)
     = Σ_{k=0}^{T−2} A^{k+1} B u(T − 2 − k) + Bu(T − 1)
     = Σ_{k=1}^{T−1} A^k B u(T − 1 − k) + Bu(T − 1)
     = Σ_{k=0}^{T−1} A^k B u(T − 1 − k),
as claimed. Finally, we have y(T) = H U(T), with

H = C · [A^{T−1}B  · · ·  AB  B].
2. The range of H is the set of output vectors that are attainable at
time T by the system by proper choice of the sequence of inputs,
starting from the initial state x (0) = 0.
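The block structure of H is easy to verify by simulation; the sketch below uses arbitrary random system matrices (an assumption made only for this check).

```python
# Sketch: build H = C [A^{T-1}B ... AB B] and check y(T) = H U(T) by simulating the recursion.
import numpy as np

rng = np.random.default_rng(4)
n, p, m, T = 3, 2, 2, 5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, p))
C = rng.standard_normal((m, n))

H = np.hstack([C @ np.linalg.matrix_power(A, T - 1 - k) @ B for k in range(T)])

U = rng.standard_normal((T, p))                  # inputs u(0), ..., u(T-1)
x = np.zeros(n)                                  # x(0) = 0
for t in range(T):
    x = A @ x + B @ U[t]
print(np.allclose(C @ x, H @ U.reshape(-1)))     # True
```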
Exercise 3.5 (Nullspace inclusions and range) Let A, B ∈ Rm,n be
two matrices. Show that the fact that the nullspace of B is contained
in that of A implies that the range of B> contains that of A> .
Solution 3.5 Assume that the nullspace of B is contained in that of
A. This means that
Bx = 0 =⇒ Ax = 0.
Let z ∈ R(A^T): there exists y ∈ Rm such that z = A^T y. We have thus, for any element x ∈ N(A), z^T x = y^T Ax = 0. Hence, z is orthogonal to the nullspace of A, so it is orthogonal to the nullspace of B. We have obtained R(A^T) ⊆ N(B)^⊥ = R(B^T), as claimed. Here, we have used the fundamental theorem of linear algebra (3.1).
Exercise 3.6 (Rank and nullspace) Consider the image in Figure 3.1, a gray-scale rendering of a painting by Mondrian (1872-1944). We build a 256 × 256 matrix A of pixels based on this image by ignoring grey zones, assigning +1 to horizontal or vertical black lines, +2 at the intersections, and zero elsewhere. The horizontal lines occur at row indices 100, 200 and 230, and the vertical ones at column indices 50, 230.
[Figure 3.1: A gray-scale rendering of a painting by Mondrian.]
1. What is the nullspace of the matrix?
2. What is its rank?
Solution 3.6
1. Denote by ei the i-th unit vector in R256 , by z1 ∈ R256 the vector
with all first 50 components equal to one, by z2 ∈ R256 the vector
with all last 26 components equal to one, and by z3 ∈ R256 the
vector with all last 56 components equal to one. Finally, 1 denotes
the vector of all ones in R256. We can express the matrix as

M = e_100 z_1^T + e_200 1^T + e_230 z_3^T + 1 e_50^T + z_3 e_230^T.
The condition Mx = 0, for some vector x ∈ R256, translates as

(z_1^T x) e_100 + (1^T x) e_200 + (z_3^T x) e_230 + (e_50^T x) 1 + (e_230^T x) z_3 = 0.

Since the vectors (e_100, e_200, e_230, 1, z_3) are linearly independent, we obtain that the five coefficients in the above must be zero:

0 = z_1^T x = 1^T x = z_3^T x = e_50^T x = e_230^T x.
It is easy to check that the corresponding subspace of R256 is of dimension 256 − 5 = 251. Indeed, two elements of x are zero (x_50 = x_230 = 0), and the remaining ones satisfy three independent equality constraints. From these we can express (say) x_1, x_201, x_51 in terms of the remaining variables, which are then free of any constraints. We can eliminate a total of five variables from the above five conditions, so the nullspace is of dimension 251.
2. The rank is 5.
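The rank and nullspace dimension claimed above can be confirmed numerically by forming the matrix M explicitly (the sketch below uses 0-based NumPy indexing for the 1-based indices in the text).

```python
# Sketch: build the 256 x 256 matrix M of Solution 3.6 and check its rank.
import numpy as np

n = 256
def e(i):                      # i-th standard basis vector (1-based index)
    v = np.zeros(n); v[i - 1] = 1.0; return v

z1 = np.zeros(n); z1[:50] = 1.0        # first 50 entries equal to one
z3 = np.zeros(n); z3[-56:] = 1.0       # last 56 entries equal to one
ones = np.ones(n)

M = (np.outer(e(100), z1) + np.outer(e(200), ones) + np.outer(e(230), z3)
     + np.outer(ones, e(50)) + np.outer(z3, e(230)))

r = np.linalg.matrix_rank(M)
print(r, n - r)                        # 5, 251 (rank and nullspace dimension)
```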
Exercise 3.7 (Range and nullspace of A> A) Prove that, for any ma-
trix A ∈ Rm,n , it holds that
N(A^T A) = N(A),
R(A^T A) = R(A^T).   (3.1)
Hint: use the fundamental theorem of linear algebra.
Solution 3.7 First, suppose x ∈ N ( A), then Ax = 0 and obviously
A> Ax = 0. Conversely, suppose x ∈ N ( A> A), we show by contra-
diction that it must be x ∈ N ( A), hence proving the first claim. In-
deed, suppose x ∈ N ( A> A) but x 6∈ N ( A). Define then v = Ax 6= 0.
Such a v is by definition in the range of A, and A> v = A> Ax = 0,
so v is also in the nullspace of A> , which is impossible since, by the
fundamental theorem of linear algebra, R(A) ⊥ N(A^T). Next,

R(A^T) = N(A)^⊥ = N(A^T A)^⊥ = R(A^T A),
which proves (3.1).
Exercise 3.8 (Cayley-Hamilton theorem) Let A ∈ Rn,n and let
p(λ) := det(λI_n − A) = λ^n + c_{n−1} λ^{n−1} + · · · + c_1 λ + c_0
be the characteristic polynomial of A.
1. Assume A is diagonalizable. Prove that A annihilates its own
characteristic polynomial, that is
p( A) = An + cn−1 An−1 + · · · + c1 A + c0 In = 0.
Hint: use Lemma 3.3.
2. Prove that p(A) = 0 holds in general, i.e., also for non-diagonalizable square matrices. Hint: use the facts that polynomials are continuous functions, and that diagonalizable matrices are dense in Rn,n, i.e., for any ε > 0 there exists ∆ ∈ Rn,n with ||∆||_F ≤ ε such that A + ∆ is diagonalizable.
Solution 3.8
1. The result is immediate from Lemma 3.3: if A = UΛU^{−1} is a diagonalization of A, then p(Λ) = 0 (p(Λ) is diagonal with entries p(λ_i), and the eigenvalues λ_i are, by definition, roots of the characteristic polynomial), hence

p(A) = U p(Λ) U^{−1} = 0.
2. The map p : Rn,n → Rn,n with values p( A) = An + cn−1 An−1 +
· · · + c1 A + c0 In is continuous on Rn,n . This map is identically zero
on the dense subset of Rn,n formed by diagonalizable matrices
(proved in the previous point of the exercise), hence by continuity
it must be zero everywhere in Rn,n .
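A numerical illustration of the theorem (not part of the original solution) is sketched below for a random matrix.

```python
# Sketch: Cayley-Hamilton check, p(A) = 0, for a random 4 x 4 matrix.
import numpy as np

rng = np.random.default_rng(5)
n = 4
A = rng.standard_normal((n, n))

c = np.poly(A)                          # coefficients of det(lambda I - A), c[0] = 1
P = sum(c[i] * np.linalg.matrix_power(A, n - i) for i in range(n + 1))
print(np.allclose(P, 0))                # True
```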
Exercise 3.9 (Frobenius norm and random inputs) Let A ∈ Rm,n be
a matrix. Assume that u ∈ Rn is a vector-valued random variable,
with zero mean and covariance matrix In . That is, E{u} = 0, and
E{uu> } = In .
1. What is the covariance matrix of the output, y = Au?
2. Define the total output variance as E{||y − ŷ||_2^2}, where ŷ = E{y} is the output’s expected value. Compute the total output variance and comment.
Solution 3.9
1. The mean of the output is zero: ŷ = Ey = AEu = 0. Hence the
covariance matrix is given by
E(yy> ) = E( Auu> A> )
= AE(uu> ) A>
= AA> .
2. The total variance is

E(y^T y) = trace E(yy^T) = trace(AA^T) = ||A||_F^2.
The total output variance is the square of the Frobenius norm of the matrix. Hence the Frobenius norm captures the response of the matrix to a class of random inputs (zero mean and unit covariance matrix).
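A Monte Carlo sketch of this fact (with an arbitrary example matrix) is given below: the empirical mean of ||Au||_2^2 over many zero-mean, identity-covariance samples approaches ||A||_F^2.

```python
# Sketch: total output variance E||Au||_2^2 vs. the squared Frobenius norm of A.
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))

N = 200_000
U = rng.standard_normal((N, 3))                    # zero mean, covariance I_3
emp = np.mean(np.sum((U @ A.T) ** 2, axis=1))
print(emp, np.linalg.norm(A, 'fro') ** 2)          # approximately equal
```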
Exercise 3.10 (Adjacency matrices and graphs) For a given undirected graph G with no self-loops and at most one edge between any pair of nodes (i.e., a simple graph), as in Figure 3.2, we associate an n × n matrix A, such that

A_ij = 1 if there is an edge between node i and node j, and A_ij = 0 otherwise.

[Figure 3.2: An undirected graph with n = 5 vertices.]
This matrix is called the adjacency matrix of the graph. (The graph in Figure 3.2 has adjacency matrix

A = [ 0 1 0 1 1 ;
      1 0 0 1 1 ;
      0 0 0 0 1 ;
      1 1 0 0 0 ;
      1 1 1 0 0 ].)

1. Prove the following result: for a positive integer k, the matrix A^k has an interesting interpretation: the entry in row i and column j gives the number of walks of length k (i.e., a collection of k edges) leading from vertex i to vertex j. Hint: prove this by induction on k, and look at the matrix-matrix product A^{k−1} A.
2. A triangle in a graph is defined as a subgraph composed of three
vertices, where each vertex is reachable from each other vertex
(i.e., a triangle forms a complete subgraph of order 3). In the
graph of Figure 3.2, for example, nodes {1, 2, 4} form a triangle.
Show that the number of triangles in G is equal to the trace of A3
divided by 6. Hint: For each node in a triangle in an undirected
graph, there are two walks of length 3 leading from the node to
itself, one corresponding to a clockwise walk, and the other to a
counter-clockwise walk.
Solution 3.10
1. We can prove the result by induction on k. For k = 1, the re-
sult follows from the very definition of A. Let Lk (i, j) denote the
number of paths of length k between nodes i and j, and assume
that the result we wish to prove is true for some given h ≥ 1,
so that Lh (i, j) = [ Ah ]i,j . We next prove that it must also hold
that L_{h+1}(i, j) = [A^{h+1}]_{i,j}, thus proving by inductive argument that
Lk (i, j) = [ Ak ]i,j for all k ≥ 1.
Indeed, to go from a node i to a node j with a walk of length h + 1, one first needs to reach, with a walk of length h, a node l linked to j by an edge. Thus:

L_{h+1}(i, j) = Σ_{l ∈ V(j)} L_h(i, l),
where V(j) is the neighbor set of j, which is the set of nodes connected to the j-th node, that is, nodes l such that A_{l,j} ≠ 0. Thus:

L_{h+1}(i, j) = Σ_{l=1}^n L_h(i, l) A_{l,j}.

But we assumed that L_h(i, j) = [A^h]_{i,j}, hence the previous equation can be written as

L_{h+1}(i, j) = Σ_{l=1}^n [A^h]_{i,l} A_{l,j}.

In the above we recognize the (i, j)-th element of the product A^h A = A^{h+1}, which proves that L_{h+1}(i, j) = [A^{h+1}]_{i,j}, and hence concludes the inductive proof.
2. Following the hint, we observe that for each node in a triangle in
an undirected graph there are two walks of length 3 leading from
the node to itself, one corresponding to a clockwise walk, and the
other to a counter-clockwise walk. Therefore, each triangle in the
graph produces 6 walks of length 3 (two walks for each vertex
composing the triangle). From the previous result, the number of
walks of length 3 from node i to itself is given by [A^3]_{i,i}, hence the total number of walks of length 3 from each node to itself is Σ_{i=1}^n [A^3]_{i,i} = trace(A^3), and therefore the number of triangles is trace(A^3)/6.
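Using the adjacency matrix reported in the margin note of the exercise, the triangle count can be verified directly.

```python
# Sketch: triangles in the graph of Figure 3.2 via trace(A^3)/6.
import numpy as np

A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 0, 1, 1],
              [0, 0, 0, 0, 1],
              [1, 1, 0, 0, 0],
              [1, 1, 1, 0, 0]])

print(np.trace(np.linalg.matrix_power(A, 3)) // 6)   # 2 triangles: {1,2,4} and {1,2,5}
```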
Exercise 3.11 (Nonnegative and positive matrices) A matrix A ∈ Rn,n
is said to be nonnegative (resp. positive) if aij ≥ 0 (resp. aij > 0) for all
i, j = 1, . . . , n. The notation A ≥ 0 (resp. A > 0) is used to denote
nonnegative (resp. positive) matrices.
A nonnegative matrix is said to be column (resp. row) stochastic,
if the sum of the elements along each column (resp. row) is equal to
one, that is if 1> A = 1> (resp. A1 = 1). Similarly, a vector x ∈ Rn
is said to be nonnegative if x ≥ 0 (element-wise), and it is said to
be a probability vector, if it is nonnegative and 1> x = 1. The set of
probability vectors in Rn is thus the set S = { x ∈ Rn : x ≥ 0, 1> x =
1}, which is called the probability simplex. The following points you
are requested to prove are part of a body of results known as the
Perron-Frobenius theory of nonnegative matrices.
1. Prove that a nonnegative matrix A maps nonnegative vectors into
nonnegative vectors (i.e., that Ax ≥ 0 whenever x ≥ 0), and that
a column stochastic matrix A ≥ 0 maps probability vectors into
probability vectors.
2. Prove that if A > 0, then its spectral radius ρ( A) is positive. Hint:
use the Cayley-Hamilton theorem.
3. Show that it holds for any matrix A and vector x that
| Ax | ≤ | A|| x |,
where | A| (resp. | x |) denotes the matrix (resp. vector) of moduli
of the entries of A (resp. x). Then, show that if A > 0 and λi , vi is
an eigenvalue/eigenvector pair for A, then
|λi ||vi | ≤ A|vi |.
4. Prove that if A > 0 then ρ( A) is actually an eigenvalue of A (i.e., A
has a positive real eigenvalue λ = ρ( A), and all other eigenvalues
of A have modulus no larger than this “dominant” eigenvalue),
and that there exists a corresponding eigenvector v > 0. Further,
the dominant eigenvalue is simple (i.e., it has unit algebraic mul-
tiplicity), but you are not requested to prove this latter fact.
Hint: For proving this claim you may use the following fixed-point theorem due to Brouwer: if S is a compact and convex set in Rn (see Section 8.1 for definitions of compact and convex sets), and f : S → S is a continuous map, then there exists an x ∈ S such that f(x) = x. Apply this result to the continuous map f(x) := Ax / (1^T Ax), with S being the probability simplex (which is indeed convex and compact).
5. Prove that if A > 0 and it is column or row stochastic, then its
dominant eigenvalue is λ = 1.
Solution 3.11 (Nonnegative and positive matrices)
1. Let A ≥ 0, x ≥ 0, y = Ax, and denote with ai> the i-th row of A.
Then obviously
y_i = a_i^T x = Σ_{j=1}^n a_{ij} x_j ≥ 0,   i = 1, . . . , n,
which shows that a nonnegative matrix maps nonnegative vectors
into nonnegative vectors. Further, if x is a probability vector and A is column stochastic, then

1^T y = 1^T Ax = 1^T x = 1,
which shows that y is also a probability vector.
2. Suppose by contradiction that A > 0 and ρ( A) = 0. This would
imply that A has an eigenvalue of maximum modulus in λ = 0,
thus, all eigenvalues of A are actually zero. This means that the
characteristic polynomial of A is p A (s) = sn and, by the Cayley-
Hamilton theorem, it must hold that An = 0, which is impossible
since An is the n-fold product of positive matrices, hence it must
be positive.
3. By the triangle inequality, we have that, for i = 1, . . . , n,

|a_i^T x| ≤ Σ_{j=1}^n |a_{ij} x_j| = Σ_{j=1}^n |a_{ij}| |x_j| = |a_i|^T |x|,

which proves the first part. If A > 0 the above relation reads |Ax| ≤ A|x| which, for x = v_i, becomes

A|v_i| ≥ |Av_i| = |λ_i v_i| = |λ_i| |v_i|.
4. Let S be the probability simplex, and f(x) := Ax / (1^T Ax). From Brouwer’s fixed-point theorem there exists v ∈ S such that f(v) = v, that is such that

Av = (1^T Av) v = λv,   λ := 1^T Av.
Moreover, since A > 0, it holds that λ > 0 and v > 0; thus A has a
positive eigenvalue and a corresponding positive eigenvector. We
next apply the same result to A^T, obtaining that there exists w ∈ S such that

A^T w = (1^T A^T w) w = µw,   µ := 1^T A^T w,
where again µ > 0 and w > 0. Now, v> w > 0, and
λv> w = v> A> w = µv> w,
which implies that µ = λ, whence A> w = λw.
Next, consider any eigenvalue/eigenvector pair λi , vi for A, and
apply the result of point 3. in this exercise, to obtain that
|λi ||vi | ≤ A|vi |, i = 1, . . . , n.
Multiply both sides on the left by w> to get
w> |λi ||vi | ≤ w> A|vi | = λw> |vi |,
from which we obtain that
|λi | ≤ λ, i = 1, . . . , n,
which proves that λ (which is real and positive, as shown above)
is indeed a maximum modulus eigenvalue of A (thus, λ = ρ( A)),
and the corresponding eigenvector v is positive.
5. By definition, if A is column stochastic then 1> A = 1> , which
means that λ = 1 is an eigenvalue of A. Next, recall from Sec-
tion 3.6.3.1 that the spectral radius of A is no larger than its induced ℓ1 norm:

ρ(A) ≤ ||A||_1 = max_{j=1,...,n} Σ_{i=1}^n |a_{ij}| = 1,
hence λ = 1 is indeed the dominant eigenvalue. An analogous
argument applies to row stochastic matrices.
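These Perron-Frobenius facts are easy to observe numerically; the sketch below builds a random positive, column-stochastic matrix and inspects its spectrum.

```python
# Sketch: dominant eigenvalue 1 and positive eigenvector for a positive column-stochastic matrix.
import numpy as np

rng = np.random.default_rng(7)
A = rng.uniform(0.1, 1.0, size=(5, 5))
A /= A.sum(axis=0)                       # columns sum to one, all entries positive

eigvals, eigvecs = np.linalg.eig(A)
i = np.argmax(np.abs(eigvals))
lam = eigvals[i].real
v = eigvecs[:, i].real
v /= v.sum()                             # normalize to a probability vector

print(np.isclose(lam, 1.0))              # dominant eigenvalue is 1: True
print(np.all(v > 0))                     # corresponding eigenvector is positive: True
```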
4. Symmetric matrices
Exercise 4.1 (Eigenvectors of a symmetric 2 × 2 matrix) Let p, q ∈ Rn be two linearly independent vectors, with unit norm (||p||_2 = ||q||_2 = 1). Define the symmetric matrix A := pq^T + qp^T. In your derivations, it may be useful to use the notation c := p^T q.
1. Show that p + q and p − q are eigenvectors of A, and determine
the corresponding eigenvalues.
2. Determine the nullspace and rank of A.
3. Find an eigenvalue decomposition of A, in terms of p, q. Hint: use
the previous two parts.
4. What is the answer to the previous part if p, q are not normalized?
Solution 4.1
1. We have
Ap = cp + q,   Aq = p + cq,
from which we obtain
A( p − q) = (c − 1)( p − q), A( p + q) = (c + 1)( p + q).
Thus u± := p ± q is an (un-normalized) eigenvector of A, with
eigenvalue c ± 1.
2. The condition Ax = 0, for x ∈ Rn, holds if and only if

(q^T x) p + (p^T x) q = 0.

Since p, q are linearly independent, the above is equivalent to p^T x = q^T x = 0. The nullspace is the set of vectors orthogonal to p and q. The range is the span of p, q. The rank is thus 2.
3. Since the rank is 2, there is a total of two non-zero eigenvalues.
Note that, since p, q are normalized, c is the cosine of the angle between p and q; |c| < 1 since p, q are independent. We have found two lin-
early independent eigenvectors u± = p ± q that do not belong to
the nullspace (since |c| < 1). We can complete this set with eigen-
vectors corresponding to the eigenvalue zero; simply choose an
orthonormal basis for the nullspace.
Then, the eigenvalue decomposition is

A = (c − 1) v_− v_−^T + (c + 1) v_+ v_+^T,