Matrix Analysis, CAAM 335, Spring 2012: Steven J Cox
Preface
Bellman has called matrix theory the arithmetic of higher mathematics. Under the influence of Bellman and Kalman, engineers and scientists have found in matrix theory a language for representing and analyzing multivariable systems. Our goal in these notes is to demonstrate the role of
matrices in the modeling of physical systems and the power of matrix theory in the analysis and
synthesis of such systems.
Beginning with modeling of structures in static equilibrium we focus on the linear nature of the relationship between relevant state variables and express these relationships as simple matrix-vector products. For example, the voltage drops across the resistors in a network are linear combinations of the potentials at each end of each resistor. Similarly, the current through each resistor is assumed to be a linear function of the voltage drop across it. And, finally, at equilibrium, a linear combination (in minus out) of the currents must vanish at every node in the network. In short, the vector of currents is a linear transformation of the vector of voltage drops which is itself a linear transformation of the vector of potentials. A linear transformation of n numbers into m numbers is accomplished by multiplying the vector of n numbers by an m-by-n matrix. Once we have learned to spot the ubiquitous matrix-vector product we move on to the analysis of the resulting linear systems of equations. We accomplish this by stretching your knowledge of three-dimensional space. That is, we ask what it means that the m-by-n matrix X transforms R^n (real n-dimensional space) into R^m. We shall visualize this transformation by splitting both R^n and R^m into two smaller spaces between which the given X behaves in very manageable ways. An understanding of this splitting of the ambient spaces into the so-called four fundamental subspaces of X permits one to answer virtually every question that may arise in the study of structures in static equilibrium.
In the second half of the notes we argue that matrix methods are equally effective in the modeling
and analysis of dynamical systems. Although our modeling methodology adapts easily to dynamical
problems we shall see, with respect to analysis, that rather than splitting the ambient spaces we
shall be better served by splitting X itself. The process is analogous to decomposing a complicated
signal into a sum of simple harmonics oscillating at the natural frequencies of the structure under
investigation. For we shall see that (most) matrices may be written as weighted sums of matrices of
very special type. The weights are the eigenvalues, or natural frequencies, of the matrix while the
component matrices are projections composed from simple products of eigenvectors. Our approach
to the eigendecomposition of matrices requires a brief exposure to the beautiful field of Complex
Variables. This foray has the added benefit of permitting us a more careful study of the Laplace
Transform, another fundamental tool in the study of dynamical systems.
Contents

1 Matrix Methods for Electrical Systems
5 Least Squares
6.5 Exercises
8 Complex Integration
11.1 Introduction
11.2 The SVD in Image Compression
11.3 Trace, Norm and Low Rank Approximation
11.4 Exercises
[Figure 1.1: the lumped circuit model, a chain of axial resistors Ri and membrane resistors Rm.]

With compartmental resistances

Ri = ρi (ℓ/N) / (π a^2) and Rm = ρm / (2π a (ℓ/N)),
we arrive at the lumped circuit model of Figure 1.1. For a neuron in culture we may assume a
constant extracellular potential, e.g., zero. We accomplish this by connecting and grounding the
extracellular nodes, see Figure 1.2.
[Figure 1.2: the circuit model with connected and grounded extracellular nodes.]
We have also (arbitrarily) assigned directions to the currents as a graphical aid in the consistent
application of the basic circuit laws.
[Figure 1.3: the grounded circuit model, with node potentials x1, ..., x4, branch currents y1, ..., y6, and current source i0.]
The A in (S1) is the node-edge adjacency matrix; it encodes the network's connectivity. The G in (S2) is the diagonal matrix of edge conductances; it encodes the physics of the network. The f in (S3) is the vector of current sources; it encodes the network's stimuli. The culminating A^T GA
in (S4) is the symmetric matrix whose inverse, when applied to f , reveals the vector of potentials,
x. In order to make these ideas our own we must work many, many examples.
1.2. Example 1
With respect to the circuit of Figure 1.3, in accordance with step (S1), we express the six potential differences (always tail minus head)

e1 = x1 - x2
e2 = x2
e3 = x2 - x3
e4 = x3
e5 = x3 - x4
e6 = x4
Such long, tedious lists cry out for matrix representation, to wit

e = Ax where A =
[ 1 -1  0  0 ]
[ 0  1  0  0 ]
[ 0  1 -1  0 ]
[ 0  0  1  0 ]
[ 0  0  1 -1 ]
[ 0  0  0  1 ]
Step (S2), Ohm's law, states that the current along an edge is equal to the potential drop across
the edge divided by the resistance of the edge. In our case,
yj = ej /Ri , j = 1, 3, 5 and yj = ej /Rm , j = 2, 4, 6
or, in matrix notation,
y = Ge
where

G =
[ 1/Ri   0     0     0     0     0   ]
[  0    1/Rm   0     0     0     0   ]
[  0     0    1/Ri   0     0     0   ]
[  0     0     0    1/Rm   0     0   ]
[  0     0     0     0    1/Ri   0   ]
[  0     0     0     0     0    1/Rm ]
Step (S3), Kirchhoff's Current Law, states that the sum of the currents into each node must be zero. In our case

i0 - y1 = 0
y1 - y2 - y3 = 0
y3 - y4 - y5 = 0
y5 - y6 = 0

or, in matrix terms (moving the source to the right side and multiplying the last three equations through by -1),

By = f

where

B =
[  1  0  0  0  0  0 ]
[ -1  1  1  0  0  0 ]
[  0  0 -1  1  1  0 ]
[  0  0  0  0 -1  1 ]

and f = [i0 0 0 0]^T.
Turning back the page we recognize in B the transpose of A. Calling it such, we recall our main steps

e = Ax, y = Ge, and A^T y = f.
On substitution of the first two into the third we arrive, in accordance with (S4), at

A^T GAx = f.   (1.1)
This is a linear system of four simultaneous equations for the four unknown potentials, x1 through x4. As you may know, the system (1.1) may have either 1, 0, or infinitely many solutions, depending on f and A^T GA. We shall devote chapters 3 and 4 to a careful analysis of the previous sentence. For now, we simply invoke the Matlab backslash command and arrive at the response depicted in Figure 1.4.
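The notes carry this out in Matlab (cab1.m); as a rough cross-check, the whole quartet for Figure 1.3 can be sketched in numpy as follows. The resistance and source values here are placeholders, not the physical values used in cab1.m.

```python
import numpy as np

# Placeholder parameter values (cab1.m uses physical Ri, Rm, i0).
Ri, Rm, i0 = 1.0, 2.0, 1e-3

# (S1): the node-edge adjacency matrix of Figure 1.3
A = np.array([[ 1, -1,  0,  0],
              [ 0,  1,  0,  0],
              [ 0,  1, -1,  0],
              [ 0,  0,  1,  0],
              [ 0,  0,  1, -1],
              [ 0,  0,  0,  1]], dtype=float)

# (S2): the diagonal matrix of edge conductances
G = np.diag([1/Ri, 1/Rm, 1/Ri, 1/Rm, 1/Ri, 1/Rm])

# (S3): the vector of current sources
f = np.array([i0, 0, 0, 0])

# (S4): assemble and solve A^T G A x = f
S = A.T @ G @ A
x = np.linalg.solve(S, f)

# the quartet closes: the currents y = Ge balance the sources
y = G @ (A @ x)
assert np.allclose(A.T @ y, f)
```

In Matlab this is exactly one backslash; the point of the sketch is only to see each of (S1)-(S4) assembled explicitly.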
[Figure 1.4: the computed potentials x (mV) versus distance z (cm).]
Figure 1.5 Circuit model with batteries associated with the rest potential.
The convention is that the potential difference across the battery is Em. As the bottom terminal of each battery is grounded it follows that the potential at the top of each battery is Em. Revisiting steps (S1)-(S4) of the Strang Quartet we note that in (S1) the even numbered voltage drops are now

e2 = x2 - Em, e4 = x3 - Em, and e6 = x4 - Em.
No changes are necessary for (S2) and (S3). The final step now reads,
(S4) Combine (S1), (S2) and (S3) to produce
A^T GAx = A^T Gb + f.   (1.2)
This is the general form for a resistor network driven by current sources and batteries.
Returning to Figure 1.5 we note that

b = Em [0 1 0 1 0 1]^T.
To build and solve (1.2) requires only minor changes to our old code. The new program is called
cab2.m and results of its use are indicated in Figure 1.6.
[Figure 1.6: the computed potentials x (mV) versus distance z (cm) in the presence of batteries.]
1.4. Exercises
1. In order to refresh your matrix-vector multiply skills please calculate, by hand, the product A^T GA in the three-compartment case and write out the four equations in (1.1). The second equation should read

(-x1 + 2x2 - x3)/Ri + x2/Rm = 0.   (1.3)
2. We began our discussion with the hope that a multicompartment model could indeed adequately capture the neuron's true potential and current profiles. In order to check this one should run cab1.m with increasing values of N until one can no longer detect changes in the computed potentials.
(a) Please run cab1.m with N = 8, 16, 32 and 64. Plot all of the potentials on the same (use
hold) graph, using different line types for each. (You may wish to alter cab1.m so that it
accepts N as an argument).
Let us now interpret this convergence. The main observation is that the difference equation, (1.3), approaches a differential equation. We can see this by noting that

dz ≡ ℓ/N

acts as a spatial step size and that xk, the potential at the kth node, is approximately the value of the true potential at (k - 1)dz. In a slight abuse of notation, we denote the latter

x((k - 1)dz).
Applying these conventions to (1.3) and recalling the definitions of Ri and Rm we see (1.3) become

(π a^2/ρi) (-x(0) + 2x(dz) - x(2dz))/dz + (2π a dz/ρm) x(dz) = 0,

or, after multiplying through by ρm/(π a dz),

(a ρm/ρi) (-x(0) + 2x(dz) - x(2dz))/dz^2 + 2x(dz) = 0.

We note that a similar equation holds at each node (save the ends) and that as N → ∞ and therefore dz → 0 we arrive at

d^2 x(z)/dz^2 - (2ρi/(a ρm)) x(z) = 0.   (1.4)
(b) With λ^2 ≡ 2ρi/(a ρm) show that
At the far end, we interpret the condition that no axial current may leave the last node to mean

dx(ℓ)/dz = 0.   (1.7)

(c) Substitute (1.5) into (1.6) and (1.7) and solve for α and β and write out the final x(z).
(d) Substitute into x the ℓ, a, ρi and ρm values used in cab1.m, plot the resulting function (using, e.g., ezplot) and compare this to the plot achieved in part (a).
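The convergence experiment of parts (a) and (b) can be sketched in numpy rather than the notes' Matlab; all physical parameters below are made-up stand-ins for those in cab1.m, and the mesh sizes are the ones suggested in part (a).

```python
import numpy as np

def cable_S(N, Ri, Rm):
    # 2N edges (N axial, N membrane) over N + 1 nodes, as in Figure 1.3
    A = np.zeros((2*N, N + 1))
    g = np.zeros(2*N)
    for k in range(N):
        A[2*k, k], A[2*k, k + 1] = 1.0, -1.0   # axial drop x_k - x_{k+1}
        g[2*k] = 1.0/Ri
        A[2*k + 1, k + 1] = 1.0                # membrane drop at node k+1
        g[2*k + 1] = 1.0/Rm
    return A.T @ np.diag(g) @ A

def tip_potential(N, ell=1e-2, a=1e-4, rho_i=1.0, rho_m=1.0, i0=1e-6):
    # hypothetical cable parameters, scaled per compartment as in the text
    dz = ell/N
    Ri = rho_i*dz/(np.pi*a**2)       # compartmental axial resistance
    Rm = rho_m/(2*np.pi*a*dz)        # compartmental membrane resistance
    f = np.zeros(N + 1)
    f[0] = i0                        # current injected at the near end
    return np.linalg.solve(cable_S(N, Ri, Rm), f)[0]

# doubling N changes the answer less and less: the potentials converge
d1 = abs(tip_potential(8) - tip_potential(16))
d2 = abs(tip_potential(16) - tip_potential(32))
assert d2 < d1
```

The shrinking differences are the numerical face of the statement that (1.3) approaches the differential equation (1.4).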
[Figure 2.1: a chain of three masses m1, m2, m3 joined by four springs k1, ..., k4 and driven by forces f1, f2, f3.]
e2 = x2 - x1, e3 = x3 - x2, and e4 = -x3,

e = Ax where A =
[  1  0  0 ]
[ -1  1  0 ]
[  0 -1  1 ]
[  0  0 -1 ].
We note that ej is positive when the spring is stretched and negative when compressed. The analog of Ohm's Law is here Hooke's Law: the restoring force in a spring is proportional to its elongation. We call this constant of proportionality the stiffness, kj, of the spring, and denote the restoring force by yj. Hooke's Law then reads, yj = kj ej, or, in matrix terms
y = Ke where K =
[ k1  0   0   0  ]
[ 0   k2  0   0  ]
[ 0   0   k3  0  ]
[ 0   0   0   k4 ].
The analog of Kirchhoff's Current Law is here typically called force balance. More precisely, equilibrium is synonymous with the fact that the net force acting on each mass must vanish. In symbols,

y1 - y2 - f1 = 0, y2 - y3 - f2 = 0, and y3 - y4 - f3 = 0,
or, in matrix terms, By = f where

f = [f1 f2 f3]^T and B =
[ 1 -1  0  0 ]
[ 0  1 -1  0 ]
[ 0  0  1 -1 ].
As in the previous section we recognize in B the transpose of A. Gathering our three important
steps
e = Ax
y = Ke
AT y = f
we arrive, via direct substitution, at an equation for x. Namely,

A^T y = f  ⇒  A^T Ke = f  ⇒  A^T KAx = f.
Assembling A^T KA we arrive at the final system

[ k1+k2  -k2    0     ] [ x1 ]   [ f1 ]
[ -k2    k2+k3  -k3   ] [ x2 ] = [ f2 ]   (2.2)
[  0     -k3    k3+k4 ] [ x3 ]   [ f3 ].
Although Matlab solves such systems with ease our aim here is to develop a deeper understanding
of Gaussian Elimination and so we proceed by hand. This aim is motivated by a number of
important considerations. First, not all linear systems have unique solutions. A careful look at
Gaussian Elimination will provide the general framework for not only classifying those systems
that possess unique solutions but also for providing detailed diagnoses of those systems that lack
solutions or possess too many.
In Gaussian Elimination one first uses linear combinations of preceding rows to eliminate nonzeros
below the main diagonal and then solves the resulting triangular system via backsubstitution. To
firm up our understanding let us take up the case where each kj = 1 and so (2.2) takes the form
[  2 -1  0 ] [ x1 ]   [ f1 ]
[ -1  2 -1 ] [ x2 ] = [ f2 ]   (2.3)
[  0 -1  2 ] [ x3 ]   [ f3 ].

We eliminate the (2, 1) (row 2, column 1) element by implementing

new row 2 = old row 2 + (1/2) row 1,   (2.4)

bringing

[ 2 -1   0 ] [ x1 ]   [ f1 ]
[ 0 3/2 -1 ] [ x2 ] = [ f2 + f1/2 ]   (2.5)
[ 0 -1   2 ] [ x3 ]   [ f3 ].

We likewise eliminate the (3, 2) element via

new row 3 = old row 3 + (2/3) row 2,   (2.6)

and arrive at the triangular system

[ 2 -1   0  ] [ x1 ]   [ f1 ]
[ 0 3/2 -1  ] [ x2 ] = [ f2 + f1/2 ]   (2.7)
[ 0  0  4/3 ] [ x3 ]   [ f3 + 2f2/3 + f1/3 ].
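The arithmetic of (2.3)-(2.7) is easy to confirm mechanically; a small numpy sketch with an arbitrary load f:

```python
import numpy as np

S = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 2.]])
f = np.array([1., 2., 3.])          # any load will do

M = np.hstack([S, f[:, None]])      # augmented system [S | f]
M[1] += M[0]/2                      # (2.4): new row 2 = old row 2 + (1/2) row 1
M[2] += M[1]*(2/3)                  # (2.6): new row 3 = old row 3 + (2/3) row 2

U = M[:, :3]                        # the triangular matrix of (2.7)
assert np.allclose(U, [[2, -1, 0], [0, 1.5, -1], [0, 0, 4/3]])

# back substitution on the triangular system
x = np.zeros(3)
for i in (2, 1, 0):
    x[i] = (M[i, 3] - U[i, i+1:] @ x[i+1:]) / U[i, i]
assert np.allclose(S @ x, f)
```

The two in-place row updates are exactly the elementary operations (2.4) and (2.6).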
The same row operations serve to compute S^{-1}. Augmenting S with the identity and eliminating below the diagonal as before brings

[  2 -1  0 | 1  0  0 ]
[ -1  2 -1 | 0  1  0 ]
[  0 -1  2 | 0  0  1 ]

to

[ 2 -1   0  | 1    0   0 ]
[ 0 3/2 -1  | 1/2  1   0 ]
[ 0  0  4/3 | 1/3 2/3  1 ].

Now, rather than simple back substitution we instead eliminate up. Eliminating first the (2, 3) element we find

[ 2 -1   0  | 1    0    0  ]
[ 0 3/2  0  | 3/4 3/2  3/4 ]
[ 0  0  4/3 | 1/3 2/3   1  ].

Now eliminating the (1, 2) element we achieve

[ 2  0   0  | 3/2  1   1/2 ]
[ 0 3/2  0  | 3/4 3/2  3/4 ]
[ 0  0  4/3 | 1/3 2/3   1  ].

In the final step we scale each row in order that the matrix on the left takes on the form of the identity. This requires that we multiply row 1 by 1/2, row 2 by 2/3 and row 3 by 3/4, with the result

[ 1 0 0 | 3/4 1/2 1/4 ]
[ 0 1 0 | 1/2  1  1/2 ]
[ 0 0 1 | 1/4 1/2 3/4 ].

The matrix on the right is then S^{-1}.
One should check that S^{-1} f indeed coincides with the x computed above.
Not all matrices possess inverses. Those that do are called invertible or nonsingular. For example

[ 1 2 ]
[ 2 4 ]

is singular.
Some matrices can be inverted by inspection. An important class of such matrices is in fact
latent in the process of Gaussian Elimination itself. To begin, we build the elimination matrix that
enacts the elementary row operation spelled out in (2.4),
E1 =
[  1   0 0 ]
[ 1/2  1 0 ]
[  0   0 1 ].

Do you see that this matrix (when applied from the left to S) leaves rows 1 and 3 unsullied but adds half of row 1 to row 2? This ought to be undone by simply subtracting half of row 1 from row 2, i.e., by application of

E1^{-1} =
[  1    0 0 ]
[ -1/2  1 0 ]
[  0    0 1 ].
Similarly, the matrix that enacts (2.6), and its inverse, are

E2 =
[ 1  0   0 ]
[ 0  1   0 ]
[ 0 2/3  1 ]

and E2^{-1} =
[ 1   0   0 ]
[ 0   1   0 ]
[ 0 -2/3  1 ].

Again, please confirm that E2 E2^{-1} = I. Now we may express the reduction of S to U (recall (2.6))
as
E2 E1 S = U
and the subsequent reconstitution by

S = LU, where L = E1^{-1} E2^{-1} =
[  1    0   0 ]
[ -1/2  1   0 ]
[  0  -2/3  1 ]

and U =
[ 2 -1   0  ]
[ 0 3/2 -1  ]
[ 0  0  4/3 ].
One speaks of this representation as the LU decomposition of S. We have just observed that the inverse of a product is the product of the inverses in reverse order. Do you agree that S^{-1} = U^{-1} L^{-1}? And what do you think of the statement S^{-1} = A^{-1} K^{-1} (A^T)^{-1}?
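The reverse-order rule is easy to test numerically. A small sketch, with randomly generated (and deliberately well-conditioned, hence hypothetical) triangular factors:

```python
import numpy as np

rng = np.random.default_rng(0)
# strictly triangular noise plus 3I keeps both factors comfortably invertible
L = np.tril(rng.standard_normal((3, 3)), -1) + 3*np.eye(3)
U = np.triu(rng.standard_normal((3, 3)),  1) + 3*np.eye(3)
S = L @ U

# (LU)^{-1} = U^{-1} L^{-1}: the inverses appear in reverse order
assert np.allclose(np.linalg.inv(S), np.linalg.inv(U) @ np.linalg.inv(L))
```

The second question in the text is left to the reader; note that it hinges on whether A itself is square and invertible.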
LU decomposition is the preferred method of solution for the large linear systems that occur in practice. The decomposition is implemented in Matlab as

[L U] = lu(S);

and in fact lies at the heart of Matlab's backslash command. To diagram its use, we write Sx = f
as LU x = f and recognize that the latter is nothing more than a pair of triangular problems:
Lc = f
and U x = c,
that may be solved by forward and backward substitution respectively. This representation achieves
its greatest advantage when one is asked to solve Sx = f over a large class of f vectors. For example,
if we wish to steadily increase the force, f2 , on mass 2, and track the resulting displacement we
would be well served by
[L,U] = lu(S);
f = [1 1 1]';
for j=1:100,
  f(2) = f(2) + j/100;
  x = U \ (L \ f);
  plot(x,'o')
end
You are correct in pointing out that we could have also just precomputed the inverse of S and then
sequentially applied it in our for loop. The use of the inverse is, in general, considerably more costly
in terms of both memory and operation counts. The exercises will give you a chance to see this for
yourself.
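The same factor-once, solve-many pattern can be sketched outside Matlab; here is a numpy/scipy rendering of the loop above, using scipy's lu_factor in place of Matlab's lu:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

S = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 2.]])
lu, piv = lu_factor(S)            # factor once ...
f = np.array([1., 1., 1.])
xs = []
for j in range(100):              # ... and reuse for every load
    f[1] += (j + 1)/100           # steadily increase the force on mass 2
    xs.append(lu_solve((lu, piv), f))

assert np.allclose(S @ xs[-1], f)
```

Each pass through the loop costs only the two triangular solves, never a fresh factorization, which is precisely why this beats precomputing S^{-1}.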
2.2. A Small Planar Network
We move from uni-axial to biaxial elastic nets by first considering the swing in Figure 2.2.
[Figure 2.2: a simple swing: two masses suspended by three fibers, with nodal displacements x1, ..., x4.]

The elongation of the first fiber is

e1 = √(x1^2 + (x2 + L1)^2) - L1.   (2.8)
The price one pays for moving to higher dimensions is that lengths are now expressed in terms of
square roots. The upshot is that the elongations are not linear combinations of the end displacements
as they were in the uni-axial case. If we presume however that the loads and stiffnesses are matched
in the sense that the displacements are small compared with the original lengths then we may
effectively ignore the nonlinear contribution
in (2.8). In order to make this precise we need only recall that

√(1 + t) = 1 + t/2 + O(t^2),

where the latter term signifies the remainder. With regard to e1 this allows

e1 = √(x1^2 + x2^2 + 2 x2 L1 + L1^2) - L1
   = L1 √(1 + (x1^2 + x2^2)/L1^2 + 2 x2/L1) - L1
   = x2 + (x1^2 + x2^2)/(2 L1) + L1 O(t^2).   (2.9)

If the displacements are small compared with L1 then, as the O term is even smaller, we may neglect all but the first term in the above and so arrive at

e1 = x2.
To take a concrete example, if L1 is one meter and x1 and x2 are each one centimeter then x2 is one hundred times (x1^2 + x2^2)/(2L1).
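Both the O(t^2) remainder and the concrete centimeter example can be checked in a few lines:

```python
import numpy as np

# the remainder in sqrt(1+t) = 1 + t/2 + O(t^2) is roughly t^2/8
for t in (1e-1, 1e-2, 1e-3):
    err = abs(np.sqrt(1 + t) - (1 + t/2))
    assert err < t**2

# the swing: L1 one meter, x1 and x2 each one centimeter
L1, x1, x2 = 1.0, 1e-2, 1e-2
e1_exact = np.sqrt(x1**2 + (x2 + L1)**2) - L1
# e1 agrees with x2 to within the neglected term (x1^2 + x2^2)/(2 L1)
assert abs(e1_exact - x2) < (x1**2 + x2**2)/(2*L1)
```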
With regard to the second spring, arguing as above, its elongation is (approximately) its stretch
along its initial direction. As its initial direction is horizontal, its elongation is just the difference
of the respective horizontal end displacements, namely,
e2 = x3 - x1.
Finally, the elongation of the third spring is (approximately) the difference of its respective vertical
end displacements, i.e.,
e3 = x4 .
We encode these three elongations in

e = Ax where A =
[  0 1 0 0 ]
[ -1 0 1 0 ]
[  0 0 0 1 ].
Hooke's law is an elemental piece of physics and is not perturbed by our leap from uni-axial to biaxial structures. The upshot is that the restoring force in each spring is still proportional to its elongation, i.e., yj = kj ej where kj is the stiffness of the jth spring. In matrix terms,
y = Ke where K =
[ k1 0  0  ]
[ 0  k2 0  ]
[ 0  0  k3 ].
Balancing horizontal and vertical forces at m1 brings

-y2 - f1 = 0 and y1 - f2 = 0,

while balancing horizontal and vertical forces at m2 brings

y2 - f3 = 0 and y3 - f4 = 0.

We assemble these into

By = f

where B =
[ 0 -1 0 ]
[ 1  0 0 ]
[ 0  1 0 ]
[ 0  0 1 ],
and the stiffness matrix becomes

S = A^T KA =
[  k2  0  -k2  0  ]
[  0   k1  0   0  ]
[ -k2  0   k2  0  ]
[  0   0   0   k3 ].
One step of Gaussian elimination (adding row 1 to row 3) brings

[ k2  0  -k2  0  ] [ x1 ]   [ f1      ]
[ 0   k1  0   0  ] [ x2 ] = [ f2      ]
[ 0   0   0   0  ] [ x3 ]   [ f1 + f3 ]
[ 0   0   0   k3 ] [ x4 ]   [ f4      ],

and back substitution delivers

x4 = f4/k3,
0 = f1 + f3,
x2 = f2/k1,
x1 - x3 = f1/k2.
Both

x = [ f1/k2  f2/k1  0  f4/k3 ]^T  and  x = [ 0  f2/k1  -f1/k2  f4/k3 ]^T

satisfy Sx = f. In fact, one may add to either an arbitrary multiple of

z = [ 1 0 1 0 ]^T   (2.10)
and still have a solution of Sx = f . Searching for the source of this lack of uniqueness we observe
some redundancies in the columns of S. In particular, the third is simply the opposite of the first.
As S is simply AT KA we recognize that the original fault lies with A, where again, the first and
third columns are opposites. These redundancies are encoded in z in the sense that
Az = 0.
Interpreting this in mechanical terms, we view z as a displacement and Az as the resulting elongation. In Az = 0 we see a nonzero displacement producing zero elongation. One says in this case
that the truss deforms without doing any work and speaks of z as an unstable mode. Again, this
mode could have been observed by a simple glance at Figure 2.2. Such is not the case for more
complex structures and so the engineer seeks a systematic means by which all unstable modes may
be identified. We shall see in Chapter 3 that these modes are captured by the null space of A.
From Sz = 0 one easily deduces that S is singular. More precisely, if S 1 were to exist then S 1 Sz
would equal S 1 0, i.e., z = 0, contrary to (2.10). As a result, Matlab will fail to solve Sx = f
even when f is a force that the truss can equilibrate. One way out is to use the pseudoinverse, as
we shall see below.
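A sketch of the pseudo-inverse escape route for the singular S of the swing (with k1 = k2 = k3 = 1 and a made-up, equilibrable load):

```python
import numpy as np

# the swing's S with unit stiffnesses, and the unstable mode z of (2.10)
S = np.array([[ 1., 0., -1., 0.],
              [ 0., 1.,  0., 0.],
              [-1., 0.,  1., 0.],
              [ 0., 0.,  0., 1.]])
z = np.array([1., 0., 1., 0.])        # Sz = 0
f = np.array([1., 2., -1., 3.])       # equilibrable: f[0] + f[2] = 0

x = np.linalg.pinv(S) @ f
assert np.allclose(S @ x, f)          # a genuine solution,
assert np.allclose(S @ (x + 7*z), f)  # unique only up to multiples of z;
assert abs(x @ z) < 1e-10             # pinv picks the one orthogonal to z
```

That the pseudo-inverse selects the solution orthogonal to the null space is the property we shall exploit below.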
[Figure 2.3: a crude model of a planar tissue specimen: an elastic net of numbered fibers and nodes.]

Arguing as above, each fiber's elongation is (approximately) its stretch along its initial direction, which gives

ej = (x_{2n-1} - x_{2m-1}) cos θj + (x_{2n} - x_{2m}) sin θj   (2.11)
for fiber j, as depicted in the figure below, connecting node m to node n and making the angle θj with the positive horizontal axis when node m is assumed to lie at the point (0, 0). The reader should check that our expressions for e1 and e3 indeed conform to this general formula and that e2 and e4 agree with one's intuition. For example, visual inspection of the specimen suggests that fiber 2 cannot be supposed to stretch (i.e., have positive e2) unless x9 > x1 and/or x2 > x10. Does this jibe with (2.11)?
[Figure 2.4: fiber j, connecting node m (coordinates 2m-1, 2m) to node n (coordinates 2n-1, 2n), in its original and deformed positions.]
Figure 2.5. The solid(dashed) circles correspond to the nodal positions before(after) the application
of the traction force, f .
For now let us note that every matrix possesses such a pseudo-inverse and that it may be
computed in Matlab via the pinv command. On supposing the fiber stiffnesses to each be one
and the edge traction to be of the form
f = [1 1 0 1 1 1 1 0 0 0 1 0 1 1 0 1 1 1]T ,
we arrive at x via x=pinv(S)*f and refer to Figure 2.5 for its graphical representation.
2.4. Exercises
1. With regard to Figure 2.1, (i) Derive the A and K matrices resulting from the removal of the fourth spring (but not the third mass) and assemble S = A^T KA.
(ii) Compute S^{-1}, by hand via Gauss-Jordan, and compute L and U where S = LU by hand via the composition of elimination matrices and their inverses. Assume throughout that k1 = k2 = k3 = k.
(iii) Use the result of (ii) with the load f = [0 0 F]^T to solve Sx = f by hand two ways, i.e., x = S^{-1} f and Lc = f and Ux = c.
2. With regard to Figure 2.2
(i) Derive the A and K matrices resulting from the addition of a fourth (diagonal) fiber that runs from the top of fiber one to the second mass and assemble S = A^T KA.
(ii) Compute S^{-1}, by hand via Gauss-Jordan, and compute L and U where S = LU by hand via the composition of elimination matrices and their inverses. Assume throughout that k1 = k2 = k3 = k4 = k.
(iii) Use the result of (ii) with the load f = [0 0 F 0]^T to solve Sx = f by hand two ways, i.e., x = S^{-1} f and Lc = f and Ux = c.
3. Generalize figure 2.3 to the case of 16 nodes connected by 42 fibers. Introduce one stiff (say k = 100) fiber and show how to detect it by properly choosing f. Submit your well-documented M-file as well as the plots, similar to Figure 2.5, from which you conclude the presence of a stiff fiber.
4. We generalize Figure 2.3 to permit ever finer meshes. In particular, with reference to the figure below we assume N(N-1) nodes where the horizontal and vertical fibers each have length 1/N while the diagonal fibers have length √2/N. The top row of fibers is anchored to the ceiling.
[Figure: the refined mesh, with nodes numbered 1 through N(N-1) and fibers numbered 1 through (N-1)(4N-3).]
(i) Write and test a Matlab function S=bignet(N) that accepts the odd number N and produces the stiffness matrix S = A^T KA. As a check on your work we offer a spy plot of A when N = 5. Your K matrix should reflect the fiber lengths as spelled out in (2.1). You may assume Yj aj = 1 for each fiber. The sparsity of A also produces a sparse S. In order to exploit this, please use S=sparse(S) as the final line in bignet.m.
[spy plot of A for N = 5; nz = 179.]
(ii) Write and test a driver called bigrun that generates S for N = 5:4:29 and for each N solves Sx = f two ways for 100 choices of f. In particular, f is a steady downward pull on the bottom set of nodes, with a continual increase on the pull at the center node. This can be done via

f = zeros(size(S,1),1); f(2:2:2*N) = 1e-3/N;
for j=1:100,
  f(N+1) = f(N+1) + 1e-4/N;
This construction should be repeated twice, with the code that closes §2.1 as your guide. In the first scenario, precompute S^{-1} via inv and then apply x = S^{-1} f in the j loop. In the second scenario precompute L and U and then apply x = U\(L\f) in the j loop. In both cases use tic and toc to time each for loop and so produce a graph of the form
[Sample graph: elapsed time versus degrees of freedom (200 to 1800) for the inv and lu strategies.]
Submit your well-documented code, a spy plot of S when N = 9, and a time comparison like that shown above (results will vary with memory and cpu).
Sx = [s1 s2 ... sn] [x1 x2 ... xn]^T = x1 s1 + x2 s2 + ... + xn sn.   (3.1)
The picture I wish to place in your mind's eye is that Sx lies in the plane spanned by the columns of S. This plane occurs so frequently that we find it useful to distinguish it with a

Definition 3.1. The column space of the m-by-n matrix S is simply the span of its columns, i.e.,

R(S) ≡ {Sx : x ∈ R^n}.

This is a subset of R^m. The letter R stands for range.
For example, let us recall the S matrix associated with Figure 2.2. Its column space is

R(S) = { x1 [k2 0 -k2 0]^T + x2 [0 k1 0 0]^T + x3 [-k2 0 k2 0]^T + x4 [0 0 0 k3]^T : x ∈ R^4 }.

As the third column is merely the opposite of the first, it contributes nothing new to the span, and so

R(S) = { x1 [k2 0 -k2 0]^T + x2 [0 k1 0 0]^T + x3 [0 0 0 k3]^T : x ∈ R^3 }.
As the remaining three columns are linearly independent we may go no further. We recognize
then R(S) as a three dimensional subspace of R4 . In order to use these ideas with any confidence
we must establish careful definitions of subspace, independence, and dimension.
A subspace is a natural generalization of line and plane. Namely, it is any set that is closed
under vector addition and scalar multiplication. More precisely,
Definition 3.2. A subset M of R^d is a subspace of R^d when
(1) p + q ∈ M whenever p ∈ M and q ∈ M, and
(2) tp ∈ M whenever p ∈ M and t ∈ R.
Let us confirm now that R(S) is indeed a subspace. Regarding (1), if p ∈ R(S) and q ∈ R(S) then p = Sx and q = Sy for some x and y. Hence, p + q = Sx + Sy = S(x + y), i.e., (p + q) ∈ R(S). With respect to (2), tp = tSx = S(tx) so tp ∈ R(S).
This establishes that every column space is a subspace. The converse is also true. Every subspace
is the column space of some matrix. To make sense of this we should more carefully explain what
we mean by span.
We shall be interested in how a subspace is situated in its ambient space. We shall have occasion to speak of complementary subspaces and even the sum of two subspaces. Let's take care of the latter right now.

Definition 3.4. If M and Q are subspaces of the same ambient space, R^d, we define their direct sum

M ⊕ Q ≡ {p + q : p ∈ M and q ∈ Q}
N(S) = { t [1 0 1 0]^T : t ∈ R },

a line in R^4.
The null space answers the question of uniqueness of solutions to Sx = f. For, if Sx = f and Sy = f then S(x - y) = Sx - Sy = f - f = 0 and so (x - y) ∈ N(S). Hence, a solution to Sx = f will be unique if, and only if, N(S) = {0}.
Recalling (3.1) we note that if x ∈ N(S) and x ≠ 0, say, e.g., x1 ≠ 0, then Sx = 0 takes the form

s1 = - (1/x1) Σ_{j=2}^{n} xj sj.

That is, the first column of S may be expressed as a linear combination of the remaining columns of S. Hence, one may determine the (in)dependence of a set of vectors by examining the null space of the matrix whose columns are the vectors in question.
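That test (stack the vectors as the columns of S and ask whether N(S) = {0}) is immediate to mechanize; a numpy sketch:

```python
import numpy as np

def are_independent(vectors, tol=1e-10):
    # stack the vectors as columns of S; N(S) = {0} iff S has full column rank
    S = np.column_stack(vectors)
    return np.linalg.matrix_rank(S, tol) == S.shape[1]

assert are_independent([[1., 0., 0.], [0., 1., 0.]])
assert not are_independent([[1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])
```

The rank computation stands in for the null-space computation: the columns are independent exactly when no nonzero combination of them vanishes.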
Definition 3.6. The vectors {s1, s2, . . . , sn} are said to be linearly independent if N(S) = {0} where S = [s1 s2 ... sn].
As lines and planes are described as the set of linear combinations of one or two generators, so
too subspaces are most conveniently described as the span of a few basis vectors.
Definition 3.7. A collection of vectors {s1, s2, . . . , sn} in a subspace M is a basis for M when the matrix S = [s1 s2 ... sn] satisfies

R(S) = M and N(S) = {0}.

The first stipulates that the columns of S span M while the second requires the columns of S to be linearly independent.
3.3. A Blend of Theory and Example
Let us compute bases for the null and column spaces of the adjacency matrix associated with
the ladder below
[Figure 3.1: the ladder, with numbered bars and nodes.]

[A, the 8-by-8 node-edge adjacency matrix of the ladder, one row per bar.]
To determine a basis for R(A) we must find a way to discard its dependent columns. A moment's reflection reveals that columns 2 and 6 are colinear, as are columns 4 and 8. We seek, of course, a more systematic means of uncovering these, and perhaps other less obvious, dependencies. Such dependencies are more easily discerned from the row reduced form
Ared = rref(A) =
[ 1 0 0 0 0  0 0  0 ]
[ 0 1 0 0 0 -1 0  0 ]
[ 0 0 1 0 0  0 0  0 ]
[ 0 0 0 1 0  0 0 -1 ]
[ 0 0 0 0 1  0 0  0 ]
[ 0 0 0 0 0  0 1  0 ]
[ 0 0 0 0 0  0 0  0 ]
[ 0 0 0 0 0  0 0  0 ]
Recall that rref performs the elementary row operations necessary to eliminate all nonzeros below the diagonal. For those who can't stand to miss any of the action I recommend rrefmovie.
Each nonzero row of Ared is called a pivot row. The first nonzero in each row of Ared is called
a pivot. Each column that contains a pivot is called a pivot column. On account of the staircase
nature of Ared we find that there are as many pivot columns as there are pivot rows. In our example
there are six of each and, again on account of the staircase nature, the pivot columns are the linearly independent columns of Ared. One now asks how this might help us distinguish the independent
columns of A. For, although the rows of Ared are linear combinations of the rows of A no such thing
is true with respect to the columns. The answer is: pay attention only to the indices of the pivot
columns. In our example, columns {1, 2, 3, 4, 5, 7} are the pivot columns. In general
Proposition 3.1. Suppose A is m-by-n. If columns {cj : j = 1, . . . , r} are the pivot columns of
Ared then columns {cj : j = 1, . . . , r} of A constitute a basis for R(A).
Proof: Note that the pivot columns of Ared are, by construction, linearly independent. Suppose,
however, that columns {cj : j = 1, . . . , r} of A are linearly dependent. In this case there exists a
nonzero x ∈ R^n for which Ax = 0 and

xk = 0,  k ∉ {cj : j = 1, . . . , r}.   (3.2)
Now Ax = 0 necessarily implies that Ared x = 0, contrary to the fact that columns {cj : j = 1, . . . , r} are the pivot columns of Ared. (The implication Ax = 0 ⇒ Ared x = 0 follows from the fact that we may read row reduction as a sequence of linear transformations of A. If we denote the product of these transformations by T then T A = Ared and you see why Ax = 0 ⇒ Ared x = 0. The reverse implication follows from the fact that each of our row operations is reversible, or, in the language of the land, invertible.)
We now show that the span of columns {cj : j = 1, . . . , r} of A indeed coincides with R(A).
This is obvious if r = n, i.e., if all of the columns are linearly independent. If r < n there exists a q ∉ {cj : j = 1, . . . , r}. Looking back at Ared we note that its qth column is a linear combination
of the pivot columns with indices not exceeding q. Hence, there exists an x satisfying (3.2) and
Ared x = 0 and xq = 1. This x then necessarily satisfies Ax = 0. This states that the qth column of
A is a linear combination of columns {cj : j = 1, . . . , r} of A. End of Proof.
Let us now exhibit a basis for N(A). We exploit the already mentioned fact that N(A) = N(Ared). Regarding the latter, we partition the elements of x into so-called pivot variables,

{xcj : j = 1, . . . , r}.

There are evidently n - r free variables. For convenience, let us denote these in the future by

{xcj : j = r + 1, . . . , n}.
One solves Ared x = 0 by expressing each of the pivot variables in terms of the nonpivot, or free,
variables. In the example above, x1 , x2 , x3 , x4 , x5 and x7 are pivot while x6 and x8 are free. Solving
for the pivot in terms of the free we find
x7 = 0, x5 = 0, x4 = x8 , x3 = 0, x2 = x6 , x1 = 0,
or, written as a vector,

x = x6 [0 1 0 0 0 1 0 0]^T + x8 [0 0 0 1 0 0 0 1]^T,   (3.3)
where x6 and x8 are free. As x6 and x8 range over all real numbers the x above traces out a plane in R^8. This plane is precisely the null space of A and (3.3) describes a generic element as the linear combination of two basis vectors. Compare this to what Matlab returns when faced with null(A,'r'). Abstracting these calculations we arrive at
Proposition 3.2. Suppose that A is m-by-n with pivot indices {cj : j = 1, . . . , r} and free indices {cj : j = r + 1, . . . , n}. A basis for N(A) may be constructed of n - r vectors {z1, z2, . . . , z_{n-r}} where zk, and only zk, possesses a nonzero in its c_{r+k} component.
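Proposition 3.2's recipe (row reduce, then solve the pivot variables in terms of each free variable in turn) can be sketched as a small numpy routine. The partial pivoting below is an implementation convenience, not part of the proposition.

```python
import numpy as np

def null_basis(A, tol=1e-12):
    """Basis for N(A) via row reduction, in the spirit of Proposition 3.2."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    pivots, row = [], 0
    for col in range(n):
        if row == m:
            break
        p = row + np.argmax(np.abs(R[row:, col]))
        if abs(R[p, col]) < tol:
            continue                      # no pivot in this column: it is free
        R[[row, p]] = R[[p, row]]         # swap the pivot row into place
        R[row] /= R[row, col]             # normalize the pivot to 1
        for r in range(m):
            if r != row:
                R[r] -= R[r, col] * R[row]  # eliminate above and below
        pivots.append(col)
        row += 1
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for fcol in free:
        z = np.zeros(n)
        z[fcol] = 1.0                     # one free variable set to 1 at a time
        for r, pcol in enumerate(pivots):
            z[pcol] = -R[r, fcol]         # pivot variables in terms of the free
        basis.append(z)
    return basis

A = np.array([[1., 1., 0.], [1., 0., 1.]])
(z,) = null_basis(A)
assert np.allclose(A @ z, 0)
```

Each returned vector has a 1 in exactly one free component, just as the proposition prescribes.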
With respect to our ladder the free indices are c7 = 6 and c8 = 8. You still may be wondering what R(A) and N(A) tell us about the ladder that we did not already know. Regarding R(A) the answer will come in the next chapter. The null space calculation however has revealed two independent motions against which the ladder does no work! Do you see that the two vectors in (3.3) encode rigid vertical motions of bars 4 and 5 respectively? As each of these lies in the null space of A the associated elongation is zero. Can you square this with the ladder as pictured in figure 3.1? I hope not, for vertical motion of bar 4 must stretch bars 1, 2, 6 and 7. How does one resolve this (apparent) contradiction?
We close with a few more examples. We compute bases for the column and null spaces of
A =
[ 1 1 0 ]
[ 1 0 1 ].
Subtracting the first row from the second lands us at

Ared =
[ 1  1 0 ]
[ 0 -1 1 ],
hence both rows are pivot rows and columns 1 and 2 are pivot columns. Proposition 3.1 then informs us that the first two columns of A, namely

[ 1 ]     [ 1 ]
[ 1 ] and [ 0 ],   (3.4)

comprise a basis for R(A). In this case, R(A) = R^2.
Regarding N(A) we express each row of Ared x = 0 as the respective pivot variable in terms of the free. More precisely, x1 and x2 are pivot variables and x3 is free and Ared x = 0 reads

x1 + x2 = 0
-x2 + x3 = 0.

Working from the bottom up we find

x2 = x3 and x1 = -x3.

In other words,

N(A) = { x3 [-1 1 1]^T : x3 ∈ R }.
Next we consider the column and null spaces of

B =
[ 1 1 0 2 ]
[ 1 0 1 3 ].
The column space of A was already the whole space and so adding a column changes, with respect
to R(A), nothing. That is, R(B) = R(A) and (3.4) is a basis for R(B).
Regarding N(B) we again subtract the first row from the second,

Bred =
[ 1  1 0 2 ]
[ 0 -1 1 1 ],
and identify x1 and x2 as pivot variables and x3 and x4 as free. We see that Bred x = 0 means
x1 + x2 + 2x4 = 0
−x2 + x3 + x4 = 0
or, equivalently,
x2 = x3 + x4 and x1 = −x3 − 3x4,
and so
N(B) = { x3 [−1; 1; 1; 0] + x4 [−3; 1; 0; 1] : x3 ∈ R, x4 ∈ R }.
Hence
[−1; 1; 1; 0] and [−3; 1; 0; 1]
constitute a basis for N(B).
(i) We first take a concrete example. Report the findings of null when applied to A and AT A
for the A matrix associated with Figure 3.1.
(ii) For arbitrary A show that N(A) ⊆ N(A^T A), i.e., that if Ax = 0 then A^T Ax = 0.
(iii) For arbitrary A show that N(A^T A) ⊆ N(A), i.e., that if A^T Ax = 0 then Ax = 0. (Hint:
if A^T Ax = 0 then x^T A^T Ax = 0 and this says something about ||Ax||; recall that ||y||^2 ≡ y^T y.)
4. Suppose that A is m-by-n and that N (A) = Rn . Argue that A must be the zero matrix.
5. Suppose that {s1, . . . , sn} and {t1, . . . , tm} are both bases for the subspace M. Prove
that m = n and hence that our notion of dimension makes sense.
constitute a basis for N(A) while the pivot rows of Ared are
x1 = [1 0 0 0 0 0 0 0]^T,  x2 = [0 1 0 0 0 1 0 0]^T,  x3 = [0 0 1 0 0 0 0 0]^T,
x4 = [0 0 0 1 0 0 0 1]^T,  x5 = [0 0 0 0 1 0 0 0]^T,  x6 = [0 0 0 0 0 0 1 0]^T.
Hence
{z1 , z2 , x1 , x2 , x3 , x4 , x5 , x6 }
comprises a set of 8 linearly independent vectors in R8 . These vectors then necessarily span R8 .
For, if they did not, there would exist nine linearly independent vectors in R8 ! In general, we find
Fundamental Theorem of Linear Algebra (Preliminary). Suppose A is m-by-n and has
rank r. The row space, R(A^T), and the null space, N(A), are respectively r and n − r dimensional
subspaces of R^n. Each x ∈ R^n may be uniquely expressed in the form
x = xR + xN, where xR ∈ R(A^T) and xN ∈ N(A).  (4.1)
In order to compute a basis for N (AT ) we merely mimic the construction of the previous section.
Namely, we compute (AT )red and then solve for the pivot variables in terms of the free ones.
With respect to the 8-by-8 A matrix associated with the unstable ladder of §3.4, row reduction of its transpose produces

(A^T)red = rref(A^T) =
[1 0 −1 0 0 0 0  0
 0 1 −1 0 0 0 0  0
 0 0  0 1 0 0 0  0
 0 0  0 0 1 0 0  0
 0 0  0 0 0 1 0 −1
 0 0  0 0 0 0 1 −1
 0 0  0 0 0 0 0  0
 0 0  0 0 0 0 0  0],

with pivot and free indices {1, 2, 4, 5, 6, 7} and {3, 8}, respectively. Solving (A^T)red x = 0 for the pivot variables in terms of the free we find
x7 = x8 , x6 = x8 , x5 = 0, x4 = 0, x2 = x3 , x1 = x3 ,
or, in vector form,
x = x3 [1; 1; 1; 0; 0; 0; 0; 0] + x8 [0; 0; 0; 0; 0; 1; 1; 1].
These two vectors constitute a basis for N (AT ) and indeed they are both orthogonal to every column
of A. We have now exhibited means by which one may assemble bases for the four fundamental
subspaces. In the process we have established
Fundamental Theorem of Linear Algebra. Suppose A is m-by-n and has rank r. One has the
orthogonal direct sums
R^n = R(A^T) ⊕ N(A) and R^m = R(A) ⊕ N(A^T),
where
dim R(A) = dim R(A^T) = r,  dim N(A) = n − r,  and  dim N(A^T) = m − r.
We shall see many applications of this fundamental theorem. Perhaps one of the most common
is the use of the orthogonality of R(A) and N (AT ) in the characterization of those b for which an
x exists for which Ax = b. There are many instances for which R(A) is quite large and unwieldy
while N(A^T) is small and therefore simpler to grasp. As an example, consider the (n − 1)-by-n
first order difference matrix with −1 on the diagonal and 1 on the superdiagonal,
A =
[−1  1  0  · · ·  0
  0 −1  1  · · ·  0
  · · ·
  0  · · ·  0 −1  1].
It is not difficult to see that N(A^T) = {0} and so R(A), being its orthogonal complement, is the
entire space, R^{n−1}. That is, for each b ∈ R^{n−1} there exists an x ∈ R^n such that Ax = b. The
uniqueness of such an x is decided by N (A). We recognize this latter space as the span of the
vector of ones. But you already knew that adding a constant to a function does not change its
derivative.
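A quick numerical illustration of this state of affairs, sketched in Python/NumPy rather than in the text's Matlab (the size n = 6 is an arbitrary choice):

```python
import numpy as np

n = 6
# First order difference matrix: -1 on the diagonal, 1 on the superdiagonal.
A = -np.eye(n - 1, n) + np.eye(n - 1, n, k=1)

b = np.linspace(1.0, 2.0, n - 1)          # any right hand side will do
x = np.linalg.lstsq(A, b, rcond=None)[0]  # one particular solution

print(np.allclose(A @ x, b))              # True: every b is attainable
ones = np.ones(n)
print(np.allclose(A @ (x + 7 * ones), b)) # True: solutions differ by a constant
```

The second print confirms the closing remark: the constant vector spans N(A), just as adding a constant to a function leaves its derivative unchanged.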
4.3. Exercises
1. True or false: support your answer.
(i) If A is square then R(A) = R(AT ).
(ii) If A and B have the same four fundamental subspaces then A=B.
2. Construct bases (by hand) for the four subspaces associated with
A = [1 1 1; 1 0 1].
Also provide a careful sketch of these subspaces.
4. Why is there no matrix whose row space and null space both contain the vector [1 1 1]T ?
5. Write down a matrix with the required property or explain why no such matrix exists.
(a) Column space contains [1 0 0]T and [0 0 1]T while row space contains [1 1]T and [1 2]T .
(b) Column space has basis [1 1 1]T while null space has basis [1 2 1]T .
(c) Column space is R4 while row space is R3 .
6. One often constructs matrices via outer products, e.g., given an n-by-1 vector v let us consider
A = vv T .
(a) Show that v is a basis for R(A),
(b) Show that N (A) coincides with all vectors perpendicular to v.
(c) What is the rank of A?
5. Least Squares
We learned in the previous chapter that Ax = b need not possess a solution when the number of
rows of A exceeds its rank, i.e., r < m. As this situation arises quite often in practice, typically in
the guise of more equations than unknowns, we establish a rationale for the absurdity Ax = b.
5.1. The Normal Equations
The goal is to choose x such that Ax is as close as possible to b. Measuring closeness in terms
of the sum of the squares of the components we arrive at the least squares problem of minimizing
||Ax − b||^2 ≡ (Ax − b)^T (Ax − b)  (5.1)
over all x ∈ R^n. The path to the solution is illuminated by the Fundamental Theorem. More
precisely, we write
b = bR + bN, where bR ∈ R(A) and bN ∈ N(A^T).
On noting that (i) (Ax − bR) ∈ R(A) for every x ∈ R^n and (ii) R(A) ⊥ N(A^T) we arrive at the
Pythagorean Theorem
||Ax − b||^2 = ||Ax − bR − bN||^2 = ||Ax − bR||^2 + ||bN||^2.  (5.2)
It is now clear from (5.2) that the best x is the one that satisfies
Ax = bR .
(5.3)
As bR ∈ R(A) this equation indeed possesses a solution. We have yet however to specify how one
computes bR given b. Although an explicit expression for bR , the so called orthogonal projection
of b onto R(A), in terms of A and b is within our grasp we shall, strictly speaking, not require it.
To see this, let us note that if x satisfies (5.3) then
Ax − b = Ax − bR − bN = −bN.  (5.4)
As bN is no more easily computed than bR you may claim that we are just going in circles. The
practical information in (5.4) however is that (Ax − b) ∈ N(A^T), i.e., A^T(Ax − b) = 0, i.e.,
A^T Ax = A^T b.  (5.5)
As A^T b ∈ R(A^T) regardless of b this system, often referred to as the normal equations, indeed
has a solution. This solution is unique so long as the columns of AT A are linearly independent,
i.e., so long as N (AT A) = {0}. Recalling Chapter 2, Exercise 2, we note that this is equivalent to
N (A) = {0}. We summarize our findings in
Proposition 5.1. The set of x ∈ R^n for which the misfit ||Ax − b||^2 is smallest is composed of
those x for which
A^T Ax = A^T b.
There is always at least one such x. There is exactly one such x iff N(A) = {0}.
As an example, suppose that
A = [1 1; 0 1; 0 0] and b = [1; 1; 1].
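For this small example the normal equations may be checked directly. A Python/NumPy sketch (our translation; the text's figure was produced in Matlab):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])
b = np.array([1.0, 1.0, 1.0])

# Solve the normal equations A^T A x = A^T b directly ...
x = np.linalg.solve(A.T @ A, A.T @ b)
# ... and compare with the built-in least squares solver.
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]

print(x)            # ~[0, 1]
print(b - A @ x)    # the residual, -bN; note A^T (b - Ax) = 0
```

The residual [0, 0, 1] lies in N(A^T), exactly as (5.4) predicts.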
31
[figure: the vector b and its orthogonal projection onto the plane R(A)]
where B = A^T diag(Ax).  (5.6)
Though conceptually simple this is not of great use in practice, for B is 18-by-20 and hence (5.6)
possesses many solutions. The way out is to compute k as the result of more than one experiment.
We shall see that, for our small sample, 2 experiments will suffice.
To be precise, we suppose that x(1) is the displacement produced by loading f (1) while x(2) is the
displacement produced by loading f (2) . We then piggyback the associated pieces in
B = [A^T diag(Ax^(1)); A^T diag(Ax^(2))] and f = [f^(1); f^(2)].
This B is 36-by-20 and so the system Bk = f is overdetermined and hence ripe for least squares.
We proceed then to assemble B and f . We suppose f (1) and f (2) to correspond to horizontal
and vertical stretching
f (1) = [1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0]T
f (2) = [0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 1 0 1]T
respectively. For the purpose of our example we suppose that each kj = 1 except k8 = 5. We
assemble AT KA as in Chapter 2 and solve
A^T KAx^(j) = f^(j)
with the help of the pseudoinverse. In order to impart some reality to this problem we taint
each x(j) with 10 percent noise prior to constructing B. Please see the attached Mfile for details.
Regarding
B^T Bk = B^T f
we note that Matlab solves this system when presented with k=B\f when B is rectangular. We
have plotted the results of this procedure in the figure below
[figure: recovered fiber stiffness versus fiber number]
5.3. Projections
From an algebraic point of view (5.5) is an elegant reformulation of the least squares problem.
Though easy to remember it unfortunately obscures the geometric content, suggested by the word
projection, of (5.4). As projections arise frequently in many applications we pause here to develop
them more carefully.
With respect to the normal equations we note that if N(A) = {0} then
x = (A^T A)^{−1} A^T b
and so the orthogonal projection of b onto R(A) is
bR = Ax = A(A^T A)^{−1} A^T b.  (5.7)
Defining
P = A(A^T A)^{−1} A^T,  (5.8)
(5.7) takes the form bR = P b. Commensurate with our notion of what a projection should be we
expect that P map vectors not in R(A) onto R(A) while leaving vectors already in R(A) unscathed.
More succinctly, we expect that P bR = bR, i.e., P P b = P b. As the latter should hold for all b ∈ R^m
we expect that
P 2 = P.
(5.9)
With respect to (5.8) we find that indeed
P^2 = A(A^T A)^{−1} A^T A(A^T A)^{−1} A^T = A(A^T A)^{−1} A^T = P.
We also note that the P in (5.8) is symmetric. We dignify these properties through
Definition 5.1. A matrix P that satisfies P 2 = P is called a projection. A symmetric projection
is called an orthogonal projection.
We have taken some pains to motivate the use of the word projection. You may be wondering
however what symmetry has to do with orthogonality. We explain this in terms of the tautology
b = P b + (I − P)b.
Now, if P is a projection then so too is (I − P). Moreover, if P is symmetric then the dot product
of b's two constituents is
(P b)^T (I − P)b = b^T P^T (I − P)b = b^T (P − P^2)b = b^T 0 b = 0,
i.e., P b is orthogonal to (I − P)b.
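These three properties — idempotence, symmetry, and the orthogonality of Pb and (I − P)b — are easy to witness numerically. A Python/NumPy sketch with an arbitrary random A (the sizes and seed are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))   # a random 5-by-2 with independent columns

# The orthogonal projection onto R(A), per (5.8).
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))      # True: P^2 = P
print(np.allclose(P, P.T))        # True: P is symmetric
b = rng.standard_normal(5)
print((P @ b) @ ((np.eye(5) - P) @ b))   # ~0: Pb is orthogonal to (I-P)b
```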
As examples of nonorthogonal projections we offer
[1 0; 1 0] and [1 0 0; 1/2 0 0; −1/4 1/2 1].
Finally, let us note that the central formula, P = A(AT A)1 AT , is even a bit more general than
advertised. It has been billed as the orthogonal projection onto the column space of A. The need
often arises however for the orthogonal projection onto some arbitrary subspace M . The key to
using the old P is simply to realize that every subspace is the column space of some matrix. More
precisely, if
{x1 , . . . , xm }
is a basis for M then clearly if these xj are placed into the columns of a matrix called A then
R(A) = M . For example, if M is the line through [1 1]T then
P = [1; 1] ([1 1][1; 1])^{−1} [1 1] = (1/2) [1 1; 1 1].
5.4. Exercises
1. A steel beam was stretched to lengths ℓ = 6, 7, and 8 feet under applied forces of f = 1, 2,
and 4 tons. Assuming Hooke's law in the form ℓ = L + cf, find its compliance, c, and original length, L,
by least squares.
2. With regard to the example of §5.2 note that, due to the random generation of the noise
that taints the displacements, one gets a different answer every time the code is invoked.
(i) Write a loop that invokes the code a statistically significant number of times and submit
bar plots of the average fiber stiffness and its standard deviation for each fiber, along with the
associated M-file.
(ii) Experiment with various noise levels with the goal of determining the level above which it
becomes difficult to discern the stiff fiber. Carefully explain your findings.
3. Find the matrix that projects R3 onto the line spanned by [1 0 1]T .
4. Find the matrix that projects R3 onto the plane spanned by [1 0 1]T and [1 1 1]T .
5. If P is the projection of R^m onto a k-dimensional subspace M, what is the rank of P and what
is R(P)?
[figure 6.1: an RC model of a neuronal cable; three compartments, each with membrane capacitance Cm, membrane resistance Rm and battery Em, coupled through intracellular resistances Ri, driven by the current stimulus i0, with node potentials x1, x2, x3 and branch variables y1, . . . , y8]
Here
e = Ax − b, where b = Em [0; 1; 0; 0; 1; 0; 0; 1] and A =
[1  0  0
 1  0  0
 1 −1  0
 0  1  0
 0  1  0
 0  1 −1
 0  0  1
 0  0  1],
so that, for example, e4 = x2 and e8 = x3 − Em.
In (S2) we must now augment Ohm's law with the voltage–current law obeyed by a capacitor, namely –
the current through a capacitor is proportional to the time rate of change of the potential across
it. This yields, denoting d/dt by ′,
y1 = Cm e1′, y2 = e2/Rm, y3 = e3/Ri, y4 = Cm e4′, y5 = e5/Rm, y6 = e6/Ri, y7 = Cm e7′, y8 = e8/Rm,
or, in matrix terms, y = Ge + Ce′, where
G = diag(0, Gm, Gi, 0, Gm, Gi, 0, Gm) and C = diag(Cm, 0, 0, Cm, 0, 0, Cm, 0),
with Gm = 1/Rm and Gi = 1/Ri. Kirchhoff's Current Law then brings
A^T CAx′(t) + A^T GAx(t) = A^T Gb + f(t).  (6.1)
This is the general form of the potential equations for an RC circuit. It presumes of the user
knowledge of the initial value of each of the potentials,
x(0) = X.
(6.2)
For our circuit we find
A^T CA = [Cm 0 0; 0 Cm 0; 0 0 Cm],  A^T GA = [Gi+Gm  −Gi  0; −Gi  2Gi+Gm  −Gi; 0  −Gi  Gi+Gm],
A^T Gb = Em Gm [1; 1; 1], and A^T Cb = [0; 0; 0],
0
Gm
and an initial (rest) potential of
x(0) = Em [1 1 1]T .
We shall now outline two modes of attack on such problems. The Laplace Transform is an analytical
tool that produces exact, closed-form, solutions for small tractable systems and therefore offers
insight into how larger systems should behave. The Backward-Euler method is a technique for
solving a discretized (and therefore approximate) version of (6.1). It is highly flexible, easy to code,
and works on problems of great size. Both the Backward-Euler and Laplace Transform methods
37
require, at their core, the algebraic solution of a linear system of equations. In deriving these
methods we shall find it more convenient to proceed from the generic system
x′ = Bx + g.  (6.3)
In the case of (6.1),
B = −(A^T CA)^{−1} A^T GA = (1/Cm) [−(Gi+Gm)  Gi  0; Gi  −(2Gi+Gm)  Gi; 0  Gi  −(Gi+Gm)]  (6.4)
and
g = (A^T CA)^{−1}(A^T Gb + f) = (1/Cm) [Em Gm + i0; Em Gm; Em Gm].
Recall that the Laplace transform of a function x is
(Lx)(s) ≡ ∫_0^∞ x(t)e^{−st} dt,
where s is a complex variable. We shall soon take a two-chapter dive into the theory of complex
variables and functions. But for now let us proceed calmly and confidently and follow the lead of
Matlab . For example
>> syms t
>> laplace(exp(t))
ans = 1/(s-1)
>> laplace(t*exp(-t))
ans = 1/(s+1)^2
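These symbolic answers are easy to sanity-check by crude numerical integration of the defining integral, truncated at a large upper limit. A Python/NumPy sketch (the grid and the sample point s = 3 are arbitrary choices of ours):

```python
import numpy as np

# Check L{e^t}(s) = 1/(s-1) (valid for s > 1) and L{t e^{-t}}(s) = 1/(s+1)^2.
s = 3.0
t = np.linspace(0.0, 60.0, 600001)
dt = t[1] - t[0]

def laplace_num(f):
    y = f(t) * np.exp(-s * t)
    return np.sum((y[:-1] + y[1:]) / 2.0) * dt   # trapezoid rule

F1 = laplace_num(np.exp)                    # transform of e^t
F2 = laplace_num(lambda u: u * np.exp(-u))  # transform of t e^{-t}

print(F1, 1.0 / (s - 1.0))       # both near 0.5
print(F2, 1.0 / (s + 1.0) ** 2)  # both near 0.0625
```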
The Laplace Transform of a matrix of functions is simply the matrix of Laplace transforms of
the individual elements. For example
L [e^t; t e^{−t}] = [1/(s − 1); 1/(s + 1)^2].
Now, in preparing to apply the Laplace transform to (6.3) we write it as
Lx′ = L(Bx + g)  (6.5)
and so must determine how L acts on derivatives and sums. With respect to the latter it follows
directly from the definition that
L(Bx + g) = LBx + Lg = BLx + Lg.
(6.6)
Regarding the derivative, integration by parts yields
Lx′ = sLx − x(0).  (6.7)
Combining (6.5), (6.6) and (6.7) we arrive at
(sI − B)Lx = Lg + x(0).  (6.8)
The only thing that distinguishes this system from those encountered since chapter 1 is the presence
of the complex variable s. This complicates the mechanical steps of Gaussian Elimination or the
Gauss-Jordan Method but the methods indeed apply without change. Taking up the latter method,
we write
Lx = (sI − B)^{−1}(Lg + x(0)).
The matrix (sI B)1 is typically called the resolvent of B at s. We turn to Matlab for its
symbolic calculation. For example,
>> syms s
>> B = [2 -1;-1 2]
>> R = inv(s*eye(2)-B)
R =
[ (s-2)/(s*s-4*s+3), -1/(s*s-4*s+3)]
[ -1/(s*s-4*s+3), (s-2)/(s*s-4*s+3)]
We note that (sI − B)^{−1} is well defined except at the roots of the quadratic, s^2 − 4s + 3. This
quadratic is the determinant of (sI − B) and is often referred to as the characteristic polynomial
of B. The roots of the characteristic polynomial are called the eigenvalues of B. We will develop
each of these new mathematical objects over the coming chapters. We mention them here only to
point out that they are all latent in the resolvent.
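One may corroborate the symbolic resolvent at any particular value of s. Here is a quick numerical check in Python/NumPy (our translation of the Matlab session above; s = 5 is an arbitrary sample):

```python
import numpy as np

B = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
s = 5.0

R = np.linalg.inv(s * np.eye(2) - B)

# The entries reported by the symbolic calculation.
denom = s * s - 4.0 * s + 3.0
R_formula = np.array([[(s - 2.0) / denom, -1.0 / denom],
                      [-1.0 / denom, (s - 2.0) / denom]])

print(np.allclose(R, R_formula))   # True
```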
As a second example let us take the B matrix of (6.4) with the parameter choices specified in
fib3.m, namely
B = (1/10) [−5 3 0; 3 −8 3; 0 3 −5].  (6.9)
The associated resolvent is (sI − B)^{−1} = adj(sI − B)/χB(s), where
χB(s) = s^3 + 1.8s^2 + 0.87s + 0.11  (6.10)
is the characteristic polynomial of B. Assuming a current stimulus of the form i0(t) = t e^{−t/4}/1000,
and Em = 0, brings
(Lg)(s) = [1.965/(s + 1/4)^2; 0; 0]
and so
Lx = (sI − B)^{−1} Lg = (1.965/((s + 1/4)^2 (s^3 + 1.8s^2 + 0.87s + 0.11))) [s^2 + 1.3s + 0.31; 0.3s + 0.15; 0.09].
Now comes the rub. A simple linear solve (or inversion) has left us with the Laplace transform of
x. The accursed No Free Lunch Theorem informs us that we shall have to do some work in order
to recover x from Lx.
In coming sections we shall establish that the inverse Laplace transform of a function h is
(L^{−1}h)(t) = (1/(2πi)) ∫_C h(s) e^{st} ds,  (6.11)
where s runs along C, a closed curve in the complex plane that encircles all of the singularities of
h. We don't suppose the reader to have yet encountered integration in the complex plane and so
please view (6.11) as a preview of coming attractions.
With the inverse Laplace transform one may express the solution of (6.3) as
x(t) = L^{−1}((sI − B)^{−1}(Lg + x(0))).  (6.12)
With respect to our fiber problem, the first component of Lx is
Lx1(s) = 1.965 (s^2 + 1.3s + 0.31)/((s + 1/4)^2 (s^3 + 1.8s^2 + 0.87s + 0.11)).  (6.13)
The singularities, or poles, are the points s at which Lx1(s) blows up. These are clearly the roots
of its denominator, namely
−11/10, −1/2, −1/5 and −1/4,  (6.14)
and hence the curve, C, in (6.11), must encircle these. We turn to Matlab however to actually
evaluate (6.11). Referring to fib3.m for details we note that the ilaplace command produces
x1(t) = 1.965 (8e^{−t/2} − (1/17)e^{−t/4}(40912/17 + 76t) + (400/3)e^{−t/5} + (200/867)e^{−11t/10}).  (6.15)
[figure 6.2: the potentials x1, x2, and x3 (mV) versus t (ms)]
(I/dt − B) x(2dt) = x(dt)/dt + g(2dt)
and solve for x(2dt). The general step from past to present,
x(jdt) = (I/dt − B)^{−1}(x((j − 1)dt)/dt + g(jdt)),  (6.17)
(6.17)
is repeated until some desired final time, T dt, is reached. This equation has been implemented in
fib3.m with dt = 1 and B and g as above. The resulting x (run fib3 yourself!) is indistinguishable
from figure 6.2.
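Since fib3.m itself is not reproduced here, we offer a hedged re-implementation of the Backward Euler step (6.17) in Python/NumPy, under the parameter choices of the example (the B of (6.9), stimulus g1(t) = 1.965 t e^{−t/4}, rest initial data, dt = 1):

```python
import numpy as np

B = np.array([[-5.0, 3.0, 0.0],
              [3.0, -8.0, 3.0],
              [0.0, 3.0, -5.0]]) / 10.0

def g(t):
    # only the first compartment receives the current stimulus
    return np.array([1.965 * t * np.exp(-t / 4.0), 0.0, 0.0])

dt, T = 1.0, 100.0
x = np.zeros(3)                  # rest initial potential (Em = 0)
M = np.eye(3) / dt - B           # the matrix (I/dt - B), formed once

xs = []
for j in range(1, int(T / dt) + 1):
    x = np.linalg.solve(M, x / dt + g(j * dt))
    xs.append(x.copy())
xs = np.array(xs)

print(xs[:, 0].max())            # x1 rises by a few mV ...
print(np.abs(xs[-1]).max())      # ... and has essentially returned to rest by t = T
```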
Comparing the two representations, (6.12) and (6.17), we see that they both produce the solution
to the general linear system of ordinary equations, (6.3), by simply inverting a shifted copy of B.
The former representation is hard but exact while the latter is easy but approximate. Of course we
should expect the approximate solution, x, to approach the exact solution, x, as the time step, dt,
approaches zero. To see this let us return to (6.17) and assume, for now, that g 0. In this case,
one can reverse the above steps and arrive at the representation
x(jdt) = ((I − dtB)^{−1})^j x(0).  (6.18)
Now, for a fixed time t we suppose that dt = t/j and ask whether
x(t) = lim_{j→∞} ((I − (t/j)B)^{−1})^j x(0).
We may pass from the static equilibrium equations Sx = f, where S = A^T KA,
to the dynamical equations for the displacement, x(t), due to a time varying force, f(t), and/or
nonequilibrium initial conditions, by simply appending the Newtonian inertial terms, i.e.,
M x″(t) + Sx(t) = f(t),  x(0) = x0,  x′(0) = v0,  (6.19)
where M is the diagonal matrix of node masses, x0 denotes their initial displacement and v0 denotes
their initial velocity.
We transform this system of second order differential equations to an equivalent first order system
by introducing
u1 ≡ x and u2 ≡ u1′
and then noting that (6.19) takes the form
u2′ = x″ = −M^{−1}S u1 + M^{−1}f(t).
As such, we find that u = (u1 u2 )T obeys the familiar
u′ = Bu + g,  u(0) = u0,  (6.20)
where
B = [0 I; −M^{−1}S 0],  g = [0; M^{−1}f],  and  u0 = [x0; v0].  (6.21)
Let us consider the concrete example of the chain of three masses in Fig. 2.1. If each node has mass
m and each spring has stiffness k then
M^{−1}S = (k/m) [2 −1 0; −1 2 −1; 0 −1 2].  (6.22)
The associated characteristic polynomial of B is
χB(s) = s^6 + 6cs^4 + 10c^2 s^2 + 4c^3, where c ≡ k/m,  (6.23)
a cubic in s^2 with simple roots at −2c and −2c ± √2 c. And so the eigenvalues of B are the six
purely imaginary numbers
λ1 = i√((2 − √2)c),  λ2 = i√(2c),  λ3 = i√((2 + √2)c),  (6.24)
together with their conjugates, λ4 = λ̄1, λ5 = λ̄2 and λ6 = λ̄3.
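These eigenvalues are easy to confirm numerically by assembling B as in (6.20) and calling an eigensolver. A Python/NumPy sketch with c = 1 (Matlab's eig would serve equally well):

```python
import numpy as np

c = 1.0   # c = k/m
Minv_S = c * np.array([[2.0, -1.0, 0.0],
                       [-1.0, 2.0, -1.0],
                       [0.0, -1.0, 2.0]])

# Assemble B = [0 I; -M^{-1}S 0] per (6.20).
Z, I = np.zeros((3, 3)), np.eye(3)
B = np.block([[Z, I], [-Minv_S, Z]])

lam = np.linalg.eigvals(B)
print(np.allclose(lam.real, 0.0))   # True: purely imaginary
print(np.sort(np.abs(lam.imag)))    # sqrt(2-sqrt(2)), sqrt(2), sqrt(2+sqrt(2)), each twice
```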
Supposing an initial displacement of the first mass, x0 = [1; 0; 0], with v0 = 0 and f = 0, we find
Lu1(s) = (1/χB(s)) [s^5 + 4cs^3 + 3c^2 s; cs(s^2 + 2c); c^2 s].  (6.25)
On computing the inverse Laplace Transform we (will) find
x1(t) = Σ_{j=1}^{6} ((λj^5 + 4cλj^3 + 3c^2 λj)/χB′(λj)) exp(λj t),  (6.26)
that is, x1 is a weighted sum of exponentials. As each of the λj is purely imaginary it follows
that our masses will simply oscillate according to weighted sums of three sinusoids. For example,
note that
exp(λ2 t) = cos(√(2c) t) + i sin(√(2c) t) and exp(λ5 t) = cos(√(2c) t) − i sin(√(2c) t).
Of course such sums may reproduce quite complicated behavior. We have illustrated each of the
displacements in Figure 6.3 in the case that c = 1. Rather than plotting the complex, yet explicit,
expression (6.26) for x1 , we simply implement the Backward Euler scheme of the previous section.
Figure 6.3. The displacements of the 3 masses in Fig. 2.1, with k/m = 1, following an initial displacement of the first mass. For viewing purposes we have offset the three displacements. chain.m
6.5. Exercises
1. Compute, without the aid of a machine, the Laplace transforms of e^t and t e^{−t}. Show all of your
work.
2. Extract from fib3.m analytical expressions for x2 and x3 .
3. Use eig to compute the eigenvalues of B as given in (6.9). Use poly to compute the characteristic polynomial of B. Use roots to compute the roots of this characteristic polynomial.
Compare these to the results of eig. How does Matlab compute the roots of a polynomial?
(type type roots for the answer). Submit a Matlab diary of your findings.
4. Adapt the Backward Euler portion of fib3.m so that one may specify an arbitrary number
of compartments, as in fib1.m. As B, and so S, is now large and sparse please create the
sparse B via spdiags and the sparse I via speye, and then prefactor S into LU and use
U \L\ rather than S\ in the time loop. Experiment to find the proper choice of dt. Submit
your well documented M-file along with a plot of x1 and x50 versus time (on the same well
labeled graph) for a 100 compartment cable.
5. Derive (6.18) from (6.17) by working backwards toward x(0). Along the way you should explain
why (I/dt − B)^{−1}/dt = (I − dtB)^{−1}.
6. Show, for scalar B, that ((1 − (t/j)B)^{−1})^j → exp(Bt) as j → ∞. Hint: By definition
((1 − (t/j)B)^{−1})^j = exp(j log(1/(1 − (t/j)B))),
now use L'Hôpital's rule to show that j log(1/(1 − (t/j)B)) → Bt.
7. If we place a viscous damper in parallel with each spring in Figure 2.1 as below
[figure: the chain of Fig. 2.1 with a damper dj placed in parallel with each spring kj, masses m1, m2, m3, and forces f1, f2, f3]
then the dynamics takes the form
M x″(t) + Dx′(t) + Sx(t) = f(t),  (6.27)
where D = A^T diag(d)A and d is the vector of damping constants. Modify chain.m to solve
this new system and use your code to reproduce the figure below.
Figure 6.1. The displacement of the three masses in the weakly damped chain, where k/m = 1
and d/m = 1/10.
z1 + z2 = (x1 + x2) + i(y1 + y2),
z1 z2 = (x1 + iy1)(x2 + iy2) = (x1 x2 − y1 y2) + i(x1 y2 + x2 y1),
and, in terms of the conjugate z̄1 ≡ x1 − iy1, the quotient
z1/z2 = (z1 z̄2)/(z2 z̄2) = ((x1 x2 + y1 y2) + i(x2 y1 − x1 y2))/(x2^2 + y2^2).
The magnitude of z = x + iy is |z| ≡ √(z z̄) = √(x^2 + y^2).
In addition to the Cartesian representation z = x + iy one also has the polar form
z = |z|(cos θ + i sin θ), where θ ∈ (−π, π] and
θ = atan2(y, x) ≡
  arctan(y/x)      if x > 0,
  arctan(y/x) + π  if x < 0, y ≥ 0,
  arctan(y/x) − π  if x < 0, y < 0,
  π/2              if x = 0, y > 0,
  −π/2             if x = 0, y < 0.
A complex vector (matrix) is simply a vector (matrix) of complex numbers. Vector and matrix
addition proceed, as in the real case, from elementwise addition. The dot or inner product of two
complex vectors requires, however, a little modification. This is evident when we try to use the old
notion to define the length of a complex vector. To wit, note that if
z = [1 + i; 1 − i]
then
z^T z = (1 + i)^2 + (1 − i)^2 = 1 + 2i − 1 + 1 − 2i − 1 = 0.
Now length should measure the distance from a point to the origin and should only be zero for the
zero vector. The fix, as you have probably guessed, is to sum the squares of the magnitudes of
the components of z. This is accomplished by simply conjugating one of the vectors. Namely, we
define the length of a complex vector via
||z|| ≡ √(z̄^T z).  (7.1)
Returning to our example, we now find
||z|| = √(|1 + i|^2 + |1 − i|^2) = √4 = 2.
As each real number is the conjugate of itself, this new definition subsumes its real counterpart.
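NumPy makes the distinction between the naive and the conjugated inner product easy to see (a Python sketch of the example above; np.vdot conjugates its first argument):

```python
import numpy as np

z = np.array([1 + 1j, 1 - 1j])

print(z @ z)              # (1+i)^2 + (1-i)^2 = 0: the naive "length" vanishes
print(np.vdot(z, z))      # |1+i|^2 + |1-i|^2 = 4: the conjugated inner product
print(np.linalg.norm(z))  # 2.0, the square root of the above
```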
The notion of magnitude also gives us a way to define limits and hence will permit us to introduce
complex calculus. We say that the sequence of complex numbers, {zn : n = 1, 2, . . .}, converges to
the complex number z0 and write
zn z0
or z0 = lim zn ,
n
when, presented with any ε > 0, one can produce an integer N for which |zn − z0| < ε when n ≥ N.
As an example, we note that (i/2)^n → 0.
7.2. Complex Functions
A complex function is merely a rule for assigning certain complex numbers to other complex
numbers. The simplest (nonconstant) assignment is the identity function f(z) ≡ z. Perhaps the
next simplest function assigns to each number its square, i.e., f(z) ≡ z^2. As we decomposed the
argument of f , namely z, into its real and imaginary parts, we shall also find it convenient to
partition the value of f , z 2 in this case, into its real and imaginary parts. In general, we write
f (x + iy) = u(x, y) + iv(x, y)
where u and v are both real-valued functions of two real variables. In the case that f(z) ≡ z^2 we
find
u(x, y) = x^2 − y^2 and v(x, y) = 2xy.
With the tools of the previous section we may produce complex polynomials
f(z) = z^m + c_{m−1} z^{m−1} + · · · + c1 z + c0.
We say that such an f is of degree m. We shall often find it convenient to represent polynomials
as the product of their factors, namely
f(z) = (z − λ1)^{m1} (z − λ2)^{m2} · · · (z − λh)^{mh}.  (7.2)
We suppose now that
r(z) = f(z)/g(z)
is rational, that f is of order at most m − 1 while g is of order m with the simple roots {λ1, . . . , λm}.
It should come as no surprise that such an r should admit a Partial Fraction Expansion
r(z) = Σ_{j=1}^{m} rj/(z − λj).
One uncovers the rj by first multiplying each side by (z j ) and then letting z tend to j . For
example, if
1/(z^2 + 1) = r1/(z + i) + r2/(z − i)  (7.3)
then multiplying each side by (z + i) produces
1/(z − i) = r1 + r2(z + i)/(z − i).
Now, in order to isolate r1 it is clear that we should set z = −i. So doing we find r1 = i/2. In order
to find r2 we multiply (7.3) by (z − i) and then set z = i. So doing we find r2 = −i/2, and so
1/(z^2 + 1) = (i/2)/(z + i) + (−i/2)/(z − i).
Returning to the general case, we encode the above in the simple formula
rj = lim_{z→λj} (z − λj) r(z).  (7.4)
As a second example let us take the resolvent of
B = [0 1; −1 0],  (7.5)
namely
(zI − B)^{−1} = (1/(z^2 + 1)) [z 1; −1 z]  (7.6)
= (1/(z + i)) [1/2 i/2; −i/2 1/2] + (1/(z − i)) [1/2 −i/2; i/2 1/2].  (7.7)
The first line comes from either Gauss-Jordan by hand or via the symbolic toolbox in Matlab .
More importantly, the second line is simply an amalgamation of (7.3) and (7.4). Complex matrices
have finally entered the picture. We shall devote all of Chapter 9 to uncovering the remarkable
properties enjoyed by the matrices that appear in the partial fraction expansion of (zI B)1 .
Have you noticed that, in our example, the two matrices are each projections, that they sum to I,
and that their product is 0? Could this be an accident? To answer this we will also need to develop
(zI − B)^{−1} in a geometric expansion. At its simplest, the n-term geometric series, for z ≠ 1, is
Σ_{k=0}^{n−1} z^k = (1 − z^n)/(1 − z).  (7.8)
We will prove this in the exercises and use it to appreciate the beautiful orthogonality of the columns
of the Fourier Matrix.
In Chapter 6 we were confronted with the complex exponential when considering the Laplace
Transform. By analogy to the real exponential we define
e^z ≡ Σ_{n=0}^{∞} z^n/n!
and find, on substituting z = iθ and collecting real and imaginary parts, that
e^{iθ} = cos θ + i sin θ.
With this observation, the polar form is now simply z = |z|e^{iθ}. One may just as easily verify that
cos θ = (e^{iθ} + e^{−iθ})/2 and sin θ = (e^{iθ} − e^{−iθ})/(2i).
Similarly one defines the complex sine and cosine,
sin z ≡ (e^{iz} − e^{−iz})/(2i) and cos z ≡ (e^{iz} + e^{−iz})/2,
together with the logarithm,
ln z ≡ ln |z| + iθ for z = |z|e^{iθ}.
We say that f is differentiable at z0 if
lim_{z→z0} (f(z) − f(z0))/(z − z0)
exists, i.e., if
(f(zn) − f(z0))/(zn − z0)
converges to the same value for every sequence {zn} that converges to z0. In this case we naturally
call the limit f′(z0).
Example: The derivative of z^2 is 2z, for
lim_{z→z0} (z^2 − z0^2)/(z − z0) = lim_{z→z0} (z − z0)(z + z0)/(z − z0) = 2z0.
Example: The derivative of e^z is e^z, for
lim_{z→z0} (e^z − e^{z0})/(z − z0) = e^{z0} lim_{z→z0} (e^{z−z0} − 1)/(z − z0) = e^{z0} lim_{z→z0} Σ_{n=0}^{∞} (z − z0)^n/(n + 1)! = e^{z0}.
This last example suggests that when f is differentiable a simple relationship must bind its
partial derivatives in x and y.
Proposition 7.1. If f is differentiable at z0 then
f′(z0) = ∂f/∂x (z0) = −i ∂f/∂y (z0).
Proof: Approaching z0 = x0 + iy0 along the horizontal, z = x + iy0,
f′(z0) = lim_{z→z0} (f(z) − f(z0))/(z − z0) = lim_{x→x0} (f(x + iy0) − f(x0 + iy0))/(x − x0) = ∂f/∂x (z0),
while approaching along the vertical, z = x0 + iy,
f′(z0) = lim_{z→z0} (f(z) − f(z0))/(z − z0) = lim_{y→y0} (f(x0 + iy) − f(x0 + iy0))/(i(y − y0)) = −i ∂f/∂y (z0).
End of Proof.
In terms of the real and imaginary parts of f this result brings the Cauchy–Riemann equations
∂u/∂x = ∂v/∂y and ∂v/∂x = −∂u/∂y.  (7.9)
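For the f(z) = z^2 example of §7.2, with u = x^2 − y^2 and v = 2xy, the equations (7.9) are easy to verify symbolically or, as in this Python sketch, by centered finite differences at an arbitrary sample point:

```python
# Check u_x = v_y and v_x = -u_y for u = x^2 - y^2, v = 2xy.
x0, y0, h = 0.7, -1.3, 1e-6

u = lambda x, y: x * x - y * y
v = lambda x, y: 2.0 * x * y

u_x = (u(x0 + h, y0) - u(x0 - h, y0)) / (2 * h)
u_y = (u(x0, y0 + h) - u(x0, y0 - h)) / (2 * h)
v_x = (v(x0 + h, y0) - v(x0 - h, y0)) / (2 * h)
v_y = (v(x0, y0 + h) - v(x0, y0 - h)) / (2 * h)

print(abs(u_x - v_y), abs(v_x + u_y))   # both ~0
```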
Regarding the converse proposition we note that when f has continuous partial derivatives in a region
obeying the Cauchy–Riemann equations then f is in fact differentiable in the region.
We remark that with no more energy than that expended on their real cousins one may uncover
the rules for differentiating complex sums, products, quotients, and compositions.
As one important application of the derivative let us attempt to expand in partial fractions a
rational function whose denominator has a root with degree larger than one. As a warm-up let us
try to find r1,1 and r1,2 in the expansion
r1,1
r1,2
z+2
=
+
.
(z + 1)2
z + 1 (z + 1)2
50
(7.10)
On setting z = −1 this gives r1,2 = 1. With r1,2 computed, (7.10) takes the simple form z + 1 =
r1,1(z + 1) and so r1,1 = 1 as well. Hence
(z + 2)/(z + 1)^2 = 1/(z + 1) + 1/(z + 1)^2.
This latter step grows more cumbersome for roots of higher degree. Let us consider
(z + 2)^2/(z + 1)^3 = r1,1/(z + 1) + r1,2/(z + 1)^2 + r1,3/(z + 1)^3.
The first step is still correct: multiply through by the factor at its highest degree, here 3. This
leaves us with
(z + 2)2 = r1,1 (z + 1)2 + r1,2 (z + 1) + r1,3 .
(7.11)
Setting z = −1 again produces the last coefficient, here r1,3 = 1. We are left however with one
equation in two unknowns. Well, not really one equation, for (7.11) is to hold for all z. We exploit
this by taking two derivatives, with respect to z, of (7.11). This produces
2(z + 2) = 2r1,1(z + 1) + r1,2 and 2 = 2r1,1.
The latter of course needs no comment. We derive r1,2 = 2 from the former by setting z = −1. We
generalize from this example and arrive at
Proposition 7.2. The First Residue Theorem. The ratio, r = f /g, of two polynomials where
the order of f is less than that of g and g has h distinct roots {1 , . . . , h } of respective degrees
{m1 , . . . , mh }, may be expanded in partial fractions
r(z) = Σ_{j=1}^{h} Σ_{k=1}^{mj} rj,k/(z − λj)^k,  (7.12)
where, as above, the residue rj,k is computed by first clearing the fraction and then taking the
proper number of derivatives and finally clearing their powers. That is,
rj,k = lim_{z→λj} (1/(mj − k)!) (d^{mj−k}/dz^{mj−k}) {(z − λj)^{mj} r(z)}.  (7.13)
As an application, we consider
As an application, we consider the resolvent of
B = [1 0 0; 1 3 0; 0 1 1],
namely
R(z) = (zI − B)^{−1} = [1/(z−1)  0  0; 1/((z−1)(z−3))  1/(z−3)  0; 1/((z−1)^2(z−3))  1/((z−1)(z−3))  1/(z−1)].  (7.14)
The distinct poles are λ1 = 1 and λ2 = 3, of orders m1 = 2 and m2 = 1, and so (7.12) takes the form
R(z) = (1/(z − 1)) R1,1 + (1/(z − 1)^2) R1,2 + (1/(z − 3)) R2,1.
Applying (7.13) to, e.g., the (3,1) entry we find
r1,1 = lim_{z→1} (d/dz)(1/(z − 3)) = −1/4,
r1,2 = (1/(z − 3))(1) = −1/2,  (7.15)
and
r2,1 = (1/(z − 1)^2)(3) = 1/4.  (7.16)
Proceeding entry by entry we arrive at
R1,1 = [1 0 0; −1/2 0 0; −1/4 −1/2 1],  R1,2 = [0 0 0; 0 0 0; −1/2 0 0],  and  R2,1 = [0 0 0; 1/2 1 0; 1/4 1/2 0].  (7.17)
In closing, we note that the method of partial fraction expansions has been implemented in Matlab.
In fact, (7.15) and (7.16) follow from the single command
[r,p]=residue([0 0 0 1],[1 -5 7 -3]).
The first input argument is Matlab -speak for the polynomial f (z) = 1 while the second argument
corresponds to the denominator
g(z) = (z − 1)^2 (z − 3) = z^3 − 5z^2 + 7z − 3.
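The residues that Matlab reports for this pair — and that scipy.signal.residue should report in Python — are r1,1 = −1/4, r1,2 = −1/2 and r2 = 1/4, exactly the (7.15)–(7.16) values. The resulting identity can be spot-checked with plain complex arithmetic:

```python
# 1/((z-1)^2 (z-3)) = (-1/4)/(z-1) + (-1/2)/(z-1)^2 + (1/4)/(z-3)
for z in (0.0, 2.0 + 1.0j, -1.5 + 0.5j):
    lhs = 1.0 / ((z - 1) ** 2 * (z - 3))
    rhs = -0.25 / (z - 1) - 0.5 / (z - 1) ** 2 + 0.25 / (z - 3)
    print(abs(lhs - rhs))   # ~0 each time
```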
7.4. Exercises
1. Express |e^{x+iy}| in terms of x and/or y.
2. Suppose z ≠ 1 and define the n-term geometric series
S(z) = Σ_{k=0}^{n−1} z^k,
and show, by brute force, that (1 − z)S(z) = 1 − z^n. Derive (7.8) from this result.
3. Confirm that e^{ln z} = z and ln e^z = z.
4. Find the real and imaginary parts of cos z and sin z. Express your answers in terms of regular
and hyperbolic trigonometric functions.
5. Show that cos^2 z + sin^2 z = 1.
52
Σ_{k=1}^{n} F(j, k)F̄(k, m) = (1/n) Σ_{k=1}^{n} exp(2πi(k − 1)(j − m)/n).  (7.19)
Conclude that this sum is 1 when j = m. To show that the sum is zero when j ≠ m set
z = exp(2πi(j − m)/n) and recognize in (7.19) the n-term geometric series of (7.8).
8. Verify that sin z and cos z satisfy the Cauchy-Riemann equations (7.9) and use Proposition 7.1
to evaluate their derivatives.
9. Compute, by hand, the partial fraction expansion of the rational function that we arrived at in
(6.13). That is, find r1,1 , r1,2 , r2 , r3 and r4 in
r(s) = (s^2 + 1.3s + 0.31)/((s + 1/4)^2 (s + 1/2)(s + 1/5)(s + 11/10))
= r1,1/(s + 1/4) + r1,2/(s + 1/4)^2 + r2/(s + 1/2) + r3/(s + 1/5) + r4/(s + 11/10).
10. Use (7.13) to compute the partial fraction expansion of the resolvent of
B = [2 −1 0; −1 2 −1; 0 −1 2].
You should achieve
(sI − B)^{−1} = (1/(s − (2 + √2))) (1/4)[1 −√2 1; −√2 2 −√2; 1 −√2 1]
+ (1/(s − 2)) (1/2)[1 0 −1; 0 0 0; −1 0 1]
+ (1/(s − (2 − √2))) (1/4)[1 √2 1; √2 2 √2; 1 √2 1].
8. Complex Integration
Our main goal is a better understanding of the partial fraction expansion of a given resolvent.
With respect to the example that closed the last chapter, see (7.14)–(7.17), we found
(zI − B)^{−1} = (1/(z − λ1)) P1 + (1/(z − λ1)^2) D1 + (1/(z − λ2)) P2
and, in addition, that
P1 + P2 = I,  P1^2 = P1,  P2^2 = P2,  D1^2 = 0,  BP2 = P2B = λ2 P2,
P1D1 = D1P1 = D1 and P2D1 = D1P2 = 0.
In order to show that this always happens, i.e., that it is not a quirk produced by the particular
B in (7.14), we require a few additional tools from the theory of complex variables. In particular,
we need the fact that partial fraction expansions may be carried out through complex integration.
8.1. Cauchy's Theorem
We shall be integrating complex functions over complex curves. Such a curve is parametrized
by one complex valued or, equivalently, two real valued, function(s) of a real parameter (typically
denoted by t). More precisely,
C {z(t) = x(t) + iy(t) : t1 t t2 }.
For example, if x(t) = y(t) = t from t1 = 0 to t2 = 1, then C is the line segment joining 0 + i0 to
1 + i.
We now define
Z
Z t2
f (z) dz
f (z(t))z (t) dt.
C
t1
For example, integrating $f(z) = 1$ over the unit circle $C \equiv \{e^{2\pi it} : 0 \le t \le 1\}$ we find
$$\int_C 1\,dz = 2\pi i\int_0^1 \{\cos(2\pi t) + i\sin(2\pi t)\}\,dt = 0.$$
Remaining with the unit circle but now integrating $f(z) = 1/z$ we find
$$\int_C z^{-1}\,dz = \int_0^{2\pi} e^{-it}\,ie^{it}\,dt = 2\pi i.$$
We generalize this calculation to arbitrary (integer) powers over arbitrary circles. More precisely, for integer $m$ and fixed complex $a$ we integrate $(z-a)^m$ over
$$C(a,\rho) \equiv \{a + \rho e^{it} : 0 \le t \le 2\pi\},$$
where $\rho > 0$. We find
$$\begin{aligned}
\int_{C(a,\rho)} (z-a)^m\,dz &= \int_0^{2\pi} \rho^m e^{imt}\,i\rho e^{it}\,dt \\
&= i\rho^{m+1}\int_0^{2\pi} \{\cos((m+1)t) + i\sin((m+1)t)\}\,dt \\
&= \begin{cases} 2\pi i & \text{if } m = -1, \\ 0 & \text{otherwise,} \end{cases}
\end{aligned} \qquad (8.1)$$
regardless of the size of $\rho$!
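The closed-form answer in (8.1) is easy to confirm numerically. The following Python/NumPy sketch (a stand-in for the Matlab experiments the notes favor) approximates the contour integral by the midpoint rule along the parametrization $z = a + \rho e^{it}$; the helper name circle_integral is ours, not the notes':

```python
import numpy as np

def circle_integral(f, a, rho, n=20000):
    """Approximate the integral of f over the circle C(a, rho) by the midpoint rule."""
    t = (np.arange(n) + 0.5) * 2 * np.pi / n
    z = a + rho * np.exp(1j * t)          # points on the circle
    dz = 1j * rho * np.exp(1j * t)        # z'(t)
    return np.sum(f(z) * dz) * (2 * np.pi / n)

a, rho = 0.5 + 0.5j, 2.0
I_minus1 = circle_integral(lambda z: (z - a) ** -1, a, rho)
I_2 = circle_integral(lambda z: (z - a) ** 2, a, rho)
print(I_minus1)  # approximately 2*pi*1j
print(I_2)       # approximately 0
```

Changing rho leaves both answers unchanged, as (8.1) promises.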
When integrating more general functions it is often convenient to express the integral in terms of its real and imaginary parts. More precisely, writing $f = u + iv$,
$$\begin{aligned}
\int_C f(z)\,dz &= \int_C \{u(x,y) + iv(x,y)\}\{dx + i\,dy\} \\
&= \int_C \{u(x,y)\,dx - v(x,y)\,dy\} + i\int_C \{u(x,y)\,dy + v(x,y)\,dx\} \\
&= \int_a^b \{u(x(t),y(t))x'(t) - v(x(t),y(t))y'(t)\}\,dt \\
&\quad + i\int_a^b \{u(x(t),y(t))y'(t) + v(x(t),y(t))x'(t)\}\,dt.
\end{aligned}$$
These line integrals call to mind
Proposition 8.1. Green's Theorem. If $C$ is a closed curve enclosing the region $C_{in}$ and $M$ and $N$ are continuously differentiable then
$$\int_C \{M\,dx + N\,dy\} = \iint_{C_{in}} \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)dx\,dy.$$
Applying this proposition to the situation above, we find, so long as $C$ is closed, that
$$\int_C f(z)\,dz = \iint_{C_{in}} \left(-\frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}\right)dx\,dy + i\iint_{C_{in}} \left(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\right)dx\,dy.$$
At first glance it appears that Green's Theorem only serves to muddy the waters. Recalling the Cauchy-Riemann equations, however, we find that each of these double integrals is in fact identically zero! In brief, we have proven
Proposition 8.2. Cauchy's Theorem. If $f$ is differentiable on and in the closed curve $C$ then
$$\int_C f(z)\,dz = 0.$$
Strictly speaking, in order to invoke Green's Theorem we require not only that $f$ be differentiable but that its derivative in fact be continuous. This, however, is simply a limitation of our simple mode of proof; Cauchy's Theorem is true as stated.
This theorem, together with (8.1), permits us to integrate every proper rational function. More precisely, if $r = f/g$ where $f$ is a polynomial of degree at most $m-1$ and $g$ is an $m$th degree polynomial with $h$ distinct zeros at $\{\lambda_j\}_{j=1}^h$ with respective multiplicities of $\{m_j\}_{j=1}^h$, we found that
$$r(z) = \sum_{j=1}^{h}\sum_{k=1}^{m_j} \frac{r_{j,k}}{(z-\lambda_j)^k}. \qquad (8.2)$$
Observe now that if we choose the radius $\rho_j$ so small that $\lambda_j$ is the only zero of $g$ encircled by $C_j \equiv C(\lambda_j,\rho_j)$ then by Cauchy's Theorem
$$\int_{C_j} r(z)\,dz = \sum_{k=1}^{m_j} r_{j,k}\int_{C_j} \frac{1}{(z-\lambda_j)^k}\,dz.$$
In (8.1) we found that each, save the first, of the integrals under the sum is in fact zero. Hence
$$\int_{C_j} r(z)\,dz = 2\pi i\,r_{j,1}. \qquad (8.3)$$
With $r_{j,1}$ in hand, say from (7.13) or residue, one may view (8.3) as a means for computing the indicated integral. The opposite reading, i.e., that the integral is a convenient means of expressing $r_{j,1}$, will prove just as useful. With that in mind, we note that the remaining residues may be computed as integrals of the product of $r$ and the appropriate factor. More precisely,
$$\int_{C_j} r(z)(z-\lambda_j)^{k-1}\,dz = 2\pi i\,r_{j,k}. \qquad (8.4)$$
One may be led to believe that the precision of this result is due to the very special choice of curve
and function. We shall see ...
8.2. The Second Residue Theorem
After (8.3) and (8.4) perhaps the most useful consequence of Cauchy's Theorem is the freedom it grants one to choose the most advantageous curve over which to integrate. More precisely,
Proposition 8.3. Suppose that $C_2$ is a closed curve that lies inside the region encircled by the closed curve $C_1$. If $f$ is differentiable in the annular region outside $C_2$ and inside $C_1$ then
$$\int_{C_1} f(z)\,dz = \int_{C_2} f(z)\,dz.$$
Proof: With reference to the figure below we introduce two vertical segments and define the closed curves $C_3 = abcda$ (where the $bc$ arc is clockwise and the $da$ arc is counter-clockwise) and $C_4 = adcba$ (where the $ad$ arc is counter-clockwise and the $cb$ arc is clockwise). By merely following the arrows we learn that
$$\int_{C_1} f(z)\,dz = \int_{C_2} f(z)\,dz + \int_{C_3} f(z)\,dz + \int_{C_4} f(z)\,dz.$$
As Cauchy's Theorem implies that the integrals over $C_3$ and $C_4$ each vanish, we have our result.
End of Proof.
(Figure: the nested closed curves $C_1$ and $C_2$, joined by segments through the points $a$, $b$, $c$ and $d$.)
$$\int_{C_1} \frac{z}{(z-\lambda_1)(z-\lambda_2)}\,dz = \int_{C_2} \frac{z}{(z-\lambda_1)(z-\lambda_2)}\,dz = \frac{2\pi i\,\lambda_2}{\lambda_2 - \lambda_1}. \qquad (8.5)$$
We may view (8.5) as a special instance of integrating a rational function around a curve that encircles all of the zeros of its denominator. In particular, recalling (8.2) and (8.3), we find
$$\int_C r(z)\,dz = \sum_{j=1}^{h}\sum_{k=1}^{m_j}\int_{C_j} \frac{r_{j,k}}{(z-\lambda_j)^k}\,dz = 2\pi i\sum_{j=1}^{h} r_{j,1}. \qquad (8.6)$$
To take a slightly more complicated example let us integrate $f(z)/(z-a)$ over some closed curve $C$ inside of which $f$ is differentiable and $a$ resides. Our Curve Replacement Lemma now permits us to claim that
$$\int_C \frac{f(z)}{z-a}\,dz = \int_{C(a,\rho)} \frac{f(z)}{z-a}\,dz.$$
It appears that one can go no further without specifying $f$. The alert reader however recognizes that the integral over $C(a,\rho)$ is independent of $\rho$ and so proceeds to let $\rho \to 0$, in which case $z \to a$ and $f(z) \to f(a)$. Computing the integral of $1/(z-a)$ along the way we are led to the hope that
$$\int_C \frac{f(z)}{z-a}\,dz = f(a)\,2\pi i.$$
In support of this conclusion we note that
$$\int_{C(a,\rho)} \frac{f(z)}{z-a}\,dz = \int_{C(a,\rho)} \left\{\frac{f(a)}{z-a} + \frac{f(z)-f(a)}{z-a}\right\}dz = f(a)\int_{C(a,\rho)} \frac{1}{z-a}\,dz + \int_{C(a,\rho)} \frac{f(z)-f(a)}{z-a}\,dz.$$
Now the first term is $f(a)2\pi i$ regardless of $\rho$ while, as $\rho \to 0$, the integrand of the second term approaches $f'(a)$ and the region of integration approaches the point $a$. Regarding this second term, as the integrand remains bounded while the perimeter of $C(a,\rho)$ approaches zero, the value of the integral must itself be zero. End of Proof.
This result is typically known as
Proposition 8.4. Cauchy's Integral Formula. If $f$ is differentiable on and in the closed curve $C$ then
$$f(a) = \frac{1}{2\pi i}\int_C \frac{f(z)}{z-a}\,dz. \qquad (8.7)$$
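Formula (8.7) invites a quick numerical experiment. In this hedged Python/NumPy sketch (the helper name and the choices of f and a are ours) the midpoint rule recovers $f(a) = e^a$ from the contour integral:

```python
import numpy as np

def circle_integral(f, center, rho, n=4000):
    # Midpoint rule along z = center + rho*e^{it}, 0 <= t <= 2*pi.
    t = (np.arange(n) + 0.5) * 2 * np.pi / n
    z = center + rho * np.exp(1j * t)
    dz = 1j * rho * np.exp(1j * t)
    return np.sum(f(z) * dz) * (2 * np.pi / n)

a = 0.3 + 0.2j
val = circle_integral(lambda z: np.exp(z) / (z - a), a, 1.0) / (2j * np.pi)
print(val, np.exp(a))  # the two agree to many digits
```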
The consequences of such a formula run far and deep. We shall delve into only one or two. First, we note that, as $a$ does not lie on $C$, the right hand side is a perfectly smooth function of $a$. Hence, differentiating each side, we find
$$f'(a) = \frac{df(a)}{da} = \frac{1}{2\pi i}\int_C \frac{d}{da}\frac{f(z)}{z-a}\,dz = \frac{1}{2\pi i}\int_C \frac{f(z)}{(z-a)^2}\,dz \qquad (8.8)$$
for each $a$ lying inside $C$. Applying this reasoning $n$ times we arrive at a formula for the $n$th derivative of $f$ at $a$,
$$\frac{d^nf}{da^n}(a) = \frac{n!}{2\pi i}\int_C \frac{f(z)}{(z-a)^{1+n}}\,dz \qquad (8.9)$$
for each a lying inside C. The upshot is that once f is shown to be differentiable it must in fact be
infinitely differentiable. As a simple extension let us consider
$$\frac{1}{2\pi i}\int_C \frac{f(z)}{(z-\lambda_1)(z-\lambda_2)^2}\,dz$$
where $f$ is still assumed differentiable on and in $C$ and where $C$ encircles both $\lambda_1$ and $\lambda_2$. By the curve replacement lemma this integral is the sum
$$\frac{1}{2\pi i}\int_{C_1} \frac{f(z)}{(z-\lambda_1)(z-\lambda_2)^2}\,dz + \frac{1}{2\pi i}\int_{C_2} \frac{f(z)}{(z-\lambda_1)(z-\lambda_2)^2}\,dz$$
where $\lambda_j$ now lies in only $C_j$. As $f(z)/(z-\lambda_2)^2$ is well behaved in $C_1$ we may use (8.7) to conclude that
$$\frac{1}{2\pi i}\int_{C_1} \frac{f(z)}{(z-\lambda_1)(z-\lambda_2)^2}\,dz = \frac{f(\lambda_1)}{(\lambda_1-\lambda_2)^2}.$$
Similarly, as $f(z)/(z-\lambda_1)$ is well behaved in $C_2$ we may use (8.8) to conclude that
$$\frac{1}{2\pi i}\int_{C_2} \frac{f(z)}{(z-\lambda_1)(z-\lambda_2)^2}\,dz = \left.\frac{d}{da}\frac{f(a)}{a-\lambda_1}\right|_{a=\lambda_2}.$$
Proposition 8.5. The Second Residue Theorem. If $g$ is a polynomial with roots $\{\lambda_j\}_{j=1}^h$ of degree $\{m_j\}_{j=1}^h$ and $C$ is a closed curve encircling each of the $\lambda_j$ and $f$ is differentiable on and in $C$ then
$$\int_C \frac{f(z)}{g(z)}\,dz = 2\pi i\sum_{j=1}^{h} \mathrm{res}(f/g,\lambda_j)$$
where
$$\mathrm{res}(f/g,\lambda_j) = \lim_{z\to\lambda_j} \frac{1}{(m_j-1)!}\frac{d^{m_j-1}}{dz^{m_j-1}}\left\{(z-\lambda_j)^{m_j}\frac{f(z)}{g(z)}\right\}$$
is called the residue of $f/g$ at $\lambda_j$, by extension of (7.13).
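The residue-limit formula above can be cross-checked against direct contour integration. The sketch below, in Python/NumPy (standing in for the Matlab residue workflow the notes use), does this for the $g(z) = (z-1)^2(z-3)$ of section 7, whose residues work out by hand to $-1/4$ at the double root and $1/4$ at the simple root:

```python
import numpy as np

def circle_integral(f, center, rho, n=8000):
    # Midpoint rule along z = center + rho*e^{it}.
    t = (np.arange(n) + 0.5) * 2 * np.pi / n
    z = center + rho * np.exp(1j * t)
    return np.sum(f(z) * 1j * rho * np.exp(1j * t)) * (2 * np.pi / n)

g = lambda z: (z - 1) ** 2 * (z - 3)    # double root at 1, simple root at 3
I1 = circle_integral(lambda z: 1 / g(z), 1.0, 0.5)
I3 = circle_integral(lambda z: 1 / g(z), 3.0, 0.5)
print(I1 / (2j * np.pi))   # residue at 1: d/dz [1/(z-3)] at z = 1, i.e. -1/4
print(I3 / (2j * np.pi))   # residue at 3: 1/(3-1)^2, i.e. 1/4
```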
One of the most important instances of this theorem is the formula for the inverse Laplace transform.
8.3. The Inverse Laplace Transform
If $r$ is a rational function with poles $\{\lambda_j\}_{j=1}^h$ then the inverse Laplace transform of $r$ is
$$(\mathcal{L}^{-1}r)(t) \equiv \frac{1}{2\pi i}\int_C r(z)e^{zt}\,dz \qquad (8.10)$$
where $C$ is a closed curve encircling each of the poles of $r$. As $e^{zt}$ is differentiable, the Second Residue Theorem furnishes
$$(\mathcal{L}^{-1}r)(t) = \sum_{j=1}^{h} \mathrm{res}(r(z)e^{zt},\lambda_j). \qquad (8.11)$$
Let us put this lovely formula to the test. We take our examples from chapter 6. Let us first compute the inverse Laplace transform of
$$r(z) = \frac{1}{(z+1)^2}.$$
As $\lambda_1 = -1$ is a double pole, (8.11) delivers
$$(\mathcal{L}^{-1}r)(t) = \mathrm{res}(r(z)e^{zt},-1) = \lim_{z\to -1}\frac{d}{dz}e^{zt} = te^{-t}.$$
This closes the circle on the example begun in 6.3 and continued in exercise 6.1. For our next
example we recall from (6.13) (ignoring the leading 1.965),
$$\mathcal{L}x_1(s) = \frac{s^2 + 1.3s + 0.31}{(s+1/4)^2(s^3+1.8s^2+0.87s+0.11)} = \frac{s^2 + 1.3s + 0.31}{(s+1/4)^2(s+1/2)(s+1/5)(s+11/10)},$$
and so (8.11) dictates that
$$\begin{aligned}
x_1(t) = {}& \left.\frac{d}{ds}\frac{e^{st}(s^2+1.3s+0.31)}{(s+1/2)(s+1/5)(s+11/10)}\right|_{s=-1/4} \\
&+ e^{-t/2}\left.\frac{s^2+1.3s+0.31}{(s+1/4)^2(s+1/5)(s+11/10)}\right|_{s=-1/2} \\
&+ e^{-t/5}\left.\frac{s^2+1.3s+0.31}{(s+1/4)^2(s+1/2)(s+11/10)}\right|_{s=-1/5} \\
&+ e^{-11t/10}\left.\frac{s^2+1.3s+0.31}{(s+1/4)^2(s+1/2)(s+1/5)}\right|_{s=-11/10}.
\end{aligned} \qquad (8.12)$$
Evaluation of these terms indeed confirms our declaration, (6.15), and the work in exercise 7.9.
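Formula (8.10) is also easy to check numerically. In this Python/NumPy sketch (our own stand-in for the Matlab checks of exercise 6) we integrate $r(z)e^{zt}$ for the first example, $r(z) = 1/(z+1)^2$, around a circle enclosing the pole and recover $te^{-t}$:

```python
import numpy as np

def circle_integral(f, center, rho, n=8000):
    # Midpoint rule along z = center + rho*e^{it}.
    t = (np.arange(n) + 0.5) * 2 * np.pi / n
    z = center + rho * np.exp(1j * t)
    return np.sum(f(z) * 1j * rho * np.exp(1j * t)) * (2 * np.pi / n)

for tt in [0.5, 1.0, 2.0]:
    val = circle_integral(lambda z: np.exp(z * tt) / (z + 1) ** 2, -1.0, 1.0) / (2j * np.pi)
    print(val.real, tt * np.exp(-tt))  # the two columns agree
```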
The curve replacement lemma of course gives us considerable freedom in our choice of the curve $C$ used to define the inverse Laplace transform, (8.10). As in applications the poles of $r$ typically lie in the left half of the complex plane (why?), it is common to choose $C$ to be the half circle
$$C = C_L(\rho) \cup C_A(\rho),$$
comprised of the line segment, $C_L$, and arc, $C_A$,
$$C_L(\rho) \equiv \{i\omega : -\rho \le \omega \le \rho\} \quad\text{and}\quad C_A(\rho) \equiv \{\rho e^{i\theta} : \pi/2 \le \theta \le 3\pi/2\},$$
where $\rho$ is chosen large enough to encircle the poles of $r$. With this concrete choice, (8.10) takes the form
$$\begin{aligned}
(\mathcal{L}^{-1}r)(t) &= \frac{1}{2\pi i}\int_{C_L} r(z)e^{zt}\,dz + \frac{1}{2\pi i}\int_{C_A} r(z)e^{zt}\,dz \\
&= \frac{1}{2\pi}\int_{-\rho}^{\rho} r(i\omega)e^{i\omega t}\,d\omega + \frac{1}{2\pi}\int_{\pi/2}^{3\pi/2} r(\rho e^{i\theta})e^{\rho e^{i\theta}t}\rho e^{i\theta}\,d\theta.
\end{aligned} \qquad (8.13)$$
Although this second term appears unwieldy it can be shown to vanish as $\rho \to \infty$, in which case we arrive at
$$(\mathcal{L}^{-1}r)(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} r(i\omega)e^{i\omega t}\,d\omega, \qquad (8.14)$$
the conventional definition of the inverse Laplace transform.
8.4. Exercises
1. Compute the integral of $z^2$ along the parabolic segment $z(t) = t + it^2$ as $t$ ranges from 0 to 1.
2. Evaluate each of the integrals below and state which result you are using, e.g., the barehanded calculation (8.1), Cauchy's Theorem, the Cauchy Integral Formula, or the Second Residue Theorem, and show all of your work.
$$\int_{C(2,1)} \frac{\cos(z)}{z(z-2)}\,dz, \qquad \int_{C(2,1)} \frac{\cos(z)}{z(z+2)}\,dz, \qquad \int_{C(2,1)} \frac{\cos(z)}{z-2}\,dz,$$
$$\int_{C(0,2)} \frac{\cos(z)}{z^3+z}\,dz, \qquad \int_{C(0,2)} \frac{\cos(z)}{z^3}\,dz, \qquad \int_{C(0,2)} \frac{z\cos(z)}{z-1}\,dz.$$
3. Let us confirm the representation (8.4) in the matrix case. More precisely, if $R(z) \equiv (zI-B)^{-1}$ is the resolvent associated with $B$ then (8.4) states that
$$R(z) = \sum_{j=1}^{h}\sum_{k=1}^{m_j} \frac{R_{j,k}}{(z-\lambda_j)^k} \qquad (8.15)$$
where
$$R_{j,k} = \frac{1}{2\pi i}\int_{C_j} R(z)(z-\lambda_j)^{k-1}\,dz.$$
Compute the $R_{j,k}$ per (8.15) for the $B$ in (7.14). Confirm that they agree with those appearing in (7.17).
4. Use (8.11) to compute the inverse Laplace transform of $1/(s^2+2s+2)$.
5. Use the result of the previous exercise to solve, via the Laplace transform, the differential equation
$$x'(t) + x(t) = e^{-t}\sin t, \qquad x(0) = 0.$$
Hint: Take the Laplace transform of each side.
6. Evaluate all expressions in (8.12) in Matlab's symbolic toolbox via syms, diff and subs, and confirm that the final result jibes with (6.15).
7. Return to 6.6 and argue how one deduces (6.26) from (6.25). Evaluate (6.26) and graph it
with ezplot and contrast it with Fig. 6.3.
8. Let us check the limit we declared in going from (8.13) to (8.14). First show that
$$|e^{\rho e^{i\theta}t}| = e^{\rho t\cos\theta}.$$
Next show (perhaps graphically) that
$$\cos\theta \le 1 - 2\theta/\pi \quad\text{when}\quad \pi/2 \le \theta \le \pi.$$
Now deduce that
$$\left|\int_{C_A(\rho)} r(z)e^{zt}\,dz\right| \le \max_{\theta}|r(\rho e^{i\theta})|\,2\rho\int_{\pi/2}^{\pi} e^{\rho t\cos\theta}\,d\theta \le \max_{\theta}|r(\rho e^{i\theta})|\,2\rho\int_{\pi/2}^{\pi} e^{\rho t(1-2\theta/\pi)}\,d\theta \to 0$$
as $\rho \to \infty$.
$$B = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \qquad (9.1)$$
then
$$(sI-B)^{-1} = \frac{1}{(s-1)^2(s-2)}\begin{pmatrix} (s-1)(s-2) & s-2 & 0 \\ 0 & (s-1)(s-2) & 0 \\ 0 & 0 & (s-1)^2 \end{pmatrix} \qquad (9.2)$$
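A quick numerical spot check of (9.2) is worthwhile; here is a hedged Python/NumPy sketch (in place of a Matlab symbolic check) comparing both sides at one value of $s$:

```python
import numpy as np

B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])
s = 5.0
lhs = np.linalg.inv(s * np.eye(3) - B)
rhs = np.array([[(s-1)*(s-2), s-2,         0.0],
                [0.0,         (s-1)*(s-2), 0.0],
                [0.0,         0.0,         (s-1)**2]]) / ((s-1)**2 * (s-2))
print(np.allclose(lhs, rhs))  # True
```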
and so $\lambda_1 = 1$ and $\lambda_2 = 2$ are the two eigenvalues of $B$. Now, to say that $\lambda_jI - B$ is not invertible is to say that its columns are linearly dependent, or, equivalently, that the null space $N(\lambda_jI - B)$ contains more than just the zero vector. We call $N(\lambda_jI - B)$ the $j$th eigenspace and call each of its nonzero members a $j$th eigenvector. The dimension of $N(\lambda_jI - B)$ is referred to as the geometric multiplicity of $\lambda_j$. With respect to $B$ above we compute $N(\lambda_1I - B)$ by solving $(I-B)x = 0$, i.e.,
$$\begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$
Clearly $x_2 = x_3 = 0$ while $x_1$ is free, and so $N(\lambda_1I - B)$ is spanned by $[1\ 0\ 0]^T$.
This function is a scaled version of the even simpler function $1/(1-z)$. This latter function satisfies (recall the $n$-term geometric series (7.8))
$$\frac{1}{1-z} = 1 + z + z^2 + \cdots + z^{n-1} + \frac{z^n}{1-z} \qquad (9.4)$$
for each positive integer $n$. Furthermore, if $|z| < 1$ then $z^n \to 0$ as $n \to \infty$ and so (9.4) becomes, in the limit,
$$\frac{1}{1-z} = \sum_{n=0}^{\infty} z^n,$$
the full geometric series. Returning to (9.3) we write
$$\frac{1}{s-b} = \frac{1/s}{1-b/s} = \frac{1}{s} + \frac{b}{s^2} + \cdots + \frac{b^{n-1}}{s^n} + \frac{b^n}{s^n}\frac{1}{s-b},$$
and hence, so long as $|s| > |b|$, we find
$$\frac{1}{s-b} = \frac{1}{s}\sum_{n=0}^{\infty}\left(\frac{b}{s}\right)^n.$$
This same line of reasoning may be applied in the matrix case. That is,
$$(sI-B)^{-1} = \frac{I}{s} + \frac{B}{s^2} + \cdots + \frac{B^{n-1}}{s^n} + \frac{B^n}{s^n}(sI-B)^{-1}, \qquad (9.5)$$
and hence, so long as $|s| > \|B\|$, where $\|B\|$ is the magnitude of the largest eigenvalue of $B$, we find
$$(sI-B)^{-1} = s^{-1}\sum_{n=0}^{\infty}(B/s)^n. \qquad (9.6)$$
Although (9.6) is indeed a formula for the resolvent you may, regarding computation, not find it any more attractive than the Gauss-Jordan method. We view (9.6) however as an analytical rather than computational tool. More precisely, it facilitates the computation of integrals of $R(s)$. For example, if $C(\rho)$ is the circle of radius $\rho$ centered at the origin and $\rho > \|B\|$ then
$$\int_{C(\rho)} (sI-B)^{-1}\,ds = \sum_{n=0}^{\infty} B^n\int_{C(\rho)} s^{-1-n}\,ds = 2\pi iI. \qquad (9.7)$$
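The Neumann-series view (9.6) can itself be confirmed numerically. This Python/NumPy sketch (our hedged stand-in for a Matlab experiment) accumulates the partial sums $\sum B^n/s^{n+1}$ for the $B$ of (9.1) and compares against the true resolvent:

```python
import numpy as np

B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])
s = 5.0                    # any s with |s| > ||B|| (largest eigenvalue magnitude is 2)
R = np.zeros((3, 3))
term = np.eye(3) / s
for n in range(60):
    R = R + term           # accumulate B^n / s^{n+1}
    term = term @ B / s
print(np.allclose(R, np.linalg.inv(s * np.eye(3) - B)))  # True
```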
This result is essential to our study of the eigenvalue problem. As are the two resolvent identities. Regarding the first we deduce from the simple observation
$$(s_2I-B)^{-1} - (s_1I-B)^{-1} = (s_2I-B)^{-1}(s_1I - B - s_2I + B)(s_1I-B)^{-1}$$
that
$$R(s_2) - R(s_1) = (s_1-s_2)R(s_2)R(s_1). \qquad (9.8)$$
The second resolvent identity records the equally simple observation that $B$ commutes with its resolvent,
$$BR(s) = R(s)B = sR(s) - I. \qquad (9.9)$$
As $R$ is a rational matrix-valued function of $s$, it enjoys, in the sense of (8.15), the partial fraction expansion
$$R(s) = \sum_{j=1}^{h}\sum_{k=1}^{m_j} \frac{R_{j,k}}{(s-\lambda_j)^k} \qquad (9.10)$$
where
$$R_{j,k} = \frac{1}{2\pi i}\int_{C_j} R(z)(z-\lambda_j)^{k-1}\,dz \qquad (9.11)$$
and $C_j$ is a small circle about $\lambda_j$ that neither touches nor encircles any other eigenvalue.
For the $B$ in (9.1) we find
$$R_{1,1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad R_{1,2} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad R_{2,1} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
One notes immediately that these matrices enjoy some amazing properties. For example
$$R_{1,1}^2 = R_{1,1}, \quad R_{2,1}^2 = R_{2,1}, \quad R_{1,1}R_{2,1} = 0, \quad\text{and}\quad R_{1,2}^2 = 0. \qquad (9.12)$$
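The identities in (9.12) are one line apiece to verify by machine; here is a Python/NumPy sketch (standing in for a Matlab check):

```python
import numpy as np

R11 = np.diag([1.0, 1.0, 0.0])
R12 = np.zeros((3, 3)); R12[0, 1] = 1.0
R21 = np.diag([0.0, 0.0, 1.0])
print(np.allclose(R11 @ R11, R11),      # R11 is a projection
      np.allclose(R21 @ R21, R21),      # so is R21
      np.allclose(R11 @ R21, 0 * R11),  # they annihilate each other
      np.allclose(R12 @ R12, 0 * R12))  # R12 is nilpotent
```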
We now show that this is no accident. We shall find it true in general as a consequence of (9.11) and the first resolvent identity.
Proposition 9.1. $R_{j,1}^2 = R_{j,1}$.
Proof: Recall that the $C_j$ appearing in (9.11) is any circle about $\lambda_j$ that neither touches nor encircles any other root. Suppose that $C_j$ and $C_j'$ are two such circles and $C_j'$ encloses $C_j$. Now
$$R_{j,1} = \frac{1}{2\pi i}\int_{C_j} R(z)\,dz = \frac{1}{2\pi i}\int_{C_j'} R(w)\,dw$$
and so
$$\begin{aligned}
R_{j,1}^2 &= \frac{1}{(2\pi i)^2}\int_{C_j} R(z)\,dz\int_{C_j'} R(w)\,dw \\
&= \frac{1}{(2\pi i)^2}\int_{C_j}\int_{C_j'} R(z)R(w)\,dw\,dz \\
&= \frac{1}{(2\pi i)^2}\int_{C_j}\int_{C_j'} \frac{R(z)-R(w)}{w-z}\,dw\,dz \\
&= \frac{1}{(2\pi i)^2}\left\{\int_{C_j} R(z)\int_{C_j'}\frac{1}{w-z}\,dw\,dz - \int_{C_j'} R(w)\int_{C_j}\frac{1}{w-z}\,dz\,dw\right\} \\
&= \frac{1}{2\pi i}\int_{C_j} R(z)\,dz = R_{j,1}.
\end{aligned}$$
We used the first resolvent identity, (9.8), in moving from the second to the third line. In moving from the fourth to the fifth we used only
$$\int_{C_j'}\frac{1}{w-z}\,dw = 2\pi i \quad\text{and}\quad \int_{C_j}\frac{1}{w-z}\,dz = 0. \qquad (9.13)$$
The latter integrates to zero because $C_j$ does not encircle $w$. End of Proof.
Recalling definition 5.1, that matrices that equal their squares are projections, we adopt the abbreviation
$$P_j \equiv R_{j,1}.$$
With respect to the product $P_jP_k$, for $j \ne k$, the calculation runs along the same lines. The difference comes in (9.13) where, as $C_j$ lies completely outside of $C_k$, both integrals are zero. Hence,
Proposition 9.2. If $j \ne k$ then $P_jP_k = 0$.
Along the same lines we define
$$D_j \equiv R_{j,2}$$
and prove
Proposition 9.3. If $1 \le k \le m_j - 1$ then $D_j^k = R_{j,k+1}$, and $D_j^{m_j} = 0$.
Proof: For $k$ and $\ell$ greater than or equal to one,
$$\begin{aligned}
R_{j,k+1}R_{j,\ell+1} &= \frac{1}{(2\pi i)^2}\int_{C_j} R(z)(z-\lambda_j)^k\,dz\int_{C_j'} R(w)(w-\lambda_j)^{\ell}\,dw \\
&= \frac{1}{(2\pi i)^2}\int_{C_j}\int_{C_j'} R(z)R(w)(z-\lambda_j)^k(w-\lambda_j)^{\ell}\,dw\,dz \\
&= \frac{1}{(2\pi i)^2}\int_{C_j}\int_{C_j'} \frac{R(z)-R(w)}{w-z}(z-\lambda_j)^k(w-\lambda_j)^{\ell}\,dw\,dz \\
&= \frac{1}{(2\pi i)^2}\int_{C_j} R(z)(z-\lambda_j)^k\int_{C_j'}\frac{(w-\lambda_j)^{\ell}}{w-z}\,dw\,dz \\
&\quad - \frac{1}{(2\pi i)^2}\int_{C_j'} R(w)(w-\lambda_j)^{\ell}\int_{C_j}\frac{(z-\lambda_j)^k}{w-z}\,dz\,dw \\
&= \frac{1}{2\pi i}\int_{C_j} R(z)(z-\lambda_j)^{k+\ell}\,dz = R_{j,k+\ell+1},
\end{aligned}$$
because
$$\int_{C_j'}\frac{(w-\lambda_j)^{\ell}}{w-z}\,dw = 2\pi i(z-\lambda_j)^{\ell} \quad\text{and}\quad \int_{C_j}\frac{(z-\lambda_j)^k}{w-z}\,dz = 0. \qquad (9.14)$$
With $k = \ell = 1$ we have shown $R_{j,2}^2 = R_{j,3}$, i.e., $D_j^2 = R_{j,3}$. Similarly, with $k = 1$ and $\ell = 2$ we find $R_{j,2}R_{j,3} = R_{j,4}$, i.e., $D_j^3 = R_{j,4}$, and so on. Finally, at $k + \ell = m_j$ we find
$$D_j^{m_j} = R_{j,m_j+1} = \frac{1}{2\pi i}\int_{C_j} R(z)(z-\lambda_j)^{m_j}\,dz = 0$$
by Cauchy's Theorem. End of Proof.
Of course this last result would be trivial if in fact $D_j = 0$. Note that if $m_j > 1$ then
$$D_j^{m_j-1} = R_{j,m_j} = \frac{1}{2\pi i}\int_{C_j} R(z)(z-\lambda_j)^{m_j-1}\,dz \ne 0,$$
for the integrand then has a term proportional to $1/(z-\lambda_j)$, which we know, by (8.1), leaves a nonzero residue.
With this we now have the sought after expansion
$$R(z) = \sum_{j=1}^{h}\left\{\frac{1}{z-\lambda_j}P_j + \sum_{k=1}^{m_j-1}\frac{1}{(z-\lambda_j)^{k+1}}D_j^k\right\}. \qquad (9.15)$$
To find this we note that $(zI-B)^{-1} = (I-B/z)^{-1}/z$ and so (9.16) may be written
$$(I-B/z)^{-1} = \sum_{j=1}^{h}\frac{zP_j}{z-\lambda_j} = \sum_{j=1}^{h}\frac{P_j}{1-\lambda_j/z}. \qquad (9.17)$$
If we now set $z = k/t$, where $k$ is a positive integer, and use the fact that the $P_j$ are projections that annihilate one another then we arrive first at
$$(I-(t/k)B)^{-k} = \sum_{j=1}^{h}\frac{P_j}{(1-(t/k)\lambda_j)^k} \qquad (9.18)$$
and then, letting $k \to \infty$, at
$$\lim_{k\to\infty}(I-(t/k)B)^{-k} = \sum_{j=1}^{h}\exp(\lambda_jt)P_j, \qquad (9.19)$$
and naturally refer to this limit as $\exp(Bt)$. We will confirm that this solves our dynamics problems by checking, in the Exercises, that $(\exp(Bt))' = B\exp(Bt)$.
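The limit (9.19) is easy to watch converge numerically. The Python/NumPy sketch below (a hedged stand-in for a Matlab experiment; the example matrix is ours) computes $(I-(t/k)B)^{-k}$ for growing $k$ and compares against $\exp(Bt)$ assembled from the spectral representation of a symmetric $B$:

```python
import numpy as np

B = np.array([[0.0, 1.0],
              [1.0, 0.0]])     # symmetric, eigenvalues +1 and -1
t = 1.0

# exp(Bt) via the spectral representation of a symmetric matrix
lam, Q = np.linalg.eigh(B)
expBt = Q @ np.diag(np.exp(lam * t)) @ Q.T

for k in [10, 100, 1000]:
    approx = np.linalg.matrix_power(np.linalg.inv(np.eye(2) - (t / k) * B), k)
    print(k, np.max(np.abs(approx - expBt)))  # the error shrinks like 1/k
```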
9.3. The Spectral Representation
With just a little bit more work we shall arrive at a similar expansion for $B$ itself. We begin by applying the second resolvent identity, (9.9), to $P_j$. More precisely, we note that (9.9) implies that
$$\begin{aligned}
BP_j = P_jB &= \frac{1}{2\pi i}\int_{C_j}(zR(z)-I)\,dz \\
&= \frac{1}{2\pi i}\int_{C_j} zR(z)\,dz \\
&= \frac{1}{2\pi i}\int_{C_j} R(z)(z-\lambda_j)\,dz + \frac{\lambda_j}{2\pi i}\int_{C_j} R(z)\,dz \\
&= D_j + \lambda_jP_j,
\end{aligned} \qquad (9.20)$$
where the second equality is due to Cauchy's Theorem and the third arises from adding and subtracting $\lambda_jR(z)$. Summing (9.20) over $j$ we find
$$B\sum_{j=1}^{h}P_j = \sum_{j=1}^{h}\lambda_jP_j + \sum_{j=1}^{h}D_j. \qquad (9.21)$$
We can go one step further, namely the evaluation of the first sum. This stems from (9.7) where we integrated $R(s)$ over a circle $C(\rho)$ where $\rho > \|B\|$. The connection to the $P_j$ is made by the residue theorem. More precisely,
$$\int_{C(\rho)} R(z)\,dz = 2\pi i\sum_{j=1}^{h}P_j.$$
Comparing this to (9.7) we find
$$\sum_{j=1}^{h}P_j = I, \qquad (9.22)$$
and so (9.21) becomes
$$B = \sum_{j=1}^{h}\lambda_jP_j + \sum_{j=1}^{h}D_j. \qquad (9.23)$$
In addition, (9.20) delivers
$$D_j = (B-\lambda_jI)P_j, \qquad (9.24)$$
and so, as $P_j^{m_j} = P_j$ and $D_j^{m_j} = 0$, we find
$$(B-\lambda_jI)^{m_j}P_j = 0. \qquad (9.25)$$
For this reason we call the range of $P_j$ the $j$th generalized eigenspace, call each of its nonzero members a $j$th generalized eigenvector and refer to the dimension of $R(P_j)$ as the algebraic multiplicity of $\lambda_j$. Let us confirm that the eigenspace is indeed a subspace of the generalized eigenspace.
Proposition 9.4. $N(\lambda_jI-B) \subset R(P_j)$, with equality if and only if $m_j = 1$.
Proof: For each $e \in N(\lambda_jI-B)$ we show that $P_je = e$. To see this note that $Be = \lambda_je$, so $(sI-B)e = (s-\lambda_j)e$, so $(sI-B)^{-1}e = (s-\lambda_j)^{-1}e$, and so
$$P_je = \frac{1}{2\pi i}\int_{C_j}(sI-B)^{-1}e\,ds = \frac{1}{2\pi i}\int_{C_j}\frac{1}{s-\lambda_j}e\,ds = e.$$
Counting dimensions, the fact that the $P_j$ sum to the identity guarantees that
$$\sum_{j=1}^{h}n_j = n.$$
We then denote by $E_j = [e_{j,1}\ e_{j,2}\ \cdots\ e_{j,n_j}]$ a matrix composed of basis vectors of $R(P_j)$. We note that
$$Be_{j,k} = \lambda_je_{j,k},$$
and so
$$BE = E\Lambda \quad\text{where}\quad E = [E_1\ E_2\ \cdots\ E_h] \qquad (9.26)$$
and $\Lambda$ is the diagonal matrix of eigenvalues, each $\lambda_j$ appearing $n_j$ times.
$$B = \lambda_1P_1 + \lambda_2P_2 = i\begin{pmatrix} 1/2 & -i/2 \\ i/2 & 1/2 \end{pmatrix} - i\begin{pmatrix} 1/2 & i/2 \\ -i/2 & 1/2 \end{pmatrix}. \qquad (9.28)$$
As $m_1 = m_2 = 1$ we see that $B$ is semisimple. By inspection we see that $R(P_1)$ and $R(P_2)$ are spanned by
$$e_1 = \begin{pmatrix} 1 \\ i \end{pmatrix} \quad\text{and}\quad e_2 = \begin{pmatrix} 1 \\ -i \end{pmatrix}$$
respectively. It follows that
$$E = \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}, \qquad E^{-1} = \frac{1}{2}\begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix},$$
and so
$$B = E\Lambda E^{-1} = \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}\frac{1}{2}\begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix}.$$
Consider next,
$$B = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{pmatrix}. \qquad (9.29)$$
The Matlab command [E,L] = eig(B) returns
$$E = \begin{pmatrix} 1 & 0 & 1/\sqrt{3} \\ 0 & 1 & 1/\sqrt{3} \\ 0 & 0 & 1/\sqrt{3} \end{pmatrix} \quad\text{and}\quad L = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
and so $B = ELE^{-1}$. We note that although we did not explicitly compute the resolvent we do not have explicit information about the orders of its poles. However, as Matlab has returned 2 eigenvalues, $\lambda_1 = 1$ and $\lambda_2 = 2$, with geometric multiplicities $n_1 = 2$ (the first two columns of $E$ are linearly independent) and $n_2 = 1$, it follows that $m_1 = m_2 = 1$.
9.5. The Characteristic Polynomial
Our understanding of eigenvalues as poles of the resolvent has permitted us to exploit the beautiful field of complex integration and arrive at a (hopefully) clear understanding of the Spectral Representation. It is time however to take note of alternate paths to this fundamental result. In fact the most common understanding of eigenvalues is not via poles of the resolvent but rather as zeros of the characteristic polynomial. Of course an eigenvalue is an eigenvalue and mathematics will not permit contradictory definitions. The starting point is as above: an eigenvalue of the $n$-by-$n$ matrix $B$ is a value of $s$ for which $(sI-B)$ is not invertible. Of the many tests for invertibility let us here recall that a matrix is not invertible if and only if it has a zero pivot. If we denote the $j$th pivot of $B$ by $p_j(B)$ we may define the determinant of $B$ as
$$\det(B) \equiv \prod_{j=1}^{n} p_j(B).$$
For example, for the 2-by-2 matrix
$$B = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
we find $\det(B) = ad - bc$.
$$sI - B = \begin{pmatrix} s-a & -b \\ -c & s-d \end{pmatrix}$$
3. We suggested in Chapter 6 that $x(t) = \exp(Bt)x(0)$ is the solution of the dynamical system $x'(t) = Bx(t)$. Let us confirm that our representation (for semisimple $B$)
$$\exp(Bt) = \sum_{j=1}^{h}\exp(\lambda_jt)P_j \qquad (9.30)$$
indeed satisfies
$$(\exp(Bt))' = B\exp(Bt).$$
To do this merely compare the $t$ derivative of (9.30) with the product of
$$B = \sum_{j=1}^{h}\lambda_jP_j$$
and (9.30).
4. Let us compute the matrix exponential, $\exp(Bt)$, for the nonsemisimple matrix $B$ in (9.1) by following the argument that took us from (9.16) to (9.19). To begin, deduce from
$$(zI-B)^{-1} = \frac{1}{z-\lambda_1}P_1 + \frac{1}{(z-\lambda_1)^2}D_1 + \frac{1}{z-\lambda_2}P_2$$
that
$$(I-B/z)^{-1} = \frac{1}{1-\lambda_1/z}P_1 + \frac{1}{z(1-\lambda_1/z)^2}D_1 + \frac{1}{1-\lambda_2/z}P_2.$$
Next take explicit products of this matrix with itself and deduce (using exercise 2) that
$$(I-B/z)^{-k} = \frac{1}{(1-\lambda_1/z)^k}P_1 + \frac{k}{z(1-\lambda_1/z)^{k+1}}D_1 + \frac{1}{(1-\lambda_2/z)^k}P_2.$$
Now set $z = k/t$ and find
$$(I-(t/k)B)^{-k} = \frac{1}{(1-(t/k)\lambda_1)^k}P_1 + \frac{t}{(1-(t/k)\lambda_1)^{k+1}}D_1 + \frac{1}{(1-(t/k)\lambda_2)^k}P_2,$$
and conclude, on letting $k \to \infty$, that
$$\exp(Bt) = e^{\lambda_1t}P_1 + te^{\lambda_1t}D_1 + e^{\lambda_2t}P_2.$$
5. The matrix
$$B = \begin{pmatrix} 2 & 8 & 6 & 4 \\ 4 & 2 & 8 & 6 \\ 6 & 4 & 2 & 8 \\ 8 & 6 & 4 & 2 \end{pmatrix}$$
is called circulant because each column is a shifted version of its predecessor. First compare the results of eig(B) and F4*B(:,1) and then confirm that
$$B = \overline{F_4}\,\mathrm{diag}(F_4B(:,1))\,F_4/4.$$
Why must we divide by 4? Now check the analogous formula on a 5-by-5 circulant matrix of your choice. Submit a marked-up diary of your computations.
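The exercise above is intended for Matlab; here is a hedged Python/NumPy translation (our own sketch, with numpy's DFT conventions: F @ x is the unnormalized DFT of x, and F multiplied by its conjugate transpose gives 4I, which is why the division by 4 appears):

```python
import numpy as np

c = np.array([2.0, 4.0, 6.0, 8.0])          # first column of the circulant
B = np.array([[2, 8, 6, 4],
              [4, 2, 8, 6],
              [6, 4, 2, 8],
              [8, 6, 4, 2]], dtype=float)

F = np.fft.fft(np.eye(4))                   # the 4-by-4 DFT matrix
lam = F @ c                                 # candidate eigenvalues: DFT of column 1
print(lam)
print(np.linalg.eigvals(B))                 # same values, possibly in another order

# Reconstruction; F @ F.conj().T = 4 I explains the division by 4.
print(np.allclose(F.conj().T @ np.diag(lam) @ F / 4, B))  # True
```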
6. Let us return to exercise 6.7 and study the eigenvalues of $B$ as functions of the damping $d$ when each mass and stiffness is 1. In this case
$$B = \begin{pmatrix} 0 & I \\ -S & -dS \end{pmatrix} \quad\text{where}\quad S = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}.$$
(i) Write and execute a Matlab program that plots, as below, the 6 eigenvalues of $B$ as $d$ ranges from 0 to 1.1 in increments of 0.005.
(Plot: eigenvalue trajectories in the complex plane; horizontal axis real, vertical axis imaginary.)
Figure 9.1. Trajectories of eigenvalues of the damped chain as the damping is increased.
(ii) Argue that if $[u;v]^T$ is an eigenvector of $B$ with eigenvalue $\lambda$ then $v = \lambda u$ and $-Su - dSv = \lambda v$. Substitute the former into the latter and deduce that
$$Su = \frac{-\lambda^2}{1+d\lambda}u.$$
(iii) Confirm, from Exercise 7.10, that the eigenvalues of $S$ are $\lambda_1 = 2+\sqrt{2}$, $\lambda_2 = 2$ and $\lambda_3 = 2-\sqrt{2}$ and hence that the six eigenvalues of $B$ are the roots of the 3 quadratics
$$\lambda^2 + d\lambda_j\lambda + \lambda_j = 0, \quad\text{i.e.,}\quad \lambda = \frac{-d\lambda_j \pm \sqrt{(d\lambda_j)^2 - 4\lambda_j}}{2}.$$
Deduce from the projections in Exer. 7.10 the 6 associated eigenvectors of $B$.
(iv) Now argue that when $d$ obeys $(d\lambda_j)^2 = 4\lambda_j$ a complex pair of eigenvalues of $B$ collide on the real line and give rise to a nonsemisimple eigenvalue. Describe Figure 9.1 in light of your analysis.
For example, if
$$B = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$
is symmetric then
$$R(s) = \frac{1}{s(s-3)}\begin{pmatrix} s-2 & 1 & 1 \\ 1 & s-2 & 1 \\ 1 & 1 & s-2 \end{pmatrix} \qquad (10.1)$$
and we see that the poles, 0 and 3, are indeed real. In general, suppose $B = B^T$ and $Bx = \lambda x$. Conjugating each side gives $B\bar{x} = \bar{\lambda}\bar{x}$, where we have used the fact that $B$ is real. We now take the inner product of the first with $\bar{x}$ and the second with $x$ and find
$$\bar{x}^TBx = \lambda\|x\|^2 \quad\text{and}\quad x^TB\bar{x} = \bar{\lambda}\|x\|^2. \qquad (10.2)$$
The left hand side of the first is a scalar and so equal to its transpose,
$$\bar{x}^TBx = (\bar{x}^TBx)^T = x^TB^T\bar{x} = x^TB\bar{x}, \qquad (10.3)$$
where in the last step we used $B = B^T$. Combining (10.2) and (10.3) we see
$$\lambda\|x\|^2 = \bar{\lambda}\|x\|^2.$$
Finally, as $\|x\| > 0$ we may conclude that $\lambda = \bar{\lambda}$. End of Proof.
We next establish
Proposition 10.2. If $B$ is real and $B = B^T$ then each eigenprojection, $P_j$, and each eigennilpotent, $D_j$, is real and symmetric.
Proof: From the fact (please prove it) that the transpose of the inverse is the inverse of the transpose we learn that
$$\{(sI-B)^{-1}\}^T = \{(sI-B)^T\}^{-1} = (sI-B)^{-1},$$
i.e., the resolvent of a symmetric matrix is symmetric. Hence
$$P_j^T = \left(\frac{1}{2\pi i}\int_{C_j}(sI-B)^{-1}\,ds\right)^T = \frac{1}{2\pi i}\int_{C_j}\{(sI-B)^{-1}\}^T\,ds = \frac{1}{2\pi i}\int_{C_j}(sI-B)^{-1}\,ds = P_j.$$
By the same token, we find that each $D_j^T = D_j$. The elements of $P_j$ and $D_j$ are real because they are residues of real functions evaluated at real poles. End of Proof.
The next result will banish the nilpotent component.
Proposition 10.3. The zero matrix is the only real symmetric nilpotent matrix.
Proof: Suppose that $D$ is $n$-by-$n$ and $D = D^T$ and $D^m = 0$ for some positive integer $m$. We show that $D^{m-1} = 0$ by showing that every vector lies in its null space. To wit, if $x \in \mathbb{R}^n$ then
$$\|D^{m-1}x\|^2 = x^T(D^{m-1})^TD^{m-1}x = x^TD^{m-1}D^{m-1}x = x^TD^{m-2}D^mx = 0.$$
As $D^{m-1}x = 0$ for every $x$ it follows (recall Exercise 3.3) that $D^{m-1} = 0$. Continuing in this fashion we find $D^{m-2} = 0$ and so, eventually, $D = 0$. End of Proof.
We have now established the key result of this chapter.
Proposition 10.4. If $B$ is real and symmetric then
$$B = \sum_{j=1}^{h}\lambda_jP_j \qquad (10.4)$$
where the $\lambda_j$ are real and the $P_j$ are real orthogonal projections that sum to the identity and whose pairwise products vanish.
One indication that things are simpler when using the spectral representation is
$$B^{100} = \sum_{j=1}^{h}\lambda_j^{100}P_j.$$
As this holds for all powers it even holds for power series. As a result,
$$\exp(B) = \sum_{j=1}^{h}\exp(\lambda_j)P_j. \qquad (10.5)$$
It also becomes much easier to solve $Bx = b$: expanding $b = \sum_{j=1}^{h}P_jb$ we find, so long as no $\lambda_j$ vanishes,
$$x = \sum_{j=1}^{h}P_jx = \sum_{j=1}^{h}\frac{1}{\lambda_j}P_jb. \qquad (10.6)$$
We clearly run into trouble when one of the eigenvalues vanishes. This, of course, is to be expected. For a zero eigenvalue indicates a nontrivial null space which signifies dependencies in the columns of $B$ and hence the lack of a unique solution to $Bx = b$.
Another way in which (10.6) may be viewed is to note that, when $B$ is symmetric, (9.15) takes the form
$$(zI-B)^{-1} = \sum_{j=1}^{h}\frac{1}{z-\lambda_j}P_j.$$
Now if 0 is not an eigenvalue we may set $z = 0$ in the above and arrive at
$$B^{-1} = \sum_{j=1}^{h}\frac{1}{\lambda_j}P_j, \qquad (10.7)$$
and hence $B^{-1}b = \sum_{j=1}^{h}\frac{1}{\lambda_j}P_jb$, as in (10.6). With (10.7) we have finally reached a point where we can begin to define an inverse
even for matrices with dependent columns, i.e., with a zero eigenvalue. We simply exclude the offending term in (10.7). Supposing that $\lambda_h = 0$ we define the pseudoinverse of $B$ to be
$$B^+ \equiv \sum_{j=1}^{h-1}\frac{1}{\lambda_j}P_j.$$
Let us now see whether it is deserving of its name. More precisely, when $b \in R(B)$ we would expect that $x = B^+b$ indeed satisfies $Bx = b$. Well,
$$BB^+b = B\sum_{j=1}^{h-1}\frac{1}{\lambda_j}P_jb = \sum_{j=1}^{h-1}\frac{1}{\lambda_j}BP_jb = \sum_{j=1}^{h-1}\frac{1}{\lambda_j}\lambda_jP_jb = \sum_{j=1}^{h-1}P_jb.$$
As $b = \sum_{j=1}^{h}P_jb$ and $P_hb = 0$ when $b \in R(B)$, it follows that $BB^+b = b$, as desired.
In fact each nonzero member of $R(P_1)$ is an eigenvector of $B$ associated with $\lambda_1$, for if $x = P_1x$ then
$$Bx = \sum_{j=1}^{h}\lambda_jP_jP_1x = \lambda_1P_1x = \lambda_1x,$$
and likewise for the remaining $R(P_j)$. We may therefore choose, for each $R(P_j)$, an orthonormal basis of $n_j$ elements. That these dimensions indeed sum to the ambient dimension, $n$, follows directly from the fact that the underlying $P_j$ sum to the $n$-by-$n$ identity matrix. We have just proven
Proposition 10.5. If $B$ is real and symmetric and $n$-by-$n$ then $B$ has a set of $n$ linearly independent eigenvectors.
Getting back to a more concrete version of (10.5) we now assemble matrices from the individual bases
$$E_j \equiv [x_{j,1}\ x_{j,2}\ \ldots\ x_{j,n_j}]$$
and note, once again, that $P_j = E_j(E_j^TE_j)^{-1}E_j^T$, and so
$$B = \sum_{j=1}^{h}\lambda_jE_j(E_j^TE_j)^{-1}E_j^T. \qquad (10.9)$$
I understand that you may feel a little underwhelmed with this formula. If we work a bit harder we can remove the presence of the annoying inverse. What I mean is that it is possible to choose a basis for each $R(P_j)$ for which each of the corresponding $E_j$ satisfies $E_j^TE_j = I$. As this construction is fairly general let us devote a separate section to it.
10.2. Gram-Schmidt Orthogonalization
Suppose that $M$ is an $m$-dimensional subspace with basis
$$\{x_1,\ldots,x_m\}.$$
The Gram-Schmidt steps replace this with an orthonormal basis $\{q_1,\ldots,q_m\}$ for $M$:
(GS1) Set $y_1 = x_1$ and $q_1 = y_1/\|y_1\|$.
(GS2) Set $y_2 = x_2 - q_1q_1^Tx_2$ and $q_2 = y_2/\|y_2\|$.
Continuing in this fashion,
(GSm) Set
$$y_m = x_m - \sum_{j=1}^{m-1}q_jq_j^Tx_m$$
and $q_m = y_m/\|y_m\|$.
To take a simple example, let us orthogonalize the following basis for $\mathbb{R}^3$,
$$x_1 = [1\ 0\ 0]^T, \quad x_2 = [1\ 1\ 0]^T, \quad x_3 = [1\ 1\ 1]^T.$$
(GS1) $q_1 = y_1 = x_1$.
(GS2) $y_2 = x_2 - q_1q_1^Tx_2 = [0\ 1\ 0]^T$, and so $q_2 = y_2$.
(GS3) $y_3 = x_3 - q_1q_1^Tx_3 - q_2q_2^Tx_3 = [0\ 0\ 1]^T$, and so $q_3 = y_3$.
We have arrived at
$$q_1 = [1\ 0\ 0]^T, \quad q_2 = [0\ 1\ 0]^T, \quad q_3 = [0\ 0\ 1]^T. \qquad (10.10)$$
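The steps (GS1) through (GSm) translate directly into code; here is a short Python/NumPy sketch (the function name is ours) run on the example above:

```python
import numpy as np

def gram_schmidt(X):
    """Columns of X form a basis; return Q with orthonormal columns, same span."""
    Q = []
    for x in X.T:
        y = x.astype(float).copy()
        for q in Q:
            y -= q * (q @ x)        # subtract the projection onto each earlier q
        Q.append(y / np.linalg.norm(y))
    return np.array(Q).T

X = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
Q = gram_schmidt(X)
print(Q)                                 # the identity, as in (10.10)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True
```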
Once the idea is grasped the actual calculations are best left to a machine. Matlab accomplishes
this via the orth command. Its implementation is a bit more sophisticated than a blind run through
our steps (GS1) through (GSm). As a result, there is no guarantee that it will return the same
basis. For example
>> X=[1 1 1;0 1 1;0 0 1];
>> Q=orth(X)
Q =
0.7370 -0.5910 0.3280
0.5910 0.3280 -0.7370
0.3280 0.7370 0.5910
This ambiguity does not bother us, for one orthogonal basis is as good as another. We next put
this into practice, via (10.9).
With respect to such an orthonormal basis each projection takes the simple form
$$P_j = Q_jQ_j^T = \sum_{k=1}^{n_j}q_{j,k}q_{j,k}^T,$$
and so (10.9) becomes
$$B = \sum_{j=1}^{h}\lambda_jQ_jQ_j^T = \sum_{j=1}^{h}\lambda_j\sum_{k=1}^{n_j}q_{j,k}q_{j,k}^T. \qquad (10.11)$$
This is the spectral representation in perhaps its most detailed dress. There exists, however, still another form! It is a form that you are likely to see in future engineering courses and is achieved by assembling the $Q_j$ into a single $n$-by-$n$ orthonormal matrix
$$Q = [Q_1\ \cdots\ Q_h].$$
Having orthonormal columns it follows that $Q^TQ = I$. $Q$ being square, it follows in addition that $Q^T = Q^{-1}$. Now
$$Bq_{j,k} = \lambda_jq_{j,k}$$
may be encoded in matrix terms via
$$BQ = Q\Lambda \qquad (10.12)$$
where $\Lambda$ is the $n$-by-$n$ diagonal matrix whose first $n_1$ diagonal terms are $\lambda_1$, whose next $n_2$ diagonal terms are $\lambda_2$, and so on. That is, each $\lambda_j$ is repeated according to its multiplicity. Multiplying each side of (10.12), from the right, by $Q^T$ we arrive at
$$B = Q\Lambda Q^T. \qquad (10.13)$$
Multiplying from the left instead we find
$$Q^TBQ = \Lambda. \qquad (10.14)$$
Let us illustrate this on the
$$B = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$
of the last chapter. Recall that the eigenspace associated with $\lambda_1 = 0$ had basis
$$e_{1,1} = [1\ {-1}\ 0]^T \quad\text{and}\quad e_{1,2} = [1\ 0\ {-1}]^T,$$
which Gram-Schmidt orthonormalizes to
$$q_{1,1} = \frac{1}{\sqrt{2}}[1\ {-1}\ 0]^T \quad\text{and}\quad q_{1,2} = \frac{1}{\sqrt{6}}[1\ 1\ {-2}]^T,$$
while $\lambda_2 = 3$ carries the unit eigenvector $q_2 = \frac{1}{\sqrt{3}}[1\ 1\ 1]^T$. Hence
$$Q = [q_{1,1}\ q_{1,2}\ q_2] = \frac{1}{\sqrt{6}}\begin{pmatrix} \sqrt{3} & 1 & \sqrt{2} \\ -\sqrt{3} & 1 & \sqrt{2} \\ 0 & -2 & \sqrt{2} \end{pmatrix} \quad\text{and}\quad \Lambda = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$
In this section we will derive two extremely useful characterizations of the largest eigenvalue. The first was discovered by Lord Rayleigh in his research on sound and vibration. To begin, we denote the associated orthonormal eigenvectors of $B$ by
$$q_1, q_2, \ldots, q_n$$
and note that each $x \in \mathbb{R}^n$ enjoys the expansion
$$x = (x^Tq_1)q_1 + (x^Tq_2)q_2 + \cdots + (x^Tq_n)q_n. \qquad (10.15)$$
Ordering the eigenvalues so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, it follows that
$$x^TBx = \sum_{j=1}^{n}\lambda_j(x^Tq_j)^2 \le \lambda_1\sum_{j=1}^{n}(x^Tq_j)^2 = \lambda_1x^Tx. \qquad (10.16)$$
That is, $x^TBx \le \lambda_1x^Tx$ for every $x \in \mathbb{R}^n$. This, together with the fact that $q_1^TBq_1 = \lambda_1q_1^Tq_1$, establishes
Proposition 10.6. Rayleigh's Principle. If $B$ is symmetric then its largest eigenvalue is
$$\lambda_1 = \max_{x\ne 0}\frac{x^TBx}{x^Tx}. \qquad (10.17)$$
For the second characterization we apply $B^k$ to the expansion (10.15) and normalize, arriving at
$$\frac{B^kx}{\|B^kx\|} = \mathrm{sign}(\lambda_1^kx^Tq_1)q_1 + O((\lambda_2/\lambda_1)^k),$$
and note that the latter term goes to zero with increasing $k$ so long as $\lambda_1$ is strictly greater than $\lambda_2$. If, in addition, we assume that $\lambda_1 > 0$, then the first term does not depend on $k$ and we arrive at
Proposition 10.7. The Power Method. If $B$ is symmetric and its greatest eigenvalue is simple and positive and the initial guess, $x$, is not orthogonal to $q_1$ then
$$\lim_{k\to\infty}\frac{B^kx}{\|B^kx\|} = \mathrm{sign}(x^Tq_1)q_1.$$
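The power method is a few lines of code. The following Python/NumPy sketch (a hedged stand-in for a Matlab loop; the test matrix is ours, with eigenvalues 3 and 1 and $q_1 = [1,1]^T/\sqrt{2}$) normalizes at each step to avoid overflow:

```python
import numpy as np

B = np.array([[2.0, 1.0],
              [1.0, 2.0]])     # eigenvalues 3 and 1; q1 = [1, 1]/sqrt(2)

x = np.array([1.0, 0.0])       # initial guess, not orthogonal to q1
for _ in range(50):
    x = B @ x
    x = x / np.linalg.norm(x)  # keep the iterate on the unit sphere

print(x)                       # approaches [1, 1]/sqrt(2)
print(x @ B @ x / (x @ x))     # Rayleigh quotient approaches 3
```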
10.5. Exercises
1. The stiffness matrix associated with the unstable swing of figure 2.2 is
$$S = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
(i) Find the three distinct eigenvalues, $\lambda_1 = 1$, $\lambda_2 = 2$, $\lambda_3 = 0$, along with their associated eigenvectors $e_{1,1}$, $e_{1,2}$, $e_{2,1}$, $e_{3,1}$, and projection matrices, $P_1$, $P_2$, $P_3$. What are the respective geometric multiplicities?
(ii) Use the folk theorem that states that in order to transform a matrix it suffices to transform its eigenvalues to arrive at a guess for $S^{1/2}$. Show that your guess indeed satisfies $S^{1/2}S^{1/2} = S$. Does your $S^{1/2}$ have anything in common with the element-wise square root of $S$?
(iii) Show that $R(P_3) = N(S)$.
(iv) Assemble
$$S^+ = \frac{1}{1}P_1 + \frac{1}{2}P_2$$
and check your result against pinv(S) in Matlab.
(v) Use $S^+$ to solve $Sx = f$ where $f = [0\ 1\ 0\ 2]^T$ and carefully draw before and after pictures of the unloaded and loaded swing.
(vi) It can be very useful to sketch each of the eigenvectors in this fashion. In fact, a movie is the way to go. Please run the Matlab truss demo by typing truss and view all 12 of the movies. Please sketch the 4 eigenvectors of (i) by showing how they deform the swing.
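Part (iv) of exercise 1 can also be checked outside Matlab. Here is a hedged Python/NumPy sketch (assuming the sign conventions above for the swing's $S$), comparing the spectral assembly of $S^+$ against the library pseudoinverse:

```python
import numpy as np

S = np.array([[ 1.0, 0.0, -1.0, 0.0],
              [ 0.0, 1.0,  0.0, 0.0],
              [-1.0, 0.0,  1.0, 0.0],
              [ 0.0, 0.0,  0.0, 1.0]])

# Projections onto the eigenspaces for lambda_1 = 1 and lambda_2 = 2
P1 = np.diag([0.0, 1.0, 0.0, 1.0])
u = np.array([1.0, 0.0, -1.0, 0.0]) / np.sqrt(2)
P2 = np.outer(u, u)

S_plus = P1 / 1.0 + P2 / 2.0
print(np.allclose(S_plus, np.linalg.pinv(S)))  # True
```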
2. The Cayley-Hamilton Theorem. Use Proposition 10.4 to show that if $B$ is real and symmetric and $p(z)$ is a polynomial then
$$p(B) = \sum_{j=1}^{h}p(\lambda_j)P_j. \qquad (10.18)$$
Confirm this in the case that $B$ is the matrix in Exercise 1 and $p(z) = z^2 + 1$. Deduce from (10.18) that if $p$ is the characteristic polynomial of $B$, i.e., $p(z) = \det(zI-B)$, then $p(B) = 0$. Confirm this, via the Matlab command poly, on the $S$ matrix in Exercise 1.
3. Argue that the least eigenvalue of a symmetric matrix obeys the minimum principle
$$\lambda_n = \min_{x\ne 0}\frac{x^TBx}{x^Tx}. \qquad (10.19)$$
$$N(A^TA) = N(A), \qquad N(AA^T) = N(A^T),$$
$$R(A^TA) = R(A^T), \qquad R(AA^T) = R(A).$$
You have proven the first of these in a previous exercise. The proof of the second is identical. The row and column space results follow from the first two via orthogonality.
On the spectral side, we shall now see that the eigenvalues of $AA^T$ and $A^TA$ are nonnegative and that their nonzero eigenvalues coincide. Let us first confirm this on the $A$ matrix associated with the unstable swing (see figure 2.2),
$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \qquad (11.1)$$
The respective products are
$$AA^T = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad A^TA = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
Analysis of the first is particularly simple. Its null space is clearly just the zero vector while $\lambda_1 = 2$ and $\lambda_2 = 1$ are its eigenvalues. Their geometric multiplicities are $n_1 = 1$ and $n_2 = 2$. In $A^TA$ we recognize the $S$ matrix from exercise 10.1 and recall that its eigenvalues are $\lambda_1 = 2$, $\lambda_2 = 1$, and $\lambda_3 = 0$ with multiplicities $n_1 = 1$, $n_2 = 2$, and $n_3 = 1$. Hence, at least for this $A$, the eigenvalues of $AA^T$ and $A^TA$ are nonnegative and their nonzero eigenvalues coincide. In addition, the geometric multiplicities of the nonzero eigenvalues sum to 3, the rank of $A$.
Proposition 11.1. The eigenvalues of AAT and AT A are nonnegative. Their nonzero eigenvalues,
including geometric multiplicities, coincide. The geometric multiplicities of the nonzero eigenvalues
sum to the rank of A.
Proof: If A^T A x = λx then x^T A^T A x = λ x^T x, i.e., ‖Ax‖² = λ‖x‖², and so λ ≥ 0. A similar
argument works for AA^T.
Now suppose that λ_j > 0 and that {x_{j,k}}_{k=1}^{n_j} constitutes an orthonormal basis for the eigenspace
R(P_j). Starting from

    A^T A x_{j,k} = λ_j x_{j,k}    (11.2)

we find, on multiplying through (from the left) by A, that

    AA^T (A x_{j,k}) = λ_j (A x_{j,k}),
i.e., A x_{j,k} is an eigenvector of AA^T associated with λ_j, so long as it is nonzero. It follows from the
first paragraph of this proof that ‖A x_{j,k}‖² = λ_j, which, by hypothesis, is nonzero. Hence,
    y_{j,k} ≡ A x_{j,k} / √λ_j,    1 ≤ k ≤ n_j,    (11.3)
is a collection of unit eigenvectors of AAT associated with j . Let us now show that these vectors
are orthonormal for fixed j.
    y_{j,i}^T y_{j,k} = (1/λ_j) x_{j,i}^T A^T A x_{j,k} = x_{j,i}^T x_{j,k} = 0,    i ≠ k.

Assembling the pieces of the proof, we arrive at the spectral representation

    A^T A = X Λ_n X^T    (11.4)

where

    X = [X_1 X_2 ··· X_h],    X_j = [x_{j,1} ··· x_{j,n_j}],
    Y = [Y_1 Y_2 ··· Y_h],    Y_j = [y_{j,1} ··· y_{j,n_j}],

and Λ_n is the n-by-n diagonal matrix with λ_1 in the first n_1 slots, λ_2 in the next n_2 slots, etc.
Similarly

    AA^T = Y Λ_m Y^T    (11.5)

where Λ_m is the m-by-m diagonal matrix with λ_1 in the first n_1 slots, λ_2 in the next n_2 slots, etc. The
y_{j,k} were defined in (11.3) under the assumption that λ_j > 0. If λ_j = 0 let Y_j denote an orthonormal
basis for N(AA^T). Finally, call
    σ_j = √λ_j

and let Σ denote the m-by-n diagonal matrix with σ_1 in the first n_1 slots and σ_2 in the next
n_2 slots, etc. Notice that

    Σ^T Σ = Λ_n    and    Σ Σ^T = Λ_m.    (11.6)
Now recognize that (11.3) may be written

    A x_{j,k} = σ_j y_{j,k},

which is simply the column-by-column rendition of AX = YΣ. As XX^T = I we may multiply through
(from the right) by X^T and arrive at the singular value decomposition of A,

    A = Y Σ X^T.    (11.7)

Let us confirm this for the A matrix in (11.1). We have

          [2 0 0 0]              [ 1/√2  0  0  1/√2]
    Λ_4 = [0 1 0 0]    and   X = [ 0     1  0  0   ]
          [0 0 1 0]              [−1/√2  0  0  1/√2]
          [0 0 0 0]              [ 0     0  1  0   ]

together with

          [2 0 0]                [ 0  1  0]
    Λ_3 = [0 1 0]      and   Y = [−1  0  0]
          [0 0 1]                [ 0  0  1]

and

        [√2 0 0 0]
    Σ = [ 0 1 0 0].
        [ 0 0 1 0]

Hence,

              [ 0  1  0  0]
    Y Σ X^T = [−1  0  1  0]    (11.8)
              [ 0  0  0  1]
This indeed agrees with A. It also agrees (up to sign changes in the columns of X) with what one
receives upon typing [Y,SIG,X]=svd(A) in Matlab.
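The same check can be run in NumPy, whose svd (like Matlab's) may flip the signs of paired columns of X and Y:

```python
import numpy as np

A = np.array([[ 0., 1., 0., 0.],
              [-1., 0., 1., 0.],
              [ 0., 0., 0., 1.]])

Y, sig, XT = np.linalg.svd(A)     # A = Y @ Sigma @ XT, with XT playing X^T
Sigma = np.zeros((3, 4))
Sigma[:3, :3] = np.diag(sig)

assert np.allclose(sig, [np.sqrt(2.), 1., 1.])  # the sigma_j = sqrt(lambda_j)
assert np.allclose(Y @ Sigma @ XT, A)           # the factors reassemble A
```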
You now ask what we get for our troubles. I express the first dividend as a proposition that
looks to me like a quantitative version of the fundamental theorem of linear algebra.
Proposition 11.2. If Y Σ X^T is the singular value decomposition of A then
(i) The rank of A, call it r, is the number of nonzero elements in Σ.
(ii) The first r columns of X constitute an orthonormal basis for R(A^T). The last n − r columns
of X constitute an orthonormal basis for N(A).
(iii) The first r columns of Y constitute an orthonormal basis for R(A). The last m − r columns of
Y constitute an orthonormal basis for N(A^T).
Let us now solve Ax = b with the help of the pseudoinverse of A. You know the right thing
to do, namely reciprocate all of the nonzero singular values. Because m is not necessarily n we must
also be careful with dimensions. To be precise, let Σ^+ denote the n-by-m matrix whose first n_1
diagonal elements are 1/σ_1, whose next n_2 diagonal elements are 1/σ_2, and so on. In the case that
σ_h = 0, set the final n_h diagonal elements of Σ^+ to zero. Now, one defines the pseudoinverse of
A to be

    A^+ ≡ X Σ^+ Y^T.
For the A matrix in (11.1) this gives

          [1/√2 0 0]
    Σ^+ = [0    1 0]
          [0    0 1]
          [0    0 0]

and so

                       [0 −1/2 0]
    A^+ = X Σ^+ Y^T =  [1   0  0]
                       [0  1/2 0]
                       [0   0  1],
in agreement with what appears from pinv(A). Let us now investigate the sense in which A+ is
the inverse of A. Suppose that b ∈ R^m and that we wish to solve Ax = b. We suspect that A^+ b
should be a good candidate. Observe now that
    (A^T A) A^+ b = X Λ_n X^T X Σ^+ Y^T b    by (11.4)
                  = X Λ_n Σ^+ Y^T b          because X^T X = I
                  = X Σ^T Σ Σ^+ Y^T b        by (11.6)
                  = X Σ^T Y^T b              because Σ^T Σ Σ^+ = Σ^T
                  = A^T b                    by (11.7);

that is, A^+ b satisfies the normal equations A^T A x = A^T b.
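The computation above can be checked numerically; np.linalg.pinv plays the role of Matlab's pinv in this NumPy sketch:

```python
import numpy as np

A = np.array([[ 0., 1., 0., 0.],
              [-1., 0., 1., 0.],
              [ 0., 0., 0., 1.]])

Ap = np.linalg.pinv(A)

# The pseudoinverse computed by hand above.
Ap_hand = np.array([[0., -0.5, 0.],
                    [1.,  0. , 0.],
                    [0.,  0.5, 0.],
                    [0.,  0. , 1.]])
assert np.allclose(Ap, Ap_hand)

# A+ b satisfies the normal equations A^T A x = A^T b for any b.
b = np.array([1., 2., 3.])
assert np.allclose(A.T @ A @ (Ap @ b), A.T @ b)
```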
[Figure 11.1. The singular values, plotted against index, on a logarithmic scale.]
Figure 11.2. The results of imagesc(Ak) for, starting at the top left, k=165, 64, 32 and
moving right, and then starting at the bottom left and moving right, k=24,20,16.
Recall that the trace is invariant under cyclic permutation of its argument,

    tr(AB) = tr(BA).    (11.10)
From here we arrive at a lovely identity for symmetric matrices. Namely, if B = B^T and we list its
eigenvalues, including multiplicities, {λ_1, λ_2, . . . , λ_n}, then, from (10.13), it follows that

    tr(B) = tr(QΛQ^T) = tr(ΛQ^T Q) = tr(Λ) = Σ_{j=1}^n λ_j.    (11.11)
You might wish to confirm that this holds for the many examples we've accumulated. From here
we may now study the properties of the natural, or Frobenius, norm of an m-by-n matrix A,
    ‖A‖_F ≡ (Σ_{i=1}^m Σ_{j=1}^n A_{ij}²)^{1/2}.    (11.12)
From (11.11) and (11.12) it follows that

    ‖A‖²_F = tr(AA^T) = Σ_{i=1}^m λ_i(AA^T) = Σ_{i=1}^m σ_i²,    (11.13)
i.e., the Frobenius norm of a matrix is the square root of the sum of the squares of its singular
values. We will also need the fact that if Q is square and Q^T Q = I then

    ‖QA‖²_F = tr(QAA^T Q^T) = tr(AA^T Q^T Q) = tr(AA^T) = ‖A‖²_F.    (11.14)

The same argument reveals that ‖AQ‖_F = ‖A‖_F. We may now establish
Proposition 11.3. Given an m-by-n matrix A (with SVD A = Y Σ X^T) and a whole number
k ≤ min{m, n}, the best (in terms of Frobenius distance) rank-k approximation of A is

    A_k = Y(:, 1:k) Σ(1:k, 1:k) X(:, 1:k)^T.

Moreover,

    min_{rank(B)=k} ‖A − B‖²_F = Σ_{j>k} σ_j².

Proof: The invariance (11.14) of the Frobenius norm under orthogonal transformation permits us
to write B = Y S X^T, and so

    ‖A − B‖²_F = Σ_j (σ_j − s_j)².

As the rank of B is k, it follows that the best choice of the s_j is s_j = σ_j for j = 1:k and s_j = 0
thereafter. End of Proof.
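Both (11.13) and Proposition 11.3 are easy to confirm numerically; a NumPy sketch on an arbitrary matrix (an illustrative A, not one from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
Y, sig, XT = np.linalg.svd(A, full_matrices=False)

# (11.13): the squared Frobenius norm is the sum of the squared singular values.
assert np.isclose(np.linalg.norm(A, 'fro')**2, np.sum(sig**2))

# Proposition 11.3: the rank-k truncation of the SVD misses A, in the
# Frobenius sense, by exactly the discarded singular values.
k = 2
Ak = Y[:, :k] @ np.diag(sig[:k]) @ XT[:k, :]
assert np.linalg.matrix_rank(Ak) == k
assert np.isclose(np.linalg.norm(A - Ak, 'fro')**2, np.sum(sig[k:]**2))
```

The truncation Ak here is precisely the Y(:, 1:k) Σ(1:k, 1:k) X(:, 1:k)^T of the proposition, written with NumPy slices.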
11.4. Exercises
1. Suppose that A is m-by-n and b is m-by-1. Set x^+ = A^+ b and suppose x satisfies A^T Ax = A^T b.
Prove that ‖x^+‖ ≤ ‖x‖. (Hint: decompose x = x_R + x_N into its row space and null space
components. Likewise x^+ = x^+_R + x^+_N. Now argue that x^+_R = x_R and x^+_N = 0 and recognize that
you are almost home.)
2. Experiment with compressing the bike image below (also under Resources on our Owlspace
page). Submit labeled figures corresponding to several low rank approximations. Note:
bike.jpg is really a color file, so after saving it to your directory and entering Matlab you
might say M = imread('bike.jpg'); and then M = M(:,:,1); prior to imagesc(M)
and colormap(gray).
[Figure 12.1. The compartmental model of a fiber of radius a: each of the n compartments carries a membrane capacitance C_m and a leak conductance G_L, and compartment j receives the injected current u_j and holds the membrane potential v_j.]

The associated dynamical system is

    C v′(t) + G v(t) = (πa²/(R dx)) S_n v(t) + u(t),    (12.1)

where I_n is the n-by-n identity matrix, S_n is the n-by-n second-difference matrix

          [−1  1  0 ···  0  0]
          [ 1 −2  1 ···  0  0]
    S_n = [         ⋱        ]
          [ 0  0 ··· 1 −2  1 ]
          [ 0  0 ··· 0  1 −1 ]

and

    C = (2πa dx C_m) I_n,    G = (2πa dx G_L) I_n.    (12.2)

Dividing through by 2πa dx C_m puts the system in the form v′(t) = B v(t) + g(t), where

    B = (a/(2R C_m dx²)) S_n − (G_L/C_m) I_n    and    g = u/(2πa dx C_m).    (12.3)

We will construct a reduced model that faithfully reproduces the response at a specified compartment (the putative spike initiation zone), assumed without loss to be the first,

    y(t) = q^T v(t),    where q^T = (1 0 ··· 0),

to an arbitrary stimulus, u. On applying the Laplace transform to each side of v′ = Bv + g we find

    sLv = BLv + Lg
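For concreteness, S_n and B might be assembled as follows; the parameter values are illustrative assumptions, not taken from the text (NumPy standing in for the Matlab of the notes):

```python
import numpy as np

def second_difference(n):
    """The n-by-n second-difference matrix S_n of (12.2)."""
    S = -2. * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    S[0, 0] = S[-1, -1] = -1.      # the -1 corner entries
    return S

# Illustrative parameters: compartments, radius, compartment length,
# axial resistivity, membrane capacitance, leak conductance.
n, a, dx  = 100, 1e-4, 1e-4
R, Cm, GL = 0.3, 1.0, 1./15

Sn = second_difference(n)
B  = (a / (2*R*Cm*dx**2)) * Sn - (GL/Cm) * np.eye(n)    # the B of (12.3)

# B is symmetric with strictly negative eigenvalues (cf. Exercise 12.3.1).
assert np.allclose(B, B.T)
assert np.linalg.eigvalsh(B).max() < 0
```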
the reduced transfer function,

    Ĥ(s) ≡ q^T X (s I_k − X^T B X)^{-1} X^T,

is close to H. We will see that it suffices to set

    X(:, 1) = B^{-1} q / ‖B^{-1} q‖,    X(:, j + 1) = B^{-1} X(:, j) / ‖B^{-1} X(:, j)‖,    j = 1, . . . , k − 1,    (12.4)
(12.4)
followed by orthonormalization,
X = orth(X).
We note that (12.4) is nothing more than the Power Method applied to B^{-1}, and so as j increases the
associated column of X approaches the (constant) eigenvector associated with the eigenvalue of B
of least magnitude. We apply this algorithm and then return to prove that it indeed makes Ĥ close to H.
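A sketch of the whole procedure on a small stand-in for B (an illustrative symmetric, negative definite matrix, not the 100-compartment fiber of Figure 12.2): it builds X by (12.4), orthonormalizes (QR serving for Matlab's orth), and confirms that the reduced transfer function agrees with the full one at s = 0.

```python
import numpy as np

# A small symmetric, negative definite stand-in for B.
n = 50
Sn = -2.*np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
Sn[0, 0] = Sn[-1, -1] = -1.
B = Sn - 0.1*np.eye(n)
q = np.zeros(n); q[0] = 1.

# (12.4): the Power Method applied to B^{-1} ...
k = 3
cols = [np.linalg.solve(B, q)]
for _ in range(k - 1):
    cols.append(np.linalg.solve(B, cols[-1] / np.linalg.norm(cols[-1])))
# ... followed by orthonormalization.
X = np.linalg.qr(np.column_stack(cols))[0]

def H(s):      # full transfer function  q^T (sI - B)^{-1}
    return q @ np.linalg.inv(s*np.eye(n) - B)

def Hred(s):   # reduced transfer function  q^T X (sI_k - X^T B X)^{-1} X^T
    return q @ X @ np.linalg.inv(s*np.eye(k) - X.T @ B @ X) @ X.T

assert np.allclose(H(0.), Hred(0.))    # the zeroth moments match
```

The final assertion is exactly the j = 0 case of Proposition 12.1: since B^{-1}q lies in R(X) by construction, H and Ĥ agree at s = 0.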
Figure 12.2. Response of the first compartment in a 100 compartment cell and the corresponding
3 compartment reduced cell to identical current injections distributed randomly in space and time.
cabred.m
Now, in order to understand why this method works so well we examine the associated transfer
functions. In particular, we examine their respective Taylor series about s = 0. To begin, we note
that the resolvent R(s) = (sI − B)^{-1} obeys

    R′(s) = lim_{h→0} (R(s + h) − R(s))/h = lim_{h→0} −R(s + h) R(s) = −R²(s),

and in general

    d^j R(s)/ds^j = (−1)^j j! R^{j+1}(s),    and so    d^j R(0)/ds^j = −j! (B^{-1})^{j+1},

and so

    H(s) = q^T R(s) = Σ_{j=0}^∞ s^j M_j,    where M_j = −q^T (B^{-1})^{j+1}
are the associated moments. The reduced transfer function is analogously found to be
    Ĥ(s) = Σ_{j=0}^∞ s^j M̂_j,    where M̂_j = −q^T X ((X^T B X)^{-1})^{j+1} X^T    (12.5)

are the reduced moments. If M̂_j = M_j for 0 ≤ j < k then Ĥ(s) = H(s) + O(s^k), and so we now
show that the reducer built according to (12.4) indeed matches the first k moments.

Proposition 12.1. If (B^{-1})^{j+1} q ∈ R(X) for 0 ≤ j < k then M̂_j = M_j for 0 ≤ j < k.
Proof: We will show that M_j^T = M̂_j^T, beginning with j = 0, where

    M_0^T = −B^{-1} q    and    M̂_0^T = −X (X^T B X)^{-1} X^T q.
12.3. Exercises
1. Show that the n-by-n second-difference matrix S_n in (12.2) obeys

    x^T S_n x = −Σ_{j=1}^{n−1} (x_j − x_{j+1})²

for every x ∈ R^n. Why does it follow that the eigenvalues of S_n are nonpositive? Why does
it then follow that the eigenvalues of the B matrix in (12.3) are negative? If we then label the
eigenvalues of B as

    λ_n(B) ≤ λ_{n−1}(B) ≤ ··· ≤ λ_2(B) ≤ λ_1(B) < 0,