Functions of a matrix and Krylov matrices
ARTICLE INFO

Article history: Received 12 September 2009; Accepted 25 August 2010; Available online 6 October 2010.
Submitted by V. Mehrmann.

AMS classification: 15A21; 65F15.

Keywords: Krylov matrix; Krylov subspace; Function of a matrix; Polynomial of a matrix; Hessenberg matrix; Companion matrix.

ABSTRACT

For a given nonderogatory matrix A, formulas are given for functions of A in terms of Krylov matrices of A. Relations between the coefficients of a polynomial of A and the generating vector of a Krylov matrix of A are provided. With the formulas, linear transformations between Krylov matrices and functions of A are introduced, and associated algebraic properties are derived. Hessenberg reduction forms are revisited, equipped with appropriate inner products, and related properties and matrix factorizations are given.

Partially supported by the Senior Visiting Scholar Fund of Fudan University Key Laboratory and the University of Kansas General Research Fund allocation # 2301717. Part of the work was done while the author was visiting the School of Mathematics, Fudan University, whose hospitality is gratefully acknowledged.
1. Introduction
Given a scalar function f (t ) that is well defined on the spectrum of A, one defines a matrix f (A) ∈
Cn,n , which is usually called a function of A, e.g., [10,11].
Both functions of a matrix and Krylov matrices play a fundamental role in matrix computations. They
are key tools in understanding and developing numerical methods for solving eigenvalue problems and
systems of linear equations, including the QR algorithm and Krylov subspace methods, e.g., [8,9,22].
Functions of a matrix arise from a variety of applications. The development of numerical algorithms
is still a challenging topic. See the recently published book [10] for details.
In this paper, we investigate the Krylov matrices and functions of a matrix. We focus on the situation
where the associated matrix A is nonderogatory, i.e., the geometric multiplicity of every eigenvalue of A
is one. We provide formulas to express a function of A in terms of Krylov matrices and vice versa, based
on a simple observation. We use the formulas to study the relations and properties of these two objects.
Krylov matrices and functions of a matrix have been studied extensively in the past several decades.
Still, it seems that their behaviors have not been fully understood. The goal of this study is to use a
new angle to interpret the existing properties and provide new insight that may be potentially useful
for the development of numerical methods.
The paper is organized as follows. In Section 2, we give definitions of functions of a matrix and
Krylov matrices, and some basic properties that are necessary for deriving main results. In Section 3,
we show relations between functions of a matrix and Krylov matrices by providing explicit formulas. In
Section 4, we interpret the relations in terms of linear transformations and subspaces. In Section 5, we
study the Hessenberg reduction forms, and derive some related properties and matrix factorizations.
In Section 6, we give conclusions.
The spectrum of A is denoted by λ(A). \|\cdot\| stands for both the Euclidean norm of a vector and the spectral norm of a matrix. I_n is the n × n identity matrix, and e_j is the jth column of I_n. N_r is the r × r nilpotent matrix with 1 on the superdiagonal and 0 elsewhere, and N_r(λ) = λI_r + N_r. A square matrix is called unreduced upper Hessenberg if it is upper Hessenberg with nonzero subdiagonal elements. P_m denotes the space of polynomials with degree no greater than m.
In this paper, we only consider the functions defined as follows. Let A be a square matrix with the Jordan canonical form

Z^{-1}AZ = diag(N_{r_{1,1}}(λ_1), \ldots, N_{r_{1,s_1}}(λ_1), \ldots, N_{r_{η,1}}(λ_η), \ldots, N_{r_{η,s_η}}(λ_η)),

where λ_1, \ldots, λ_η ∈ λ(A) are distinct. Let f(t) be a scalar function. If for each λ_i, f(λ_i) and the derivatives f^{(k)}(λ_i) (k = 1, \ldots, \max_{1\le j\le s_i} r_{i,j} - 1) are defined, we define

f(A) := Z diag(f(N_{r_{1,1}}(λ_1)), \ldots, f(N_{r_{1,s_1}}(λ_1)), \ldots, f(N_{r_{η,1}}(λ_η)), \ldots, f(N_{r_{η,s_η}}(λ_η))) Z^{-1},
where

f(N_{r_{i,j}}(λ_i)) = \begin{bmatrix} f(λ_i) & \frac{f'(λ_i)}{1!} & \cdots & \frac{f^{(r_{i,j}-1)}(λ_i)}{(r_{i,j}-1)!} \\ & \ddots & \ddots & \vdots \\ & & \ddots & \frac{f'(λ_i)}{1!} \\ & & & f(λ_i) \end{bmatrix}.
For a scalar polynomial p(t) = \sum_{j=0}^{m} α_j t^j ∈ P_m we simply have p(A) = \sum_{j=0}^{m} α_j A^j.
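For illustration, a minimal NumPy sketch (the matrix A and the coefficients below are arbitrary examples, not taken from the paper) of evaluating the matrix polynomial p(A) = \sum_{j=0}^{m} α_j A^j by Horner's scheme:

import numpy as np

def poly_of_matrix(alpha, A):
    """Evaluate p(A) = alpha[0] I + alpha[1] A + ... + alpha[m] A^m by Horner's scheme."""
    n = A.shape[0]
    P = alpha[-1] * np.eye(n)
    for c in reversed(alpha[:-1]):
        P = P @ A + c * np.eye(n)
    return P

A = np.array([[2.0, 1.0], [0.0, 3.0]])
alpha = [1.0, -2.0, 0.5]          # p(t) = 1 - 2 t + 0.5 t^2
print(poly_of_matrix(alpha, A))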
We provide below some basic properties of functions of a matrix.
Proposition 2.1 [10,11]. Suppose that μ is the degree of the minimal polynomial of A. For any function
f (t ) such that f (A) is defined, there exists a unique polynomial p(t ) ∈ Pμ−1 such that
f (A) = p(A).
The unique polynomial p(t) can be constructed by the Lagrange–Hermite interpolation with

p^{(k)}(λ_i) = f^{(k)}(λ_i), \quad k = 0, 1, \ldots, \max_{1\le j\le s_i} r_{i,j} - 1, \quad i = 1, 2, \ldots, η.
The Schur–Parlett algorithm for computing f (A) is based on these two properties. See [16,17,12,3] and
[10, Chapters 4 and 9]. The properties will be used frequently in the rest of the paper.
Suppose that A ∈ C^{n,n} and b ∈ C^n. We define the Krylov matrix

K_{n,m}(A, b) = [\, b \;\; Ab \;\; \ldots \;\; A^{m-1}b \,] ∈ C^{n,m}.
When m = n, we will simply use the notation Kn (A, b) or K (A, b).
A polynomial of degree no greater than m − 1 is characterized uniquely by its m coefficients. To emphasize the coefficients we use the following notation: for x = [x_1, \ldots, x_m]^T ∈ C^m, let p(t; x) := x_1 + x_2 t + \cdots + x_m t^{m-1} ∈ P_{m-1}.
It is obvious that
p(A; x)\,b = K_{n,m}(A, b)\,x, \qquad x ∈ C^m. (1)
So K_{n,m}(A, b)x = 0 if and only if p(A; x)b = 0. The minimal polynomial of b with respect to A is a nonzero polynomial p(t) of lowest degree such that p(A)b = 0 [22, pp. 36–37]. Let ν be the degree of this minimal polynomial p(t). Then, based on (1),

rank K_{n,m}(A, b) = \min\{m, ν\}.

More precisely, b, Ab, \ldots, A^{ν-1}b are linearly independent, and for any k \ge ν, A^k b can be expressed as a linear combination of b, Ab, \ldots, A^{ν-1}b [19, Chapter VI].
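A small NumPy sketch (with made-up test data) of forming K_{n,m}(A, b) and observing rank K_{n,m}(A, b) = min{m, ν}:

import numpy as np

def krylov(A, b, m):
    """Return K_{n,m}(A, b) = [b, Ab, ..., A^{m-1} b]."""
    cols = [b]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

# Example: b lies in a 2-dimensional invariant subspace, so nu = 2.
A = np.diag([1.0, 2.0, 3.0])
b = np.array([1.0, 1.0, 0.0])
for m in range(1, 5):
    K = krylov(A, b, m)
    print(m, np.linalg.matrix_rank(K))   # prints min(m, 2)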
Proposition 2.4. Suppose rank K_{n,m}(A, b) = r. Then there exists a nonsingular matrix X = [X_1, X_2] ∈ C^{n,n} with X_1 ∈ C^{n,r} and range X_1 = range K_{n,r}(A, b) such that

K_{n,m}(A, b) = X_1 [\, R_{11} \;\; R_{12} \,], (2)

where R_{11} ∈ C^{r,r} is nonsingular, and

X^{-1}AX = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}, \qquad X^{-1}b = \begin{bmatrix} b_1 \\ 0 \end{bmatrix} (3)

with A_{11} ∈ C^{r,r} and b_1 ∈ C^r.
Moreover, R_{11} is upper triangular if and only if A_{11} is unreduced upper Hessenberg and X^{-1}b = γ e_1.
Then

AX_1 = X_1 A_{11}, \qquad A_{11} = Z^{-1} C_r Z.

So for a nonsingular matrix X = [X_1, X_2] we have (3).
Suppose that R_{11} in (2) is upper triangular. Then Z^{-1} = R_{11} is also upper triangular, so A_{11} is unreduced upper Hessenberg. Because b_1 = Z^{-1} e_1 = R_{11} e_1 = r_{11} e_1, we have X^{-1}b = r_{11} e_1 =: γ e_1.
We now turn to a square Krylov matrix K(A, b) (m = n). Suppose that the characteristic polynomial of A is

det(λI − A) =: λ^n − c_n λ^{n-1} − \cdots − c_2 λ − c_1.

We define the companion matrix^1 of A as

C = \begin{bmatrix} 0 & & & c_1 \\ 1 & 0 & & c_2 \\ & \ddots & \ddots & \vdots \\ & & 1 & c_n \end{bmatrix}. (4)
Proof. For sufficiency, it is straightforward to show AK (A, b) = K (A, b)C for any b ∈ Cn , by using the
Cayley–Hamilton Theorem.
For necessity, X = K (A, b) follows simply by comparing the columns of the matrices AX and XC
with b = Xe1 .
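A quick numerical check (NumPy; A and b are arbitrary test data) of the identity A K(A, b) = K(A, b) C used here, with C assembled from the characteristic polynomial as in (4):

import numpy as np

def krylov(A, b, m):
    cols = [b]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

def companion(A):
    """Companion matrix (4): ones on the subdiagonal, last column [c_1, ..., c_n]^T,
    where det(lambda I - A) = lambda^n - c_n lambda^{n-1} - ... - c_1."""
    n = A.shape[0]
    coeffs = np.poly(A)            # [1, -c_n, ..., -c_1] (monic characteristic polynomial)
    c = -coeffs[1:][::-1]          # [c_1, c_2, ..., c_n]
    C = np.diag(np.ones(n - 1), -1)
    C[:, -1] = c
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)
K = krylov(A, b, 4)
C = companion(A)
print(np.allclose(A @ K, K @ C))   # True, by the Cayley-Hamilton theorem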
The rank of K (A, b) is ν , the degree of the minimal polynomial of b with respect to A, which is
no greater than the degree of the minimal polynomial of A. In order for K (A, b) to be nonsingular,
it is necessary for the minimal polynomial of A to be the same as its characteristic polynomial, or
equivalently, A has to be nonderogatory, i.e., the geometric multiplicity for every eigenvalue is one
[6,7]. Still, the nonsingularity of K(A, b) depends on the vector b. There are numerous equivalent conditions based on canonical forms [1] and on controllability from linear system theory [13,18,4,5].
We list a few of them in the following proposition.
Proposition 2.6. Suppose A ∈ Cn,n and b ∈ Cn . The following statements are equivalent.
1 Usually the transpose of C is also called a companion matrix of A. In this paper, we always take C to be the companion matrix of A.
Proof. (a), (b), (c) are just three equivalence conditions for (A, b) to be controllable [13,18,4,5]. The
equivalence between (a) and (d) is from Proposition 2.4. The equivalence between (a) and (e) can be
shown by using (1).
For a general matrix A, there exists a nonsingular matrix X such that

X^{-1}AX = diag(C_1, \ldots, C_q),

where C_1, \ldots, C_q are in companion matrix form, and the characteristic polynomial of each C_j divides the characteristic polynomials of C_1, \ldots, C_{j-1}. Suppose the size of C_j is n_j × n_j, for j = 1, \ldots, q. Then the similarity matrix X can be expressed as

X = [K_{n,n_1}(A, b_1), \ldots, K_{n,n_q}(A, b_q)],
for some b1 , . . . , bq ∈ Cn , which generalizes the result in Proposition 2.5. In this paper, however, we
focus on the nonderogatory case only, although some results can be generalized to the derogatory case
by using the above observation.
The formulation of a function of A in terms of Krylov matrices of A is based on the following simple observation. For any A ∈ C^{n,n} and b ∈ C^n, using the fact f(A)A = Af(A) (Proposition 2.2), we have

f(A)K(A, b) = K(A, d), \qquad d = f(A)b. (5)
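A numerical check of (5) (NumPy/SciPy; the data are arbitrary, and f = exp is chosen only as an example):

import numpy as np
from scipy.linalg import expm

def krylov(A, b, m):
    cols = [b]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)
fA = expm(A)                       # f(A) with f(t) = e^t
d = fA @ b
print(np.allclose(fA @ krylov(A, b, 5), krylov(A, d, 5)))   # (5): f(A) K(A,b) = K(A,d)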
We first use this fact to show relations between polynomials p(A; x) and Krylov matrices.
Proof. The formula (6) follows simply from (5) with f (t ) = p(t ; x), and (1).
The formula (7) is simply from (6) and the nonsingularity of K (A, b). (8) follows from
p(A; x)K (A, b) = p(K (A, b)CK (A, b)−1 ; x)K (A, b) = K (A, b)p(C ; x),
based on Proposition 2.2 (ii) and Proposition 2.5.
Theorem 3.2. Suppose that K (A, b) is nonsingular and C is the companion matrix of A. Let f (t ) be a scalar
function and τ ∈ C such that f (τ A) is defined. Then
C(τ) = K(τA, b)^{-1}(τA)K(τA, b) = D^{-1}K(A, b)^{-1}(τA)K(A, b)D = τ D^{-1}CD,
which has the form (12).
Note that when τ = 0, (11) may not hold, since on the left-hand side it only requires f (0) to be
defined while on the right-hand side f (C (0)) has to be defined. Even if (11) holds, it usually does not
give a polynomial corresponding to f (0) with minimal degree. This is because y(0) = f (C (0))e1 may
not be a scalar multiple of e_1, resulting in a polynomial p(t; y(0)) with degree greater than 0. Note also
that when τ = 1, (10) and (11) are identical.
The following results are directly from Theorem 3.2.
Theorem 3.3. Suppose that K (A, b) is nonsingular and C is the companion matrix of A. Let f (t ) be a scalar
function such that f (A) is defined. Then
Proof. The first part is from Theorem 3.2 with τ = 1. So we only need to prove (16).
If f (A) is nonsingular, then from (14), K (A, d) is also nonsingular. So [f (A)]−1 = K (A, b)K (A, d)−1 ,
and (16) is from (15).
Formula (15) not only restates the result given in Proposition 2.1 in the nonderogatory case, i.e.,
f (A) ∈ Pn−1 (A), but also provides an explicit formula for the polynomial p(t ; x). Formula (16) shows
that the same properties hold for the inverse of f (A).
Formula (14) holds true for all f (A). When f (t ) is a rational function, we have the following additional
formula.
Theorem 3.4. Suppose that K (A, b) is nonsingular and r (t ) = p(t )/q(t ) is a rational function with q(A)
nonsingular. Then
r(A) = p(A)q(A)^{-1} = p(A)K(A, b)K(A, b)^{-1}q(A)^{-1} = K(A, p(A)b)K(A, q(A)b)^{-1},
which gives (17). The relations in (18) are from (17) and (7).
Remark 3.5. Computing a Krylov matrix K(A, b) has the same cost as a matrix–matrix multiplication. So if f(A)b is available, computing f(A) with (14) or (17) requires two matrix–matrix multiplications and the solution of one matrix equation.
In general, computing the vector f (A)b is far from trivial, but it is straightforward when f (t ) is a
polynomial or a rational function. So this approach may have advantages in symbolic or exact arithmetic
computations. For numerical computations, however, it is well known that a Krylov matrix is usually
ill-conditioned. A method that uses (14) or (17) directly may be numerically unstable.
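As an illustration of the rational-function formula above (in the spirit of (17)), a NumPy sketch with made-up p, q, A, b, comparing the Krylov-based expression with direct evaluation; as the remark notes, this is for exposition only, since K(A, b) is typically ill-conditioned:

import numpy as np

def krylov(A, b, m):
    cols = [b]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

def poly_vec(coeffs, A, b):
    """Return p(A) b for p(t) = coeffs[0] + coeffs[1] t + ... (Horner on the vector)."""
    v = coeffs[-1] * b
    for c in reversed(coeffs[:-1]):
        v = A @ v + c * b
    return v

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
p = [1.0, 2.0, 0.5]                # p(t) = 1 + 2 t + 0.5 t^2
q = [3.0, 0.0, 1.0]                # q(t) = 3 + t^2

# r(A) = K(A, p(A)b) K(A, q(A)b)^{-1}, assuming K(A, q(A)b) is nonsingular
rA_krylov = krylov(A, poly_vec(p, A, b), n) @ np.linalg.inv(krylov(A, poly_vec(q, A, b), n))

# direct evaluation p(A) q(A)^{-1}
pA = sum(c * np.linalg.matrix_power(A, k) for k, c in enumerate(p))
qA = sum(c * np.linalg.matrix_power(A, k) for k, c in enumerate(q))
rA_direct = pA @ np.linalg.inv(qA)
print(np.allclose(rA_krylov, rA_direct))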
The above formulations may be used to derive some interesting results. For instance, let f (t ) = et .
Then from Theorem 3.2, we have
e^{τA} = K(A, d(τ))K(A, b)^{-1}, \qquad d(τ) = e^{τA}b.
This shows that the fundamental matrix of the linear system dx/dτ = Ax is completely determined
by the solution to the initial value problem dx/dτ = Ax, x(0) = b.
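For example (NumPy/SciPy; arbitrary test data), e^{τA} can be recovered from the single trajectory d(τ) = e^{τA}b:

import numpy as np
from scipy.linalg import expm

def krylov(A, b, m):
    cols = [b]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

rng = np.random.default_rng(3)
n, tau = 4, 0.7
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
d = expm(tau * A) @ b              # solution of dx/dtau = Ax, x(0) = b, at time tau
E = krylov(A, d, n) @ np.linalg.inv(krylov(A, b, n))
print(np.allclose(E, expm(tau * A)))   # e^{tau A} = K(A, d(tau)) K(A, b)^{-1}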
At the end of this section, we consider the case where Krylov matrices are slightly generalized.
Let
gj (t ) = p(t ; γj ) ∈ Pn−1 , γj ∈ Cn ,
for j = 1, . . . , n. Define
G(A, b) = [\, g_1(A)b \;\; g_2(A)b \;\; \ldots \;\; g_n(A)b \,]. (19)
By using (1),
gj (A)b = K (A, b)γj ,
for j = 1, \ldots, n. Define

Γ = [\, γ_1 \;\; \ldots \;\; γ_n \,].
Then
G(A, b) = K(A, b)\,Γ. (20)

Clearly, for any Γ ∈ C^{n,n} a matrix G(A, b) can be generated by using (20). When K(A, b) is nonsingular, (20) defines an isomorphism from Γ to G(A, b). Note also that G(A, b) is nonsingular if and only if both K(A, b) and Γ are nonsingular.
Corollary 3.6. Suppose that G(A, b) defined in (19) with g1 (t ), . . . , gn (t ) ∈ Pn−1 is nonsingular. Let f (t )
be a scalar function and τ ∈ C be a scalar such that f (τ A) is defined. Then
f (τ A) = G(A, d(τ ))G(A, b)−1 , d(τ ) = f (τ A)b. (21)
Remark 3.7. All the results established in this section apply to the matrices and vectors defined over
any field as long as f (τ A) is defined and satisfies f (τ A)A = Af (τ A).
In this section, we interpret Krylov matrices and polynomials of a matrix in terms of linear transformations.
For any vectors b1 , b2 ∈ Cn and scalars α , β , we have
K (A, α b1 + β b2 ) = α K (A, b1 ) + β K (A, b2 ).
So the matrix A introduces a linear transformation: Cn → Cn,n defined by b → K (A, b). The range
of the transformation is the set of the Krylov matrices of A:
K(A) = {K (A, b) | b ∈ Cn },
which is a subspace of Cn,n . Clearly, dim K(A) = n.
Let L(K(A)) be the space of linear operators on K(A). It has dimension n^2. Suppose \mathcal{T} ∈ L(K(A)) and T ∈ C^{n,n} is its matrix with respect to the basis \{K(A, e_j)\}_{j=1}^n. Then

\mathcal{T}K(A, b) = K(A, Tb), \qquad \forall b ∈ C^n.
So we may identify L(K(A)) with Cn,n based on the above relation.
Now consider a subspace of L(K(A)) defined by

L_c(K(A)) = \{\mathcal{T} \mid \mathcal{T}K(A, b) = K(A, Tb) = T\,K(A, b), \; \forall b ∈ C^n\}.
Define
Pn−1 (A) = {p(A; x) | x ∈ Cn }.
From (6), p(A; x) ∈ Lc (K(A)) for any x ∈ Cn . Hence Pn−1 (A) ⊆ Lc (K(A)).
Theorem 4.1. Suppose A ∈ C^{n,n}. Then P_{n-1}(A) = L_c(K(A)) if and only if A is nonderogatory.
Proof. For any T ∈ Lc (K(A)), the corresponding matrix T satisfies K (A, Tb) = TK (A, b) for all b ∈ Cn
if and only if TA = AT . Without distinguishing T and its matrix T we have
Lc (K(A)) = {T | TA = AT , T ∈ Cn,n },
which is called the centralizer of A [11, p. 275]. With this connection, the equivalence follows from [11, Corollary 4.4.18].
Because K(A) and Cn are isomorphic, when A is nonderogatory, Pn−1 (A) and Cn are also isomor-
phic. So P_{n-1}(A) and K(A) are isomorphic. Then L(P_{n-1}(A), K(A)), the space of linear transformations from P_{n-1}(A) to K(A), has dimension n^2, and it is isomorphic to C^{n,n}. For any S ∈ C^{n,n} we may introduce \mathcal{S} ∈ L(P_{n-1}(A), K(A)) defined by

\mathcal{S}\,p(A; x) = K(A, Sx), \qquad \forall x ∈ C^n.

(Again, S is considered as the matrix of \mathcal{S} with respect to the bases \{A^j\}_{j=0}^{n-1} and \{K(A, e_j)\}_{j=1}^n.)
Define
L_c(P_{n-1}(A), K(A)) = \{\mathcal{S} \mid \mathcal{S}\,p(A; x) = K(A, Sx) = p(A; x)S, \; \forall x ∈ C^n\} \subseteq L(P_{n-1}(A), K(A)).
Proof. Formula (6) shows that for any b ∈ Cn , the linear transformation S corresponding to S :=
K (A, b) is in Lc (Pn−1 (A), K(A)). So if we do not distinguish S with S, we have
K(A) ⊆ Lc (Pn−1 (A), K(A)).
On the other hand, for each \mathcal{S} ∈ L_c(P_{n-1}(A), K(A)), the corresponding matrix S satisfies K(A, Sx)e_j = p(A; x)Se_j for j = 1, \ldots, n. Using these relations and (1), we have

A^{j-1}Sx = K(A, Sx)e_j = p(A; x)Se_j = K(A, Se_j)x, \qquad \forall x ∈ C^n,
which implies
Theorems 4.1 and 4.2 show that when A is nonderogatory, a linear operator on K(A) is just a
polynomial p(A; x) and a linear transformation from Pn−1 to K(A) is just a Krylov matrix K (A, b), both
described by (6). A common technique to generate a new Krylov matrix from K (A, b) is to choose a new
initial vector d = p(A; x)b for an appropriate polynomial p(t ; x). Such a technique is widely used in
the QR algorithm and Krylov subspace methods [19,8,9,20]. Theorem 4.1 shows that, when A is nonderogatory, this is the only way to obtain K(A, d) expressed as TK(A, b).
For any b such that K(A, b) is nonsingular, the corresponding \mathcal{S}_b ∈ L_c(P_{n-1}(A), K(A)) is an isomorphism of P_{n-1}(A) onto K(A) defined by

\mathcal{S}_b\,p(A; x) = K(A, d) = p(A; x)K(A, b), \qquad d = K(A, b)x, \quad \forall x ∈ C^n.

Its inverse is

\mathcal{S}_b^{-1}K(A, d) = p(A; x) = K(A, d)K(A, b)^{-1}, \qquad x = K(A, b)^{-1}d, \quad \forall d ∈ C^n, (23)

which is just (7).
Using Lc (K(A)) and the isomorphisms of K(A) and Lc (Pn−1 (A), K(A)), we are also able to define
a subspace of linear operators on Pn−1 (A). Let Sb1 , Sb2 ∈ Lc (Pn−1 (A), K(A)) be invertible. Define
L_c(P_{n-1}(A)) = \bigl\{\mathcal{W} \bigm| \mathcal{W}\,p(A; x) = \mathcal{S}_{b_2}^{-1}\mathcal{T}\mathcal{S}_{b_1}\,p(A; x) = p(A; y),\ \mathcal{T} = p(A; z) ∈ L_c(K(A)),\ y = K(A, b_2)^{-1}p(A; z)K(A, b_1)x,\ \forall x ∈ C^n\bigr\}.
Clearly, Lc (Pn−1 (A)) is isomorphic to Lc (K(A)) = Pn−1 (A). So its dimension is n.
When K (A, b) is nonsingular, by Proposition 2.4 we have a Hessenberg reduction form
Q^*AQ = H, \qquad Q^*b = γ e_1, \qquad |γ| = \|b\|, (24)
where Q is unitary and H is unreduced upper Hessenberg.
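A minimal sketch (NumPy; assuming no breakdown, i.e., K(A, b) nonsingular) of the standard Arnoldi process that produces Q and H in (24):

import numpy as np

def arnoldi(A, b):
    """Arnoldi process: Q unitary with Q e1 = b/||b||, H = Q^* A Q unreduced upper Hessenberg
    (assuming the Krylov matrix K(A, b) is nonsingular, so no breakdown occurs)."""
    n = A.shape[0]
    Q = np.zeros((n, n), dtype=complex)
    H = np.zeros((n, n), dtype=complex)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(n):
        w = A @ Q[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt orthogonalization
            H[i, j] = np.vdot(Q[:, i], w)
            w = w - H[i, j] * Q[:, i]
        if j + 1 < n:
            H[j + 1, j] = np.linalg.norm(w)
            Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)
Q, H = arnoldi(A, b)
print(np.allclose(Q.conj().T @ A @ Q, H), np.allclose(Q.conj().T @ Q, np.eye(5)))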
In practice, one uses the modified Arnoldi process [8, Section 9.4] or the numerically stable Hessenberg reduction method with Householder transformations [8, Section 7.4]. Note that with the above Arnoldi process γ = \|b\| and all the subdiagonal elements of H are positive. With the Hessenberg reduction
form (24), by Proposition 2.4, one has the QR factorization

K(A, b) = QR, (25)

where R = K(H, γ e_1) is nonsingular and upper triangular with r_{kk} = γ \prod_{j=1}^{k-1} h_{j+1,j} for k = 1, \ldots, n.
From (25),

Q = K(A, b)R^{-1} =: [\, g_1(A)b \;\; g_2(A)b \;\; \ldots \;\; g_n(A)b \,], (26)

and it is easily verified that g_j(t) ∈ P_{n-1} and deg g_j(t) = j − 1 for j = 1, \ldots, n. So Q is a generalized Krylov matrix of the form (19). In fact the polynomials g_j(t) have the following properties.
Proof. From the Arnoldi process it is not difficult to get the recurrence

g_1(t) = \frac{1}{γ}, \qquad g_{j+1}(t) = \frac{1}{h_{j+1,j}}\bigl(t g_j(t) − h_{1j}g_1(t) − \cdots − h_{jj}g_j(t)\bigr), \quad j = 1, \ldots, n−1.
We now prove (27) by induction. When j = 2,

g_2(t) = \frac{1}{h_{21}}\bigl(t g_1(t) − h_{11}g_1(t)\bigr) = \frac{1}{γ h_{21}}(t − h_{11}) = \frac{1}{γ h_{21}}\det(tI_1 − H_1).

So (27) holds for j = 2.
Assume (27) is true for 1, \ldots, j. Expanding det(tI − H_j) based on the last column we get

\det(tI − H_j) = (t − h_{jj})\det(tI − H_{j-1}) − h_{j-1,j}h_{j,j-1}\det(tI − H_{j-2}) − h_{j-2,j}\Bigl(\prod_{k=j-1}^{j} h_{k,k-1}\Bigr)\det(tI − H_{j-3}) − \cdots − h_{1j}\Bigl(\prod_{k=2}^{j} h_{k,k-1}\Bigr).
Dividing both sides by γ \prod_{k=2}^{j+1} h_{k,k-1} and using the induction hypothesis, we have

\frac{1}{γ \prod_{k=2}^{j+1} h_{k,k-1}}\det(tI − H_j) = \frac{1}{h_{j+1,j}}\bigl((t − h_{jj})g_j(t) − h_{j-1,j}g_{j-1}(t) − \cdots − h_{1j}g_1(t)\bigr) = g_{j+1}(t).

So (27) also holds for j + 1.
The relation (28) follows simply from (26).
Since K(A, b) is nonsingular, one may introduce the following inner product in P_{n-1}(A):

\langle p(A; x), p(A; y)\rangle_b = b^*p(A; x)^*p(A; y)b = x^*K(A, b)^*K(A, b)y, \qquad \forall p(A; x), p(A; y) ∈ P_{n-1}(A).

The last relation is due to (1). With this inner product we define the norm

\|p(A; x)\|_b = \langle p(A; x), p(A; x)\rangle_b^{1/2} = \|p(A; x)b\| = \|K(A, b)x\|.
Then the matrices g1 (A), . . . , gn (A) that determine Q in (26) are orthonormal, which can be viewed as
being generated from I, A, . . . , An−1 ∈ Pn−1 (A) by applying the Gram–Schmidt process with respect
to the above defined inner product [2]. So g1 (A), . . . , gn (A) form an orthonormal basis for Pn−1 (A).
Also, the polynomials g1 (t ), . . . , gn (t ) form an orthonormal basis for Pn−1 with respect to the inner
product
\langle p(t; x), p(t; y)\rangle_{A,b} := \langle p(A; x), p(A; y)\rangle_b, \qquad \forall p(t; x), p(t; y) ∈ P_{n-1},

which can be interpreted as being generated by applying the Gram–Schmidt process to 1, t, \ldots, t^{n-1}.
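A small NumPy check (arbitrary test data) that the Gram–Schmidt process in the inner product \langle\cdot,\cdot\rangle_{A,b} reproduces, up to sign or phase factors, the columns of the unitary factor Q in (25):

import numpy as np

def krylov(A, b, m):
    cols = [b]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
K = krylov(A, b, n)

# Gram-Schmidt on 1, t, ..., t^{n-1} in the inner product <p(.;x), p(.;y)>_{A,b} = x^* K^* K y.
G = K.T @ K                                   # Gram matrix of the monomial coefficient vectors
coeffs = []                                   # coefficient vectors of g_1, ..., g_n
for j in range(n):
    x = np.zeros(n); x[j] = 1.0               # coefficient vector of t^j
    for c in coeffs:
        x = x - (c @ G @ x) * c               # subtract projections onto earlier g_i
    coeffs.append(x / np.sqrt(x @ G @ x))     # normalize in the ||.||_b norm
X = np.column_stack(coeffs)

Q, R = np.linalg.qr(K)                        # K(A, b) = QR as in (25)
# The vectors g_j(A) b = K x_j coincide with the columns of Q up to sign.
print(np.allclose(np.abs(Q.conj().T @ (K @ X)), np.eye(n)))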
Because K(A) and P_{n-1}(A) are isomorphic, if K(A, b) is nonsingular, using the isomorphism \mathcal{S}_b^{-1} defined in (23), an inner product in K(A) can be induced from the inner product \langle\cdot,\cdot\rangle_b with

\langle K(A, u), K(A, v)\rangle = \langle \mathcal{S}_b^{-1}K(A, u), \mathcal{S}_b^{-1}K(A, v)\rangle_b = \langle p(A; K(A, b)^{-1}u), p(A; K(A, b)^{-1}v)\rangle_b = (K(A, b)^{-1}u)^*(K(A, b)^*K(A, b))(K(A, b)^{-1}v) = u^*v,

which is just the standard inner product in C^n.
Given a nonsingular matrix W, in a similar way one can determine a W-unitary matrix X, i.e., X^*W^*WX = I, such that

X^{-1}AX = Ĥ, \qquad X^{-1}b = γ̂ e_1, (29)

where Ĥ is unreduced upper Hessenberg (by Proposition 2.4). The matrix X can be obtained by applying the Arnoldi process. The only difference is to make the columns of X W-orthonormal. By Proposition 2.4,

K(A, b) = X R̂, \qquad R̂ = K(Ĥ, γ̂ e_1).
So we also have

X = K(A, b)R̂^{-1} = [\, ĝ_1(A)b \;\; ĝ_2(A)b \;\; \ldots \;\; ĝ_n(A)b \,],

where ĝ_j(t) ∈ P_{n-1} with deg ĝ_j(t) = j − 1, and ĝ_1(A), \ldots, ĝ_n(A) form an orthonormal basis for P_{n-1}(A) with respect to the generalized inner product

\langle p(A; x), p(A; y)\rangle_{W,b} = b^*p(A; x)^*W^*W p(A; y)b = x^*K(A, b)^*W^*WK(A, b)y. (30)
Theorem 5.1. Suppose K(A, b) is nonsingular. Let W be nonsingular, X and Ĥ satisfy (29), and Q, H satisfy (24). Define Q̂ = WX and T = R̂R^{-1} = K(Ĥ, γ̂ e_1)K(H, γ e_1)^{-1}. Then Q̂ is unitary, T is upper triangular, and

Q = XT, \qquad W = Q̂\,T\,Q^*, \qquad H = T^{-1}ĤT.

Proof. It is obvious that Q̂ = WX is unitary and T is upper triangular.
As (28), we have
This theorem shows the relations between the Hessenberg reduction forms (24) and (29). In fact, with Â = WAW^{-1} and b̂ = Wb, (29) can be rewritten as

Q̂^*ÂQ̂ = Ĥ, \qquad Q̂^*b̂ = γ̂ e_1, \qquad |γ̂| = \|b̂\|.

So (29) is the same as (24) but with Q, A, b replaced by Q̂, Â, b̂.
The next result shows that for any sequence of n polynomials in P_{n-1} with degrees in increasing order, a unitary matrix can be constructed that reduces WAW^{-1} and Wb to a Hessenberg reduction form, for an appropriate W.
Theorem 5.2. Suppose that K(A, b) is nonsingular and p(t; r_1), \ldots, p(t; r_n) ∈ P_{n-1} with deg p(t; r_j) = j − 1 for j = 1, \ldots, n. There exists a nonsingular matrix W such that for

Â = WAW^{-1}, \qquad b̂ = Wb,

the matrix

Q̂ = [\, p(Â; r_1)b̂, \ldots, p(Â; r_n)b̂ \,]

is unitary and satisfies

Q̂^*ÂQ̂ = Ĥ, \qquad Q̂^*b̂ = γ̂ e_1, \qquad |γ̂| = \|b̂\|,

where Ĥ is unreduced upper Hessenberg.
Proof. Let

R = [r_1, \ldots, r_n], \qquad X = [\, p(A; r_1)b, \ldots, p(A; r_n)b \,].

By the assumptions R is upper triangular and nonsingular, and X = K(A, b)R is nonsingular. From AK(A, b) = K(A, b)C, where C is the companion matrix of A, we have

X^{-1}AX = R^{-1}CR =: Ĥ, \qquad X^{-1}b = γ̂ e_1,

where Ĥ is unreduced upper Hessenberg and γ̂ is a scalar.
Let

X = W^{-1}Q̂,

where W is a nonsingular matrix and Q̂ is unitary. Such a factorization always exists, for instance, an RQ factorization. Then, with this W and the corresponding Â, b̂, we have

Q̂ = WX = W[\, p(A; r_1)b, \ldots, p(A; r_n)b \,] = [\, p(Â; r_1)b̂, \ldots, p(Â; r_n)b̂ \,],

Q̂^*ÂQ̂ = X^{-1}W^{-1}(WAW^{-1})WX = X^{-1}AX = Ĥ,

and

Q̂^*b̂ = X^{-1}W^{-1}Wb = X^{-1}b = γ̂ e_1.

Obviously, |γ̂| = \|Q̂^*b̂\| = \|b̂\|.
We now use the Hessenberg reduction form (24) to give a factorization for f (A).
Theorem 5.3. Suppose that K(A, b) is nonsingular and A, b have the forms in (24). Then for any f(t) such that f(A) is defined,
where H̃_{11} ∈ C^{r,r} is an r × r unreduced upper Hessenberg matrix. Then rank f(A) = r and

f(A) = (QQ̃_1)\,R\,Q^*, (32)

where

R = γ̃\,K_{r,n}(H̃_{11}, e_1)K(H, e_1)^{-1}

is upper triangular.
Corollary 5.4. Suppose K(A, b) is nonsingular. If rank f(A) = r < n, then all the eigenvalues of H̃_{22} are the roots of f(t) = 0 (counting multiplicity), and range QQ̃_2 = null([f(A)]^*).
When f (A) is nonsingular, (32) can be also derived by the orthogonalization argument using the
weighted inner product (30) with W = f (A). In this case, (WAW −1 , Wb) becomes (A, d). From Theorem
5.3, the matrices and scalar in (29) are
Q̂ = QQ̃, \qquad Ĥ = H̃, \qquad γ̂ = γγ̃.

By Theorem 5.1, f(A) has the URV decomposition ([21]),

f(A) = QQ̃\,T\,Q^*,

where

T = K(H̃, γγ̃ e_1)K(H, γ e_1)^{-1} = γ̃\,K(H̃, e_1)K(H, e_1)^{-1} = R.

The formula (32) is more general, since it holds when f(A) is singular as well.
Remark 5.5. Using Af(A) = f(A)A, the matrix R in (32) satisfies H̃_{11}R = RH. So R can be computed column by column with the recurrence

r̃_1 = γ̃ e_1, \qquad r̃_{k+1} = \bigl(H̃_{11}r̃_k − h_{1k}r̃_1 − \cdots − h_{kk}r̃_k\bigr)/h_{k+1,k}, \quad k = 1, \ldots, n−1,

which is the Arnoldi process for q_k, but with A replaced by H̃_{11}.
More generally, using
Hf (H ) = f (H )H,
one may use the same recurrence to compute f (H ), provided f (H )e1 is given. This approach was
mentioned in [14,15] for f (t ) = et .
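A sketch (NumPy/SciPy; H and f = exp are arbitrary examples) of recovering f(H) column by column from its first column via Hf(H) = f(H)H, using the recurrence above:

import numpy as np
from scipy.linalg import expm, hessenberg

def columns_from_first(H, f1):
    """Given unreduced upper Hessenberg H and the first column f1 = f(H) e_1,
    recover F = f(H) from H F = F H, column by column."""
    n = H.shape[0]
    F = np.zeros((n, n))
    F[:, 0] = f1
    for k in range(n - 1):
        w = H @ F[:, k] - F[:, : k + 1] @ H[: k + 1, k]
        F[:, k + 1] = w / H[k + 1, k]
    return F

rng = np.random.default_rng(5)
H = hessenberg(rng.standard_normal((5, 5)))   # an unreduced upper Hessenberg test matrix
fH = expm(H)
print(np.allclose(columns_from_first(H, fH[:, 0]), fH))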
The next result shows how the unitary matrix is related to Q if it is generated by another vector.
Theorem 5.6. Suppose that K(A, b) is nonsingular and A, b have the forms in (24). Let d ∈ C^n and Q̂ = [Q̂_1, Q̂_2] be unitary with Q̂_1 ∈ C^{n,r} and satisfy

AQ̂ = Q̂Ĥ, \qquad Ĥ = \begin{bmatrix} Ĥ_{11} & Ĥ_{12} \\ 0 & Ĥ_{22} \end{bmatrix}, \qquad Q̂^*d = γ̂ e_1, \quad |γ̂| = \|d\|,

where Ĥ_{11} ∈ C^{r,r} is unreduced upper Hessenberg. Let

Q = [Q_1, Q_2], \qquad H = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}, \qquad Q_1 ∈ C^{n,r}, \quad H_{11} ∈ C^{r,r}.

Then

Q̂_1\,K_{r,n}(Ĥ_{11}, γ̂ e_1) = p(A; x)\,Q\,K(H, γ e_1), (33)

and

Q̂_1\,T_r = p(A; x)\,Q_1,

where

x = K(A, b)^{-1}d, \qquad T_r = K_r(Ĥ_{11}, γ̂ e_1)\,K_r(H_{11}, γ e_1)^{-1}.
So we have (33). The second equation follows by equating the first r columns in (33) to get

Q̂_1\,K_r(Ĥ_{11}, γ̂ e_1) = p(A; x)\,Q_1\,K_r(H_{11}, γ e_1),

and by using the fact that K_r(H_{11}, γ e_1) is nonsingular.
The relation (33) shows that the unitary matrix corresponding to d in the Hessenberg reduction is
just the unitary factor of the QR factorization of p(A; x)Q for an appropriate polynomial p(t ; x).
Any nonsingular Krylov matrix K (A, b) has a QR factorization (25) with R nonsingular. If Q = I, from
(24) A has to be unreduced upper Hessenberg and b = γ e1 . In this case (32) becomes a QR factorization
of f (A). If further K (A, b) = I, then b = e1 and A is the companion matrix C defined in (4). In this case,
f (A) and K (A, b) have simpler relations.
Corollary 5.8. Suppose C is the companion matrix (4) and f (t ) is a scalar function such that f (C ) is defined.
Then
f(C) = K(C, d) = p(C; d), \qquad d = f(C)e_1.
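A quick check (NumPy/SciPy; the companion data and f = exp are arbitrary examples) of Corollary 5.8:

import numpy as np
from scipy.linalg import expm

def krylov(A, b, m):
    cols = [b]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

n = 4
c = np.array([0.5, -1.0, 2.0, 0.3])           # last column [c_1, ..., c_n] of C in (4)
C = np.diag(np.ones(n - 1), -1)
C[:, -1] = c
fC = expm(C)
d = fC[:, 0]                                   # d = f(C) e_1
print(np.allclose(fC, krylov(C, d, n)))        # f(C) = K(C, d)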
6. Conclusions
Starting from a simple observation, we derived formulas to show relations between functions of
a matrix and Krylov matrices. By introducing subspaces and linear transformations, we interpreted
the relations at an abstract level. We provided several properties of Hessenberg reductions that can
be used to understand some common techniques used in Krylov subspace methods and eigenvalue
algorithms. How to use the results to improve existing methods and develop new methods? That needs
more work.
Acknowledgment
The author thanks the referees for their comments and suggestions.
References
[1] M. Arioli, V. Pták, Z. Strakoš, Krylov sequences of maximal length and convergence of GMRES, BIT 38 (4) (1998) 636–643.
[2] D. Calvetti, S.-M. Kim, L. Reichel, Quadrature rules based on the Arnoldi process, SIAM J. Matrix Anal. Appl. 26 (3) (2005)
765–781.
[3] P.I. Davies, N.J. Higham, A Schur–Parlett algorithm for computing matrix functions, SIAM J. Matrix Anal. Appl. 25 (2) (2003)
464–485.
[4] R. Eising, The distance between a system and the set of uncontrollable systems, in: Proc. MTNS, Beer-Sheva, June 1983,
pp. 303–314.
[5] R. Eising, Between controllable and uncontrollable, Systems Control Lett. 4 (5) (1984) 263–264.
[6] F.R. Gantmacher, Theory of Matrices, vol. 1, Chelsea, New York, 1959.
[7] F.R. Gantmacher, Theory of Matrices, vol. 2, Chelsea, New York, 1959.
[8] G.H. Golub, C.F. Van Loan, Matrix Computations, third ed., Johns Hopkins University Press, Baltimore, 1996.
[9] A. Greenbaum, Iterative Methods for Solving Linear Systems, SIAM Publications, Philadelphia, PA, 1997.
[10] N.J. Higham, Functions of Matrices, Theory and Computation, SIAM Publications, Philadelphia, PA, 2008.
[11] R.A. Horn, C.R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[12] B. Kågström, Numerical computation of matrix functions, Report UMIN-58.77, Department of Information Processing,
University of Umeå, Sweden, 1977.
[13] R.E. Kalman, On the general theory of control systems, in: Proc. 1st IFAC Congr., vol. 1, Butterworth, London, 1960, pp.
481–491.
[14] C.B. Moler, C.F. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, SIAM Rev. 20 (1978) 801–836.
[15] C.B. Moler, C.F. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev. 45 (2003) 3–49.
[16] B.N. Parlett, Computation of Functions of Triangular Matrices, Memorandum ERL-M481, Electronics Research Laboratory,
College of Engineering, UC Berkeley, 1974.
[17] B.N. Parlett, A recurrence among the elements of functions of triangular matrices, Linear Algebra Appl. 14 (1976) 117–121.
[18] H.H. Rosenbrock, State-Space and Multivariable Theory, Nelson, London, 1970.
[19] Y. Saad, Numerical Methods for Large Eigenvalue Problems, Manchester University Press, Manchester, UK, 1992.
[20] Y. Saad, Iterative Methods for Sparse Linear Systems, second ed., SIAM Publications, Philadelphia, PA, 2004.
[21] G.W. Stewart, Updating a rank-revealing URV decomposition in parallel, Parallel Comput. 20 (1994) 151–172.
[22] J.H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, Oxford, 1965.