Chapter 13

Block LU Factorization
prises three nested loops that can be ordered in six ways, each yielding a different
algorithmic variant of the method. These variants involve different computational
kernels: inner product and saxpy operations (level-1 BLAS), or outer product and
gaxpy operations (level-2 BLAS). To introduce matrix–matrix operations (level-3
BLAS), which are beneficial for high-performance computing, further manipula-
tion beyond loop reordering is needed. We will use the following terminology,
which emphasises an important distinction.
A partitioned algorithm is a scalar (or point) algorithm in which the operations
have been grouped and reordered into matrix operations.
A block algorithm is a generalization of a scalar algorithm in which the basic
scalar operations become matrix operations (α + β, αβ, and α/β become A + B,
AB, and AB^{-1}), and a matrix property based on the nonzero structure becomes
the corresponding property blockwise (in particular, the scalars 0 and 1 become
the zero matrix and the identity matrix, respectively). A block factorization is
defined in a similar way and is usually what a block algorithm computes.
A partitioned version of the outer product form of LU factorization may be
developed as follows. For A ∈ R^{n×n} and a given block size r, write

$$
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
=
\begin{bmatrix} L_{11} & 0 \\ L_{21} & I_{n-r} \end{bmatrix}
\begin{bmatrix} I_r & 0 \\ 0 & S \end{bmatrix}
\begin{bmatrix} U_{11} & U_{12} \\ 0 & I_{n-r} \end{bmatrix},
\qquad (13.1)
$$

where A_{11} is r × r.
In general, the blocks can be of different dimensions. Note that this factorization is
not the same as a standard LU factorization, because U is not triangular. However,
the standard and block LU factorizations are related as follows: if A = LU is a
block LU factorization and each U_{ii} has an LU factorization U_{ii} = L_{ii}Ū_{ii}, then
A = L diag(L_{ii}) · diag(Ū_{ii})U is an LU factorization. Conditions for the existence
of a block LU factorization are easy to state.
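As a concrete illustration (our own, not from the text), the block LU factorization — L unit lower triangular with identity diagonal blocks, U block upper triangular with full, non-triangular diagonal blocks — can be sketched in NumPy. The function name `block_lu` and the assumption of equal block sizes are ours:

```python
import numpy as np

def block_lu(A, r):
    """Block LU sketch: A = L U with L unit lower triangular (identity
    diagonal blocks) and U block upper triangular; the diagonal blocks
    U_ii are full, so this is a *block* LU, not a point LU."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    S = A.copy()                              # current Schur complement
    k = 0
    while k < n:
        b = min(r, n - k)
        A11, A12 = S[:b, :b], S[:b, b:]
        A21, A22 = S[b:, :b], S[b:, b:]
        U[k:k+b, k:k+b] = A11                 # diagonal block of U (full)
        U[k:k+b, k+b:] = A12
        L21 = np.linalg.solve(A11.T, A21.T).T  # solve L21 @ A11 = A21
        L[k+b:, k:k+b] = L21
        S = A22 - L21 @ A12                   # next Schur complement
        k += b
    return L, U
```

Taking a point LU factorization of each diagonal block U_{ii} then recovers a standard LU factorization of A, as described above.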
Theorem 13.5 (Demmel and Higham). Under the assumptions (13.4)–(13.6), the
LU factors of A ∈ R^{n×n} computed using the partitioned outer product form of LU
factorization with block size r satisfy L̂Û = A + ΔA, where

$$
\|\Delta A\| \le u\bigl(\delta(n,r)\|A\| + \theta(n,r)\|\hat L\|\,\|\hat U\|\bigr) + O(u^2),
\qquad (13.7)
$$

and where
Proof. The proof is essentially inductive. To save clutter we will omit "+O(u²)"
from each bound. For n = r, the result holds trivially. Consider the first block
stage of the factorization, with the partitioning (13.1). The assumptions imply
that

$$
\hat L_{11}\hat U_{12} = A_{12} + \Delta A_{12}, \qquad
\|\Delta A_{12}\| \le c_2(r, n-r)\,u\,\|\hat L_{11}\|\,\|\hat U_{12}\|,
\qquad (13.8)
$$

$$
\hat L_{21}\hat U_{11} = A_{21} + \Delta A_{21}, \qquad
\|\Delta A_{21}\| \le c_2(r, n-r)\,u\,\|\hat L_{21}\|\,\|\hat U_{11}\|.
\qquad (13.9)
$$

To obtain S = A_{22} − L_{21}U_{12} we first compute C = L̂_{21}Û_{12}, obtaining

$$
\hat C = \hat L_{21}\hat U_{12} + \Delta C, \qquad
\|\Delta C\| \le c_1(n-r, r, n-r)\,u\,\|\hat L_{21}\|\,\|\hat U_{12}\|,
$$

$$
\hat S = A_{22} - \hat C + F, \qquad
\|F\| \le u(\|A_{22}\| + \|\hat C\|).
\qquad (13.10)
$$

It follows that

$$
\hat S = A_{22} - \hat L_{21}\hat U_{12} + \Delta S,
\qquad (13.11a)
$$

$$
\|\Delta S\| \le u\bigl(\|A_{22}\| + \|\hat L_{21}\|\,\|\hat U_{12}\|
 + c_1(n-r, r, n-r)\|\hat L_{21}\|\,\|\hat U_{12}\|\bigr).
\qquad (13.11b)
$$
We also assume that when a system U_{ii}x_i = d_i of order r is solved, the computed
solution x̂_i satisfies

$$
(U_{ii} + \Delta U_{ii})\hat x_i = d_i, \qquad
\|\Delta U_{ii}\| \le c_5(r)\,u\,\|U_{ii}\| + O(u^2).
\qquad (13.15)
$$
The assumptions (13.14) and (13.15) are satisfied for Implementation 1 of Algo-
rithm 13.3 and are sufficient to prove the following result.
Proof. We omit the proof (see Demmel, Higham, and Schreiber [326, ]
for details). It is similar to the proof of Theorem 13.5.
The bounds in Theorem 13.6 are valid also for other versions of block LU
factorization obtained by “block loop reordering”, such as a block gaxpy based
algorithm.
Theorem 13.6 shows that the stability of block LU factorization is determined
by the ratio ‖L̂‖‖Û‖/‖A‖ (numerical experiments show that the bounds are, in
fact, reasonably sharp). If this ratio is bounded by a modest function of n, then
L̂ and Û are the true factors of a matrix close to A, and x̂ solves a slightly
perturbed system. However, ‖L̂‖‖Û‖ can exceed ‖A‖ by an arbitrary factor, even
if A is symmetric positive definite or diagonally dominant by rows. Indeed, ‖L‖ ≥
‖L_{21}‖ = ‖A_{21}A_{11}^{-1}‖, using the partitioning (13.2), and this lower bound for ‖L‖ can
be arbitrarily large. In the following two subsections we investigate this instability
more closely and show that ‖L‖‖U‖ can be bounded in a useful way for particular
classes of A. Without further comment we make the reasonable assumption that
‖L‖‖U‖ ≈ ‖L̂‖‖Û‖, so that these bounds may be used in Theorem 13.6.
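A tiny numerical illustration (our own, not the book's) of how ‖L‖‖U‖ can dwarf ‖A‖: with block size r = 1 and the norm ‖A‖ := max_{i,j}|a_{ij}|, a small (1,1) pivot makes L_{21} = A_{21}A_{11}^{-1} huge.

```python
import numpy as np

# One elimination stage with block size r = 1 on a 2x2 matrix whose
# (1,1) pivot is tiny; norm(M) := max |m_ij|, the norm used later in the text.
eps = 1e-8
A = np.array([[eps, 1.0],
              [1.0, 1.0]])
L21 = A[1, 0] / A[0, 0]              # A21 * inv(A11) = 1/eps
S = A[1, 1] - L21 * A[0, 1]          # Schur complement
L = np.array([[1.0, 0.0],
              [L21, 1.0]])
U = np.array([[A[0, 0], A[0, 1]],
              [0.0,     S]])
norm = lambda M: np.abs(M).max()
ratio = norm(L) * norm(U) / norm(A)  # the stability ratio; here ~ 1/eps^2
```

Even though L@U reproduces A exactly, the ratio governing the backward error bound is enormous, so the factorization is useless for stable solution of linear systems with this matrix.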
What can be said for Implementation 2? Suppose, for simplicity, that the
inverses A_{11}^{-1} (which are used in step 2 of Algorithm 13.3 and in the block back
substitution) are computed exactly. Then the best bounds of the forms (13.14)
and (13.15) are
Working from these results, we find that Theorem 13.6 still holds provided the
first-order terms in the bounds in (13.16) are multiplied by max_i κ(Û_{ii}). This
suggests that Implementation 2 of Algorithm 13.3 can be much less stable than
Implementation 1 when the diagonal blocks of U are ill conditioned, and this is
confirmed by numerical experiments.
This definition implicitly requires that the diagonal blocks Ajj are all nonsingu-
lar. A is block diagonally dominant by rows if AT is block diagonally dominant
by columns. For the block size 1, the usual property of point diagonal domi-
nance is obtained. Note that for the 1- and ∞-norms diagonal dominance does
not imply block diagonal dominance, nor does the reverse implication hold (see
Problem 13.2). Throughout our analysis of block diagonal dominance we take the
norm to be an arbitrary subordinate matrix norm.
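The definition can be checked mechanically. The following sketch (the function name is ours, and equal r × r blocks are assumed for simplicity) tests block diagonal dominance by columns, i.e., for each j, ‖A_{jj}^{-1}‖^{-1} ≥ Σ_{i≠j} ‖A_{ij}‖, here in the ∞-norm:

```python
import numpy as np

def is_block_dd_by_columns(A, r):
    """Check block diagonal dominance by columns with equal r-by-r blocks:
    for each block column j, 1/||inv(A_jj)|| >= sum_{i != j} ||A_ij||,
    using the (subordinate) infinity norm."""
    m = A.shape[0] // r
    blk = lambda i, j: A[i*r:(i+1)*r, j*r:(j+1)*r]
    for j in range(m):
        lhs = 1.0 / np.linalg.norm(np.linalg.inv(blk(j, j)), np.inf)
        rhs = sum(np.linalg.norm(blk(i, j), np.inf)
                  for i in range(m) if i != j)
        if lhs < rhs:
            return False
    return True
```

A strongly diagonally loaded matrix passes the test, while one whose diagonal blocks are too weak fails it.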
First, we show that for block diagonally dominant matrices a block LU factor-
ization exists, using the key property that block diagonal dominance is inherited by
the Schur complements obtained in the course of the factorization. In the analysis
For j = 2: m we have

$$
\begin{aligned}
\sum_{\substack{i=2 \\ i\ne j}}^{m} \|A_{ij}^{(2)}\|
 &= \sum_{\substack{i=2 \\ i\ne j}}^{m} \|A_{ij} - A_{i1}A_{11}^{-1}A_{1j}\| \\
 &\le \sum_{\substack{i=2 \\ i\ne j}}^{m} \|A_{ij}\|
   + \|A_{1j}\|\,\|A_{11}^{-1}\| \sum_{\substack{i=2 \\ i\ne j}}^{m} \|A_{i1}\| \\
 &\le \sum_{\substack{i=2 \\ i\ne j}}^{m} \|A_{ij}\|
   + \|A_{1j}\|\,\|A_{11}^{-1}\| \bigl(\|A_{11}^{-1}\|^{-1} - \|A_{j1}\|\bigr),
   \quad\text{using (13.17)}, \\
 &= \sum_{\substack{i=2 \\ i\ne j}}^{m} \|A_{ij}\|
   + \|A_{1j}\| - \|A_{1j}\|\,\|A_{11}^{-1}\|\,\|A_{j1}\| \\
 &\le \|A_{jj}^{-1}\|^{-1} - \|A_{1j}\|\,\|A_{11}^{-1}\|\,\|A_{j1}\|,
   \quad\text{using (13.17)}, \\
 &= \min_{\|x\|=1} \|A_{jj}x\| - \|A_{1j}\|\,\|A_{11}^{-1}\|\,\|A_{j1}\|
\end{aligned}
$$

Now if $A_{jj}^{(2)}$ is singular it follows that $\sum_{i=2,\,i\ne j}^{m} \|A_{ij}^{(2)}\| = 0$; therefore $A^{(2)}$, and
hence also A, is singular, which is a contradiction. Thus $A_{jj}^{(2)}$ is nonsingular, and
$$
\sum_{\substack{i=2 \\ i\ne j}}^{m} \|A_{ij}^{(2)}\| \le \|(A_{jj}^{(2)})^{-1}\|^{-1},
$$

showing that A^{(2)} is block diagonally dominant by columns. The result follows by
induction.
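Theorem 13.7's inheritance property is easy to observe numerically. The following sketch (our own; helper name and equal block sizes are assumptions) builds a matrix that is block diagonally dominant by columns, forms the Schur complement A^{(2)}, and confirms that the dominance margin stays nonnegative:

```python
import numpy as np

rng = np.random.default_rng(1)
r, m = 2, 3
n = r * m
# Random matrix made block diagonally dominant by columns by loading
# the diagonal blocks heavily.
A = rng.standard_normal((n, n)) + 10 * n * np.eye(n)

def col_margin(M, rr):
    """min over j of 1/||inv(M_jj)||_inf - sum_{i != j} ||M_ij||_inf;
    nonnegative exactly when M is block diagonally dominant by columns."""
    mm = M.shape[0] // rr
    blk = lambda i, j: M[i*rr:(i+1)*rr, j*rr:(j+1)*rr]
    margins = []
    for j in range(mm):
        lhs = 1.0 / np.linalg.norm(np.linalg.inv(blk(j, j)), np.inf)
        rhs = sum(np.linalg.norm(blk(i, j), np.inf)
                  for i in range(mm) if i != j)
        margins.append(lhs - rhs)
    return min(margins)

# One elimination step: A^(2) is the Schur complement of A11 in A.
S = A[r:, r:] - A[r:, :r] @ np.linalg.solve(A[:r, :r], A[:r, r:])
```

Both `col_margin(A, r)` and `col_margin(S, r)` come out positive, as the theorem guarantees for the latter given the former.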
The next result allows us to bound ‖U‖ for a block diagonally dominant matrix.

Theorem 13.8 (Demmel, Higham, and Schreiber). Let A satisfy the conditions
of Theorem 13.7. If A^{(k)} denotes the matrix obtained after k − 1 steps of Algo-
rithm 13.3, then

$$
\max_{k \le i,j \le m} \|A_{ij}^{(k)}\| \le 2 \max_{1 \le i,j \le m} \|A_{ij}\|.
$$
Proof. Let A be block diagonally dominant by columns (the proof for row
diagonal dominance is similar). Then

$$
\begin{aligned}
\sum_{i=2}^{m} \|A_{ij}^{(2)}\|
 &= \sum_{i=2}^{m} \|A_{ij} - A_{i1}A_{11}^{-1}A_{1j}\| \\
 &\le \sum_{i=2}^{m} \|A_{ij}\|
   + \|A_{1j}\|\,\|A_{11}^{-1}\| \sum_{i=2}^{m} \|A_{i1}\| \\
 &\le \sum_{i=1}^{m} \|A_{ij}\|,
\end{aligned}
$$

using (13.17). By induction, using Theorem 13.7, it follows that
$\sum_{i=k}^{m} \|A_{ij}^{(k)}\| \le \sum_{i=1}^{m} \|A_{ij}\|$. This yields

$$
\max_{k \le i,j \le m} \|A_{ij}^{(k)}\|
 \le \max_{k \le j \le m} \sum_{i=k}^{m} \|A_{ij}^{(k)}\|
 \le \max_{k \le j \le m} \sum_{i=1}^{m} \|A_{ij}\|.
$$

From (13.17), $\sum_{i\ne j} \|A_{ij}\| \le \|A_{jj}^{-1}\|^{-1} \le \|A_{jj}\|$, so

$$
\max_{k \le i,j \le m} \|A_{ij}^{(k)}\|
 \le 2\max_{k \le j \le m} \|A_{jj}\|
 \le 2\max_{1 \le j \le m} \|A_{jj}\|
 \le 2\max_{1 \le i,j \le m} \|A_{ij}\|.
$$
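The factor-of-two bound can be verified experimentally (our own sketch, assuming equal r × r blocks and the ∞-norm on blocks): run the block elimination on a matrix that is block diagonally dominant by columns and track the largest block norm across all the Schur complements A^{(k)}.

```python
import numpy as np

rng = np.random.default_rng(2)
r, m = 2, 4
n = r * m
# Heavily loaded diagonal makes A block diagonally dominant by columns.
A = rng.standard_normal((n, n)) + 10 * n * np.eye(n)

def max_block_norm(M, rr):
    """Largest infinity-norm over the rr-by-rr blocks of M."""
    mm = M.shape[0] // rr
    return max(np.linalg.norm(M[i*rr:(i+1)*rr, j*rr:(j+1)*rr], np.inf)
               for i in range(mm) for j in range(mm))

max0 = max_block_norm(A, r)
growth = max0
S = A.copy()
for _ in range(m - 1):                 # form A^(2), A^(3), ...
    S = S[r:, r:] - S[r:, :r] @ np.linalg.solve(S[:r, :r], S[:r, r:])
    growth = max(growth, max_block_norm(S, r))
```

By Theorem 13.8, `growth` never exceeds twice `max0` for such a matrix.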
The implications of Theorems 13.7 and 13.8 for stability are as follows. Suppose
A is block diagonally dominant by columns. Also, assume for the moment that
the (subordinate) norm has the property that

$$
\max_{i,j} \|A_{ij}\| \le \|A\| \le \sum_{i,j} \|A_{ij}\|,
\qquad (13.19)
$$

which holds for any p-norm, for example. The subdiagonal blocks in the first
block column of L are given by L_{i1} = A_{i1}A_{11}^{-1} and so ‖[L_{21}^T, …, L_{m1}^T]^T‖ ≤ 1, by
(13.17) and (13.19). From Theorem 13.7 it follows that ‖[L_{j+1,j}^T, …, L_{m,j}^T]^T‖ ≤ 1
for j = 2: m. Since U_{ij} = A_{ij}^{(i)} for j ≥ i, Theorem 13.8 shows that ‖U_{ij}‖ ≤ 2‖A‖
for each block of U (and ‖U_{ii}‖ ≤ ‖A‖). Therefore ‖L‖ ≤ m and ‖U‖ ≤ m²‖A‖,
and so ‖L‖‖U‖ ≤ m³‖A‖. For particular norms the bounds on the blocks of L
and U yield a smaller bound for ‖L‖ and ‖U‖. For example, for the 1-norm we
have ‖L‖₁‖U‖₁ ≤ 2m‖A‖₁ and for the ∞-norm ‖L‖∞‖U‖∞ ≤ 2m²‖A‖∞. We
conclude that block LU factorization is stable if A is block diagonally dominant
by columns with respect to any subordinate matrix norm satisfying (13.19).
Unfortunately, block LU factorization can be unstable when A is block diago-
nally dominant by rows, for although Theorem 13.8 guarantees that ‖U_{ij}‖ ≤ 2‖A‖,
‖L‖ can be arbitrarily large. This can be seen from the example

$$
A = \begin{bmatrix} A_{11} & 0 \\ \tfrac{1}{2}I & I \end{bmatrix}
  = \begin{bmatrix} I & 0 \\ \tfrac{1}{2}A_{11}^{-1} & I \end{bmatrix}
    \begin{bmatrix} A_{11} & 0 \\ 0 & I \end{bmatrix} = LU,
$$

where A is block diagonally dominant by rows in any subordinate norm for any
nonsingular matrix A_{11}. It is easy to confirm numerically that block LU factor-
ization can be unstable on matrices of this form.
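A direct numerical check of this family (our own sketch): take A_{11} = δI with tiny δ, so that A = [A_{11} 0; ½I I] is block diagonally dominant by rows while the L factor contains the huge block ½A_{11}^{-1}.

```python
import numpy as np

delta = 1e-10
I2 = np.eye(2)
Z = np.zeros((2, 2))
A11 = delta * I2                          # nonsingular but nearly singular
A = np.block([[A11, Z],
              [0.5 * I2, I2]])            # block dd by rows for any A11

# The block LU factors from the example above.
L = np.block([[I2, Z],
              [0.5 * np.linalg.inv(A11), I2]])
U = np.block([[A11, Z],
              [Z, I2]])

growth = np.abs(L).max() / np.abs(A).max()   # = 0.5/delta, unbounded as delta -> 0
```

The factorization reproduces A, yet ‖L‖ exceeds ‖A‖ by a factor 1/(2δ), so the backward error bound of Theorem 13.6 is useless here.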
Next, we bound ‖L‖‖U‖ for a general matrix and then specialize to point
diagonal dominance. From this point on we use the norm ‖A‖ := max_{i,j} |a_{ij}|. We
partition A according to

$$
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},
\qquad A_{11} \in \mathbb{R}^{r\times r},
\qquad (13.20)
$$

and denote by ρ_n the growth factor for GE without pivoting. We assume that GE
applied to A succeeds.

To bound ‖L‖, we note that, under the partitioning (13.20), for the first
block stage of Algorithm 13.3 we have ‖L_{21}‖ = ‖A_{21}A_{11}^{-1}‖ ≤ nρ_n κ(A) (see Prob-
lem 13.4). Since the algorithm works recursively with the Schur complement S,
and since every Schur complement satisfies κ(S) ≤ ρ_n κ(A) (see Problem 13.4),
each subsequently computed subdiagonal block of L has norm at most nρ_n² κ(A).
Since U is composed of elements of A together with elements of Schur complements
of A,

$$
\|U\| \le \rho_n \|A\|.
\qquad (13.21)
$$

Overall, then, for a general matrix A ∈ R^{n×n},

$$
\|L\|\,\|U\| \le 2\|A\|.
$$

Thus block LU factorization is perfectly stable for a matrix point diagonally dom-
inant by columns.
$$
\|A_{21}A_{11}^{-1}\|_2 \le \|R_{12}\|_2\,\|R_{11}^{-1}\|_2
 \le \|R\|_2\,\|R^{-1}\|_2 = \kappa_2(R) = \kappa_2(A)^{1/2}.
$$
The following lemma is proved in a way similar to the second inequality in Problem 13.4.

Using the same reasoning as in the last subsection, we deduce from these
two lemmas that each subdiagonal block of L is bounded in 2-norm by κ₂(A)^{1/2}.
Therefore ‖L‖₂ ≤ 1 + mκ₂(A)^{1/2}, where there are m block stages in the algorithm.
Also, it can be shown that ‖U‖₂ ≤ √m‖A‖₂. Hence

$$
\|L\|_2\,\|U\|_2 \le \sqrt{m}\,\bigl(1 + m\kappa_2(A)^{1/2}\bigr)\|A\|_2.
\qquad (13.24)
$$
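The key ingredient, the bound ‖A_{21}A_{11}^{-1}‖₂ ≤ κ₂(A)^{1/2} for a symmetric positive definite A (Lemma 13.9), is simple to spot-check numerically (our own experiment; the construction of the test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 8, 3
M = rng.standard_normal((n, n))
A = M @ M.T + 0.1 * np.eye(n)            # symmetric positive definite

A11, A21 = A[:r, :r], A[r:, :r]
lhs = np.linalg.norm(A21 @ np.linalg.inv(A11), 2)
rhs = np.sqrt(np.linalg.cond(A, 2))      # kappa_2(A)^{1/2}
```

The computed `lhs` sits comfortably below `rhs`, consistent with the sharper bound (κ₂(A)^{1/2} − κ₂(A)^{-1/2})/2 mentioned in the notes at the end of the chapter.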
Table 13.1. Stability of block and point LU factorization. ρ_n is the growth factor for GE
without pivoting.
It follows from Theorem 13.6 that when Algorithm 13.3 is applied to a symmetric
positive definite matrix A, the backward errors for the LU factorization and the
subsequent solution of a linear system are both bounded by
$$
c_n\sqrt{m}\,u\,\|A\|_2\bigl(2 + m\kappa_2(A)^{1/2}\bigr) + O(u^2).
\qquad (13.25)
$$

Any resulting bound for ‖x − x̂‖₂/‖x‖₂ will be proportional to κ₂(A)^{3/2}, rather
than κ₂(A) as for a stable method. This suggests that block LU factorization
can lose up to 50% more digits of accuracy in x than a stable method for solving
symmetric positive definite linear systems. The positive conclusion to be drawn,
however, is that block LU factorization is guaranteed to be stable for a symmetric
positive definite matrix that is well conditioned.
The stability results for block LU factorization are summarized in Table 13.1,
which tabulates a bound for ‖A − L̂Û‖/(c_n u‖A‖) for block and point LU factoriza-
tion for the matrix properties considered in this chapter. The constant c_n incor-
porates any constants in the bound that depend polynomially on the dimension,
so a value of 1 in the table indicates unconditional stability.
Block LU factorization appears to have first been proposed for block tridi-
agonal matrices, which frequently arise in the discretization of partial differential
equations. References relevant to this application include Isaacson and Keller [667,
, p. 59], Varah [1187, ], Bank and Rose [62, ], Mattheij [827, ],
[828, ], and Concus, Golub, and Meurant [262, ].
For an application of block LU factorization to linear programming, see Elder-
sveld and Saunders [388, ].
Theorem 13.5 is from Demmel and Higham [324, ]. The results in §13.3 are
from Demmel, Higham, and Schreiber [326, ], which extends earlier analysis
of block LU factorization by Demmel and Higham [324, ].
Block diagonal dominance was introduced by Feingold and Varga [406, ],
and has been used mainly in generalizations of the Gershgorin circle theorem.
Varah [1187, ] obtained bounds on kLk and kU k for block diagonally dominant
block tridiagonal matrices; see Problem 13.1.
Theorem 13.7 is obtained in the case of block diagonal dominance by rows
with minj γj > 0 by Polman [946, ]; the proof in [946, ] makes use of the
corresponding result for point diagonal dominance and thus differs from the proof
we have given.
At the cost of a much more difficult proof, Lemma 13.9 can be strengthened
to the attainable bound ‖A_{21}A_{11}^{-1}‖₂ ≤ (κ₂(A)^{1/2} − κ₂(A)^{-1/2})/2, as shown by
Demmel [307, , Thm. 4], but the weaker bound is sufficient for our purposes.
13.4.1. LAPACK
LAPACK does not implement block LU factorization, but its LU factorization
(and related) routines for full matrices employ partitioned LU factorization in
order to exploit the level-3 BLAS and thereby to be efficient on high-performance
machines.
Problems
13.1. (Varah [1187, ]) Suppose A is block tridiagonal and has the block LU
factorization A = LU (so that L and U are block bidiagonal and Ui,i+1 = Ai,i+1 ).
Show that if A is block diagonally dominant by columns then
What can be deduced about the stability of the factorization for these two classes
of matrices?
13.2. Show that for the 1- and ∞-norms diagonal dominance does not imply block
diagonal dominance, and vice versa.
13.3. If A ∈ Rn×n is symmetric, has positive diagonal elements, and is block
diagonally dominant by rows, must it be positive definite?
13.4. Let

$$
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
$$

with A_{11} nonsingular. Let ‖A‖ := max_{ij} |a_{ij}|. Show that ‖A_{21}A_{11}^{-1}‖ ≤ nρ_n κ(A),
where ρ_n is the growth factor for GE without pivoting on A. Show that the Schur
complement S = A_{22} − A_{21}A_{11}^{-1}A_{12} satisfies κ(S) ≤ ρ_n κ(A).
and the computed solution to the multiple right-hand side system AX = B (where
(13.5) is assumed to hold for the multiple right-hand side triangular solves) satisfies

$$
\|A\hat X - B\| \le c_n u\bigl(\|A\| + \|\hat L\|\,\|\hat U\|\bigr)\|\hat X\| + O(u^2).
$$
$$
(T - UW^{-1}V^T)^{-1} = T^{-1} + T^{-1}U(W - V^T T^{-1}U)^{-1}V^T T^{-1},
$$