Loss and Recapture of Orthogonality in Modified Gram-Schmidt
To our close friend and mentor Gene Golub, on his 60th birthday.
This is but one of the many topics on which Gene has generated so
much interest, and shed so much light.
Abstract. This paper arose from a fascinating observation, apparently by Charles Sheffield, and relayed to us by Gene Golub, that the QR factorization of an $m \times n$ matrix $A$ via MGS is numerically equivalent to that arising from Householder transformations applied to the matrix $A$ augmented by an $n \times n$ zero matrix. This is explained in a clear and simple way, and then combined with a well-known rounding error result to show that the upper triangular matrix $R$ from MGS is about as accurate as $R$ from other QR factorizations. The special structure of the product of the Householder transformations is derived, and then used to explain and bound the loss of orthogonality in MGS. Finally this numerical equivalence is used to show how orthogonality in MGS can be regained in general. This is illustrated by deriving a numerically stable algorithm based on MGS for a class of problems which includes solution of nonsingular linear systems, minimum 2-norm solution of underdetermined linear systems, and linear least squares problems. A brief discussion on the relative merits of such algorithms is included.
Key Words. orthogonal matrices, QR factorization, Householder transformations, least
squares, minimum norm solution, numerical stability, Gram-Schmidt, augmented systems.
AMS Subject Classifications: 65F25, 65G05, 65F05, 65F20
1. Introduction. We consider a matrix $A \in \mathbb{R}^{m \times n}$ with rank $n \le m$. The Modified Gram-Schmidt algorithm (MGS) in theory produces $Q_1$ and $R$ in the QR factorization

(1.1) $\quad A = Q \begin{pmatrix} R \\ 0 \end{pmatrix} = Q_1 R, \qquad Q = (\,Q_1 \ \ Q_2\,),$

where $Q$ is orthogonal and $R$ upper triangular. In practice if the condition number $\kappa \equiv \kappa(A) \equiv \sigma_1/\sigma_n$ is large ($\sigma_1 \ge \cdots \ge \sigma_n$ being the singular values of $A$), then the columns of $Q_1$ are not accurately orthogonal [3]. If orthogonality is crucial, then usually either rotations or Householder transformations have been used to compute the QR factorization. Here we show how MGS can be used just as stably for many problems requiring this orthogonality.
We derive some important properties of MGS in the presence of rounding errors. In particular we show that the $R$ obtained from MGS is numerically as good as that obtained from rotations or Householder transformations. We present new insights on the loss of orthogonality in $Q_1$ from MGS, and show how this can effectively be regained in computations which use $Q_1$, without altering the MGS algorithm or reorthogonalizing the columns of $Q_1$. As a practical example of this, we indicate how $Q_1$ and $R$ from MGS may be used to solve an important class of problems reliably, despite the loss of orthogonality in $Q_1$. This new approach seems applicable to most problems for which MGS is in theory relevant.
* This research was partially supported by NSERC of Canada Grant No. A9236.
† Mathematics, Linköping University, S-581 83 Linköping, Sweden ([email protected]).
‡ Computer Science, McGill University, Montreal, Quebec, Canada, H3A 2A7 ([email protected]).
The class of problems we consider is that of solving the symmetric indefinite linear system involving $A \in \mathbb{R}^{m \times n}$ with rank $n$

(1.2) $\quad \begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix}.$

In general we call (1.2) the Augmented System Formulation (ASF) of the following two problems, since it represents the conditions for their solution:

(1.3) $\quad \min_x \|b - x\|_2 \quad \text{subject to} \quad A^T x = c;$

(1.4) $\quad \min_y \{\|b - Ay\|_2^2 + 2c^T y\}.$
We examine these problems more fully in [5]. The ASF can be obtained by differentiating the Lagrangian $\|b - x\|_2^2 + 2y^T(A^T x - c)$ of (1.3), and equating to zero. Here $y$ is the vector of Lagrange multipliers. The ASF can also be obtained by differentiating (1.4) to give $A^T(b - Ay) = c$, and setting $x$ to be the "residual" $x = b - Ay$.
The ASF covers two important special cases. Setting $b = 0$ in (1.3), and so in (1.2), gives the problem of finding the minimum 2-norm solution of a Linear Underdetermined System (LUS). Setting $c = 0$ in (1.4) gives the much used Linear Least Squares (LLS) problem. The ASF also occurs in its full form (1.2) in the iterative refinement of least squares solutions [2].
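These relationships are easy to check numerically. The following small NumPy experiment (our own illustration, not from the paper) assembles the augmented system (1.2) for a random dense $A$ and verifies that its solution satisfies the two characterizations just given; the dimensions and random data are arbitrary choices.

```python
import numpy as np

# Check that the augmented system (1.2) yields x = b - A y (the LLS
# residual when c = 0) together with A^T x = c (the LUS constraint).
rng = np.random.default_rng(0)
m, n = 7, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)

# Assemble the (m+n) x (m+n) symmetric indefinite matrix of (1.2).
K = np.block([[np.eye(m), A], [A.T, np.zeros((n, n))]])
xy = np.linalg.solve(K, np.concatenate([b, c]))
x, y = xy[:m], xy[m:]

assert np.allclose(x, b - A @ y)     # x is the "residual" b - A y
assert np.allclose(A.T @ x, c)       # the constraint of (1.3) holds
```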
Using the QR factorization (1.1) we can transform (1.2) into

$\begin{pmatrix} I & \begin{pmatrix} R \\ 0 \end{pmatrix} \\ (\,R^T \ \ 0\,) & 0 \end{pmatrix} \begin{pmatrix} Q^T x \\ y \end{pmatrix} = \begin{pmatrix} Q^T b \\ c \end{pmatrix}.$

This gives one method for solving (1.2):

(1.5) $\quad z = R^{-T} c; \qquad \begin{pmatrix} d \\ f \end{pmatrix} = Q^T b; \qquad x = Q \begin{pmatrix} z \\ f \end{pmatrix}; \qquad y = R^{-1}(d - z).$

Using $x = Q_1 z + Q_2 f = Q_1 z + Q_2 Q_2^T b = Q_1 z + (I - Q_1 Q_1^T) b$, an obvious variant is

(1.6) $\quad z = R^{-T} c; \qquad d = Q_1^T b; \qquad x = b - Q_1 (d - z); \qquad y = R^{-1}(d - z).$
Björck [2] showed (1.5) is backward stable for (1.2) using the Householder QR factorization. Since (1.5) uses $Q$, (1.6) seems preferable if $x$ is required and only $Q_1$ is available. However, as we shall see, it cannot in general be recommended when $Q_1$ is obtained by MGS. We will show how to develop more reliable algorithms based on $Q_1$ from MGS.
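For concreteness, here is a minimal NumPy sketch of the variant (1.6). The function name and arrangement are ours; since np.linalg.qr computes a Householder-based thin QR, this sketch corresponds to the well-behaved Householder case, not the MGS case warned about above. It assumes $A$ has full column rank and attempts no pivoting or scaling.

```python
import numpy as np

def asf_solve_qr(A, b, c):
    """Variant (1.6): solve the augmented system (1.2) from a thin QR
    factorization A = Q1 R.  A sketch for full column rank A only."""
    Q1, R = np.linalg.qr(A)              # thin factorization, R is n x n
    z = np.linalg.solve(R.T, c)          # z = R^{-T} c
    d = Q1.T @ b                         # d = Q1^T b
    x = b - Q1 @ (d - z)                 # x = b - Q1 (d - z)
    y = np.linalg.solve(R, d - z)        # y = R^{-1} (d - z)
    return x, y
```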
In Section 2 we illustrate the important but not widely appreciated result that MGS is numerically equivalent to the Householder QR factorization applied to $A$ augmented with a block of zeros. From this we show in Section 3 that the computed $R$ from MGS is numerically as satisfactory as that obtained using Householder QR on $A$. The product $P$ of the Householder transformations from the QR factorization of $\binom{O_n}{A}$ is crucial for a full understanding of MGS. $P$ has a simple and important structure, and this is derived in the theorem in Section 4. This structure shows exactly how the computed $\bar Q_1$ from MGS can lose orthogonality. In Section 5 this structure is used to bound the loss of orthogonality of $\bar Q_1$, while Section 6 shows how the lost orthogonality can be compensated for just by using $\bar Q_1$ differently, without altering $\bar Q_1$ or MGS. We illustrate this by producing a new backward stable algorithm for (1.2) using the computed $\bar Q_1$ and $\bar R$ from MGS. In Section 7 we consider when we might use MGS in preference to the Householder QR factorization of $A$.
2. Modified Gram-Schmidt as a Householder Method. The MGS algorithm computes a sequence of matrices $A = A^{(1)}, A^{(2)}, \dots, A^{(n+1)} = Q_1 \in \mathbb{R}^{m \times n}$, where $A^{(k)} = (q_1, \dots, q_{k-1}, a_k^{(k)}, \dots, a_n^{(k)})$. Here the first $k-1$ columns are final columns in $Q_1$, and $a_k^{(k)}, \dots, a_n^{(k)}$ have been made orthogonal to $q_1, \dots, q_{k-1}$. In the $k$th step we take

(2.1) $\quad q_k' = a_k^{(k)}, \qquad r_{kk} = \|q_k'\|_2, \qquad q_k = q_k'/r_{kk},$

and orthogonalize $a_{k+1}^{(k)}, \dots, a_n^{(k)}$ against $q_k$ using the orthogonal projector $I - q_k q_k^T$,

(2.2) $\quad a_j^{(k+1)} = (I - q_k q_k^T) a_j^{(k)} = a_j^{(k)} - q_k r_{kj}, \qquad r_{kj} = q_k^T a_j^{(k)}, \quad j = k+1, \dots, n.$

We see $A^{(k)} = A^{(k+1)} R_k$, where $R_k$ has the same $k$th row as the upper triangular $R \equiv (r_{ij})$, but is the unit matrix otherwise. After $n$ steps we have obtained the factorization

(2.3) $\quad A = A^{(1)} = A^{(2)} R_1 = A^{(3)} R_2 R_1 = \cdots = A^{(n+1)} R_n \cdots R_1 = Q_1 R,$

where in exact arithmetic the columns of $Q_1$ are orthonormal by construction. Note that in the modified Gram-Schmidt algorithm, as opposed to the classical version, all the projections $q_k r_{kj}$ are subtracted from the $a_j^{(k)}$ sequentially as soon as $q_k$ is computed. In practice a square-root-free version is often used, where one computes $Q_1'$, $R'$, and $D = \mathrm{diag}(\delta_1, \dots, \delta_n)$ in the scaled factorization, taking $q_k'$ as above,

(2.4) $\quad A = Q_1' R', \qquad Q_1' = (q_1', \dots, q_n'), \qquad \delta_k = (q_k')^T q_k', \quad k = 1, \dots, n,$

with $R' = (r_{kj}')$ unit upper triangular, and $r_{kj}' = (q_k')^T a_j^{(k)}/\delta_k$, $j > k$.
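To fix ideas, here is a minimal NumPy sketch of the basic MGS steps (2.1)-(2.2); the function name mgs and the implementation choices are ours, and no pivoting or rank checking is attempted, so full column rank is assumed.

```python
import numpy as np

def mgs(A):
    """Modified Gram-Schmidt QR following (2.1)-(2.2): a minimal sketch
    for full column rank A, overwriting a working copy column by column."""
    A = np.array(A, dtype=float)         # working copy holds the a_j^(k)
    m, n = A.shape
    Q1 = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])       # r_kk = ||q_k'||_2
        Q1[:, k] = A[:, k] / R[k, k]            # q_k = q_k' / r_kk
        for j in range(k + 1, n):               # orthogonalize against q_k
            R[k, j] = Q1[:, k] @ A[:, j]        # r_kj = q_k^T a_j^(k)
            A[:, j] -= Q1[:, k] * R[k, j]       # a_j^(k+1)
    return Q1, R
```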
It was reported in [4] that the modified Gram-Schmidt algorithm for the QR factorization can be interpreted as Householder's method applied to the matrix $A$ augmented with a square matrix of zero elements on top. This is not only true in theory, but in the presence of rounding errors as well. This observation is originally due to Charles Sheffield, and was communicated to the authors by Gene Golub. Because it is such an important but unexpected result, we will discuss this relationship in some detail. First we look at the theoretical result.

Let $A \in \mathbb{R}^{m \times n}$ have rank $n$, and let $O_n \in \mathbb{R}^{n \times n}$ be a zero matrix. Consider the two QR factorizations (here we use $Q$ for $m \times m$ and $P$ for $(m+n) \times (m+n)$ orthogonal matrices),

$A = Q \begin{pmatrix} R \\ 0 \end{pmatrix} = (\,Q_1 \ \ Q_2\,) \begin{pmatrix} R \\ 0 \end{pmatrix},$

(2.5) $\quad \tilde A \equiv \begin{pmatrix} O_n \\ A \end{pmatrix} = P \begin{pmatrix} \tilde R \\ 0 \end{pmatrix} = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix} \begin{pmatrix} \tilde R \\ 0 \end{pmatrix}.$
Since $A$ has rank $n$, $P_{11}$ is zero, $P_{21}$ is an $m \times n$ matrix of orthonormal columns, and $A = Q_1 R = P_{21} \tilde R$. If the upper triangular $R$ and $\tilde R$ are both chosen to have positive diagonal elements in $A^T A = R^T R = \tilde R^T \tilde R$, then $R = \tilde R$ by uniqueness, so $P_{21} = Q_1$ can be found from any QR factorization of the augmented matrix. The last $m$ columns of $P$ are then arbitrary up to an $m \times m$ orthogonal multiplier. The important result is that the Householder QR factorization of the augmented matrix is numerically equivalent to MGS applied to $A$.
To see this, remember that with $e_k$ the $k$th column of the unit matrix, the Householder transformation $Pa = \sigma e_1$ uses $P = I - 2vv^T/\|v\|_2^2$, $v = a - \sigma e_1$, $\sigma = \pm\|a\|_2$. If (2.5) is obtained using Householder transformations, then

(2.6) $\quad P^T = P_n \cdots P_2 P_1, \qquad P_k = I - 2\hat v_k \hat v_k^T/\|\hat v_k\|_2^2, \quad k = 1, \dots, n,$

where the vectors $\hat v_k$ are described below. Now from MGS applied to $A^{(1)} = A$, $r_{11} = \|a_1^{(1)}\|_2$ and $a_1^{(1)} = q_1' = q_1 r_{11}$, so for the first Householder transformation applied to the augmented matrix

$\tilde A^{(1)} \equiv \begin{pmatrix} O_n \\ A^{(1)} \end{pmatrix}, \qquad \tilde a_1^{(1)} = \begin{pmatrix} 0 \\ a_1^{(1)} \end{pmatrix},$

$\hat v_1 = \tilde a_1^{(1)} - r_{11} e_1 = r_{11} v_1, \qquad v_1 = \begin{pmatrix} -e_1 \\ q_1 \end{pmatrix}$

(since there can be no cancellation we take $r_{kk} \ge 0$). But $\|v_1\|_2^2 = 2$, giving

$P_1 = I - 2\hat v_1 \hat v_1^T/\|\hat v_1\|_2^2 = I - 2 v_1 v_1^T/\|v_1\|_2^2 = I - v_1 v_1^T,$

and

$P_1 \tilde a_j^{(1)} = \tilde a_j^{(1)} - v_1 v_1^T \tilde a_j^{(1)} = \begin{pmatrix} 0 \\ a_j^{(1)} \end{pmatrix} - \begin{pmatrix} -e_1 \\ q_1 \end{pmatrix} q_1^T a_j^{(1)} = \begin{pmatrix} e_1 r_{1j} \\ a_j^{(2)} \end{pmatrix},$

so

$P_1 \tilde A^{(1)} = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \\ 0 & a_2^{(2)} & \cdots & a_n^{(2)} \end{pmatrix},$

where these values are clearly numerically the same as in the first step of MGS on $A$. We see the next Householder transformation produces the second row of $R$ and $a_3^{(3)}, \dots, a_n^{(3)}$, just as in MGS. Carrying on this way we see this Householder QR is numerically equivalent to MGS applied to $A$, and that every $P_k$ is effectively defined by $Q_1$, since

(2.7) $\quad P_k = I - v_k v_k^T, \qquad v_k = \begin{pmatrix} -e_k \\ q_k \end{pmatrix}, \quad k = 1, \dots, n.$
$P$ gives us a key to understanding the numerical behavior of MGS. First note that in theory $v_i^T v_j = e_i^T e_j + q_i^T q_j = 0$ if $i \ne j$, so $P_i P_j = I - v_i v_i^T - v_j v_j^T$, and $P^T = P_n \cdots P_1 = I - v_1 v_1^T - v_2 v_2^T - \cdots - v_n v_n^T$ is symmetric, so using Householder transformations in (2.5),

$P_{11} = O_n, \qquad P_{12}^T = P_{21} = q_1 e_1^T + \cdots + q_n e_n^T = Q_1, \qquad P_{22} = I - q_1 q_1^T - \cdots - q_n q_n^T = I - Q_1 Q_1^T = Q_2 Q_2^T.$

This shows such special orthogonal matrices are fully defined by their $(1,2)$ blocks,

(2.8) $\quad P = \begin{pmatrix} O_n & Q_1^T \\ Q_1 & I - Q_1 Q_1^T \end{pmatrix}.$
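This equivalence and the structure (2.8) are easy to probe numerically. The sketch below (our illustration, reusing the mgs function from the sketch after (2.4)) builds $P$ as the product of the $P_k$ in (2.7) and compares its blocks with (2.8); for a well-conditioned $A$ the printed residuals sit at roundoff level, while for ill-conditioned $A$ the deviation of the $(1,1)$ block from $O_n$ mirrors the loss of orthogonality discussed in Section 5.

```python
import numpy as np

# Build P from the MGS vectors q_k via (2.7) and compare with (2.8).
rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.standard_normal((m, n))
Q1, R = mgs(A)                                        # sketch from Section 2

P = np.eye(m + n)
for k in range(n):
    v = np.concatenate([-np.eye(n)[:, k], Q1[:, k]])  # v_k = (-e_k; q_k)
    P = P @ (np.eye(m + n) - np.outer(v, v))          # P = P_1 P_2 ... P_n

Atilde = np.vstack([np.zeros((n, n)), A])             # (O_n; A) of (2.5)
print(np.linalg.norm(P @ np.vstack([R, np.zeros((m, n))]) - Atilde))
print(np.linalg.norm(P[:n, :n]))                      # P11 ~ O_n
print(np.linalg.norm(P[n:, :n] - Q1))                 # P21 ~ Q1
print(np.linalg.norm(P.T @ P - np.eye(m + n)))        # P nearly orthogonal
```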
3. Accuracy of R from Modified Gram-Schmidt. A rounding error analysis of MGS was given in [3]. There it was shown that the computed $\bar Q_1$ and $\bar R$ satisfy

(3.1) $\quad A + E = \bar Q_1 \bar R, \qquad \|E\|_2 \le c_1 u \|A\|_2, \qquad \|I - \bar Q_1^T \bar Q_1\|_2 \le c_2 u \kappa,$

where the $c_i$ are constants depending on $m$, $n$ and the details of the arithmetic, and $u$ is the unit roundoff. Hence $\bar Q_1 \bar R$ accurately represents $A$, and the departure from orthogonality can be bounded in terms of the condition number $\kappa = \sigma_1/\sigma_n$.

From the numerical equivalence shown in the previous section it follows that the backward error analysis for the Householder QR factorization of the augmented matrix in (2.5) can also be applied to the modified Gram-Schmidt algorithm on $A$. Here we will do this, and in this section and Section 5 we will rederive (3.1) as well as give some new results. This is a simple and unified approach, in that the one analysis of orthogonal transformations can be used to analyse the QR factorization via both Householder transformations and MGS. It also deepens our understanding of the MGS algorithm and its possible uses.
Let $\bar Q_1 = (\bar q_1, \dots, \bar q_n)$ be the matrix of vectors computed by MGS, and for $k = 1, \dots, n$ define

(3.2) $\quad \bar v_k = \begin{pmatrix} -e_k \\ \bar q_k \end{pmatrix}, \quad \bar P_k = I - \bar v_k \bar v_k^T, \quad \bar P = \bar P_1 \bar P_2 \cdots \bar P_n;$

$\qquad \tilde q_k = \bar q_k/\|\bar q_k\|_2, \quad \tilde v_k = \begin{pmatrix} -e_k \\ \tilde q_k \end{pmatrix}, \quad \tilde P_k = I - \tilde v_k \tilde v_k^T, \quad \tilde P = \tilde P_1 \tilde P_2 \cdots \tilde P_n.$

Then $\bar P_k$ is the computed version of the Householder matrix applied in the $k$th step of the Householder QR factorization of $\binom{O_n}{A}$, and $\tilde P_k$ is its orthonormal equivalent, so that $\tilde P_k^T \tilde P_k = I$. Wilkinson [11, pp. 153-162] has given a general error analysis of orthogonal transformations of this type. From this it follows that for $\bar R$ computed by MGS, the equivalent of (2.5) is

(3.3) $\quad \begin{pmatrix} E_1 \\ A + E_2 \end{pmatrix} = \tilde P \begin{pmatrix} \bar R \\ 0 \end{pmatrix}, \qquad \bar P = \tilde P + E', \qquad \|E_i\|_2 \le c_i u \|A\|_2, \ i = 1, 2, \qquad \|E'\|_2 \le c_3 u,$

where again the $c_i$ are constants depending on $m$, $n$ and the details of the arithmetic. To show this $\bar R$ from MGS, or the Householder QR factorization of the augmented matrix, is numerically about as good as that from the ordinary Householder QR factorization of $A$, we use the following general result.
The bound (5.3) is of similar form to the bound (3.1) given in [3], but here we also derived the relation of $\tilde Q_1$ to the orthonormal matrix $\tilde P$, and described the relation between the loss of orthogonality in $\tilde Q_1$ and the deviation of $\tilde P$ from the ideal form of $P$. We also note here that if the first $k$ columns of $A$ in (3.3) have a small $\kappa$, then the first $k$ columns of $\tilde P_{11}$ will be small, and the first $k$ columns of $\tilde Q_1$ will be nearly orthonormal.
Our main purpose is not to show how $\tilde Q_1$ or $\bar Q_1$ may be improved. Instead the key point of this work is that although the computed $\bar P$ is very close to the exactly orthogonal $\tilde P$ in (3.3), the columns of $\bar Q_1$ need not be particularly orthonormal. Our thesis here is that as a result of this, it is usually inadvisable to use $\bar Q_1$ as our set of orthonormal vectors, but we can use $\bar P$ (as the theoretical product of the computed $\bar P_k = I - \bar v_k \bar v_k^T$, which is extremely close to $\tilde P$) to make use of the desired orthogonality, since we have all the necessary information in $\bar Q_1$, that is $\bar v_k^T = (-e_k^T, \bar q_k^T)$. Thus we can solve problems as accurately using MGS as using Householder or Givens QR factorizations if, instead of using the computed $\bar Q_1$ directly, we formulate the problems in terms of (2.5), see (3.3), and use the $\bar q_k$ to define $\bar P$. Of course in most cases no block of $\bar P$ need actually be formed. We illustrate an important use of this idea in the next section, and discuss the efficiency of such an approach in Section 7.
6. Backward Stable Solution of the ASF using MGS. Björck [2] showed (1.5) is backward stable for the ASF (1.2) using the Householder QR factorization, but the same is not true when we use (1.6) with $\bar R$ and $\bar Q_1$ computed by MGS, see [5]. Here we use our new knowledge of MGS to produce a backward stable algorithm for the ASF based on $\bar Q_1$ and $\bar R$ from MGS. This new approach can be used to design good algorithms using MGS in general.

Our original ASF (1.2) is equivalent to the augmented system

(6.1) $\quad \begin{pmatrix} I & 0 & 0 \\ 0 & I & A \\ 0 & A^T & 0 \end{pmatrix} \begin{pmatrix} w \\ x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ b \\ c \end{pmatrix},$

so applying Householder transformations as in (2.5) gives the augmented version of the method (1.5):

(6.2) $\quad z = R^{-T} c; \qquad \begin{pmatrix} d \\ h \end{pmatrix} = P^T \begin{pmatrix} 0 \\ b \end{pmatrix}; \qquad \begin{pmatrix} w \\ x \end{pmatrix} = P \begin{pmatrix} z \\ h \end{pmatrix}; \qquad y = R^{-1}(d - z).$

But as we saw in Section 2 we can use the $\bar q_k$ from MGS to produce $\bar P_k = I - \bar v_k \bar v_k^T$, $\bar v_k^T = (-e_k^T, \bar q_k^T)$, and use $\bar P^T = \bar P_n \cdots \bar P_2 \bar P_1$ in (6.2). We show in [5] that this algorithm is strongly stable (see [6]) for (6.1), and also strongly stable for (1.2).
We now show how to take advantage of the structure of the $\bar P_k$; then we will summarize this numerically stable use of MGS for the ASF. To compute $d$ and $h$ in (6.2) note that $P^T = P_n \cdots P_1$, and define

$\begin{pmatrix} d^{(1)} \\ h^{(1)} \end{pmatrix} = \begin{pmatrix} 0 \\ b \end{pmatrix}, \qquad \begin{pmatrix} d^{(k+1)} \\ h^{(k+1)} \end{pmatrix} = P_k \cdots P_1 \begin{pmatrix} 0 \\ b \end{pmatrix} = P_k \begin{pmatrix} d^{(k)} \\ h^{(k)} \end{pmatrix}.$

Now using induction we see $d^{(k)}$ has all but its first $k-1$ elements zero, and

$\begin{pmatrix} d^{(k+1)} \\ h^{(k+1)} \end{pmatrix} = \begin{pmatrix} d^{(k)} \\ h^{(k)} \end{pmatrix} - \begin{pmatrix} -e_k \\ q_k \end{pmatrix} (\,-e_k^T \ \ q_k^T\,) \begin{pmatrix} d^{(k)} \\ h^{(k)} \end{pmatrix} = \begin{pmatrix} d^{(k)} + e_k (q_k^T h^{(k)}) \\ h^{(k)} - q_k (q_k^T h^{(k)}) \end{pmatrix}.$
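The whole of (6.2) can thus be carried out using only $\bar Q_1$ and $\bar R$, with $P^T$ and $P$ applied through the $v_k$ and never formed. The following NumPy sketch is our rendering of this scheme (the function name and layout are ours); the step labels in the comments follow the step numbering used in the discussion below, and the backward loop applies $P = P_1 \cdots P_n$ to $(z;\,h)$ in the same element-by-element fashion as the forward recursion just derived.

```python
import numpy as np

def asf_solve_mgs(Q1, R, b, c):
    """Sketch of the stable ASF solution (6.2) using only the MGS factors
    Q1, R: the Householder-like P_k = I - v_k v_k^T, v_k = (-e_k; q_k),
    are applied implicitly, never formed."""
    n = R.shape[0]
    z = np.linalg.solve(R.T, c)            # step 2: z = R^{-T} c
    h = np.array(b, dtype=float)
    d = np.zeros(n)
    for k in range(n):                     # step 3: (d; h) = P^T (0; b)
        d[k] = Q1[:, k] @ h
        h -= Q1[:, k] * d[k]
    x = h.copy()
    for k in range(n - 1, -1, -1):         # step 4: (w; x) = P (z; h)
        omega = Q1[:, k] @ x
        x -= Q1[:, k] * (omega - z[k])
    y = np.linalg.solve(R, d - z)          # step 5: y = R^{-1}(d - z)
    return x, y
```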
For (1.3), step 5 can be omitted if the vector of Lagrange multipliers $y$ is not needed, while for (1.4), step 4 can be omitted if the residual $x$ is not needed.

If $b = 0$, corresponding to LUS, then $d = 0$ and step 3 will be omitted, and step 5 too if the Lagrange multipliers are not needed. If $c = 0$, corresponding to LLS, then $z = 0$ and step 2 will be omitted, and step 4 too if the LLS residual $x$ is not needed. Then the algorithm is equivalent to the following variant of MGS:

(6.3) $\quad (\,A \ \ b\,) = (\,Q_1 \ \ h\,) \begin{pmatrix} R & d \\ 0 & 1 \end{pmatrix}, \qquad y = R^{-1} d,$

where $d$ is computed as part of MGS. This is the approach recommended for LLS in [3]. The work here is another way of proving the backward stability of this approach, and adds insight into why it works. For LUS however, the numerically stable algorithm made of steps 1, 2 and 4 constitutes a new algorithm which is superior to the usual approach that omits the $\omega_k$ in step 4.
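For the LLS case this variant takes only a few lines. Here is a sketch (ours) reusing the mgs function from Section 2 on the bordered matrix $(A\ b)$; it assumes $b$ is not exactly in the range of $A$, since otherwise the normalization of the last column in our mgs sketch would divide by zero (the factor $h$ is not needed for $y$ in any case).

```python
import numpy as np

def lls_mgs(A, b):
    """LLS via MGS on the bordered matrix (A b), as in (6.3):
    y = R^{-1} d, with d obtained as part of the factorization."""
    m, n = A.shape
    Qh, Rh = mgs(np.column_stack([A, b]))   # factor the bordered matrix
    R, d = Rh[:n, :n], Rh[:n, n]            # leading R and last column d
    return np.linalg.solve(R, d)            # y = R^{-1} d
```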
If $A$ is square and nonsingular, (1.3) becomes the solution of $A^T x = c$, and $x$ is independent of $b$, so if $y$ is not wanted then $b$ can be taken as zero in the algorithm, and steps 3 and 5 dropped. Similarly if $A$ is square and nonsingular and $c = 0$ then (1.4) becomes $Ay = b$, and steps 2 and 4 can be dropped. This gives two different backward stable algorithms for solving nonsingular systems using MGS. Note that the first algorithm applies MGS to the rows of the matrix (here $A^T$), and is numerically invariant under row scalings. The second algorithm applies MGS to the columns of $A$, and is invariant under column scalings. Hence the first algorithm is to be preferred if the matrix is badly row scaled, the second if $A$ is badly column scaled.
A square-root-free version of Algorithm 6.1 is obtained if we instead use the factorization (2.4) $A = Q_1' R'$, where $R'$ is unit upper triangular:

Algorithm 6.2.
1. Carry out MGS on $A$ to give $Q_1' = (q_1', \dots, q_n')$, $R'$, and $D = \mathrm{diag}(\delta_1, \dots, \delta_n)$, where $\delta_i = \|q_i'\|_2^2$.
2. Solve $(R')^T D z' = c$ for $z' = (\zeta_1', \dots, \zeta_n')^T$.
3. for $k = 1, \dots, n$ do $\{\delta_k' := (q_k')^T b/\delta_k;\ b := b - q_k' \delta_k'\}$;
4. for $k = n, \dots, 1$ do $\{\omega_k' := (q_k')^T b/\delta_k;\ b := b - q_k' (\omega_k' - \zeta_k')\}$; $x := b$;
5. Solve $R' y = d' - z'$ for $y$, where $d' = (\delta_1', \dots, \delta_n')^T$.
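As a concrete rendering, here is a NumPy sketch of Algorithm 6.2 (our transcription; full column rank assumed, no reorthogonalization), with the square-root-free MGS of (2.4) inlined as step 1. No square roots are taken anywhere.

```python
import numpy as np

def asf_solve_mgs_sqrtfree(A, b, c):
    """Sketch of the square-root-free Algorithm 6.2 for the ASF (1.2)."""
    A = np.array(A, dtype=float)             # working copies
    b = np.array(b, dtype=float)
    m, n = A.shape
    Qp = np.zeros((m, n))                    # Q1' = (q1', ..., qn')
    Rp = np.eye(n)                           # R' unit upper triangular
    delta = np.zeros(n)                      # D = diag(delta_1..delta_n)
    for k in range(n):                       # step 1: square-root-free MGS
        Qp[:, k] = A[:, k]
        delta[k] = Qp[:, k] @ Qp[:, k]       # delta_k = ||q_k'||_2^2
        for j in range(k + 1, n):
            Rp[k, j] = (Qp[:, k] @ A[:, j]) / delta[k]
            A[:, j] -= Qp[:, k] * Rp[k, j]
    zp = np.linalg.solve(Rp.T * delta, c)    # step 2: (R')^T D z' = c
    dp = np.zeros(n)
    for k in range(n):                       # step 3
        dp[k] = (Qp[:, k] @ b) / delta[k]
        b -= Qp[:, k] * dp[k]
    for k in range(n - 1, -1, -1):           # step 4
        omega = (Qp[:, k] @ b) / delta[k]
        b -= Qp[:, k] * (omega - zp[k])
    x = b                                    # residual of (1.2)
    y = np.linalg.solve(Rp, dp - zp)         # step 5: R' y = d' - z'
    return x, y
```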
This section has not only shown how MGS can be used in a numerically stable way to solve the very useful linear system (1.2), along with its many specializations, but it has hopefully also shown how MGS can be used more effectively in general.
7. Comparison of MGS and Householder Factorizations. There are four main approaches we need to compare:
1. MGS on $A$, producing computed $\bar R$ and $\bar Q_1$, and using these.
2. MGS on $A$, producing computed $\bar R$ and $\bar Q_1$, and using $\bar R$ and $\bar P_1, \dots, \bar P_n$.
3. Householder transformations on $\binom{O_n}{A}$, producing $\bar R$ and $\bar P_1, \dots, \bar P_n$, and using these.
4. Householder transformations on $A$, producing $\hat R$ and $\hat P_1, \dots, \hat P_n$ say, and using these.
We call these approaches rather than algorithms, since each includes a reduction
algorithm, plus a choice of tools to use in problems that use the reduction. We only
consider the case of a single processor computer, and a dense matrix A.
Approaches 2 and 3 are numerically equivalent, but it is clearly more efficient for computer storage to use approach 2 via (2.1) and (2.2) than to use 3, even though we may think in terms of 3 to design algorithms which use the $\bar P_1, \dots, \bar P_n$, these of course being "stored" as $\bar q_1, \dots, \bar q_n$. Thus we would use the new approach 2 rather than 3 computationally, while being aware of both their properties theoretically.
The most usual case is where we wish to use the orthogonality computationally, but cannot rely on $\kappa(A)$ being small. Then the choice is between 2 and 4. For the initial QR factorization MGS requires $mn^2$ flops, compared to $mn^2 - n^3/3$ for Householder. MGS also needs $n(n-1)/2$ more storage locations. Hence approach 4 has an advantage with respect to both storage and operation count for the initial factorization, although this is small when $m \gg n$.
If an accurately orthogonal $Q$ or $Q_1$ in (1.1) is required as an entity in itself, then since such orthogonal matrices are not immediately produced by 2 when $\kappa(A)$ is large, the obvious choice is 4, where $Q$ (or $Q_1$) is available as the product (or part of it) of the $\hat P_k$. To produce $Q_1$ doubles the cost using 4. To produce an accurately orthogonal $Q_1$ with MGS in general, we apparently need to reorthogonalize. This also approximately doubles the factorization cost, and again the operation count is higher than for Householder.
For both approaches 2 and 4 we have shown backward stability in the usual normwise sense. In agreement with this, both approaches tend to give similar accuracy, although experience shows that MGS has a small edge here, in particular if the square-root-free version is used.
If the matrix $A$ is not well row scaled then row interchanges may be needed in 4 to give accurate solutions for problem LLS, see [10]. In this context it is interesting to note that MGS is numerically invariant under row permutations of $A$, as long as inner products are unaltered by the order of accumulation of terms. That is, if $\bar Q_1$ and $\bar R$ are the computed factors for $A$, then $\Pi \bar Q_1$ and $\bar R$ are the computed factors of $\Pi A$ for a permutation matrix $\Pi$. This shows that 2 is more stable than 4 without row interchanges. However, if row interchanges are included in 4, this approach is more accurate for problems where the row norms of $A$ vary widely. In approach 2 a second order error term $O((wu)^2)$ appears, where $w$ is the maximum ratio of row norms. This error term can be eliminated by reorthogonalization, which however increases the cost of MGS.

We finally mention that sometimes $R$ is used alone to solve our problems, and then approaches 1 and 2 are identical. We will discuss this case in [5].
REFERENCES

[1] M. Arioli and A. Laratta, Error analysis of algorithms for computing the projection of a point onto a linear manifold, Linear Algebra Appl., 82 (1986), pp. 1-26.
[2] Å. Björck, Iterative refinement of linear least squares solutions I, BIT, 7 (1967), pp. 257-278.
[3] Å. Björck, Solving linear least squares problems by Gram-Schmidt orthogonalization, BIT, 7 (1967), pp. 1-21.
[4] Å. Björck, Methods for sparse least squares problems, in Sparse Matrix Computations, J. Bunch and D. J. Rose, eds., Academic Press, New York, 1976, pp. 177-199.
[5] Å. Björck and C. Paige, Solution of augmented linear systems using orthogonal factorizations, BIT, 34 (1994), pp. 1-24.
[6] J. R. Bunch, The weak and strong stability of algorithms in numerical linear algebra, Linear Algebra Appl., 88/89 (1987), pp. 49-66.
[7] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed., The Johns Hopkins University Press, Baltimore, Maryland, 1989.
[8] H. Y. Huang, A direct method for the general solution of a system of linear equations, J. Optim. Theory Appl., 16 (1975), pp. 429-445.
[9] J. W. Longley, Least Squares Computations Using Orthogonalization Methods, Marcel Dekker, New York, 1984.
[10] M. J. D. Powell and J. K. Reid, On applying Householder's method to linear least squares problems, in Proceedings IFIP Congress, 1968, pp. 122-126.
[11] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965.