New Insights into the RLS Algorithm
Jacob Benesty
INRS-EMT, Université du Québec, 800 de la Gauchetière Ouest, Suite 6900, Montréal, Québec, Canada H5A 1K6
Email: [email protected]
Tomas Gänsler
Agere Systems Inc., 1110 American Parkway NE, Allentown, PA 18109-3229, USA
Email: [email protected]
Received 21 July 2003; Revised 9 October 2003; Recommended for Publication by Hideaki Sakai
The recursive least squares (RLS) algorithm is one of the most popular adaptive algorithms in the literature, because it is easily and exactly derived from the normal equations. In this paper, we give another interpretation of the RLS algorithm and show the importance of linear interpolation error energies in the RLS structure. We also give a very efficient way to recursively estimate the condition number of the input signal covariance matrix, thanks to fast versions of the RLS algorithm. Finally, we quantify the misalignment of the RLS algorithm with respect to the condition number.
Keywords and phrases: adaptive algorithms, normal equations, RLS, fast RLS, condition number, linear interpolation.
is the true (subscript t) impulse response of the system, the superscript T denotes the transpose of a vector or a matrix,

\[
\mathbf{x}(n) = \left[ x(n)\;\; x(n-1)\;\; \cdots\;\; x(n-L+1) \right]^T \tag{4}
\]

is a vector containing the last L samples of the input signal x, and w is a white Gaussian noise (uncorrelated with x) with variance σ_w^2. In (1),

\[
\hat{y}(n) = \mathbf{h}^T(n-1)\mathbf{x}(n) \tag{5}
\]

is the model filter output and

\[
\mathbf{h}(n-1) = \left[ h_0(n-1)\;\; h_1(n-1)\;\; \cdots\;\; h_{L-1}(n-1) \right]^T \tag{6}
\]

is the model filter.

3. AN RLS ALGORITHM BASED ON THE INTERPOLATION ERRORS

In this section, we show another way to write the RLS algorithm. This new formulation, based on linear interpolation, gives better insight into the structure of the adaptive algorithm.

We would like to minimize the criterion [6, 7]

\[
J_{\mathrm{int},i}(n) = \sum_{m=0}^{n} \lambda^{n-m} \left[ -\sum_{l=0}^{L-1} c_{il}(n)\,x(m-l) \right]^2
= \sum_{m=0}^{n} \lambda^{n-m} \left[ -\mathbf{c}_i^T(n)\mathbf{x}(m) \right]^2
= \mathbf{c}_i^T(n)\mathbf{R}(n)\mathbf{c}_i(n), \tag{14}
\]

with the constraint c_i^T(n)u_i = c_{ii}(n) = −1, where u_i denotes the i-th column of the L × L identity matrix.
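Minimizing (14) under this constraint is a standard Lagrange-multiplier step; we state the result here for completeness (it is consistent with (22) and (25) below):

\[
\mathbf{c}_i(n) = -\frac{\mathbf{R}^{-1}(n)\,\mathbf{u}_i}{\left[\mathbf{R}^{-1}(n)\right]_{ii}},
\qquad
E_i(n) = \min_{\mathbf{c}_i} J_{\mathrm{int},i}(n) = \frac{1}{\left[\mathbf{R}^{-1}(n)\right]_{ii}},
\]

so that R(n)c_i(n) = −E_i(n)u_i, which is the relation written at times n and n − 1 in (25).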
Furthermore, since R^{-1}(n) is a symmetric matrix, (21) can be written as

\[
\mathbf{R}^{-1}(n) =
\begin{bmatrix}
\dfrac{1}{E_0(n)} & 0 & \cdots & 0 \\
0 & \dfrac{1}{E_1(n)} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \dfrac{1}{E_{L-1}(n)}
\end{bmatrix}
\begin{bmatrix}
1 & -c_{01}(n) & \cdots & -c_{0(L-1)}(n) \\
-c_{10}(n) & 1 & \cdots & -c_{1(L-1)}(n) \\
\vdots & \vdots & \ddots & \vdots \\
-c_{(L-1)0}(n) & -c_{(L-1)1}(n) & \cdots & 1
\end{bmatrix}
= \mathbf{D}_e^{-1}(n)\,\mathbf{C}(n). \tag{22}
\]

The first and last columns of R^{-1}(n) contain, respectively, the normalized forward and backward predictors, and all the columns in between contain the normalized interpolators.

We define, respectively, the a priori and a posteriori interpolation error signals as

\[
e_i(n) = -\mathbf{c}_i^T(n-1)\mathbf{x}(n), \qquad
\varepsilon_i(n) = -\mathbf{c}_i^T(n)\mathbf{x}(n). \tag{23}
\]

Using expression (22), we now have an interesting interpretation of the a priori and a posteriori Kalman gain vectors:

\[
\mathbf{k}'(n) = \mathbf{R}^{-1}(n-1)\mathbf{x}(n)
= \left[ \frac{e_0(n)}{E_0(n-1)}\;\; \frac{e_1(n)}{E_1(n-1)}\;\; \cdots\;\; \frac{e_{L-1}(n)}{E_{L-1}(n-1)} \right]^T,
\qquad
\mathbf{k}(n) = \mathbf{R}^{-1}(n)\mathbf{x}(n)
= \left[ \frac{\varepsilon_0(n)}{E_0(n)}\;\; \frac{\varepsilon_1(n)}{E_1(n)}\;\; \cdots\;\; \frac{\varepsilon_{L-1}(n)}{E_{L-1}(n)} \right]^T. \tag{24}
\]

The i-th component of the a priori (resp., a posteriori) Kalman gain vector is the i-th a priori (resp., a posteriori) interpolation error signal normalized with the i-th interpolation error energy at time n − 1 (resp., n).

Writing (18) at times n and n − 1, we obtain

\[
-\frac{\mathbf{R}(n)\mathbf{c}_i(n)}{E_i(n)} = \mathbf{u}_i
= -\frac{\lambda\mathbf{R}(n-1)\mathbf{c}_i(n-1)}{\lambda E_i(n-1)}. \tag{25}
\]

Replacing λR(n − 1) in (25) by

\[
\lambda\mathbf{R}(n-1) = \mathbf{R}(n) - \mathbf{x}(n)\mathbf{x}^T(n), \tag{26}
\]

we get

\[
\mathbf{c}_i(n) = \left[ \mathbf{c}_i(n-1) + \mathbf{k}(n)e_i(n) \right] \frac{E_i(n)}{\lambda E_i(n-1)}. \tag{27}
\]
Now, if we premultiply both sides of (27) by u_i^T, we can easily find that

\[
E_i(n) = \lambda E_i(n-1) + e_i(n)\varepsilon_i(n). \tag{28}
\]

This means that the interpolation error energy can be computed recursively. This relation is well known for the forward (i = 0) and backward (i = L − 1) predictors [1]. It is used to obtain fast versions of the RLS algorithm.

Also, the interpolator vectors can be computed recursively:

\[
\mathbf{c}_i(n) = \frac{\mathbf{c}_i(n-1) + \mathbf{k}(n)e_i(n)}{1 - k_i(n)e_i(n)}. \tag{29}
\]

If we premultiply both sides of (29) by −x^T(n), we obtain a relation between the a priori and a posteriori interpolation error signals:

\[
\frac{\varepsilon_i(n)}{e_i(n)} = \frac{\varphi(n)}{1 - k_i(n)e_i(n)}. \tag{30}
\]

We now give another interpretation of the RLS algorithm:

\[
h_l(n) = h_l(n-1) + \frac{\varepsilon_l(n)e(n)}{E_l(n)}
= h_l(n-1) + \varphi(n)\frac{e_l(n)e(n)}{\lambda E_l(n-1)}, \qquad l = 0, 1, \ldots, L-1. \tag{31}
\]

In Sections 4 and 5, we will show how the linear interpolation error energies appear naturally in the condition number formulation.
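To make the interpretation in (24) concrete, here is a small brute-force check in NumPy (our own sketch, not from the paper): it builds R(n) directly from the data, extracts the interpolators and their error energies from the columns of R^{-1} as in (22), and compares the Kalman gain components with the normalized interpolation errors.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, lam = 4, 200, 0.99
x = rng.standard_normal(N)

def tap_vector(x, n, L):
    """x(n) = [x(n), x(n-1), ..., x(n-L+1)]^T with x(m) = 0 for m < 0."""
    idx = n - np.arange(L)
    return np.where(idx >= 0, x[np.clip(idx, 0, None)], 0.0)

def covariance(x, n, L, lam):
    """Exponentially weighted covariance matrix R(n)."""
    R = np.zeros((L, L))
    for m in range(n + 1):
        v = tap_vector(x, m, L)
        R += lam ** (n - m) * np.outer(v, v)
    return R

def interpolators(R):
    """Interpolators and energies read off the columns of R^{-1}, cf. (22)."""
    Rinv = np.linalg.inv(R)
    E = 1.0 / np.diag(Rinv)         # interpolation error energies E_i
    C = -Rinv * E[np.newaxis, :]    # column i is the interpolator c_i (c_ii = -1)
    return C, E

n = N - 1
R_now, R_prev = covariance(x, n, L, lam), covariance(x, n - 1, L, lam)
xn = tap_vector(x, n, L)

C_prev, E_prev = interpolators(R_prev)
C_now, E_now = interpolators(R_now)
e = -C_prev.T @ xn      # a priori interpolation errors, (23)
eps = -C_now.T @ xn     # a posteriori interpolation errors, (23)

kp = np.linalg.solve(R_prev, xn)   # k'(n) = R^{-1}(n-1) x(n)
k = np.linalg.solve(R_now, xn)     # k(n)  = R^{-1}(n)   x(n)
print(np.allclose(kp, e / E_prev), np.allclose(k, eps / E_now))  # True True
```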
4. CONDITION NUMBER OF THE INPUT SIGNAL COVARIANCE MATRIX

Usually, the condition number is computed by using the matrix 2-norm. In the context of the RLS equations, it is more convenient to use a different norm, as explained below.

The covariance matrix R(n) is symmetric and positive definite. It can be diagonalized as follows:

\[
\mathbf{Q}^T(n)\mathbf{R}(n)\mathbf{Q}(n) = \mathbf{\Lambda}(n), \tag{32}
\]

where

\[
\mathbf{Q}^T(n)\mathbf{Q}(n) = \mathbf{Q}(n)\mathbf{Q}^T(n) = \mathbf{I},
\qquad
\mathbf{\Lambda}(n) = \mathrm{diag}\left[ \lambda_0(n), \lambda_1(n), \ldots, \lambda_{L-1}(n) \right], \tag{33}
\]

and 0 < λ_0(n) ≤ λ_1(n) ≤ ··· ≤ λ_{L−1}(n). By definition, the square root of R(n) is

\[
\mathbf{R}^{1/2}(n) = \mathbf{Q}(n)\mathbf{\Lambda}^{1/2}(n)\mathbf{Q}^T(n). \tag{34}
\]

The condition number of a matrix R(n) is [8]

\[
\chi\left[\mathbf{R}(n)\right] = \left\|\mathbf{R}(n)\right\| \left\|\mathbf{R}^{-1}(n)\right\|, \tag{35}
\]
where ‖·‖ can be any matrix norm. Note that χ[R(n)] depends on the underlying norm, and subscripts will be used to distinguish the different condition numbers. Usually, we take the convention that χ[R(n)] = ∞ for a singular matrix R(n).

Consider the following norm:

\[
\left\|\mathbf{R}(n)\right\|_E = \left\{ \frac{1}{L}\,\mathrm{tr}\left[\mathbf{R}^T(n)\mathbf{R}(n)\right] \right\}^{1/2}. \tag{36}
\]

We can easily check that, indeed, ‖·‖_E is a matrix norm, since for any real matrices A and B and a real scalar γ, the following three conditions are satisfied:

(i) ‖A‖_E ≥ 0, and ‖A‖_E = 0 if and only if A = 0_{L×L};
(ii) ‖A + B‖_E ≤ ‖A‖_E + ‖B‖_E;
(iii) ‖γA‖_E = |γ| ‖A‖_E.

Also, the E-norm of the identity matrix is equal to one. We have

\[
\left\|\mathbf{R}^{1/2}(n)\right\|_E = \left[ \frac{1}{L}\,\mathrm{tr}\,\mathbf{R}(n) \right]^{1/2}
= \left[ \frac{1}{L} \sum_{l=0}^{L-1} \lambda_l(n) \right]^{1/2},
\qquad
\left\|\mathbf{R}^{-1/2}(n)\right\|_E = \left[ \frac{1}{L}\,\mathrm{tr}\,\mathbf{R}^{-1}(n) \right]^{1/2}
= \left[ \frac{1}{L} \sum_{l=0}^{L-1} \frac{1}{\lambda_l(n)} \right]^{1/2}. \tag{37}
\]

Hence, the condition number of R^{1/2}(n) associated with ‖·‖_E is

\[
\chi_E\left[\mathbf{R}^{1/2}(n)\right] = \left\|\mathbf{R}^{1/2}(n)\right\|_E \left\|\mathbf{R}^{-1/2}(n)\right\|_E \ge 1. \tag{38}
\]
If χ[R(n)] is large, then R(n) is said to be an ill-conditioned matrix. Note that this is a norm-dependent property. However, according to [8], any two condition numbers χ_α[R(n)] and χ_β[R(n)] are equivalent, in that constants c_1 and c_2 can be found for which

\[
c_1 \chi_\alpha\left[\mathbf{R}(n)\right] \le \chi_\beta\left[\mathbf{R}(n)\right] \le c_2 \chi_\alpha\left[\mathbf{R}(n)\right]. \tag{39}
\]

For example, for the matrix 1- and 2-norms, we can show [8] that

\[
\frac{1}{L^2}\,\chi_2\left[\mathbf{R}(n)\right] \le \frac{1}{L}\,\chi_1\left[\mathbf{R}(n)\right] \le \chi_2\left[\mathbf{R}(n)\right]. \tag{40}
\]

We now show the same principle for the E- and 2-norms. We recall that

\[
\chi_2\left[\mathbf{R}(n)\right] = \frac{\lambda_{L-1}(n)}{\lambda_0(n)}. \tag{41}
\]

Since tr[R^{-1}(n)] ≥ 1/λ_0(n) and tr[R(n)] ≥ λ_{L−1}(n), we have

\[
\mathrm{tr}\left[\mathbf{R}(n)\right] \mathrm{tr}\left[\mathbf{R}^{-1}(n)\right]
\ge \frac{\mathrm{tr}\left[\mathbf{R}(n)\right]}{\lambda_0(n)}
\ge \frac{\lambda_{L-1}(n)}{\lambda_0(n)}. \tag{42}
\]

Also, since tr[R(n)] ≤ Lλ_{L−1}(n) and tr[R^{-1}(n)] ≤ L/λ_0(n), we obtain

\[
\mathrm{tr}\left[\mathbf{R}(n)\right] \mathrm{tr}\left[\mathbf{R}^{-1}(n)\right]
\le L\,\frac{\mathrm{tr}\left[\mathbf{R}(n)\right]}{\lambda_0(n)}
\le L^2\,\frac{\lambda_{L-1}(n)}{\lambda_0(n)}. \tag{43}
\]

Therefore, we deduce that

\[
\frac{1}{L^2}\,\chi_2\left[\mathbf{R}(n)\right] \le \chi_E^2\left[\mathbf{R}^{1/2}(n)\right] \le \chi_2\left[\mathbf{R}(n)\right]. \tag{44}
\]

According to the previous expression, χ_E^2[R^{1/2}(n)] is then a measure of the condition number of the matrix R(n). In Section 5, we will show how to recursively compute χ_E^2[R^{1/2}(n)].
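A quick numerical illustration of (44) (a sketch of ours; note that χ_E^2[R^{1/2}(n)] requires only tr[R(n)] and tr[R^{-1}(n)], not an eigendecomposition):

```python
import numpy as np

rng = np.random.default_rng(1)
L = 8
A = rng.standard_normal((L, 5 * L))
R = A @ A.T / (5 * L)       # a symmetric positive definite "covariance"

eig = np.linalg.eigvalsh(R)
chi2 = eig[-1] / eig[0]     # 2-norm condition number, (41)
# chi_E^2[R^{1/2}] from the two traces, cf. (37)-(38)
chiE2 = np.trace(R) * np.trace(np.linalg.inv(R)) / L**2

print(chi2 / L**2 <= chiE2 <= chi2)   # True, the bounds in (44)
```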
5. RECURSIVE COMPUTATION OF THE CONDITION NUMBER

The positive number ‖R^{1/2}(n)‖_E^2 can be easily calculated recursively. Indeed, taking the trace of

\[
\mathbf{R}(n) = \lambda\mathbf{R}(n-1) + \mathbf{x}(n)\mathbf{x}^T(n), \tag{45}
\]

we get

\[
\mathrm{tr}\left[\mathbf{R}(n)\right] = \lambda\,\mathrm{tr}\left[\mathbf{R}(n-1)\right] + \mathbf{x}^T(n)\mathbf{x}(n). \tag{46}
\]

Therefore,

\[
\left\|\mathbf{R}^{1/2}(n)\right\|_E^2 = \lambda \left\|\mathbf{R}^{1/2}(n-1)\right\|_E^2 + \frac{\mathbf{x}^T(n)\mathbf{x}(n)}{L}. \tag{47}
\]

Note that the inner product x^T(n)x(n) can also be computed in a recursive way, with only two multiplications at each iteration.
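The recursion alluded to here is presumably the standard sliding-window identity x^T(n)x(n) = x^T(n−1)x(n−1) + x(n)x(n) − x(n−L)x(n−L) (our reading); a minimal sketch:

```python
import numpy as np

def update_tap_energy(energy, x, n, L):
    """Sliding update of x^T(n)x(n): two multiplications per iteration,
    with the pre-windowing convention x(m) = 0 for m < 0."""
    oldest = x[n - L] * x[n - L] if n - L >= 0 else 0.0
    return energy + x[n] * x[n] - oldest

# Check against a direct computation
x = np.random.default_rng(2).standard_normal(50)
L, energy = 4, 0.0
for n in range(len(x)):
    energy = update_tap_energy(energy, x, n, L)
print(np.isclose(energy, float(x[-L:] @ x[-L:])))   # True
```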
Now we need to determine ‖R^{-1/2}(n)‖_E^2. Thanks to (22), we find that

\[
\mathrm{tr}\left[\mathbf{R}^{-1}(n)\right] = \sum_{l=0}^{L-1} \frac{1}{E_l(n)}. \tag{48}
\]

Using (24), we have

\[
\mathbf{k}^T(n)\mathbf{k}'(n) = \sum_{l=0}^{L-1} \frac{e_l(n)\varepsilon_l(n)}{E_l(n)E_l(n-1)}, \tag{49}
\]

and replacing in the previous expression

\[
E_l(n) - \lambda E_l(n-1) = e_l(n)\varepsilon_l(n), \tag{50}
\]

we obtain

\[
\mathbf{k}^T(n)\mathbf{k}'(n) = \sum_{l=0}^{L-1} \frac{1}{E_l(n-1)} - \lambda \sum_{l=0}^{L-1} \frac{1}{E_l(n)}. \tag{51}
\]

Thus,

\[
\mathrm{tr}\left[\mathbf{R}^{-1}(n)\right] = \sum_{l=0}^{L-1} \frac{1}{E_l(n)}
= \lambda^{-1}\left[ \sum_{l=0}^{L-1} \frac{1}{E_l(n-1)} - \mathbf{k}^T(n)\mathbf{k}'(n) \right]. \tag{52}
\]

Finally, since k(n) = λ^{-1}φ(n)k'(n),

\[
\left\|\mathbf{R}^{-1/2}(n)\right\|_E^2
= \lambda^{-1}\left\|\mathbf{R}^{-1/2}(n-1)\right\|_E^2 - \frac{\lambda^{-2}\varphi(n)\,\mathbf{k}'^T(n)\mathbf{k}'(n)}{L}
= \lambda^{-1}\left\|\mathbf{R}^{-1/2}(n-1)\right\|_E^2 - \frac{\lambda^{-2}\varphi(n)}{L} \sum_{l=0}^{L-1} \frac{e_l^2(n)}{E_l^2(n-1)}. \tag{53}
\]

By using (47) and (53), we see that we can easily compute χ_E^2[R^{1/2}(n)] recursively, with only on the order of L multiplications per iteration, given that k'(n) is known.
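In code, one iteration of the estimator might look as follows (a sketch; the α(n) convention is our reading of Algorithm 1 below, and the λ factor in the denominator follows the derivation of (53)):

```python
def update_cond_estimate(r_sq, rinv_sq, x_energy, kp_sq, alpha, lam, L):
    """One step of (47) and (53), assuming an FRLS recursion supplies
    k'(n) and alpha(n) = lambda + x^T(n)k'(n), so phi(n) = lambda/alpha(n).

    r_sq     -- ||R^{1/2}(n-1)||_E^2
    rinv_sq  -- ||R^{-1/2}(n-1)||_E^2
    x_energy -- x^T(n) x(n)
    kp_sq    -- k'^T(n) k'(n)
    Returns (||R^{1/2}(n)||_E^2, ||R^{-1/2}(n)||_E^2, chi_E^2[R^{1/2}(n)])."""
    r_sq = lam * r_sq + x_energy / L                      # (47)
    # (53): lambda^{-2} phi(n) = 1 / (lambda * alpha(n))
    rinv_sq = rinv_sq / lam - kp_sq / (lam * L * alpha)
    return r_sq, rinv_sq, r_sq * rinv_sq
```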
Note that we could have used the inverse of R(n),

\[
\mathbf{R}^{-1}(n) = \lambda^{-1}\mathbf{R}^{-1}(n-1) - \lambda^{-2}\varphi(n)\mathbf{k}'(n)\mathbf{k}'^T(n), \tag{54}
\]

to estimate ‖R^{-1/2}(n)‖_E^2, but we have chosen here to use the interpolation formulation to better understand the link among all the variables in the RLS algorithm, and especially to emphasize the role of the interpolation error energies, since tr[R^{-1}(n)] = Σ_{l=0}^{L−1} 1/E_l(n), even though there are indirect ways to compute this value. Clearly, everything can be written in terms of E_l(n), and this formulation is more natural for condition number estimation. For example, in the extreme cases of an input signal close to a white noise or to a predictable process, the value max_l[E_l(n)]/min_l[E_l(n)] gives a good idea of the condition number of the corresponding signal covariance matrix.
It is easy to combine the estimation of the condition number with an FRLS algorithm. There exist several methods to compute the a priori Kalman gain vector k'(n) in a very efficient way. Once this gain vector is determined, the estimation of χ_E^2[R^{1/2}(n)] at each iteration follows immediately, with roughly L more multiplications. Algorithm 1 shows the combination of an FRLS algorithm with the condition number estimation of the input signal covariance matrix.

Initialization:
    h(0) = k'(0) = a(0) = b(0) = 0,
    α(0) = λ,
    E_a(0) = E_0 (positive constant),
    ‖R^{1/2}(0)‖_E^2 = (E_0/L) Σ_{l=0}^{L−1} λ^{−l},
    ‖R^{-1/2}(0)‖_E^2 = (1/(L E_0)) Σ_{l=0}^{L−1} λ^{l}.

Prediction:
    e_a(n) = x(n) − a^T(n−1)x(n−1),
    α_1(n) = α(n−1) + e_a^2(n)/E_a(n−1),
    [t(n); m(n)] = [0; k'(n−1)] + [1; −a(n−1)] e_a(n)/E_a(n−1)   (t(n): first L entries, m(n): last entry),
    E_a(n) = λ[E_a(n−1) + e_a^2(n)/α(n−1)],
    a(n) = a(n−1) + k'(n−1) e_a(n)/α(n−1),
    e_b(n) = x(n−L) − b^T(n−1)x(n),
    k'(n) = t(n) + b(n−1) m(n),
    α(n) = α_1(n) − e_b(n) m(n),
    b(n) = b(n−1) + k'(n) e_b(n)/α(n).

Filtering:
    e(n) = y(n) − h^T(n−1)x(n),
    h(n) = h(n−1) + k'(n) e(n)/α(n).

Condition number:
    ‖R^{1/2}(n)‖_E^2 = λ‖R^{1/2}(n−1)‖_E^2 + x^T(n)x(n)/L,
    ‖R^{-1/2}(n)‖_E^2 = λ^{-1}‖R^{-1/2}(n−1)‖_E^2 − k'^T(n)k'(n)/(λ L α(n)),
    χ_E^2[R^{1/2}(n)] = ‖R^{1/2}(n)‖_E^2 ‖R^{-1/2}(n)‖_E^2.

Algorithm 1: The FRLS algorithm and estimation of the condition number.
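For reference, a NumPy transcription of Algorithm 1 (a sketch under our reading of the listing; variable names follow the paper, and the condition-number line uses the λ-consistent form of (53)):

```python
import numpy as np

def frls_with_condition_number(x, y, L, lam, E0=1.0):
    """Sketch of Algorithm 1: FRLS adaptation plus the chi_E^2 estimate.

    FRLS recursions are numerically fragile; a practical implementation
    would add stabilization or periodic restarts, not covered here."""
    N = len(x)
    h = np.zeros(L); a = np.zeros(L); b = np.zeros(L); kp = np.zeros(L)
    alpha, Ea = lam, E0
    l_idx = np.arange(L)
    r_sq = E0 * np.sum(lam ** (-l_idx)) / L      # ||R^{1/2}(0)||_E^2
    rinv_sq = np.sum(lam ** l_idx) / (L * E0)    # ||R^{-1/2}(0)||_E^2
    xv = np.zeros(L + 1)                         # [x(n), ..., x(n-L)]
    err, chi = np.zeros(N), np.zeros(N)
    for n in range(N):
        xv = np.roll(xv, 1); xv[0] = x[n]
        xn, xn1 = xv[:L], xv[1:]
        # Prediction
        ea = x[n] - a @ xn1
        alpha1 = alpha + ea * ea / Ea
        ext = np.concatenate(([0.0], kp)) + np.concatenate(([1.0], -a)) * (ea / Ea)
        t, m = ext[:L], ext[L]
        Ea = lam * (Ea + ea * ea / alpha)
        a = a + kp * (ea / alpha)
        eb = xv[L] - b @ xn
        kp = t + b * m
        alpha = alpha1 - eb * m
        b = b + kp * (eb / alpha)
        # Filtering
        err[n] = y[n] - h @ xn
        h = h + kp * (err[n] / alpha)
        # Condition number, cf. (47) and (53)
        r_sq = lam * r_sq + (xn @ xn) / L
        rinv_sq = rinv_sq / lam - (kp @ kp) / (lam * L * alpha)
        chi[n] = r_sq * rinv_sq
    return h, err, chi

# White-noise example: chi should settle near 1 (cf. Section 6)
rng = np.random.default_rng(3)
L = 16
ht = rng.standard_normal(L) / L
xin = rng.standard_normal(2000)
yout = np.convolve(xin, ht)[: len(xin)] + 1e-4 * rng.standard_normal(len(xin))
h, err, chi = frls_with_condition_number(xin, yout, L, lam=1 - 1 / (5 * L))
print(np.linalg.norm(h - ht) / np.linalg.norm(ht), chi[-1])
```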
6. MISALIGNMENT AND CONDITION NUMBER

We define the normalized misalignment in dB as follows:

\[
m_0(n) = 10\log_{10} E\left\{ \frac{\left\|\mathbf{h}_t - \mathbf{h}(n)\right\|_2^2}{\left\|\mathbf{h}_t\right\|_2^2} \right\}, \tag{55}
\]

where ‖·‖_2 denotes the vector 2-norm. Equation (55) measures the mismatch between the true impulse response and the modelling filter.

It can easily be shown, under certain conditions, that [9]

\[
E\left\{ \left\|\mathbf{h}_t - \mathbf{h}(n)\right\|_2^2 \right\} \approx \frac{1}{2}\sigma_w^2\,\mathrm{tr}\left[\mathbf{R}^{-1}(n)\right]. \tag{56}
\]

Hence, we can write (56) in terms of the interpolation error energies:

\[
E\left\{ \left\|\mathbf{h}_t - \mathbf{h}(n)\right\|_2^2 \right\} \approx \frac{1}{2}\sigma_w^2 \sum_{l=0}^{L-1} \frac{1}{E_l(n)}. \tag{57}
\]

However, we are more interested here in writing (56) in terms
of the condition number. Indeed, we have

\[
\left\|\mathbf{R}^{1/2}(n)\right\|_E^2 = \frac{1}{L}\,\mathrm{tr}\left[\mathbf{R}(n)\right],
\qquad
\left\|\mathbf{R}^{-1/2}(n)\right\|_E^2 = \frac{1}{L} \sum_{l=0}^{L-1} \frac{1}{E_l(n)}. \tag{58}
\]

But

\[
\mathrm{tr}\left[\mathbf{R}(n)\right]
= \mathrm{tr}\left[ \sum_{m=0}^{n} \lambda^{n-m}\mathbf{x}(m)\mathbf{x}^T(m) \right]
= \sum_{m=0}^{n} \lambda^{n-m}\mathbf{x}^T(m)\mathbf{x}(m)
\approx \frac{L}{1-\lambda}\,\sigma_x^2, \tag{59}
\]

for n large and for a stationary signal x with power σ_x^2. The condition number is then

\[
\chi_E^2\left[\mathbf{R}^{1/2}(n)\right] \approx \frac{\sigma_x^2}{(1-\lambda)L} \sum_{l=0}^{L-1} \frac{1}{E_l(n)}, \tag{60}
\]

and expression (57) becomes

\[
E\left\{ \left\|\mathbf{h}_t - \mathbf{h}(n)\right\|_2^2 \right\}
\approx \frac{(1-\lambda)L}{2}\,\frac{\sigma_w^2}{\sigma_x^2}\,\chi_E^2\left[\mathbf{R}^{1/2}(n)\right]. \tag{61}
\]

If we divide both sides of (61) by ‖h_t‖_2^2, we get

\[
\frac{E\left\{ \left\|\mathbf{h}_t - \mathbf{h}(n)\right\|_2^2 \right\}}{\left\|\mathbf{h}_t\right\|_2^2}
\approx \frac{(1-\lambda)L}{2\left\|\mathbf{h}_t\right\|_2^2}\,\frac{\sigma_w^2}{\sigma_x^2}\,\chi_E^2\left[\mathbf{R}^{1/2}(n)\right]. \tag{62}
\]

Finally, we have a formula for the normalized misalignment in dB (which is valid only after convergence of the RLS algorithm):

\[
m_0(n) \approx 10\log_{10}\frac{(1-\lambda)L}{2}
+ 10\log_{10}\frac{\sigma_w^2}{\left\|\mathbf{h}_t\right\|_2^2\,\sigma_x^2}
+ 10\log_{10}\chi_E^2\left[\mathbf{R}^{1/2}(n)\right]. \tag{63}
\]

Expression (63) depends on three factors: the exponential window, the level of noise at the system output, and the condition number. The closer the exponential window is to one, the better the misalignment, but the tracking abilities of the RLS algorithm will suffer. A high level of noise, as well as an input signal with a large condition number, will obviously degrade the misalignment. With a fixed exponential window and noise level, it is interesting to see how the misalignment degrades as the condition number of the input signal increases. For example, by increasing the condition number from 1 to 10, the misalignment will degrade by 10 dB; the simulations confirm this.
Usually, we take for the exponential window

\[
\lambda = 1 - \frac{1}{K_0 L}, \tag{64}
\]

where K_0 ≥ 3. Also, the second term in (63) represents roughly the inverse output SNR in dB. We can then rewrite (63) as follows:

\[
m_0(n) \approx -10\log_{10}\left(2K_0\right) - \mathrm{oSNR}
+ 10\log_{10}\chi_E^2\left[\mathbf{R}^{1/2}(n)\right]. \tag{65}
\]

For example, if we take K_0 = 5 and an output SNR (oSNR) of 39 dB, we obtain

\[
m_0(n) \approx -49 + 10\log_{10}\chi_E^2\left[\mathbf{R}^{1/2}(n)\right]. \tag{66}
\]

If the input signal is a white noise, χ_E^2[R^{1/2}(n)] = 1, and then m_0(n) ≈ −49 dB. This will be confirmed in the following section.
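As a worked check of (65) and (66), a hypothetical helper of ours:

```python
import math

def predicted_misalignment_db(K0, osnr_db, chi_E2):
    """Steady-state normalized misalignment predicted by (65)."""
    return -10 * math.log10(2 * K0) - osnr_db + 10 * math.log10(chi_E2)

print(predicted_misalignment_db(5, 39, 1.0))   # -49.0, the white-noise case
print(predicted_misalignment_db(5, 39, 10.0))  # -39.0, i.e., 10 dB worse
```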
7. SIMULATIONS

In this section, we present some results on the condition number estimation and how this number affects the misalignment in a system identification context. We try to estimate an impulse response h_t of length L = 512. The same length is used for the adaptive filter h(n). We run the FRLS algorithm with a forgetting factor λ = 1 − 1/(5L). Performance of the estimation is measured by means of the normalized misalignment (55). The input signal x(n) is a speech signal sampled at 8 kHz. The output signal y(n) is obtained by convolving h_t with x(n) and adding a white Gaussian noise signal with an SNR of 39 dB. In order to evaluate the condition number in different situations, a white Gaussian signal is added to the input x(n) at different SNRs, ranging from −10 dB to 50 dB. Therefore, with an input SNR equal to −10 dB (the white noise dominates the speech), we can expect the condition number of the input signal covariance matrix to be close to 1, while with an input SNR of 50 dB (the speech largely dominates the white noise), the condition number will be high. Figures 1, 2, 3, 4, 5, 6, and 7 show the evolution in time of the input signal, the normalized misalignment (which we approximate by its instantaneous value), and the condition number of the input signal covariance matrix for the different input SNRs (from −10 dB to 50 dB). We can see that, as the input SNR increases, the condition number grows as expected, since speech is an ill-conditioned signal. As a result, the normalized misalignment is greatly affected by a large value of the condition number. As expected, the value of the misalignment after convergence in Figure 1 is equal to −49 dB, and the condition number is almost one. Now compare this to Figure 3, where the misalignment is equal to −40 dB and the average condition number is 8.2. The higher condition number in this case degrades the misalignment by 9 dB, which is exactly the degradation predicted by formula (63). We can verify the same trend with the other simulations.
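The input-conditioning experiment amounts to mixing white Gaussian noise into the speech at a prescribed SNR; a minimal sketch of that step (an assumed implementation, not from the paper):

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    """Mix white Gaussian noise into `signal` at the requested SNR (dB)."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.standard_normal(len(signal))
    p_sig = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    noise *= np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10.0)))
    return signal + noise
```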
[Figure 1: Evolution in time of the (a) input signal, (b) normalized misalignment (dB), and (c) condition number of the input signal covariance matrix. The input SNR is −10 dB.]

[Figure 2: The presentation is the same as in Figure 1. The input SNR is 0 dB.]

[Figure 3: The presentation is the same as in Figure 1. The input SNR is 10 dB.]

[Figure 4: The presentation is the same as in Figure 1. The input SNR is 20 dB.]
[Figure 5: The presentation is the same as in Figure 1. The input SNR is 30 dB.]

[Figure 6: The presentation is the same as in Figure 1. The input SNR is 40 dB.]

[Figure 7: The presentation is the same as in Figure 1. The input SNR is 50 dB.]

8. CONCLUSIONS

The RLS algorithm plays a major role in adaptive signal processing. A very good understanding of its different variables may lead to new concepts and new algorithms. In this paper, we have shown that the update equation of the RLS can be written in terms of the a priori or a posteriori interpolation error signals, normalized by their respective interpolation error energies. Hence, the interpolation error energy formulation can be further exploited. This formulation has motivated us to propose a simple and efficient way to estimate the condition number of the input signal covariance matrix. We have shown that this condition number can be easily integrated in the FRLS structure at a very low cost from an arithmetic complexity point of view. Finally, we have shown how the misalignment of the RLS depends on the condition number. A formula was derived, predicting how the misalignment degrades when the condition number increases. The accuracy of this prediction was confirmed by the simulations.