Adaptive Filters with Applications
Md. Kamrul Hasan, PhD
Dept. of Electrical and Electronic Engineering
BUET
What is an adaptive filter?
An adaptive filter is a system with a linear filter that has
a transfer function controlled by variable parameters and a
means to adjust those parameters according to an optimization
algorithm.
An adaptive filter consists of TWO distinct parts:
A digital filter with adjustable coefficients
FIR filter structure
IIR filter structure
An adaptive algorithm
RLS algorithm
LMS-type algorithms
Kalman filter
Reference: Adaptive Filter Theory (Chapter 13) - Simon Haykin
Tasks of Information Processing
Filtering: the extraction of information about some quantity of interest at the current
time t by using data measured up to and including the time t
Smoothing: involves a delay of the output because it uses data measured both
before and after the current time t to extract the information.
Prediction: involves forecasting information some time into the future given the
current and past data at time t and before
Deconvolution: Undo the effect of a filter/Channel
Figure: System-identification configuration - the input u(n) drives both the unknown system (output y(n)) and the adaptive digital filter (FIR/IIR, output ŷ(n)); the error e(n) = y(n) - ŷ(n) is fed to the adaptive algorithm that adjusts the filter.
• a filtering process, which produces an output signal in response to a given input signal.
• an adaptation process, which aims to adjust the filter parameters (filter transfer
function) to the (possibly time-varying) environment.
Signals to be observed: input x(n), output y(n), desired signal d(n), and error signal e(n), which serves as the performance measure for the adaptive filter.
Noise-cancellation setup (shown with a fixed filter): the primary input contains signal + noise, the reference input contains noise alone (correlated with the primary noise), and the filter estimates the noise component to be subtracted.
Problem statement: Show that minimizing the total power at the output of the canceller
maximizes the output SNR
Noisy signal (primary input): x(n) = s(n) + v(n)
Estimate of the desired signal (canceller output): e(n) = s(n) + v(n) - ŷ(n),
where ŷ(n) is the filter's estimate of v(n) obtained from the reference noise.
Squaring, we get
e²(n) = s²(n) + [v(n) - ŷ(n)]² + 2 s(n)[v(n) - ŷ(n)]
Taking expectation on both sides,
E[e²(n)] = E[s²(n)] + E[(v(n) - ŷ(n))²]
since s(n) is uncorrelated with both v(n) and ŷ(n).
Total output power = clean signal power + remnant noise power (the noise that may still remain in e(n)).
output power
- By adjusting the filter towards the optimum position, the remnant
noise power and hence the total output power are minimized
- The desired signal power is unaffected by this adjustment since s(n)
is uncorrelated with v(n)
min E[e²(n)] = E[s²(n)] + min E[(v(n) - ŷ(n))²]
- The net effect of minimizing the output power is to maximize the
output SNR = 10 log10( E[s²(n)] / E[(v(n) - ŷ(n))²] )
- When the filter setting is such that ŷ(n) = v(n), then e(n) = s(n)
and the output of the noise canceller is NOISE-FREE.
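A minimal MATLAB sketch of this result (the sinusoidal signal, the reference noise, and the noise path g below are made-up illustrative choices): as one canceller tap is swept toward its optimum value, the total output power drops and the output SNR rises.
% Sketch: minimizing the canceller output power maximizes the output SNR
N  = 20000;
s  = sin(2*pi*0.01*(1:N));             % desired signal
v1 = randn(1,N);                        % reference noise
g  = [0.8 0.5 -0.3];                    % noise path to the primary sensor (assumed)
v0 = filter(g,1,v1);                    % noise reaching the primary input
d  = s + v0;                            % primary (noisy) input
for w3 = [0 -0.15 -0.29]                % sweep the last tap toward its optimum (-0.3)
    w    = [0.8 0.5 w3];                % canceller weights
    e    = d - filter(w,1,v1);          % canceller output: s + (v - vhat)
    Pout = mean(e.^2);                  % total output power
    SNR  = 10*log10(mean(s.^2)/mean((e-s).^2));   % output SNR
    fprintf('w3 = %6.2f   Pout = %.4f   SNR = %5.1f dB\n', w3, Pout, SNR);
end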
Solution Technique
Tap-weight vector: w = [w0  w1  ...  w_{M-1}]^T
Signal (tap-input) vector: x(n) = [x(n)  x(n-1)  ...  x(n-M+1)]^T
Filter output: y(n) = w^T x(n)
Error signal: e(n) = d(n) - y(n) = d(n) - w^T(n) x(n)
Performance function: J = E[e²(n)] = E[d²(n)] - 2 w^T p + w^T R w
where
R = E[x(n) x^T(n)] is the autocorrelation matrix of the filter input, and
p = E[x(n) d(n)] is the cross-correlation vector between x(n) and d(n).
The performance function J is a quadratic function of the filter
tap-weight vector w.
J has a single global minimum obtained by solving the Wiener-
Hopf equation.
Setting ∇J = 0 gives the Wiener-Hopf equation
R w* = p
w* = R^{-1} p
* If the input is white noise (e.g., the system-identification case), R = σx² I and the Wiener solution reduces to w* = p / σx².
The Wiener filter that minimizes J = E[e²(n)]:
- can be used for system identification
- can be used for signal enhancement
- or for filtering in general
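A short MATLAB sketch of this solution, with R and p estimated by time averages from data; the "unknown" system h and the noise level below are assumptions made only for the illustration.
% Sketch: Wiener solution w* = R^{-1} p estimated from data
N = 50000;  M = 4;
x = randn(1,N);                          % white-noise filter input
h = [1 -0.6 0.3 0.1];                    % unknown system (assumed for the demo)
d = filter(h,1,x) + 0.01*randn(1,N);     % desired signal = system output + noise
X = zeros(M,N);
for k = 1:M
    X(k,:) = [zeros(1,k-1) x(1:N-k+1)];  % delayed copies of x, i.e. x(n-k+1)
end
R = (X*X')/N;                            % time-average estimate of E[x(n)x^T(n)]
p = (X*d')/N;                            % time-average estimate of E[x(n)d(n)]
w = R\p;                                 % Wiener-Hopf solution
disp([w.'; h]);                          % estimated vs. true coefficients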
Adaptive Digital FIR Filters Using
the Recursive Least Squares Method:
RLS Algorithm
The method of least squares will be used to derive a recursive
algorithm for automatically adjusting the coefficients of a tapped-
delay-line (i.e., FIR) filter.
No assumptions on the statistics of the input signals.
This procedure, called the recursive least-squares (RLS) algorithm,
is capable of realizing a rate of convergence that is much faster than
that of the LMS algorithm, because the RLS algorithm utilizes all the
information contained in the input data from the start of the adaptation
up to the present.
Adaptive Digital FIR Filters Using the Least Squares
Figure 1: Digital FIR filter structure - a tapped delay line with inputs u(i), u(i-1), ..., u(i-M+1), adjustable taps w0(n), w1(n), ..., w_{M-1}(n), output y(i), and error e(i) = d(i) - y(i).

y(i) = Σ_{k=0}^{M-1} w_k(n) u(i-k),   i = 1, 2, ..., n
The requirement is to design the filter in such a way that it minimizes
the exponentially weighted residual sum of squares of the error, e(i) = d(i) - y(i), i.e.,

J(n) = Σ_{i=1}^{n} λ^{n-i} e²(i)        (1)
Vector differentiation of a scalar c with respect to a = [a1 a2 ... aM]^T:
∂c/∂a = [∂c/∂a1  ∂c/∂a2  ...  ∂c/∂aM]^T

Vector differentiation of a vector c = [c1 c2 ... cm]:
∂c/∂a is the matrix whose (i, j) entry is ∂cj/∂ai.

Differentiation of a matrix-vector product:
∂(Ra)/∂a = R^T,    ∂(a^T R)/∂a = R,    ∂(a^T p)/∂a = p

∂(a^T R a)/∂a = [∂(a^T R)/∂a] a + [∂a/∂a] (a^T R)^T = R a + R^T a = (R + R^T) a

For a symmetric matrix R:
∂(a^T R a)/∂a = 2 R a
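The quadratic-form identity above can be checked numerically; the sketch below (arbitrary small example) compares the analytic gradient 2Ra with a central-difference approximation.
% Numerical check of d/da (a'*R*a) = 2*R*a for symmetric R
M = 3;
S = randn(M);  R = (S + S')/2;          % random symmetric matrix
a = randn(M,1);
g_analytic = 2*R*a;
g_numeric  = zeros(M,1);
h = 1e-6;
for i = 1:M
    ei = zeros(M,1);  ei(i) = h;
    g_numeric(i) = ((a+ei)'*R*(a+ei) - (a-ei)'*R*(a-ei))/(2*h);
end
disp([g_analytic g_numeric]);           % the two columns should agree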
Adaptive Digital FIR Filters Using the Least Squares
The filter output is the convolution sum
y(i) = Σ_{k=0}^{M-1} w_k(n) u(i-k),   i = 1, 2, ..., n        (2)

Error: e(i) = d(i) - y(i) = d(i) - w^T(n) u(i)        (3)

Tap-input vector at time i:  u(i) = [u(i), u(i-1), ..., u(i-M+1)]^T
Tap-weight vector at time n: w(n) = [w0(n), w1(n), ..., w_{M-1}(n)]^T

Cost function:
J(n) = Σ_{i=1}^{n} λ^{n-i} [d(i) - w^T(n) u(i)][d(i) - u^T(i) w(n)]        (4)
     = Σ_{i=1}^{n} λ^{n-i} [d²(i) - 2 d(i) w^T(n) u(i) + w^T(n) u(i) u^T(i) w(n)]
Adaptive Digital FIR Filters Using the Least Squares
We may treat the tap coefficients as constants for the duration of the
input data, from 1 to n. Hence, differentiating Eq.(4) with respect to
w(n), we get
∂J(n)/∂w(n) = Σ_{i=1}^{n} λ^{n-i} [-2 u(i) d(i) + 2 u(i) u^T(i) w(n)]

Setting ∂J(n)/∂w(n) = 0,

Σ_{i=1}^{n} λ^{n-i} u(i) u^T(i) ŵ(n) = Σ_{i=1}^{n} λ^{n-i} u(i) d(i)        (5)

Φ(n) ŵ(n) = θ(n)        (6)

where  Φ(n) = Σ_{i=1}^{n} λ^{n-i} u(i) u^T(i)  and  θ(n) = Σ_{i=1}^{n} λ^{n-i} u(i) d(i)        (7)
Adaptive Digital FIR Filters Using the Least Squares
LS solution:  Φ(n) ŵ(n) = θ(n)        (8)
Assuming Φ(n) is non-singular,
ŵ(n) = Φ^{-1}(n) θ(n)        (9)
and for the resulting filter the residual sum of squares attains the minimum value
J_min(n) = E_d(n) - ŵ^T(n) θ(n)        (10)
where
E_d(n) = Σ_{i=1}^{n} λ^{n-i} d²(i) = λ E_d(n-1) + d²(n)
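A batch MATLAB sketch of Eqs. (7) and (9): form Φ(n) and θ(n) directly from the data and solve for ŵ(n). The data-generating taps and λ below are illustrative assumptions.
% Batch exponentially weighted least squares (Eqs. (7) and (9))
n  = 5000;  M = 4;  lam = 0.999;
u  = randn(1,n);
wo = [0.5 -0.4 0.2 0.1]';               % "true" tap weights (assumed)
d  = filter(wo,1,u) + 0.01*randn(1,n);  % desired response
Phi = zeros(M);  theta = zeros(M,1);
for i = 1:n
    ui = [u(i:-1:max(1,i-M+1)) zeros(1,M-i)]';   % tap-input vector u(i)
    Phi   = Phi   + lam^(n-i)*(ui*ui');          % Eq.(7)
    theta = theta + lam^(n-i)* ui*d(i);          % Eq.(7)
end
w_hat = Phi\theta;                      % Eq.(9): w_hat(n) = Phi^{-1}(n) theta(n)
disp([w_hat wo]);                       % LS estimate vs. true weights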
Adaptive Tapped-delay-line Filters Using the Least Squares
Properties of the Least-squares Estimate
Property 1. The least-squares estimate of the coefficient vector
approaches the optimum Wiener solution as the data length n approaches
infinity, if the filter input and the desired response are jointly stationary
ergodic processes.
Property 2. The least-squares estimate of the coefficient vector is
unbiased if the error signal e(i) has zero mean for all i.
Property 3. The covariance matrix of the least-squares estimate ŵ
equals Φ^{-1}, except for a scaling factor, if the error vector e0 has zero
mean and its elements are uncorrelated.
Property 4. If the elements of the error vector e0 are statistically
independent and Gaussian-distributed, then the least-squares estimate is
the same as the maximum-likelihood estimate.
Φ(n) = Σ_{i=1}^{n} λ^{n-i} u(i) u^T(i) = λ Σ_{i=1}^{n-1} λ^{n-1-i} u(i) u^T(i) + u(n) u^T(n)        (11)
This can be written as
Φ(n) = λ Φ(n-1) + u(n) u^T(n)        (12)
Similarly
θ(n) = λ θ(n-1) + u(n) d(n)        (13)
The Matrix-Inversion Lemma
Let A and B be two positive definite, M by M matrices related by
A = B^{-1} + C D^{-1} C^T        (14)
where D is another positive definite, N by N matrix and C is an M by N
matrix. According to the matrix-inversion lemma, we may express the
inverse of the matrix A as follows:
A^{-1} = B - B C (D + C^T B C)^{-1} C^T B        (15)
The Exponentially Weighted RLS Algorithm (Contd.)
To use the matrix inversion lemma, we define
A = Φ(n),   B^{-1} = λ Φ(n-1),   C = u(n),   D = 1
Substituting these definitions in the matrix inversion lemma, we obtain

Φ^{-1}(n) = λ^{-1} Φ^{-1}(n-1) - [λ^{-2} Φ^{-1}(n-1) u(n) u^T(n) Φ^{-1}(n-1)] / [1 + λ^{-1} u^T(n) Φ^{-1}(n-1) u(n)]        (16)

For convenience of computation, let
P(n) = Φ^{-1}(n)        (17)
k(n) = [λ^{-1} P(n-1) u(n)] / [1 + λ^{-1} u^T(n) P(n-1) u(n)]        (18)
The Exponentially Weighted RLS Algorithm (Contd.)
Then, we may rewrite Eq.(16) as
P(n) = λ^{-1} P(n-1) - λ^{-1} k(n) u^T(n) P(n-1)        (19)
The M-by-1 vector k(n) is called the gain vector.
Rearranging Eq.(18), we find that
k(n) = λ^{-1} P(n-1) u(n) - λ^{-1} k(n) u^T(n) P(n-1) u(n)
     = [λ^{-1} P(n-1) - λ^{-1} k(n) u^T(n) P(n-1)] u(n)        (20)
Substituting Eq.(19) into Eq.(20), we get
k(n) = P(n) u(n)        (21)
or, equivalently, k(n) = Φ^{-1}(n) u(n).
The Exponentially Weighted RLS Algorithm (Contd.)
Developing the recursive equation for updating the LS estimate:
Substituting Eq.(13) into Eq.(9), we get
ŵ(n) = Φ^{-1}(n) θ(n) = P(n) θ(n)        (22)
     = λ P(n) θ(n-1) + P(n) u(n) d(n)
Substituting Eq.(19) for P(n) only in the first term, we get
ŵ(n) = P(n-1) θ(n-1) - k(n) u^T(n) P(n-1) θ(n-1) + P(n) u(n) d(n)
     = Φ^{-1}(n-1) θ(n-1) - k(n) u^T(n) Φ^{-1}(n-1) θ(n-1) + P(n) u(n) d(n)
     = ŵ(n-1) - k(n) u^T(n) ŵ(n-1) + P(n) u(n) d(n)
Using k(n) = P(n) u(n) from Eq.(21),
ŵ(n) = ŵ(n-1) + k(n) [d(n) - u^T(n) ŵ(n-1)]        (23)
     = ŵ(n-1) + k(n) α(n)
The Exponentially Weighted RLS Algorithm (Contd.)
RLS update:
ŵ(n) = ŵ(n-1) + k(n) α(n)
where α(n) is the "a priori" estimation error defined as
α(n) = d(n) - u^T(n) ŵ(n-1)        (24)
Here u^T(n) ŵ(n-1) is the estimate of the desired response based on the old LS
estimate of the coefficient vector.
α(n) is in general different from the "a posteriori" estimation error
e(n) = d(n) - u^T(n) ŵ(n)        (25)
The Exponentially Weighted RLS Algorithm (Contd.)
Summary of the RLS Algorithm
1. Let n = 1.
2. Compute the gain vector
   k(n) = [λ^{-1} P(n-1) u(n)] / [1 + λ^{-1} u^T(n) P(n-1) u(n)]
3. Compute the a priori (true) estimation error
   α(n) = d(n) - ŵ^T(n-1) u(n)
4. Update the estimate of the coefficient vector
   ŵ(n) = ŵ(n-1) + k(n) α(n)
5. Update the error correlation matrix
   P(n) = λ^{-1} P(n-1) - λ^{-1} k(n) u^T(n) P(n-1)
6. Increment n by 1 and go back to step 2.
Side result: recursion for the minimum value of the residual sum of squares
J_min(n) = λ J_min(n-1) + α(n) e(n)
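A direct MATLAB transcription of steps 1-6 (the signal model, filter length, and λ are illustrative assumptions; P(0) is initialized as δ⁻¹I, as discussed next).
% Minimal RLS sketch following steps 1-6 above
N = 3000;  M = 8;  lam = 0.99;  delta = 1e-3;
u  = randn(1,N);
wo = randn(M,1);                         % "true" FIR system (assumed)
d  = filter(wo,1,u) + 0.01*randn(1,N);   % desired response
P  = (1/delta)*eye(M);                   % P(0) = delta^{-1} I
w  = zeros(M,1);                         % w_hat(0) = 0
for n = 1:N
    un    = [u(n:-1:max(1,n-M+1)) zeros(1,M-n)]';   % tap-input vector u(n)
    k     = (P*un/lam) / (1 + un'*P*un/lam);        % step 2: gain vector
    alpha = d(n) - w'*un;                           % step 3: a priori error
    w     = w + k*alpha;                            % step 4: weight update
    P     = (P - k*un'*P)/lam;                      % step 5: update of P(n)
end
disp([w wo]);                            % RLS estimate vs. true weights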
Modify the correlation matrix slightly by writing
Φ(n) = Σ_{i=1}^{n} λ^{n-i} u(i) u^T(i) + δ λ^n I
where I is the M×M identity matrix and δ is a small positive constant; the added
term only affects the starting value, P(0) = δ^{-1} I.
Putting n = 0,
Φ(0) = δ I
This is equivalent to assuming a fictitious nonzero input sample prior to the start of the data,
u(n) = δ^{1/2} λ^{-(M-1)/2} for n = -(M-1),   u(n) = 0 for n ≤ 0, n ≠ -(M-1).
Lastly, set ŵ(0) = 0.
Figure: Adaptive FIR filter in the identification configuration - the input vector u(n) drives the FIR filter with weights ŵ(n-1); its output ŵ^T(n-1) u(n) is subtracted from the desired response d(n), and the resulting error drives the adaptive algorithm.
% RLS algorithm for estimating the AR and MA coefficients of an IIR model
% (used here to approximate a long FIR impulse response)
% y : observed output, u1 : input, p : AR order, q : number of MA taps,
% N : data length, th : initial parameter vector [(p+q) x 1], fgf : forgetting factor
function [AR,MA,err] = KF(y,u1,p,q,N,th,fgf)
%*****************************************
% Initial Conditions
%*****************************************
y  = y(:).';  u1 = u1(:).';   % force row vectors
gama = 1e9;                   % large constant: P(0) = gama*I
R   = gama*eye(p+q);          % inverse correlation matrix P(n)
I   = eye(p+q);
par = [];                     % stored parameter trajectory
sy  = zeros(1,p);             % past outputs  y(i-1),...,y(i-p)
su  = zeros(1,q);             % inputs        u1(i),...,u1(i-q+1)
for i = 1:N-1
    if i <= p
        if i-1 > 0
            sy(1:i-1) = y(i-1:-1:1);
        end
    else
        sy = y(i-1:-1:i-p);
    end
    if i <= q
        su(1:i) = u1(i:-1:1);
    else
        su = u1(i:-1:i-q+1);
    end
    s = [-sy su]';            % regressor (data) vector
    x_est(i) = s'*th;         % a priori output estimate
    e(i) = y(i) - x_est(i);   % innovation (a priori error)
    aux = R*s;
    var_ex = s'*aux + fgf;    % innovation covariance
    k(:,i) = aux/var_ex;      % gain vector
    par = [par,th];           % store parameter history
    th = th + k(:,i)*e(i);    % parameter correction
    R = 1/fgf*(I - k(:,i)*s')*R;  % covariance (P matrix) correction
end
AR  = [1 th(1:p)'];           % denominator polynomial [1 a1 ... ap]
MA  = th(p+1:p+q)';           % numerator (FIR) coefficients [b0 ... b_{q-1}]
err = e;                      % innovation sequence
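A hypothetical usage sketch for the function above; the excitation, model orders, and "true" polynomials are illustrative values, not taken from the slides.
% Hypothetical usage: identify a known low-order IIR system from its input/output
N   = 2000;
u1  = randn(1,N);                        % white-noise excitation
A0  = [1 -0.5 0.2];                      % "true" denominator [1 a1 a2] (assumed)
B0  = [1  0.4 -0.1];                     % "true" numerator  [b0 b1 b2] (assumed)
y   = filter(B0,A0,u1);                  % simulated system output
p = 2;  q = 3;
th0 = zeros(p+q,1);                      % initial parameter guess
lam = 1.0;                               % forgetting factor (1 = ordinary LS)
[AR,MA,e] = KF(y,u1,p,q,N,th0,lam);
disp([AR; A0]);  disp([MA; B0]);         % estimated vs. true polynomials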
Data generation:
- Medium: acoustic channel
- Length of the impulse response: 128
- Input type: white noise
- Model: FIR filter

IIR approximation of the FIR filter:
- FIR length: 128
- IIR order: AR = 32, MA = 32
- Input: white noise
Fixed Filter -----> Adaptive Filter:
Make an initial guess of the tap-weight vector, w(0)
Compute the Gradient Vector of the Loss Function
Compute the next guess at the tap-weight vector by making a change in the
initial or present guess in a direction opposite to that of the gradient vector
Tap-weight update equation:  w(n+1) = w(n) + μ [-∇J(n)] = w(n) - μ ∇J(n)
There are many iterative search algorithms for minimizing the cost function with
the true statistics replaced by their estimate.
Gradient based iterative methods
› Method of Steepest Descent
› Newton’s Method
The gradient of J is given by
∇J = -2p + 2Rw
The recursive (steepest-descent) equation:
w(k+1) = w(k) - μ ∇_k J,   μ > 0 is the step size
With an initial guess w(0) at k = 0, the tap-weight vector at the k-th iteration is denoted w(k).
Substituting ∇_k J,
w(k+1) = w(k) - 2μ [R w(k) - p]
Substituting p = R w* from the Wiener-Hopf equation,
w(k+1) = [I - 2μR] w(k) + 2μ R w*
• The convergence of w(k) to the optimum solution w* and the convergence speed are dependent on the step-size parameter μ.
Rearranging, we get
w(k+1) - w* = [I - 2μR] w(k) + 2μ R w* - w* = [I - 2μR] (w(k) - w*)
Defining the vector v(k) as
v(k) = w(k) - w*
we obtain
v(k+1) = [I - 2μR] v(k)
• R may be diagonalized as R = Q Λ Q^T, with I = Q Q^T and Λ the diagonal matrix of eigenvalues.
• Diagonalizing R gives decoupled equations:
v(k+1) = [Q Q^T - 2μ Q Λ Q^T] v(k) = Q [I - 2μΛ] Q^T v(k)
Denoting v'(k) = Q^T v(k),
Q^T v(k+1) = [I - 2μΛ] Q^T v(k)
v'(k+1) = [I - 2μΛ] v'(k)
This vector recursive equation may be separated into the scalar recursive equations
v'_i(k+1) = [1 - 2μλ_i] v'_i(k),   i = 0, 1, ..., M-1
Solving,
v'_i(k) = [1 - 2μλ_i]^k v'_i(0),   i = 0, 1, ..., M-1
v'_i(k) → 0 if |1 - 2μλ_i| < 1, leading to
0 < μ < 1/λ_max
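A small MATLAB sketch of the steepest-descent recursion w(k+1) = w(k) - 2μ[Rw(k) - p], using an assumed R and w*; any step size inside 0 < μ < 1/λmax drives w(k) to w*.
% Steepest descent on the quadratic performance surface (assumed R and w*)
R  = [2.0 0.7; 0.7 1.0];                 % assumed input autocorrelation matrix
wo = [1; -0.5];                          % assumed Wiener solution w*
p  = R*wo;                               % cross-correlation vector (Wiener-Hopf)
mu = 0.9/max(eig(R));                    % step size inside 0 < mu < 1/lambda_max
w  = zeros(2,1);                         % initial guess w(0)
for k = 1:200
    w = w - 2*mu*(R*w - p);              % w(k+1) = w(k) - 2*mu*[R*w(k) - p]
end
disp([w wo]);                            % w(k) has converged to w*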
Practical Implementation: Block-based Steepest Descent
- replace ensemble average (E) by time average (ergodic process)[LS]
R q_i = λ_i q_i,   i = 1, 2, ..., M
R Q = Q Λ,   Q = [q_1, q_2, ..., q_M],   Λ = diag(λ_1, λ_2, ..., λ_M)
Property of the eigenvectors q_i:
q_i^T q_j = 1 if i = j, and 0 if i ≠ j (orthonormality), so Q^T Q = I and Q^{-1} = Q^T.
Pre-multiplying both sides by Q^T,
Q^T R Q = Q^T Q Λ = Λ
The steepest descent algorithm may suffer from slow modes of
convergence, which arise as a result of the eigenvalue spread of R.
Newton's method removes the effect of the eigenvalue spread.
Starting from the steepest descent algorithm,
w(k+1) = w(k) - 2μ R [w(k) - w*]
The presence of R causes the eigenvalue-spread problem in the steepest
descent method. Newton's method overcomes this problem by replacing
the scalar step size μ with the matrix step size μ R^{-1}.
The resulting algorithm is
w(k+1) = w(k) - μ R^{-1} ∇_k J
Substituting ∇J = -2p + 2Rw, we obtain
w(k+1) = w(k) - 2μ R^{-1} [R w(k) - p]
       = (1 - 2μ) w(k) + 2μ R^{-1} p
       = (1 - 2μ) w(k) + 2μ w*
Subtracting w* from both sides and iterating, we get
w(k) - w* = (1 - 2μ)^k [w(0) - w*]
In an actual implementation R^{-1} is not available and has to be estimated.
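The corresponding sketch for Newton's recursion, with the same assumed R and w*: every mode now contracts by the common factor (1 - 2μ), so the eigenvalue spread no longer matters.
% Newton's method: the matrix step size mu*R^{-1} removes the eigenvalue spread
R  = [2.0 0.7; 0.7 1.0];  wo = [1; -0.5];  p = R*wo;   % same assumed quantities
mu = 0.25;                               % 0 < mu < 1; single mode factor (1 - 2*mu)
w  = zeros(2,1);
for k = 1:40
    w = w - 2*mu*(R\(R*w - p));          % w(k+1) = w(k) - 2*mu*R^{-1}[R*w(k) - p]
end
disp([w wo]);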
Filter output:
y(n) = Σ_{i=0}^{M-1} w_i(n) x(n-i)
Error:
e(n) = d(n) - y(n) = d(n) - w^T(n) x(n)
The conventional LMS algorithm is a stochastic implementation of the
steepest descent algorithm: it simply replaces the cost function
J = E[e²(n)] by its instantaneous coarse estimate Ĵ = e²(n).
The LMS algorithm adapts the filter tap weights so that e(n) is minimized
in the mean-square sense. It is a practical scheme for realizing Wiener
filters without explicitly solving the Wiener-Hopf equation.
(Spatial case: only the input vector changes; it then consists of the sensor signals x.)
Update rule:
w(n+1) = w(n) - μ ∇Ĵ
Substituting Ĵ = e²(n), we obtain
w(n+1) = w(n) - μ ∇e²(n)
∇e²(n) = 2 e(n) ∇e(n) = -2 e(n) x(n)
Therefore,
w(n+1) = w(n) + 2μ e(n) x(n)
This is referred to as the LMS algorithm.
Advantages and disadvantages of LMS:
- Simplicity in implementation
- Stable and robust performance against different signal conditions
- Slow convergence (due to eigenvalue spread)
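A minimal MATLAB sketch of the LMS recursion w(n+1) = w(n) + 2μ e(n) x(n) derived above; the signal model and step size are illustrative assumptions (a full function with a learning curve appears later in these notes).
% Minimal LMS sketch: w(n+1) = w(n) + 2*mu*e(n)*x(n)
N = 5000;  M = 8;  mu = 0.01;
x  = randn(1,N);
wo = randn(M,1);                         % "true" system (assumed)
d  = filter(wo,1,x) + 0.01*randn(1,N);   % desired signal
w  = zeros(M,1);
for n = 1:N
    xn = [x(n:-1:max(1,n-M+1)) zeros(1,M-n)]';   % tap-input vector x(n)
    e  = d(n) - w'*xn;                           % error e(n)
    w  = w + 2*mu*e*xn;                          % LMS update
end
disp([w wo]);                            % LMS estimate vs. true weights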
Convergence/Stability Analysis of the LMS Algorithm
Subtract w* from both sides of the LMS update w(n+1) = w(n) + 2μ e(n) x(n) and
write c(n) = w(n) - w* for the weight-error vector:
c(n+1) = [I - 2μ x(n) x^T(n)] c(n) + 2μ e_o(n) x(n)
where e_o(n) = d(n) - w*^T x(n) is the estimation error produced in the
optimum Wiener solution.
- Assume the tap-weight vector w(n) is independent of x(n), the input vector.
- Consequently, c(n) is also independent of x(n).
Taking expectation on both sides,
E[c(n+1)] = [I - 2μR] E[c(n)] + 2μ E[e_o(n) x(n)],
where R is the correlation matrix of the tap-input vector.
From the principle of orthogonality, E[e_o(n) x(n)] = 0, so
E[c(n+1)] = [I - 2μR] E[c(n)]
- Exactly the same mathematical form as the steepest descent case.
- The average weight-error vector E[c(n)] pertaining to the LMS algorithm plays
the same role as the weight-error vector v(k) of the steepest descent algorithm.
Now, v(k) converges to zero as k → ∞, provided that 0 < μ < 1/λ_max.
From the analogy, we may infer that for the LMS algorithm
E[c(n)] → 0 as n → ∞, provided that 0 < μ < 1/λ_max,
i.e., the average weight-error vector goes to zero as n → ∞.
Convergence in the mean.
- Convergence in MSE: the steady-state value of E[e²(n)] is finite.
A commonly used approximate condition is
0 < μ < 1 / tr(R),   tr(R) = Σ_i λ_i = M × (input signal power),
under which the LMS algorithm converges in the mean-square sense.
With μ restricted to half of its maximum value (μ < 1/(2λ_max)), all the mode
factors satisfy 0 < 1 - 2μλ_i < 1; the speed of convergence is then determined
by the factor (1 - 2μλ_min), whose magnitude is the largest, because this
corresponds to the slowest mode.
The exponential rate of convergence of the weights is governed by (1 - 2μλ_i)^k;
the MSE converges at twice this exponential rate, since the MSE is a quadratic
function of the weights.
The rate of convergence of the LMS algorithm can therefore be characterized by
an exponential time constant: for the i-th mode, when μλ_i is very small,
τ_i ≈ 1/(2μλ_i) iterations (weights),   τ_mse,i ≈ 1/(4μλ_i) iterations (MSE).
Excess MSE:  J_ex ≈ μ J_min tr(R)
Misadjustment factor (MF) of the LMS algorithm:  MF = J_ex / J_min ≈ μ tr(R)
The learning curve (MSE versus iteration number) decays toward J_min + J_ex at a
rate characterized by the exponential time constant τ_mse.
Normalized LMS is the LMS algorithm with a step-size normalized by an
estimate of the input data power
Improves the convergence and stability of the LMS
Moves from a fixed step size to a data-adaptive step size.
The normalized LMS algorithm can be derived as the following
recursion for the tap-weight vector:
w(k+1) = w(k) + (μ̃ / ||x(k)||²) x(k) e(k)
However, in practice we use
w(k+1) = w(k) + (μ̃ / (a + ||x(k)||²)) x(k) e(k)
where a is a small regularization constant that avoids division by zero
when the input power ||x(k)||² is small.
Derivation (constrained optimization view): minimize ||w(k+1) - w(k)||²
subject to w^T(k+1) x(k) = d(k).
Using the Lagrange multiplier λ,
J = ||w(k+1) - w(k)||² + λ [d(k) - w^T(k+1) x(k)]
Finding λ from the constraint and substituting back yields the NLMS recursion above.

Exercise: derive the weight update equation for the following loss function:
Ĵ(n) = e²(n) + γ ||w(n)||²,   where γ is the leakage factor.
The result is the leaky LMS algorithm:
w(n+1) = (1 - 2μγ) w(n) + 2μ e(n) x(n)        (1)
Leaky LMS:
- more stable than LMS
- γ is the leakage factor
(1) can be viewed as a recursive system with input 2μ e(n) x(n) and a pole
at (1 - 2μγ). If |1 - 2μγ| < 1, the effect is to introduce more stability.
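A minimal MATLAB sketch of the leaky-LMS update above; the leakage factor γ and the signal model are illustrative assumptions. The leakage biases the weights slightly toward zero in exchange for extra stability.
% Leaky LMS sketch: w(n+1) = (1 - 2*mu*gam)*w(n) + 2*mu*e(n)*x(n)
N = 5000;  M = 8;  mu = 0.01;  gam = 0.01;     % gam = leakage factor (assumed)
x  = randn(1,N);
wo = randn(M,1);                               % "true" system (assumed)
d  = filter(wo,1,x) + 0.01*randn(1,N);
w  = zeros(M,1);
for n = 1:N
    xn = [x(n:-1:max(1,n-M+1)) zeros(1,M-n)]';
    e  = d(n) - w'*xn;
    w  = (1 - 2*mu*gam)*w + 2*mu*e*xn;         % leakage pulls w toward zero
end
disp([w wo]);                                  % a slight bias toward zero is expected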
% (Normalized) LMS algorithm for identifying an L-tap FIR response
% x : input signal (N x 1), d : desired signal, L : filter length, mu : step size
function [hhat,J] = LMS(x,d,L,mu)
%%% Initialization
[N,M] = size(x);             % M = 1 for the single-channel case assumed here
xm    = zeros(L,M);
hhat  = zeros(L,M);          % initial tap-weight estimate
e     = zeros(1,N);
J     = zeros(1,N);          % learning curve (squared error)
alpha = 1;                   % regularization constant of the normalized step
%%%%%%%%%%%%%%%%%%
for m = 1:N
    if m-L+1 > 0
        xm = x(m:-1:m-L+1,:);              % input data (tap) vector
    else
        xm = [x(m:-1:1,:); zeros(L-m,M)];  % zero-padded during start-up
    end
    e(m)  = d(m) - xm'*hhat;               % error calculation
    gradx = -2*xm*e(m);                    % instantaneous gradient estimate
    hhat  = hhat - (mu/(alpha+xm'*xm))*gradx;  % normalized parameter update
    J(m)  = e(m)^2;                        % squared-error learning curve
end
Figures: NLMS estimate of the impulse response and the corresponding learning curves.
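A hypothetical call to the LMS/NLMS function above; the data, filter length, and step size are illustrative choices.
% Hypothetical usage of the LMS function above
N  = 5000;  L = 16;
x  = randn(N,1);                         % column input, as the function expects
h  = randn(L,1);                         % "true" FIR system (assumed)
d  = filter(h,1,x) + 0.01*randn(N,1);    % desired signal
mu = 0.5;                                % normalized step size
[hhat,J] = LMS(x,d,L,mu);
plot(10*log10(J)); xlabel('iteration'); ylabel('squared error (dB)');  % learning curve
disp([hhat h]);                          % estimated vs. true impulse response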
Comparison of the RLS and LMS Algorithms
1. In the LMS algorithm, the correction that is applied in updating the
old estimate of the coefficient vector is based on the instantaneous
sample value of the tap-input vector and the error signal. On the other
hand, in the RLS algorithm the computation of this correction utilizes all
the past available information.
2. In the LMS algorithm, the correction applied to the previous estimate
consists of the product of three factors: the (scalar) step-size parameter
μ, the error signal e(n-1), and the tap-input vector u(n-1). On the other
hand, in the RLS algorithm this correction consists of the product of two
factors: the true estimation error α(n) and the gain vector k(n). The gain
vector itself consists of Φ^{-1}(n), the inverse of the deterministic
correlation matrix, multiplied by the tap-input vector u(n). The major
difference between the LMS and RLS algorithms is therefore the
presence of Φ^{-1}(n) in the correction term of the RLS algorithm, which has
the effect of decorrelating the successive tap inputs, thereby making the
RLS algorithm self-orthogonalizing. Because of this property, we find
that the RLS algorithm is essentially independent of the eigenvalue
spread of the correlation matrix of the filter input.
3. The LMS algorithm requires approximately 20M iterations to
converge in mean square, where M is the number of tap coefficients
contained in the tapped-delay-line filter. On the other hand, the RLS
algorithm converges in mean square within less than 2M iterations. The
rate of convergence of the RLS algorithm is therefore, in general, faster
than that of the LMS algorithm by an order of magnitude.
4. Unlike the LMS algorithm, there are no approximations made in the
derivation of the RLS algorithm. Accordingly, as the number of
iterations approaches infinity, the least-squares estimate of the
coefficient vector approaches the optimum Wiener value, and
correspondingly, the mean-square error approaches the minimum value
possible. In other words, the RLS algorithm, in theory, exhibits zero
misadjustment. On the other hand, the LMS algorithm always exhibits a
nonzero misadjustment; however, this misadjustment may be made
arbitrarily small by using a sufficiently small step-size parameter μ.
5. The superior performance of the RLS algorithm compared to the LMS
algorithm, however, is attained at the expense of a large increase in
computational complexity. The complexity of an adaptive algorithm for
real-time operation is determined by two principal factors: (1) the
number of multiplications (with divisions counted as multiplications)
per iteration, and (2) the precision required to perform arithmetic
operations. The RLS algorithm requires a total of 3M(3 + M )/2
multiplications, which increases as the square of M, the number of filter
coefficients. On the other hand, the LMS algorithm requires 2M + 1
multiplications, increasing linearly with M. For example, for M = 31 the
RLS algorithm requires 1581 multiplications, whereas the LMS
algorithm requires only 63.
Applications of Adaptive Filters
- The problem can be solved using both non-adaptive and adaptive signal
processing techniques.
Figure: Fetal ECG extraction by adaptive noise cancellation - the abdominal lead provides the primary (desired) input d, the chest leads i = 1, 2, ..., 4 provide the reference inputs, and the adaptive algorithm produces the estimated fetal ECG at the summing output.
where x_i(n) denotes the vector of the most recent output samples of the i-th channel.
Identifiability conditions: the channel impulse responses must not share any common zeros, and the source must be sufficiently exciting.
Cross-relation: since each channel output is x_i(n) = h_i(n) * s(n), convolving with the other channel gives
x_i(n) * h_j(n) = x_j(n) * h_i(n)
e_ij(n) = s(n) * [h_i(n) * h_j(n) - h_j(n) * h_i(n)] = 0
In vector form,
e_ij(n) = x_i^T(n) h_j - x_j^T(n) h_i,   i ≠ j,   i, j = 1, 2, ..., M
e_ij(n) = 0,                              i = j,   i, j = 1, 2, ..., M
The normalized error signal is
ε_ij(n) = [x_i^T(n) h_j - x_j^T(n) h_i] / ||h||,   i ≠ j,   i, j = 1, 2, ..., M
ε_ij(n) = 0,                                        i = j,   i, j = 1, 2, ..., M
i.e., ε_ij(n) = e_ij(n) / ||h||.
The cost function is defined as
J(n) = Σ_{i=1}^{M-1} Σ_{j=i+1}^{M} ε_ij²(n),    ĥ = arg min_h E[J(n)]  subject to  ||ĥ|| = 1
LMS algorithm:
ĥ(n+1) = ĥ(n) - μ ∇J(n),   with   ∇J(n) = (2 / ||ĥ(n)||²) [R̃(n) ĥ(n) - J(n) ĥ(n)]
so that
ĥ(n+1) = ĥ(n) - (2μ / ||ĥ(n)||²) [R̃(n) ĥ(n) - J(n) ĥ(n)]
If the channel estimate is normalized after each update,
ĥ(n+1) ← ĥ(n+1) / ||ĥ(n+1)||
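A small MATLAB check of the cross-relation x_i * h_j = x_j * h_i; the source and the two channel impulse responses are arbitrary illustrative choices.
% Numerical check of the cross-relation x1*h2 = x2*h1
Ns = 2000;  L = 8;
s  = randn(1,Ns);                        % unknown source
h1 = randn(1,L);  h2 = randn(1,L);       % two SIMO channel impulse responses (assumed)
x1 = filter(h1,1,s);                     % channel outputs
x2 = filter(h2,1,s);
r1 = filter(h2,1,x1);                    % x1 * h2
r2 = filter(h1,1,x2);                    % x2 * h1
disp(max(abs(r1 - r2)));                 % ~0 (round-off): the cross-relation holds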