09 Filter Design
Stanford University
Last lecture
- Two's complement is a fixed-point representation that represents fractions as integers
- There's an inherent trade-off between roundoff noise and overflow/clipping
- FIR systems remain stable after coefficient quantization
- Linear phase FIR systems remain linear phase after coefficient quantization, since the impulse response remains symmetric
- Coefficient quantization may lead to instability in IIR systems, as poles may move outside the unit circle
- Similarly to quantization noise, roundoff noise is modeled as an additive, uniformly distributed white noise that is independent of the input signal (the linear noise model)
- Roundoff noise is minimized by performing quantization only after accumulation, but this requires (2B + 1)-bit adders
- In FIR structures the equivalent roundoff noise at the output is white
- IIR structures lead to roundoff noise shaping
- The least noisy IIR structure depends on the system
- Cascade and parallel forms are used to mitigate total roundoff noise
Practice and theory
In practice, the signal chain is: xc(t) → ADC → x[n] → Digital Signal Processor → y[n] → DAC → yc(t).
In DSP theory, this whole chain is modeled as an equivalent continuous-time system Heq(jΩ).
Digital filter design
Design techniques:
- Impulse invariance
- Bilinear transformation
Design by impulse invariance can result in either FIR or IIR filters, whereas the bilinear transformation generally results in IIR filters.
Digital filter design
2. Digital FIR filter design from specifications
How to find an FIR H(z) such that H(e^{jω}) best approximates a desired frequency response Hd(e^{jω})? Essentially a polynomial curve-fitting problem.
[Figure: tolerance scheme for Hd(e^{jω}) with passband ripple bounds 1 ± δ1, stopband level δ2, passband edge ωp, stopband edge ωs.]
Design techniques:
- Window method
- Optimal filter design
  - Parks-McClellan algorithm
  - Least-squares algorithm
Digital processing of analog signals
As long as there is no aliasing and the reconstruction filter is the ideal lowpass filter, these equalities hold:

    Heq(jΩ) = H(e^{jΩT}),  |Ω| < π/T        (from DSP to analog)
    Heq(jΩ) = 0,           |Ω| > π/T

The scaling factor T compensates for the 1/T attenuation in the frequency domain due to sampling.
The resulting h[n] depends on the sampling period T.
Impulse invariance example: lowpass Butterworth filter
Butterworth filters are maximally flat in the passband and are monotonic overall. The downside of Butterworth filters is their relatively slow roll-off.
For this example, consider the following 6th-order continuous-time lowpass Butterworth filter:

    Heq(s) = 0.12093 / [(s^2 + 0.364s + 0.4945)(s^2 + 0.9945s + 0.4945)(s^2 + 1.3585s + 0.4945)]
Impulse invariance example: lowpass Butterworth filter
To design an FIR filter by impulse invariance we must:
1. Obtain the continuous-time impulse response heq(t) ←→ Heq(s) (impulse in Matlab)
2. Sample and scale heq(t) with period T and keep only the first M + 1 samples:

    h[n] = T·heq(nT), n = 0, ..., M;  h[n] = 0 otherwise        (for causal heq(t))

h[n] gives the FIR filter coefficients. M is typically chosen to satisfy some energy criterion; for instance, the retained samples must contain 95% of the signal energy.
[Figure: heq(t) and its samples at t = T, 5T, ..., 24T.]
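The sample-and-truncate step above can be sketched in Python. This is a minimal illustration with a hypothetical first-order response heq(t) = e^{−t}u(t) standing in for the 6th-order Butterworth filter; the function name and the 95% threshold follow the energy criterion mentioned above.

```python
import numpy as np

def fir_impulse_invariance(heq, T, energy_frac=0.95, n_max=10_000):
    """Sample T*heq(nT) and keep the first M+1 samples that capture
    `energy_frac` of the total (discretized) impulse-response energy."""
    n = np.arange(n_max)
    h_full = T * heq(n * T)
    energy = np.cumsum(h_full ** 2)
    M = int(np.searchsorted(energy, energy_frac * energy[-1]))
    return h_full[: M + 1]

# Illustrative heq(t) = e^{-t} u(t); T chosen arbitrarily
h = fir_impulse_invariance(lambda t: np.exp(-t), T=0.1)
```

Larger `energy_frac` (or a slower-decaying heq) yields a longer FIR filter, which is exactly the accuracy/length trade-off discussed on the next slides.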
Impulse invariance example: lowpass Butterworth filter
[Figure: magnitude (dB) and phase (rad) of Heq(jΩ) compared with H(e^{jω}) for the 25-sample FIR design.]
Questions:
1. What would happen if we take fewer samples (smaller M)?
2. What would happen if we decrease the sampling period, e.g., T2 = 0.5T?
Impulse invariance example: lowpass Butterworth filter
- Designing FIR filters by impulse invariance is straightforward. Plus, FIR systems have the implementation advantages discussed in lectures 7 and 8
- Problem: it may require prohibitively many samples to achieve good accuracy
- IIR systems generally offer better accuracy while requiring fewer operations (coefficients)
To design an IIR filter by impulse invariance we must:
1. Invert the Laplace transform Heq(s) using partial fraction expansion to obtain heq(t) analytically (function residue in Matlab)
2. Sample heq(t): h[n] = T·heq(nT)
3. Calculate the z-transform H(z) of h[n]
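For a single pole the three steps can be carried out in closed form: Heq(s) = 1/(s + a) has heq(t) = e^{−at}u(t), so h[n] = T·e^{−anT} and, by the geometric series, H(z) = T/(1 − e^{−aT}z^{−1}). A quick numerical check of that z-transform (an illustrative sketch with arbitrary a and T, not the Butterworth example):

```python
import numpy as np

# Steps 1-2: a pole at s = -a gives heq(t) = exp(-a t), sampled as h[n] = T exp(-a n T)
a, T = 2.0, 0.05
n = np.arange(2000)
h = T * np.exp(-a * n * T)

# Step 3: geometric series => H(z) = T / (1 - exp(-a T) z^{-1})
def H_closed(z):
    return T / (1 - np.exp(-a * T) / z)

# Verify on the unit circle: the truncated z-transform of h[n] matches the closed form
w = 0.3 * np.pi
z = np.exp(1j * w)
H_sum = np.sum(h * z ** (-n))
assert abs(H_sum - H_closed(z)) < 1e-10
```

The 6th-order example works the same way, one term per pole of the partial fraction expansion.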
Impulse invariance example: lowpass Butterworth filter

    H(z) = (0.2871 − 0.4466z^{−1}) / (1 − 1.2971z^{−1} + 0.6949z^{−2})
         + (−2.1428 + 1.1455z^{−1}) / (1 − 1.0691z^{−1} + 0.3699z^{−2})
         + (1.8557 − 0.6303z^{−1}) / (1 − 0.9972z^{−1} + 0.2570z^{−2})

[Figure: pole-zero plot of H(z); all poles lie inside the unit circle.]
Impulse invariance example: lowpass Butterworth filter
[Figure: magnitude (dB) and phase (rad) of Heq(jΩ) compared with H(e^{jω}) for the IIR design.]
- IIR systems achieve better accuracy while requiring fewer operations (coefficients) than FIR systems.
- Similarly to FIR systems, if we change the sampling frequency the behavior of the filter changes.
Bilinear transformation
Another way to answer the question: how to design h[n] ←→ H(z) given heq(t) ←→ Heq(s)?
The bilinear transformation maps the left half of the s-plane into the interior of the unit circle in the z-plane:

    s = (2/T)·(1 − z^{−1})/(1 + z^{−1})        (bilinear transformation)

[Figure: the jΩ axis of the s-plane maps onto the unit circle of the z-plane; the left half-plane maps to its interior.]
Bilinear transformation
To design a digital filter from an analog filter using the bilinear transformation, we simply make the following change of variables:

    H(z) = Heq(s) evaluated at s = (2/T)·(1 − z^{−1})/(1 + z^{−1})
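As a concrete sketch, the substitution can be done numerically. The first-order lowpass Heq(s) = a/(s + a) used here is an illustrative stand-in, not a filter from the lecture; on the unit circle the result must equal the analog response at the warped frequency Ω = (2/T) tan(ω/2), which the assertion checks.

```python
import numpy as np

a, T = 1.0, 0.5

def Heq(s):                 # first-order analog lowpass (illustrative)
    return a / (s + a)

def H_digital(z):           # bilinear substitution s = (2/T)(1 - z^-1)/(1 + z^-1)
    s = (2.0 / T) * (1 - 1 / z) / (1 + 1 / z)
    return Heq(s)

# On the unit circle the digital response equals the analog response
# evaluated at the warped frequency Omega = (2/T) tan(w/2)
w = 0.4 * np.pi
z = np.exp(1j * w)
Omega = (2.0 / T) * np.tan(w / 2)
assert abs(H_digital(z) - Heq(1j * Omega)) < 1e-12
```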
Frequency warping
Evaluating z on the unit circle is equivalent to evaluating s on the imaginary axis jΩ:

    jΩ = (2/T)·(1 − e^{−jω})/(1 + e^{−jω}) = j·(2/T)·tan(ω/2)

This results in the following relation:

    ω = 2 arctan(ΩT/2)        (frequency warping)

Problem: with the bilinear transformation we no longer have the linear relation ω = ΩT. This is known as frequency warping.
[Figure: ω = 2 arctan(ΩT/2) versus ΩT; the curve is approximately linear near the origin and saturates at ±π.]
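A small numeric check of the warping relation (plain numpy; the test frequencies are arbitrary): for small ΩT the mapping is close to the linear ω = ΩT, while large analog frequencies are compressed toward π.

```python
import numpy as np

T = 1.0
warp = lambda Omega: 2 * np.arctan(Omega * T / 2)   # w = 2 arctan(Omega T / 2)

# Near the origin the warp is almost linear: w ~ Omega * T
assert abs(warp(0.1) - 0.1 * T) < 1e-3

# Far from the origin the entire analog axis is squeezed into |w| < pi
assert warp(1e6) < np.pi
assert np.pi - warp(1e6) < 1e-5
```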
Bilinear transformation example: lowpass Butterworth filter
Revisiting the example of the 6th-order lowpass Butterworth filter. To obtain H(z) we simply substitute s = (2/T)·(1 − z^{−1})/(1 + z^{−1}) in Heq(s).
[Figure: pole-zero plot of the resulting H(z); a zero of multiplicity 6 at z = −1 and six poles inside the unit circle.]
Bilinear transformation example: lowpass Butterworth filter
[Figure: magnitude (dB) and phase (rad) of Heq(jΩ) compared with H(e^{jω}) designed by bilinear transformation for T = 0.5 and T = 2.]
- Similarly to impulse invariance, the resulting frequency response depends on the sampling period T.
- Frequency warping leads to the disagreement between the continuous-time and discrete-time filters for ω > 0.3π.
Frequency pre-warping
Frequency warping can be compensated by pre-warping: use the scaled substitution

    s = (Ωp / tan(ΩpT/2))·(1 − z^{−1})/(1 + z^{−1})        (bilinear transformation with pre-warping)

so that the analog frequency Ω = Ωp maps exactly to the digital frequency ω = ΩpT.
Ωp is chosen so that H(e^{jω}) will preserve a particular characteristic of Heq(jΩ), e.g., Ωp is made equal to the 3-dB bandwidth.
Bilinear transformation example: lowpass Butterworth filter
Example of bilinear transformation with frequency pre-warping:
- Ωp = 0.6π for T = 2
- Ωp = 0.2π for T = 0.5
[Figure: magnitude (dB) and phase (rad) of Heq(jΩ) compared with H(e^{jω}) for T = 0.5, Ωp = 0.2π and for T = 2, Ωp = 0.6π.]
Common terminology
[Figure: tolerance scheme for Hd(e^{jω}) with passband ripple bounds 1 ± δ1, stopband level δ2, and edge frequencies ωp, ωs.]
Terminology:
- The filter order is equal to the largest power of z^{−1} or z
- δ1: passband ripple
- δ2: stopband ripple (stopband attenuation)
- ωp: passband edge frequency
- ωs: stopband edge frequency
Classic filters
Comparison of classic filters
- All are 6th-order filters designed to have 3-dB bandwidth of ≈ π/2.
- Ripple was set to 1 dB in the passband
- Stopband attenuation was 30 dB.
[Figure: magnitude responses (dB) of Butterworth, Chebyshev I, Chebyshev II, elliptic, and Bessel filters over 0 to π.]
Comparison of classic filters
- All are 6th-order filters designed to have 3-dB bandwidth of ≈ π/2.
- Ripple was set to 1 dB in the passband and stopband
- Stopband attenuation was 30 dB.
[Figure: phase responses (rad) of Butterworth, Chebyshev I, Chebyshev II, elliptic, and Bessel filters.]
From lowpass to highpass, bandpass, and bandstop
Digital FIR filter design from specifications
How to find an FIR H(z) such that H(e^{jω}) best approximates a desired frequency response Hd(e^{jω})? Essentially a polynomial curve-fitting problem.
[Figure: tolerance scheme for Hd(e^{jω}) with passband ripple bounds 1 ± δ1, stopband level δ2, and edge frequencies ωp, ωs.]
Design techniques:
- Window method
- Optimal filter design
  - Parks-McClellan algorithm
  - Least squares
Window method
An easy way to design an FIR filter to match a desired frequency response Hd(e^{jω}) is to calculate the inverse DTFT of Hd(e^{jω}) and truncate the result to a reasonable number of samples (similar to impulse invariance):

    hd[n] = (1/2π) ∫ from −π to π of Hd(e^{jω}) e^{jωn} dω        (inverse DTFT)

Then we truncate it to have at most M + 1 samples:

    h[n] = hd[n], n = 0, 1, ..., M;  h[n] = 0 otherwise        (truncated sequence)

Truncation is equivalent to multiplying hd[n] by a window sequence w[n], which in this case is the rectangular window.
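A minimal numpy sketch of the window method for an ideal lowpass Hd with cutoff ωc, using a Hamming window instead of the rectangular one (the order, cutoff, and test frequencies are all illustrative choices):

```python
import numpy as np

M = 40                        # filter order (M + 1 taps)
wc = 0.4 * np.pi              # desired cutoff frequency
n = np.arange(M + 1)

# Inverse DTFT of the ideal lowpass, delayed by M/2 so the result is causal
hd = (wc / np.pi) * np.sinc(wc / np.pi * (n - M / 2))

# Truncation = multiplication by a window (here: Hamming)
h = hd * np.hamming(M + 1)

# Check the fit: near 1 in the passband, near 0 deep in the stopband
w_pass, w_stop = 0.2 * np.pi, 0.7 * np.pi
Hpass = np.sum(h * np.exp(-1j * w_pass * n))
Hstop = np.sum(h * np.exp(-1j * w_stop * n))
assert abs(abs(Hpass) - 1) < 0.01
assert abs(Hstop) < 0.01
```

Swapping `np.hamming` for `np.ones` reproduces the rectangular-window design, whose larger stopband ripple is the Gibbs phenomenon discussed next.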
Window method
Representing truncation as h[n] = w[n]hd[n] gives us an easy way to understand what happens in the frequency domain.
Problem: H(e^{jω}) will not be equal to Hd(e^{jω}). Instead, it will be a smeared version of the desired response Hd(e^{jω}).
Revisiting the Gibbs phenomenon
Time domain and frequency domain of the symmetrically truncated ideal lowpass:

    hlpf[n] = sin(ωc n)/(πn) = (ωc/π) sinc(ωc n/π),   HM(e^{jω}) = Σ from n = −M to M of [sin(ωc n)/(πn)] e^{−jωn}

[Figure: the truncated impulse response over −M ≤ n ≤ M and the corresponding HM(e^{jω}), shown for M = 7 and M = 19.]
Revisiting the Gibbs phenomenon
- The Gibbs phenomenon appears when we truncate the impulse response of the ideal lowpass filter (or any discontinuous DTFT).
- In lecture 1, we attributed this to convergence issues of the DTFT for non-absolutely-summable sequences. The DTFT of the sinc converges only in the mean-square sense, not uniformly.
- Another way to view the Gibbs phenomenon is as a result of windowing:

    H(e^{jω}) = (1/2π) W(e^{jω}) ∗ Hd(e^{jω})        (convolution)

- In this case the desired response Hd(e^{jω}) is the ideal lowpass filter, and the window function is

    w[n] = 1 for n = −M, −M + 1, ..., M − 1, M;  0 otherwise
    ⟺  W(e^{jω}) = sin(ω(2M + 1)/2) / sin(ω/2)
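The hallmark of the Gibbs phenomenon is that the peak overshoot (about 9% of the jump) does not shrink as M grows. A quick numeric check of this, using the real-valued form of HM(e^{jω}) above (the cutoff ωc = 0.5π and the values of M are arbitrary):

```python
import numpy as np

def truncated_lowpass_peak(M, wc=0.5 * np.pi):
    """Peak of H_M(e^{jw}) for the symmetrically truncated ideal lowpass."""
    n = np.arange(1, M + 1)
    w = np.linspace(0, wc, 4000)
    # H_M is real: wc/pi + sum over n of 2 sin(wc n)/(pi n) cos(w n)
    H = wc / np.pi + 2 * np.cos(np.outer(w, n)) @ (np.sin(wc * n) / (np.pi * n))
    return H.max()

# The overshoot stays near 9% regardless of M
for M in (20, 100):
    assert 1.05 < truncated_lowpass_peak(M) < 1.12
```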
Rectangular window
[Figure: W(e^{jω}) for the rectangular window with M = 7 and M = 19.]
Rectangular window
From Fourier transform theory, we can show that the rectangular window produces the H(e^{jω}) that best matches Hd(e^{jω}) in the mean-square sense. That is,

    (1/2π) ∫ from −π to π of |H(e^{jω}) − Hd(e^{jω})|² dω        (mean-square error)

is minimized when w[n] is the rectangular window.
Question: are there other windows w[n] that minimize issues with discontinuities without excessively increasing the mean-square error?
Commonly used windows
Rectangular:
    w[n] = 1, 0 ≤ n ≤ M;  0 otherwise
Bartlett (triangular):
    w[n] = 2n/M, 0 ≤ n ≤ M/2 (M even);  2 − 2n/M, M/2 < n ≤ M;  0 otherwise
Hann:
    w[n] = 0.5 − 0.5 cos(2πn/M), 0 ≤ n ≤ M;  0 otherwise
Hamming:
    w[n] = 0.54 − 0.46 cos(2πn/M), 0 ≤ n ≤ M;  0 otherwise
Blackman:
    w[n] = 0.42 − 0.5 cos(2πn/M) + 0.08 cos(4πn/M), 0 ≤ n ≤ M;  0 otherwise
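numpy ships with all of these windows; note it takes the number of points M + 1, not the order M. A small sketch checking the built-ins against the formulas above:

```python
import numpy as np

M = 20                        # filter order; the windows have M + 1 points
windows = {
    "rectangular": np.ones(M + 1),
    "bartlett": np.bartlett(M + 1),
    "hann": np.hanning(M + 1),
    "hamming": np.hamming(M + 1),
    "blackman": np.blackman(M + 1),
}

# All windows are symmetric about M/2
for name, w in windows.items():
    assert np.allclose(w, w[::-1]), name

# np.hanning matches the formula 0.5 - 0.5 cos(2 pi n / M)
n = np.arange(M + 1)
assert np.allclose(np.hanning(M + 1), 0.5 - 0.5 * np.cos(2 * np.pi * n / M))
```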
Commonly used windows
Time domain: all windows are symmetric about M/2.
[Figure: rectangular, Bartlett (triangular), Hann, Hamming, and Blackman windows plotted over 0 ≤ n ≤ M.]
Note: n is discrete. These curves were plotted as continuous functions just for easier visualization.
We will revisit windows when talking about spectrum analysis (lecture 12).
Linear phase in filters designed by windowing
If the window w[n] is causal and symmetric (about M/2) and the desired impulse response hd[n] is causal and symmetric (about M/2), then the product h[n] = w[n]hd[n] is also causal and symmetric.
Therefore, h[n] is either even or odd symmetric and consequently H(e^{jω}) has generalized linear phase.
Kaiser window
It's typically desired that the window be maximally concentrated around ω = 0 (small side-lobe area).
The Kaiser window offers a nearly optimal trade-off between main-lobe width and side-lobe area:

    w[n] = I0(β·sqrt(1 − ((n − α)/α)²)) / I0(β), 0 ≤ n ≤ L − 1;  0 otherwise

where α = (L − 1)/2, β is a design parameter, and I0(·) is the modified Bessel function of the first kind and order 0.
See section 7.5.3 of the textbook for recommendations on values of β for lowpass filter design.
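np.kaiser implements this formula (first argument: number of points L, second: β). A quick sketch checking the definition against the built-in; the values of L and β are arbitrary:

```python
import numpy as np
from numpy import i0  # modified Bessel function of the first kind, order 0

L, beta = 31, 6.0
alpha = (L - 1) / 2
n = np.arange(L)

# Kaiser window from the definition...
w_manual = i0(beta * np.sqrt(1 - ((n - alpha) / alpha) ** 2)) / i0(beta)
# ...matches numpy's built-in
assert np.allclose(w_manual, np.kaiser(L, beta))

# beta = 0 degenerates to the rectangular window
assert np.allclose(np.kaiser(L, 0.0), np.ones(L))
```

Larger β trades a wider main lobe for smaller side lobes, which is the knob section 7.5.3 of the textbook tabulates.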
Summary on FIR filter design by the window method
1. From the desired frequency response Hd(e^{jω}) calculate the desired impulse response hd[n].
2. Choose the filter order M and the window w[n]. Then,

    h[n] = hd[n]w[n], n = 0, ..., M;  0 otherwise        (for hd[n] causal)

For example, in Matlab, fir1(M, wc/pi, kaiser(M + 1, beta)) designs a lowpass FIR filter of order M and cutoff frequency ωc using the window method with a Kaiser window with parameter β.
Optimal FIR filter design
Optimal FIR filter design
[Figure: tolerance scheme for Hd(e^{jω}) with passband ripple bounds 1 ± δ1, stopband level δ2, and edge frequencies ωp, ωs.]
The weight function specifies how much the error in each band matters:

    W(ω) = 1 for ω ≤ ωp,   W(ω) = 0 for ωp < ω < ωs,   W(ω) = 1 for ω ≥ ωs
Matrix notation

    E(ω) = W(ω)[Hd(e^{jω}) − H(e^{jω})]        (continuous weighted error)

Sampling the error at N frequencies ω1, ..., ωN gives

    e = W(d − Qh)        (matrix notation)
Matrix notation
- Q is the N × (M/2 + 1) matrix:

    Q = [ 2cos(ω1·M/2)   2cos(ω1·(M/2 − 1))   ...   2cos(ω1)   1 ]
        [ 2cos(ω2·M/2)   2cos(ω2·(M/2 − 1))   ...   2cos(ω2)   1 ]
        [      ...              ...           ...      ...       ]
        [ 2cos(ωN·M/2)   2cos(ωN·(M/2 − 1))   ...   2cos(ωN)   1 ]

These expressions come from the linear-phase forms of H(e^{jω}).

Even symmetry, h[n] = h[M − n]:

    H(e^{jω}) = e^{−jωM/2} [ h[M/2] + Σ from n = 0 to M/2 − 1 of 2h[n] cos(ω(M/2 − n)) ],   M even
    H(e^{jω}) = e^{−jωM/2} Σ from n = 0 to (M − 1)/2 of 2h[n] cos(ω(M/2 − n)),              M odd

Odd symmetry, h[n] = −h[M − n]:

    H(e^{jω}) = e^{−jωM/2} Σ from n = 0 to M/2 − 1 of 2jh[n] sin(ω(M/2 − n)),               M even
    H(e^{jω}) = e^{−jωM/2} Σ from n = 0 to (M − 1)/2 of 2jh[n] sin(ω(M/2 − n)),             M odd
Optimal FIR filter design
Question: how to find the coefficients h[0], . . . , h[b M
2 c] (the vector h)?
Two algorithms:
1. Parks-McClellan algorithm: minimizes the maximum weighted error
firpm in Matlab.
2. Least squares: minimizes the mean-square weighted error
Z π
1
min |E(ω)|2 dω (least squares)
h[n] 2π −π
firls in Matlab.
44/57
Parks-McClellan algorithm
The Parks-McClellan algorithm finds the filter coefficients that minimize the maximum weighted error:

    min over h of max over ω of |E(ω)|        (minimax criterion)
Least squares
The least-squares algorithm finds the filter coefficients that minimize the mean-square weighted error:

    min over h[n] of (1/2π) ∫ from −π to π of |E(ω)|² dω        (mean-square weighted error)
    min over h of ||W(d − Qh)||²₂        (in matrix notation)
    min over h of ||Ah − b||²₂        (change of variables A = WQ and b = Wd)

Problems of the form min over h of ||Ah − b||²₂ are referred to as least-squares problems and they have an analytical solution:

    h = A†b        (least-squares solution)

A† = (A^H A)^{−1} A^H is the Moore-Penrose pseudoinverse (pinv in Matlab).
Note: A^H = (A*)^T is the Hermitian (conjugate transpose) matrix, since A could be complex.
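A compact numpy sketch of the whole least-squares pipeline for a type-I (even M, even-symmetric) lowpass. The grid density, band edges, and weights are illustrative choices, and np.linalg.lstsq is used instead of forming the pseudoinverse explicitly:

```python
import numpy as np

M = 20                                    # filter order (even -> type I)
wgrid = np.linspace(0, np.pi, 400)        # dense frequency grid
d = (wgrid <= 0.4 * np.pi).astype(float)  # desired amplitude: lowpass
# weight: 1 in pass/stop bands, 0 in the transition band (don't-care region)
W = np.where((wgrid > 0.4 * np.pi) & (wgrid < 0.6 * np.pi), 0.0, 1.0)

# Q[k, j] = 2 cos(w_k (M/2 - j)) for j < M/2; last column of ones multiplies h[M/2]
cols = [2 * np.cos(wgrid * (M // 2 - j)) for j in range(M // 2)] + [np.ones_like(wgrid)]
Q = np.stack(cols, axis=1)                # N x (M/2 + 1)

A, b = W[:, None] * Q, W * d
half, *_ = np.linalg.lstsq(A, b, rcond=None)   # h[0], ..., h[M/2]

h = np.concatenate([half, half[-2::-1]])       # even symmetry: h[n] = h[M - n]

# Amplitude response should be near 1 at dc and near 0 at pi
n = np.arange(M + 1)
H = lambda w: np.sum(h * np.exp(-1j * w * n))
assert abs(abs(H(0.0)) - 1) < 0.05
assert abs(H(np.pi)) < 0.05
```

Replacing the squared-error objective with a minimax one (and an exchange step) is exactly what separates firls from firpm.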
Example: optimal bandpass FIR design
We want to design an FIR bandpass filter with the following desired response Hd(e^{jω}).
The weight function is zero in the transition bands. Hence, we don't care about the error in those regions.
[Figure: desired bandpass response with band edges at 0.5π and 0.7π and transition bands of width Δω where W(ω) = 0.]
Now that we have the redefined matrix Q, we can apply the least-squares algorithm as usual.
Important: d and W have to be defined for the same frequencies used in calculating Q.
If Hd(e^{jω}) is Hermitian symmetric, i.e., Hd(e^{jω}) = Hd*(e^{−jω}), then h will be purely real.
Example: predicting band-limited signals
Question: how to predict the next sample from previous samples?
[Figure: samples x[1], ..., x[10] of a signal; what is x[11]?]
Example: predicting band-limited signals
Suppose our band-limited signal is such that X(e^{jω}) = 0 for |ω| > ωc, and let H(z) be a high-pass filter whose stopband covers |ω| ≤ ωc. Then the filter output e[n] is approximately zero.
Mathematically,

    e[n] = Σ from m = 0 to M of h[m]x[n − m]        (filter output)
    0 ≈ h[0]x[n] + Σ from m = 1 to M of h[m]x[n − m]
    x[n] ≈ −(1/h[0]) Σ from m = 1 to M of h[m]x[n − m]        (prediction based on M previous samples)

Conclusion: designing a good predictive filter for band-limited signals boils down to designing a good high-pass filter.
This method was first proposed by Vaidyanathan in 1987.
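A tiny numeric illustration of this idea (not Vaidyanathan's optimal design): the binomial high-pass h = (1 − z^{−1})^K has h[0] = 1 and strongly attenuates low frequencies, so for a slowly varying signal it already acts as a usable predictor.

```python
import numpy as np
from math import comb

K = 4
# h[m] = (-1)^m C(K, m): coefficients of the high-pass (1 - z^{-1})^K
h = np.array([(-1) ** m * comb(K, m) for m in range(K + 1)], dtype=float)

# Low-frequency test signal, band-limited well inside the high-pass stopband
n = np.arange(200)
x = np.cos(0.05 * np.pi * n)

# Predict x[n] from the K previous samples: x[n] ~ -(1/h[0]) sum_{m>=1} h[m] x[n-m]
pred = np.array([-(h[1:] @ x[i - K : i][::-1]) / h[0] for i in range(K, len(x))])
err = np.abs(pred - x[K:])
assert err.max() < 1e-2
```

The residual error is bounded by the high-pass gain at the signal frequency, |2 sin(ω0/2)|^K, so a better high-pass design gives a better predictor.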
Example: predicting band-limited noise
This is an example of prediction of a Gaussian noise with PSD:

    Φxx(e^{jω}) ≈ 1 for |ω| ≤ 0.7π;  0 for 0.7π < |ω| ≤ π

[Figure: estimated PSD (dB) of the noise versus normalized frequency, and 100 samples of the noise overlaid with their one-step predictions.]
Summary
Impulse invariance
- The impulse response of the continuous-time system is sampled and scaled by T. In FIR implementations the impulse response is truncated to a specified number of samples. In IIR implementations the discrete-time system is obtained analytically.
Bilinear transformation
- The bilinear transformation maps the left half of the s-plane into the interior of the unit circle in the z-plane. This non-linear mapping leads to frequency warping, which can be mitigated by frequency pre-warping. Oversampling also mitigates frequency warping.
FIR filter design by windowing
- Design by windowing is almost an art form
- The Kaiser window is a nearly optimal choice
Optimal FIR filter design
- Optimal FIR filters minimize some characteristic of the weighted error
- The Parks-McClellan method minimizes the maximum weighted error
- The least-squares method minimizes the mean-square weighted error