T-61.3015
Digital Signal Processing and Filtering
Chapter 12:
Analysis of Finite Wordlength Effects
Sanjit K. Mitra,
Digital Signal Processing, A Computer-Based Approach,
3rd Edition, McGraw-Hill, 2006
Olli Simula
8.4.2013
Introduction
Finite wordlength effects are caused by:
Quantization of the filter coefficients
Rounding / truncation of multiplication results
Quantization of the input signal
Dynamic range constraints of the implementation
T-61.3015 Digital Signal Processing Filtering; Chapter 12
Analysis of Finite Wordlength Effects
Ideally, the system parameters along with the signal variables have infinite precision, taking any value between $-\infty$ and $\infty$
In practice, they can take only discrete values within a specified range, since the registers of the digital machine where they are stored are of finite length
The discretization process results in nonlinear difference equations characterizing the discrete-time systems
These nonlinear equations, in principle, are
almost impossible to analyze and deal with
exactly
However, if the quantization amounts are small
compared to the values of signal variables and
filter parameters, a simpler approximate theory
based on a statistical model can be applied
Analysis of Noise Properties and
Dynamic Range Constraints
Analysis of Finite Wordlength Effects
Using the statistical model, it is possible to
derive the effects of discretization and develop
results that can be verified experimentally
Sources of errors:
(1) Filter coefficient quantization
(2) A/D conversion
(3) Quantization of arithmetic operations
(4) Limit cycles
Example: First Order IIR Filter
Quantization of coefficients
Quantization of input x[n]
Rounding/truncation of v[n]
Output y[n] with finite wordlength
The Quantization Process and Errors
Fractional numbers (sign bit + fractional part)
The quantization process model: quantizer Q(x)
Rounding / Truncation
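As a rough illustration of the effects listed for the first-order IIR example, the sketch below simulates a filter of the assumed form y[n] = alpha*y[n-1] + x[n] twice: once in double precision and once with the coefficient, the input and the product v[n] rounded. The filter form, the name `alpha`, the helper `quantize` and the 8-bit fractional wordlength are assumptions made for illustration; the slides do not specify them, and overflow is not modelled.

```python
import math

def quantize(x, bits):
    """Round x to a multiple of 2**-bits (assumed fractional format)."""
    step = 2.0 ** -bits
    return math.floor(x / step + 0.5) * step

def first_order_iir(x, alpha, bits=None):
    """y[n] = alpha*y[n-1] + x[n]; if bits is given, apply finite wordlength."""
    if bits is not None:
        alpha = quantize(alpha, bits)        # quantization of the coefficient
    y_prev, y = 0.0, []
    for sample in x:
        if bits is not None:
            sample = quantize(sample, bits)  # quantization of the input x[n]
        v = alpha * y_prev                   # product v[n]
        if bits is not None:
            v = quantize(v, bits)            # rounding of v[n]
        y_prev = v + sample                  # output y[n]
        y.append(y_prev)
    return y

x = [0.3 * math.sin(0.2 * n) for n in range(50)]
ideal = first_order_iir(x, alpha=0.9)
finite = first_order_iir(x, alpha=0.9, bits=8)
max_err = max(abs(a - b) for a, b in zip(ideal, finite))
print(max_err)
```

The difference between the two outputs combines all three error sources; switching them on one at a time would show their individual contributions.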
The Quantization Errors
[Figures: quantization error characteristics for rounding, two's complement truncation, and sign-magnitude and ones' complement truncation; the label "To be discarded" marks the low-order bits dropped by the quantizer]
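The three quantizer types named above differ in the range of the error Q(x) - x. The sketch below checks the standard ranges numerically for a step size delta: (-delta/2, delta/2] for rounding, (-delta, 0] for two's complement truncation, and an error whose sign is opposite to that of x for sign-magnitude (and ones' complement) truncation. The function names and the 4-bit step are illustrative assumptions.

```python
import math

def q_round(x, step):
    # rounding to the nearest quantization level
    return math.floor(x / step + 0.5) * step

def q_trunc_twos(x, step):
    # two's complement truncation: always toward minus infinity
    return math.floor(x / step) * step

def q_trunc_sign_mag(x, step):
    # sign-magnitude / ones' complement truncation: toward zero
    return math.trunc(x / step) * step

step = 2.0 ** -4
samples = [k / 1000.0 for k in range(-999, 1000)]

err_round = [q_round(x, step) - x for x in samples]
err_twos = [q_trunc_twos(x, step) - x for x in samples]
err_sm = [q_trunc_sign_mag(x, step) - x for x in samples]

print(min(err_round), max(err_round))
print(min(err_twos), max(err_twos))
print(min(err_sm), max(err_sm))
```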
Quantization of Floating-Point Numbers
Only the mantissa is quantized; the relative error is relevant!
Analysis of Coefficient Quantization Effects
The transfer function $\hat{H}(z)$ of the digital filter implemented with quantized coefficients is different from the desired transfer function $H(z)$
Main effect of coefficient quantization is to move the poles and zeros to different locations from the original desired locations
The actual frequency response $\hat{H}(e^{j\omega})$ is thus different from the desired frequency response $H(e^{j\omega})$
In some cases, the poles may move outside the unit circle, causing the implemented digital filter to become unstable even though the original transfer function H(z) is stable
Direct form realizations are more sensitive to coefficient quantization than cascade or parallel forms
The sensitivity increases with increasing filter order
Usually second order blocks in cascade or parallel are used
Coefficient Quantization Effects on a Direct Form IIR Filter
Gain responses of a 5th-order elliptic lowpass filter with unquantized and quantized coefficients
[Figure panels: Fullband Gain Response; Passband Details]
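The pole movement described above can be seen already in a single second-order block. The sketch below takes a denominator z^2 + a1*z + a2 with poles at radius 0.99, rounds both coefficients to 4 fractional bits, and recomputes the pole radius. The pole position, the wordlength and the helper names are assumptions chosen to make the effect visible; they are not taken from the slides.

```python
import cmath
import math

def quantize(x, frac_bits):
    """Round x to a multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return math.floor(x / step + 0.5) * step

def poles(a1, a2):
    """Roots of z**2 + a1*z + a2."""
    d = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + d) / 2.0, (-a1 - d) / 2.0

r, theta = 0.99, math.pi / 4                  # assumed pole radius and angle
a1, a2 = -2.0 * r * math.cos(theta), r * r    # unquantized coefficients
a1_q, a2_q = quantize(a1, 4), quantize(a2, 4)

radius = max(abs(p) for p in poles(a1, a2))
radius_q = max(abs(p) for p in poles(a1_q, a2_q))
print(a1_q, a2_q)
print(radius, radius_q)
```

With these values a2 rounds to 1, so the quantized poles land on the unit circle and the block is no longer strictly stable. A longer wordlength keeps the radius below one.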
Coefficient Quantization Effects on a Direct Form IIR Filter
[Figure: pole and zero locations of the filter with quantized coefficients (denoted by x and o) and those of the filter with unquantized coefficients (denoted by + and *)]
Coefficient Quantization Effects on a Cascade Form IIR Filter
Gain responses of a 5th-order elliptic lowpass filter implemented in a cascade form with unquantized and quantized coefficients
[Figure panels: Fullband Gain Response; Passband Details]
Coefficient Quantization Effects on a Direct Form FIR Filter
Gain responses of a 39th-order equiripple lowpass FIR filter with unquantized and quantized coefficients
[Figure panels: Fullband Gain Response; Passband details; vertical axis: Gain, dB; original: solid line, quantized: dashed line]
Example of Coefficient Quantization in 6th Order Direct Form Realization
[Figure panels: Amplitude responses; Pole-zero locations]
Copyright 2001, S. K. Mitra
Example of Coefficient Quantization in 6th Order Cascade Form Realization
[Figure panels: Amplitude responses; Pole-zero locations]
Example:
[Figure panels: 6th order bandstop filter with unquantized coefficients; Cascade form with coefficients quantized to 6 bits; Parallel form with coefficients quantized to 6 bits]
Coefficient Quantization in FIR Filters
Consider an (M-1)th order FIR transfer function
$$H(z) = \sum_{n=0}^{M-1} h[n]\, z^{-n}$$
Quantization of the filter coefficients results in a new transfer function
$$\hat{H}(z) = \sum_{n=0}^{M-1} \hat{h}[n]\, z^{-n} = H(z) + E(z), \qquad E(z) = \sum_{n=0}^{M-1} e[n]\, z^{-n}$$
where $\hat{h}[n] = h[n] + e[n]$
Linear phase: $h[n] = \pm h[M-1-n]$
Symmetry of the impulse response is not affected by quantization
A/D Conversion Noise Analysis
Quantization of the input signal introduces error at the input of the filter
This error is propagated through the filter together with the input signal
Affects the signal-to-noise ratio of the system
Quantization Noise Model
[Figure: the analog input sample x[n] is mapped to a quantized input sample and then to the binary equivalent of the quantized input]
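The claim that quantization preserves the symmetry of a linear-phase impulse response can be checked directly: equal coefficients are rounded to equal values. The sketch below uses a hypothetical symmetric impulse response of length 9 and a 6-bit fractional format; both are assumptions for illustration only.

```python
import math

def quantize(x, frac_bits):
    """Round x to a multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return math.floor(x / step + 0.5) * step

half = [0.02, -0.05, 0.12, 0.29]       # hypothetical coefficients h[0..3]
h = half + [0.40] + half[::-1]         # symmetric: h[n] = h[M-1-n], M = 9

h_q = [quantize(c, 6) for c in h]      # quantized coefficients
e = [cq - c for cq, c in zip(h_q, h)]  # coefficient errors e[n]

is_symmetric = h_q == h_q[::-1]
max_abs_e = max(abs(v) for v in e)
print(is_symmetric, max_abs_e)
```

The error sequence e[n] is the impulse response of E(z), so the quantized filter behaves like H(z) in parallel with a small symmetric error filter.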
Quantization Error
[Figure: quantizer characteristic for two's complement representation]
The quantization error e[n]:
$$e[n] = Q(x[n]) - x[n] = \hat{x}[n] - x[n]$$
For two's complement rounding:
$$-\frac{\delta}{2} < e[n] \le \frac{\delta}{2}$$
where $\delta$ is the quantization step; within the full-scale range $R_{FS}$, e[n] is called granular noise
Outside $R_{FS}$ the error increases linearly; e[n] is called the saturation error or the overload noise
The output value is clipped to the maximum value
Input signal is assumed to be scaled to be in the range of $\pm 1$ by dividing its amplitude by $R_{FS}/2$
Model of the Quantization Error
Assumptions:
1) The error sequence {e[n]} is a sample sequence of a wide-sense stationary (WSS) white noise process, with each sample e[n] being uniformly distributed over the range of the quantization error
2) The error sequence is uncorrelated with its corresponding input sequence {x[n]}
3) The input sequence is a sample sequence of a stationary random process
The assumptions hold in most practical situations with rapidly changing input signals
Quantization Error Distributions
[Figure: probability density functions of the quantization error for (a) rounding and (b) two's complement truncation]
The variance represents the noise power
Signal-to-Noise Ratio
Additive quantization noise e[n] on the signal x[n]
Signal-to-quantization noise ratio in dB is defined as
$$\mathrm{SNR}_{A/D} = 10 \log_{10}\left(\frac{\sigma_x^2}{\sigma_e^2}\right)\ \mathrm{dB}$$
where $\sigma_x^2$ is the signal variance (power) and $\sigma_e^2$ is the noise variance (power)
A/D conversion:
(b+1) bits: $\delta = 2^{-(b+1)} R_{FS}$, where $R_{FS}$ is the full-scale range
For an error uniformly distributed over an interval of width $\delta$, the variance is $\sigma_e^2 = \delta^2/12$, which gives
$$\mathrm{SNR}_{A/D} = 6.02\,b + 16.81 - 20 \log_{10}(K)\ \mathrm{dB}$$
where $R_{FS} = K\sigma_x$ ($\sigma_x$ is the RMS value of the signal)
Thus, SNR increases 6 dB for each added bit in the wordlength
Effect of Input Scaling on SNR
Let the input scaling factor be A with A>0
The variance of the scaled input Ax[n] is $A^2\sigma_x^2$
The SNR changes to
$$\mathrm{SNR}_{A/D} = 6.02\,b + 16.81 - 20 \log_{10}(K) + 20 \log_{10}(A)\ \mathrm{dB}$$
Scaling down the input signal (A<1) decreases the SNR
Scaling up the input signal (A>1) increases the possibility of exceeding the full-scale range $R_{FS}$, resulting in clipping, which degrades the SNR
Propagation of Input Quantization Noise to Digital Filter Output
Due to linearity of H(z) and the assumption that x[n] and e[n] are uncorrelated, the output can be expressed as a linear combination (sum) of two sequences:
$$\hat{y}[n] = y[n] + v[n]$$
where y[n] is the output due to x[n] and v[n] is the output due to e[n]
The output noise is:
$$v[n] = \sum_{m=-\infty}^{\infty} e[m]\, h[n-m]$$
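The A/D signal-to-noise ratio above can be evaluated from first principles, using the step size for (b+1) bits and the variance of a uniform error. The sketch below is a minimal calculation under the statistical model only (no clipping is modelled); the function name `snr_ad_db` and the example values b = 7, K = 4 are assumptions.

```python
import math

def snr_ad_db(b, K, A=1.0, sigma_x=1.0):
    """SNR in dB of a (b+1)-bit A/D converter with R_FS = K*sigma_x.

    A is the input scaling factor; the full-scale range is kept fixed.
    """
    r_fs = K * sigma_x
    delta = 2.0 ** -(b + 1) * r_fs          # quantization step
    sigma_e_sq = delta ** 2 / 12.0          # variance of the uniform error
    return 10.0 * math.log10((A * sigma_x) ** 2 / sigma_e_sq)

snr_8bit = snr_ad_db(b=7, K=4.0)
snr_9bit = snr_ad_db(b=8, K=4.0)
snr_half = snr_ad_db(b=7, K=4.0, A=0.5)
print(snr_8bit, snr_9bit - snr_8bit, snr_half - snr_8bit)
```

The second printed value is the gain per added bit and the third is the loss caused by halving the input amplitude; both are close to 6 dB in magnitude.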
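Propagation of Input Quantization Noise to Digital Filter Output
The mean and variance of v[n] characterize the output noise
The mean $m_v$ is:
$$m_v = m_e\, H(e^{j0})$$
where $m_e$ is the mean of e[n]
The noise variance $\sigma_v^2$ is:
$$\sigma_v^2 = \frac{\sigma_e^2}{2\pi}\int_{-\pi}^{\pi} \left|H(e^{j\omega})\right|^2 d\omega$$
The output noise power spectrum is:
$$P_{vv}(\omega) = \sigma_e^2 \left|H(e^{j\omega})\right|^2$$
The normalized output noise variance is given by
$$\sigma_{v,n}^2 = \frac{\sigma_v^2}{\sigma_e^2} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|H(e^{j\omega})\right|^2 d\omega$$
which can be written as:
$$\sigma_{v,n}^2 = \frac{1}{2\pi j}\oint_C H(z)\,H(z^{-1})\,z^{-1}\,dz$$
where C is a counterclockwise contour in the region of convergence of $H(z)H(z^{-1})$
An equivalent expression is:
$$\sigma_{v,n}^2 = \sum_{n=-\infty}^{\infty} \left|h[n]\right|^2$$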
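The equivalence of the frequency-domain and time-domain expressions for the normalized output noise variance can be verified numerically. The sketch below assumes a first-order filter H(z) = 1/(1 - alpha*z^-1) with impulse response h[n] = alpha^n, for which the closed form is 1/(1 - alpha^2); the filter, the value alpha = 0.9 and the grid size are illustrative assumptions.

```python
import cmath
import math

alpha = 0.9                                   # assumed filter coefficient

# time domain: sum of squared impulse response samples (truncated)
time_sum = sum((alpha ** n) ** 2 for n in range(2000))

# frequency domain: average of |H(e^{jw})|^2 over a uniform grid on [0, 2*pi)
N = 4096
freq_avg = sum(
    abs(1.0 / (1.0 - alpha * cmath.exp(-1j * 2.0 * math.pi * k / N))) ** 2
    for k in range(N)
) / N

closed_form = 1.0 / (1.0 - alpha ** 2)
print(time_sum, freq_avg, closed_form)
```

All three numbers agree to many digits, so input quantization noise of variance sigma_e^2 appears at the output of this filter amplified by a factor of about 5.26.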
Analysis of Arithmetic Round-Off Errors
Quantization of Multiplication Results
Assumptions:
1) The error sequence {e[n]} is a sample sequence of a stationary white noise process, with each sample e[n] being uniformly distributed
2) The quantization error sequence {e[n]} is uncorrelated with the signal {v[n]}, the input sequence {x[n]} to the filter, and all other quantization errors
The assumption of {e[n]} being uncorrelated with {v[n]} holds for rounding and two's complement truncation
Quantization of Multiplication Results
The quantization model can be used to analyze the quantization effects at the filter output
Statistical model of the filter:
Quantization before summation
The number of multiplications $k_l$ at adder inputs
The rth branch node with signal value $u_r[n]$ needs to be scaled to prevent overflow
$f_r[n]$: impulse response from filter input to branch node r
$g_l[n]$: impulse response from input of lth adder to filter output
Quantization of Multiplication Results
Branch nodes to be scaled lead to multipliers and are outputs of summations
Scaling transfer function: $F_r(z)$
Noise transfer function: $G_l(z)$
Let $\sigma_0^2$ be the variance of each individual noise source; then $k_l\sigma_0^2$ is the noise variance of $e_l[n]$
The output noise variance due to $e_l[n]$ is:
$$\sigma_0^2 \left[ k_l \left( \frac{1}{2\pi j}\oint_C G_l(z)\,G_l(z^{-1})\,z^{-1}\,dz \right) \right]$$
The total output noise variance:
$$\sigma_\gamma^2 = \sigma_0^2 \sum_{l=1}^{L} k_l \left( \frac{1}{2\pi j}\oint_C G_l(z)\,G_l(z^{-1})\,z^{-1}\,dz \right)$$
where L is the number of summation nodes to which noise sources are connected
The noise variance can also be written as
$$\sigma_\gamma^2 = \sigma_0^2 \sum_{l=1}^{L} k_l \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|G_l(e^{j\omega})\right|^2 d\omega \right)$$
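The total output noise variance can be computed from the impulse responses g_l[n] of the noise transfer functions, since the integral for each adder equals the sum of the squared samples of g_l[n]. The sketch below assumes a first-order section y[n] = a*y[n-1] + b0*x[n], in which both multipliers feed one adder with noise transfer function 1/(1 - a*z^-1), and assumes rounding with step 2^-15 so that each source has variance step^2/12. All names and values are illustrative.

```python
def output_noise_variance(sigma0_sq, adders):
    """sigma0_sq * sum over adders of k_l * sum_n g_l[n]**2.

    adders: list of (k_l, g_l), where g_l is the (truncated) impulse
    response from the input of adder l to the filter output.
    """
    return sigma0_sq * sum(k * sum(g * g for g in g_l) for k, g_l in adders)

a = 0.9                                   # assumed feedback coefficient
g = [a ** n for n in range(2000)]         # noise impulse response a**n
step = 2.0 ** -15                         # assumed quantization step
sigma0_sq = step ** 2 / 12.0              # variance of one rounding source

var_before = output_noise_variance(sigma0_sq, [(2, g)])  # round each product
var_after = output_noise_variance(sigma0_sq, [(1, g)])   # round after the sum
print(var_before, var_after)
```

Rounding each product separately gives k = 2 for this adder, so the output noise power is twice that obtained when the sum is rounded once.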
The Output Quantization Noise
The amount of noise depends on the implementation
Quantization of multiplication results after summation reduces the number of noise sources to one
The variance of the noise source $e_l[n]$ is now $\sigma_0^2$
DSP processors usually carry out the multiply-accumulate operation using double precision arithmetic
Dynamic Range Scaling
[Figure: digital filter]
The rth node value $u_r[n]$ has to be scaled
Assume that the input sequence is bounded by unity, i.e., |x[n]| < 1 for all values of n
The objective of scaling is to ensure that $|u_r[n]| < 1$ for all r and all values of n
Three different conditions to ensure that $u_r[n]$ satisfies the dynamic range constraint:
1) An absolute bound
2) $L_\infty$-bound
3) $L_2$-bound
Different bounds are applicable under certain input signal conditions
An Absolute Bound
$F_r(z)$ is the scaling transfer function
The node value $u_r[n]$ is determined by the convolution
$$u_r[n] = \sum_{k=-\infty}^{\infty} f_r[k]\, x[n-k]$$
An Absolute Bound
Assuming that x[n] satisfies the dynamic range constraint |x[n]| < 1
$$|u_r[n]| = \left| \sum_{k=-\infty}^{\infty} f_r[k]\, x[n-k] \right| \le \sum_{k=-\infty}^{\infty} \left| f_r[k] \right|$$
The node value $u_r[n]$ now satisfies the dynamic range constraint, i.e., $|u_r[n]| < 1$ if
$$\sum_{k=-\infty}^{\infty} \left| f_r[k] \right| \le 1 \quad \text{for all } r$$
This is both a necessary and sufficient condition to guarantee that there will be no overflow
Scaling with the Absolute Bound
If the dynamic range constraint is not satisfied the filter input has to be scaled with the multiplier K
$$K = \frac{1}{\max_r \sum_{k=-\infty}^{\infty} \left| f_r[k] \right|}$$
The scaling rule based on the absolute bound is too pessimistic and reduces the SNR significantly
More practical and easy to use scaling rules can be derived in the frequency domain if some information about the input signal is known a priori
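The absolute-bound scaling rule can be tried on a single node. The sketch below assumes a scaling impulse response f[n] = 0.9^n (truncated to 400 samples), computes K as the reciprocal of the sum of |f[n]|, and drives the node with the worst-case input x[n] = 1. The impulse response, the truncation length and the helper name are assumptions for illustration.

```python
def node_signal(f, x):
    """u[n] = sum_k f[k] * x[n-k] for a causal f and x starting at n = 0."""
    return [
        sum(f[k] * x[n - k] for k in range(min(n, len(f) - 1) + 1))
        for n in range(len(x))
    ]

f = [0.9 ** n for n in range(400)]        # assumed scaling impulse response
K = 1.0 / sum(abs(v) for v in f)          # absolute-bound scaling multiplier

x = [1.0] * 400                           # worst-case input for a positive f
u_unscaled = node_signal(f, x)
u_scaled = node_signal(f, [K * v for v in x])

print(K, max(abs(v) for v in u_unscaled), max(abs(v) for v in u_scaled))
```

The unscaled node reaches about 10, while the scaled node just reaches 1. The price is an input attenuated by a factor of 10, which is why the slide calls this rule pessimistic.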
Scaling Norms
Define the $L_p$-norm of a Fourier transform $F(e^{j\omega})$ as
$$\|F\|_p = \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|F(e^{j\omega})\right|^p d\omega \right)^{1/p}$$
$L_2$-norm, $\|F\|_2$, is the root-mean-square (RMS) value of $F(e^{j\omega})$, and $L_1$-norm, $\|F\|_1$, is the mean absolute value of $F(e^{j\omega})$ over $\omega$
Moreover, $\lim_{p\to\infty}\|F\|_p$ exists for a continuous $F(e^{j\omega})$ and is given by its peak
$$\|F\|_\infty = \max_{-\pi \le \omega \le \pi} \left|F(e^{j\omega})\right|$$
Scaling Norms: $L_\infty$-Bound
An inverse Fourier transform
$$u_r[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} F_r(e^{j\omega})\, X(e^{j\omega})\, e^{j\omega n}\, d\omega$$
gives
$$|u_r[n]| \le \|F_r\|_\infty \, \|X\|_1$$
If $\|X\|_1 < 1$, then the dynamic range constraints are satisfied if
$$\|F_r\|_\infty \le 1 \quad \text{for all } r$$
If the mean absolute value of the input spectrum is bounded by unity, then there will be no adder overflow if the peak gains from the filter input to all adder output nodes are scaled satisfying the above bound
The scaling rule is rarely used since with most input signals encountered in practice $\|X\|_1 < 1$ does not hold
Scaling Norms: $L_2$-Bound
Applying Schwarz inequality
$$|u_r[n]|^2 \le \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|F_r(e^{j\omega})\right|^2 d\omega \right) \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|X(e^{j\omega})\right|^2 d\omega \right)$$
or equivalently
$$|u_r[n]| \le \|F_r\|_2 \, \|X\|_2$$
If the filter input has finite energy bounded by unity, i.e., $\|X\|_2 < 1$, then the adder overflow can be prevented by scaling the filter such that the RMS values of the scaling transfer functions are bounded by unity:
$$\|F_r\|_2 \le 1 \quad \text{for all } r$$
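The norms used in these bounds can be approximated by sampling the frequency response on a uniform grid. The sketch below does this for an assumed scaling transfer function F(z) = 1/(1 - 0.9*z^-1) and confirms the ordering of the L1, L2 and L-infinity norms; the helper `lp_norm`, the transfer function and the grid size are illustrative assumptions.

```python
import cmath
import math

def lp_norm(samples, p):
    """Approximate Lp-norm from uniform samples of F(e^{jw}) on [0, 2*pi)."""
    if p == math.inf:
        return max(abs(s) for s in samples)        # peak value
    mean = sum(abs(s) ** p for s in samples) / len(samples)
    return mean ** (1.0 / p)

alpha, N = 0.9, 4096
F = [1.0 / (1.0 - alpha * cmath.exp(-1j * 2.0 * math.pi * k / N))
     for k in range(N)]

l1 = lp_norm(F, 1)
l2 = lp_norm(F, 2)
linf = lp_norm(F, math.inf)
print(l1, l2, linf)
```

For this F the L2 scaling constant would be about 1/2.29 and the L-infinity constant 1/10, so the choice of norm changes the signal level at the node by a factor of more than four.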
A General Scaling Rule
A more general scaling rule is obtained using Hölder's inequality
$$|u_r[n]| \le \|F_r\|_p \, \|X\|_q$$
for all $p, q \ge 1$, with
$$\frac{1}{p} + \frac{1}{q} = 1$$
After the scaling the transfer functions become $\bar{F}_r(z)$ and the scaling constants should be chosen such that
$$\|\bar{F}_r\|_p \le 1 \quad \text{for all } r$$
In many structures the scaling multipliers can be absorbed into the existing feed-forward multipliers
Scaling of a Cascade Form IIR Filter
The nodes (*) need to be scaled
Scaling transfer functions: $F_r(z)$ can be expressed by poles and zeros of the original H(z)
Scaling - Back-Scaling
[Figure: FILTER block with input scaling and output back-scaling]
The effect of input scaling is compensated by back-scaling at the output of the filter
Scaling block-by-block in cascade realization forms
[Figure: cascade of blocks H1(z), H2(z), ..., HR(z)]
Each second order block is scaled individually
The scaling coefficients between the blocks contain the back-scaling of the previous block and the scaling of the next block
Scaled Cascade Form IIR Filter Structure
[Figure: scaled cascade structure with its scaling transfer functions]
The scaled structure has new values of the coefficients in the feed-forward branches
Only one critical branch node in each second order block has to be checked for overflow
Noise Transfer Functions
The noise transfer functions can be expressed using the transfer functions of the cascaded second-order blocks
The scaled noise transfer functions are given by
Optimum Section Ordering and Pole-Zero Pairing of a Cascade Form IIR Digital Filter
Ordering of second-order sections as well as pairing of poles and zeros affects the output noise power of the filter
Noise Model of Second-Order Blocks
The noise model introduces noise sources to the input/output summation of each block
The number of elementary noise sources, $k_l$, has different values depending on the location of rounding (before or after the summation) and depending on the block (first, intermediate, last)
Let $k_l$ be the total number of multipliers connected to the lth adder
Rounding before summation: $k_1 = k_{R+1} = 3$, $k_l = 5$, for l = 2, 3,...,R
Rounding after summation: $k_l = 1$, for l = 1, 2,...,R+1
Noise Transfer Functions
The output noise power spectrum due to product round-off is given by
$$P_{\gamma\gamma}(\omega) = \sigma_0^2 \sum_{l=1}^{R+1} k_l \left|G_l(e^{j\omega})\right|^2$$
and output noise variance is
$$\sigma_\gamma^2 = \sigma_0^2 \sum_{l=1}^{R+1} k_l \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|G_l(e^{j\omega})\right|^2 d\omega \right)$$
where the integral in the parenthesis is the square of the $L_2$-norm of the noise transfer function
The scaling coefficients are
The output noise power spectrum of the scaled filter is
and output noise variance is
Minimizing the Output Round-Off Noise
The scaling transfer function $F_l(z)$ contains sections $H_i(z)$, i = 1, 2,..., l-1
The noise transfer function $G_l(z)$ contains sections $H_i(z)$, i = l, l+1,..., R
Every term in the sum for the noise power or the noise variance includes the transfer function of all R sections in the cascade realization
To minimize the output noise power the norms of $H_i(z)$ should be minimized for all values of i by appropriately pairing the poles and zeros
Pairing the Poles and Zeros
Poles close to the unit circle introduce gain and zeros (on the unit circle) introduce attenuation
1) First, the poles closest to the unit circle should be paired with the nearest zeros
2) Next, the poles closest to the previous set of poles should be paired with the next closest zeros
3) This process is continued until all poles and zeros are paired
Section Ordering
A section in the front part of the cascade has its transfer function $H_i(z)$ appearing more frequently in the scaling transfer functions
A section near the output end of the cascade has its transfer function $H_i(z)$ appearing more frequently in the noise transfer function expressions
=> The best location for $H_i(z)$ depends on the type of norms being applied to the scaling and noise transfer functions
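The three-step pole-zero pairing rule given earlier can be sketched as a small greedy procedure. The version below works on one representative of each complex-conjugate pair (upper half-plane), assumes a stable filter with at least as many zeros as poles, and uses hypothetical pole and zero locations; the function name `pair_poles_and_zeros` is an assumption, not part of the course material.

```python
import cmath
import math

def pair_poles_and_zeros(poles, zeros):
    """Greedy pairing following steps 1) to 3) of the pairing rule."""
    poles, zeros = list(poles), list(zeros)
    pairs = []
    current = max(poles, key=abs)          # step 1: pole closest to |z| = 1
    while poles:
        poles.remove(current)
        zero = min(zeros, key=lambda z: abs(z - current))  # nearest free zero
        zeros.remove(zero)
        pairs.append((current, zero))
        if poles:                          # step 2: pole closest to the last one
            current = min(poles, key=lambda p: abs(p - current))
    return pairs                           # step 3: repeated until all are paired

deg = math.pi / 180.0
p1, p2, p3 = (cmath.rect(0.95, 60 * deg), cmath.rect(0.80, 40 * deg),
              cmath.rect(0.50, 20 * deg))
z1, z2, z3 = (cmath.rect(1.0, 70 * deg), cmath.rect(1.0, 90 * deg),
              cmath.rect(1.0, 120 * deg))

pairs = pair_poles_and_zeros([p3, p1, p2], [z3, z2, z1])
for pole, zero in pairs:
    print(abs(pole), math.degrees(cmath.phase(zero)))
```

Each returned pair would form one second-order section together with its complex conjugates; the ordering of those sections is a separate decision.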
Section Ordering
$L_2$ scaling:
The ordering of paired sections does not influence the output noise power very much, since all norms in the expressions are $L_2$-norms
$L_\infty$ scaling:
The sections with poles closest to the unit circle exhibit a peaking magnitude response and should be placed closer to the output end
=> Ordering should be from least-peaked to most-peaked
On the other hand, the ordering scheme is exactly opposite if the objective is to minimize the peak noise $\|P_{\gamma\gamma}(\omega)\|_\infty$ and $L_2$-scaling is used
The ordering has no effect on the peak noise with $L_\infty$-scaling
Error Spectrum Shaping
Quantization error can be compensated using the so-called error-feedback (or error spectrum shaping)
The filtered error signal is added to the signal branch before quantization (Q[.])
Without error-feedback the error signal e[n] is the pure quantization error, i.e.,
$$e[n] = y[n] - x[n]$$
In the compensated structure the error signal is the difference between the output y[n] and the compensated input signal w[n]:
$$e[n] = y[n] - w[n], \qquad w[n] = x[n] + a\,e[n-1] + b\,e[n-2]$$
Substituting w[n]:
$$e[n] = y[n] - x[n] - a\,e[n-1] - b\,e[n-2]$$
Total error between output and input is still y[n] - x[n]
Solving y[n] - x[n]:
$$y[n] - x[n] = e[n] + a\,e[n-1] + b\,e[n-2]$$
Taking the z-transform:
$$Y(z) - X(z) = \left(1 + a z^{-1} + b z^{-2}\right) E(z) = G(z)\,E(z)$$
where G(z) is the error shaping transfer function
Example: a=-2 and b=1
$$G(z) = 1 - 2z^{-1} + z^{-2} = \left(1 - z^{-1}\right)^2$$
Double zero is at z=1
Noise spectrum is modified by attenuating noise at low frequencies
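The error-feedback relations above can be checked with a short simulation. The sketch below quantizes a slowly varying input with second-order error feedback, assuming the compensated input is formed as w[n] = x[n] + a*e[n-1] + b*e[n-2] with a = -2 and b = 1, and compares it with plain rounding. The rounding quantizer, the step size and the test signal are assumptions made for illustration.

```python
import math

def quantize(x, step):
    """Round x to the nearest multiple of step."""
    return math.floor(x / step + 0.5) * step

def error_feedback_quantizer(x, step, a=0.0, b=0.0):
    """Return (y, e): quantized output and error signal e[n] = y[n] - w[n]."""
    e1 = e2 = 0.0
    y, e = [], []
    for sample in x:
        w = sample + a * e1 + b * e2      # compensated input w[n]
        out = quantize(w, step)           # y[n] = Q(w[n])
        err = out - w                     # e[n] = y[n] - w[n]
        e2, e1 = e1, err
        y.append(out)
        e.append(err)
    return y, e

step = 2.0 ** -4
x = [0.4 * math.sin(0.01 * n) + 0.013 for n in range(2000)]

y_plain, e_plain = error_feedback_quantizer(x, step)
y_shaped, e_shaped = error_feedback_quantizer(x, step, a=-2.0, b=1.0)

total_shaped = [yo - xi for yo, xi in zip(y_shaped, x)]
dc_error_shaped = abs(sum(total_shaped))
print(dc_error_shaped, abs(sum(yo - xi for yo, xi in zip(y_plain, x))))
```

Because G(z) has a double zero at z = 1, the summed (zero-frequency) error of the shaped output telescopes to e[N-1] - e[N-2] and stays below one quantization step, whatever the length of the signal.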