C Algorithms For Real-Time DSP - EMBREE
C Algorithms For Real-Time DSP - EMBREE
''
li
CONTENTS
ISBN: 0-13-337353-3
vii
PREFACE
CHAPTER 1
1.1 SEQUENCES 2
1.1.1 Toe Sampling Function 3
1.1.2 Samples Signal Spectra 4
1.1.3 Spectra of Continuous Time and Discrete Time Signals
Causality 10
Difference Equations 10
Toe z-Transfonn Description of Linear Operators 11
Frequency Domain Transfer Function of an Operator 14
Frequency Response from the z-Transform Description 15
1.3.1
1.3.2
1.3.3
1.3.4
27
iii
Contents
iv
53
2.8 STRUCTURES 82
2.8.1 Declaring and Referencing Structures
2.8.2 Pointers to Structures 84
2.8.3 Complex Numbers 85
78
82
87
53
C PROGRAMMING FuNDAMENTALS
Contents
28
92
97
DSP MICROPROCESSORS
IN EMBEDDED SYSTEMS
98
113
r
Contents
vi
CHAPTER
4.1.1
4.1.2
4.1.3
4.1.4
132
REAL-TIME FILTERING
132
PREFACE
163
170
CHAPfER
178
184
186
186
193
237
238
241
Digital signal processing techniques have become the method of choice in signal processing as digital computers have increased in speed, convenience, and availability. As
microprocessors have become less expensive and more powerful, the number of DSP applications which have become commonly available has exploded. Thus, some DSP
microprocessors can now be considered commodity products. Perhaps the most visible
high volume DSP applications are the so called "multimedia" applications in digital
audio, speech processing, digital vdeo, and digital communications. ln many cases, these
applications contain embedded digital signal processors where a host CPU works in a
loosely coupled way with one or more DSPs to control the signal flow or DSP algorithm
behavior at a real-time rate. Unfortunately, the development of signal processing algorithms for these specialized embedded DSPs is still difficult and often requires specialized training in a particular assembly language for the target DSP.
Toe tools for developing new DSP algorithms are slowly improving as the need to
design new DSP applications more quickly becomes important. Toe C language is proving itself to be a valuable programming tool for real-time computationally intensive software tasks. has high-level language capabilities (such as structures, arrays, and functions) ~ well as low-level assembly languag!l capabilities (such as bit manipulation,
direct hardware input/output, and macros) which makes C an ideal language for embedded DSP. Most of the manufacturers of digital signal processing <levices (suchas
Texas Instruments, AT&T, Motorola, and Analog Devices) provide C compilers, simulators, and emulators for their parts. These C compilers offer standard C language with extensions for DSP to allow for very efficient code to be generated. For example, an inline
assembly language capability is usually provided in order to optimize the performance of
time criticai parts of an application. Because the majority of the code is C, an application
can be transferred to another processor much more easily than an ali assembly language
program.
This book is constructed in such a way that it will be most useful to the engineer
who is familiar with DSP and the C language, but who is not necessarily an expert in
both. AH of the example programs in this book have been tested using standard C compilvii
viii
Preface
r
i
ers in the UNIX and MS-DOS programming environments. ln addition, the examples
have been compiled utilizing the real-time programing tools of specific real-time embedded DSP microprocessors (Analog Devices' ADSP-21020 and ADSP-21062; Texas
Instrument's TMS320C30 and TMS320C40; and AT&T's DSP32C) and then tested with
real-time hardware using real world signals. Ali of the example programs presented in the
text are provided in source code form on the IBM PC floppy disk included with the book.
Toe text is divided into severa! sections. Chapters 1 and 2 cover the basic principies
of digital signal processing and C programming. Readers familiar with these topics may
wish to skip one or both chapters. Chapter 3 introduces the basic real-time DSP programming techniques and typical programming environments which are used with DSP microprocessors. Chapter 4 covers the basic real-time filtering techniques which are the cornerstone of one-dimensional real-time digital signal processing. Finally, several real-time
DSP applications are presented in Chapter 5, including speech compression, music signal
processing, radar signal processing, and adaptive signal processing techniques.
Toe floppy disk included with this text contains C language source code for ali of
the DSP programs discussed in this book. Toe floppy disk has a high density format and
was written by MS-DOS. Toe appendix and the READ.ME files on the floppy disk provide more information about how to compile and run the C programs. These programs
have been tested using Borland's TURBO C (version 3 and greater) as well as Microsoft C
(versions 6 and greater) for the IBM PC. Real-time DSP platforms using the Analog
Devices ADSP-21020 and the ADSP-21062, the Texas Instruments TMS320C30, and the
AT&T DSP32C have been used extensively to test the real-time performance of the
algorithms.
CHAPTER
Digital signal processing begins with a digital signal which appears to the computer as a
sequence of digital values. Figure 1.1 shows an example of a digital signal processing operation or simple DSP system. There is an inpu~~equenc~n1_!he operator (J { } and an
output~s~E.l!.~_E~n). A complete digital signal processing system may consist of many
opertions on the sarne sequence as well as operations on the result of operations.
Because digital sequences are processed, ali operator~ in DSP are discr~~-~-Qp.e[l!!_Qfs
(as opposed to continuous time operators employed by analog systems). Discrete time operators may be classified as time-1:._arying_Q_rJime:llYWia1_1t and linear_or nonlinear. Most
of the operators described in this text will be time-invariant with the exception of adaptive filters which are discussed in Section 1.7. Linearity will be discussed in Section 1.2
and severa! nonlinear operators will be introduced in Section 1.5.
()yerat<Jrs are applied to sequences in order to effect the following results:
ACKNOWLEDGMENTS
I thank the following people for their generous help: Laura Mercs for help in preparing
the electronic manuscript and the software for the DSP32C; the engineers at Analog
Devices (in particular Steve Cox, Marc Hoffman, and Hans Rempel) for their review of
the manuscript as well as hardware and software support; Texas Instruments for hardware
and software support; Jim Bridges at Communication Automation & Control, Inc., and
Talai Itani at Domain Technologies, Inc.
Paul M. Embree
TRADEMARKS
IBM and IBM PC are trademarks of the International Business Machines Corporation.
MS-DOS and Mircosoft C are trademarks of the Microsoft Corporation.
TURBOC is a trademark of Borland International.
UNIX is a trademark of American Telephone and Telegraph Corporation.
DSP32C and DSP3210 are trademarks of American Telephone and Telegraph Corporation.
TMS320C30, TMS320C3 l, and TMS320C40 are trademarks of Texas Instruments lncorporated.
ADSP-21020, ADSP-21060, and ADSP-21062 are trademarks of Analog Devices lncorporated.
This chapter is divided into severa! sections. Section 1.1 deals with sequences of
numbers: where and how they originate, their spectra, and their relation to continuous
signals. Section 1.2 describes the common characteristics of linear time-invariant operators which are the most often used in DSP. Section 1.3 discusses the class of operators
called digital filters. Section 1.4 introduces the discrete Fourier transform (DFTs and
1
...._
r
~
1 l~
Chap. 1
!<
~
... :
(f} { }
FIGURE 1.1
Sequences
(1) Toe signal is sampled. It has finite value at only discrete points in time.
DPS Operation
,{n)
... x(2),
x(1), ,(O)
Sec. 1.1
(2) Toe signal is truncated outside some finite length representing a finite time interval.
(3) Toe signal is quantized. It is limited to discrete steps in amplitude, where the step
size and, therefore, the accuracy (or signalfidelity) depends on how many steps are
available in the A/D converter and on the arithmetic precision (number of bits) of
the digital signal processor or computer.
......)'(2),)'(o)
)'(1), )'(O)
DSP operation.
ln order to understand the nature of the results that DSP operators produce, these
characteristics must be taken into account. The effect of sampling will be considered in
Section 1.1.1. Truncation will be considered in the section on the discrete Fourier transform (Section 1.4) and quantization will be discussed in Section 1.7.4.
FFfs). Section 1.5 describes the properties of commonly used nonlinear operators.
Section 1.6 covers basic probability theory and random processes and discusses their ap-.
plication to signal processing. Finally, Section 1.7 discusses the subject of adaptive digital filters.
Toe sampling function is the key to traveling between the continuous time and discrete
time worlds. It is called by various names: the Dirac delta function, the sifting function,
the singularity function, and the sampling function among them. It has the following
properties:
ln order for the digital computer to manipulate a signal, the signal must have been sampled at some interval. Figure 1.2 shows an example of a continuous function of time
which has been sampled at intervals of T seconds. Toe resulting set of numbers is called a
sequence. If the continuous time function was x(t), then the samples would be x(n1) for n,
an integer extending over some finite range of values. It is common practice to normalize
the sample interval to 1 and drop it from the equations. Toe sequence then becomes x(n).
Care must be taken, however, when calculating power or energy from the sequences. Toe
sample interval, including units of time, must be reinserted at the appropriate points in the
power or energy calculations.
A sequence as a representation of a continuous time signal has the following important characteristics:
r
T
2T
3T
4T
5T
FIGURE 1.2
6T
7T
BT
(1.2)
X(t)
J(t - -r)dt = 1.
(1.1)
r::
/(t)~(t _ 't)dt
f(t)dt.
(1.3)
This can be thought of as a kind of smearing of the sampling process across a band which
is related to the pulse width of ~(t). A better approximation to the sampling function
would be a function ~(t) with a narrower pulse width. As the pulse width is narrowed,
however, the amplitude must be increased. ln the limit, the ideal sampling function must
have infinitely narrow pulse width so that it samples ata single instant in time, and infinitely large amplitude so that the sampled signal still contains the sarne finite energy.
Figure 1.2 illustrates the sampling process at sample intervals of T. Toe resulting
time waveform can be written
9T
Sampling.
L._
Chap. 1
Sec. 1.1
Sequences
~t)
x(t) =
(1.7)
Since this is true for any continuous function of time, x(t), it is also true for x.,(t).
1.01- - - - - - - -
Xs(f)
_ """T_ _ _ _ __
xs(t)e-j 21tftdt.
(1.8)
1
1
,:
,:--
,:+2
Xs(f)= J:[ntx(t)(t-nT)}-jZ1tftdt.
..
(1.9)
The order of the summation and integration can be interchanged and Property 1 of the
sampling function applied to give
~
FIGURE 1.3
Xs(f)
Lx(nT)e-j2rrfaT_
(1.10)
n=-oo
xs(t)= Iix(t)(t-nT).
(1.4)
n=-oo
The waveform that results from this process is impossible to visualize due to the infinite
amplitude and zero width of the ideal sampling function. It may be easier to picture a
somewhat less than ideal sampling function (one with very small width and very large
amplitude) multiplying the continuous time waveform.
It should be emphasized that xs(t) is a continuous time waveform made from the superposition of an infinite set of continuous time signals x(t)(t - n1). It can also be written
~
x_,(t) = Lx(nT)(t-nT)
(1.5)
n=-o::::,
since the sampling function gives a nonzero multiplier only at the values t = nT. In this
last equation, the sequence x(n1) makes its appearance. This is the set of numbers or samples on which almost ali DSP is based.
This equation is the exact form of a Fourier series representation of X.,(f), a periodic
function of frequency having period lr. The coefficients of the Fourier series are x(n1)
and they can be calculated from the following integral:
1
x(nT) = T
The last two equations are a Fourier series pair which allow calculation of either the time
signal or frequency spectrum in terms of the opposite member of the pair. Notice that the
use of the problematic signal xs<t) is eliminated and the sequence x(n1) can be used instead.
J:
J~
1
X(f)ejZ!ifnT df
=T
Xs(f)ejZrrfaT df.
(1.12)
-2T
(1.11)
-2T
x(t)e-j 21tftdt
(1.6)
The right-hand side of Equation (1.7) can be expressed as the infinite sum of a set of integrais with finite limits
x(nT) =
2m+I
m=-oo
2T
(l.13)
Chap. 1
L J~ X(. + ;)ej21tnT/2"rnT
m
m=--oo
2T
d..
(1.14)
Moving the summation inside the integral, recognizing that eJ7.nmn (for all integers m and
n) is equal to 1, and equating everything inside the integral to the similar part of Equation
(1.11) give the following relation:
~
X,(!)=
L X(!+;).
(1.15)
m=--oo
Equation ( 1.15) shows that the sampled time frequency spectrum is equal to an infinite
sum of shifted replicas of the continuous time frequency spectrum overlaid on each
other. The shift of the replicas is equal to the sample frequency, 1/T. It is interesting to examine the conditions under which the two spectra are equal to each other, at least for
a limited range of frequencies. ln the case where there are no spectral components of
frequency greater than '12 T in the original continuous time waveform, the two spectra
Sec. 1.1
Sequences
are equal over the frequency range f = -'12T to f = +'lzT Of course, the sampled time spectrum will repeat this sarne set of amplitudes periodically for all frequencies, while the
continuous time spectrum is identically zero for ali frequencies outside the specified
range.
The Nyquist sampling criterion is based on the derivation just presented and asserts
that a continuous time waveform, when sampled at a frequency greater than twice the
maximum frequency component in its spectrum, can be reconstructed completely from
the sampled waveform. Conversely, if a continuous time waveform is sampled at a
frequency lower than twice its maximum frequency component a phenomenon called
aliasing occurs. If a continuous time signal is reconstructed from an aliased representation, distortions will be introduced into the result and the degree of distortion is dependent on the degree of aliasing. Figure 1.4 shows the spectra of sampled signals without
aliasing and with aliasing. Figure 1.5 shows the reconstructed waveforms of an aliased
signal.
g(,t)
IG(f)I
Li_,
g(,t)
IG(f)I
f\
l :1, .
fs
I\ /\ tT.
fs
2
(b) Sampled spectrum
gP)
IGR(f)I
J 11
fs
fs
2
(c) Reconstructured spectrum
'\
FIGURE 1.4 Aliasing in the frequency domain. (ai Input spectrum.
(bl Sampled spectrum.
(cl Reconstructed spectrum.
, l'
,,.
I
\
\
,;
,~
Chap. 1
Sec. 1.2
Toe most commonly used DSP operators are linear and time-invariant (or LTI). The linearity property is stated as follows:
Given x(n), a finite sequence, and (J {
},
Recalling that (J {
= O{x(n)}.
y(n)
(l.17)
Every operator has a set of outputs that are its response when an impulse sequence is applied to its input. Toe impulse response is represented by h(n) so that
m=-
(l.18)
This impulse response is a sequence that has special significance for (J { }, since it is the
sequence that occurs at the output of the block labeled O{ } in Figure 1.1 when an impulse sequence is applied at the input. By time invariance it must be true that
y(n)= O{x(n))
(l.24)
h(n)= O{Uo(n)}.
(l.23)
(l.16)
If
x(n) = ax 1(n)+bx 2 (n)
(l.22)
(l.25)
so that
(1.19)
~
y(n)= Lx(m)h(n-m).
Another way to state this property is that if x(n) is periodic with period N such that
(l.26)
m:=:.-oo
x(n + N) = x(n)
Equation (l.26) states that y(n) is equal to the convolution of x(n) with the impulse response h(n). By substituting m = n - pinto Equation (l.26) an equivalent form is derived
= f(n)}.
Next, the LTI properties of the operator O{ } will be used to derive an expression and
method of calculation for O{x(n)}. First, the impulse sequence can be used to represent
x(n) in a different manner,
~
(l.20)
m=-oo
y(n)= Lh(p)x(n-p).
(l.27)
p=-
lt must be remembered that m and p are dummy variables and are used for purposes of
the summation only. From the equations just derived it is clear that the impulse response
completely characterizes the operator O{ } and can be used to label the block representing
the operator as in Figure l.6.
This is because
1
u0 (n-m)= { '
O,
n= m
(l.21)
otherwise.
Toe impulse sequence acts as a sampling or sifting function on the function x(m), using
the dummy variable m to sift through and find the single desired value x(n). Now this
somewhat devious representation of x(n) is substituted into the operator Equation (l.16):
x(n)
{~~n~
~, ,
y(n)
FIGURE 1.6 Impulse response representation of an operator.
10
Sec. 1.2
Chap. 1
11
or
1.2.1 Causality
ln the mathematical descriptions of sequences and operators thus far, it was assumed that
the impulse responses of operators may include values that occur before any applied
input stimulus. This is the most general form of the equations and has been suitable for
the development of the theory to this point. However, it is clear that no physical system
can produce an output in response to an input that has not yet been applied. Since DSP
operators and sequences have their basis in physical systems, it is more useful to consider
that subset of operators and sequences that can exist in the real world.
The first step in representing realizable sequences is to acknowledge that any sequence must have started at some time. Thus, it is assumed that any element of a sequence in a realizable system whose time ndex is less than zero has a value of zero.
Sequences which start at times !ater than this can still be represented, since an arbitrary
number of their beginning values can also be zero. However, the earliest true value of any
sequence must be at a value of n that is greater than or equal to zero. This attribute of sequences and operators is called causality, since it allows ali attributes of the sequence to
be caused by some physical phenomenon. Clearly, a sequence that has already existed for
infinite time lacks a cause, as the term is generally defined.
Thus, the convolution relation for causal operators becomes:
y(n)= 2i(m)x(n-m).
y(n)
2)
(1.32)
To represent an operator properly may require a very high value of N, and for some complex operators N may have to be infinite. ln practice, the value of N is kept within limits
manageable by a computer; there are often approximations made of a particular operator
to make N an acceptable size.
ln Equations (1.30) and (1.31) the terms y(n - m) and x(n - p) are shifted or delayed versions of the functions y(n) and x(n), respectively. For instance, Figure 1.7 shows
a sequence x(n) and x(n - 3), which is the sarne sequence delayed by three sample periods. Using this delaying property and Equation (l.32), a structure or flow graph can be
constructed for the general form of a discrete time LTI operator. This structure is shown
in Figure 1.8. Each of the boxes is a delay element with unity gain. Toe coefficients are
shown next to the legs of the flow graph to which they apply. Toe circles enclosing the
summation symbol (L) are adder elements.
(1.28)
m=O
This form follows naturally since the impulse response is a sequence and can have no
values for m less than zero.
(1.33)
Z.{x(n)}= Lx(n)z-n
n=O
where the symbol Z.{ } stands for "z-transform of," and the z in the equation is a complex
number. One of the most important properties of the z-transform is its relationship to time
Ali discrete time, linear, causal, time-invariant operators can be described in theory by
the Nth order difference equation
x(n)
N-1
N-1
~>my(n-m)= :~)px(n-p)
m=O
(1.29)
p=O
where x(n) is the stimulus for the operator and y(n) is the results or output of the operator.
The equation remains completely general if all coefficients are normalized by the value
of a0 giving
N-1
(1.30)
p=O
p=O
IJI
3
(1.31)
.-----,,......---,~~--.-~~~~~~~~~~~~~~.-n
2
5
7
6
3
4
8
FIGURE 1.7 Shifting of a sequence.
m=l
oo
N-1
-~r-~. ~ . . . r I r .
N-1
12
Chap. 1
Sec. 1.2
13
=
x(n)
.,,
= z-p I,x(m)z-m.
----------.Y(n)
(1.37)
m=O
But comparing the summation in this last equation to Equation (1.33) for the z-transform
of x(n), it can be seen that
Z.{x(n- p))
= z-Pz_{x(n)) = z-PX(z).
(1.38)
This property of the z-transform can be applied to the general equation for L TI operators
as follows:
(1.39)
Since the z-transform is a linear transform, it possesses the distributive and associative
properties. Equation ( 1.39) can be simplified as follows:
=
(1.40)
=
FIGURE 1.8 Flow graph structure of linear operators.
delay in sequences. To show this property take a sequence, x(n), with a z-transform as
follows:
Z.{x(n)} = X(z)
= L, x(n)z-n.
Y(z)+ LPz-PY(z)
p=I
= L,bqz-qX(z)
(l.41)
q=O
Y(+ ta,,-lx(,{~vJ
(l.42)
Finally, Equation (1.42) can be rearranged to give the transfer function in the z-transform
domain:
(l.34)
n=O
L,bqz-q
Y(z) _
H(z) = X(z) -
Z.{x(n-p))
=I,x(n- p)z-n.
(l.35)
n=O
q=O
~a
l+ ~
z-P
(1.43)
p=I
=
Z.{x(n-p))= I,x(m)z-(m+p)
(l.36)
m=O
14
Chap. 1
Sec. 1.2
15
i--~~---~~----~zj
.,,
Y(f)
X(f)
= i- h(m)e-jZflfm
::'o
'
(1.47)
which is easily recognized as the Fourier transform of the series h(m). Rewriting this equation
Y(f)
X(f)
= H(f) = .'f{h(m)}.
(1.48)
Figure 1.10 shows the time domain block diagram of Equation (1.48) and Figure 1.11
shows the Fourier transform (or frequency domain) block diagram and equation. Toe frequency domain description of a linear operator is often used to describe the operator.
Most often it is shown as an amplitude and a phase angle plotas a function of the variable
f (sometimes normalized with respect to the sampling rate, 1/T).
1.2.5 Frequency Response
from the z-Transform Description
X,(f) = Lx(nT)e-j2itfaT
(1.49)
n::::::-00
and
RGURE 1.9
x(nT) =
J:
X,(f)ejZllfnT df.
(1.50)
Taking the Fourier transform of both sides of Equation (1.28) (which describes any L TI
causal operator) results in the following:
A(n)
.'f'{y(n)} =
Lh(m).'f{x(n - m)}.
(1.44)
h(m)
l .
>(n)
00
m=O
y(n) =
h(m) x(n - m)
m=O
(1.45)
H{f)
(1.46)
Y(f) = H(f) X(f)
1 .
>(()
FIGURE 1.11 Frequency block diagram of LTI system.
16
Chap. 1
ln order to simplify the notation, the value of T, the period of the sampling waveform, is
normalized to be equal to one.
Now compare Equation (1.49) to the equation for the z-transform of x(n) as follows:
~
Sec. 1.3
Digital Filters
Summarizing the last few paragraphs, the impulse response of an operator is simply
a sequence, h(m), and the Fourier transform of this sequence is the frequency response of
the operator. Toe z-transform of the sequence h(m), called H(z), can be evaluated on the
unit circle to yield the frequency domain representation of the sequence. This can be written as follows:
(1.51)
X(z) = Lx(n)z-n.
n=O
Equations (1.49) and (l.51) are equal for sequences x(n) which are causal (i.e., x(n)
for ali n < O) if z is set as follows:
z = ej 2 11f.
(I.52)
13
z =a+ jj3
Toe linear operators that have been presented and analyzed in the previous sections can
be thought of as digital filters. Toe concept of filtering is an analogy between the action
of a physical strainer or sifter and the action of a linear operator on sequences when the
operator is viewed in the frequency domain. Such a filter might allow certain frequency
components of the input to pass unchanged to the output while blocking other components. Naturally, any such action will have its corresponding result in the time domain.
This view of linear operators opens a wide area of theoretical analysis and provides increased understanding of the action of digital systems.
There are two broad classes of digital filters. Recall the difference equation for a
general operator:
Q-1
P-1
q=O
a
2
-2
(1.53)
=O
A plot of the locus of values for z in the complex plane described by Equation (l.52) is
shown in Figure 1.12. Toe plot is a circle of unit radius. Thus, the z-transform of a causal
sequence, x(n), when evaluated on the unit circle in the complex plane, is equivalent to
the frequency domain representation of the sequence. This is one of the properties of the
z-transfonn which make it very useful for discrete signal analysis.
--3
17
(1.54)
p=I
Notice that the infinite sums have been replaced with finite sums. This is necessary in
order that the filters can be physically realizable.
Toe first class of digital filters have ap equal to O for all p. Toe common name for
filters of this type is finite impulse response (FIR) filters, since their response to an impulse dies away in a finite number of samples. These filters are also called moving average (or MA) filters, since the output is simply a weighted average of the input values.
Q-1
y(n)= Lbqx(n-q).
(l.55)
q=O
There is a window of these weights (bq) that takes exactly the Q most recent values of
x(n) and combines them to produce the output.
Toe second class of digital filters are infinite impulse response (IIR) filters. This
class includes both autoregressive (AR) filters and the most general form, autoregressive
moving average (ARMA) filters. ln the AR case ali bq for q = l to Q- 1 are set to O.
-2
P-1
y(n)=x(n)- LaPy(n-p)
FIGURE 1.12
p=I
(1.56)
18
Chap. 1
For ARMA filters, the more general Equation (1.54) applies. ln either type of IIR filter, a
single-impulse response at the input can continue to provide output of infinite duration
with a given set of coefficients. Stability can be a problem for IIR filters, since with
poorly chosen coefficients, the output can grow without bound for some inputs.
1.3.1 Finite Impulse Response (FIR) Filters
Sec. 1.3
Digital Filters
19
Rabiner and Gold 1975). The program REMEZ.C is a C language implementation of the
Parks-McClellan program and is included on the enclosed disk. An example of a filter designed using the REMEZ program is shown at the end of section 4.1.2 in chapter 4.
The design of digital filters will not be considered in detail here. lnterested readers
may wish to consult references listed at the end of this chapter giving complete descriptions of ali the popular techniques.
The frequency response of FIR filters can be investigated by using the transfer
function developed for a general linear operator:
Q-1
y(n)= Llqx(n-q).
q=O
Q-1
(l.57)
Y(z) _
__ H(z)- X(z)
Comparing this equation with the convolution relation for linear operators
Ibqz-q
q=O
P-1
~ a z-P
l+ ~ p
y(n)= Lh(m)x(n-m),
m=O
one can see that the coefficients in an FIR filter are identical to the elements in the impulse response sequence if this impulse response is finite in length.
bq = h(q)
p=l
Notice that the sums have been made finite to make the filter realizable. Since for FIR filters the Pare ali equal to O, the equation becomes:
Q-1
H(z)
This means that if one is given the impulse response sequence for a linear operator with a
finite impulse response one can immediately write down the FIR filter coefficients.
However, as was mentioned at the start of this section, filter theory looks at linear operators primarily from the frequency domain point of view. Therefore, one is most often
given the desired frequency domain response and asked to determine the FIR filter coefficients.
There are a number of methods for determining the coefficients for FIR filters
given the frequency domain response. The two most popular FIR filter design methods
are Iisted and described briefly below.
1. Use of the DFT on the sampled frequency response. ln this method the required
frequency response of the filter is sampled at a frequency interval of 1/T where T is the
time between samples in the DSP system. The inverse discrete Fourier transform (see
section 1.4) is then applied to this sampled response to produce the impulse response of
the filter. Best results are usually achieved if a smoothing window is applied to the frequency response before the inverse DFf is performed. A simple method to obtain FIR filter coefficients based on the Kaiser window is described in section 4.1.2 in chapter 4.
2. Optimal mini-max approximation using linear programming techniques. There is
a well-known program written by Parks and McClellan (1973) that uses the REMEZ exchange algorithm to produce an optimal set of FIR filter coefficients, given the required
frequency response of the filter. The Parks-McClellan program is available on the IEEE
digital signal processing tape or as part of many of the filter design packages available for
personal computers. The program is also printed in severa! DSP texts (see Elliot 1987 or
(l.58)
= Y(z) = ~ b
-;:'o
X(z)
z-q.
(l.59)
The Fourier transform or frequency response of the transfer function is obtained by letting z = eJ2 1tf, which gives
Q-1
H(f)=H(z)I
-2
z=el
=~be-j2 xfq_
,rf
~ q
q=O
(1.60)
There is an important class of FIR filters for which this polynomial can be factored into a
product of sums from
n
M-1
H(z)=
N-1
(z- +amz-
m=O
(1.61)
n=O
This expression for the transfer function makes explicit the values of the variable z- 1 which
cause H(z) to become zero. These points are simply the roots of the quadratic equation
2
which in general provides complex conjugate zero pairs, and the values 'Yn which provide
single zeros.
20
Chap. 1
Sec. 1.3
Digital Filters
21
q=O
Factoring out the factor e-j1.1t(Q-l)j7Z and letting equal (Q- 1)/2 gives
H(f)
Combining the coefficients with complex conjugate phases and placing them together in
brackets
H(f)
ej21t{-2)/
+b
Q-3
e-j21t(-2)/]
+...}
If each pair of coefficients inside the brackets is set equal as follows:
bo =bQ-1
b,
Q-1
L,hqz-q
_ Y(z)
H(z)- X(z)
= q=:_
~1 a
l+ LJ
z-P
p
p=I
p=I
where a and Pare constants. If the transfer function of a filter can be separated into a real
function of f multiplied by a phase factor ej[a.f + 131, then this transfer function will exhibit
linear phase.
P-1
y(n)= L,hqx(n-q)-}:>py(n-p).
=bQ-2
b2 = bQ_ 3 , etc.
Each term in brackets becomes a cosine function and the linear phase relationship is
achieved. This is a common characteristic of FIR filter coefficients.
No simple relationship exists between the coefficients of the IIR filter and the impulse response sequence such as that which exists in the FIR case. Also, obtaining linear
phase IIR filters is nota straightforward coefficient relationship as is the case for FIR filters. However, IIR filters have an important advantage over FIR structures: ln general,
IIR filters require fewer coefficients to approximate a given filter frequency response
than do FIR filters. This means that results can be computed faster on a general purpose
computer or with less hardware in a special purpose design. ln other words, IIR filters are
computationally efficient. The disadvantage of the recursive realization is that IIR filters
are much more difficult to design and implement. Stability, roundoff noise, and sometimes phase nonlinearity must be considered carefully in ali but the most trivial IIR filter
designs.
Toe direct form IIR filter realization shown in Figure 1.9, though simple in appearance, can have severe response sensitivity problems because of coefficient quantization,
especially as the order of the filter increases. To reduce these effects, the transfer function
is usually decomposed into second order sections and then realized as cascade sections.
Toe C language implementation given in section 4.1.3 uses single precision floating-point
numbers in order to avoid coefficient quantization effects associated with fixed-point implementations that can cause instability and significant changes in the transfer function.
IIR digital filters can be designed in many ways, but by far the most common IIR
design method is the bilinear transform. This method relies on the existence of a known
s-domain transfer function (or Laplace transform) of the filter to be designed. Toe
s-domain filter coefficients are transformed into equivalent z-domain coefficients for use
in an IIR digital filter. This might seem like a problem, since s-domain transfer functions
are just as hard to determine as z-domain transfer functions. Fortunately, Laplace transform methods and s-domain transfer functions were developed many years ago for designing analog filters as well as for modeling mechanical and even biological syswms.
Thus, many tables of s-domain filter coefficients are available for almost any type of filter function (see the references for a few examples). Also, computer programs are available to generate coefficients for many of the common filter types (see the books by Jong,
Anoutino, Steams (1993), Embree (1991), or one of the many filter design packages
22
Chap. 1
available for personal computers). Because of the vast array of available filter tables, the
large number of filter types, and because the design and selection of a filter requires careful examination of ali the requirements (passband ripple, stopband attenuation as well as
phase response in some cases), the subject of s-domain IIR filter design will not be covered in this book. However, several IIR filter designs with exact z-domain coefficients
are given in the examples in section 4.1 and on the enclosed disk.
1.3.3 Examples of Filter Responses
As an example of the frequency response of an FIR filter with very simple coefficients,
take the following moving average difference equation:
Sec. 1.3
One would suspect that this filter would be a lowpass type by inspection of the coefficients,
since a constant (DC) value at the input will produce that sarne value at the output. Also,
since ali coefficients are positive, it will tend to average adjacent values of the signal.
The response of this FIR filter is shown in Figure 1.13. It is indeed lowpass and the
nulls in the stop band are characteristic of discrete time filters in general.
As an example of the simplest IIR filter, take the following difference equation:
y(n) =x(n) + y(n -1).
Some contemplation of this filter' s response to some simple inputs (like constant values,
O, 1, and so on) will lead to the conclusion that it is an integrator. For zero input, the output holds at a constant value forever. For any constant positive input greater than zero,
the output grows linearly with time. For any constant negative input, the output decreases
linearly with time. The frequency response ofthis filter is shown in Figure 1.14.
As mentioned previously, filters are generally specified by their performance in the frequency domain, both amplitude and phase response as a function of frequency. Figure l .15 shows a lowpass filter magnitude response characteristic. The filter gain has
20~~~~~~~~~~~~~~~~~~~--.~~~~~-r~----,
15
-5
10
-10
i.,
""'aE
23
Digital Filters
-15
bO
"'
-20
-5
-25
-10~~~~~~~~~~~~~~~~~~~~~~~~~
-30~~~~~~~~~~~~~~~~~~~~~~~~~
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0.05
0.1
0.15
0.25
0.3
0.35
Frequency (f/fs)
Frequency (f/fs)
AGURE 1.13
0.2
FIGURE 1.14
0.4
0.45
0.5
24
Chap. 1
been nonnalized to be roughly 1.0 at low frequencies and the sampling rate is normalized
to unity. Toe figure illustrates the most important terms associated with filter specifications.
Toe region where the filter allows the input signal to pass to the output with little or
no attenuation is called the passband. ln a lowpass filter, the passband extends from frequency f = O to the start of the transition band, marked as frequency f passin Figure 1.15.
Toe transition band is that region where the filter smoothly changes from passing the signal to stopping the signal. Toe end of the transition band occurs at the stopband frequency, fstop Toe stopband is the range of frequencies over which the filter is specified to
attenuate the signal by a given factor. Typically, a filter will be specified by the following
parameters:
(1)
(2)
(3)
(4)
Computer programs that calculate filter coefficients from frequency domain magnitude response parameters use the above list or some variation as the program input.
Sec. 1.4
where the sample time period has been normalized to l (T = l). If the sequence is of limited duration (as must be troe to be of use in a computer) then
N-1
Passband
0.6
1
1
O)
<li
::l:
1
1
(l)
.a
2
(l.63)
n;Q
where the sampled time domain waveform is N samples long. Toe inverse Fourier transform is
_r-1{X(f)} = x(n) =
112
X(f)e-j 21tfadf
-1/2
(l.64)
since X(f} is periodic with period 1/T = 1, the integral can be taken over any full period.
Therefore,
f:
X(f)e-j 21tfadf.
(l.65)
1.4.1 Form
1.0
1-0
0.8
"O
(l.62)
n;Q
x(n)
1+0
25
1
1
0.4
1
1
0.2
1
______ L_
1/
These representations for the Fourier transform are accurate but they have a major drawback
for digital applications-the frequency variable is continuous, not discrete. To overcome this
problem, both the time and frequency representations of the signal must be approximated.
To create a discrete Fourier transform (DFT) a sampled version of the frequency
waveform is used. This sampling in the frequency domain is equivalent to convolution in
the time domain with the following time waveform:
Transition
Band
1
1
1
1
1
h1(t)
_L (t- rT).
r=-oo
1
1
Stopband
O T
fpass fstop
Frequency
FIGURE 1.15 Magnitude response of normalized lowpass filter.
0.5fs
This creates duplicates of the sampled time domain waveform that repeats with period T.
This Tis equal to the T used above in the time domain sequence. Next, by using the sarne
number of samples in one period of the repeating frequency domain waveform as in one period of the time domain waveform, a DFT pair is obtained that is a good approximation to
the continuous variable Fourier transform pair. Toe forward discrete Fourier transform is
N-1
X(k) = _Lx(n)e-j27tknt N
n;Q
(l.66)
26
Chap. 1
x(n)=
_!_ L,X(k)e-jZnknlN_
N
(l.67)
k=O
Sec. 1.4
27
This corresponds to Property 8-5 in Brigharn. Rernernber that for the DFf it is assurned
that the sequence x(m) goes on forever repeating its values based on the period n = O to
N - 1. So the rneaning of the negative time argurnents is sirnply that
x(-p)= x(N- p), forp
For a complete developrnent of the DFf by both graphical and theoretical rneans, see the
text by Brigharn (chapter 6).
1.4.2 Properties
This section describes sorne of the properties of the DFf. Toe corresponding paragraph
nurnbers in the book The Fast Fourier Transform by Brigharn (1974) are indicated. Dueto
the sarnpling theorern it is clear that no frequency higher than 112r can be represented by
X(k). However, the values of k extend to N-1, which corresponds to a frequency nearly
equal to the sarnpling frequency 11r. This rneans that for a real sequence, the values of k
frorn N/2 to N-1 are aliased and, in fact, the amplitudes of these values of X(k) are
I X(k) 1= I X(N -k) 1, for k = N/2 to N -1.
= O toN-l.
The DFf is often used as an analysis tool for determining the spectra of input sequences.
Most often the amplitude of a particular frequency cornponent in the input signal is desired. The DFf can be broken into amplitude and phase cornponents as follows:
X(!) = X,ea1 (!) + j Ximag{f)
X(!)
= 1X(f)lej0<f)
(l.70)
(l.71)
(l.68)
and 0(!) = tan -I [ Ximag ]
Toe power spectrurn of the signal can be determined using the signal spectrurn times its
conjugate as follows:
2
x(n)
where a and
X,ea1
where A(k) and B(k) are the DFfs of the time functions a(n) and b(n), respectively. This
corresponds to Property 8-1 in Brigharn.
Toe DFf also displays a similar attribute under time shifting as the z-transform. If
X(k) is the DFf of x(n) then
N-1
DFf{x(n-p)}= I,x(n-p)e-jZknlN
n=O
DFf{x(n - p)} =
I,x(m)e-j!tkm/N e-j211kp/N,
m=-p
= a a(n) + ~ b(n),
(l.69)
2
+ Ximag
(1.72)
There are sorne problerns with using the DFf as a spectrurn analysis tool, however. Toe
problern of interest here concems the assurnption rnade in deriving the DFf that the sequence was a single period of a periodically repeating waveform. For alrnost all sequences there will be a discontinuity in the time waveform at the boundaries between
these pseudo periods. This discontinuity will result in very high-frequency cornponents in
the resulting waveform. Since these cornponents can be rnuch higher than the sarnpling
theorern lirnit of 1/2 r (or half the sarnpling frequency) they rnay be aliased into the rniddle
of the spectrurn developed by the DFf.
Toe technique used to overcorne this difficulty is called windowing. Toe problern to
be overcorne is the possible discontinuity at the edges of each period of the waveform.
Since for a general purpose DFf algorithrn there is no way to know the degree of discontinuity at the boundaries, the windowing technique sirnply reduces the sequence amplitude at the boundaries. lt does this in a gradual and srnooth rnanner so that no new discontinuities are produced, and the result is a substantial reduction in the aliased frequency
cornponents. This irnprovernent does not come without a cost. Because the window is
rnodifying the sequence before a DFf is performed, sorne reduction in the fidelity of the
spectral representation rnust be expected. Toe result is sornewhat reduced resolution of
closely spaced frequency cornponents. Toe best windows achieve the rnaxirnurn reduction of spurious (or aliased) signals with the rninirnurn degradation of spectral resolution.
There are a variety of windows, but they ali work essentially the sarne way:
28
Chap.1
Attenuate the sequence elements near the boundaries (near n =O and n =N - 1) and compensate by increasing the values that are far away from the boundaries. Each window has
its own individual transition from the center region to the outer elements. For a comparison of window performance see the references Iisted at the end of this chapter. (For example, see Harris (1983)).
Sec. 1.4
29
By representing the even elements of the sequence x(n) by xev and the odd elements
by xod, the equation can be rewritten
Nfl-1
Nfl-1
X(k) = Lxev(n)W;fz
+ wtn Ixod(n)w;~.
n=O
Now there are two expressions in the form of DFfs so Equation ( 1. 77) can be simplified
as follows:
Because signals are always associated with noise-either due to some physical attribute of
the signal generator or externai noise picked up by the signal source-the DFf of a single
sequence from a continuous time process is often not a good indication of the true spectrum of the signal. The solution to this dilemma is to take multiple DFfs from successive
sequences from the sarne signal source and take the time average of the power spectrum.
If a new DFf is taken each NT seconds and successive DFfs are labeled with superscripts:
M-1
i
2 i 2]
Power Spectrum -_ ~
.L.J[ X,.a
1 ) + (Ximag) .
(1.73)
i=O
Clearly, the spectrum of the signal cannot be allowed to change significantly during the
interval t =O to t = M (NT).
1.4.5 The Fast Fourier Transform (FFT)
Thefast Fourier transform (or FFf) is a very efficient algorithm for computing the DFf
of a sequence. lt takes advantage of the fact that many computations are repeated in the
DFf due to the periodic nature of the discrete Fourier kemel: e-j"21tkn I N. Toe form of the
DFfis
N-1
X(k) = Lx(n)e-j27tkn/N.
(1.74)
n=O
By Ietting
n=O
Now, w(N + qN)(k + rN) = wnk for ali q, r that are integers due to the periodicity of the
Fourier kemel.
Next break the DFf into two parts as follows:
Nfl-1
X(k)
(1.78)
Notice that only DFfs of N/2 points need be calculated to find the value of X(k). Since
the index k must go to N - 1, however, the periodic property of the even and odd DFfs is
used. ln other words,
Xev(k)=Xev(k-
2)
for !'!...<k<N-I
2 .
(1.79)
The process of dividing the resulting DFfs into even and odd halves can be repeated until
one is left with only two point DFfs to evaluate
A(k)
for ali k
= (O)+(l)
= (O)-(l)
for k odd.
for keven
Therefore, for 2 point DFfs no multiplication is required, only additions and subtractions. To compute the complete DFf still requires multiplication of the individual 2-point
DFfs by appropriate factors of Wranging from w 0 to wN12- 1. Figure 1.16 shows a flow
graph of a complete 32-point FFf. The savings in computation due to the FFf algorithm
is as follows.
For the original DFf, N complex multiplications are required for each of N values
of k. Also, N - 1 additions are required for each k.
ln an FFf each function of the form
(O) WP(l)
N-1
X(k)= Lx(n)Wnk.
Nfl-1
(1.77)
n=O
n=O
where the subscript N on the Fourier kemel represents the size of the sequence.
(1.76)
(called a butte,jly dueto its flow graph shape) requires one multiplication and two additions. From the flow graph in Figure 1.16 the number of butterflies is
Number of butterflies = N log 2 (N).
2
This is because there are N/2 rows of butterflies (since each butterfly has two inputs) and
there are logi{N) columns of butterflies.
Table 1.1 gives a listing of additions and multiplications for various sizes of FFfs
and DFfs. The dramatic savings in time for larger DFfs provided in the FFf has made
this method of spectral analysis practical in many cases where a straight DFf computa-
30
Chap. 1
Sec. 1.4
o
2
24
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
31
TABLE 1.1 Comparison of Number of Butterfly Operations in the DFT and FFT,
(each operation is one complex multiply/accumulate calculation).
16
8
Transfonn Length
DFI' Operations
(N)
(N2)
FFf Operations
NLOG2 (N)
20
12
8
16
32
36
2
18
10
64
26
128
256
512
1024
2048
22
14
30
1
17
9
25
21
22
23
37
3
19
11
27
7
24
25
26
27
28
X(f) =
15
31
30
31
FIGURE 1.16
160
384
8%
1024
4608
10240
22528
23
29
24
64
With this input a 16-point FFf will produce a very simple output. This output is
shown in Figure 1.18. lt is a spike at k = 4 of amplitude 0.5 anda spike at k = 12 of amplitude -0.5. Toe spike nature in the FFf output in this example occurs because for a cosine waveform of arbitrary frequency the Fourier transform is
5
21
13
20
64
256
1024
4096
16384
65536
262144
1048576
4194304
2-
2-
32-Point, radix 2, in-place FFT. (From Rabiner and Gold, 1975, p. 380.)
tion would be much too time consuming. Also, the FFf can be used for performing operations in the frequency domain that would require much more time consuming computations in the time domain.
1.4.6 An Example of the FFT
Cos (21t4 16 )
+1
ln order to help the reader gain more understanding of spectrum analysis with the FFf, a
simple example is presented here. An input signal to a 16-point FFf processor is as follows:
-1
RGURE 1.17
10
12
14
~n
32
Chap. 1
X(k)
-0.5
12
1
2
1
4
1
6
10
.. k
14
Sec. 1.5
Nonlinear Operators
ability to differentiate between amplitudes of sound waves decreases with the amplitude
of the sound. ln these cases, a nonlinear function is applied to the signal and the resulting
output range of values is quantized uniformly with the available bits.
This process is illustrated in Figure 1.19. First, the input signal is shown in Figure
1.19(a). The accuracy is 12 bits and the range is O to 4.095 volts, so each quantization
levei represents 1 mV. It is necessary because of some system consideration (such as
transmission bandwidth) to reduce the number bits in each word to 8. Figure l.19(b)
shows that the resulting quantization leveis are 16 times as coarse. Figure l.19(c) shows
the result of applying a linear-logarithmic compression to the input signal. ln this type of
compression the low-level signals (out to some specified value) are unchanged from the
input values. Beginning at a selected levei, say f;n = a, a logarithmic function is applied.
The form of the function might be
out
--0.5
33
a)
so that atfin = a the output also equals a and A is chosen to place the maximum value of
out at the desired point.
FIGURE 1.18 Output of 16-point FFT.
It can be shown that the integrand in the two integrais above integrates to O unless the ar-
gument of the exponential is O. If the argument of the exponential is zero, the result is
two infinite spikes, one at f =fo and the other at f = -f0 . These are delta functions in the
frequency domain.
Based on these results, and remembering that the impulse sequence is the digital
analog of the delta function, the results for the FFf seem more plausible. lt is still left to
explain why k = 12 should be equivalent to f =-f0 . Referring back to the development of
the DFf, it was necessary atone point for the frequency spectrum to become periodic
with periodfs Also, in the DFf only positive indices are used. Combining these two facts
one can obtain the results shown in Figure 1.18.
A simpler version of the sarne process is shown in Figure 1.20. lnstead of applying
a logarithmic function from the point f = a onward, the output values for f;:: a are ali the
sarne. This is an example of clipping. A region of interest is defined and any values outside the region are given a constant output.
1.5.1 -Law and A-Law Compression
There are two other compression laws worth listing because of their use in telephonythe -law and A-law conversions. The -law conversion is defined as follows:
, =s
Jout
gn
( +: ) ln(l + l/;_ 0 1)
Jin
ln(l + ) ,
(l.80)
where sgn() is a function that takes the sign of its argument, and is the compression parameter (255 for North American telephone transmission). The input value fn must be
normalized to lie between -1 and +l. TheA-law conversion equations are as follows:
A({; 1
Most of this book is devoted to linear operators and linear-signal processing because
these are the most commonly used techniques in DSP. However, there are severa! nonlinear operators that are very useful in one-dimensional DSP. This section introduces the
simple class of nonlinear operators that compress or clip the input to derive the output sequence.
There is often a need to reduce the number of significant bits in a quantized sequence. This is sometimes done by truncation of the least significant bits. This process is
advantageous because it is linear: The quantization error is increased uniforrnly over the
entire range of values of the sequence. There are many applications, however, where the
need for accuracy in quantization is considerably less at high-signal values than at lowsignal values. This is true in telephone voice communications where the human ear' s
out
= sgn(fin) 1+ ln(A)
l + ln( Al{; 0 1)
== sgn(/;_n) l + ln(A)
(1.81)
34
Chap. 1
Output (lnteger)
4095
Sec. 1.6
35
Output (lnteger)
255
1 mV/bit
2048
Input (Volts)
2.048
128
4.095
(a)
255
4.095
0.128
16 mV/bit
128
FIGURE 1.20
Clipping to 8 bits.
Input
2.048
4.095
(b)
Toe signals of interest in most signal-processing problems are embedded in an environment of noise and interference. The noise may be due to spurious signals picked up during transrnission (interference), or dueto the noise characteristics of the electronics that
receives the signal or a number of other sources. To <leal effectively with noise in a signal, some model of the noise or of the signal plus noise must be used. Most often a probabilistic model is used, since the noise is, by nature, unpredictable. This section introduces
the concepts of probability and randomness that are basic to digital signal processing and
gives some examples of the way a composite signal of interest plus noise is modeled.
256
128
---,F-------------~ Input
0.128
4.096
togram of the picture elements) and any image element with a value higher than threshold
is set to l and any element with a value lower than threshold is set to zero. ln this way the
significant bits are reduced to only one. Pictures properly thresholded can produce excellent outlines of the most interesting objects in the image, which simplifies further processing considerably.
Probability begins by defining the probability of an event labeled A as P(A). Event A can
be the result of a coin toss, the outcome of a horse race, or any other result of an activity
that is not completely predictable. There are three attributes of this probability P(A):
(1) P(A) > = O. This simply means that any result will either have a positive chance of
occurrence or no chance of occurrence.
(2) P (Ali possible outcomes) = l. This indicates that some result among those possible
is bound to occur, a probability of 1 being certainty.
(3) For {A;}. where (A; n A;)= O, P(uA;) l:; P(A;). For a set of events, {A;}, where
the events are mutually disjoint (no two can occur as the result of a single triai of
36
Chap. 1
the activity), the probability of any one of the events occurring is equal to the sum
of their individual probabilities.
With probability defined in this way, the discussion can be extended to joint and
conditional probabilities. Joint probability is defined as the probability of occurrence of a
specific set of two or more events as the result of a single triai of an activity. For instance,
the probability that horse A will finish third and horse B will finish first in a particular
horse race is ajoint probability. This is written:
P(A n B) = P(A and B) = P(AB).
(l .82)
(l.83)
If this conditional probability, P(AIB), and the probability of B are both known, the probability of both of these events occurring Goint probability) is
P(AB) = P(AIB)P(B).
37
the variable lying within a range of values. A cumulative distribution function (or CDF)
for a random variable can be defined as follows:
F(x)
= P(X S x).
(1.86)
(1.87)
= rp()d..
(1.88)
(1.84)
Sec. 1.6
Since F(x) is always monotonically increasing, p(x) must be always positive or zero.
Figure 1.22 shows the density function for the distribution of Figure 1.21. Toe utility of
these functions can be illustrated by determining the probability that the random variable
X lies between a and b. By using probability Property 3 from above
(1.85)
~XS~=~a<XS~+~xs~
(l.89)
This is another way to define conditional probability once joint probability is understood.
This is troe because the two conditions on the right-hand side are independent (mutually
exclusive) and X must meet one or the other if it meets the condition on the left-hand
side. This equation can be expressed using the definition of the distribution:
P(a < X S b)
= F(b)-F(a)
(l.90)
= J:p(x)dx.
ln this way, knowing the distribution or the density function allows the calculation of the
probability that X Iies within any given range.
38
Chap. 1
F(x)
Sec. 1.6
39
on. Toe expected value is sometimes called the mean, average, or first moment of the
variable and is calculated from the density function as follows:
E[x] = f~(x)cb:.
(1.91)
A typical density function for a random variable is shown in Figure 1.23. Toe most likely
value of variable x is also indicated in the figure. Toe expected value can be thought of as
a "center of gravity" or first moment of the random variable x.
Toe variance of a random variable is defined as
a 2 = Var{x} = E((x-E[x]) 2 ),
~~~-!"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~......
(1.92)
where a is the root mean square value of the variable's difference from the mean. Toe
variance is sometimes called the mean square value of x.
By extending the use of the expectation operator to joint probability densities, a
variable Y can be a function of two random variables, s and t such that
Y=O{s,t}.
FIGURE 1.21
J:J[s,
t}p(s, t)dsdt
(1.93)
where the joint probability density of s and t (p(s,t)), is required in the equation. Toe correlation of two random variables is defined to be the expected value of their product
E[st} =
p(x)
J:J;,
p(s, t)dsdt.
(1.94)
p(_x)
X
~~-"--""-~~~~~~~~~-1-~~~~~~....-~~~~~~~~~~~~~-
E(x)
FIGURE 1.22
Density function.
FIGURE 1.23
40
Chap. 1
Sec. 1.6
(x-)2] ,
l e x{
p(x)= ~
OUANTIZATION
t~
-r---
000010
_.,_
_.,_
000000
-r---
Digital
Code
Ouantized
Sampie
31
dj+1
.
3
.. ,
RECONSTRUCTION
f ~
011110
30
111110
- - - - -,Reconstruction
_/Leveis
62
..
111111
63
au
64
(l.95)
2cr 2
-v21tcr
41
,t~
WWib__.
ir~.
--000001
---
2
1
o
Original
Sample
FIGURE 1.25
-dj
___
___
8L
Quantization
Decision
Leveis
E=
E{U- j)2} =
ru
(! -
})2 p(f)df,
al
and if the signal range is broken up into the segments between decision leveis dj and dj+I
then
Lf
J-1
E=
E{U -
})2} =
d
j+I
(! -
rj)2 p(f)df.
j=O dj
Numerical solutions can be determined that minimize E for several common probability
densities. The most common assumption is a uniform density (p(f) equals IIN for all values of f, where N is the number of decision intervals ). ln this case, the decision leveis are
uniformly spaced throughout the interval and the reconstruction leveis are centered between decision leveis. This method of quantization is almost universal in commercial
analog-to-digital converters. For this case the error in the analog-to-digital converter output is uniformly distributed from -'fz of the least significant bit to +1/2 of the least significant bit. If it is assumed that the value of the least significant bit is unity, then the mean
squared error due to this uniform quantization is given by:
var{d =
J-i+.;.,(!- /)'
p(f)df =
f+~
l
2
,- f df = -,
12
42
Chap. 1
since p(f) = 1 from - 1/2 to +'12 . This mean squared error gives the equivalent variance, or
noise power, added to the original continuous analog sarnples as a result of the uniform
quantization. If it is further assumed that the quantization error can be modeled as a stationary, uncorrelated white noise process (which is a good approximation when the number
of quantization leveis is greater than 16), then a maximum signal-to-noise ratio (SNR)
can be defined for a quantization process of B bits (28 quantization leveis) as follows:
SNR = 10 log 10 (V
A random process is a function composed of random variables. An example is the random process f (t). For each value of t, the process f(t) can be considered a random variable. For t = a there is a random variable f(a) that has a probability density, an expected
value (or mean), anda variance as defined in section 1.6.3. ln a two-dimensional image,
the function would be f(x,y), where x and y are spatial variables. A two-dimensional random process is usually called a randomfield. Each f(a,b) is a random variable.
One of the important aspects of a random process is the way in which the random
variables at different points in the process are related to each other. Toe concept of joint
probability is extended to distribution and density functions. Ajoint probability distribution is defined as
the products of the random variables which make up the process. The symbol for autocorrelation is ~ (t1, t2 ) for the function f(t) and the definition is
(l.98)
J:r~
(l.99)
dsdt
(1.100)
By its nature, a noise process cannot be specified as a function of time in the way a deterministic signal can. Usually a noise process can be described with a probability function
and the first and second moments of the process. Although this is only a partia! characterization, a considerable amount of analysis can be performed using moment parameters
alone. Toe first moment of a process is simply its average or mean value. ln this section,
ali processes will have zero mean, simplifying the algebra and derivations but providing
results for the most common set of processes.
Toe second moment is the autocorrelation of the process
for k
=O, 1, 2, ....
Toe processes considered here are stationary to second order. This means that the first
and second order statistics do not change with time. This allows the autocorrelation to be
represented by
d2 F(s, t)
p (s,t ) = - - -
where Pt< J3; t 1, t2) is the joint probability density ft...t 1) andfih). By including a and J3
in the parentheses the dependence of p on these variables is made explicit.
1
ln the general case, the autocorrelation can have different values for each value of
t 1 and t2 . However, there is an important special class of random processes called stationary processes for which the form of the autocorrelation is somewhat simpler. ln stationary random processes, the autocorrelation is only a function of the difference between the
two time variables. For stationary processes
(l.96)
43
ln section 1.6.6 the continuous variable theory presented here is extended to discrete variables and the concept of modeling real world signals is introduced.
J:
),
where V2 is the total signal power. For exarnple, if a sinusoid is sampled with a peak amplitude of 28 - 1, then V2 = 228/8 giving the signal to noise ratio for a full scale sinusoidas
F(s, t) =
Sec. 1.6
(1.97)
ln section 1.6.3 it was shown that the correlation of two random variables is the expected
value of their product. Toe autocorrelation of a random process is the expected value of
for k
= O, 1, 2, ...
since it is a function only of the time difference between samples and not the time variable itself. ln any process, an important member of the set of autocorrelation values is
r(O), which is
r(O) = E{u(n)u*(n)} = E{lu(n)l2 },
(1.101)
44
Chap. 1
which is the mean square value of the process. For a zero mean process this is equal to
the variance of the signal
r(O) = var{u}.
Sec. 1.6
45
x(n)
i----r----- y(n)
.. ,
(l.102)
~Z=
u(n-1)
u(n) = 1u(n - 2)
~+~7~
1 +a1z
-82
(l.103)
u(n-M+I)
AR Filter
R=
E{u(n)u8 (n)}
r(O)
r(l)
r(2) ...
r(-1)
r(O)
r(l) ...
r(-2)
r(-1)
r(O) ...
(l.104)
x(n)
r(m-1)
~,
y(n)
r(-1)
-1
r(O)
r(-M + 1)
r(-M + 2)
~Z) = b0 + b1z
-2
+ b~
r(l)
S(f)=
Lr(k)e-i 2 19'k,
MA Filter
(1.105)
k=-M+I
which is the discrete Fourier transform (DFf) of the autocorrelation of the process (r(k)).
Thus, the autocorrelation is the time domain description of the second order statistics, and
the power spectral density, S(f), is the frequency domain representation. This power
spectral density can be modified by discrete time filters.
Discrete time filters may be classified as autoregressive (AR), moving average
(MA), or a combination of the two (ARMA). Examples of these filter structures and the
z-transforms of each of their impulse responses are shown in Figure 1.26. It is theoretically possible to create any arbitrary output stochastic process from an input white noise
Gaussian process using a filter of sufficiently high (possibly infinite) order.
Referring again to the three filter structures in Figure 1.26, it is possible to create
any arbitrary transfer function H(z) with any one of the three structures. However, the orders of the realizations will be very different for one structure as compared to another.
For instance, an infinite order MA filter may be required to duplicate an Mfh order AR
filter.
One of the most basic theorems of adaptive and optimal filter theory is the Wold
x(n)
.. ,
.. .Y(n)
-1
-2
H(
b0 + b1z + b~
,z) =
1 +a1z + ~z
--------1------2
ARMA Filter
FIGURE 126 AR, MA, and ARMA filter structures.
46
Chap. 1
Sec. 1.7
47
decomposition. This theorem states that any real-world process can be decomposed into a
deterministic component (such as a sum of sine waves at specified amplitudes, phases,
and frequencies) and a noise process. ln addition, the theorem states that the noise
process can be modeled as the output of a linear filter excited at its input by a white noise
signal.
d (desired output)
x (input)
y (ou1p"t)
Processor 1
~~
..,
, (erro<)
I
The problem of determining the optimum linear filter was solved by Norbert Wiener and
others. The solution is referred to as the Wiener filter and is discussed in section 1.7.I.
Adaptive filters and adaptive systems attempt to find an optimum set of filter parameters
(often by approximating the Wiener optimum filter) based on the time varying input and
output signals. ln this section, adaptive filters and their application in closed loop adaptive systems are discussed briefly. Closed-loop adaptive systems are distinguished from
open-Ioop systems by the fact that in a closed-loop system the adaptive processor is controlled based on information obtained from the input signal and the output signal of the
processor. FigtJre 1.27 illustrates a basic adaptive system consisting of a processor that is
controlled by an adaptive algorithm, which is in tum controlled by a performance calculation algorithm that has direct knowledge of the input and output signals.
Closed-loop adaptive systems have the advantage that the performance calculation
algorithm can continuously monitor the input signal (d) and the output signal (y) and determine if the performance of the system is within acceptable limits. However, because
severa! feedback loops may exist in this adaptive structure, the automatic optimization algorithm may be difficult to design, the system may become unstable or may result in
nonunique and/or nonoptimum solutions. ln other situations, the adaptation process may
not converge and Iead to a system with grossly poor performance. ln spite of these possible drawbacks, closed-loop adaptive systems are widely used in communications, digital
storage systems, radar, sonar, and biomedical systems.
The general adaptive system shown in Figure l.27(a) can be applied in severa!
ways. Toe most common application is prediction, where the desired signal (d) is the application provided input signal and a delayed version of the input signal is provided to the
input of the adaptive processor (x) as shown in Figure l.27(b). The adaptive processor
mUst then try to predict the current input signal in order to reduce the error signal (E) toward a mean squared value of zero. Prediction is often used in signal encoding (for example, speech compression), because if the next values of a signal can be accurately predicted, then these samples need not be transmitted or stored. Prediction can also be used
to reduce noise or interference and therefore enhance the signal quality if the adaptive
processor is designed to only predict the signal and ignore random noise elements or
known interference patterns.
As shown in Figure l.27(c), another application of adaptive systems is system
modeling of an unknown or difficult to characterize system. The desired signal (d) is the
unknown system's output and the input to the unknown system and the adaptive processor (x) is a broadband test signal (perhaps white Gaussian noise). After adaptation, the
Adaptive
algorithm
(a)
(b)
d
Plant
Adaptive
processor
(e)
FIGURE 1.27 (ai Closed-loop adaptive system; (b) prediction; (e) system
modeling.
48
Chap. 1
unknown system is modeled by the final transfer function of the adaptive processor. By
using an AR, MA, or ARMA adaptive processor, different system models can be obtained. The magnitude of the error (E) can be used to judge the relative success of each
model.
Sec. 1.7
Toe problem of determining the optimum linear filter given the structure shown in Figure
l .28 was solved by Norbert Wiener and others. Toe solution is referred to as the Wiener
filter. The statement ofthe problem is as follows:
Determine a set of coefficients, wk, that minimize the mean of the squared error of
the filtered output as compared to some desired output. Toe error is written
M
e(n) == d(n)-
L, w;u(n -k + 1),
(l.106)
or in vector form
-2p+2Rw 0 ==0
(1.111)
Rw 0 == p.
(1.112)
or
8
Toe mean squared error is a function of the tap weight vector w chosen and is written
J(w) ==
E{ e(n)e*(n)}.
wo ==R-1p.
(1.113)
So the optimum tap weight vector depends on the autocorrelation of the input
process and the cross correlation between the input process and the desired output.
Equation (1.113) is called the normal equation because a filter derived from this equation
will produce an error that is orthogonal (or normal) to each element of the input vector.
This can be written
w;
(1.110)
where p == E{u(n)d*(n)}, the vector that is the product of the cross correlation between
the desired signal and each element of the input vector.
ln order to minimize J(w) with respect to w, the tap weight vector, one must set the derivative of J(w) with respect to w equal to zero. This will give an equation which, when
solved for w, gives Wo, the optimum value of w. Setting the total derivative equal to zero gives
k=I
49
u(_n)
w;
w;_1
w;,
E{u(n)e0 *(n)} == O.
(l.114)
lt is helpful at this point to consider what must be known to solve the Wiener filter
problem:
----------
(2) Toe cross correlation vector between u(n) and d(n) the desired response.
+ d(n)
e(n)
AGURE 1.28 Wiener filter problem.
lt is clear that knowledge of any individual u(n) will not be sufficient to calculate
these statistics. One must take the ensemble average, E{ }, to form both the autocorrelation and the cross correlation. ln practice, a model is developed for the input process and
from this model the second order statistics are derived.
A legitimate question at this point is: What is d(n)? It depends on the problem. One example of the use of Wiener filter theory is in linear predictive filtering. ln this case, the desired signal is the next value ofu(n), the input. Toe actual u(n) is always available one sample after the prediction is made and this gives the ideal check on the quality of the prediction.
52
Chap. 1
EMBREE, P. and KIMBLE, B. (1991). C Language Algorithms for Digital Signal Processing.
Englewood Cliffs, NJ: Prentice Hall.
HARRIS, F. (1978). On the Use of Windows for Harmonic Analysis with the Discrete Fourier
Transform. Proceedings ofthe IEEE., 66, (l), 51-83.
HA YKIN, S. (1986). Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice Hall.
MCCLELLAN, J., PARKS, T. and RABINER, L.R. (1973). A Computer Program for Designing
Optimum FIR Linear Phase Digital Filters. IEEE Transactions on Audio and Electro-acoustics,
AU-21. (6), 506--526.
CHAPTER
MOLER, C., LITrLE, J. and BANGERT, S. (1987). PC-MATLAB User's Guide. Sherbourne, MA: The
Math Works.
C PROGRAMMING
PPENHEIM, A. and SCHAPER, R. (1975). Digital Signal Processing. Englewood Cliffs, NJ:
Prentice Hall.
fUNDAMENTALS
PPENHEIM, A. and SCHAPER, R. (1989). Discrete-time Signal Processing. Englewood C!iffs, NJ:
Prentice Hall.
PAPOUL!S, A. (1965). Probability, Random Variables and Stochastic Processes. New York:
McGraw-Hill.
RABINER, L. and GOLD, B. (1975). Theory and Application of Digital Signal Processing.
Englewood Cliffs, NJ: Prentice Hall.
STEARNS, S. and DAVID, R. (1988). Signal Processing Algorithms. Englewood Cliffs, NJ: Prentice
Hall.
STEARNS, S. and DAVID, R. ( 1993). Signal Processing Algorithms in FORTRAN and C. Englewood
Cliffs, NJ: Prentice Hall.
VAIDYANATHAN, P. (1993). Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice
Hall.
WIDROW, B. and STEARNS, S. (1985). Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice
Hall.
The purpose of this chapter is to provide the programmer with a complete overview of
the fundamentais of the C programming language that are important in DSP applications.
ln particular, text manipulation, bitfields, enumerated data types, and unions are not discussed, because they have limited utility in the majority of DSP programs. Readers with
C programming experience may wish to skip the bulk of this chapter with the possible
exception of the more advanced concepts related to pointers and structures presented in
sections 2.7 and 2.8. The proper use of pointers and data structures in C can make a DSP
program easier to write and much easier for others to understand. Example DSP programs
in this chapter and those which follow will clarify the importance of pointers and data
structures in DSP programs.
50
Chap. 1
The LMS algorithm is the simplest and most used adaptive algorithm in use today. ln this
brief section, the LMS algorithm as it is applied to the adaptation of time-varying FIR filters (MA systems) and IIR filters (adaptive recursive filters or ARMA systems) is described. A detailed derivation, justification and convergence properties can be found in
the references.
For the adaptive FIR system the transfer function is described by
Q-1
Sec. 1.8
References
51
where b(k) and a(k) indicate the time-varying coefficients of the filter. With an IIR filter,
the mean squared error performance surface in the multidimensional space of the filter
coefficients is not a quadratic function and can have multiple minimums that may cause
the adaptive algorithm to never reach the MMSE solution. Because the IIR system has
poles, the system can become unstable if the poles ever move outside the unit circle during the adaptive process. These two potential problems are serious disadvantages of adaptive recursive filters that limit their application and complexity. For this reason, most applications are limited to a small number of poles. The LMS algorithm can again be used
to update the filter coefficients based on the method of steepest descent. This can be described in vector notation as follows:
(1.115)
q=O
wk+1
where b(k) indicates the time-varying coefficients of the filter. With an FIR filter the
mean squared error performance surface in the multidimensional space of the filter coefficients is a quadratic function and has a single minimum mean squared error (MMSE).
The coefficient values at the optimal solution is called the MMSE solution. The goal of
the adaptive process is to adjust the filter coefficients in such a way that they move from
their current position toward the MMSE solution. If the input signal changes with time,
the adaptive system must continually adjust the coefficients to follow the MMSE solution. ln practice, the MMSE solution is often never reached.
The LMS algorithm updates the filter coefficients based on the method of steepest
descent. This can be described in vector notation as follows:
Bk+I =Bk -\\
= wk -M\\,
(I.120)
where W k is the coefficient column vector containing the a and b coefficients, M is a diagonal matrix containing convergence parameters for the a coefficients and v 0 through
v P- l that controls the rate of convergence of the b coefficients. ln this case, the gradient
is approximated as
Vk ':-2Ek[aoQ-lf31f3p]7,
(l.121)
an(k)=x(k-n)+ Ll/k)an(k-q)
(1.116)
(l.122)
q=O
where Bk is the coefficient column vector, is a parameter that controls the rate of convergence and the gradient is approximated as
P-1
f3n(k)
(1.123)
p=O
aE(Ei] =-2Ekxk
\\=~-
(1.117)
where Xk is the input signal column vector and Ek is the error signal as shown on Figure
1.27. Thus, the basic LMS algorithm can be written as
Bk+l = Bk + 2EkXk
(1.118)
q=O
p=l
1.8 REFERENCES
BRIGHAM, E. (1974). The Fast Fourier Transform. Englewood Cliffs, NJ: Prentice Hall.
CLARKSON, P. (1993). Optimal and Adaptive Signal Processing. FL: CRC Press.
P-1
The selection of the convergence parameters must be done carefully because if they
are too small the coefficient vector will adapt very slowly and may not react to changes in
the input signal. If the convergence parameters are too large the system will adapt to
noise in the signal or may become unstable. The proposed new Iocation of the poles
should also be tested before each update to determine if an unstable adaptive filter is
about to be used. If an unstable pole Iocation is found the update should not tak.e place
and the next update value may Iead to a better solution.
(1.119)
ELIOTI, D.F. (Ed.). (1987). Handbook of Digital Signal Processing. San Diego, CA: Academic
Press.
54
C Programming Fundamentais
Sec. 2.1
Chap. 2
(4) A method of organizing the data and the operations so that a sequence of program
steps can be executed from anywhere in the program (functions and data structures)
and
(5) A method to move data back and forth between the outside world and the program
(input/output)
These tive elements are required for efficient programming of DSP algorithms. Their implementation in C is described in the remainder of this chapter.
As a preview of the C programming language, a simple real-time DSP program is
shown in Listing 2.1. It illustrates each of the tive elements of DSP prograrnming. Toe
listing is divided into six sections as indicated by the comments in the program. This simple DSP program gets a series of numbers from an input source such as an A/D converter
(the function getinput () is not shown, since it would be hardware specitic) and determines the average and variance of the numbers which were sampled. ln signal-processing
terms, the output of the program is the DC levei and total AC power of the signal.
Toe first line of Listing 2.1, main ( ), declares that the program called main, which
has no arguments, will be detined after the next left brace ( { on the next line). Toe main
program (called main because it is executed first and is responsible for the main control
of the program) is declared in the sarne way as the functions. Between the left brace on
the second line and the right brace half way down the page (before the line that starts
float average .. .) are the statements that form the main program. As shown in this
example, all statements in C end in a semicolon (;) and may be placed anywhere on the
input line. ln fact, ali spaces and carriage control characters are ignored by most C compilers. Listing 2.1 is shown in a format intended to make it easier to follow and modify.
Toe third and fourth lines of Listing 2.1 are statements declaring the functions
(average, variance, sqrt) that will be used in the rest of the main program (the
function s~rt ( ) is detined in the standard C library as discussed in the Appendix. This
first section of Listing 2.1 relates to program organization (element four of the above
list). Toe beginning of each section of the program is indicated by comments in the program source code (i. e., /* section 1 * /). Most C compilers allow any sequence of
characters (including multiple lines and, in some cases, nested comments) between the
/ * and * / delimiters.
Section two of the program declares the variables to be used. Some variables are
declared as singlefloating-point numbers (such as ave and var); some variables are declared as single integers (such as i, count, and number); and some variables are arrays (such as signal [100] ). This program section relates to element one, data organization.
Section three reads 100 floating-point values into an array called signal using a for
loop (similar to a DO loop in FORTRAN). This loop is inside an infinite while loop
that is common in real-time programs. For every 100 samples, the program will display
the results and then get another 100 samples. Thus, the results are displayed in real-time.
This section relates to element tive (input/output) and element three (program control).
Section four of the example program uses the functions average and variance
main()
55
/* section 1 */
float average(),variance(),sqrt();
/* declare functions */
float signal[lOO],ave,var;
int count,i;
/*section 2 */
/* declare variables */
while(l) {
for(count =O; count < 100; count++) { /* section 3 */
signal[count] = getinput();
/* read input signal */
ave= average(signal,count);
var = variance(signal,count);
/* section 4 */
/* calculate results */
printf("\n\nAverage = %f",ave);
printf ( Variance = %f", var);
/* section 5 */
/* print results */
/* section 6 */
/* calculate average */
/* return variance */
Listing 2.1
to calculate the statistics to be printed. Toe variables ave and var are used to store the
results and the library function printf is used to display the results. This part of the
program relates to element four (functions and data structures) because the operations detined in functions average and variance are executed and stored.
56
C Programming Fundamentais
Chap. 2
Section five uses the library function printf to display the results ave, var
and also calls the function sqrt in order to display the standard deviation. This p~
of the program relates to element four (functions) and element five (input/output), because the operations defined in function sqrt are executed and the results are also displayed.
Toe two functions, average and variance, are defined in the remaining part of
Listing 2.1. This last section relates primarily to element two (operators), since the detailed operation of each function is defined in the sarne way that the main program was
defined. The function and argument types are defined and the local variables to be used in
each function are declared. The operations required by each function are then defined followed by a retum statement that passes the result back to the main program.
Sec. 2.2
57
main()
{
int i;
short j;
long k;
float a;
double b;
k = 72000;
/*
/*
/*
/*
/*
j = k;
i = k;
b = 0.1;
a= b;
A C program must declare the variable before it is used in the program. There are severa!
types of numbers used depending on the format in which the numbers are stored (floating-point format or integer format) and the accuracy of the numbers (single-precision versus double-precision floating-point, for example). The following example program illustrates the use offive different types ofnumbers:
Three types of integer numbers (int, short int, and long int) and two types of
floating-point numbers (float and double) are illustrated in this example. Toe actual
sizes (in terms of the number of bytes used to store the variable) of these five types depends upon the implementation; ali that is guaranteed is that a short int variable will
not be larger than a long int and a double wi!l be twice as large as a float. The
size of a variable declared as just int depends on the compiler implementation. lt is normally the size most conveniently manipulated by the target computer, thereby making
programs using ints the most efficient on a particular machine. However, if the size of
the integer representation is important in a program (as it often is) then declaring variables as int could make the program behave differently on different machines. For example, on a 16-bit machine, the above program would produce the following results:
72000 6464 6464
o.100000000000000
0.100000001490116
But on a 32-bit machine (using 32-bit ints), the output would be as follows:
72000 6464 72000
o.100000000000000
0.100000001490116
Note that in both cases the short and long variables, k and j, (the first two numbers displayed) are the sarne, while the third number, indicating the int i, differs. ln both cases,
the value 6464 is obtained by masking the lower 16 bits of the 32-bit k value. Also, in both
cases, the floating-point representation ofO.l with 32 bits (float) is accurate to eight decimal places (seven places is typical). With 64 bits it is accurate to at least 15 places.
Thus, to make a program truly portable, the program should contain only short
int and long int declarations (these may be abbreviated short and long). ln addi-
C Programming Fundamentais
58
Chap. 2
tion to the five types illustrated above, the three ints can be declared as unsigned by
preceding the declaration with unsigned. Also, as will be discussed in more detail in the
next section conceming text data, a variable may be declared to be only one byte long by
declaring it achar (signed or unsigned). Toe following table gives the typical sizes
and ranges of the different variable types for a 32-bit machine (such as a VAX) anda 16bit machine (such as the IBM PC).
16-bit
Machine
Size (bits)
16-bit
Machine
Range
32-bit
Macbine
Size (bits)
32-bit
Macbine
Range
char
-128 to 127
-128 to 127
unsigoed char
O to 255
O to 255
int
16
-32768 to 32767
32
2.le9
unsigned int
16
Oto65535
32
O to 4.3e9
short
16
-32768 to 32767
16
-32768 to 32767
unsigned short
16
Oto 65535
16
O to 65535
long
32
2.le9
32
2.le9
unsigned long
32
O to4.3e9
32
Oto 4.3e9
float
32
l.Oe38
32
le38
double
64
l.Oe306
64
le308
Variable
Declaration
Sec. 2.3
Operators
59
Note that the array definition unsigned char image [256] [256] could define an
8-bit, 256 by 256 image plane where a grey scale image is represented by values from O
to 255. Toe last definition defines a three-dimensional matrix in a similar fashion. One
difference between C and other languages is that arrays are referenced using brackets to
enclose each index. Thus, the image array, as defined above, would be referenced as
image [ i] [j] where i and j are row and column indices, respectively. Also, the first
element in ali array indices is zero and the last element is N-1, where N is the size of the
array in a particular dimension. Thus, an assignment of the first element of the five element, one-dimensional array input (as defined above) such as input[O]=l.3; is
legal while input [ 5] =1. 3; is n:ot.
Arrays may be initialized when they are declared. Toe values to initialize the array are
enclosed in one or more sets of braces ( {}) and the values are separated by commas. For
example, a one-dimensional array called vector can be declared and initialized as follows:
int vector[6]
= { 1,
2, 3, 5, 8, 13 };
Note that commas separate the three sets of inner braces that designate each of the three rows
of the matrix a, and that each array initialization is a statement that must end in a sernicolon.
2.2.2 Arrays
Almost ali high-level languages allow the definition of indexed lists of a given data type,
commonly referred to as arrays. ln C, ali data types can be declared as an array simply by
placing the number of elements to be assigned to the array in brackets after the array
name. Multidimensional arrays can be defined simply by appending more brackets containing the array size in each dimension. Any N-dimensional array is defined as follows:
type name[sizel] [size2] ... [sizeN];
For example, each of the following statements are valid array definitions:
unsigned int list[lO];
double input[5];
short int x[2000];
char input_buffer[20];
unsigned char image[256] (256];
int matrix[4] (3] [2];
2.3 OPERATORS
Once variables are defined to be a given size and type, some sort of manipulation must be
performed using the variables. This is done by using operators. The C language has more
operators than most languages; in addition to the usual assignment and arithmetic operators, C also has bitwise operators and a full set of logical operators. Some of these operators (such as bitwise operators) are especially important in order to write DSP programs
that utilize the target processor efficiently.
2.3.1 Assignment Operators
The most basic operator is the assignment operator which, in C, is the single equal sign
(=). The value on the right of the equal sign is assigned to the variable on the left.
Assignment statements can also be stacked, as in the statement a=b=l;. ln this case, the
statement is evaluated right to left so that l is assigned to b and b is assigned to a. ln C,
C Programming Fundamentais
60
Chap. 2
Sec. 2.3
61
Operators
a=ave (x) is an expression, while a=ave {x); is a statement. The addition of the
semicolon tells the compiler that this is all that will be done with the result from the function ave {x). An expression always has a value that can be used in other expressions.
Thus, a=b+ {c=ave {x)); is a legal statement. The result of this statement would be
that the result retumed by ave {x) is assigned to e and b+c is assigned to a. C also allows multiple expressions to be placed within one statement by separating them with the
commas. Each expression is evaluated left to right, and the entire expression (comprised
of more than one expression) assumes the value of the last expression which is evaluated.
For example, a= { olda=a,ave {x) ) ; assigns the current value of a
olda, calls the
function ave {x) and then assigns the value retumed by ave {x) to a.
The unary bitwise NOT operator, which inverts ali the bits in the operand, is implemented with the - symbol. For example, if i is declared as an unsigned int, then
i = -o; sets i to the maximum integer value for an unsigned int.
Toe usual set of binary arithmetic operators (operators which perform arithmetic on two
operands) are supported in C using the following symbols:
C allows operators to be combined with the assignment operator (=) so that almost any
statement of the form
to
multiplication
division
addition
subtraction
modulus (integer remainder after division)
*
/
+
%
The first four operators listed are defined for ali types of variables (char, int, float,
and double). Toe modulus operator is only defined for integer operands. Also, there is
no exponent operator in C; this floating-point operation is supported using a simple function call (see the Appendix for a description of the pow function).
ln C, there are three unary arithmetic operators which require only one operand.
First is the unary minus operator (for example, -i, where i is an int) that performs a
two' s-complement change of sign of the integer operand. Toe unary minus is often useful
when the exact hardware implementation of a digital-signal processing algorithm must be
simulated. The other two unary arithmetic operators are increment and decrement, represented by the symbols ++ and --, respectively. These operators add or subtract one from
any integer variable or pointer. Toe operand is often used in the middle of an expression,
and the increment or decrement can be done before or after the variable is used in the expression (depending on whether the operator is before or after the variable). Although the
use of ++ and -- is often associated with pointers (see section 2.7), the following example illustrates these two powerful operators with the ints i, j, and k:
i =
=
k =
k =
k =
j
4;
7;
i++ + j;
k + --j;
k + i++;
&:
"
<<
>>
bitwise AND
bitwise OR
bitwise exclusive OR
arithmetic shift left (number of bits is operand)
arithmetic shift right (number of bits is operand)
<variable>
where <variable> represents the sarne variable name in ali cases. For example, the
following pairs of expressions involving x and y perform the sarne function:
X= X+ Yi
X=
X=
X =
X=
X=
X=
X X*
X /
X%
X &
X
y;
y;
y;
y;
y;
y;
y;
X= X
<< y;
X= X >> y;
X+= y;
X-= y;
X
*= Yi
y;
y;
y;
y;
X/=
X%=
X&=
X :=
X"'=
y;
X<<= y;
X>>= y;
ln many cases, the Ieft-hand column of statements will result in a more readable and easier to understand program. For this reason, use of combined operators is often avoided.
Unfortunately, some compiler implementations may generate more efficient code if the
combined operator is used.
2.3.4 logical Operators
/* i is incremented to 5, k = 11 */
/* j is decremented to 6, k = 17 */
/* i is incremented to 6, k = 22 */
Binary bitwise operations are performed on integer operands using the following symbols:
Like ali C expressions, an expression involving a logical operator also has a value. A logical operator is any operator that gives a result of true or false. This could be a comparison between two values, or the result of a series of ANDs and ORs. If the result of a logical operation is true, it has a nonzero value; if it is false, it has the value O. Loops and if
C Programming Fundamentais
62
Chap. 2
statements (covered in section 2.4) check the result of logical operations and change pr0gram flow accordingly. Toe nine logical operators are as follows:
<
<=
==
>=
>
!=
&&:
li
!
less than
Iess than or equal to
equal to
greater than or equal to
greater than
not equal to
logica!AND
logical OR
logical NOT (unary operator)
Note that == can easily be confused with the assignment operator (=) and will result in a
valid expression because the assignment also has a value, which is then interpreted as
true or false. Also, &:&: and 11 should not be confused with their bitwise counterparts (&:
and 1) as this may result in hard to find logic problems, because the bitwise results may
not give true or false when expected.
2.3.5 Operator Precedence and Type Conversion
Like all computer languages, C has an operator precedence that defines which operators in
an expression are evaluated first. 1f this order is not desired, then parentheses can be used
to change the order. Thus, things in parentheses are evaluated first and items of equal
precedence are evaluated from left to right. Toe operators contained in the parentheses or
expression are evaluated in the following order (listed by decreasing precedence):
++,--
*,/,%
+,<<,>>
<,<=,>=,>
==, !=
&:
"
&:&:
11
increment, decrement
unary minus
multiplication, division, modulus
addition, subtraction
shift Ieft, shift right
relational with Iess than or greater than
equal, not equal
bitwise AND
bitwise exclusive OR
bitwise OR
logical AND
logical OR
Statements and expressions using the operators just described should normally use variables and constants of the sarne type. If, however, you mix types, C doesn't stop dead
Sec. 2.4
63
Program Control
(like Pascal) or produce a strange unexpected result (like FORTRAN). lnstead, C uses a
set of roles to make type conversions automatically. The two basic roles are:
(1) If an operation involves two types, the value with a Iower rank is converted to the
type of higher rank. This process is called promotion and the ranking from highest
to Iowest type is double, float, long, int, short, and char. Unsigned of each of the
types outranks the individual signed type.
(2) ln an assignment statement, the final result is converted to the type of the variable
that is being assigned. This may result in promotion or demotion where the value is
troncated to a lower ranking type.
Usually these roles work quite well, but sometimes the conversions must be stated
explicitly in order to demand that a conversion be done in a certain way. This is accomplished by type casting the quantity by placing the name of the desired type in parentheses before the variable or expression. Thus, if i is an int, then the statement
i=lO* ( 1. 55+1. 67); would set i to 32 (the truncation of 32.2), while the statement
i=lO* ( ( int) 1. 55+1. 67); would set i to 26 (the troncation of 26.7 since
( int) 1. 55 is truncated to l).
Toe large set of operators in C allows a great <leal of programming flexibility for DSP applications. Programs that must perform fast binary or logical operations can do so without
using special functions to do the bitwise operations. C also has a complete set of program
control features that allow conditional execution or repetition of statements based on the
result of an expression. Proper use of these control structures is discussed in section
2.11.2, where structured prograrnming techniques are considered.
2.4.1 Conditional Execution: if-else
where value is any expression that results in (or can be converted to) an integer value.
If value is nonzero (indicating a true result), then statementl is executed; otherwise,
statement2 is executed. Note that the result of an expression used for value need
not be the result of a logical operation-all that is required is that the expression results in
a zero value when statement2 should be executed instead of statementl. Also, the
C Programming Fundamentais
64
Chap.2
/* positive outputs*/
else {
if(result < sigma)
out = 2;
else
out = 1;
/* negative outputs*/
/* biggest output*/
/*O< result <=sigma*/
Sec. 2.4
65
Program Control
Program control jumps to the statement after the case label with the constant (an integer
or single character in quotes) that matches the result of the integer expression in the
swi tch statement. If no constant matches the expression value, control goes to the statement following the default label. If the default label is not present and no matching case
labels are found, then control proceeds with the next statement following the switch
statement. When a matching constant is found, the remaining statements after the corresponding case label are executed until the end of the switch statement is reached, or a
break statement is reached that redirects control to the next statement after the swi tch
statement. A simple example is as follows:
switch(i) {
case O:
printf ( \nError: I is zero");
break;
case 1:
j = k*k;
break;
default:
j = k*k/i;
/* smallest output*/
Note that the inner if-else statements are compound statements (each consisting of two
statements), which make the braces necessary in the outer if-else control structure (without the braces there would be too many else statements, resulting in a compilation error).
The use of the break statement after the first two case statements is required in order to
prevent the next statements from being executed (a break is not required after the last
case or default statement). Thus, the above code segment sets j equal to k*k/i,
unless i is zero, in which case it will indicate an error and leave j unchanged. Note that
since the divide operation usually takes more time than the case statement branch, some
execution time will be saved whenever i equals 1.
When a program must choose between severa! altematives, the if-else statement becomes inconvenient and sometimes inefficient. When more than four altematives from a
single expression are chosen, the switch statement is very useful. The basic form ofthe
switch statement is as follows:
C offers a way to express one if-else control structure in a single Iine. lt is called a
conditional expression, because it uses the conditional operator, ? : , which is the only trinary operator in C. The general form of the conditional expression is:
expressionl? expression2
switch(integer expression)
case constantl:
statements;
(optional)
break;
(optional)
case constant2:
statements;
(optional)
break;
(optional)
(more optional statements)
default:
(optional)
(optional)
statements;
expression3
If expressionl is true (nonzero), then the whole conditional expression has the value
of expression2. If expressionl is false (O), the whole expression has the value of
expression3. One simple example is finding the maximum of two expressions:
maxdif = (aO > a2) ? aO-al
a2-al;
Conditional expressions are not necessary, since if-else statements can provide the
sarne function. Conditional expressions are more compact and sometimes lead to more
C Programming Fundamentais
66
Chap. 2
efficient rnachine code. On the other hand, they are often more confusing than the familiar if-else control structure.
Sec. 2.4
Program Control
67
Toe for loop combines an initialization statement, an end condition statement, and
an action statement (executed at the end of the loop) into one very powerful control structure. Toe standard form is:
= O;
i = O;
/* space_count is an int */
/* array index, i =O*/
while{string[i])
if(string[i]
i++;
) space_count++;.
/* next char */
space_count
Note that if the string is zero length, then the value of string [i] will initially point to
the null terminator (which has a zero or false value) and the while loop will not be executed. Normally, the while loop will continue counting the spaces in the string until the
null tenninator is reached.
Toe do-while loop is used when a group of statements need to be repeated and
the exit condition should be tested at the end of the loop. Toe decision to go through the
loop one more time is made after the loop is traversed so that the loop is always executed
at least once. Toe format of do-while is similar to the while loop, except that the do
keyword starts the statement and while (expression) ends the statement. A single
or compound statement may appear between the do and the while keywords. A common
use for this loop is in testing the bounds on an input variable as the following example illustrates:
do
printf("\nEnter FFT length {less than 1025) :");
scanf {%d", &fft_length) ;
while(fft_length > 1024);
ln this code segment, if the integer fft_length entered by the user is larger than 1024,
the user is prompted again until the fft_length entered is 1024 or less.
for{initialize
staternent;
test condition
end update)
Toe three expressions are all optional (for (;; ) ; is an infinite loop) and the statement may
be a single statement, a compound statement or justa semicolon (a null statement). Toe most
frequent use of the for loop is indexing an array through its elements. For example,
for{i = O
i < length
i++) a[i] = O;
sets the elements of the array a to zero from a (O] up to and including a [length-1].
This for statement sets i to zero, checks to see if i is less than length, if so it executes the statement a [ i] =O; , increments i, and then repeats the loop until i is equal to
length. Toe integer i is incremented or updated at the end of the loop and then the test
condition statement is executed. Thus, the statement after a for loop is only executed if
the test condition in the for loop is true. For loops can be much more complicated, because each statement can be multiple expressions as the following example illustrates:
for{i =O, i3 = 1; i < 25
printf{"\n%d %d",i,i3);
i++, i3 = 3*i3)
This statement uses two ints in the for loop (i, i3) to print the first 25 powers of 3.
Note that the end condition is still a single expression (i < 25), but that the initialization
and end expressions are two assignments for the two integers separated by a comma.
2.4.5 Program Jumps: break, continue, and goto
Toe loop control structures just discussed and the conditional statements (if, if-else,
and switch) are the most important control structures in C. They should be used exclusively in the majority of programs. Toe last three control statements (break,
continue, and goto) allow for conditional program jumps. If used excessively, they
will make a program harder to follow, more difficult to debug, and harder to modify.
Toe break statement, which was already illustrated in conjunction with the switch
statement, causes the program flow to break free of the switch, for, while, or
do-while that encloses it and proceed to the next statement after the associated control
structure. Sometimes break is used to leave a loop when there are two or more reasons
to end the loop. Usually, however, it is much clearer to combine the end conditions in a
single logical expression in the loop test condition. The exception to this is when a large
number of executable statements are contained in the loop and the result of some statement should cause a premature end of the loop (for example, an end of file or other error
condition).
C Programming Fundamentais
68
Chap. 2
program statements
status= function_one(alpha,beta,constant);
if(status != O) goto error_exit;
status= function_two(delta,time);
if(status != O) goto error_exit;
Sec. 2.5
Functions
69
2.5 FUNCTIONS
AII C programs consist of one or more functions. Even the program executed first is a
function called main { ) , as illustrated in Listing 2.1. Thus, unlike other programming
Ianguages, there is no distinction between the main program and programs that are called
by the main program (sometimes called subroutines). AC function may or may not return a value thereby removing another distinction between subroutines and functions in
languages such as FORTRAN. Each C function is a program equal to every other function. Any function can call any other function (a function can even call itself), or be
called by any other function. This makes C functions somewhat different than Pascal procedures, where procedures nested inside one procedure are ignorant of procedures elsewhere in the program. It should also be pointed out that unlike FORTRAN and severa!
other Ianguages, C always passes functions arguments by value not by reference. Because
arguments are passed by value, when a function must modify a variable in the calling
program, the C prograrnmer must specify the function argument as a pointer to the beginning of the variable in the calling program's memory (see section 2.7 for a discussion of
pointers).
A function is defined by the function type, a function name, a pair of parentheses containing an optional formal argument Iist, and a pair of braces containing the optional executable statements. The general format for ANSI C is as follows:
type name(formal argument list with declarations)
{
function body
The type determines the type of value the function returns, not the type of arguments. If
no type is given, the function is assumed to return an int (actually, a variable is also
assumed to be of type int if no type specifier is provided). If a function does not return a
value, it should be declared with the type void. For example, Listing 2.1 contains the
function average as follows:
70
C Programming Fundamentais
Chap. 2
The first line in the above code segment declares a function called average will retum
a single-precision floating-point value and will accept two arguments. Toe two argument
names (array and size) are defined in the formal argument list (also called the formal
pararneter list). Toe type of the two arguments specify that array is a one-dimensional
array ( of unknown length) and size is an int. Most modem C compilers allow the argument declarations for a function to be condensed into the argument list.
Note that the variable array is actually just a pointer to the beginning of the
float array that was allocated by the calling program. By passing the pointer, only one
value is passed to the function and not the large floating-point array. ln fact, the function
could also be declared as follows:
float average{float *array,int size)
This method, although more correct in the sense that it conveys what is passed to the
function, may be more confusing because the function body references the variable as
array[i].
The body of the function that defines the executable statements and local variables
to be used by the function are contained between the two braces. Before the ending brace
(} ), a retum statement is used to retum the float result back to the calling program. If
the function did not retum a value (in which case it should be declared void), simply
omitting the retum statement would retum control to the calling program after the last
statement before the ending brace. When a function with no retum value must be terminated before the ending brace (if an error is detected, for example), a return; statement without a value should be used. Toe parentheses following the retum statement are
only required when the result of an expression is retumed. Otherwise, a constant or variable may be retumed without enclosing it in parentheses (for example, return O; or
return n;).
Arguments are used to convey values from the calling program to the function.
Because the arguments are passed by value, a local copy of each argument is made for
the function to use (usually the variables are stored on the stack by the calling program).
Toe local copy of the arguments may be freely modified by the function body, but will
not change the values in the calling program since only the copy is changed. Toe retum
statement can communicate one value from the function to the calling program. Other
than this retumed value, the function may not directly communicate back to the calling
prograrn. This method of passing arguments by value, such that the calling program's
Sec. 2.5
Functions
71
variables are isolated from the function, avoids the common problem in FORTRAN
where modifications of arguments by a function get passed back to the calling program,
resulting in the occasional modification of constants within the calling program.
When a function must return more than one value, one or more pointer arguments
must be used. Toe calling program must allocate the storage for the result and pass the
function a pointer to the memory area to be modified. Toe function then gets a copy of
the pointer, which it uses (with the indirection operator, *, discussed in more detail in
Section 2. 7 .1) to modify the variable allocated by the calling program. For example, the
functions average and variance in Listing 2.1 can be combined into one function
that passes the arguments back to the calling program in two float pointers called ave
and var, as follows:
void stats{float *array,int size,float *ave,float *var)
{
int i;
!* initialize sum of signal */
float sum = O.O;
/ * sum of signal squared * /
float sum2 = O.O;
for{i =O; i < size; i++) {
sum = sum + array[i);
/* calculate sums */
sum2 = sum2 + array[i)*array[i);
*ave= sum/size;
/* pass average and variance */
*var = {sum2-sum*{*ave))/{size-1);
ln this function, no value is retumed, so it is declared type void and no retum statement
is used. This stats function is more efficient than the functions average and variance together, because the sum of the array elements was calculated by both the average
function and the variance function. If the variance is not required by the calling program,
then the average function alone is much more efficient, because the sum of the squares of
the array elements is not required to determine the average alone.
2.5.2 Storage Class, Privacy, and Scope
ln addition to type, variables and functions have a property called storage class. There
are four storage classes with four storage class designators: auto for automatic variables
stored on the stack, extern for externai variables stored outside the current module,
static for variables known only in the current module, and register for temporary
variables to be stored in one of the registers of the target computer. Each of these four
storage classes defines the scope or degree of the privacy a particular variable or function
holds. Toe storage class designator keyword (auto, extern, static, or register)
must appear first in the variable declaration before any type specification. Toe privacy of
a variable or function is the degree to which other modules or functions cannot access a
variable or call a function. Scope is, in some ways, the complement of privacy because
72
C Programming Fundamentais
Chap.2
the scope of a variable describes how many modules or functions have access to the vafi.
able.
Auto variables can only be declared within a function, are created when the function is invoked, and are lost when the function is exited. Auto variables are known only
to the function in which they are declared and do not retain their value from one invocation of a function to another. Because auto variables are stored on a stack, a function
that uses only auto variables can call itself recursively. The auto keyword is rarely
used in C programs, since variables declared within functions default to the auto storage
class.
Another important distinction of the auto storage class is that an auto variable is
only defined within the control structure that surrounds it. That is, the scope of an auto
variable is limited to the expressions between the braces ( { and } ) containing the variable
declaration. For exarnple, the following simple prograrn would generate a compiler error,
since j is unknown outside of the for loop:
main()
{
int i;
for {i = O ; i < 1 O ; i ++) {
int j;
/* declare j here */
j = i*i;
printf ("%d", j);
printf {"%d", j);
/* j unknown here */
Sec. 2.5
Functions
73
Static variables differ from extern variables only in scope. A static variable declared outside of a function in one module is known only to the functions in that
module. A static variable declared inside a function is known only to the function in
which it is declared. Unlike an auto variable, a static variable retains its value from
one invocation of a function to the next. Thus, static refers to the memory area assigned to the variable and does not indicate that the value of the variable cannot be
changed. Functions may also be declared static, in which case the function is only
known to other functions in the sarne module. ln this way, the programmer can prevent
other modules (and, thereby, other users of the object module) from invoking a particular
function.
2.5.3 Function Prototypes
Although not in the original definition of the C language, function prototypes, in one
form or another, have become a standard C compiler feature. A function prototype is a
statement (which must end with a semicolon) describing a particular function. lt tells the
compiler the type of the function (that is, the type of the variable it will retum) and the
type of each argument in the formal argument Iist. The function named in the function
prototype may or may not be contained in the module where it is used. lf the function is
not defined in the module containing the prototype, the prototype must be declared externai. Ali C compilers provide a series of header files that contain the function prototypes
for ali of the standard C functions. For example, the prototype for the stats function
defined in Section 2.5.1 is as follows:
extern void stats{float *,int,float *,float *);
Register variables have the sarne scope as auto variables, but are stored in
some type of register in the target computer. If the target computer does not have registers, or if no more registers are available in the target computer, a variable declared as
register will revert to auto. Because almost ali microprocessors have a large number of registers that can be accessed much faster than outside memory, register variables can be used to speed up prograrn execution significantly. Most compilers limit the
use of register variables to pointers, integers, and characters, because the target machines rarely have the ability to use registers for floating-point or double-precision operations.
Extern variables have the broadest scope. They are known to ali functions in a
module and are even known outside of the module in that they are declared. Extern
variables are stored in their own separate data area and must be declared outside of any
functions. Functions that access extern variables must be careful not to call themselves
or cal! other functions that access the sarne extern variables, since extern variables
retain their values as functions are entered and exited. Extern is the default storage
class for variables declared outside of functions and for the functions themselves. Thus,
functions not declared otherwise may be invoked by any function in a module as well as
by functions in other modules.
This prototype indicates that stats (which is assumed to be in another module) retums
no value and takes four arguments. The first argument is a pointer to a float (in this
case, the array to do statsistics on). The second argument is an integer (in this case, giving the size of the array) and the last two arguments are pointers to floats which will
retum the average and variance results.
The result of using function prototypes for ali functions used by a program is that
the compiler now knows what type of arguments are expected by each function. This information can be used in different ways. Some compilers convert whatever type of actual
argument is used by the calling program to the type specified in the function prototype
and issue a waming that a data conversion has taken place. Other compilers simply issue
a waming indicating that the argument types do not agree and assume that the programmer will fix it if such a mismatch is a problem. The ANSI C method of declaring functions also allows the use of a dummy variable with each formal parameter. ln fact, when
this ANSI C approach is used with dummy arguments, the only difference between function prototypes and function declarations is the semicolon at the end of the function prototype and the possible use of extern to indicate that the function is defined in another
module.
74
C Programming Fundamentais
Chap. 2
Sec. 2.6
75
#ifdef DEBUG
printf("\nln stats sum = %f sum2 = %f",sum,sum2);
printf("\nNurober of array elements = %d",size);
#endif
*ave= sum/size;
/* pass average */
*var = (sum2 - sum* (*ave))/(size-1);
/* pass variance */
If the preprocessor parameter DEBUG is defined anywhere before the lifdef DEBUG
statement, then the printf statements will be compiled as part of the program to aid in
debugging stats (or perhaps even the calling program). Many compilers allow the definition of preprocessor directives when the compiler is invoked. This allows the DEBUG
option to be used with no changes to the program text.
.replaces every occurrence of the string DO (ali capital letters so that it is not confused
with the C keyword do) with the four-character string for (. Similarly, new aliases of all
the C keywords could be created with several ldefine statements (although this seems
silly since the C keywords seem good enough). Even single characters can be aliased. For
example, BEG:CN could be aliased to { and BND could be aliased to } , which makes a C
program look more like Pascal.
The #define directive is much more powerful when parameters are used to create
a true macro. The above DO macro can be expanded to define a simple FORTRAN style
DO loop as follows:
#define DO(var,beg,end) for(var=beg; var<=end; var++)
C Programming Fundamentais
76
Chap.2
Toe three macro parameters var, beg, and end are the variable, the beginning value
and the ending value of the DO loop. ln each case, the macro is invoked and the strin~
placed in each argument is used to expand the macro. For example,
OO(i,1,10)
which is the valid beginning of a for loop that will start the variable i at 1 and stop it at
10. Although this DO macro does shorten the amount of typing required to create such a
simple for loop, it must be used with caution. When macros are used with other operators, other macros, or other functions, unexpected program bugs can occur. For example,
the above macro will not work at ali with a pointer as the var argument, because
DO ( *ptr, 1, 10) would increment the pointer's value and not the value it points to (see
section 2.7.1). This would probably result in a very strange number of cycles through the
loop (if the loop ever terminated). As another example, consider the following CUBE
macro, which will determine the cube of a variable:
#define CUBE(x) (x)*(x)*(x)
This macro will work fine (although inefficiently) with CUBE ( i+j), since it would expand to (i+j)*(i+j)*(i+j). However, COBE(i++) expands to (i++)*(i++)
* ( i++) , resulting in i getting incremented three times instead of once. Toe resulting
value would bex(x+I)(x+2) not.x3.
Toe temary conditional operator (see section 2.4.3) can be used with macro definitions to make fast implementations of the absolute value of a variable (ABS), the minimum of two variables (IUN), the maximum of two variables (MAX), and the integer
rounded value of a floating-point variable (RotJND) as follows:
(((a) < O) ? (-a) : (a)
( ( (a)
(( (a)
> (b))
< (b))
lower case for variable and function names will be used in ali programs in this book and
on the accompanying disk.
A pointer is a variable that holds an address of some data, rather than the data itself. The
use of pointers is usually closely related to manipulating (assigning or changing) the elements of an array of data. Pointers are used primarily for three purposes:
ABS(a)
MAX(a,b)
MIN(a,b)
ROUND(a)
77
expands to
#define
#define
#define
#qefine
Sec. 2.7
? (a)
? (a):
(bl)
(b))
(2) To allow a program to create new variables while a program is executing (dynamic
memory allocation)
(3) To access different locations in a data structure
The first two uses of pointers will be discussed in this section; pointers to data structures
are considered in section 2.8.2.
2. 7 .1 Special Pointer Operators
Two special pointer operators are required to effectively manipulate pointers: the indirection operator (*) and the address of operator (&:). Toe indirection operator (*) is used
whenever the data stored at the address pointed to by a pointer is required, that is, whenever indirect addressing is required. Consider the following simple program:
rnain ()
(
int i, *ptr;
/*
/*
ptr = &i;
/*
printf("\n%d",i);
printf("\n%d", *ptr);
*ptr = 11;
/*
printf("\n%d %d",*ptr,i);
i = 7;
Note that each of the above macros is enclosed in parentheses so that it can be used freely
in expressions without uncertainty about the order of operations. Parentheses are also required around each of the macro parameters, since these may contain operators as well as
simple variables.
Ali of the macros defined so far have names that contain only capital letters. While
this is not required, it does make it easy to separate macros from normal C keywords
in programs where macros may be defined in one module and included (using the
#include directive) in another. This practice of capitalizing ali macro names and using
This program declares that i is an integer variable and that ptr is a pointer to an integer
variable. Toe program first sets i to 7 and then sets the pointer to the address of i by the
statement ptr=&:i;. Toe compiler assigns i and ptr storage locations somewhere in
memory. At run time, ptr is set to the starting address ofthe integer variable i. Toe above
program uses the function printf (see section 2.9.1) to print the integer value of i in two
different ways-by printing the contents of the variable i (printf ( "\n%d", i) ; ),
and by using the indirection operator (printf ( "\n%d n, *ptr) ; ). The presence of
the * operator in front of ptr directs the compiler to pass the value stored at the address
C Programming Fundamentais
78
Chap. 2
ptr to the printf function (in this case, 7). If only ptr were used, then the address assigned to ptr would be displayed instead of the value 7. Toe last two lines of the example illustrate indirect storage; the data at the ptr address is changed to 11. This results in
changing the value of i only because ptr is pointing to the address of i.
An array is essentially a section of memory that is allocated by the compiler and
assigned the name given in the declaration statement. ln fact, the name given is nothing
more than a fixed pointer to the beginning of the array. ln C, the array name can be used
as a pointer or it can be used to reference an element of the array (i.e., a [ 2] ). If a is declared as some type of array then * a and a [O] are exactly equivalent. Furthermore,
* (a+i) anda [i] are also the sarne (as longas i is declared as an integer), although
the meaning of the second is often more clear. Arrays can be rapidly and sequentially accessed by using pointers and the increment operator (++ ). For example, the following
three statements set the first 100 elements of the array a to 1O:
int *pointer;
pointer = a;
for(i = O; i < 100
i++) *pointer++
Sec. 2.7
function. The function then retums a pointer to a block of memory at least the size of the
item or items requested. ln order to make the use of the memory allocation functions
portable from one machine to another, the built-in compiler macro sizeof must be
used. For example:
int *ptr;
ptr = (int *) rnalloc(sizeof(int));
allocates storage for one integer and points the integer pointer, ptr, to the beginning of
the memory block. On 32-bit machines this will be a four-byte memory block (or one
word) and on 16-bit machines (such as the IBM PC) this will typically be only two bytes.
Because malloc (as well as calloc and realloc) retums a character pointer, it must
be cast to the integer type of pointer by the (int *) cast operator. Similarly, calloc
anda pointer, array, can be used to define a 25-element integer array as follows:
int *array;
array = (int *) calloc(25,sizeof(int));
= 10;
On many computers this code will execute faster than the single statement for ( i=O;
i<lO O; i++) a [i) =10;, because the post increment of the pointer is faster than the
array ndex calculation.
79
This statement will allocate an array of 25 elements, each of which is the size of an int
on the target machine. The array can then be referenced by using another pointer (changing the pointer array is unwise, because it holds the position of the beginning of the allocated memory) or by an array reference such as array[i] (where i may be from O to
24). The memory block allocated by calloc is also initialized to zeros.
Malloc, calloc, and free provide a simple general purpose memory allocation
package. Toe argument to free (castas a character pointer) is a pointer to a block previously allocated by malloc or calloc; this space is made available for further allocation, but its contents are left undisturbed. Needless to say, grave disorder will result if the
space assigned by malloc is overrun, or if some random number is handed to free.
Toe function free has no retum value, because memory is always assumed to be happily given up by the operating system.
Realloc changes the size of the block previously allocated to a new size in bytes
and retums a pointer to the (possibly moved) block. Toe contents of the old memory
block will be unchanged up to the Jesser of the new and old sizes. Realloc is used less
than calloc and malloc, because the size of an array is usually known ahead of time.
However, if the size of the integer array of 25 elements allocated in the Jast example must
be increased to 100 elements, the following statement can be used:
array
Note that unlike calloc, which takes two arguments (one for the number of items and
one for the item size), realloc works similar to malloc and takes the total size of the
array in bytes. lt is also important to recognize that the following two statements are not
equivalent to the previous realloc statement:
free((char *)array);
array = (int *) calloc(lOO,sizeof(int));
C Programming Fundamentais
80
Chap. 2
These statements do change the size of the integer array from 25 to 100 elements, but do
not preserve the contents of the first 25 elements. ln fact, calloc will initialize ali 10()
integers to zero, while realloc will retain the first 25 and not set the remaining 75
array elements to any particular value.
Unlike free, which retums no value, malloc, realloc, and calloc retum a
null pointer (0) if there is no available memory or if the area has been corrupted by storing outside the bounds of the memory block. When realloc retums O, the block
pointed to by the original pointer may be destroyed.
2. 7 .3 Arrays of Pointers
Any of the C data types or pointers to each of the data types can be declared as an array.
Arrays of pointers are especially useful in accessing large matrices. An array of pointers
to 10 rows each of 20 integer elements can be dynamically allocated as follows:
int *mat[lO];
int i;
for(i =O; i < 10 ; i++) {
mat[i] = (int *)calloc(20,sizeof(int));
i f ( !mat [i]) {
printf("\nError in matrix allocation\n");
exit(l);
Sec. 2.7
81
a [ i] [ j ] . For exarnple, the product of two 100 x 100 matrices could be coded using
two-dmensonal array references as follows:
/* 3 matrices */
/* inclices */
Toe sarne matrix product could also be performed using arrays of pointers as follows:
int a[lOO] (100] ,b[lOO] (100] ,c[lOO] (100];
int *aptr,*bptr,*cptr;
int i,j,k;
/* 3 matrices */
/* pointers to a,b,c */
/*indicies*/.
ln this code segment, the array of 10 integer pointers is declared and then each pointer is
set to 10 different memory blocks allocated by 10 successive calls to calloc. After each
call to calloc, the pointer must be checked to insure that the memory was available
(lmat [i] will be true ifmat [i] is null). Each element in the matrix mat can now be
accessed by using pointers and the indirection operator. For exarnple, * (mat [ i J + j)
gives the value of the matrix element at the ith row (0-9) and the j th column (0-19) and
is exactly equivalent to mat [i] [j J. ln fact, the above code segment is equivalent (in
the way mat may be referenced at Ieast) to the array declaration int mat (10) (20] ;,
except that mat (10] (20] is allocated as an auto variable on the stack and the above
calls to calloc allocates the space for mat on the heap. Note, however, that when mat
is allocated on the stack as an auto variable, it cannot be used with free or realloc
and may be accessed by the resulting code in a completely different way.
Toe calculations required by the compiler to access a particular element in a twodmensional matrix (by using matrix til [j], for exarnple) usually take more instructions and more execution time than accessing the sarne matrix using pointers. This is especially true if many references to the sarne matrix row or column are required. However,
depending on the compiler and the speed of pointer operations on the target machine, access to a two-dimensional array with pointers and simple pointers operands (even increment and decrement) may take almost the sarne time as a reference to a matrix such as
/*doe= a* b */
for(i =O; i < 100
i++) {
cptr = c[i];
bptr = b[O];
for(j =O; j < 100; j++) {
aptr = a[i];
*cptr = (*aptr++) * (*bptr++);
for(k = 1 ; k < 100; k++) {
*cptr += (*aptr++) * b(k] [j];
cptr++;
Toe latter form of the matrix multiply code using arrays of pointers runs 1O to 20 percent
faster, depending on the degree of optimization done by the compiler and the capabilities
of the target machine. Note that e [ i] and a [ i] are references to arrays of pointers each
pointing to 100 integer values. Three factors help make the prograrn with pointers faster:
(1) Pointer increments (such as * aptr++) are usually faster than pointer adds.
(2) No multiplies or shifts are required to access a particular element of each matrix.
82
C Programming Fundamentais
Chap. 2
(3) The first add in the inner most loop (the one involving k) was taken outside the
loop (using pointers aptr and bptr) and the initialization of e [i] [j] to zero
was removed.
2.8 STRUCTURES
Pointers and arrays allow the sarne type of data to be arranged in a list and easily accessed
by a program. Pointers also allow arrays to be passed to functions efficiently and dynamically created in memory. When unlike logically related data types must be manipulated,
the use of severa! arrays becomes cumbersome. While it is always necessary to process
the individual data types separately, it is often desirable to move ali of the related data
types as a single unit. The powerful C data construct called a structure allows new data
types to be defined as a combination of any number of the standard C data types. Once the
size and data types contained in a structure are defined (as described in the next section),
the named structure may be used as any of the other data types in C. Arrays of structures,
pointers to structures, and structures containing other structures may ali be defined.
One drawback of the user-defined structure is that the standard operators in C do not
work with the new data structure. Although the enhancements to C available with the C++
programming language do allow the user to define structure operators (see The C++
Programming Language, Stroustrup, 1986), the widely used standard C language does not
support such concepts. Thus, functions or macros are usually created to manipulate the
structures defined by the user. As an example, some of the functions and macros required
to manipulate structures of complex floating-point data are discussed in section 2.8.3.
2.8.1 Declaring and Referencing Structures
A structure is defined by a structure template indicating the type and name to be used to
reference each element listed between a pair of braces. The general form of an N-element
structure is as follows:
struct tag_name {
typel element_namel;
type2 element_name2;
typeN element_nameN;
} variable_name;
ln each case, typel, type2, ... , typeN refer to a valid C data type (char, int,
float, or double without any storage class descriptor) and element_name1,
element_name2, ... , element_nameN refer to the name of one of the elements
of the data structure. The tag_name is an optional name used for referencing the struc-
Sec. 2.8
Structures
83
ture !ater. The optional variable_name, or Iist of variable names, defines the names
of the structures to be defined. The following structure template with a tag name of
record defines a structure containing an integer called length, a float called
sample_rate, a character pointer called name, and a pointer to an integer array called
data:
struct record {
int length;
float sample_rate;
char *name;
int *data;
};
This structure template can be used to declare a structure called voice as follows:
struct record voice;
The structure called voice of type record can then be initialized as follows:
voice.length = 1000;
voice.sample_rate = 10.e3;
voice.name = voice signal";
The last element of the structure is a pointer to the data and must be set to the beginning
of a 1000-element integer array (because length is 1000 in the above initialization). Each
element of the structure is referenced with the form struct_name. element. Thus,
the 1000-element array associated with the voice structure can be allocated as follows:
voice.data= (int *) calloc(lOOO,sizeof(int));
Similarly, the other three elements of the structure can be displayed with the following
code segment:
printf("\nLength = %d",voice.length);
printf ( "\nSampling rate = %f", voice. sample_rate);
printf("\nRecord name = %s",voice.name);
A typedef statement can be used with a structure to make a user-defined data type and
make declaring a structure even easier. The typedef defines an altemative name for the
structure data type, but is more powerful than #define, since it is a compiler directive
as opposed to a preprocessor directive. An altemative to the record structure is a
typedef called RECORO as follows:
typedef struct record RECORD;
C Prograrnrning Fundamentais
84
Chap. 2
This statement essentially replaces ali occurrences of RECORD in the program with the
struct record definition thereby defining a new type of variable called RECORl)
that may be used to define the voice structure with the simple statement RECORD voice;.
Toe typedef statement and the structure definition can be combined so that the
tag name record is avoided as follows:
typedef struct
int length;
float sarople_rate;
char *name;
int *data;
} RECORD;
ln fact, the typedef statement can be used to define a shorthand form of any type of
data type including pointers, arrays, arrays of pointers, or another typedef. For example,
typedef char STRING[80];
allows 80-character arrays to be easily defined with the simple statement STRING
namel,name2;. This shorthand form using the typedef is an exact replacement for
the statement char name 1 [ 8 O] , name2 [ 8 O] ; .
Sec. 2.8
Structures
85
( *voice__ptr) . length could be used to give the length of the RECORD which was
pointed to by voice__ptr. Because this form of pointer operation occurs with structures
often in C, a special operator (->) was defined. Thus, voice__ptr->length is equivalent to ( *voice__ptr) . length. This shorthand is very useful when used with functions, since a local copy of a structure pointer is passed to the function. For example, the
following function will print the length of each record in an array of RECORD of Iength
size:
void print_record_length(RECORD *rec,int size)
{
int i;
for(i =O; i < size; i++) {
printf("\nLength of record %d= %d",i,rec_>length);
rec++;
Three complex numbers, x, y, and z can be defined using the above structure as follows:
RECORD *voices;
voices= (RECORD
*)
calloc(S,sizeof(RECORD));
The voices array can also be accessed by using a pointer to the array of structures. If
voice_ptr is a RECORD pointer (by declaring it with RECORD *voice__ptr; ), then
COMPLEX x,y,z;
ln order to perform the complex addition z = x + y without functions or macros, the following two C statements are required:
z.real =X.real+ y.real;
z.lllag = x.lllag + y.imag;
These two statements are required because the C operator + can only work with the individual parts of the complex structure and not the structure itself. ln fact, a statement involving any operator and a structure should give a compiler error. Assignment of any
structure (like z = x;) works just fine, because only data movement is involved. A simple function to perform the complex addition can be defined as follows:
C Programming Fundamentais
86
Chap. 2
/* pass by value */
This function passes the value of the a and b structures, forms the sum of a and b, and
then retums the complex summation (some compilers may not allow this method of passing structures by value, thus requiring pointers to each of the structures). The cadd function may be used to set z equal to the sum of x and y as follows:
z = cadd(x,y);
The sarne complex sum can also be performed with a rather complicated single line
macro defined as follows:
#define CADD(a,b)\
(C_t.real=a.real+b.real,C_t.irnag=a.imag+b.irnag,C_t)
This macro can be used to replace the cadd function used above as follows:
COMPLEX C_t;
z = CADD(z,y);
This CADD macro works as desired because the macro expands to three operations separated
by commas. The one-line macro in this case is equivalent to the following three statements:
c_t.real = x.real + y.real;
c_t.irnag = x.imag + y.real;
z = c_t;
The first two operations in the macro are the two sums for the real and imaginary parts.
The sums are followed by the variable C_t (which must be defined as COMPLEX before
using the macro). The expression formed is evaluated from left to right and the whole expression in parentheses takes on the value of the last expression, the complex structure
C_t, which gets assigned to z as the last statement above shows.
The complex add macro CADD will execute faster than the cadd function because
the time required to pass the complex structures x and y to the function, and then pass the
sum back to the calling prograrn, is a significant part of the time required for the function
call. Unfortunately, the complex add macro cannot be used in the sarne manner as the
function. For example:
COMPLEX a,b,c,d;
d= cadd(a,cadd(b,c));
Sec. 2.9
87
will form the sum d=a+b+c; as expected. However, the sarne formal using the CADD
macro would cause a compiler error, because the macro expansion performed by the C
preprocessor results in an illegal expression. Thus, the CADD may only be used with simple single-variable arguments. If speed is more important than ease of programming, then
the macro form should be used by breaking complicated expressions into simpler twooperand expressions. Numerical C extensions to the C language support complex numbers in an optimum way and are discussed in section 2.10. l.
The following sections describe some of the more common errors made by prograrnmers
when they first start coding in C and give a few suggestions how to avoid them.
2.9.1 Array lndexing
ln C, all array indices start with zero rather than one. This makes the last ndex of a N
long array N-1. This is very useful in digital signal processing, because many of the expressions for filters, z-transforms, and FFfs are easier to understand and use with the
index starting at zero instead of one. For exarnple, the FFf output for k =O gives the zero
frequency (DC) spectral component of a discrete time signal. A typical indexing problem
is illustrated in the following code segment, which is intended to determine the first 10
powers of 2 and store the results in an array called power2 :
int power2 [10);
int i,p;
p
= l;
for (i = 1; i<= 10
power2[il = p;
p = 2*p;
i++) {
This code segment will compile well and may even run without any difficulty. The problem is that the for loop ndex i stops on i=10, and power2 [ 10 J is not a valid ndex
to the power2 array. Also, the for loop starts with the index l causing power2 [O] to
not be initialized. This results in the first power of two (2, which should be stored in
power2 [ OJ) to be placed in power2 [ 1 J. One way to correct this code is to change the
for loop to read for { i = O; i<10; i++), so that the ndex to power2 starts at O
and stops at 9.
2.9.2 Failure to Pass-by-Address
This problem is most often encountered when first using scanf to read in a set of variables. If i is an integer (declared as int i; ), then a statement like scanf {"%d", i);
is wrong because scanf expects the address of (or pointer to) the location to store the
C Programming Fundamentais
88
Chap. 2
integer that is read by scanf. The correct statement to read in the integer i is
scanf ( "%d", &:i) ; , where the address of operator (&:) was used to point to the address
of i and pass the address to scanf as required. On many compilers these types of errors
can be detected and avoided by using function prototypes (see section 2.5.3) for ali user
written functions and the appropriate include files for ali C library functions. By using
function prototypes, the compiler is informed what type of variable the function expects
and will issue a warning if the specified type is not used in the calling program. On many
UNIX systems, a C program checker called LINT can be used to perform parameter-type
checking, as well as other syntax checking.
2.9.3 Misusing Pointers
Because pointers are new to many programmers, the misuse of pointers in C can be particularly difficult, because most C compilers will not indicate any pointer errors. Some
compilers issue a warning for some pointer errors. Some pointer errors will result in the
programs not working correctly or, worse yet, the program may seem to work, but will
not work with a certain type of data or when the program is in a certain mode of operation. On many small single-user systems (such as the IBM PC), misused pointers can easily result in writing to memory used by the operating system, often resulting in a system
crash and requiring a subsequent reboot.
There are two types of pointer abuses: setting a pointer to the wrong value (or not
initializing it at ali) and confusing arrays with pointers. Toe following code segment
shows both of these problems:
char *string;
char msg[lO];
int i;
printf("\nEnter title");
scanf ( "%s" , string) ;
i = O;
while ( *string ! =
')
i++;
string++;
msg="Title - ,
printf ( "%s %s %d before space" , msg, string, i) ;
-
li.
The first three statements declare that memory be allocated to a pointer variable called
string, a 10-element char array called msg and an integer called i. Next, the user is
asked to enter a title into the variable called string. The while loop is intended to
search for the first space in the string and the last printf statement is intended to display the string and the number of characters before the first space.
There are three pointer problems in this program, although the program will compile with only one fatal error (and a possible warning). The fatal error message will reference the msg=Title ="; statement. This line tells the compiler to set the address of
the msg array to the constant string "Title =". This is not allowed so the error
Sec. 2.9
89
"Lvalue required" (or something less useful) will be produced. Toe role of an array anda
pointer have been confused and the msg variable should have been declared as a pointer
and used to point to the constant string "Title =, which was already allocated storage by the compiler.
Toe next problem with the code segment is that scanf will read the string into the
address specified by the argument string. Unfortunately, the value of string at execution time could be anything (some compilers will set it to zero), which will probably
not point to a place where the title string could be stored. Some compilers will issue a
warning indicating that the pointer called string may have been used before it was defined. Toe problem can be solved by initializing the string pointer to a memory area allocated for storing the title string. Toe memory can be dynamically allocated by a simple
call to calloc as shown in the following corrected code segment:
char *string,*msg;
int i;
string=calloc(80,sizeof(char));
printf("\nEnter title");
scanf ( "%s" , string) ;
i = O;
while(*string !=
')
i++;
string++;
msg="Title =
printf("%s %s %d before space",msg,string,i);
1
';
Toe code will now compile and run but will not give the correct response when a title
string is entered. ln fact, the first characters of the title string before the first space will
not be printed because the pointer string was moved to this point by the execution of
the while loop. This may be useful for finding the first space in the while loop, but results in the address of the beginning of the string being lost. It is best not to change a
pointer which points to a dynamically allocated section of memory. This pointer problem
can be fixed by using another pointer (called cp) for the while loop as follows:
char *string,*cp,*msg;
int i;
string=calloc(80,sizeof(char));
printf("\nEnter title");
scanf ( "%s", string);
i = O;
cp = string;
while(*cp != ' ') {
i++;
cp++;
msg="Title =";
printf ( "%s %s %d before space", msg, string, i);
C Programming Fundamentais
90
Another problem with this program segment is that if the string entered contains
spaces, then the while loop will continue to search through memory until it finds
space. On a PC, the program will almost always find a space (in the operating system
haps) and will set i to some large value. On larger multiuser systems, this may result i
fatal run-time error because the operating system must protect memory not allocatect
the program. Although this programming problem is not unique to C, it does illustrate
important characteristic of pointers-pointers can and will point to any memory locati,
without regard to what may be stored there.
10 NUMERICAL C EXTENSIONS
Some ANSI C compilers designed for DSP processors are now available with numeric
extensions. These language extensions were developed by the ANSI NCEG (Numeric
Extensions Group), a working committee reporting to ANSI X3Jll. This section gives
overview of the Numerical C language recommended by the ANSI standards commi
Numerical C has severa! features of interest to DSP programmers:
(1) Fewer lines of code are required to perform vector and matrix operations.
(2) Data types and operators for complex numbers (with real and imaginary compo.;}
nents) are defined and can be optimized by the compiler for the target processod,
This avoids the use of structures and macros as discussed in section 2.8.3.
,,
(3) Toe compiler can perform better optimizations of programs containing iterati,
which allows the target processor to complete DSP tasks in fewer instruction cycles.1,
Toe real and imaginary parts of the complex types each have the sarne representations
the type defined without the complex keyword. Complex constants are represented as
Sec. 2.10
Numerical C Extensions
91
sum of a real constant and an imaginary constant, which is defined by using the suffix i
after the imaginary part of the number. For example, initialization of complex numbers is
performed as follows:
short int complex i = 3 + 2i;
float complex x[3] = {1.0+2.0i, 3.0i, 4.0};
Toe following operators are defined for complex types: &: (address of), * (point to complex number), + (add), - (subtract), * (multiply), / (divide). Bitwise, relational, and logical operators are not defined. If any one of the operands are complex, the other
operands will be converted to complex, and the result of the expression will be
complex. The creal and cimag operators can be used in expressions to access the
real or imaginary part of a complex variable or constant. The conj operator returns the
complex conjugate of its complex argument. Toe following code segment illustrates these
operators:
float complex a,b,c;
creal(a)=l.O;
cimag(a)=2.0;
creal(b)=2.0*cimag(a);
cimag(b)=3.0;
c=conj (b);
/* e will be 4 - 3i */
Toe sum operator can be used to represent the sum of values computed from values of an
iterator. Toe argument to sum must be an expression that has a value for each of the iterated variables, and the order of the iteration cannot change the result. Toe following code
segment illustrates the sum operator:
float a[lO] ,b[lO] ,c[lO] ,d[lO] [10] ,e[lO] (10], f[lO] (10];
float s;
iter I=lO, J=lO, K=lO;
s=sum(a[I]);
/* computes the sum of a into s */
C Programming Fundamentais
92
sec. 2.11
b[J]=sum(a[I]);
93
The four measures of software quality (reliability, maintainability, extensibility, and efficiency) are rather difficult to quantify. One almost has to try to modify a program to find
out if it is maintainable or extensible. A program is usually tested in a finite number of
ways much smaller than the millions of input data conditions. This means that a program
can be considered reliable only after years of bug-free use in many different environments.
Programs do not acquire these qualities by accident. lt is unlikely that good programs will be intuitively created just because the programmer is clever, experienced, or
uses lots of comments. Even the use of structured-programming techniques (described
briefly in the next section) will not assure that a program is easier to maintain or extend.
It is the author's experience that the following five coding situations will often lessen the
software quality of DSP programs:
(1)
(2)
(3)
(4)
(5)
An oversizedfunction (item 1) might be defined as one that exceeds two pages of source
Iisting. A function with more than one purpose lacks strength. A function with one clearly
defined purpose can be used by other programs and other programmers. Functions with
many purposes will find limited utility and limited acceptance by others. All of the functions described in this book and contained on the included disk were designed with this
important consideration in mind. Functions that have only one purpose should rarely exceed one page. This is not to say that ali functions will be smaller than this. ln timecriticai DSP applications, the use of in-line code can easily make a function quite long
but can sometimes save precious execution time. It is generally true, however, that big
programs are more difficult to understand and maintain than small ones.
A main program that does not use functions (item 2) will often result in an extremely long and hard-to-understand program. Also, because complicated operations
often can be independently tested when placed in short functions, the program may be
easier to debug. However, taking this mie to the extreme can result in functions that are
tightly bound to the main program, violating item 3. A function that is tightly bound to
the rest of the program (by using too many global variables, for example) weakens the
entire program. lf there are lots of tightly coupled functions in a program, maintenance
becomes impossible. A change in one function can cause an undesired, unexpected
change in the rest of the functions.
Clever programming tricks (item 4) should be avoided at all costs as they will often
not be reliable and will almost always be difficult for someone else to understand (even
with lots of comments). Usually, if the program timing is so close that a trick must be
94
C Programming Fundamentais
used, then the wrong processor was chosen for the application. Even if the progr:
trick solves a particular timing. pr?blem, as soo~ as the syst~m requirements chang~
they almost always do), a new timmg problem w1thout a solut1on may soon develop. ~.
A program that does not use meaningful variables and comments (item 5) is
anteed to be very difficult to maintain. Consider the following valid C program:
main(){int _o_oo_,_ooo;for(_o_oo_=2;;_o_o_++)
{for(_ooo_=2;_o_oo_%_ooo_!=O;_ooo_++;
if (_ooo_==_o_oo_)printf ("\n%d" ,_o_oo_);}}
Even the most experienced C programmer would have difficulty determining what
three-line program does. Even after running such a poorly documented program, it
be hard to determine how the results were obtained. Toe following program does ex.
the sarne operations as the above three lines but is easy to follow and modify:
main()
{
int prime_test,divisor;
/* The outer for loop trys all numbers >1 and the inner
lt is easy for anyone to discover that the above well-documented program prints a
prime numbers, because the following three documentation rules were followed:
listl
(1) V ariable names that are meaningful in the context of the program were used. A
variable names such as x,y,z or i,j,k, unless they are simple indexes used ij
very obvious way, such as initializing an entire array to a constant.
(2) Comments preceded each major section of the program (the above program -,
has one section). Although the meaning of this short program is fairly clear wiffi!
the comments, it rarely hurts to have too many comments. Adding a blank Iine{
tween different parts of a program also sometimes improves the readability
program because the different sections of code appear separated from each oth~,
(3) Statements at different leveis of nesting were indented to show which control si ,,
ture controls the execution of the statements at a particular level. Toe author p:
to place the right brace ( () with the control structure (for, while, if, etc.)
place the left brace (}) on a separate line starting in the sarne column as the ~"
ning of the corresponding control structure. Toe exception to this practice i'
function declarations where the right brace is placed on a separate line after
gument declarations.
ant,
95
.Siructured programming has developed from the notion that any algorithm, no matter
how complex, can be expressed by using the prograrnming-control structures if-else,
while, and sequence. All prograrnming languages must contain some representation of
these three fundamental control structures. Toe development of structured prograrnming
'revealed that if a program uses these three control structures, then the logic of the pro.gram can be read and understood by beginning at the first statement and continuing
:downward to the last. Also, ali programs could be written without goto statements.
Generally, structured-programming practices lead to code that is easier to read, easier to
maintain, and even easier to write.
.
Toe C language has the three basic control structures as well as three additional
structured-prograrnming constructs called do-while, for, and case. Toe additional three
control structures have been added to C and most other modem languages because they
are convenient, they retain the original goals of structured programming, and their use
often makes a program easier to comprehend.
Toe sequence control structure is used for operations that will be executed once in a
function or program in a fixed sequence. This structure is often used where speed is most
important and is often referred to as in-line code when the sequence of operations are
identical and could be coded using one of the other structures. Extensive use of in-line
code can obscure the purpose of the code segment.
Toe if-else control structure in C is the most common way of providing conditional
execution of a sequence of operations based on the result of a logical operation. lndenting
of different leveis of if and else statements (as shown in the example in section 2.4.1)
\,)s not required; it is an expression of C programming style that helps the readability of the
. if-else control structure. Nested while and for loops should also be indented for improved readability (as illustrated in section 2.7.3).
Toe case control structure is a convenient way to execute one of a series of opera.tions based upon the value of an expression (see the example in section 2.4.2). It is often
used instead of a series of if-else structures when a large number of conditions are tested
Jmsed upon a common expression. ln C, the switch statement gives the expression to
ttest and a series of case statements give the conditions to match the expression. A
.,4efault statement can be optionally added to execute a sequence of operations if none
of the listed conditions are met.
Toe last three control structures (while, do-while, and for) all provide for repeating
, sequence of operations a fixed or variable number of times. These loop statements can
,,make a program easy to read and maintain. Toe while loop provides for the iterative ex:ecution of a series of statements as longas a tested condition is true; when the condition
!!> false, execution continues to the next statement in the program. Toe do-while con}rol structure is similar to the while loop, except that the sequence of statements is exe.cuted at least once. Toe for control structure provides for the iteration of statements
with automatic modification of a variable and a condition that terrninates the iterations.
)!'or loops are more powerful in C than most languages. C allows for any initializing
.tement, any iterating statement and any terrninating statement. Toe three statements do
Bibliography
97
C Programming Fundamentais
96
not need to be related and any of them can be a null statement or multiple statemeni
following three examples ofwhile, do-while, and for loops all calculate the
of two of an integer i (assumed to be greater than O) and set the result to k. The
loop is as follows:
k = 2; /* while loop k=2**i */
while(i > O) {
k = 2*k;
i--;
= 2**i
*/
FEUER, A.R. (1982). The C Puzzle Book. Englewood Cliffs, NJ: Prentice Hall.
J{ERNIGHAM, B. and PLAUGER, P. (1978). The Elements of Programming Style. New York:
McGraw-Hill.
J{ERNIGHAN, B.W. and RITCHIE, D.M. (1988). The C Programming mguage (2nd ed.).
Englewood Cliffs, NJ: Prentice Hall.
PRATA, S. (1986). Advanced C Primer++. Indianapolis, IN: Howard W. Sams and Co.
PURDUM, J. and LESLIE, T.C. (1987). C Standard Library. Indianapolis, IN: Que Co.
ROCHKIND, MJ. (1988). Advanced C Programming for Displays. Englewood Cliffs, NJ: Prentice
Hall.
STEVENS, A. (1986). C Development Toolsfor the IBM PC. Englewood Cliffs, NJ: Prentice Hall.
STROUSTRUP, B. (1986). The C++ Programming mguage. Reading, MA: Addison-Wesley.
WAITE, M., PRATA,
andCo.
after displaying an error message. Such an error-related exit is performed by calling the C
Jibrary function exit (n) with a suitable error code, if desired. Similarly, many of the
functions have more than one return statement as this can make the logic in a function
much easier to program and in some cases more efficient.
=2
; i > 1 ; i--)
/* for loop k=2**i */
= 2*k;
Which forro of loop to use is a personal matter. Of the three equivalent code se.
shown above, the for loop and the while loop both seem easy to understand and w,
probably be preferred over the do-while construction.
Toe C language also offers severa! extensions to the six structured pro
control structures. Among these are break, continue, and goto (see section 2.4<\'
Break and continue statements allow the orderly interruption of events that are e;
cuting inside of loops. They can often make a complicated loop very difficult to foi! "
because more than one condition may cause the iterations to stop. Toe infamous gc>.1
statement is also included in C. Nearly every language designer includes a goto s
ment with the advice that it should never be used along with an example of whe
might be useful.
Toe program examples in the following chapters and the programs contained on
enclosed disk were developed by using structured-programming practices. Toe code
be read from top to bottom, there are no goto statements, and the six accepted con:
structures are used. One requirement of structured programming that was not ad<
throughout the software in this book is that each program and function have only
entry and exit point. Although every function and program does have only one
point (as is required in C), many of the programs and functions have multiple exit poi
Typically. this is done in order to improve the readability of the program. For exam:
error conditions in a main program often require terrninating the program prema:
Sec. 3.1
CHAPTER
DSP
MICROPROCESSORS
IN EMBEDDED SVSTEMS
Toe tenn embedded system is often used to refer to a processor and associated circuits
required to perform a particular function that is not the sole purpose of the overall system. For example, a keyboard controller on a computer system may be an embedded
system if it has a processor that handles the keyboard activity for the computer system.
ln a similar fashion, digital signal processors are often embedded in larger systems to
perform specialized DSP operations to allow the overall system to handle general purpose tasks. A special purpose processor used for voice processing, including analog-todigital (A/D) and digital-to-analog (D/A) converters, is an embedded DSP system when
it is part of a personal computer system. Often this type of DSP runs only one application (perhaps speech synthesis or recognition) and is not prograrnmed by the end user.
The fact that the processor is embedded in the computer system may be unknown to
the end user.
A DSP' s data format, either fixed-point or floating-point, determines its ability
to handle signals of differing precision, dynamic range, and signal-to-noise ratios.
Also, ease-of-use and software development time are often equally important when
deciding between fixed-point and floating-point processors. Floating-point processors
are often more expensive than similar fixed-point processors but can execute more
instructions per second. Bach instruction in a floating-point processor may also be
more complicated, leading to fewer cycles per DSP function. DSP microprocessors can
be classified as fixed-point processors if they can only perform fixed-point multiplies and adds, or as floating-point processors if they can perform floating-point operations.
98
99
The precision of a particular class of A/D and D/A converters (classified in terms
of cost or maximum sampling rate) has been slowly increasing at a rate of about one
bit every two years. At the sarne time the speed (or maximum sampling rate) has also
been increasing. The dynamic range of many algorithms is higher at the output than at
the input and intermediate results are often not constrained to any particular dynarnic
range. This requires that intermediate results be scaled using a shift operator when a
fixed-point DSP is used. This will require more cycles for a particular algorithm in
fixed-point than on an equal floating-point processor. Thus, as the A/D and D/A requirements for a particular application require higher speeds and more bits, a
fixed-point DSP may need to be replaced with a faster processor with more bits. Also,
the fixed-point program may require extensive modification to accommodate the greater
precision.
ln general, floating-point DSPs are easier to use and allow a quicker time-tomarket than processors that do not support floating-point formats. The extent to which
this is true depends on the architecture of the floating-point processor. High-Ievel language programmability, large address spaces, and wide dynamic range associated with
floating-point processors allow system development time to be spent on algorithms and
signal processing problems rather than assembly coding, code partitioning, quantization
error, and scaling. ln the remainder of this chapter, floating-point digital signal processors and the software required to develop DSP algorithms are considered in more detail.
This section describes the general properties of the following three floating-point DSP
processor families: AT&T DSP32C and DSP3210, Analog Devices ADSP-21020 and
ADSP-21060, and Texas Instruments TMS320C30 and TMS320C40. The information
was obtained from the manufacturers' data books and manuais and is believed to be an
accurate summary of the features of each processor and the development tools available.
Detailed information should be obtained directly from manufacturers, as new features are
constantly being added to their DSP products. The features of the three processors are
summarized in sections 3.1.1, 3.1.2, and 3.1.3.
The execution speed of a DSP algorithm is also important when selecting a processor. Various basic building block DSP algorithms are carefully optimized in assembly
language by the processor's manufacturer. The time to complete a particular algorithm is
often called a benchmark. Benchmark code is always in assembly language (sometimes
without the ability to be called by a C function) and can be used to give a general measure of the maximum signal processing performance that can be obtained for a particular
processor. Typical benchmarks for the three floating-point processor families are shown
in the following table. Times are in microseconds based the on highest speed processor
available at publication time.
100
DSP32C
DSP3210
ADSP21020
ADSP21060
TMS320C30
20
40
20
30
38629
cycles
161311 *
19245
2016.4
187*
481.13
FIR Filter
time
cycles
40457
2022.85
44
45
(35 Taps)
time
2.34
1.1
2.25
4x4 *4xl
..
Aeo-,t.821
1024 Complex
Matrix Multiply
TMS320C.;-
101
Maximum Instruction
Cycle Speed (M!Ps)
IIR Filter
(2 Biquads)
Sec. 3.1
Chap. 3
cycles
85*
time
cycles
1.06
80*
time
1.0
14
0.35
24
0.6
PARC16
PARE78
-1
lR2 16
IR'16
1287.63
42
1.15
1.4
21
0.7
58
2.9
1.23
23
37
"
-
10
16
SR!Bl
W 16
8
DI'
l(l).
[)800--DB31
*Cycle counts for DSP32C and DSP32 I Oare clock cycles including wait states ( 1 instruction = 4 clock cycles).
~,,,
(32)
FLOATING- '
POINT
JOAl
POINT
AOOER
(40)
N>--l,;3(40)
~I
,----;;s--,
_1
..----.
ALU 16124
MULTIPUER
(40)
I D,\UCI
PC(24)
R1---R14 (24)
R15-R19 (24)
PIN (24)
POUT(24)
IVTP (24)
I.EGEND':
~
ALU
CAU
DAU
DAUC
EIIR
ESR
IBUF
Accumulalon 0-3
Arilhmetic logic IMlk
Con1n>I ariflmalie r i
Data arithmetic IMlk
DAU control register
Enor mask regisler
Enor - registar
Input buller
IOC
lnputloutput conlJol register
IR
lnslruclion regisler
IR1--IR4 lnslruclion regisler pipeine
ISR
IVTP
OBUF
OSR
PAR
PARE
PC
PCR
PCW
POR
Program counler
PIO control regisler
Processar conlrol word
PIO data register
PDR2
PIN
PIO
PIOP
PIR
FIGURE 3.1
102
Chap. 3
Sec. 3.1
103
KEY FEATURES
A2-KJ1
c;SN. RW, LOCKN
pSOISLMN. ASN
BAN,BGACKN
00-001
SERRN, SRDYN
OLE.BGN
AEN.t.tRN
OENIMWN
RAM
RAM
1Kx32
1Kx32
OI
DO
CK.ILD,OCK,
OLD,SY
~LU 16132
BIOO-BI07
BARREL SHIFTER16132
(32)(32~
pclpcsh (32)
(40,...)- ~ - ~ - - .
HARDWARE
CK~ RESTN, ZN
IRON-fl1N
CONTROL
sp(32)
FLOATNG
evtp(32)
POINT
IACKO-IACK1
c..a,1
A~R
(40)
Voo---.
Figure 3.2 shows a block diagram of the DSP3210 microprocessor manufactured by
AT &T. Toe following is a brief description of this processor provided by AT&T.
r0-<14
r15--r20 (32)
CONVERSIONS
a0-.:3 (40)
Vss---.
LEGEND:
CAU
DAU
RAM
ROM
FIGURE 3.2
SIO
TSC
DMAC
MMIO
Serial Input/Output
Timer/Status/Conlrol
DMA Controller
Memory Mapped LO
104
Chap, 3
Microprocessor bus compatibility (Toe DSP32 IO is designed for efficient bus master
designs. This allows the DSP3210 to be easily incorporated into microprocessorbased designs
32-bit, byte-addressable address space allowing the DSP3210 and a microprocessor
to share the sarne address space and to share pointer values.
Retry, relinquish/retry, and bus error support
Page mode DRAM support
Direct support for both Motorola and Intel signaling
AT&T DSP3210 FAMILY HARDWARE DEVELOPMENT SYSTEM DESCRIPTION
The MP3210 implements one or two AT&T DSP3210 32-bit floating-point DSPs with a
comprehensive mix of memory, digital 1/0 and professional audio signal 1/0. The
MP3210 holds up to 2 Mbytes of dynarnic RAM (DRAM). The DT-Connect interface enables real-time video 1/0. MP3210 systems include: the processor card; C Host drivers
with source code; demos, exarnples and utilities, with source code; User's Manual; and
the AT&T DSP3210 Information Manual. DSP3210 is the low-cost Multimedia
Processor of Choice. New features added to the DSP3210 are briefly outlined below.
Q,
~ l
i!l
!ii;~
0
Q, o
~~3
i= w ,e
lii
I!!~
e,:::,
~ili
,e !S
"'
!I~
ui
a,
u
'>a,
o
e,
e:
<(
~
:,
o
o
(/)
(/)
a,
DSP3210 FEATURES
Speeds up to 33 MFLOPS
2k x 32 on-chip RAM
Full, 32-bit, floating-point
Ali instructions are single cycle
Four memory accesses per
instruction cycle
Microprocessor bus compatibility
32-bit byte-addressable designs.
Retry, relinquish/retry
error support
BootROM
Page mode DRAM support
Directly supports 680XO
and 80X86 signaling
USER BENEFITS
o
o
Cl..
a:
"'l!!wi
.....
ci li: <O
w a:
(/)
<(
.....o
E
f!
e,
Higher performance.
Designed for efficient bus master.
This allows the DSP32 IO address space to
easily be incorporated into P
bus-based designs. The 32-bit, byteaddressable space allows the
DSP32 IO and a P to share the sarne
address space and to share pointer
values as well.
i
l
.,
'
"'
:::,
Ili
i!l
..><
......
zz
O
wwl-
e,;
w
a:
Q, Q, a:
I')
~~~
:::,
~~B
i= a. (J
,os
a:::E
~ !:i ,e
_.:::,
l&.::li
Sec. 3.1
t::
_g
:
CJ
oll
-,
t-
.E
j!
::,
a:
....w
e( ....
!!!a:
:::iQ d!zi~
u
O )l:>018
"'
IOA
...: .:!
i~
e:
li
~~
:::;
r!
"'a:w ....,."'!
QI
]~
o-!.
D.
!lo:
ii:
w
EPA
D.
DMD
PMD
~~
!gg;I
lii
- e,
w
a:
Q)
1
-!e. ...o :;o"
....e( ~
~)(:><na
.;
I!a:
o.-
~e: -!f 15
ll El Q!
Q)
O)
O)
CI)
(.)
E
a.
O)
o
e
.;
<(
g .
:,.
Q)
t'.
:,
...
.,.,o
"e
CI)
...
Q.
IC
o
<D
o
N
ri.
li)
o<(
...o
E
E
..
O)
'
with the core processor. Toe following is a brief description of these processors provided
by Analog Devices.
Toe ADSP-210XX processors provide fast, flexible arithmetic computation units,
unconstrained data flow to and from the computation units, extended precision and dynamic range in the computation units, dual address generators, and efficient program sequencing. All instructions execute in a single cycle. lt provides one of the fastest cycle
times available and the most complete set of arithmetic operations, including seed 1/x,
min, max, clip, shift and rotate, in addition to the traditional multiplication, addition, subtraction, and combined addition/subtraction. lt is IEEE floating-point compatible and allows interrupts to be generated by arithmetic exceptions or latched status exception handling .
Toe ADSP-210XX has a modified Harvard architecture combined with a 10-port
data register file. ln every cycle two operands can be read or written to or from the register file, two operands can be supplied to the ALU, two operands can be supplied to the
multiplier, and two results can be received from the ALU and multiplier. Toe processor's
48-bit orthogonal instruction word supports fully parallel data transfer and arithmetic operations in the sarne instruction.
Toe processor handles 32-bit IEEE floating-point format as well as 32-bit integer
and fractional formats. lt also handles extended precision 40-bit IEEE floating-point formats and carries extended precision throughout its computation units, limiting data truncation errors .
Toe processor has two data address generators (DAGs) that provide immediate or
indirect (pre- and post-modify) addressing. Modulus and bit-reverse addressing operations are supported with no constraints on circular data buffer placement. ln addition to
zero-overhead loops, the ADSP-210XX supports single-cycle setup and exit for loops.
Loops are both nestable (six leveis in hardware) and interruptable. Toe processor supports both delayed and nondelayed branches. ln summary, some of the key features of the
ADSP-210XX core processor follow:
-"
e(
:E
Q
ffl I
li
I!
:,
'D
'D
:E
N:1
e,
"
e( ...
Q"
..
_..,
N
e,
"
e( ...
..
Q"
106
o:
li
...:E
:E
...:E :Ee
:,
li
Q
:,
s..
Q
Q
"
o
i
"':
(')
w
a:
::::,
e,
iI
107
108
e:
a,
E
(1)
Figure 3.5 shows a block diagram of the TMS320C30 microprocessor and Figure 3_6
shows the TMS320C40, both manufactured by Texas Instruments (Houston, TIC). 1'h
TMS320C30 and TMS320C40, processors are similar in architecture except that
TMS320C40 provides hardware support for multiprocessor configurations. The following
is a brief description of the TMS320C30 processor as provided by Texas Instruments.
Toe TMS320C30 can perform parallel multiply and ALU operations on integer or
floating-point data in a single cycle. The processor also possesses a general-purpose
register file, program cache, dedicated auxiliary register arithmetic units (ARAU), internai dual-access memories, one DMA channel supporting concurrent 1/0, and a short
machine-cycle time. High performance and ease of use are products of these features.
General-purpose applications are greatly enhanced by the large address space, multiprocessor interface, intemally and extemally generated wait states, two externai interface ports, two timers, two serial ports, and multiple interrupt structure. High-level Janguage is more easily implemented through a register-based architecture, large address
space, powerful addressing modes, flexible instruction set, and well-supported floatingpoint arithmetic. Some key features of the TMS320C30 are listed below.
(1)
"'a,X
1-
th:
>(1)
I!
lj i
(1)
(1)
IJ
a,
<..>
e.
;;;
u
N
(')
U)
i.:::,
lo.
:sf:.i! .
1-
,,_
li 11
lj
---t,
]1
lcf}
a:
:e
a,
t:
:,
'i
8':::,
- a:'fi :E
I
~~i~~i
"O
e:
"'
(')
u
o
e,
(')
U)
j.
Jj
CID
1 ..
ef
ti
CID
CII
,,_
rrrrrrrrrr
,,;i
Ili
];!j!j!
e,
JellOJlUOO
l 1
1-
"'C>
.!!!
"O
-"'
<..>
5
li)
a:
:::,
e,
rc
H~ijxp~
110
Chap. 3
Sec. 3.2
111
a,:.otAddlWlllle~
1
,...
11ax1111
.....
oea1..q
-- .
AC30::!ll
,1
!I
I ffliio.1
STA~
WWW
1 1 WWW
i i
, ,---~-- 11 ,. -- ~-. 1 1
a
a
a
C......,Wlll100
...,.... DIIII Trlllllll'
;n
;n
anu..:a
a
....
l.ocllbtor
............
......... O.,nlDMA
Mor,,._0,-.
tlana
* .......
u.
'X
LE
~1
I.CE0,1
l!!l!0,1
CE0,1
--
- . - .
' l " -~
.:
x,
1 1 r 1
ll2ICIJCIN
TCIJCD
TCU<1
'm
...
FIEIE11.D!o!!.1
..
RIW0,1
FE'(0,1
tr&IMIJ
WM,,1
~,
!,!!!!'1'11.1
........
Ois
""'*
........,a... ....
Dlf
JTAG"Alatl'lnllor
1 ........
Ili ll!l::tlm
......... Adlnalllll
l)m~I~
FIGURE 3.6
r;l'
IACK
Plnlll;
~ ~ 1ln1
-..i
.......
cc.c4
IICF.ll:91
Geiw-. lwo . - . . . -
.... /_ LD(a1-G9
The manufacturers of DSP microprocessors typically provide a set of software tools designed to enable the user to develop efficient DSP algorithms for their particular processors. The basic software tools provided include an assembler, linker, C compiler, and
simulator. The simulator can be used to determine the detailed timing of an algorithm and
then optimize the memory and register accesses. The C compilers for DSP processors
will usually generate assembly source code so that the user can see what instructions are
generated by the compiler for each line of C source code. The assembly code can then be
optimized by the user and then fed into the assembler and linker.
Most DSP C compilers provide a method to add in-line assembly language routines
to C programs (see section 3.3.2). This allows the programmer to write highly efficient
assembly code for time-criticai sections of a program. For example, the autocorrelation
function of a sequence may be calculated using a function similar to a FIR filter where
the coefficients and the data are the input sequence. Each multiply-accumulate in this algorithm can often be calculated in one cycle on a DSP microprocessor. The sarne C algorithm may take 4 or more cycles per multiple-accumulate. If the autocorrelation calculation requires 90 percent of the time in a C program, then the speed of the program can be
improved by a factor of about 3 if the autocorrelation portion is coded in assembly language and interfaced to the C program (this assumes that the assembly code is 4 times
faster than the C source code). The amount of effort required by the programmer to create
efficient assembly code for just the autocorrelation function is much less than the effort
required to write the entire program in assembly language.
Many DSP software tools come with a library of DSP functions that provide highly
optimized assembly code for typical DSP functions such as FFTs and DFTs, FIR and IIR
filters, matrix operations, correlations, and adaptive filters. ln addition, third parties may
provide additional functions not provided by the manufacturer. Much of the DSP library
code can be used directly or with small modifications in C programs.
112
DSP32C assembly code, an assembler, a simulator, and a number of other usefuJ utiliti
for source and object code management. The three forms of provided libraries are:
es
libc
libm
libap
113
ing extensions to the C language based on the work of the ANSI Numerical C Extensions
Group (NCEG) subcommittee.
Toe C runtime library functions perform floating-point mathematics, digital signal
processing, and standard C operations. The functions are hand-coded in assembly language for optimum runtime efficiency. The C tools augment the ADSP-21000 farnily
assembler tools, which include the assembler, linker, librarian, simulator, and PROM
splitter.
114
Sec. 3.2
Chap. 3
pie memory busses. Ali three manufacturers of DSPs described here provide a method to
assign separate physical memory blocks to different C variable types. For example, auto
variables that are stored on the heap can be moved from internai memory to extemal
memory by assigning a different address range to lhe heap memory segment. ln the assembly language generated by lhe compiler the segment name for a particular C variable
or array can be changed to Iocate it in internai memory for faster access or to allow it to
be accessed at the sarne time as the other operands for the multiply or accumulate operation. Memory maps and segment names are used by the C compilers to separate different
types of data and improve the memory bus utilization. Internai memory is often used for
coefficients (because there are usually fewer coefficients) and externai memory is used
for large data arrays.
Toe ADSP-210XX C compiler also supports special keywords so that any C variable or array can be placed in program memory or data memory. Toe program memory is
used to store the program instructions and can also store floating-point or integer data.
When the processor executes instructions in a loop, an instruction cache is used to allow
the data in program memory (PM) and data in the data memory (DM) to flow into the
ALU at full speed. Toe pm keyword places the variable or array in program memory, and
the dm keyword places the variable or array in data memory. Toe default for static or
global variables is to place them in data memory.
3.2.3 Assembly Language Simulators and Emulators
Simulators for a particular DSP allow the user to determine the performance of a DSP algorithm on a specific target processor before purchasing any hardware or making a major
investment in software for a particular system design. Most DSP simulator software is
available for the IBM-PC, making it easy and inexpensive to evaluate and compare lhe
performance of several different processors. ln fact, it is possible to write ali the DSP application software for a particular processor before designing or purchasing any hardware. Simulators often provide profiling capabilities that allow the user to determine the
amount of time spent in one portion of a program relative to another. One way of doing
this is for the simulator to keep a count of how many times the instruction at each address
in a program is executed.
Emulators allow breakpoints to be set at a particular point in a program to examine
registers and memory locations, to determine the results from real-time inputs. Before a
breakpoint is reached, the DSP algorithm is running at full speed as if the emulator were
not present. An in-circuit emulator (ICE) allows the final hardware to be tested at full
speed by connecting to the user's processor in the user's real-time environment. Cycle
counts can be determined between breakpoints and the hardware and software timing of a
system can be examined.
Emulators speed up the development process by allowing the DSP algorilhm to ruo
at full speed in a real-time environment. Because simulators typically execute DSP programs several hundred times slower than in real-time, the wait for a program to reach a
particular breakpoint in a simulator can be a Iong one. Real world signals from A/D converters can only be recorded and then later fed into a simulator as test data. Allhough the
test data may test the algorilhm performance (if enough test data is available), lhe timing
115
of the algorithm under ali possible input conditions cannot be tested using a simulator.
Thus, in many real-time environments an emulator is required.
Toe AT&T DSP32C simulator is a line-oriented simulator that allows the user to
examine ali of the registers and pipelines in the processor at any cycle so that small programs can be optimized before real-time constraints are imposed. A typical computer dialog (user input is shown in bold) using the DSP32C simulator is shown below (courtesy
of AT&T):
$ im: SBONRW=l
$im: b end
bp set at addr Ox44
$im: run
12
r000004*
*
*
* 0000: rll = Ox7f(127)
16
r000008*
*
* w00007c* 0004: * r2 = rlll
20
rOOOOOc**
*
* r00007c* 0008: a3 = *r2
25
r000010**
* r5a5a5a*
* OOOc: rlOl = * rl
30
r000014*
*
*
* 0010: NOP
34
r000018*
*
*
* 0014: rlO = rlO + Oxff81(-127)
38
rOOOOlc*
*
* w000080* 0018: * r3 = rlO
42
r000020**
*
* r000080* OOlc: *r3 =ao= float(*r3)
47
r000024**
*
*
* 0020: aO = a3 * a3
52
r000028*
* r000074* r000070** 0024: al = *r4- + a3 * *r457
r00002c**
* r000068* r000064** 0028: a2 = *r5- + a3 * *r563
r000030**w000080**
*
* 002c: ao= ao* a3
69
r000034*
*
* r00006c* 0030: al = *r4 + al * a3
73
r000038**
* r00005c* r000058** 0034: a3 = *r6- + a3 * *r679
r00003c**
* r00007c* r000060** 0038: a2 = *r5 + a2 * *r2
85
r000040**
*
*
* 003c: al = al + ao
90
r000044*
* r000080*
* 0040: *r7 =ao= al + *r3
breakpoint at end{Ox000044} decode:*r7 =ao= al + *r3
$im: r7.f
r7 = 16.000000
$im: nwait.d
nwait = 16
ln the above dialog the flow of data in the four different phases of the DSP32C instruction cycle are shown along with the assembly language operation being performed.
Toe cycle count is shown on the left side. Register r7 is displayed in floating-point after
the breakpoint is reached and the number of wait states is also displayed. Memory reads
are indicated with an r and memory writes with a w. Wait states occurring when the sarne
memory is used in two consecutive cycles are shown with * *.
AT&T DSP32C EMULATION SVSTEM DESCRIPTION
116
sec. 3.3
Target Halted
eu:1t:i7,
117
- - -rC<."\ K
te h
NdHl*MHllliMi[IMMliililMl--11a@1wtmw111
r n
"i!~:~~};=~~!0233
CALL
INU_F39
& ~~u 9-9-99991_3_S_P__
9_99_8_9_a_1_8~&
8 et9 11999989
RND
R9,R9
RG 93126e69 Rl ae18e8a6
:::al1 9a19939f
MPYF
*+AR3C15),R9
R2 99999999 R3 99999999
999912 1119939d
STF
R9,+AR3(13)
R1 99999999 RS 99999999
99913 esa89682
FLOAT @ratio,R9
R6 99999999 R? 99999999
:88811 13a199b3
LDFN
@99b3H,R1
AR9 1c199999 AR1 99999919
888815 15618999
LDFGE 9. 99, R1
AR2 99999999 AR3 999999fb
888816 91899991
ADDF
R1,R9
AR1 99999999 ARS 99999999
9917 8a697189
MPYF
299.99,R9
1 AR& 99999999 AR? 999989991
88
fJLE: ch2.c
.
. .
rCALLS
~Ioat ~1gnal_in.
&I 1: .ain()
886 6
8867
rnt s1gnal_in_ptr:
8868
8869
percent_pass = 89.9;
8879
fp = percent_pass/(299.9ratio):
8871
fa = (299.9 - percent_pass)/(299.9ratio):
1
8972
onnAND
ENORY----------l...-------'
81 Sybols loaded
& 909999 9f2b9999 989b9911 9271991a 9fa69999&
Done
1 000001 0r2c0000 9f2d9999 972999b2 111993181
l'Jii~!~a]i!n!!!!!!!!!!!!!!!!!!!!!!!!!Y189999c
999998 15628999
97616299 91889992
1111939f 95a99682
9a697189 13a299b3
62999233
I!!
This section describes some of the more advanced software tools available for floatingpoint DSP microprocessors. Source-level debugging of C source code is described in the
next section. Section 3.3.2 describes severa! assembly language interfaces often used in
DSP programrning to accelerate key portions of a DSP algorithm. Section 3.3.3 illustrates
the numeric C extensions to the C Ianguage using DSP algorithms as examples (see
Section 2.1 O for a description of numeric C).
3.3.1 Source Levei Debuggers
Communication Automation & Control (CAC), Inc. (Allentown, PA) offers a debugger
for DSP32C assembly language with C-source debugging capability. Both versions are
compatible with the following vendors' DSP32C board for the AT computer under MSDOS: ali CAC boards, Ariel, AT&T DSP32C-DS and ICE, Burr-Brown ZPB34, Data
Translation, Loughborough Sound Images, and Surrey Medical Imaging Systems. Csource code of the drivers is provided to enable the user to port either debugger to an unsupported DSP32C based board.
Both D3EMU (assembly language only) and D3BUG (C-source and mixed assembly code) are screen-oriented user-friendly symbolic debuggers and have the following
features:
118
Sec. 3.3
Chap. 3
Figure 3.9 shows a typical screen generated using the D3BUG source levei debugger with the DSP32C hardware executing the program. Figure 3.9 shows the mixed assembly-C source mode of operation with the DSP32C registers displayed. Figure 3.10
shows the C source mode with the global memory location displayed as the entire C program is executed one C source line at a time in this mode.
Figure 3.11 shows a typical screen from the ADSP-21020 simulator when C source
levei debugging is being performed using CBUG. C language variables can be displayed
and the entire C program can be executed one C source line at a time in this mode. This
sarne type of C source levei debug can also be performed using the in-circuit emulator.
acc break cont disk goto halt i/o en code quit reg step uars nix !-DOS ?-help
REGISTERS
9999b1: 91Zeffe8
r1e=r11+9xffffe8
1:9x939998 freq2
9999b8: 39988177
r11++=a8=r1
2:9xfff931
nop
3 : 9xf rr 93c
8998bc 88988888
nop
1: 9x939899 1
9999c9 99989999
r1e=freq2
5:9x988191
9999c1 c9&19998
r11++=a9=r1
6:9xfffd9a
8999c8 39999177
no
7 :9xrrrrrd
9999cc 99999999
8:9x1cf81b
9:9xbfa335
9999d1 c91199d8
r18e=9x99d8
19:9xrtfrrc
8898dH ~aueee0u
r11e=r11-u
11:9x99195a
8999dc c9&19919
r1e=freq_ratio2
12:9xf933e3
9999e9 39299998
r1=a1=a9
13 : 9xf rr 999
8998e1 99999999
nop
9961> oscinit(freq_ratio1,state_uariables1);
11: 9xUf938
15:9xrrrrrr
8999e8: c8&19911
rle=state_uariablesl
16 : 0xrr rr rr
9898ec: 1fe181d5
r11++r19=r1e
17:9x5ee7rr
6888f8: c8b1888c
rle=freq_ratiol
18:9x9999a9
8999f1: 38999177
r11++=a9=r1
19:9x999991
9999f8: 99999999
nop
29:9x16bf11
8999fc: e91191d1
call oscinit <r18>
21:9x999998
999199: c9118191
r18e=9x9191
22:0xrrrrrr
989181: 9a8e9998
r11e=r11-8
~e: 9.9888988e+888 al: 8.8988898e+888 az: 8.8888888e+888 a3: 1.7888898e+83b
FIGURE 3.9 DSP32C debugger D3BUG in mixed assembly C-source mode (Courtesy
Communication Automation & Control (CAC), lnc. (Allentown, PA).)
119
acc break cont disk goto halt l/o en code quit reg step vars ix !-DOS ?-help
9959
GLOBAL UARIABLES
9951 / Select two frequencies /
data1[8J
8952> freql = 576.8;
93992c: 2.685559e-993
9853> freq2 = 1172.8;
data2[91
8951
93922c: 2.685559e-993
9955 / Calculate the frequency ratio between the sei
data_in[91
8956 / saplinf rate for both oscillators. /
93912c: 2.685559e-993
data_outc9J
9858> freq_ratio2 = freq2/saple_rate;
93962c: 2.685559e-993
9859
errno
8868 / lnitialize each oscillator /
939999:
895319799
8961> oscinit(freq_ratio1,state_uariables1);
find_ax
9962> oscinit(freq_ratio2,state_variables2);
98938c:
536881951
t)963
freq1
9961 / 6enerate 128 saples for each oscillator /
939891: 1.933381e+926
8865) oscNCstate_uariables1,128,data1);
freq2
88&&> oscN(state_variables2,128,data2);
939998: 1.933381e+926
88b'/
freq_ratiol
8868 / Add the two wauefors together /
93989c: 1.933381e+926
9869) add_tones(data1,data2);
freq_ratio2
tl879
1.933381e+926
8971 / Now copute the fft using the AT&T applicatio 838818:
log19
8872> rfftaC128,7,data_in);
888118:
899581817
8873
RGURE 3.10 DSP32C debugger D3BUG in C-source mode (Courtesy Communication
Automation & Control (CAC), lnc. (Allentown, PA).)
file
Core
Neory
Execution
Setup Help
CBU6 <u21k.exe)
<Continue>
<Step>
<Next>
<Finish>
<Break>
<Up>
<Execution .. > <Breaks .. > <Data .. > <Context .. > <Sybols .. >
<Down>
<Nodes .. >
~~~~~~~~~~~~~~~~~----u21k.c~~~~~~~~~~~~~~~~~--,
83
81
85
86
87
88
89
99
1!~
sendout(sig_out);
}
}
91
92
93
flushO:
flagsC9): / turn off LED /
expr
>sig_out
-9813.2988
CBUG Status
88:1&:19
ADSP-21020 simulator displaying C source code (Courtesy Analog
120
oad
reak
ILE: ch2.c
9979
0971
0972
8873
0071
8975
8876
9077
8878
8879
8988 BP>
8081
9982
9883
8881
atch
eAor
olor
Mo e
Run=F5
Step=FB
Next= 10
rp = percent_pass/C298.8ratio);
ra = (298.8 - percent_pass)/C289.8ratio);
deitar = ra-rp;
nr ilt
lsize
nr llt/ratio;
nr ilt
npair
lsizeratio + 1;
Cnrilt - 1)/Z;
Chap. 3
~...
FIGURE 3.12
Figure 3.12 shows a typical screen from the TMS320C30 simulator when C source
levei debugging is being performed. C language variables can be displayed and the entire
C program can be executed one C source line at a time in this mode.
3.3.2 Assembly-C Language Interfaces
Toe DSP32C/DSP3210 compiler provides a macro capability for in-line assembly language and the ability to link assembly language functions to C programs. In-line assembly is useful to control registers in the processor directly or to improve the efficiency of
key portions of a C function. C variables can also be accessed using the optional operands
as the following scale and clip macro illustrates:
asm void scale(flt__ptr,scale_f,clip)
{
% ureg flt__ptr,scale_f,clip;
aO = scale_f * *flt__ptr
al =-ao+ clip
ao = ifalt (clipl
*flt__ptr++ =ao= ao
Sec. 3.3
121
Toe macro 9-B saves the calling function's frame pointer and the retum address. Toe
macro 9-EO reads the return address off the stack, performs the stack and frame pointer
adjustments, and returns to the calling function. Toe macros do not save registers used in
the assembly language code that may also be used by the C compiler-these must be
saved and restored by the assembly code. Ali parameters are passed to the assembly language routine on the stack and can be read off the stack using the macro param ( ), which
gives the address of the parameter being passed.
Toe ADSP-21 OXX compiler provides an asm () construct for in-line assembly language and the ability to link assembly language functions to C programs. In-line assembly is useful for directly accessing registers in the processor, or for improving the efficiency of key portions of a C function. Toe assembly language generated by asm() is
embedded in the assembly language generated by the C compiler. For example,
asm( bit set imask Ox40; ) will enable one of the interrupts in one cycle. C
variables can also be accessed using the optional operands as follows:
asm( "%0=clip %1 by %2;
"=d" (result)
"d" (x),
(y));
where result, x and y are C language variables defined in the C function where the
macro is used. Note that these variables will be forced to reside in registers for maximum
efficiency.
ADSP-210:XX assembly language functions can be easily linked to C programs
using several macros that define the beginning and end of the assembly function so that it
conforms to the register usage of the C compiler. Toe macro entry saves the calling
function' s frame pointer and the return address. Toe macro exi t reads the return address
off the stack, performs the stack and frame pointer adjustments, and retums to the calling
function. Toe macros do not save registers that are used in the assembly language code
which may also be used by the C compiler-these must be saved and restored by the assembly code. Toe first three parameters are passed to the assembly language routine in
registers r4, r8, and r12 and the remaining parameters can be read off the stack using the
macro reads ( ) .
Toe TMS320C30 compiler provides an asm () construct for in-Iine assembly language. ln-line assembly is useful to control registers in the processor directly. The assembly language generated by asm() is embedded in the assembly language generated by
the C compiler. For example, asm( LDJ: @KASK, J:E) will unmask some of the interrupts controlled by the variable MASK. Toe assembly language routine must save the
calling function frame pointer and return address and then restore them before returning
to the calling program. Six registers are used to pass arguments to the assembly language
routine and the remaining parameters can be read off the stack.
3.3.3 Numeric C Compilers
As discussed in section 2.10, numerical C can provide vector, matrix, and complex operations using fewer lines of code than standard ANSI C. ln some cases the compiler may
be able to perform better optirnization for a particular processor. A complex FIR filter
can be implemented in ANSI C as follows:
122
Chap. 3
Sec. 3.3
typedef struct {
float real, imag;
} COMPLEX;
123
out.real = O.O;
out.imag = O.O;
for(i = O ; i < n; i++) {
out.real += xc[il .real*wc[i] .real - xc[i] .imag*wc[i] .imag;
out.imag += xc[il .real*wc[i] .imag + xc[i] .imag*wc[i] .real;
}
Toe following code segment shows the numeric C implementation of the sarne complex
FIR filter:
windex = 2*windex;
Toe following code segment shows the numeric C implementation of a complex FFf
without the bit-reversai step:
void fft_nc(int n, complex float *x, complex float *w)
{
Toe numeric C code is only five lines versus the ten lines required by the standard C implementation. Toe numeric C code is more efficient, requiring 14 cycles per filter tap versus 17 in the standard C code.
More complicated algorithms are also more compact and readable. Toe following
code segment shows a standard C implementation of a complex FFf without the bitreversal step (the output data is bit reversed):
void fft_c(int n,COMPLEX *x,COMPLEX *w)
{
COMPLEX u, temp, trn;
COMPLEX *xi,*xip,*wptr;
int i,j,le,windex;
windex = l;
for(le=n/2 ; le > O ; le/=2) {
wptr = w;
for (j =O; j < le; j++)
u = *wptr;
i < n
i
for (i = j
int size,sect,deg = 1;
for(size=n/2 ; size >O; size/=2) {
for(sect=O; sect < n; sect += 2*size)
complex float *xl=x+sect;
complex float *x2=xl+size;
{ iter I=size;
for(!) {
complex float temp;
temp = xl[I) + x2[I);
x2(I] = (xl[I) - x2(I]) * w[deg*I);
xl[I) = temp;
deg *= 2;
Toe twiddle factors (w) can be initialized using the following numeric C code:
void init_w(int n, complex float *w)
{
iter I = n;
float a= 2.0*PI/n;
w[I) = cosf(I*a) + li*sinf(I*a);
i + 2*le) {
X =X+ ;
Note that the performance of the ini t_w function is almost identical to a standard C implementation, because most of the execution time is spent inside the cosine and sine func-
124
Chap. 3
tions. Toe numerical C implementation of the FFf also has an almost identical executio
time as the standard C version.
n
Sec. 3.4
125
10 MHz A/D converter at full speed in 100 sec, and then a FFf power spectrum calculation could be performed for the next 5 msec. Thus, every 5.1 msec the A/D converter's
output would be used.
Two different methods are typically used to synchronize the microprocessor with
the input or output samples. Toe first is polling loops and the second is interrupts which
are discussed in the next section. Polling loops can be highly efficient when the input and
output samples occur at a fixed rate and there are a small number of inputs and outputs.
Consider the following example of a single input and single output at the sarne rate:
for(;;)
lt is assumed that the memory addresses of in, out, and in_status have been defined previously as global variables representing the physical addresses of the 1/0 ports.
The data read at in_status is bitwise ANDed with 1 to isolate the least significant bit.
If this bit is 1, the while loop will loop continuously until the bit changes to O. This bit
could be called a "not ready flag" because it indicates that an input sample is not available. As soon as the next line of C code accesses the in location, the hardware must set
the flag again to indicate that the input sample has been transferred into the processor.
After the filter function is complete, the retumed value is written directly to the output location because the output is assumed to be ready to accept data. If this were not the
case, another polling loop could be added to check if the output were ready. The worst
case total time involved in the filter function and at least one time through the while
polling loop must be less than the sampling interval for this program to keep up with the
real-time input. While this code is very efficient, it does not allow for any changes in the
filter program execution time. If the filter function takes twice as long every l 00 samples
in order to update its coefficients, the maximum sampling interval will be limited by this
larger time. This is unfortunate because the microprocessor will be spending almost half
of its time idle in the while loop. lnterrupt-driven 1/0, as discussed in the next section,
can be used to better utilize the processor in this case.
126
Chap. 3
Sec. 3.4
127
in_inx++;
the parameters associated with the algorithm. Toe disadvantage of interrupts is the overhead associated with the interrupt latency, context save, and restore associated with th
.
e
mterrupt process.
The following C code example (file INTOUT.C on the enclosed disk) illustrates the
functions required to implement one output interrupt driven process that will generate
1000 samples of a sine wave:
#include <signal.h>
#include <math.h>
#include rtdspc.h"
#define SIZE 10
int output_store[SIZE];
int in_inx = O;
volatile int out_inx = O;
void sendout(float x);
void output_isr(int ino);
int in_fifo[lOOOO];
int index = O;
void main()
{
= O;
= 1;
for(;;)
Toe C function output_isr is shown for illustration purposes only (the code is ADSP210XX specific), and would usually be written in assembly language for greatest efficiency. Toe functions sendout and output_isr form a software first-in first-out
(FIFO) sample buffer. After each interrupt the output index is incremented with a circular
0-10 index. Each call to sendout increments the in_inx variable until it is equal to
the out_inx variable, at which time the output sample buffer is full and the while loop
will continue until the interrupt process causes the out_inx to advance. Because the
above example generates a new a value every 25 samples, the FIFO tends to empty during the ex.p function call. Toe following table, obtained from measurements of the example program at a 48 KHz sampling rate, illustrates the changes in the number of samples
in the software FIFO.
Sample lndex
o
1
out_im< value
2
3
4
10
10
3
4
3
4
5
4
4
8
9
6
5
9
2
10
6
7
8
9
10
void sendout(float x)
iD_im< value
10
9
8
As shown in the table, the number of samples in the FIFO drops from l O to 5 and then is
quickly increased to l O, at which point the FIFO is again full.
128
Chap, 3
Sec. 3.4
129
The FIR C code will execute on the three different processors as follows:
The efficiency of compiled C code varies considerably from one compiler to the n
One way to evaluate the efficiency of a compiler is to try different C constructs, suc:xt.
case statements, nested if statements, integer versus floating-point data, while Iooas
versus for Ioops and so on. It is also important to reformulate any algorithm or exprt
sion to eliminate time-consuming function calls such as calls to exponential, square roo~
or transcendental functions. The following is a brief Iist of optimization techniques th t
can improve the performance of compiled C code.
Optimi7.ed e
CodeCycles
Optimized
Assembly Cycles
Relative Efficiency
ofCCode(%)
DSP32C
462
187
40.5
ADSP-21020
185
44
23.8
TMS320C30
241
45
18.7
Processor
ln order to illustrate the efficiency of C code versus optimized assembly code, the following C code for one output from a 35 tap FIR filter will be used:
Toe relative efficiency of the C code is the ratio of the assembly code cycles to the C
code cycles. An efficiency of 100 percent would be ideal. Note that this code segment is
one of the most efficient loops for the DSP32C compiler but may not be for the other
compilers. This is illustrated by the following 35-tap FIR filter code:
float in[35],coefs[35],y;
main()
{
register
register
register
register
int i;
float *x = in;
float *w = coefs;
float out;
y=out;
This for-loop based FIR C code will execute on the three different processors as folIows:
float in[35],coefs[35],y;
main()
{
register int i;
register float *x = in, *w = coefs;
register float out;
out
*x++
*w++;
for(i = 16 ; i- >= O; ) {
out += *x++ * *w++;
out += *x++ * *w++;
y=out;
OptimizedC
CodeCycles
Optimized
Assembly Cycles
DSP32C
530
187
35.3
ADSP-21020
109
44
40.4
TMS320C30
211
45
21.3
Proces.!or
Relatlve Efficiency
ofCCode(%)
Note that the efficiency of the ADSP-21020 processor C code is now almost equal to the
efficiency of the DSP32C C code in the previous example.
The complex FFf written instandard C code shown in Section 3.3.3 can be used to
Sec. 4.1
CHAPTER
133
Xi-N
X;.,,_2
REAL-TIME flLTERING
Y;
Y;-1
Y1Filtering is the most commonly used signal processing technique. Filters are usually used
to remove or attenuate an undesired portion of a signal' s spectrum while enhancing the
desired portions of the signal. Often the undesired portion of a signal is random noise
with a different frequency content than the desired portion of the signal. Thus, by designing a filter to remove some of the random noise, the signal-to-noise ratio can be improved
in some measurable way.
Filtering can be performed using analog circuits with continuous-time analog inputs or using digital circuits with discrete-time digital inputs. ln systems where the input
signal is digital samples (in music synthesis or digital transmission systems, for exarnple)
a digital filter can be used directly. If the input signal is from a sensor which produces an
analog voltage or current, then an analog-to-digital converter (A/D converter) is required
to create the digital samples. ln either case, a digital filter can be used to alter the spectrum of the sampled signal, X;, in order to produce an enhanced output, Y; Digital filtering
can be performed in either the time domain (see section 4.1) or the frequency domain (see
section 4.4), with general-purpose computers using previously stored digital samples or
in real-time with dedicated hardware.
Figme 4.1 shows a typical digital filter structure containing N memory elements used to
store the input samples and N memory elements (or delay elements) used to store the output sequence. As a new sample comes in, the contents of each of the input memory elements are copied to the memory elements to the right. As each output sample is fonned
132
Filter structure of Nth order filter. The previous N input and output samples stored in the delay elements are used to form the output sum.
FIGURE 4.1
by accumulating the products of the coefficients and the stored values, the output memory elements are copied to the Ieft. Toe series of memory elements forms a digital delay
Iine. Toe delayed values used to form the filter output are called taps because each output
makes an intermediate connection along the delay line to provide a particular delay. This
filter structure implements the following difference equation:
Q-1
y(n)
P-1
= Liqx(n-q)-LPy(n- p).
q=O
(4.1)
p=I
As discussed in Chapter 1, filters can be classified based on the duration of their impulse
response. Filters where the n terms are zero are called finite impulse response (FIR) filters, because the response of the filter to an impulse (or any other input signal) cannot
change N samples past the Iast excitation. Filters where one or more of the n terms are
nonzero are infinite impulse response (IIR) filters. Because the output of an IIR filter depends on a sum of the N input samples as well as a sum of the past N output samples, the
output response is essentially dependent on ali past inputs. Thus, the filter output response to any finite length input is infinite in Iength, giving the IIR filter infinite memory.
Finite impulse response (FIR) filters have severa! properties that make them useful
for a wide range of applications. A perfect linear phase response can easily be con-
134
Real-Time Filtering
Chap. 4
structed with an FIR filter allowing a signal to be passed without phase distortion. FIR
filters are inherently stable, so stability concems do not arise in the design or implem
tation phase of development. Even though FIR filters typically require a large numbe:n;
multiplies and adds per input sample, they can be implemented using fast convoluti:
with FFf algorithms (see section 4.4.1). Also, FIR structures are simpler and easier
implement with standard fixed-point digital circuits at very high speeds. The only possi
ble disadvantage of FIR filters is that they require more multiplies for a given frequenc
response when compared to IIR filters and, therefore, often exhibit a longer processi/
delay for the input to reach the output.
g
During the past 20 years, many techniques have been developed for the design and
implementation of FIR filters. Windowing is perhaps the simplest and oldest FIR design
technique (see section 4.1.2), but is quite Iimited in practice. The window design method
has no independent control over the passband and stopband ripple. Also, filters with unconventional responses, such as multiple passband filters, cannot be designed. On the
other hand, window design can be done with a pocket calculator and can come close to
optimal in some cases.
This section discusses FIR filter design with different equiripple error in the passbands and stopbands. This class of FIR filters is widely used primarily because of the
well-known Remez exchange algorithm developed for FIR filters by Parks and
McClellan. The general Parks-McCiellan program can be used to design filters with several passbands and stopbands, digital differentiators, and Hilbert transformers. The FIR
coefficients obtained program can be used directly with the structure shown in Figure 4.1
(with the a0 terrns equal to zero). The floating-point coefficients obtained can be directly
used with floating~point arithmetic (see section 4.1.1).
The Parks-McClellan program is available on the IEEE digital signal processing
tape or as part of many of the filter design packages available for personal computers.
The program is also printed in severa! DSP texts (see Elliot, 1987, or Rabiner and Gold,
1975). The program REMEZ.C is a C language implementation of the Parks-McClellan
program :md is included on the enclosed disk. An example of a filter designed using the
REMEZ program is shown at the end of section 4.1.2. A simple method to obtain FIR filter coefficients based on the Kaiser window is also described in section 4.1.2. Although
this method is not as flexible as the Remez exchange algorithm it does provide optimal
designs without convergence problems or filter length restrictions.
135
Sec. 4.1
output
input
y;
X;
z-1
history [ N - 2]
z-1
history [ N - 3]
----
history [1]
history [O]
AGURE 4.2 Block diagram of real-time N tap FIR filter structure as implemented by function fir_filter.
coefficients so that a true convolution is implemented. On some microprocessors, postdecrement is not implemented efficiently so this code becomes Iess efficient. Improved
efficiency can be obtained in this case by storing the filter coefficients in time-reversed
order. Note that if the coefficients are symmetrical, as for simple linear phase Iowpass filters, then the time-reversed order and normal order are identical. After the for Joop and
N - l multiplies have been completed, the history array values are shifted one sample toward history [O], so that the new input sample can be stored in history [N-1] .
The fir_filter implementation uses pointers extensively for maximum efficiency.
Real-Time Filtering
136
Chap. 4
/**************************************************************************
sec. 4.1
137
the following approximation for the filter length (N) of an optimal lowpass filter has been
developed by Kaiser:
(4.2)
where:
float fir_filter(float input,float *coef,int n,float *history)
float input
float *coef
int n
float *history
*************************************************************************/
/* input tap * /
/* last history */
return(output);
USTING 4.1
Function fir_filter(iq;,ut,coef,n,hiatmy).
= 10-A,.,,JW
max =-40log10(1-l0-A-/20)
As a simple example, consider the following filter specifications, which specify a
lowpass filter designed to remove the upper half of the signal spectrum:
0--0.19f8
<0.2dB
0.25-0.5fs
>40dB
Passband ifpass):
Passband ripple (Amax):
Stopband ifstop):
Stopband Attenuation (Astop):
From these specifications
, = 0.01145,
2
=0.01,
!if =0.06.
The result of Equation (4.2) is N = 37. Greater stopband attenuation ora smaller transition band can be obtained with a longer filter. Toe filter coefficients are obtained by multiplying the Kaiser window coefficients by the ideal lowpass filter coefficients. The ideal
lowpass coefficients for a very long odd length filter with a cutoff frequency of fc are
given by the following sinc function:
ck
sin(2 fclat)
k1t
.
(4.3)
Note that the center coefficient is k = O and the filter has even symmetry for ali coefficients above k = O. Very poor stopband attenuation would result if the above coefficients
138
Real-Time Filtering
were truncated by using the 37 coefficients (effectively multiplying the sinc functio b
rectangular wind~w, _which would ~ave a stopband at~enuati~n of about 1
a
However, by multiplymg these coeffic1ents by the appropnate Kruser window th
).
band and passband specifications can be realized. Toe symmetrical Kaiser windowe sto':
given by the following expression:
' wk> 1&
tJ
w, = +~1-(1-
:~J)
lo(f3)
Of-r-r-r-~r-r-~~---
-10
(a) -~
where /o(f3) is a modified zero order Bessel function of the first kind, f3 is the Kaiser window parameter which determines the stopband attenuation. Toe empirical formula for ~
when Astop is less than 50 dB is f3 0.5842*(Astop - 21) 0.4 + 0.07886*(Astop - 21). Thus
for a stopband attenuation of 40 dB, f3 = 3.39532. Listing 4.2 shows program KSRFIR.c'.
which can be used to calculate the coefficients of a FIR filter using the Kaiser window
method. Toe length of the filter must be odd and bandpass; bandstop or highpass filters
can also be designed. Figure 4.3(a) shows the frequency response of the resulting 37_
point lowpass filter, and Figure 4.3(b) shows the frequency response of a 35-point lowpass filter designed using the Parks-McClellan program. Toe following computer dialog
shows the results obtained using the REMEZ.C program:
-50
-60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Frequency (f/fs)
FIR Filter Frequency Response
lO~~~~~r-~~r-~~~~~~~r-~r-~r-r-~r-~
-30
-40
1:
2:
3:
4:
5:
-20
o~r-~r-~r-~~r-~
u:::wPASS FILTER
BANDPASS FILTER
DIFFERENTIATOR
HILBERT TRANSFORMER
GET INPUT PARAMETERS FROM KEYBOARD
selection (1 to 5] ? 5
number of coefficients (3 to 128] ? 35
Filter types are: l=Bandpass, 2=Differentiator, 3=Hilbert
filter type (1 to 3] ? 1
number of bands (1 to 10] ? 2
Now inputting edge (comer) frequencies for 4 band edges
edge frequency for edge (comer) # 1 [O to 0.5] ? O
edge frequency for edge (comer) # 2 [O to 0.5] ? .19
edge frequency for edge (comer) # 3 (0.19 to 0.5] ? .25
-10
MJ
-20
-30
-40
-50
-60~~~~~~~~~~~~~~~~~~~~~~~~
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Frequency (f/fs)
FIGURE 4.3 (a) Frequency response of 37 tap FIA filter designed using the
Kaiser window method. (bl Frequency response of 35 tap FIA filter designed
using the Parks-McClellan program.
139
Real-Time Filtering
140
Sec. 4.1
BAND 1
0.00000000
0.19000000
1.00000000
1.00000000
0.00808741
-41.84380886
EXTREMAL FREQUENCIES
#coeff
Type
0.0156250
0.1614583
0.2777778
0.4131944
= 35
=1
#bands
=2
Grid = 16
E[l] = O.DO
E[2] = 0.19
E[3] = 0.25
E[4] = 0.50
Gain, wt[l]
Gain, wt[2]
0.0520833
0.1822917
0.3038194
0.4427083
141
BAND 2
0.25000000
0.50000000
0.00000000
1.00000000
0.00808741
-41.84380886
0.0815972
0.1900000
0.3298611
0.4704861
0.1093750
0.2500000
0.3576389
0.5000000
0.1371528
0.2586806
0.3854167
= 1.00
= 0.00
1.00
1.00
Iteration 1 2 3 4 5 6 7
**********************************************************************
FINITE IMPULSE RESPONSE (FIR)
LINEAR PHASE DIGITAL FILTER DESIGN
REMEZ EXCHAOOE ALGORITHM
BANDPASS FILTER
FILTER LENGTH = 35
***** IMPULSE RESPONSE *****
H( 1) = -6.360096001e-003 = H( 35)
H( 2) = -7.662615827e-005 = H( 34)
H( 3) = 7.691285583e-003 = H( 33)
H( 4) = 5.056414595e-003 = H( 32)
H( 5) = -8.359812578e-003 = H( 31)
H( 6) = -1.040090568e-002 = H( 30)
H( 7) = 8.696002091e-003 = H( 29)
H( 8) = 2.017050147e-002 = H( 28)
H( 9) = -2.756078525e-003 = H( 27)
H( 10) = -3.003477728e-002 = H( 26)
H( 11) = -8.907503106e-003 = H( 25)
H( 12) = 4.171576865e-002 = H( 24)
H( 13) = 3.410815421e-002 = H( 23)
H( 14) = -5.073291821e-002 = H( 22)
H( 15) = -8.609754956e-002 = H( 21)
H( 16) = 5.791494030e-002 = H( 20)
H( 17) = 3.117008479e-001 = H( 19)
H( 18) = 4.402931165e-001 = H( 18)
Note that the Parks-McC!ellan design achieved the specifications with two fewer coefficients, and the stopband attenuation is 1.8 dB better than the specification. Because the
stopband attenuation, passband ripple, and filter length are ali specified as inputs to the
Parks-McClellan filter design program, it is often difficult to determine the filter length
required for a particular filter specification. Guessing the filter length will eventually
reach a reasonable solution but can take a long time. For one stopband and one passband,
the following approximation for the filter length (N) of an optimal lowpass filter has been
developed by Kaiser:
N= -20log10AA-13
14.6Af
+l
(4.5)
where:
o,= 1-10-A...,./40
2 =I0-~/20
Af = (swp - pass) f s
Amax is the total passband ripple (in dB) of the passband from O to pass If the maximum
of the magnitude response is O dB, then Amax is the maximum attenuation throughout the
passband. Astop is the minimum stopband attenuation (in dB) of the stopband fromf510l' to
f/2. The approximation for N is accurate within about 10 percent of the actual reqmred
filter length (usually on the low side). Toe ratio of the passband error (0 1) to the stopband
error (o2) is entered by choosing appropriate weights for each band. Higher weighting of
stopbands will increase the minimum attenuation; higher weighting of the passband will
decrease the passband ripple.
Toe coefficients for the Kaiser window design (variable name fir_lpf37k) and
the Parks-McC!ellan design (variable name fir_lpf35) are contained in the include
file FILTER.H.
142
Real-Time Filtering
Chap.4
/* Linear phase FIR filter coefficient CO!ll)utation using the Kaiser window
design method. Filter length is odd. * /
#include
#include
#include
#include
#include
<stdio.h>
<stdlib.h>
<string.h>
<math.h>
"rtdspc.h"
fp_s [ ]
fa_s []
fpl_s []
fp2_s []
fal_s []
fa2_s []
=
=
=
=
=
=
Fpl";
Fp2 ;
Fal ;
Fa2 ;
fp_s, O, 0.5 );
fa_s, fp, 0.5 ); break;
fa_s, O, 0.5 ) ;
fp_s, fa, 0.5 ) ;
USTING 4.2 Program KSRFIR to calculate FIA filter coefficients using the
Kaiser window method. (Continued)
Sec. 4.1
143
be equal <-") ;
O, 0.5);
faL 0.5);
fpl, 0.5);
fp2, 0.5); break;
O, 0.5);
fpl, 0.5);
faL 0.5);
fa2, 0.5);
(Continued)
144
Real-Time Filtering
145
} break;
/* Use att to get beta (for Kaiser window funetion) and nfilt (always odd
valued and = 2*npair +l) using Kaiser's empirieal foillU.llas */
void filter_length(double att,double deltaf,int *nfilt,int *npair,double *beta)
Section 1
*beta= O;
/* value of beta if att < 21 */
if(att >= 50) *beta= .1102 * (att - 8.71);
if (att < 50 & att >= 21)
*beta= .5842 * pow( (att-21), 0.4) + .07886 * (att - 21);
*npair = (int) ( (att - 8) / (29 * deltaf) );
*nfilt = 2 * *npair +l;
Sec. 4.1
Chap.4
Section N
new_hist
Yi
X;
FIGURE 4.4 Block diagram of real-time IIR filter structure as implemented by function iir_filter.
146
Real-Time Filtering
/**************************************************************************
Sec. 4.1
147
histl_ptr++;
hist2_ptr++;
/* coefficient pointer */
histl_ptr = history;
hist2_ptr = histl_ptr + l;
output= input* (*coef_ptr++);
for(i =O; i < n; i++) {
historyl = *histl_ptr;
history2 = *hist2_ptr;
/* first history */
/* next history */
/* overall input scale factor */
(Continued)
two delay elements for each second-order section. This realization is canonic in the sense
that the structure has the fewest adds (4), multiplies (4), and delay elements (2) for each
second order section. This realization should be the most efficient for a wide variety of
general purpose processors as well as many of the processors designed specifically for
digital signal processing.
IIR filtering will be illustrated using a lowpass filter with similar specifications as used
in the FIR filter design example in section 4.1.2. The only difference is that in the IIR filter
specification, linear phase response is not required. Thus, the passband is O to 0.2 fs and the
stopband is 0.25 fs to 0.5 fs. The passband ripple must be less than 0.5 dB and the stopband
attenuation must be greater than 40 dB. Because elliptic filters (also called Cauer filters)
generally give the smallest transition bandwidth for a given order, an elliptic design will be
used. After referring to the many elliptic filter tables, it is determined that a fifth order elliptic
filter will meet the specifications. The elliptic filter tables in Zverev (1967) give an entry for
a filter with a 0.28 dB passband ripple and 40.19 dB stopband attenuation as follows:
ns
1 = --0.5401
03 = --0.5401
n 1 = 1.0211
il2 = 1.9881
il3=0.7617
il4= 1.3693
As shown above, the tables in Zverev give the pole and zero locations (real and imagi/* history values */
/*peles*/
/*zeros*/
*hist2_ptr++ = *histl_ptr;
*histl_ptr++ = new_hist;
LISTING 4.3
USTING 4.3
= 1.3250
cr0 = --0.5401
float *histl_ptr,*hist2_ptr,*coef_ptr;
float output,new_hist,historyl,history2;
coef_ptr = coef;
return(output);
nary coordinates) of each biquad section. The two second-order sections each form a conjugate pole pair and the first-order section has a single pole on the real axis. Figure 4.5(a)
shows the locations of the 5 poles and 4 zeros on the complex s-plane. By expanding the
complex pole pairs, the s-domain transfer function of a fifth-order filter in terms of the above
variables can be obtained. The z-domain coefficients are then determined using the bilinear
transform (see Embree and Kimble, 1991). Figure 4.5(b) shows the locations ofthe poles and
zeros on the complex z-plane. Toe resulting z-domain transfer function is as follows:
0.0553(1 + z- 1 )
1- 0.436z -I
1-0.0103z-1 + z-2
1 +0.704z- 1 + z-2
1- 0.523z-
Figure 4.6 shows the frequency response of this 5th order digital IIR filter.
148
Real-Time Filtering
Chap. 4
Sec. 4.1
149
11
''-7\
0.9
0.8
0.7
,8
aa
(a)
(a)
0.6
0.5
0.4
0.3
0.2
0.1
o
o
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0.4
0.45
0.5
Frequency (flfs)
DR Filter Frequency Response
10
-10
(b)
',
"-
FIGURE 4.5 Pole-zero plot of fifthorder elliptic IIR lowpass filter. (ai splane representation of analog prototype fifth-order elliptic filter. Zeros are
indicated by "o" and poles are indicated by "x". (b) z-plane representation of lowpass digital filter with cutoff frequency at 0.2 fS" ln each case,
poles are indicated with "x"and
zeros with o.
~.,
-20
'O
(b)
-30
);
-40
-50
-60
Toe function iir_filter (shown in Listing 4.3) implements the direct form TI cascade
filter structure illustrated in Figure 4.4. Any number of cascaded second order sections
can be implemented with one overall input (xi) and one overall output (y/ Toe coefficient array for the fifth order elliptic lowpass filter is as follows:
0.05
0.1
0.15
0.2
0.25
0.3
0.35
Frequency (f/fs)
FIGURE 4.6 (ai Lowpass fifth-order elliptic IIR filter linear magnitude frequency response. (bl Lowpass fifth-order elliptic IIR filter frequency response. Log magnitude in decibels versus frequency.
150
Real-Time Filtering
float iir_lpf5[13] = {
0.0552961603,
-0,4363630712, 0.0000000000, 1.0000000000,
-0.5233039260, 0.8604439497, 0.7039934993, 1.0000000000'
-0.6965782046, 0.4860509932, -0.0103216320, 1.000000000~
Sec. 4.1
151
histl__ptr value for use by the next call to iir_filter. Both history pointers are
incremented twice to point to the next pair of history values to be used by the next
second-order section.
};
Toe number of sections required for this filter is three, because the first-order section i
8
implemented in the sarne way as the second-order sections, except that the second-ord
terms (the third and fifth coefficients) are zero. Toe coefficients shown above were 0 :
tained using the bilinear transform and are contained in the include file FILTER.H. The
definition of this filter is, therefore, global to any module that includes FILTER.H. The
iir_filter function filters the floating-point input sequence on a sample-by-sample
basis so that one output sample is retumed each time iir_filter is invoked. The history array is used to store the two history values required for each second-order section.
Toe history data (two elements per section) is allocated by the calling function. The initial condition of the history variables is zero if calloc is used, because it sets ali the allocated space to zero. If the history array is declared as static, most compilers initialize
static space to zero. Other initial conditions can be loaded into the filter by allocating and
initializing the history array before using the iir_filter function. Toe coefficients
of the filter are stored with the overall gain constant (K) first, followed by the denorninator coefficients that form the poles, and the numerator coefficients that form the zeros for
each section. Toe input sample is first scaled by the K value, and then each second-order
section is implemented. Toe four lines of code in the iir_filter function used to implement each second-order section are as follows:
output= output - historyl * (*coef__ptr++);
new_hist = output - history2 * (*coef__ptr++);
!* poles */
/*zeros*/
The historyl and history2 variables are the current history associated with the section and should be stored in floating-point registers (if available) for highest efficiency.
Toe above code forms the new history value (the portion of the output which depends on
the past outputs) in the variable new_hist to be stored in the history array for use by
the next call to iir_filter. Toe history array values are then updated as follows:
*hist2__ptr++ = *histl__ptr;
*histl__ptr++ = new_hist;
histl__ptr++;
hist2__ptr++;
This results in the oldest history value ( *hist2__ptr) being lost and updated with
the more recent *histl__ptr value. Toe new_hist value replaces the old
Real-time filters are filters that are implemented so that a continuous stream of input samples can be filtered to generate a continuous stream of output samples. ln many cases,
real-time operation restricts the filter to operate on the input samples individually and
generate one output sample for each input sample. Multiple memory accesses to previous
input data are not possible, because only the current input is available to the filter at any
given instant in time. Thus, some type of history must be stored and updated with each
new input sample. Toe management of the filter history almost always takes a portion of
the processing time, thereby reducing the maximum sampling rate which can be supported by a particular processor. Toe functions fir_filter and iir_filter are implemented in a form that can be used for real-time filtering. Suppose that the functions
getinput () and sendout () retum an input sample and generate an output sample at
the appropriate time required by the externai hardware. Toe following code can be used
with the iir_filter function to perform continuous real-time filtering:
static float histi[6];
for(;;)
sendout(iir_filter(getinput(),iir_lpf5,3,histi));
ln the above infinite Joop for statement, the total time required to execute the in,
iir_filter, and out functions must be less than the filter sampling rate in order to
insure that output and input samples are not lost. ln a similar fashion, a continuous realtime FIR filter could be implemented as follows:
static float histf[34];
for(;;)
sendout(fir_filter(getinput(),fir_lpf35,35,histf));
Source code for sendout () and getinput () interrupt driven input/output functions
is available on the enclosed disk for several DSP processors. C code which emulates
getinput () and sendout () real-time functions using disk read and write functions
is also included on the disk and is shown in Listing 4.4. These routines can be used to
debug real-time programs using a simpler and Jess expensive general purpose computer
environment (IBM-PC or UNIX system, for example). Toe functions shown in
Listing 4.4 read and write disk files containing floating-point numbers in an ASCII text
format. Toe functions shown in Listings 4.5 and 4.6 read and write disk files containing
fixed-point numbers in the popular WA V binary file format. Toe WAV file format is part
of the Resource lnterchange File Format (RIFF), which is popular on many multimedia
platforms.
( text continues on page 158)
152
Real-Time Filtering
#include <stdlib.h>
#include <stdio.h>
/* getinput - get one sanple from disk to simulate real-time lllput */
float getinput ()
{
Sec. 4.1
Chap. 4
finclude
Jinclude
tinclude
tinclude
finclude
tinclude
tinclude
153
<stdlib.h>
<stdio.h>
<string.h>
<math.h>
<conio.h>
"wavfmt.h
"rtdspc.h"
*/
*/
float getinput ()
{
if(!fp_getwav) {
char s[80];
printf("\nEnter input .WAV file name? ");
gets(s);
fp_getwav = fopen(s,"rb");
if(!fp_getwav) {
printf("\nError opening *.WAV input file in GETINPUT\n");
exit(l);
Real-Time Filtering
154
sec. 4.1
fread(&cin,sizeof(CHUNK_HDR),l,fp_getwav);
printf ( \n");
for(i = O ; i < 8 ; i++) printf("%c",cin.forrn_type(i]);
printf("\n");
if ( strnicmp (cin. forrn_type, "WAVEfmt ", 8) ! = O) (
printf ( "\nError in WAVEfmt header\n");
exit(l);
155
i = getche() - o;
if(i < (4-'0')) exit(l);
while(i < O 11 i >= wavin.nChannels);
channel_number = i;
if(cin.hdr_size != sizeof(WAVEFORMAT)) {
printf("\nError in WAVEfmt header\n");
exit(l);
else {
if(fread(byte_data,wavin.nBlockAlign,l,fp_getwav) != 1)
flush(); /* flush the output when input runs out */
exit(l);
fread(&wavin,sizeof(WAVEFORMAT),l,fp_getwav);
if(wavin.wFormatTag != WAVE_FORMAT_PCM) {
printf("\nError in WAVEfmt header - not PCM\n");
exit(l);
printf (" \nNurober of channels = %d , wavin. nChannels) ;
printf("\nSample rate= %ld",wavin.nSamplesPerSec);
printf("\nBlock size of data= %d bytes,wavin.nBlockAlign);
printf("\nBits per Sample = %d\n",wavin.wBitsPerSample);
j = byte_data[channel_number];
j
Ox80;
j <<= 8;
A=
fread(&din,sizeof(DATA_HDR),l,fp_getwav);
printf ( "\n%c%c%c%c" ,
din.data_type[OJ,din.data_type[l],din.data_type[2],din.data_type[3]);
printf("\nData Size = %ld bytes",din.data_size);
tinclude
iinclude
tinclude
tinclude
tinclude
iinclude
<stdlib.h>
<stdio.h>
<string.h>
<math.h>
wavfmt.h
rtdspc.h"
number_of_samples = din.data_size/wavin.nBlockAlign;
printf ( "\nNurober of Samples per Channel = %ld\n , number_of_samples) ;
if(wavin.nChannels > 1) {
do {
printf("\nError Channel Number [O .. %d] - ",wavin.nchannels-1);
USTING 4.5 (ContinuedJ
*/
Real-Time Filtering
156
Sec. 4.1
Chap.4
157
WAVE_HDR win;
CHUNK._HDR cin;
DATA..._HDR din;
WAVEFORMAT wavin;
#else
/* clip output to 8 bits*/
j = j >> 8;
j A= Ox80;
void sendout(float x)
{
int BytesPerSample;
short int j;
/* open output file if not done in previous calls */
if(!fp_sendwav) {
char s[80];
printf ( "\nEnter output * . WAV file name ? ") ;
gets(s);
fp_sendwav = fopen (s, "wb" ) ;
if ( ! fp_sendwav) {
printf("\nError opening output *.WAV file in SENDOUT\n");
exit(l);
/* write out the *.WAV file format header */
if(fputc(j,fp_sendwav) == EOF) {
printf("\nError writing output *.WAV file in SENDOUT\n");
exit(l);
#endif
samples_sent++;
/* routine for flush - rnust call this to update the WAV header */
void flush ( l
{
#ifdef BITS16
wavout.wBitsPerSample = 16;
wavout.nBlockAlign = 2;
printf("\nUsing 16 Bit Samples\n");
#else
wavout.wBitsPerSample = 8;
#endif
wavout.nSamplesPerSec = SAMPLE_RATE;
BytesPerSample = (int)ceil(wavout.wBitsPerSample/8.0);
wavout.nAvgBytesPerSec = BytesPerSample*wavout.nSamplesPerSec;
fwrite(&wout,sizeof(WAVE_HDR),l,fp_sendwav);
fwrite(&cout,sizeof(CHUNK_HDR),l,fp_sendwav);
fwrite(&wavout,sizeof(WAVEFORMAT),l,fp_sendwav);
fwrite(&dout,sizeof(DATA..._HDR),l,fp_sendwav);
/* write the sample and check for errors */
/* clip output to 16 bits*/
j = (short int)x;
USTING 4.6
(Continued)
int BytesPerSample;
BytesPerSample = (int)ceil(wavout.wBitsPerSample/8.0);
dout.data_size=BytesPerSample*samples_sent;
wout.chunk_size=
dout.data_size+sizeof(DATA_HDR)+sizeof(CHUNK_HDR)+sizeof(WAVEFORMAT);
/* check for an input WAV header and use the sampling rate, if valid */
if(strnicnp(win.chunk_id,"RIFF",4) == O && wavin.nSamplesPerSec != O)
wavout.nSamplesPerSec = wavin.nSamplesPerSec;
wavout.nAvgBytesPerSec = BytesPerSample*wavout.nSamplesPerSec;
fseek(fp_sendwav,OL,S~SET);
fwrite(&wout,sizeof(WAVE_HDR),l,fp_sendwav);
fwrite(&cout,sizeof(CHUNK._HDR),l,fp_sendwav);
fwrite(&wavout,sizeof(WAVEFORMAT),l,fp_sendwav);
fwrite(&dout,sizeof(DATA..._HDR),1,fp_sendwav);
USTING 4.6
(Continued)
158
Real-Time Filtering
Sec. 4.2
Chap. 4
Noise is generally unwanted and can usually be reduced by some type of filtering N .
can be highly correlated with the signal or in a completely different frequency b~t~
which case it is uncorrelated. Some types of noise are impulsive in nature and occur ~/~
tiv~ly infrequently, while other types of n?is~ ap~ar as narrowband _tones near the sigo:
of mterest. Toe most common type of n01se 1s w1deband thermal no1se, which originate
in the sensor or the amplifying electronic circuits. Such noise can often be considere:
white Gaussian noise, implying that the power spectrum is flat and the distribution is normal. The most important considerations in deciding what type of filter to use to remove
noise are the type and characteristics of the noise. ln many cases, very Iittle is known
about the noise process contaminating the digital signal and it is usually costly (in terms
of time and/or money) to find out more about it. One method to study the noise performance of a digital system is to generate a model of the signal and noise and simulate the
system performance in this ideal condition. System noise simulation is illustrated in the
next two sections. Toe simulated performance can then be compared to the system performance with real data or to a theoretical model.
4.2.1 Gaussian Noise Generation
Toe function gaussian (shown in Listing 4.7) is used for noise generation and is contained in the FILTER.C source file. Toe function has no arguments and returns a single
random floating-point number. Toe standard C Iibrary function rand is called to generate uniformly distributed numbers. Toe function rand normally returns integers from o
to some maximum value (a defined constant, RAND_MAX, in ANSI implementations). As
shown in Listing 4. 7, the integer values returned by rand are converted to float values to be used by gaussian. Although the random number generator provided with
most C compilers gives good random numbers with uniform distributions and Iong periods, if the random number generator is used in an application that requires truly random,
uncorrelated sequences, the generator should be checked carefully. If the rand function
is in question, a standard random number generator can be easily written in C (see Park
and Miller, 1988). Toe function gaussian returns a zero mean random number with a
unit variance anda Gaussian (or normal) distribution. It uses the Box-Muller method (see
Knuth, 1981; or Press, Flannary, Teukolsky, and Vetterling, 1987) to mapa pair of independent uniforrnly distributed random variables to a pair of Gaussian random variables.
Toe function rand is used to generate the two uniform variables vl and v2 from -1 to
+1, which are transformed using the following statements:
r = vl*vl + v2*v2;
fac= sqrt(-2.*log(r)/r);
gstore = vl*fac;
gaus = v2*fac;
Toe r variable is the radius squared of the random point on the (v1, v2) plane. ln the
gaussian function, the r value is tested to insure that it is always Iess than 1 (which it
159
/**************************************************************************
noat gaussian ()
{
else
ready = O;
/* reset ready flag for next pair */
gaus = gstore; /* return the stored one */
return(gaus);
USTING 4.7
Function 11&W1ianO.
usually is), so that the region uniformly covered by (v1, v2) is a circle and so that
log (r) is always negative and the argument for the square root is positive. Toe variables gstore and gaus are the resulting independent Gaussian random variables.
Because gaussian must return one value ata time, the gstore variable is a static
floating-point variable used to store the vl*fac result until the next cal! to gaussian.
160
Real-Time Filtering
1.5.------~----~----.-------,------,
0.5
~.,
..
(a) }
"'
o
-0.5
-1
-1.5----~----~-------'-------'------'----'
o
50
100
150
200
250
Sample Number
Program MKGWN.C Output
1.5.-------,------...-------,------,------~
,.,
l
(\
0.5
g
>
.;
(b) ~
Of-.--,v'\i
"'
-0.5
-1
-1.5O
\
50
100
150
200
250
Sample Number
FIGURE 4.7 MKGWN program example output. Flterng a sine wave wth
added noise (frequency = 0.05). (ai Unfltered version wth Gaussan nose
(standard devaton = 0.2). (b) Output after lowpass fltering with 35-pont
FIR flter.
161
162
#include
#include
#include
#include
#include
#include
Real-Time Filtering
<stdlib.h>
<stdio.h>
<string.h>
<math.h>
"rtdspc.h"
"filter.h"
/***********************************************************************
i, j;
int
float x;
static float hist[34];
for(i =O; i < 250; i++)
x = sin(0.05*2*PI*i) + sigrna*gaussian();
sendout(fir_filter(x,fir_lpf35,35,hist));
}
}
Sec. 4.3
163
tween each input sample. This is called zero-packing (as opposed to zero-padding).
Toe zero values are located where the new interpolated values will appear. Toe effect of zero-packing on the input signal spectrum is to replicate the spectrum P
times within the output spectrum. This is illustrated in Figure 4.8(a) where the output sampling rate is three times the input sampling rate.
(2) Design a lowpass filter capable of attenuating the undesired P - 1 spectra above the
original input spectrum. Ideally, the passband should be from O to fs'/(2P) and the
stopband should be from fs' /(2P) to fs'/2 (where fs' is the filter sampling rate that is
P times the input sampling rate). A more practical interpolation filter has a transition band centered aboutf//(2P). This is illustrated in Figure 4.8(b). Toe passband
gain of this filter must be equal to P to compensate for the inserted zeros so that the
original signal amplitude is preserved.
(3) Filter the zero-packed input sequence using the interpolation filter to generate the
final P:1 interpolated signal. Figure 4.8(c) shows the resulting 3:1 interpolated
spectrum. Note that the two repeated spectra are attenuated by the stopband attenuation of the interpolation filter. ln general, the stopband attenuation of the filter
must be greater than the signal-to-noise ratio of the input signal in order for the interpolated signal to be a valid representation of the input.
4.3.2 Real-Time lnterpolation Followed by Decimation
Figure 4.8(d) illustrates 2: 1 decimation after the 3: 1 interpolation, and shows the spectrum of the final signal, which has a sampling rate 1.5 times the input sampling rate.
Because no lowpass filtering (other than the filtering by the 3:1 interpolation filter) is performed before the decimation shown, the output signal nearfs"/2 has an unusually shaped
power spectrum due to the aliasing of the 3:1 interpolated spectrum. If this aliasing
causes a problem in the system that processes the interpolated output signal, it can be
Real-Time Filtering
164
lnp<rt
~ 1 ~ 1 ~ 1C'\J
Spectrum
O
Input
Frequency
Sf.
2t.
f.
23t,
2t
......!
3'5
Scale
(a)
3:1
~1
lnterpolationp
Frequency
Response
1
,..
lnlelJ>Olated
Frequency
t;
...L
Scale
2
(b)
3:1
lnte,potatedvi
Output
Spectrum
[\.
C":::::::,,,-c:::::::::J I C'::::::,., ,.............,
,.
...!.
lnteq,olated
Frequency
Scale
2
(e)
2:1
[:\.,./1
~
~ .r-.
~
..
e,~
,.
,,
3f~'
2r
Decimated~
Output
Spectrum
2'
Decimaled
Frequency
Scale
(d)
FIGURE 4.8 lllustration of 3:1 interpolation followed by 2:1 decimation. The aliased
input spectrum in the decimated output is shown with a dashed line. (ai Example real
input spectrum. (bl 3:1 interpolation filter response (f; = 3fJ. (e) 3:1 interpolated
spectrum. (d) 2:1 decimated output (fs" = t;/2.).
eliminated by either lowpass filtering the signal before decimation or by designing the interpolation filter to further attenuate the replicated spectra.
The interpolation filter used to create the interpolated values can be an IIR or FIR
lowpass filter. However, if an IlR filter is used the input samples are not preserved ex
actly because of the nonlinear phase response of the IIR filter. FIR interpolation filters
can be designed such that the input samples are preserved, which also results in some
computational savings in the implementation. For this reason, only the implementation of
FIR interpolation will be considered further. The FIR lowpass filter required for interpolation can be designed using the simpler windowing techniques. ln this section, a Kaiser
Sec. 4.3
165
window is used to design 2: 1 and 3: 1 interpolators. The FIR filter length must be odd so
that the filter delay is an integer number of samples and the input samples can be preserved. Toe passband and stopband must be specified such that the center coefficient of
the filter is unity (the filter gain will be P) and P coefficients on each side of the filter
center are zero. This insures that the original input samples are preserved, because the result of ali the multiplies in the convolution is zero, except for the center filter coefficient
that gives the input sample. The other P - 1 output samples between each original input
sample are created by convolutions with the other coefficients of the filter. The following
passband and stopband specifications will be used to illustrate a P: 1 interpolation filter:
Passband frequencies:
Stopband frequencies:
Passband gain:
Passband ripple:
Stopband attenuation:
0-0.8fs/(2P)
1.2 fs/(2P}--0.5 fs
p
<0.03 dB
>56dB
Toe filter length was determined to be 16P - 1 using Equation (4.2) (rounding to the
nearest odd length) and the passband and stopband specifications. Greater stopband attenuation or a smaller transition band can be obtained with a longer filter. The interpolation
filter coefficients are obtained by multiplying the Kaiser window coefficients by the ideal
lowpass filter coefficients. The ideal lowpass coefficients for a very long odd length filter
with a cutoff frequency of fs /2P are given by the following sinc function:
ck
Psin(lat/ P)
k1t
(4.6)
Note that the original input samples are preserved, because the coefficients are zero for
ali k = nP, where n is an integer greater than zero and c0 = 1. Very poor stopband attenuation would result if the above coefficients were truncated by using the 16P - l coefficients where lkl < 8P. However, by multiplying these coefficients by the appropriate
Kaiser window, the stopband and passband specifications can be realized. Toe symmetrical Kaiser window, wk' is given by the following expression:
w, ~ ,,{~~1-(:~J}
(4.7)
lo(~)
where 10(~) is a modified zero order Bessel function of the first kind, ~ is the Kaiser
window parameter which determines the stopband attenuation and N in equation (4.7)
is 16P + 1. The empirical formula for ~ when Astop is greater than 50 dB is
~ = 0.1102*(A 8top - 8.71). Thus, for a stopband attenuation of 56 dB, ~ = 5.2ll36.
Figure 4.9(a) shows the frequency response of the resulting 31-point 2:1 interpolation
filter, and Figure 4.9(b) shows the frequency response of the 47-point 3:1 interpolation
filter.
Sec. 4.3
Listing 4.9 shows the example interpolation program INTERP3.C, which can be used to
interpolate a signal by a factor of 3. Two coefficient arrays are initialized to have the decimated coefficients each with 16 coefficients. Each of the coefficient sets are then used
individually with the fir_filter function to create the interpolated values to be sent
to sendout ( ) . The original input signal is copied without filtering to the output every
p sample (where P is 3). Thus, compared to direct filtering using the 47-point original filter, 15 multiplies for each input sample are saved when interpolation is performed using
INTERP3. Note that the rate of output must be exactly three times the rate of input for
this program to work in a real-time system.
-10
-20
-30
-40
167
(a) -~
-50
-60
-70~~~~~~----:~~~~~~~~~~~~-'-'---1.1.._.Jl.L--l.J~
O
0.05
0.1
0.15
0.2
0.25
- -0.4
0.45
0.5
#include
#include
#include
#include
#include
<stdlib.h>
<stdio.h>
<string.h>
<math.h>
"rtdspc.h"
/**************************************************************************
Frequency (f/fs)
FIR Filter Frequency Response
lOe=~=-=~==F~=:::..--~-.-~---,-~~,--~-,-~---,-~~-.-----.
o
*************************************************************************/
-10
main()
{
(b) -~
-20
int i;
float signal_in;
/* interpolation coefficients for the decimated filters */
static float coef31[16],coef32[16];
/* history arrays for the decimated filters */
static float hist31[15],hist32[15];
-30
-40
-50
-60
-70 1
0.05
0.1
0.15
0.2
0.25
11
dl
1/
0.3
11
)1
0.35
1/
li e \f
0.4
11
1e/
!I
0.45
Frequency (f/fs)
FIGURE 4.9 (ai Frequency response of 31-point FIR 2:1 interpolation filter
(gain = 2 or 6 dB). lbl Frequency response of 47-point FIR 3:1 interpolation
filter (gain = 3 or 9.54 dB).
166
1 1
FIR interpolation.
168
Real-Time Filtering
0.00749928, 0.00556927,
o.,
Chap. 4
-0.00275941, -0.00178662
0.8
};
I*
for(i
=O
i < 16
i++) coef3l[i]
interp3[3*i];
0.6
for(i
=O
i < 16
i++) coef32[il
interp3[3*i+l];
0.4
"
:,
0.2
>
"
(a) }
til
o
-0.2
-0.4
-0.6
USTING 4.9
(Continued)
-0.8
Figure 4.10 shows the result of running the INTERP3.C program on the WAVE3.DAT
data file contained on the disk (the sum of frequencies O.OI, 0.02 and 0.4). Figure 4.IO(a)
shows the original data. The result of the 3:1 interpolation ratio is shown in Figure
4.1 O(b). Note that the definition of the highest frequency in the original data set (0.4 fs) is
much improved, because in Figure 4. IO(b) there are 7.5 samples per cycle of the highest
frequency. The startup effects and the 23 sample delay of the 47-point interpolation filter
is also easy to see in Figure 4.IO(b) when compared to Figure 4.IO(a).
-1
100
50
150
200
250
Sample Number
Program INTERP3.C Output
0.8
0.6
The FFT is an extremely useful tool for spectral analysis. However, another important application for which FFTs are often used is fast convolution. The formulas for convolution
were given in chapter 1. Most often a relatively short sequence 20 to 200 points in length
(for example, an FlR filter) must be convolved with a number of longer input sequences.
The input sequence length might be 1,000 samples or greater and may be changing with
time as new data samples are taken.
One rnethod for computation given this problem is straight implementation of the
time domain convolution equation as discussed extensively in chapter 4. The number of real
multiplies required is M * (N - M + 1), where N is the input signal size and M is the length
of the FIR filter to be convolved with the input signal. There is an alternative to this rather
lengthy computation method-the convolution theorem. The convolution theorem states
that time domain convolution is equivalent to multiplication in the frequency domain. Toe
convolution equation above can be rewritten in the frequency domain as follows:
Y(k)
= H(k) X(k)
(4.8)
Because interpolation is also a filtering operation, fast interpolation can also be performed in the frequency domain using the FFT. The next section describes the implemen-
1
(b)
.!!
til
100
200
300
400
500
600
700
800
Sample Number
169
170
Toe program RFAST (see Listing 4.10) illustrates the use of the fft function for
fast convolution (see Listing 4.11 for a C language implementation). Note that the inverse FFT is performed by swapping the real and imaginary parts of the input and output of the fft function. The overlap and save method is used to filter the continuous real-time input and generate a continuous output from the 1024 point FFT. The
convolution problem is filtering with the 35-tap low pass FIR filter as was used in section 4.2.2. The filter is defined in the FILTER.H header file (variable fir_lpf35).
The RFAST program can be used to generate results similar to the result shown in
Figure 4.7(b).
tation of real-time filters using FFT fast convolution methods, and section 4.4.2 de .be
a real-time implementation of frequency domain interpolation.
scn s
t:
Equation (4.8) indicates that if the frequency domain representations of h(n) and x(n)
known, then Y(k) can be calculated by simple multiplication. Toe sequence y(n) can
be obtained by inverse Fourier transform. This sequence of steps is detailed below:
171
Sec. 4.4
Real-Time Filtering
(1) Create the array H(k) from the impulse response h(n) using the FFT.
(2) Create the array X(k) from the sequence x(n) using the FFf.
(3) Multiply H by X point by point thereby obtaining Y(k).
(4) Apply the inverse FFT to Y(k) in order to create y(n).
There are severa! points to note about this procedure. First, very often the impulse
response h(n) of the filter does not change over many computations of the convolution
equation. Therefore, the array H(k) need only be computed once and can be used repeatedly, saving a large part ofthe computation burden ofthe algorithm.
Second, it must be noted that h(n) and x(n) may have different Iengths. ln this case, it
is necessary to create two equal length sequences by adding zero-value samples at the end of
the shorter of the two sequences. This is commonly called zero filling or zero padding. This
is necessary because ali FFf Iengths in the procedure must be equal. Also, when using the
radix 2 FFf ali sequences to be processed must have a power of 2 length. This can require
zero filling of both sequences to bring them up to the next higher value that is a power of 2.
Finally, in order to minimize circular convolution edge effects (the distortions that
occur at computation points where each value of h(n) does not have a matching value in
x(n) for multiplication), the length of x(n) is often extended by the original length of h(n)
by adding zero values to the end of the sequence. The problem can be visualized by
thinking of the convolution equation as a process of sliding a short sequence, h(n), across
a Ionger sequence, x(n), and taking the sum of products at each translation point. As this
translation reaches the end of the x(n) sequence, there will be sums where not ali h(n) values match with a corresponding x(n) for multiplication. At this point the output y(n) is actually calculated using points from the beginning of x(n), which may not be as useful as
at the other central points in the convolution. This circular convolution effect cannot be
avoided when using the FFf for fast convolution, but by zero filling the sequence its results are made predictable and repeatable.
The speed of the FFT makes convolution using the Fourier transform a practical
technique. ln fact, in many applications fast convolution using the FFf can be significantly faster than normal time domain convolution. As with other FFT applications, there
is less advantage with shorter sequences and with very small lengths the overhead can
create a penalty. The number of real multiply/accumulate operations required for fast
convolution of an N length input sequence (where N is a large number, a power of 2 and
real FFfs are used) with a fixed filter sequence is 2*N*[l + 2*1ogz(N)J. For example,
when N is 1,024 and M is 100, fast convolution is as much as 2.15 times faster.
#include
#include
#include
#include
#include
#include
<stdlib.h>
<stdio.h>
<string.h>
<math.h>
rtdspc.h"
filter.h"
/***********************************************************************
int
float
COMPLEX
static float
i, j;
tempflt;
*samp, *filt;
input_save[FILTER_~];
samp
(COMPLEX *) calloc(FFT_LEN:n'H, sizeof(COMPLEX));
if( !samp){
USTING 4.10 Program RFAST to perform real-time fast convolution using
the overlap and save method. (Continued)
172
Real-Time Filtering
Sec. 4.4
Chap.4
exit(l);
173
/* overlap the last FILTER_LENGTH-1 input data points in the next FFT */
for(i = O; i < FILTER_LENGTH; i++) {
samp[il .real= input_save[i];
samp[i].imag = O.O;
/* copy the filter into complex array and scale by 1/N for inverse FFT */
tempflt = 1. 0/FFT_LENGTH;
for(i =O; i < FILTER_LENGTH; i++)
filt[i] .real= tempflt*fir_lpf35[i];
/* FFT the zero filled filter i.npulse response */
fft(filt,M);
while(l) {
i++) {
i++) sendout(samp[i].imag);
}
}
/**************************************************************************
/* do FFT of samples */
fft(samp,M);
fft(samp, M);
/* Write the result out to a dsp data file*/
(Continued)
COMPLEX u,temp,tm;
COMPLEX *xi,*xip,*xj,*wptr;
int i,j,k,l,le,windex;
double arg,w_real,w_imag,wrecur_real,wrecur_imag,wtemp_real;
USTING 4.11
174
Real-Time Filtering
terop.real
terop.imag
xip->real
xip->imag =
*xi = terop;
if(m != mstore) {
/* free previously allocated storage and set new m */
if(mstore != O) free(w);
mstore = m;
if(m == O) return;
175
Sec.4.4
Chap. 4
xi->real
xi->imag
xi->real
xi->imag
+
+
-
xip->real;
xip->imag;
xip->real;
xip->imag;
/* n
2**m
fft length */
wptr = w + windex - 1;
for (j = 1; j < le; j++)
u = *wptr;
for (i = j
i < n
i
n = 1 << m;
le = n/2;
i + 2*le) {
+ ;
w = (COMPLEX *) calloc(le-1,sizeof(CCMPLEX));
if(!w) {
exit(l);
= O;
for (i = 1; i < (n-1)
k = n/2;
i++)
while(k <= j) {
j = j - k;
k = k/2;
/* start fft */
j
le = n;
windex
for (1
le
if
1;
O; 1 < m
le/2;
=j
+ k;
(i < j)
xi = X + ;
xj =X+ j;
terop = *xj;
*xj = *xi;
*xi = terop;
l++) {
=X
+ ;
xip = xi + le;
LISTING 4.11
USTING 4.11
(Continued)
(Continued)
Real-Time Filtering
176
Sec. 4.4
Chap,4
ln section 4.3.2 time domain interpolation was discussed and demonstrated using severa}
short FIR filters. ln this section, the sarne process is demonstrated using FFr technique
Toe steps involved in 2: 1 interpolation using the FFf are as follows:
s.
roainO
(1) Perform an FFf with a power of 2 length (N) which is greater than or equal to lhe
length of the input sequence.
(2) Zero pad the frequency domain representation of the signal (a complex array) by
inserting N - 1 zeros between the positive and negative half of the spectrum. Toe
Nyquist frequency sample output of the FFf (at the ndex N/2) is divided by 2 and
placed with the positive and negative parts of the spectrum, this results in a symmetrical spectrum for a real input signal.
(3) Perfonn an inverse FFf with a length of 2N.
(4) Multiply the interpolated result by a factor of 2 and copy the desired portion of lhe
result that represents the interpolated input, this is ali the inverse FFf samples if lhe
input Iength was a power of 2.
Listing 4.12 shows the program INTFFT2.C that performs 2:1 interpolation using
the above procedure and the fft function (shown in Listing 4.11). Note that the inverse
FFf is performed by swapping the real and imaginary parts of the input and output of lhe
fft function. Figure 4.11 shows the result of using the INTFFT2 program on the 128
samples of the W AVE3.DAT input file used in the previous examples in this chapter
(these 256 samples are shown in detail in Figure 4.lO(a)). Note that the output Iength is
twice as large (512) and more of the sine wave nature of the waveform can be seen in the
interpolated result. Toe INTFFT2 program can be modified to interpolate by a Iarger
power of 2 by increasing the number of zeros added in step (2) listed above. Also, because the FFf is employed, frequencies as high as the Nyquist rate can be accurately interpolated. FIR filter interpolation has a upper frequency lirnit because of the frequency
response of the filter (see section 4.3.1).
#include
#include
#include
#include
#include
<stdlib.h>
<stdio.h>
<string.h>
<math.h>
rtdspc.h"
177
*/
int
float
COMPLEX
i;
temp;
*samp;
/************************************************************************
INI'FFI'2.C - Interpolate 2:1 using FFl'
Generates 2:1 interpolated ti.me domain data.
*************************************************************************/
USTING 4.12 Program INTFFT2.C used to perform 2:1 interpolation using
the FFT. (Continued)
fft(sanp,M+l);
/* copy to output and nrultiply by 2/(2*LENGTH) */
temp = 1.0/LENGTH;
for (i=O; i < 2*LENGTH; i++) sendout(temp*samp[i].imag);
USTING 4.12 (Continued)
178
Real-Time Filtering
Chap.4
Sec. 4.5
If d > O then the output will decay toward zero and the peak will occur at
tan- 1(0Yd)
Q8
pea/c
o~
e-"'-
y(t pealc) = ~
02
42
44
Yn+I
(4.11)
C1Yn -C2Yn-l
+ b1Xn,
(4.12)
where the x input is only present for t = O as an initial condition to start the oscillator and
46
48
d2 +ro2
-1
(4.10)
(O
OA
V
179
100
200
300
400
500
(O
C2
=e-2dt
where 'tis the sampling period (1/fs) and ro is 21t times the oscillator frequency.
The frequency and rate of change of the envelope of the oscillator output can be
changed by modifying the values of d and ro on a sample by sample basis. This is illustrated in the OSC program shown in Listing 4.13. Toe output wavefonn grows from a
peak value of 1.0 to a peak value of 16000 at sample number 5000. After sample 5000
the envelope of the output decays toward zero and the frequency is reduced in steps every
1000 samples. A short example output wavefonn is shown in Figure 4.12.
Sample Number
c1
(4.9)
f =440 2key/12
(4.13)
Thus, a key value of zero will give 440 Hz, which is the musical note A above rniddle C. Toe WAVETAB.C program starts ata key value of-24 (two octaves below A) and
steps through a chromatic scale to key value 48 (4 octaves above A). Each sample output
value is calculated using a linear interpolation of the 300 values in the table gwave. The
300 sample values are shown in Figure 4.13 as an example wavefonn. Toe gwave array is
301 elements to make the interpolation more efficient. Toe first element (O) and the last element (300) are the sarne, creating a circular interpolated wavefonn. Any wavefonn can be
substituted to create different sounds. Toe amplitude of the output is controlled by the env
variable, and grows and decays ata rate determined by trel and amp arrays.
#include
#include
#include
#include
<stdlib.h>
<stdio.h>
<math.h>
"rtdspc.h"
float osc(float,float,int);
float rate,freq;
float amp = 16000;
void main()
{
/* change_flag:
if(change_flag != O) {
/* assume rate and freq change every time*/
wosc
freq * two_pi_div_sample_rate;
arg = 2.0 * cos(wosc);
a= arg * rate;
b =-rate* rate;
xl04
(Continuecll
2~~~~~~~~~~~~~~~~~~~~~~~~~~~
1.5
.,
flush();
>
.!l
Cll
0.5
W
-0.5
-1
*/
180
-1.5
-2
500
1000
1500
2000
2500
3000
3500
4000
Sample Number
FIGURE 4.12 Example signal output from the OSC.C program (modified to
reach peak amplitude in 500 samples and change frequency every 500 sample for display purposes).
181
Real-Time Filtering
182
#include
#include
#include
#include
<stdlib.h>
<n'ath.h>
rtdspc.h
gwave.h"
/* gwave[301] array */
int t,told,ci,k;
float ~ld,rate,env,wave_size,dec,phase,frac,delta,sample;
register long int i,endi;
register float sig_out;
static
static
static
static
0.02,
float trel[SJ = {
0.14,
0.6, 1.0, o.o };
float amps[SJ = { 15000.0, 10000.0, 4000.0, 10.0, O.O};
float rates[lOJ;
int tbreaks[lO];
wave_size = 300.0;
endi = 96000;
for(key = -24
Chap.4
flush();
USTING 4.14
key < 48
183
Sec. 4.5
(Continued)
key++)
phase = O.O;
rate= rates[OJ;
env = 1.0;
ci = O;
for(i =O; i < endi; i++)
/* calculate envelope amplitude*/
LISTING 4.14 Program WAVETAB to generate periodic waveform at any
frequency. (Continued)
1;l
0.2
~
.!l
cn
-0.2
-0.4
-0.6
-0.8
-1
50
100
150
200
250
Sample Number
FIGURE 4.13
WAVETAB.
300
184
Real-Time Filtering
Chap, 4
4.6 REFERENCES
Sec. 4.6
References
185
STEARNS, S. and DAVID, R. (1988). Signal Processing Algorithms. Englewood Cliffs, NJ: Prentice
Hall.
ANTONIOU, A. (1979). Digital Fitters: Analysis and Design. New York: McGraw-Hill.
VAN V ALKENBURG, M.E. ( 1982). Analog Filter Design. New York: Holt, Rinehart and Winston.
BRIGHAM, E.O. (1988). The Far Fourier Transform and Its Applications. Englewood Ciiffs N.
Prentice Hall.
' J.
ZVEREV, A. 1. (1967). Handbook of Filter Synthesis. New York: John Wiley & Sons.
CROCHIERE, R.E. and RABINER, L.R. (1983). Multirate Digital Signal Processing. EnglewOOd
Cliffs, NJ: Prentice Hall.
ELIOIT, D.F. (Ed.). (1987). Handbook of Digital Signal Processing. San Diego, CA: Academic
Press.
EMBREE, P. and KIMBLE B. (1991). C Language Algorithms for Digital Signal Processin
Englewood Cliffs, NJ: Prentice Hall.
g.
GHAUSI, M.S. and LAKER, K.R. (1981). Modem Filter Design: Active RC and Switched
Capacitar. Englewood Cliffs, NJ: Prentice Hall.
IEEE DIGITAL S!GNAL PROCESSING COMMfITEE (Ed.). (1979). Programs for Digital Signal
Processing. New York: IEEE Press.
JOHNSON, D.E., JOHNSON, J.R. and MOORE, H.P. (1980). A Handbook of Active Filters. Englewood
Cliffs, NJ: Prentice Hall.
JONG, M.T. (1992). Methods of Discrete Signal and System Analysis. New York: McGraw-Hill.
KAISER, J. F. and SCHAFER, R. W. (Feb. 1980). On the Use of the /0-Sinh Window for Spectrum
Analysis. IEEE Transactions on Acoustics, Speech, and Signal Processing, (ASSP-28) (1),
105-107.
KNUTH, D.E. (1981). Seminumerical Algorithms, The Art of Computer Programming, Vol. 2. (2nd
ed.). Reading, MA: Addison-Wesley.
McCLELLAN, J., PARKS, T. and RABINER, L.R. (1973). A Computer Program for Designing
Optimum FlR Linear Phase Digital Filters. IEEE Transactions on Audio and Electro-acoustics,
AU-21. (6), 506-526.
MOLER, e., LITTLE, J. and BANGERT,
MathWorks.
s.
MSCHUYTZ, G.S. and HORN, P. (1981). Active Filter Design Handbook. New York: John Wiley &
Sons.
PPENHEIM, A. and SCHAFER, R. (1975). Digital Signal Processing, Englewood Cliffs, NJ:
Prentice Hall.
PPENHEIM, A. and SCHAFER, R. (1989). Discrete-time Signal Processing. Englewood Cliffs, NJ:
Prentice Hall.
PAPOUUS, A. (1984). Probability, Random Variables and Stochastic Processes, (2nd ed.). New
York: McGraw-Hill.
PARK, S.K. and MILLER, K.W. (Oct. 1988). Random Number Generators: Good Ones Are Hard to
Find. Communications ofthe ACM, (31) (IO).
PARKS, T.W. and BURRUS, C.S. (1987). Digital Filter Design. New York: John Wiley & Sons.
PRESS W.H., FLANNERY, B.P., TEUKOLSKY, S.A. and VETTERLING, W.T. (1987). Numerical
Recipes. New York: Cambdge Press.
RABINER, L. and GoLD, B. (1975). Theory and Application of Digital Signal Processing.
Englewood Cliffs, NJ: Prentice Hall.
Sec. 5.1
CHAPTER
187
(4) Amount of overlap between successive spectra: Determines accuracy of the estimate, directly affects computation time
(5) Number of spectra averaged: Determines maximum rate of change of the detectable
spectra and directly affects the noise floor of the estimate
REAL-TIME
DSP APPLICATIONS
One of the common application areas for power spectral estimation is speech processing.
The power spectra of a voice signal give essential clues to the sound being made by the
speaker. Almost ali the information in voice signals is contained in frequencies below
3,500 Hz. A common voice sampling frequency that gives some margin above the
Nyquist rate is 8,000 Hz. The spectrum of a typical voice signal changes significantly
every 10 msec or 80 samples at 8,000 Hz. As a result, popular FFf sizes for speech processing are 64 and 128 points.
Included on the MS-DOS disk with this book is a file called CHKL.TXT. This is
the recorded voice of the author saying the words "chicken Iittle." These sounds were
chosen because of the range of interesting spectra that they produced. By looking at a plot
of the CHKL.TXT samples (see Figure 5.1) the break between words can be seen and the
This chapter combines the DSP principies described in the previous chapters with the
specifications of real-time systems designed to solve real-world problems and provide
complete software solutions for several DSP applications. Applications of FFf spectrum
analysis are described in section 5.1. Speech and music processing are considered in sections 5.3 and 5.4. Adaptive signal processing methods are illustrated in section 5.2 (parametric signal modeling) and section 5.5 (adaptive frequency tracking).
50
1:l
Signals found in most practical DSP systems do not have a constant power spectrum. Toe
spectrum of radar signals, communication signals, and voice waveforms change continually with time. This means that the FFf of a single set of samples is of very limited use.
More often a series of spectra are required at time intervals determined by the type of signal and information to be extracted.
Power spectral estimation using FFfs provides these power spectrum snapshots
(called periodograms). The average of a series of periodograms of the signal is used as
the estimate of the spectrum of the signal at a particular time. The parameters of the average periodogram spectral estimate are:
(1) Sample rate: Determines maximum frequency to be estimated
(2) Length of FFf: Determines the resolution (smallest frequency difference detectable)
(3) Window: Determines the amount of spectral leakage and affects resolution and
noise floor
186
<I)
-100
1000
2000
3000
4000
5000
Sample Number
FIGURE 5.1
6000
188
Sec. 5.1
Chap. 5
relative volume can be inferred from the envelope of the waveform. The frequenc
. more d'ffi
. f rom th'1s plot.
y COntent 1s
1 1cu1t to determme
The program RTPSE (see Listing 5.1) accepts continuous input samples ( .
.
f
trai
.
usmg
getinput ()) and generates a contmuous set o spec
estimates. The power spectral
estimation parameters, such as FFf length, overlap, and number of spectra averagect, are
set by ~ progi:am to default values. The amount .o~ overlap and averaging can be
~hanged m real-time. RTPSE produces_ an ou~ut cons1sting of a spectral estimate every
4
mput samples. Each power spectral estlmate 1s the average spectrum of the input file over
the past 128 samples (16 FFf outputs are averaged together).
Figure 5.2 shows a contour plot of the resulting spectra plotted as a frequency versus time plot with the amplitude of the spectrum indicated by the contours. The high fre-
static float
mag[FFT_LENGTH], sig[FFT_LENGTH], hanw[FFT_LENGTH];
static COMPLEX sanp [FFT_LENGTH] ;
/* overall scale factor */
scale = 1.0f/(float)FFT.J,ENGTH;
scale *= scale/(float)nmnav;
/* calculate hanming window * /
tempflt = 8.0*atan(l.O)/(FFT_LENGTH-1);
for(i = O i < FFT_LENGTH; i++)
hamw[i] = 0.54 - 0.46*cos(tempflt*i);
/* read in the first FFT_LENGTH sanples, overlapped sanples read in loop */
for(;;)
<std1ib.h>
<stdio.h>
<math.h>
rtdspc.h"
O;
/*********************************************************************
fft(sanp,M);
*********************************************************************/
/* FFT length ImlSt be a power of 2 */
#define FFT_LENGTH 64
#define M 6
/* Im1st be log2(FFT_LENGTH) */
in t nmnav = 16 ;
int ovlap = 60;
/*
main()
{
int
float
i,j,k;
scale,tempflt;
USTING 5-1 Program RTPSE to perform real-time power spectral estimation using
the FFT. (Continued)
189
USTING 5.1
(Continuedl
190
Chap. 5
Sec. 5.1
191
targets can be removed by determining their average amplitude from a series of echoes
and subtracting them from each received echo. Any moving target will not be subtracted
and can be further processed. A simple method to remove stationary echoes is to simply
subtract successive echoes from each other (this is a simple highpass filter of the Doppler
signal). Toe mean frequency of the remaining Doppler signal of the moving targets can
then be determined using the complex FFf.
Listing 5.2 shows the program RADPROC.C, which perforrns the DSP required to
remove stationary targets and then estimate the frequency of the remaining Doppler signal. ln order to illustrate the operation of this program, the test data file RADAR.DAT
was generated. These data represent the simulated received signal from a stationary target
1
:.
r:::
ii:i
#include <stdlib.h>
#include <math.h>
#include "rtdspc.h"
/*********************************************************************
20
30
40
50
60
70
80
90
FIGURE 5.2 Contour plot of the power spectrum versus frequency and
time obtained using the RTPSE program with the input file CHKL.TXT.
Contours are at 5 dB intervals and the entire 20 power spectrum is normalized to O dB.
quency content of the "chi'' part of "chicken" and the lower frequency content of "little''
are clearly indicated.
void main()
{
int
float
static float
static COMPLEX
static COMPLEX
i,j,k;
tempflt,rin,iin,pl,p2;
mag[FFT_LENGTH);
echos[ECHO_SIZE) [FF'I'_LENGTH);
last_echo[ECHO_SIZE);
LISTING 5.2 Program RADPROC to perform real-time radar signal processing using the FFT. (Continued)
192
Sec. 5.2
for(;;)
tempflt = mag[O];
i=O;
for(j = 1 ; j < FFT_LENGTH
if(mag[j] > tempflt) {
tempflt = mag[j];
i=j;
j++) {
193
Chap. 5
added to a moving target signal with Gaussian noise. Toe data is actually a 2D matrix
representing 12 consecutive complex samples (real,imag) along the echo in time (representing 12 consecutive range locations) with each of 33 echoes following one after another. Toe sampling rates and target speeds are not important to the illustration of the program. Toe output of the program is the peak frequency location from the 16-point FFT in
bins (O to 8 are positive frequencies and 9 to 15 are -7 to -1 negative frequency bins). A
simple (and efficient) parabolic interpolation is used to give a fractional output in the results. Toe output from the RADPROC program using the RADAR.DAT as input is 24
consecutive numbers with a mean value of 11 and a small standard deviation due to the
added noise. Toe first 12 numbers are from the first set of 16 echoes and the last 12 numbers are from the remaining echoes.
Figure 5.3 shows the block diagram of a system modeling problem that will be used to illustrate the adaptive IIR LMS algorithm discussed in detail in section 1.7.2 of chapter 1.
Listing 5.3 shows the main program ARMA.C, which first filters white noise (generated
using the Gaussian noise generator described in section 4.2.1 of chapter 4) using a secondorder IIR filter, and then uses the LMS algorithm to adaptively determine the filter function.
Listing 5.4 shows the function iir_biquad, which is used to filter the white
noise, and Listing 5.5 shows the adaptive filter function, which implements the LMS algorithm in a way compatible with real-time input. Although this is a simple ideal example
where exact convergence can be obtained, this type of adaptive system can also be used
to model more complicated systems, such as communication channels or control systems.
Toe white noise generator can be considered a training sequence which is known to the
algorithm; the algorithm must determine the transfer function of the system. Figure 5.4
shows the error function for the first 7000 samples of the adaptive process. Toe error reduces relatively slowly due to the poles and zeros that must be determined. FIR LMS algorithms generally converge much faster when the system can be modeled as a MA system (see section 5.5.2 for an FIR LMS example). Figure 5.5 shows the path of the pole
coefficients (bO,bl) as they adapt to the final result where bO = 0.748 and bl = -0.272.
(text continues on page 198)
196
in_hist[2)
in_hist[ll
return(output);
/* poles */
USTING 5.5
(Continuedl
/*zeros*/
197
Sec. 5.2
Chap. 5
/* error calculation */
e= d - output;
/* update coefficients */
a[OJ += e*0.2*alpha[OJ;
a[l) += e*O.l*alpha[l);
a[2) += e*0.06*alpha[2);
b[OJ += e*0.04*beta[OJ;
b[ll += e*0.02*beta[l);
/* update history for alpha */
for(i =O; i < 3; i++) {
alpha_h2 [i) = alpha_hl [i);
alpha_hl[i] = alpha[i];
0.8
0.6
0.4
1
<Jl
-0.4
-0.6 c___ ___,__ ___,__ _ __.___ __.,__ ____,_ _ __,___ _
O
1000
2000
3000
4000
5000
6000
Sample Number
FIGURE 5.4 Errar signal during the IIR adaptive process, illustrated by the
program ARMA.e.
__l
7000
198
Chap. 5
0.1
0.05
<stdlib.h>
<stdio.h>
<string.h>
<math.h>
rtdspc.h"
o
i=
.!l
,3 -0.05
-:e
199
#include
#include
#include
#include
#include
ARMA.e
0.15
8u
Sec. 5.2
main()
{
-0.1
-0.15
-0.2
-0.25
-0.3
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
The frequency of a signal can be estimated in a variety of ways using spectral analysis
methods (one of which is the FFf illustrated in section 5.1.2). Another parametric approach is baseei on modeling the signal as resulting from an AR process with a single
complex pole. The angle of the pole resulting from the model is directly related to the
mean frequency estimate. This model approach can easily be biased by noise or other signals but provides a highly efficient real-time method to obtain mean frequency information.
Toe first step in the AR frequency estimation process is to convert the real signal
input to a complex signal. This is not required when the signal is already complex, as is
the case for a radar signal. Real-to-complex conversion can be done relatively simply by
using a Hilbert transform FIR filter. The output of the Hilbert transform filter gives the
imaginary part of the complex signal and the input signal is the real part of the complex
signal. Listing 5.6 shows the program ARFREQ.C, which implements a 35-point Hilbert
transform and the AR frequency estimation process. The AR frequency estimate determines th.e average frequency from the average phase differences between consecutive
0.032403,
0.000000,
0.207859,
0.000000,
-0.081119,
0.000000,
-0.038135
};
b[OJ Coefficient
AGURE 5.5 Pole coefficients (b0,b1) during the IIR adaptive process, illustrated by the program ARMA.e.
0.000000,
0.058420,
0.000000,
-0.635163,
0.000000,
-0.043301,
0.000000,
200
Chap. 5
201
Speech Processing
Sec. 5.3
0.351--.-----.-----.---.-----,c---.---..:;_=.---,--~-~
0.3
freq = cpi*atan2(xi,xr);
else
freq = O.O;
sendout ( freq) ;
0.25
~
,ri
.,:,
0.2
.;;
>.,
t;j
-~8
0.15
i:.i
0.1
complex samples. Toe are tangent is used to determine the phase angle of the complex results. Because the calculation of the are tangent is relatively slow, severa! simplifications
can be made so that only one are tangent is calculated for each frequency estimate. Let x
be the complex sequence after the Hilbert transform. Toe phase difference is
n
<l>n
= arg[xn]-arg[xn_i] = arg[xnx:_i].
0.05
0'--~-'--~---''----'--"--~---'~~-'-~---'-~~..,__~__,_~--1..,__~__,
20
40
60
80
100
120
140
160
180
200
(5.1)
Estirnate N urnber
w%~n = arg[w%~nX=-I]
f
21t wlen -
(5.2)
21t
where the last approximation weights the phase differences based on the amplitude of the
complex signal and reduces the number of are tangents to one per estimate. Toe constant
wlen is the window length (winlen in program ARFREQ) and controls the number of
phase estimates averaged together. Figure 5.6 shows the results from the ARFREQ program when the CHKL.TXT speech data is used as input. Note that the higher frequency
content of the "chi" sound is easy to identify.
3 SPEECH PROCESSING
Communication channels never seem to have enough bandwidth to carry the desired
speech signals from one location to another for a reasonable cost. Speech compression attempts to improve this situation by sending the speech signal with as few bits per second
as possible. Toe sarne channel can now be used to send a larger number of speech signals
at a lower cost. Speech compression techniques can also be used to reduce the amount of
memory needed to store digitized speech.
202
1clude
1clude
1clude
1clude
Sec. 5.3
Speech Processing
203
Chap. 5
<stdlib.h>
<stdio.h>
"rtdspc .h"
"mu.h"
~************************************************************************
~ . C - PROGRAM TO DEMONSTRATE MU LAW SPEECH COMPRESSION
~**********************************************************************/
functions are also shared by both the encoder and decoder of both sub-bands. Ali of the
functions are performed using fixed-point arithmetic, because this is specified in the
CCITI recommendation. A floating-point version of the G. 722 C code is included on
the enclosed disk. The floating-point version runs faster on the DSP32C processor, which
has limited support for the shift operator used extensively in the fixed-point implementation. Listing 5.8 shows the main program G722MAIN.C, which demonstrates the algorithm by encoding and decoding the stored speech signal "chicken little," and then operates on the real-time speech signal from getinput ( ) . The output decoded signal is
played using sandout () with an effective sample rate of 16 kHz (one sarnple is interpolated using simple linear interpolation giving an actual sample rate for this example of
Ln()
int i,j;
for(;;)
i = (int) getinput();
encode 14 bit linear input to mu-law */
j = abs(i);
if(j > Oxlfff) j = Oxlfff;
j = invmutab[j/2];
if(i < O) j I= Ox80;
decode the 8 bit mu-law and send out */
sendout( (float)mutab[j]);
USTING 5.7 P..-ogram MULAW.C, which encodes and decodes a speech signal using
-law compression.
pression are also shown in this listing. Because the tables are rather long, they are in the
include file MU.H.
#include <stdlib.h>
#include "rtdspc.h"
/* Main program for g722 encode and decode demo for 210XO */
extern int encode(int,int);
extern void decode(int);
extern void reset();
/* outputs of the decode function */
extern int xoutl,xout2;
int chkl_coded[6000];
extern int n chkl [ J ;
void main()
{
int i,j,tl,t2;
float xfl = O.O;
float xf2 = O.O;
/* reset, initialize required memory */
reset();
/* code the speech, interpolate because it was recorded at 8000 Hz */
The CCITI recommendation G.722 is a standard for digital encoding of speech and audio
signals in the frequency range from 50 Hz to 7000 Hz. The G.722 algorithm uses subband adaptive differential pulse code modulation (ADPCM) to compress 14-bit, 16 kHz
samples for transmission or storage at 64 kbits/sec (a compression ratio of 3.5:1).
Because the G.722 method is a wideband standard, high-quality telephone network applications as well as music applications are possible. If the sampling rate is increased, the
sarne algorithm can be used for good quality music compression.
The G.722 program is organized as a set of functions to optimize memory usage
and make it easy to follow. This program structure is especially efficient for G.722, since
most of the functions are shared between the higher and lower sub-bands. Many of the
204
Chap. 5
decode(chkl_coded[i]);
xfl = (float)xoutl;
sendout(0.5*xf2+0.5*xfl);
sendout(xfl);
xf2 = (float)xout2;
sendout(0.5*xf2+0.5*xfl);
sendout (xf2) ;
j=encode(tl, t2);
decode(j);
xfl = (float)xoutl;
sendout(0.5*(xfl+xf2));
sendout (xfl) ;
xf2 = (float)xout2;
sendout(0.5*(xf2+xfl));
sendout(xf2);
205
Speech Processing
int decis;
int
sh;
int
eh;
dh;
int
int
il,ih;
int szh,sph,ph,yh;
int szl,spl,sl,el;
USTING 5.8
Sec. 5.3
second value */
(Continued)
32 kHz). Listing 5.9 shows the encod function, and Listing 5.10 shows the decode
function; both are contained in G.722.C. Listing 5.ll shows the functions filtez,
filtep, quantl, invqxl, logscl, scalel, upzero, uppol2, uppoll, invqah, and logsch, which are used by the encode and decode functions. ln Listings
5.9, 5.10, and 5.ll, the global variable definitions and data tables have been omitted for
clarity.
xh
*tqmf_ptrl-;
int i;
int *n...ptr;
int *tqmf_ptr,*tqmf_ptrl;
long int xa,xb;
int xl,xh;
LISTING 5.9
(Continued)
206
Chap. 5
1
1
'i
sl
el
= szl + spl.;
= xl - sl;
= quantl(el.,detl);
.nvqxl: does both invqal and invqbl- cooputes quantized difference signal * /
'or invqbl, trunca te by 2 lsbs, so mode = 3 * /
.nvqal case wi th mode = 3 * /
dlt = ((long)detl*qq4_code4_table[il >> 21) >> 15;
.ogscl: updates logaritlunic quant. scale factor in low sub band* /
nbl = logscl(il,nbl);
:calel: coopute the quantizer scale factor in the lower sub band* /
,alling parameters nbl and 8 (constant such that scalel can be scaleh) * /
detl = scalel(nbl,8);
Sec. 5.3
rlt2 = rltl;
rltl = rlt;
plt2 = pltl;
pltl = plt;
/* high band encode */
szh
filtez(delay_bph,delay_dhx);
sph
filtep(rhl,ahl,rh2,ah2);
else {
ih= l;
upzero(dlt,del.ay_dltx,delay_bpl);
=2
case*/
lSL
= uppol2(all,al2,plt,pltl,plt2);
dh = ((long)deth*qq2_code2_table[ih])
207
Speech Processing
= uppoll(all,al2,plt,pltl);
lone with lower sub_band encoder; now implement delays for next time*/
USTING 5.9
(Continued)
upzero(dh,delay_dhx,delay_bph);
USTING 5.9
(Continuedl
208
= uppol2 (ahl,ah2,ph,phl,ph2);
tppoll:
ahl
= uppoll(ahl,ah2,ph,phl);
Chap. 5
Sec. 5.3
Speech Processing
= filtez(dec_del_bpl,dec_del_dltx);
dec_szl
= filtep(dec_rltl,dec_all,dec_rlt2,dec_al2);
dec_spl
dec_sl
= dec_spl
dec_szl;
= dl
dec_sl;
= logscl(ilr,dec_nbl);
= scalel(dec_nbl,8);
= dec_dlt
+ dec_szl;
l decode(int input)
i;
xal,xa2;
/* qmf accumulators */
*h_ptr;
lXll *ac_ptr,*ac_ptrl,*ad_ptr,*ad,_ptrl;
i1r,ih;
xs,xd;
r1,rh;
dl;
upzero(dec_dlt,dec_del_dltx,dec_del_bpl);
/* uppol2: update second predictor coefficient apl2 and delay it as al2 */
dec_al2
= uppol2(dec_all,dec_al2,dec_plt,dec_pltl,dec_plt2);
= uppoll(dec_all,dec_al2,dec_plt,dec_pltl);
*I
I * done wi th lower sub band decoder, inplement delays for next time * /
209
210
dec_rlt2
dec_rltl
dec_plt2
dec_pltl
Chap.5
l
1
dec_rltl;
dec_rlt;
dec_pltl;
dec_plt;
Sec. 5.3
211
Speech Processing
dec_rh2 = dec_rhl;
dec_rhl
rh;
dec_ph2
dec_phl;
dec_ph;
dec_phl
/* end of higher sub_band decoder * /
/* end with receive quadrature mirror filters */
xd
xs
= rl
= rl
- rh;
+ rh;
h_ptr = h;
ac_ptr = accumc;
ad_ptr = accumi;
xal = (long)xd * (*h_ptr++);
xa2 = (long)xs * (*h_ptr++);
/* main multiply acCUl!Ullate loop for sanples and coefficients */
for(i =O; i < 10 ; i++) {
xal += (long) (*ac_ptr++) * (*h_ptr++);
xa2 += (long) (*a~tr++) * (*h_ptr++);
/* final mult/acCUl!Ullate */
xal += (long)(*ac_ptr) * (*h_ptr++);
xa2 += (long) (*a~tr) * (*h_ptr++);
/* scale by 2~14 */
xoutl
xal >> 14;
xout2
xa2 >> 14;
=
=
ac_ptrl = ac_ptr - 1;
a~trl = ad_ptr - 1;
for(i = O ; i < 10; i++)
*ac_ptr- = *ac_ptrl-;
*a~tr- = *a~trl-;
*ac_ptr
*a~tr
= xd;
= xs;
uppoll(dec_ahl,dec_ah2,dec_ph,dec_phl);
USTING 5.10
(Continued)
USTING 5.10
(Continued)
.,
Real-Time DSP Applications
212
Sec. 5.3
Chap.5
213
Speech Processing
nt i;
14));
/* x2 here */
/* scalel: cornpute the quantizer scale factor in the lower or upper sub-band*/
int wdl,wd2,wd3;
wd1 = (nbl >> 6) & 31;
wd2 = nbl >> 11;
wd3 = ilb_table[wdl] >> (shift_constant + 1 - wd2);
return(wd3 << 3);
/* x2 here */
nt ril,mil;
ong int wd,decis;
s of difference signal */
d= abs(el);
termine mil based on decision levels and detl gain */
or(mil = O ; mil< 30; mil++) {
decis = (decis_levl[mil]*(long)detl) >> 15L;
if(wd < decis) break;
mil=30 then wd is less than all decision levels */
f(el >= O) ril = quant26bt__pos[mil];
lse ril = quant26bt_neg[mil];
eturn(ril);
LISTING 5.11 Functions used by the encode and decode algorithms of G.722 (contained in G.722.C). (Continued)
int i,wd2,wd3;
/*if dlt is zero, then no sum into bli */
if(dlt == O) {
for(i = O ; i < 6 ; i++) {
bli[i] = (int) ((255L*bli[i]) >> 8L); /* leak factor of 255/256 */
else {
for(i =O; i < 6 ; i++) {
if((long)dlt*dlti[i] >= O) wd2 = 128; else wd2 = -128;
wd3 = (int) ((255L*bli[i]) >> 8L);
/* leak factor of 255/256 */
bli[i] = wd2 + wd3;
(Continued)
,
214
dlti[5]
dlti[4]
dlti[3]
dlti[2]
dlti[ll
dlti[O]
Sec. 5.3
Chap.5
= dlti[4];
= dlti[3];
= dlti[2];
= dlti[l];
return(apll);
= dlti[O];
= dlt;
= 'Wd2
= wd4
215
Speech Processing
int wd;
wd = ((long)nbh * 127L) >> 7L;
nbh = wd + wh_code_table[ih];
if(nbh < O) nbh = O;
if(nbh > 22528) nbh = 22528;
return (nbh) ;
-'Wd2;
/ * check sarne sign * /
/* gain of 1/128 */
/* sarne sign case*/
USTING 5.11
(Continued)
- 128;
Figure 5.7 shows a block diagram of the G.722 encoder (transmitter), and
Figure 5.8 shows a block diagram of the G.722 decoder (receiver). Toe entire algorithm
has six main functional blocks, many of which use the sarne functions:
(1)
(2&3)
1ppoll - update first predictor coefficient (pole section) */
[nputs: all, apl2, plt, pltl. outputs: apll */
(4&5)
(6)
A transrnit quadrature rnirror filter (QMF) that splits the frequency band into
two sub-bands.
A lower sub-band encoder and higher sub-band encoder that operate on the
data produced by the transmit QMF.
A lower sub-band decoder and higher sub-band decoder.
A receive QMF that combines the outputs of the decoder into one value.
else {
apll
XH
XL
(int)'Wd2 - 192;
USTING 5.11
X1n
Transmit
Quadratura
1 Mirror
Filters
(Continued)
Higher Sub-Band
ADPCM Encoder
MUX
Lower Sub-Band
ADPCM Encoder
AGURE 5.7 Block diagram of ADPCM encoder (transmitter) implemented by program G.722.C.
,
Real-Time DSP Applications
216
16 kbiVs_
,ded 64 kbiV~
-
IH
Higher Sub-Band
ADPCM Decoder
fH
.-
DMUX
48 kbiV~
ILr
Lower Sub-Band
ADPCM Decoder
fL
.-
Sec. 5.3
Chap.5
Receive
Quadratura
Mirror
Filters
XH
..
Speech Processing
~
~
217
4-Level
Adaptive
Quantizer
16 kbiVs_
IH
XOUI
LH
Quantizer
Adaptation
Mode lndication
FIGURE 5.8
G.722.C.
4-Level
Inversa
Adaptive
Quantizer
The G.722.C functions have been checked against the G.722 specification and are
fully compatible with the CCITI recommendation. Toe functions and program variables are
narned according to the functional blocks of the algorithm specification whenever possible.
Quadrature mirror filters are used in the G.722 algorithm as a method of splitting
the frequency band into two sub-bands (higher and lower). The QMFs also decimate the
encoder input from 16 kHz to 8 kHz (transmit QMF) and interpolate the decoder output
from 8 kHz to 16 kHz (receive QMF). These filters are 24-tap FIR filters whose impulse
response can be considered lowpass and highpass filters. Both the transmit and receive
QMFs share the sarne coefficients and a delay Iine of the sarne number of taps.
Figure 5.9 shows a block diagrarn of the higher sub-band encoder. The lower and
higher sub-band encoders operate on an estimated difference signal. The number of bits
required to represent the difference is smaller than the number of bits required to represent the complete input signal. This difference signal is obtained by subtracting a predicted value from the input value:
dH
fH
+
+
+
Figure 5.10 shows a block diagrarn ofthe higher sub-band decoder. ln general, both
the higher and lower sub-band encoders and decoders make the sarne function calls in almost the sarne order because they are similar in operation. For mode 1, a 60 levei inverse
adaptive quantizer is used in the lower sub-band, which gives the best speech quality. The
higher sub-band uses a 4 levei adaptive quantizer.
el = xl - sl
eh=xh-sh
The predicted value, sl or sh, is produced by the adaptive predictor, which contains a second-order filter section to model poles, and a sixth-order filter section to model
zeros in the input signal. After the predicted value is deterrnined and subtracted from the
input signal, the estimate signal el is applied to a nonlinear adaptive quantizer.
One important feature of the sub-band encoders is a feedback loop. The output of
the adaptive quantizer is fed to an inverse adaptive quantizer to produce a difference signal. This difference signal is then used by the adaptive predictor to produce sl (the estimate of the input signal) and update the adaptive predictor.
The G.722 standard specifies an auxiliary, nonencoded data channel. While the
G.722 encoder always operates at an output rate of 64 kbits per second (with 14-bit,
16kHz input samples), the decoder can accept encoded signals at 64, 56, or 48 kbps. Toe
56 and 48 kbps bit rates correspond to the use of the auxiliary data channel, which operates at either 8 or 16 kbps. A mode indication signal informs the decoder which mode is
being used. This feature is not implemented in the G.722.C prograrn.
Adaptive
Predictor
16 kbiVs
IH
4-Level
Inversa
Adaptive
Quantizer
dH
1
LL
1
Quantizer
Adaptation
(!)
Adaptive
Predictor
fH
~!
Real-Time DSP Applications
218
Chap.5
Sec. 5.4
MUSIC PROCESSING
#include <stdlib.h>
#include <ma.th.h>
#include rtdspc.h"
Music signals require more dynamic range and a much wider bandwidth than speech signals. Professional quality music processing equipment typically uses 18 to 24 bits to represent each sample and a 48 kHz or higher sampling rate. Consumer digital audio processing (in CD players, for example) is usually done with 16-bit samples anda 44.I kHz
sampling rate. ln both cases, music processing is a far greater challenge to a digital signal
processor than speech processing. More MIPs are required for each operation and quantization noise in filters becomes more important. ln most cases DSP techniques are less expensive and can provide a higher levei of performance than analog techniques.
/**************************************************************************
EQUALIZ. C - PROGRAM TO DEMONSTRATE AUDIO EQUALIZATION
USING 7 IIR BANDPASS FILTERS.
*************************************************************************/
int i;
float signal_in,signal_out;
/* history arrays for the filters */
Gain [O]
Bandpass
Filter
I f\
RGURE 5.11
EQUALIZ.C.
void main()
-1
Bandpass
Filter
lD
219
Music Processing
Yaut
0.9948840737,
0.9872598052,
0.9663984776,
0.9182843566,
0.8171985149,
0.6308654547,
0.2478443682,
o.o,
o.o,
o.o,
o.o,
o.o,
o.o,
o.o,
-1. o
-1. o
-1. o
-1. o
-1. o
-1. o
-1. o
}.
}.
}.
}.
}.
}.
}.
USTING 5.12
in real-time.
,
!
220
Sec. 5.4
Chap.5
shift a sound up or down by any number of semitones (12 semitones is an octave as indicated by equation 4.13). It uses a long Kaiser window filter for interpolation of the samples as illustrated in section 4.3.2 in chapter 4. Toe filter coefficients are calculated in the
first part of the PSHIFT program before real-time input and output begins. Toe filtering is
done with two FIR filter functions, which are shown in Listing 5.14. Toe history array is
only updated when the interpolation point moves to the next input sample. This requires
that the history update be removed from the fir_filter function discussed previously. Toe history is updated by the function fir_history_update. Toe coefficients
are decimated into short polyphase filters. An interpolation ratio of up to 300 is performed and the decimation ratio is determined by the amount of pitch shift selected by
the integer variable key.
150, 400, 1000, 2400, 6000, and 15000 Hz is implemented by program EQUALIZ.C. The
bandwidth of each filter is 60 percent of the center frequency in each case, and the sampling rate is 44100 Hz. This gives the coefficients in the example equalizer program
EQUALIZ.C shown in Listing 5.12. Toe frequency response of the 7 filters is shown in
Figure 5.12.
5.4.2 Pitch-Shifting
Changing the pitch of a recorded sound is often desired in order to allow it to mix with a
new song, or for special effects where the original sound is shifted in frequency to a point
where it is no longer identifiable. New sounds are often created by a series of pitch shifts
and mixing processes.
Pitch-shifting can be accomplished by interpolating a signal to a new sampling rate,
and then playing the new samples back at the original sampling rate (see Alies, 1980; or
Smith and Gossett, 1984). If the pitch is shifted down (by an interpolation factor greater
than one ), the new sound will have a longer duration. If the pitch is shifted upward (by an
interpolation factor less than one where some decimation is occurring), the sound becomes shorter. Listing 5.13 shows the program PSHIFT.C, which can be used to pitch-
#include <stdlib.h>
#include <math.h>
#include "rtdspc.h"
/* Kaiser Window Pitch Shift Algorithm */
/* set interpolation ratio */
int ratio = 300;
/* passband specified, larger rnakes longer filters */
float percent_pass = 80.0;
/* minimum attenuation in stopbands (dB), larger make long filters */
float att = 50.0;
/* key value to shift by (semi-tones up or down) */
/* 12 is one octave */
int key = -12;
int lsize;
1,--.-r-r,----r;;;,7,s;rr\':;_".,,~/\1 Ec
-2
\,..'
i.,
-8
-10
::s
)i
'
,'
///
void main()
{
,,,,,><\\\
-12
_),
int i,j;
int nfilt,npair,n,k;
float fa,fp,deltaf,beta,valizb,alpha;
float w,ck,y,npair_inv,pi_ratio;
float signal_in,phase,dec;
int old_key = O;
/* remember last key value */
float **h;
-16
-18
""J
' ,.
'
/X
-4
-6
'1 '"
221
Music Processing
-20
101
\
/
'
'' ''
103
1 't q
'
104
1~
''.I 'f
l li
10s
LISTING 5.13 Program PSHIFT.C, which performs pitch shifting on audio samples in
real-time. (Continued)
~
Real-Time DSP Applications
222
Chap.5
lsize
nfilt/ratio;
nfilt
npair
(long)lsize*ratio + l;
(nfil.t - 1) /2;
k = npair + n;
h[k%ratio) [k/ratio)
/* to location */
h[i%ratio) [i/ratio);
223
Music Processing
Sec. 5.4
;)
signal_in = getinput();
while(phase < (float)ratio)
k = (int)phase;
/* pointer to poly phase values */
sendout(fir_filter_no_update(signal_in,h[k],lsize,hist));
phase += dec;
= O;
phase -= ratio;
fir_update_history(signal_in,lsize,hist);
/* Use att to get beta (for Kaiser window function) and nfilt (always odd
valued and = 2*npair +1) using Kaiser's ercpirical formulas.
*/
long int filter_length(float att,float deltaf,float *beta)
{
long int npair;
*beta= O.O;
/* value of beta if att < 21 */
if(att >= 50.0) *beta= .1102 * (att - 8.71);
if (att < 50.0 & att >= 21.0)
*beta= .5842 * pow( (att-21.0), 0.4) + .07886 * (att - 21.0);
npair = (long int) ( (att - 8.0) / (28.72 * deltaf) );
return(2*npair + 1);
USTING 5.13
(Continuecf)
oo{
USTING 5.13
(Continuecf)
-,
224
d= d+
ds = ds
s =s +
while( ds
return(s);
Chap.s
Sec. 5.4
225
Music Processing
2;
* (y*y) / (d*d);
ds;
> lE-7 * s);
USTING 5.13
(Continued)
/* run the fir filter and do not update the history array */
float fir_filter_no_update(float input,float *coef,int n,float *history)
{
int i;
float *hist_ptr,*coef_ptr;
float output;
hist_ptr = history;
coef_ptr = coef + n - 1;
Music synthesis is a natural DSP application because no input signal or A/D converter is
required. Music synthesis typically requires that many different sounds be generated at
the sarne time and mixed together to form a chord or multiple instrument sounds (see
Moorer, 1977). Each different sound produced from a synthesis is referred to as a voice.
The duration and starting point for each voice must be independently controlled. Listing
5.15 shows the program MUSIC.C, which plays a sound with up to 6 voices at the sarne
time. It uses the function note (see Listing 5.16) to generate samples from a second
order IIR oscillator using the sarne method as discussed in section 4.5.1 in chapter 4. The
envelope of each note is specified using break points. The array trel gives the relative
times when the amplitude should change to the values specified in array maps. The envelope will grow and decay to reach the amplitude values at each time specified based on
the calculated first-order constants stored in the rates array. The frequency of the second order oscillator in note is specified in terms of the semitone note number key. A
key value of 69 will give 440 Hz, which is the musical note A above middle C.
/* input tap */
#include
#include
#include
#include
<stdlib.h>
<rnath.h>
rtdspc.h"
"song.h"
/* song[108][7] array */
return(output);
/* 6 Voice Music Generator */
/* update the fir_filter history array */
void fir_update_history(float input,int n,float *history)
{
int i;
float *hist_ptr,*histl_ptr;
hist_ptr = history;
histl_ptr = hist_ptr;
hist_ptr++;
typedef struct {
int key,t,cindex;
float cw,a,b;
float yl, yO ;
NOI'E_STATE;
#define MAX_VOICES 6
float note(NOI'E_STATE *,int *,float *);
and
fir_filter_
226
Chap.5
float trel[SJ = {
0.1
float amps[SJ = { 3000.0
float rates[lO];
int tbreaks[lO];
for(n = O
n < SONG_LENGTH
0.2,
0.7, 1.0,
5000.0, 4000.0, 10.0,
o.o };
o.o};
Sec. 5.4
Music Processing
227
#include <stdlib.h>
#include <math.h>
#include "rtdspc.h"
/* Function to generate samples from a second order oscillator */
/* key constant is 1/12 */
#define KEY_CONSTANT 0.083333333333333
n++) {
i++;
!* set the key numbers for all voices to be played (vnum is how many) */
vnmn = v;
flush();
USTING 5.15
/*
key:
ti= s->t;
/* t=O re-start case, set state variables */
i f ( !ti) {
wosc = TiKl_PI_DIV_FS_440 * pow(2.0, (s->key-69) * KEY_CONSTANT);
(Continuecf)
-,
228
Sec. 5.5
Chap.5
dk
z-ll
else {
ci = s->cindex;
/* rate change case*/
if(ti == tbreak_array[ci])
rate= rate_array[++ci];
s->a = s->cw * rate;
s->b =-rate* rate;
s->cindex = ci;
xk
Hk(z)
Yk
Enhanced
Output
AGURE 5.13
#include
#include
#include
#include
USTING 5.16
229
<stdlib.h>
<stdio.h>
<math.h>
rtdspc.h"
(Continued)
#define N 351
#define L 20
A signal can be effectively improved or enhanced using adaptive methods, if the signal
frequency content is narrow compared to the bandwidth and the frequency content
changes with time. If the frequency content does not change with time, a simple matched
filter will usually work better with less complexity. The basic LMS algorithm is illustrated in the next section. Section 5.5.2 illustrates a method that can be used to estimate
the changing frequency of a signal using an adaptive LMS algorithm.
5.5. 1 LMS Signal Enhancement
Figure 5.13 shows the block diagram ofan LMS adaptive signal enhancement that will be
used to illustrate the basic LMS algorithm. This algorithm was described in section 1.7.2
in chapter 1. The input signal is a sine wave with added white noise. The adaptive LMS
!* filter order,
(length L+l) */
void main()
{
230
Chap.5
Sec. 5.5
/* scale based on L */
mu = 2.0*mu/(L+l);
/* error signal */
e=d-y;
= O.O;
/* update sigma*/
sigma=alpha*(px[O)*px[OJ)+(l-alpha)*sigma;
mu_e=mu*e/sigma;
/* update coefficients */
for(ll = O ; 11 <= 1; 11++)
b[ll)=b[ll]+mu_e*px[ll];
/* update history */
for(ll = 1; 11 >= 1; 11-)
px[ll]=px[ll-1];
/*
return(y);
function lms(x,d,b,l,mu,alpha)
USTING 5.18
= input
= desired signal
b[O:l) =
=
mu
alpha
1
231
(ContinuedJ
data
algorithm (see Listings 5.17 and 5.18) is a 21 tap (20th order) FIR filter where the filter
coefficients are updated with each sample. Toe desired response in this case is the noisy
signal and the input to the filter is a delayed version of the input signal. Toe delay (~) is
selected so that the noise components of dk and xk are uncorrelated (a one-sample delay
works well for sine waves and white noise).
Toe convergence parameter mu is the only input to the program. Although many researchers have attempted to determine the best value for mu, no universal solution has
been found. If mu is too small, the system may not converge rapidly to a signal, as is illustrated in Figure 5.14. Toe adaptive system is moving from no signal (all coefficients
are zero) to an enhanced signal. This takes approximately 300 samples in Figure 5.14b
with mu =0.01 and approximately 30 samples in Figure 5.14c with mu =0.1.
1
Program LMS.C Output
Sec. 5.5
233
3 1
Program LMS.C Output
2.5
2
1.5
"
l
::,
>
.!l
(a)
Cll
!"
-1 ~
li
li
100
150
200
250
300
350
2~---r----r----r----r----r----,-----,
"
Cll
-o.s
-1
200
250
300
350
{\
V V V
-1.5
-2'----~---"----~---"----~---"----~
O
50
100
150
200
250
300
350
Sample Number
FIGURE 5.14 la) Original noisy signal used in program LMS.C. (b) Enhanced signal obtained from program LMS.C with- = 0.01.
232
150
V
1
100
FIGURE 5.14 (e) Enhanced signal obtained from program LMS.C with
= 0.1. (Continued)
.,H,
o
50
Sample Number
1.5
<b>
-2.5~--~--~---~--~---~--~---
-2
Sample Number
l l
..
-1.5
50
-0.5
-3
-40
{\
(e) ..
~
<ll
'h
-1
-2
Lsting 5.19 shows the INS1F.C program, which uses the lms function to determine instantaneous frequency estimates. Instead of using the output of the adaptive filter as illustrated in the last section, the INS1F program uses the filter coefficients to estimate the
frequency content of the signal. A 1024-point FFf is used to determine the frequency response of the adaptive filter every 100 input samples. Toe sarne peak location finding algorithm as used in section 5.1.2 is used to determine the interpolated peak frequency response of the adaptive filter. Note that because the filter coefficients are real, only the
first half of the FFf output is used for the peak search.
Figure 5.15 shows the output of the INS1F program when the 100,000 samples
from the OSC.C program (see section 4.5.1 of chapter 4) are provided as an input. Figure
5.15(a) shows the result without added noise, and Figure 5.15(b) shows the result when
white Gaussian noise (standard deviation = 100) is added to the signal from the OSC program. Listing 5.19 shows how the INS1F program was used to add the noise to the input
signal using the gaussian { ) function. Note the positive bias in both results due to the finite Iength (128 in this example) of the adaptive FIR filter. Also, in Figure 5.15(b) the first
few estimates are off scale because of the Iow signal levei in the beginning portion of the
waveform generated by the OSC program (the noise dominates in the first 10 estimates).
1
234
#include
#include
#include
#include
Chap.5
Sec. 5.5
<stdlib.h>
<stdio.h>
<math.h>
"rtdspc.h"
235
/* zero pad * /
for( ; i < FFT_~; i++) {
samp(i] .real= O.O;
sanp[i].imag = O.O;
fft(san'(),M);
for(j =O; j < FFT_LEN3TH/2; j++l {
tempflt = samp[j].real * samp[j].real;
tempflt += samp[j].imag * samp[j].imag;
mag[j] = tempflt;
= O.O;
for(;;)
else {
pl = mag[i] - mag[i-1);
p2 = mag[i] - mag[i+l];
}
sendout(((float)i + 0.5*((pl-p2)/(pl+p2+le-30)))/FFT_~);
USTING 5.19
_J
(Continued)
,
0.09 r - - - , - - - - - , - - - , - - - , - - - - r - - ~ - - - , - - . . , . . - - - - - - , . - ~
Sec. 5.6
References
237
5.6 REFERENCES
MOORER, J. (August 1977). Signal Processing Aspects ofComputer Music: A Survey. Proceedings
<:l
(a)
i:i' 0.05
[i
ICASSP.
CROCHIERE, R. and RABINER, L. (March 1981). Interpolation and Decimation of Digital SignalsA Tutorial Review. Proceedings ofthe !EEE, 69, 300--331.
SKOLNIK, M. ( 1980). lntroduction to Radar Systems, (2nd ed.). New York: McGraw-Hill.
General Aspects of Digital Transmission Systems (Nov. 1988). Terminal Equipments
Recommendations G.700--G.795. Intemational Telegraph and Telephone Consultative
Committee (CCITT) 9th Plenary Assembly, Melboume.
0.03
0.02
600
700
800
900
1000
600
700
800
900
1000
<!l
~
>,
"
(b) 8
e
0.05
0.04
0.03
0.02
O.OI
100
400
500
236
Appendix
210XO
filename
(*.e)
DSP32C
filename
(*.e)
320C30
filename
(*.e)
fftlk
ffin
fftlk
fftlk
NA
intout
NA
NA
filter
filter
filter
filter
ksrfir
NA
NA
NA
remez
NA
NA
NA
filter
filter
filter
filter
getsend
getinput
getinput
send_c30
getwav
NA
NA
NA
getsend
sendout
sendout
send_c30
PC
filename
(*.e)
APPENDIX
3.3.3 1024-Point FFl'.Test Function
sendwav
NA
NA
NA
filter
filter
filter
filter
mkgwn
mkgwn
mkgwn
mkgwn
interp3
NA
NA
NA
rfast
rfast21
rfast32
rfast30
intfft2
NA
NA
NA
osc
osc
osc
osc
wavetab
wavetab
wavetab
wavetab
rtpse
NA
NA
NA
radproc
NA
NA
NA
arma
NA
NA
NA
arfreq
NA
NA
NA
mulaw
mulaw
mulaw
mulaw
g722
g722_21k
NA
g722c3
NA
g722_21f
g722_32c
g722c3f
equaliz
equaliz
equaliz
equaliz
5.4.2 Pitch-Shifting
pshift
pshift
pshift
pshift
music
mu2lk
mu32c
muc3
lms
NA
NA
NA
instf
NA
NA
NA
Toe enclosed disk is an IBM-PC compatible high-density disk (1.44 MBytes capacity)
and contains four directories called PC, ADSP21K, DSP32C, and C30 for the specific
programs that have been compiled and tested for the four platforms discussed in this
book. Each directory contains a file called READ.ME, which provides additional information about the software. A short description of the platforms used in testing associated
with each directory is as follows:
Directory
Name
PC
ADSP21K
DSP32C
C30
Available
Sampling
MIPs
Rate(kHz)
Not
Real-time
Any
25
32
12.5
16
16.5
16
Note: "NA" refers to programs that are not applicable to a particular hardware platform.
Toe following table is a program list of the C programs and functions described in detail
in chapters 3, 4, and 5. Toe first column gives the section number in the text where the
program is described and then a short description of the program. Toe remaining columns
give the filenames of the four different versions of the source code for the four different
platforms. Note that the files from each platform are in different directories as shown in
the previous table.
238
239
Make files (with an extension .MAK) are also included on the disk for each platform. lf the user does not have a make utility availible, PC batch files (with an extension
.BAT) are also included with the sarne name as the make file. Toe following table is a
make file list for many of the C programs described in detail in Chapters 3, 4 and 5:
240
PC
filename
(*.mak)
filename
(*.mak)
Appendix
DSP32C
6lename
filename
(*.mak)
(*.mak)
320C30
NA
iout2lk
NA
NA
mkgwo
mkgwn
mkgwn
mkgwo
ioterp3
NA
NA
NA
rfast
rf2lk
rt'32
rfc30
intfft2
NA
NA
NA
osc
osc2lk
osc
osc
INDEX
wavetab
wavetab
wavetab
wavetab
rtpse
NA
NA
NA
radproc
NA
NA
NA
arma
NA
NA
NA
arfreq
mulaw
g722_2lk
g722_2lf
arfreq
mulaw
arfreq
mulaw
g722
NA
g722_32c
eqpc
eq
eq
arfreq
mulaw
g722c3
g722c3f
eq
ps
ps
music
mu2lk
ps
mu32c
muc3
lms
NA
NA
NA
instf
instf
instf
instf
Note: "NA"' refers to programs that are not applicable to a particular platform.
NA
ps
e
C preprocessor, 74, 87, 113
C Programming Pitfalls, 87
C++, 82,97
calloc, 78, 79, 80, 83, 84, 89, 150, 171, 173, 177,
222
case statement, 65
cast operator, 79
241
242
causality, 10
circular convolution, 170
clipping, 33, 34
coefficient quantization, 21, 145
combined operators, 61
commen~.54,92,93,94
complex conjugate, 19, 20, 91
complex conversion, 198
Complex Data, 90
complex numbers, 85, 87, 90, 91
complex signal, 190, 198, 200
compound statemen~. 64
compression, 33, 46, 200, 201, 202
conditional compilation, 74
conditional execution, 63, 95
constants, 20, 26, 42, 62, 71, 74, 90, 124, 225,
227
continue, 18, 66, 67, 68, 90, 96, 127
continuous time signals, 4
control structures, 63, 64, 66, 67, 95, 96
converter, 3, 41, 42, 54, 125, 132, 225
convolution, 9, 10, 18, 25, 134, 135, 165, 168, 170,
171, 172
cross correlation, 49
D
lndex
dynamic memory allocation, 77, 78
E
efficiency, 92, 93, 111, 113, 120, 121, 127, 128,
129, 135, 150
elliptic filter, 147, 149
enhancement, 160, 228, 229
EQUALIZ.C, 218, 219, 220
equiripple, 134
execution time, 65, 80, 89, 92, 93, 123, 124, 125
expected value, 37, 39, 42, 43
exponential, 32, 128, 178
expression, 8, 19, 37, 49, 59, 60, 61, 62, 63, 64, 65,
66, 67, 70, 86, 87, 91, 95, 128, 138, 165
extensibility, 92, 93
extem, 71, 72, 73, 155, 203
243
lndex
G
G.711, 201
G.722, 202, 215, 216
G722.C, 204, 208, 211, 214, 215, 216, 217
Gaussian, 37, 39, 45, 47, 158, 159, 160, 162, 193,
233
GETSEND.C, 152
GETWAV.C, 154
global variables, 93, 114, 118, 125
goto, 67, 68, 95, 96
H
hamming window, 188
Harris, 28, 52
higbpass filter, 191
Hilbert transform, 198, 199, 200
F
fast convolution, 134, 168, 170, 171, 172
fast filtering, 168
fast Fourier transform, 26, 28, 52, 160, 184
filter design, 18, 19, 22, 134, 136, 138, 140, 141,
145, 147, 184
filter functions, 221
filter order, 229, 234
filter specifications, 23, 24, 137
filter structures, 44, 45, 46
FILTER.C, 134, 158
FILIBR.H, 141, 150, 162, 171
finite impulse response (FIR), 17, 133, 140
F!Rfilter, 18,20,22,23,50, Ili, 113, 121, 128,
129, 134, 136, 138, 142, 144, 145, 147, 151,
160, 162, 165, 168, 171, 176, 198, 199, 221,
231, 233
fir_filter, 134, 135, 136, 151, 162, 167, 168, 199,
221,222,223,224
floating point, 203
flush, 154, 155, 156, 157, 180, 183, 226
fopen, 152, 153, 155
for loop, 54, 67, 72, 76, 87, 91, 94, 95, 96, 135
Fourier transform, !, 3, 4, 14, 15, 17, 18, 19, 25, 26,
28, 31, 44, 52, 160, 170, 184
free, 67, 78, 79, 80, 84, 93, 112, 125, 173
frequency domain, 7, 15, 16, 17, 18, 23, 24, 25, 30,
32, 44, 132, 168, 170, 176
frequency estimation, 198, 234
frequency response, 15, 17, 18, 19, 20, 21, 22, 23,
134, 138, 142, 149, 166, 176, 218, 220, 233
frequency tracking, 186, 233, 235
function cal!, 60, 86, 127, 128
function prototype, 73
M
macros, 74, 75, 76, 82, 85, 90, 120, 121, 128
magnitude, 23, 24, 48, 141, 149, 192, 235
rnaintainability, 92, 93
malloc, 78, 79, 80
matrices, 80, 81
rnatrix operations, 90, 111
mean, 37, 39, 40, 41, 42, 43, 44, 46, 48, 50, 51, 158,
159, 191, 193, 198
rnean squared error, 40, 41, 42, 48, 50
mean value, 40, 43, 193
memory allocation functions, 79
rnemory rnap, 113
memory mapped, 100, 124
MKGWN.C, 162
modeling, 21, 43, 46, 48, 186, 193, 194, 198
modulus operator, 60
moment, 39, 43
moving average (MA}, 44
MULAW.C, 201, 202
multiprocessor, 108, 130
music, 132, 178, 182, 186, 201, 202, 218, 225, 226,
228
N
K
Kaiser window, 18, 134, 137, 138, 141, 142, 143,
144, 165, 221, 222, 223
keyboard, 98, 138
keywords, 56, 66, 75, 76, 90, 91, 114
KSRFIR.C, 138
noise, 21, 28, 35, 42, 43, 44, 45, 46, 47, 50, 51, 98,
132, 145, 158, 160, 162, 163, 186, 187, 193,
198, 201, 218, 228, 229, 231, 233, 234, 236
nonlinear, !, 2, 32, 33, 164, 216
normal equation, 49
null pointer, 80
244
numerical C, 87, 90, 91, 113, 121, 124
Nyquist rate, 176, 187
o
operator precedence, 62
optimal filter. 46
OSC.C, 181, 233, 236
oscillators, 178
oversized function, 93
p
parameters, 1, 24, 43, 46, 51, 74, 75, 76, 121, 126,
138, 186. 188, 193
parametric, 186, 193, 198
pass-by-address, 87
periodic, 5, 8. 25, 28, 29, 32, 178, 183
periodogram. 186
phase response, 22, 23
physical input/output, 124
pitch-shifting, 220, 223
pointer operators, 77
pointers, 53, 56, 60, 69, 71, 72, 73, 77, 78, 80, 81,
82,84,86,88,90, 128, 135, 150
pole-zero plot, 149
poles, 51, 146, 147, 149, 150, 178, 193, 194, 195,
216
polled input/output, 124
polling, 125
polynomial interpolation, 163
post increment, 78
power spectral estimation, 186, 187, 188, 189, 191
power spectrum, 27, 28, 44, 125, 158, 163, 186, 189
precedence, 62
preprocessor directives, 74, 75
privacy, 71
probability, 2, 35, 36, 37, 39, 40, 41, 42, 43, 52, 185
program control, 53, 54, 63, 65, 68
programjumps, 67, 69
programming style, 92, 95, 97
promotion, 63
properties of the DFf, 26
PSHIFf.C, 220, 223, 224
Q
quantization, 3, 21, 32, 33, 40, 41, 42, 99, 145, 201,
207,218
R
radar, 46, 186, 190, 191, 192, 193, 198
lndex
RADPROC.C, 191
rand, 158, 159
random number generator, 158
random processes, 2, 35, 42, 43
random variables, 36, 37, 39, 42, 43, 52, 158, 159,
185
realloc, 78, 79, 80
rectangular window, 138
referencing Structures, 82
register, 71, 72, 107, 108, 111, 115, 118, 120, 121,
128, 129, 182
reliability, 92, 93
Remez exchange algorithm, 18, 134, 140
REMEZ.C, 19, 134, 138
RFAST.C, 171
RIFF. 151, 153, 155, 156
RTPSE.C, 188
245
lndex
srand, 194, 195
stack, 70, 71, 78, 80, 107, 118, 121
standard deviation, 55, 160, 162, 193, 233, 236
stationary, 42, 43, 190, 191
statistics, 43, 44, 49, 54
status, 68, 69, 107, 125
stopband, 22, 24, 134, 136, 137, 138, 141, 142, 147,
163, 165, 166
storage class, 71, 72, 82
stream, 100, 151
structured programming, 63, 95, 96
structures, 21, 44, 45, 46, 53, 54, 55, 63, 64, 66, 67,
68, 77, 82, 84, 85, 86, 90, 95, 96, 134
superposition, 4
switch, 64, 65, 67, 68, 95, 142, 143
synthesis, 98, 132, 178, 184, 225, 226
system design, 114, 124
u
unary minus, 60, 62
underscore, 56
unit citeie, 16, 17, 51, 178
unsigned, 57, 58, 59, 61, 63, 90, 153
upsampling, 160
userinterface, 116
V
variance, 37, 39, 40, 42, 44, 54, 55, 71, 73, 75,
158, 159
w
WAV file, 151, 153, 154, 155, 156, 157
waveform synthesis, 178
waveforms, 7, 178, 179, 186
WAVETAB.C, 179
white noise, 42, 45, 46, 160, 162, 193, 228, 231
Wiener filter, 46, 48, 49, 160
windowing, 27, 134, 164
windows, 28, 52
z
z-plane, 16, 149
z-transform, li, 12, 13, 14, 15, J6, 17, 21, 26
zero padding, 170
C ALGORITHMS
FOR REAL-TIME DSP
PAUL
M.
EMBREE
~"'~
http://www.prenhall.com
gopher to gopher.prenhall.com