Biomedical Signal Processing
Volume I
Time and Frequency Domains Analysis
Author: Arnon Cohen
Bibliography: p.
Includes index.
Contents: v. 1. Time and frequency domains analysis --
v. 2. Compression and automatic recognition.
1. Signal processing. 2. Biomedical engineering.
I. Title
R857.S47C64 1986   610'.28   85-9626
ISBN 0-8493-5933-3 (v. 1)
ISBN 0-8493-5934-1 (v. 2)
This book represents information obtained from authentic and highly regarded sources. Reprinted material is
quoted with permission, and sources are indicated. A wide variety of references are listed. Every reasonable effort
has been made to give reliable data and information, but the author and the publisher cannot assume responsibility
for the validity of all materials or for the consequences of their use.
All rights reserved. This book, or any parts thereof, may not be reproduced in any form without written consent
from the publisher.
Direct all inquiries to CRC Press, Inc., 2000 Corporate Blvd., N.W., Boca Raton, Florida, 33431.
Biomedical signal processing is of prime importance not only to the physiological
researcher but also to the clinician, the engineer, and the computer scientist who are required
to interpret the signal and to design systems and algorithms for its manipulation.
The biomedical signal is, first of all, a signal. As such, its processing and analysis are
covered by the numerous books and journals on general signal processing. Biomedical
signals, however, possess many special properties and pose unique problems that call
for special treatment.
Most of the material dealing with biomedical signal processing methods has been widely
scattered in various scientific, technological, and physiological journals and in conference
proceedings. Consequently, it is a rather difficult and time-consuming task, particularly for
a newcomer to this field, to extract the subject matter from the scattered information.
This book was not meant to be a text or reference on general signal processing. It
is intended to provide material of interest to engineers and scientists who wish to apply
modern signal processing techniques to the analysis of biomedical signals. It is assumed the
reader is familiar with the fundamentals of signals and systems analysis as well as the
fundamentals of biological systems. Two chapters on basic digital and random signal processing
have been included. These serve only as a summary of the material required as
background for other material covered in the book.
The presentation of the material in the book follows the flow of events of the general
signal processing system. After the signal has been acquired, some manipulations are applied
in order to enhance the relevant information present in the signal. Simple, optimal, and
adaptive filtering are examples of such manipulations. The detection of wavelets is of
importance in biomedical signals; they can be detected from the enhanced signal by several
methods. The signal very often contains redundancies. When effective storing, transmission,
or automatic classification is required, these redundancies have to be removed. The signal
is then subjected to data reduction algorithms that allow its effective representation in terms
of features. Methods for data reduction and feature extraction are discussed. Finally, the
topic of automatic classification is dealt with, in both the decision theoretic and the syntactic
approaches.
The emphasis in this book has been placed on modern processing methods, some of which
have so far seen only limited application to biomedical data. The material is organized such that a
method is presented and discussed, and examples of its application to biomedical signals
are given. Rapid developments in digital hardware and in signal processing algorithms open
new possibilities for the application of sophisticated signal processing methods to biomedicine.
Solutions that were previously cost prohibitive, or impractical because of the lack
of appropriate algorithms, have become available. In such a dynamic environment, the biomedical
signal processing practitioner requires a book such as this one.
The author wishes to acknowledge the help received from many students and colleagues
during the preparation of this book.
Arnon Cohen

THE AUTHOR
Volume I
Chapter 1
Introduction
I. General Measurement and Diagnostic System.............................................................. 1
II. Classification of Signals .......... 3
III. Fundamentals of Signal Processing .......... 4
IV. Biomedical Signal Acquisition and Processing .......... 5
V. The Book .......... 6
References .......... 7
Chapter 2
The Origin of the Bioelectric Signal
I. Introduction.........................................................................................................................9
II. The Nerve Cell .......... 9
A. Introduction .......... 9
B. The Excitable Membrane .......... 10
C. Action Potential Initiation and Propagation .......... 11
D. The Synapse .......... 11
III. The Muscle .......... 12
A. Muscle Structure .......... 12
B. Muscle Contraction .......... 12
IV. Volume Conductors.......................................................................................................... 12
References.......................................................................................................................................13
Chapter 3
Random Processes
I. Introduction.......................................................................................................................15
II. Elements of Probability Theory .......... 15
A. Introduction .......... 15
B. Joint Probabilities .......... 16
C. Statistically Independent Events .......... 17
D. Random Variables .......... 17
E. Probability Distribution Functions .......... 18
F. Probability Density Functions .......... 19
III. Random Signals Characterization .......... 21
A. Random Processes .......... 21
B. Statistical Averages (Expectations) .......... 22
IV. Correlation Analysis .......... 23
A. The Correlation Coefficient .......... 23
B. The Correlation Function .......... 25
C. Ergodicity .......... 26
V. The Gaussian Process .......... 26
A. The Central Limit Theorem .......... 26
B. Multivariate Gaussian Process .......... 27
References .......... 28
Chapter 4
Digital Signal Processing
I. Introduction....................................................................................................................... 29
II. Sampling .......... 29
A. Introduction .......... 29
B. Uniform Sampling .......... 30
C. Nonuniform Sampling .......... 31
1. Zero, First, and Second Order Adaptive Sampling .......... 32
2. Nonuniform Sampling with Run Length Encoding .......... 34
III. Quantization .......... 36
A. Introduction .......... 36
B. Zero Memory Quantization .......... 36
C. Analysis of Quantization Noise .......... 39
D. Rough Quantization .......... 40
IV. Discrete Methods .......... 42
A. The Z Transform .......... 42
B. Difference Equations .......... 43
References........................................................................................................................................ 44
Chapter 5
Finite Time Averaging
I. Introduction........................................................................................................................45
II. Finite Time Estimation of the Mean Value .................................................................45
A. The Continuous Case .......... 45
1. Short Observation Time .......... 47
2. Long Observation Time .......... 48
B. The Discrete Case .......... 51
III. Estimation of the Variance and Correlation .......... 53
A. Variance Estimation — The Continuous Case .......... 53
B. Variance Estimation — The Discrete Case .......... 54
C. Correlation Estimation .......... 56
IV. Synchronous Averaging (CAT-Computed Averaged Transients) .......... 56
A. Introduction .......... 56
B. Statistically Independent Responses .......... 58
C. Totally Dependent Responses .......... 59
D. The General Case .......... 60
E. Records Alignment, Estimation of Latencies .......... 61
References........................................................................................................................................ 64
Chapter 6
Frequency Domain Analysis
I. Introduction........................................................................................................................ 65
A. Frequency Domain Representation .................................................................. 65
B. Some Properties of the Fourier Transform ...................................................... 65
1. The Convolution Theorem .......... 66
2. Parseval's Theorem .......... 66
3. Fourier Transform of Periodic Signals .......... 67
C. Discrete and Fast Fourier Transforms (DFT, FFT) .......... 68
II. Spectral Analysis .......... 71
A. The Power Spectral Density Function .......... 71
B. Cross-Spectral Density and Coherence Functions .......... 72
III. Linear Filtering .......... 73
A. Introduction .......... 73
B. Digital Filters .......... 74
C. The Wiener Filter .......... 74
IV. Cepstral Analysis and Homomorphic Filtering .......... 76
A. Introduction .......... 76
B. The Cepstra .......... 76
C. Homomorphic Filtering.......................................................................................77
References........................................................................................................................................80
Chapter 7
Time Series Analysis-Linear Prediction
I. Introduction........................................................................................................................ 81
II. Autoregressive (AR) Models .......... 85
A. Introduction .......... 85
B. Estimation of AR Parameters — Least Squares Method .......... 85
III. Moving Average (MA) Models .......... 89
A. Autocorrelation Function of MA Process .......... 89
B. Iterative Estimate of the MA Parameters .......... 89
IV. Mixed Autoregressive Moving Average (ARMA) Models .......... 90
A. Introduction .......... 90
B. Parameter Estimation of ARMA Models — Direct Method .......... 90
C. Parameter Estimation of ARMA Models — Maximum Likelihood Method .......... 93
V. Process Order Estimation .......... 95
A. Introduction .......... 95
B. Residuals Flatness .......... 95
C. Final Prediction Error (FPE) .......... 96
D. Akaike Information Theoretic Criterion (AIC) .......... 97
E. Ill Conditioning of Correlation Matrix .......... 98
VI. Lattice Representation .......... 98
VII. Nonstationary Processes .......... 99
A. Trend Nonstationarity — ARIMA .......... 99
B. Seasonal Processes .......... 101
VIII. Adaptive Segmentation .......... 101
A. Introduction .......... 101
B. The Autocorrelation Measure (ACM) Method .......... 102
C. Spectral Error Measure (SEM) Method .......... 103
D. Other Segmentation Methods .......... 105
References .......... 106
Chapter 8
Spectral Estimation
I. Introduction .......... 109
II. Methods Based on the Fourier Transform .......... 110
A. Introduction .......... 110
B. The Blackman-Tukey Method .......... 111
C. The Periodogram .......... 112
1. Introduction .......... 112
2. The Expected Value of the Periodogram .......... 114
3. Variance of the Periodogram .......... 116
4. Weighted Overlapped Segment Averaging (WOSA) .......... 117
5. Smoothing the Periodogram .......... 119
III. Maximum Entropy Method (MEM) and the AR Method .......... 122
IV. The Moving Average (MA) Method .......... 125
V. Autoregressive Moving Average (ARMA) Methods .......... 126
A. The General Case .......... 126
B. Pisarenko's Harmonic Decomposition (PHD) .......... 127
C. Prony's Method .......... 130
VI. Maximum Likelihood Method (MLM) — Capon's Spectral Estimation .......... 133
VII. Discussion and Comparison of Several Methods .......... 134
References...................................................................................................................................... 137
Chapter 9
Adaptive Filtering
I. Introduction...................................................................................................................... 141
II. General Structure of Adaptive Filters......................................................................... 142
A. Introduction........................................................................................................ 142
B. Adaptive System Parameter Identification.................................................... 142
C. Adaptive Signal Estimation...............................................................................142
D. Adaptive Signal Correction...............................................................................143
III. Least Mean Squares (LMS) Adaptive Filter...............................................................143
A. Introduction........................................................................................................ 143
B. Adaptive Linear Combiner .......... 144
C. The LMS Adaptive Algorithm......................................................................... 145
D. The LMS Adaptive Filter..................................................................................147
IV. Adaptive Noise Cancelling............................................................................................ 147
A. Introduction........................................................................................................ 147
B. Noise Canceller with Reference In p u t............................................................148
C. Noise Canceller without Reference Input...................................................... 153
D. Adaptive Line Enhancer (ALE) .............................. .......................................154
V. Improved Adaptive Filtering.........................................................................................154
A. Multichannel Adaptive Signal Enhancement................................................. 154
B. Time-Sequenced Adaptive Filtering .......... 156
References...................................................................................................................................... 158
Index .......... 161
TABLE OF CONTENTS
Volume II
Chapter 1
Wavelet Detection
I. Introduction........................................................................................................................ 1
II. Detection by Structural Features.................................................................................... 2
A. Simple Structural Algorithms............................................................................ 2
B. Contour Limiting .......... 5
III. Matched F iltering.............................................................................................................. 6
IV. Adaptive Wavelet D etection........................................................................................... 9
A. Introduction...........................................................................................................9
B. Template A daptation......................................................................................... 10
C. Tracking a Slowly Changing Wavelet .......... 12
D. Correction of Initial Template .......... 12
V. Detection of Overlapping Wavelets .......... 14
A. Statement of the Problem .......... 14
B. Initial Detection and Composite Hypothesis Formulation .......... 15
C. Error Criterion and Minimization .......... 16
References .......... 17
Chapter 2
Point Processes
I. Introduction............................................. .........................................................................19
II. Statistical Preliminaries .......... 20
III. Spectral Analysis .......... 24
A. Introduction .......... 24
B. Interevent Intervals Spectral Analysis .......... 24
C. Counts Spectral Analysis .......... 25
IV. Some Commonly Used Models .......... 26
A. Introduction .......... 26
B. Renewal Processes .......... 26
1. Serial Correlogram .......... 27
2. Flatness of Spectrum .......... 27
3. A Nonparametric Trend Test .......... 28
C. Poisson Processes .......... 28
D. Other Distributions .......... 31
1. The Weibull Distribution .......... 31
2. The Erlang (Gamma) Distribution .......... 32
3. Exponential Autoregressive Moving Average (EARMA) .......... 32
4. Semi-Markov Processes .......... 32
V. Multivariate Point Processes .......... 33
A. Introduction .......... 33
B. Characterization of Multivariate Point Processes .......... 33
C. Marked Processes .......... 35
References .......... 35
Chapter 3
Signal Classification and Recognition
I. Introduction .......... 37
II. Statistical Signal Classification .......... 39
A. Introduction .......... 39
B. Bayes Decision Theory and Classification .......... 39
C. k-Nearest Neighbor (k-NN) Classification .......... 50
III. Linear Discriminant Functions .......... 53
A. Introduction .......... 53
B. Generalized Linear Discriminant Functions .......... 55
C. Minimum Squared Error Method .......... 56
D. Minimum Distance Classifiers .......... 58
E. Entropy Criteria Methods .......... 60
1. Introduction .......... 60
2. Minimization of Entropy .......... 60
3. Maximization of Entropy .......... 62
IV. Fisher's Linear Discriminant .......... 63
V. Karhunen-Loeve Expansions (KLE) .......... 66
A. Introduction .......... 66
B. Karhunen-Loeve Transformation (KLT) — Principal Components Analysis (PCA) .......... 67
C. Singular Value Decomposition (SVD) .......... 69
VI. Direct Feature Selection and Ordering .......... 75
A. Introduction .......... 75
B. The Divergence .......... 76
C. Dynamic Programming Methods .......... 77
VII. Time Warping .......... 79
References........................................................................................................................................ 84
Chapter 4
Syntactic Methods
I. Introduction........................................................................................................................ 87
II. Basic Definitions of Formal Languages....................................................................... 89
III. Syntactic Recognizers.......................................................................................................92
A. Introduction.......................................................................................................... 92
B. Finite State Automata .......... 92
C. Context-Free Push-Down Automata (PDA) .......... 95
D. Simple Syntax-Directed Translation .......... 100
E. Parsing .......... 100
IV. Stochastic Languages and Syntax Analysis .......... 101
A. Introduction .......... 101
B. Stochastic Recognizers .......... 102
V. Grammatical Inference .......... 104
VI. Examples .......... 104
A. Syntactic Analysis of Carotid Blood Pressure .......... 104
B. Syntactic Analysis of ECG .......... 106
C. Syntactic Analysis of EEG .......... 110
References .......... 111
Appendix A
Characteristics of Some Dynamic Biomedical Signals
I. Introduction...................................................................................................................... 113
II. Bioelectric Signals .......... 113
A. Action Potential .......... 113
B. Electroneurogram (ENG) .......... 113
C. Electroretinogram (ERG) .......... 113
D. Electro-Oculogram (EOG) .......... 114
E. Electroencephalogram (EEG) .......... 114
F. Evoked Potentials (EP) .......... 117
G. Electromyography (EMG) .......... 119
H. Electrocardiography (ECG, EKG) .......... 121
1. The Signal .......... 121
2. High-Frequency Electrocardiography .......... 124
3. Fetal Electrocardiography (FECG) .......... 124
4. His Bundle Electrocardiography (HBE) .......... 124
5. Vector Electrocardiography (VCG) .......... 124
I. Electrogastrography (EGG) .......... 124
J. Galvanic Skin Reflex (GSR), Electrodermal Response (EDR) .......... 125
III. Impedance .......... 125
A. Bioimpedance .......... 125
B. Impedance Plethysmography .......... 126
C. Rheoencephalography (REG) .......... 126
D. Impedance Pneumography .......... 126
E. Impedance Oculography (ZOG) .......... 126
F. Electroglottography .......... 126
IV. Acoustical Signals .......... 126
A. Phonocardiography .......... 126
1. The First Heart Sound .......... 126
2. The Second Heart Sound .......... 127
3. The Third Heart Sound .......... 127
4. The Fourth Heart Sound .......... 127
5. Abnormalities of the Heart Sound .......... 127
B. Auscultation .......... 127
C. Voice .......... 128
D. Korotkoff Sounds .......... 129
V. Mechanical Signals .......... 130
A. Pressure Signals .......... 130
B. Apexcardiography (ACG) .......... 130
C. Pneumotachography .......... 130
D. Dye and Thermal Dilution .......... 130
E. Fetal Movements .......... 131
VI. Biomagnetic Signals .......... 131
A. Magnetoencephalography (MEG) .......... 131
B. Magnetocardiography (MCG) .......... 131
C. Magnetopneumography (MPG) .......... 131
VII. Biochemical Signals .......... 131
VIII. Two-Dimensional Signals.............................................................................................. 132
References...................................................................................................................................... 134
Appendix B
Data Lag Windows
I. Introduction.......................................................................................................................139
II. Some Classical Windows .......... 139
A. Introduction .......... 139
B. Rectangular (Dirichlet) Window .......... 140
C. Triangle (Bartlett) Window .......... 140
D. Cosine Windows .......... 141
E. Hamming Window .......... 143
F. Dolph-Chebyshev Window .......... 145
References.............. ........................................................................................................................ 151
Appendix C
Computer Programs
I. Introduction..................................... ........................................ ...................................... 153
II. Main Programs .......... 154
• NUSAMP (Nonuniform Sampling) .......... 154
• SEGMNT (Adaptive Segmentation) .......... 158
• PERSPT (Periodogram Power Spectral Density Estimation) .......... 162
• WOSA (WOSA Power Spectral Density Estimation) .......... 162
• MEMSPT (Maximum Entropy [MEM] Power Spectral Density Estimation) .......... 165
• NOICAN (Adaptive Noise Cancelling) .......... 167
• CONLIM (Wavelet Detection by the Contour Limiting Method) .......... 169
• COMPRS (Reduction of Signal Dimensionality by Three Methods: Karhunen-Loeve [KL], Entropy [ENT], and Fisher Discriminant [FI]) .......... 171
III. Subroutines .......... 174
• LMS (Adaptive Linear Combiner, Widrow's Algorithm) .......... 174
• NACOR (Normalized Autocorrelation Sequence) .......... 175
• DLPC (LPC, PARCOR, and Prediction Error of AR Model of Order P) .......... 176
• DLPC20 (LPC, PARCOR, and Prediction Error of All AR Models of Order 2 to 20) .......... 177
• FT01A (Fast Fourier Transform [FFT]) .......... 178
• XTERM (Maximum and Minimum Values of a Vector) .......... 180
• ADD (Addition and Subtraction of Matrices) .......... 180
• MUL (Matrix Multiplication) .......... 181
• MEAN (Mean of a Set of Vectors) .......... 181
• COVA (Covariance Matrix of a Cluster of Vectors) .......... 182
• INVER (Inversion of a Real Symmetric Matrix) .......... 183
• SYMINV (Inversion of a Real Symmetric Matrix, Original Matrix Destroyed) .......... 183
• RFILE (Read Data Vector From Unformatted File) .......... 185
• WFILE (Write Data Vector on Unformatted File) .......... 186
• RFILEM (Read Data Matrix From Unformatted File) .......... 187
• WFILEM (Write Data Matrix on Unformatted File) .......... 188
Index................................................................................................................................................189
Chapter 1
INTRODUCTION
I. GENERAL MEASUREMENT AND DIAGNOSTIC SYSTEM
This book is concerned with the analysis and processing of biomedical signals. It is
pertinent to discuss first what a signal is in general, what a biomedical signal is, and why we
process it. A discussion of these general topics is presented in this chapter.
A signal is a means to convey information. It is sometimes generated directly by the
original information source. We may then want to learn about the structure or functioning
of the source from the extracted information (the signal). The signal available may not yield
directly the required information. We then apply some operations to the signal in order to
enhance the information needed. This may be the case, for example, when the visual
processing mechanism of the brain is of interest. We may present the eye with a flash and
monitor the activity of the brain by means of electrodes located on the scalp. We shall find
that the required information that is related to the visual activity of the brain is “ buried"
in the signal which is mainly due to other activities of the brain. Special processing procedures
must be applied to the signal, so as to enhance the relevant information. We may want to
transmit the signal from point of acquisition to a remote location for monitoring or processing.
This may be the case, for example, in the intensive care unit when information on patients
is required at the central monitoring station, or when the information concerning a patient
at home is required in the hospital or physician's office. In these cases, processing of
the signal is required in order to match it to the requirements of the transmission channel.
In other cases, the information is to be stored for later use. Effective storage is needed, such
that the signal requires a minimum amount of storage space (computer memory, magnetic tapes)
and can later be reconstructed at will.
The general measurement and diagnostic system is schematically shown in Figure 1.
Usually this system consists of a transducer that is coupled to the information source and
extracts the required information. The original information may be in a form that is not
suitable for processing, storing, or transmitting (pressure, temperature). The transducer
converts the information into (most often) an electrical signal. With current technology, this
type of signal is most convenient for the above tasks. The processing of the signal is often
required for diagnosis as well. In this case, the processing has to classify the signal into
one of many given classes which may be the normal and various abnormal classes. After
classification, corrective measures that change the source may be taken.
Several books are available on the topic of biomedical instrumentation and
measurements.1-5 They deal with the various transducers and the hardware associated with the
acquisition and basic preprocessing. In this book, the various topics regarding processing
are discussed. The basic preprocessing is sometimes called "signal conditioning" while
later processing is sometimes known as "signal manipulation and evaluation". Since no
distinct definitions exist, we prefer the general term "processing".
The topics discussed in this book are depicted in broken lines in Figure 1 and in more
detail in Figure 2.
The first step in the processing is usually that of segmentation. The signal may drastically
change its properties over time. We then observe and process the signal only in a finite
time window. The length of the time window depends on the signal source and the goal of
processing. We may use a single "window" of predetermined length, as we do, for
example, when electrocardiographic monitoring is performed, or we may require some scheme
for automatically dividing the signal into varying length segments, as is often done in
electroencephalography.
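As a minimal illustration of the fixed-window case, the following sketch splits a sampled record into consecutive windows of predetermined length (the sampling rate, window length, and test signal are arbitrary assumptions, not taken from the book); automatic, variable-length segmentation is treated in Chapter 7 of Volume I.

import numpy as np

def fixed_windows(x, window_len):
    # Split a sampled signal into consecutive, non-overlapping windows of
    # predetermined length; leftover samples at the end are discarded.
    n_windows = len(x) // window_len
    return x[:n_windows * window_len].reshape(n_windows, window_len)

# Example: 10 sec of a noisy 1.2 Hz signal sampled at 250 Hz, cut into 1 sec windows.
fs = 250
t = np.arange(0, 10, 1.0 / fs)
x = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)
segments = fixed_windows(x, window_len=fs)
print(segments.shape)        # (10, 250): ten windows of 250 samples each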
[Figure 2: block diagram of the general processing system; block labels include feature extraction and signal reconstruction.]
A variety o f methods are available for the enhancement of the relevant information in the
signal. The signal is either corrupted with additive and multiplicative noise or the information
required constitutes only a part of the signal, such that irrelevant portions are considered
noise. We then apply noise-attenuating-and-cancelling techniques or signal enhancement
methods in order to increase the signal-to-noise ratio. To do this, some a priori knowledge
on the signal and the noise is required. The more a priori knowledge that is available, the
better the processing. Enhancement methods that are optimal in some sense are discussed
as well as adaptive methods that automatically adjust themselves to varying conditions.
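As one minimal numerical illustration of such enhancement, the sketch below applies synchronous averaging (the method treated in Chapter 5 of Volume I) to repeated, stimulus-locked responses; the response shape, noise level, and number of repetitions are arbitrary assumptions, used only to show that averaging N aligned records improves the signal-to-noise ratio roughly N-fold in power.

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_repetitions = 200, 100

# Arbitrary "evoked response" template buried in additive white noise.
t = np.linspace(0.0, 1.0, n_samples)
template = np.exp(-((t - 0.3) / 0.05) ** 2)
records = template + rng.normal(0.0, 1.0, size=(n_repetitions, n_samples))

averaged = records.mean(axis=0)        # synchronous average of the aligned records

def snr_db(estimate):
    # Signal-to-noise ratio of an estimate with respect to the known template.
    noise = estimate - template
    return 10.0 * np.log10(np.sum(template ** 2) / np.sum(noise ** 2))

print(f"single record SNR: {snr_db(records[0]):6.1f} dB")
print(f"averaged SNR:      {snr_db(averaged):6.1f} dB")   # roughly 20 dB higher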
Very often the relevant information in the signal possesses some waveshape which is only
generally known. A good example is the electrocardiogram where the general shape of the
PQRST complex is known. It is required to extract the exact shape of the wavelet present
in the signal.
Not all the information conveyed by the signal is of interest. The signal may contain
redundancies. When effective storing and transmission are required, or when the signal is
to be automatically classified, these redundancies have to be eliminated. The signal can be
represented by a set of features that contain the required information. These features are
[Figure 3: classification of signals into deterministic and random types.]
then used for storage, transmission, and classification. Reconstruction of the signal from its
features is often needed. The types of features used and their number dictate, on one hand,
the data reduction rate for efficient storing and transmitting and, on the other hand, the error
of reconstruction.
The various functions depicted in the blocks of Figure 2 are discussed in the forthcoming
chapters. The processing system of Figure 2 is a general system for signal processing
independent of application. Geophysical signals,6 for example, as well as biomedical signals
face the same general steps of processing. In this book, however, the general topics are
discussed with emphasis on biomedical signal processing. Additional topics have been
introduced, which are specifically oriented to biomedical signals.
II. CLASSIFICATION OF SIGNALS
Signals, extracted from biological and physical systems, may possess various properties
and characteristics. It is important to identify the general characteristics of the signal, so
that the appropriate processing tools can be applied. The various types of signals are introduced
in this section.
Signals are classified into two main groups: deterministic and random signals (see Figure
3). Deterministic signals are those that can be described by explicit mathematical
relationships. Random signals cannot be exactly expressed; they can be described only in terms of
probabilities and statistical averages. A philosophical question may be raised as to whether
a random or deterministic signal exists. In reality, we are not able to find a signal that can
be accurately predicted by means of an exact mathematical formulation. Even a sine wave
from a signal generator is not deterministic in that sense, since no one can tell when a power
failure will cause the sine wave to completely disappear or some generator malfunction will
cause its shape to change. On the other hand, it can be argued that random signals, in reality,
do not exist. Any signal is the result of some physical or chemical phenomenon and is governed
by some laws. If these laws were completely known to us, we could have exactly expressed
the signal and predicted its value. Since our interest here is to apply processing methods for
the analysis of the signals, we do not have to enter into these philosophical arguments. It
is the goals and constraints of the problem at hand that will dictate our decision to consider
a given signal as random or deterministic.
When analyzing the electrocardiographic (ECG) signal, for example, we may be interested
in the general characteristics of the QRS complex and thus consider the signal deterministic,
or we may be interested in the changes of the R-R interval, thereby considering it a random
signal.
Deterministic signals are divided into two subgroups: periodic and nonperiodic signals.
Periodic signals are signals for which x(t) = x(t + T), where T is the period. Periodic
signals are convenient since one period is sufficient for complete description. In the frequency
domain, the description is given by means of the Fourier series, where only the fundamental
frequency and its harmonics take part. Nonperiodic signals consist of two classes. "Almost"
periodic signals are those that are not periodic in the mathematical sense but have a discrete
description in the frequency domain. This frequency description differs from the periodic
one in that the various frequencies participating are not harmonics of some fundamental
frequency. A combination of several unrelated periodic signals creates an "almost" periodic
signal.
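A small numerical sketch of this point (the frequencies and record length are arbitrary assumptions): the sum of two sinusoids whose frequencies are not harmonically related has a purely discrete line spectrum, yet there is no finite period T for which x(t) = x(t + T).

import numpy as np

fs = 1000                              # sampling rate in Hz
t = np.arange(0, 8, 1.0 / fs)

# 2 Hz and 2*sqrt(2) Hz share no common fundamental frequency, so their sum
# is "almost" periodic: a discrete spectrum, but no finite period.
x = np.sin(2 * np.pi * 2.0 * t) + np.sin(2 * np.pi * 2.0 * np.sqrt(2) * t)

spectrum = np.abs(np.fft.rfft(x)) / t.size
freqs = np.fft.rfftfreq(t.size, 1.0 / fs)

# The energy is concentrated in two spectral lines, located (within the
# frequency resolution of the 8 sec record) near 2.0 Hz and 2*sqrt(2) = 2.83 Hz.
top_two = np.sort(freqs[np.argsort(spectrum)[-2:]])
print(top_two)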
A transient signal is a deterministic signal not having the properties discussed previously.
Random signals are much more difficult to deal with. A random signal is a sample function
of a random process. One sample function of a random process differs from another in its
time description. They possess, however, the same statistical properties. The complete
(infinite) set of sample functions produced by the random process is called the ensemble.
The description of the random signal is given by the joint probability density function.
A stationary process is a process whose statistical properties are not a function of
time. For such a process we can calculate, for example, the expectation by averaging the
values, x(t), over the whole ensemble at any time, t. An important class of random signals is
the class of ergodic signals. For these signals, statistical averaging over the ensemble equals
time averaging over the time axis of any one sample function.
We shall see that stationarity and ergodicity are properties which allow the use of practical
processing methods. A process which is nonstationary (and thus nonergodic) is very difficult
to process. Very often we are forced to assume the process is ergodic even though it is
known a priori that the assumption is false.
When processing the electroencephalographic (EEG) signal, for example, we do not have
at our disposal the complete ensemble. We have only one sample function. We are thus
forced to assume ergodicity and estimate the required statistical properties from time (rather
than ensemble) averages. Since the tools for processing nonstationary signals are not very
effective, we often divide a nonstationary signal into segments, each assumed to be
stationary. The length of the segments depends on the properties of the nonstationarities. In
speech signals, segments are chosen with durations of about 10 msec, while in EEG analysis,
segments may be of the order of a few seconds.
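The following small simulation (a synthetic white-noise process with an arbitrarily chosen mean, not a physiological signal) illustrates what the ergodicity assumption buys: the time average along one sample function approaches the ensemble average taken across many sample functions at a fixed instant.

import numpy as np

rng = np.random.default_rng(1)
n_functions, n_samples = 2000, 2000

# Synthetic stationary, ergodic process: white Gaussian noise with mean 0.5.
ensemble = 0.5 + rng.normal(0.0, 1.0, size=(n_functions, n_samples))

# Ensemble average: across all sample functions at one fixed time instant.
ensemble_mean = ensemble[:, 100].mean()

# Time average: along the time axis of a single sample function, which is
# all that is available when only one record (e.g., one EEG trace) exists.
time_mean = ensemble[0, :].mean()

print(f"ensemble average: {ensemble_mean:.3f}")   # both values are close to 0.5
print(f"time average:     {time_mean:.3f}")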
Another basis for signal classification, which has great significance from the processing point
of view, is that of continuous vs. discrete signals. Continuous time signals are signals which,
in general, are defined at any point in time. The tools that are applied to the processing of
these signals are the Fourier and Laplace transforms and other "analog" methods. In terms
of hardware, these signals are treated by analog systems (filters, amplifiers, computers).
Discrete signals are signals that are defined only at given points in time. Usually these
signals are also "sampled" in amplitude. We usually think of discrete signals as the result
of continuous signals that have been time sampled and amplitude quantized, though there
may be signals which are discrete by nature. These signals are processed by means of discrete
signal processing methods such as the Z transform and the Discrete Fourier Transform
(DFT). In terms of hardware, these signals are treated by means of digital systems, including
digital computers. The advances in digital technology in recent years have created a situation
in which most of the signal processing activity is in discrete signals.
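The sketch below illustrates the two operations that turn a continuous-time signal into a discrete one: uniform time sampling and zero-memory amplitude quantization (both discussed in Chapter 4 of Volume I). The sampling rate, word length, and test signal are arbitrary assumptions.

import numpy as np

def uniform_sample(signal_func, duration, fs):
    # Evaluate a continuous-time signal at uniformly spaced sampling instants.
    t = np.arange(0.0, duration, 1.0 / fs)
    return t, signal_func(t)

def quantize(x, n_bits, full_scale=1.0):
    # Zero-memory uniform quantizer with 2**n_bits levels spanning +/- full_scale.
    step = 2.0 * full_scale / (2 ** n_bits)
    return step * np.round(x / step)

# A 5 Hz "continuous" signal sampled at 100 Hz and quantized to 4 bits.
t, x = uniform_sample(lambda u: np.sin(2 * np.pi * 5.0 * u), duration=1.0, fs=100.0)
xq = quantize(x, n_bits=4)

# The quantization error is bounded by half a quantization step (here 0.0625)
# and behaves roughly like uniform noise of variance step**2 / 12.
print(f"max quantization error: {np.max(np.abs(xq - x)):.4f}")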
III. FUNDAMENTALS OF SIGNAL PROCESSING
A very basic tool for signal processing is that of filtering. Its action is best described in the
frequency rather than in the time domain.
The Fourier transform12 is an operator that transfers a signal, x(t), in the time domain
into the frequency domain. In the frequency domain, the signal is represented in terms of
its amplitude and phase as a function of frequency. A more practical transformation is into
the complex frequency, S, plane. The Laplace transform is a transformation that can transfer
a signal x(t) into the complex frequency plane. The spectral properties of the signal can be
shaped by means of filters which are designed to attenuate or completely cut off portions
of the signals’ frequencies.
The same tools exist for discrete signals in which the complex Z domain is defined.
Discrete filters can be designed to shape the spectrum of the discrete signal as required.
The fundamentals of continuous and discrete signal processing as well as filter design
theory are not discussed in this book. It is assumed that the reader is at least familiar with
the basics of signal filtering.
Frequency filtering is a powerful tool for random as well as deterministic signals.
When processing random signals, we apply the Fourier transform to the autocorrelation
function rather than to the sample function itself. We then deal with the power spectral density
function. The same filter design techniques are applied here.
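A minimal sketch of that procedure follows (the 20 Hz test signal, noise level, and maximum lag are arbitrary assumptions): the autocorrelation function is estimated from the data, and its Fourier transform then serves as the power spectral density estimate, essentially the Blackman-Tukey approach of Chapter 8 of Volume I, here without a lag window.

import numpy as np

rng = np.random.default_rng(2)
fs, n = 200.0, 4000

# Random signal: a 20 Hz sinusoid with random phase buried in white noise.
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 20.0 * t + rng.uniform(0.0, 2 * np.pi)) + rng.normal(0.0, 1.0, n)
x = x - x.mean()

# Biased autocorrelation estimate for lags 0 .. max_lag.
max_lag = 200
r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])

# Power spectral density estimate: Fourier transform of the (symmetric)
# autocorrelation sequence over lags -max_lag .. +max_lag.
r_sym = np.concatenate([r[::-1], r[1:]])
psd = np.abs(np.fft.rfft(r_sym))
freqs = np.fft.rfftfreq(r_sym.size, 1.0 / fs)

print(f"spectral peak near {freqs[np.argmax(psd)]:.1f} Hz")   # close to 20 Hz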
IV. BIOMEDICAL SIGNAL ACQUISITION AND PROCESSING
When performing a measurement from a living biological substance, several unique problems
arise that deserve special discussion, in addition to the one presented in Section I of
this chapter.
The living biological system is a very complex system governed by biochemical, physical,
and chemical laws not well understood as of yet. In particular, many aspects of the complex
hierarchical control, the genetic control, the neural information transfer and processing, and
other systems are still under extensive investigation. Very often we use a priori information
concerning the system generating the signal of interest in order to help us in the analysis
and processing procedures. When the underlying mechanism is not well understood, the
processing may become less effective.
The complexity of the biological system often introduces difficulties in the measurement
and processing procedures. Unlike physical systems, the biological system (most often)
cannot be uncoupled in such a way that subsystems can be monitored and investigated
individually. Because of the complex hierarchical control linkages among subsystems and
due to many feedback paths not always well understood, the biological system under
investigation must remain in its natural environment during observation. The signals produced
by the system are thus influenced directly by the activity of the surrounding systems. The
signal is also inherently contaminated by noise produced by the neighboring systems.
When monitoring, for example, the activity of the visual processing mechanism of the
brain by means of the visual evoked potential, we cannot isolate the visual system and
perform the clinical test under controlled conditions. The eye, its position, and its sensitivity
are controlled by voluntary and nonvoluntary actions of the brain. The visual processing of
the brain itself is dependent on many other surrounding activities, most of them beyond our
control. The result is that very often we do not know exactly under what conditions the
signal was taken and how to interpret the analysis of the signal — whether to attribute it to
some significant phenomenon in the underlying mechanism (abnormality in system function)
or to some change in measurement conditions.
The large variations that exist in biomedical signals force us to rely heavily on statistical
methods. These variations exist in signals acquired from the same individual, and of course
among populations. Thus, the accuracies and confidence limits that come out of our biomedical
signal processing are usually not very high, at least in terms used in other engineering
disciplines.
Biomedical signals are usually extracted from living organisms and, in most applications,
from human beings. The measurement system must be designed so as not to damage the
system and not to cause pain whenever possible. Noninvasive techniques are always
preferable. This means that very often we cannot get the information required directly and we
have to infer it from signals that are noninvasively available. Fetal heart monitoring may
serve as an example. Rather than apply electrodes directly to the fetal skin, a procedure
requiring invasive methods, we place the electrodes on the mother's abdomen. The signal
thus acquired is heavily contaminated with the mother’s strong ECG and other muscle
activities. Inferring the fetal ECG from this signal requires considerable processing effort.
In many important biomedical applications, the signals of interest are two dimensional.13
Conventional X-ray analysis, computer tomography (CT), nuclear magnetic resonance (NMR)
imaging, and ultrasonic imaging are examples. In principle, the processing techniques of
one- and two-dimensional signals are similar. In practice, however, there are major
differences. Sophisticated algorithms for two-dimensional signal processing have been developed.
These are not discussed here.
Biomedical signals are mechanical, chemical, or electromagnetic in nature. With current
technology, almost all transducers used provide electrical output so that the signal to be
processed is presented as an electrical signal. One very important group of biomedical signals
is that of signals which are electromagnetic in nature — the bioelectric signals. Chapter 2 is
dedicated to the origin and characteristics of these signals.
V. THE BOOK
The material covered in this book follows approximately the main blocks depicted in
Figure 2. As mentioned previously, Chapter 2 discusses the origin and characteristics of the
bioelectric signal. It is not intended to replace or compete with the excellent books available
on the topic. It was felt, however, that because of the importance of these signals, one
chapter should be devoted and addressed especially to the engineer or computer scientist
who has not been extensively exposed to the biological basis of the signals. Chapters 3 and
4 discuss basic principles of random and digital signal processing. The experienced engineer,
and especially the one whose field is measurements and signal processing, will be familiar
with the material presented in these chapters. Others, whose expertise lies elsewhere, will
find Chapters 3 and 4 a basic reference for two very important topics. The problem of finite
time observation records presents itself in almost every signal processing problem. Either
the time available for measurement is finite or the signal is nonstationary and short segments
of it are used in order to apply stationary processing methods. In both cases, we are faced
with the problem of having a finite time observation record (a windowed record) from which
we have to estimate the characteristics of the process. Chapter 5 presents and discusses the
problems of finite time estimations in continuous and digital signals.
Frequency domain analysis techniques are presented in Chapter 6. This chapter also
includes a discussion on cepstral methods and homomorphic filtering which are probably
less familiar even to the experienced engineer.
Time series analysis originated in the statistical literature and has become an
important analysis approach in modern signal processing. The basic idea here is to consider
the signal as the output of a linear system, whose (inaccessible) input is white in spectrum.
Rather than deal with the signal itself, we want to deal with the system that generates it. It
turns out that the representation of the signal by means of the coefficients of the system’s
differential (or difference) equation, or the system’s poles and zeroes is a very efficient
representation. The features achieved by time series analysis provide effective data reduction
for signal storage and transmission as well as signal recognition. Chapter 7 presents the
theory and practical algorithms for the implementation of time series analysis.
A related topic, with great importance in random signal processing, is that of spectral
estimation. It is presented in Chapter 8, with algorithms that are designed for various types
of signals.
One of the problems associated with biomedical signal processing is the complexity of
the signal and the lack of a priori information that can be used to simplify the processing
methods. Adaptive filtering methods have thus found wide application in biomedical signal
processing. Here, minimal a priori information on the signal and noise involved is required.
The adaptive filter "learns" the required information from the running data and
automatically adjusts its parameters in some optimal fashion. Chapter 9 deals with adaptive
filtering. The discussion is primarily based on the work of Widrow, who has also suggested
many biomedical applications.
In many biomedical applications, the signal contains a wavelet which is of interest. This
may be the case in electrocardiographic analysis where the QRS complex is sought, or in
some cases of EEG analysis where K-complexes or spindles are of interest. Methods for
wavelet detection are discussed in Chapter 1, Volume II.
Chapter 2 (Volume II) deals with point processes analysis. Point processes are processes
which are characterized by the times of appearance of events. The parameter of interest is
the appearance time and not the shape of the event. The major application of this theory is in
the analysis of neurosignals. Other applications, such as R-R interval analysis in ECG or
pitch analysis in speech, are also presented.
Chapters 3 and 4 (Volume II) deal with the important problem of automatic signal
recognition and classification. The better known approach, the decision theoretic approach, is
presented in Chapter 3 (Volume II), while the more novel approach, that of syntactic analysis,
is presented in Chapter 4 (Volume II).
A discussion on some of the many biomedical signals appears in the appendices. The aim
of the appendices is not to give an exhaustive survey of all signals used in biomedicine, but
rather to list a few representative ones with their main characteristics.
Throughout the book, examples and references to various biomedical applications have
been provided. The emphasis, however, is on the presentation of various methods (some of
them novel which have yet to be widely applied to biomedicine). The framework of the
presentation is that depicted in Figure 2 for a general signal processing system.
REFERENCES
1. DeMarre, D. A. and Michaels, D., Bioelectronic Measurements, Prentice-Hall, Englewood Cliffs, N.J., 1983.
2. Geddes, L. A. and Baker, L. E., Principles of Applied Biomedical Instrumentation, John Wiley & Sons, New York, 1968.
3. Strong, P., Biophysical Measurements, Tektronix, Beaverton, Or., 1970.
4. Webster, J. G., Ed., Medical Instrumentation: Application and Design, Houghton Mifflin, Boston, 1978.
5. Cromwell, L., Weibell, F. J., Pfeiffer, E. A., and Usselmann, L. B., Biomedical Instrumentation and Measurements, Prentice-Hall, Englewood Cliffs, N.J., 1973.
6. Robinson, E. A. and Treitel, S., Geophysical Signal Analysis, Prentice-Hall, Englewood Cliffs, N.J., 1980.
7. Beauchamp, K. G. and Yuen, C. K., Digital Methods for Signal Analysis, George Allen and Unwin, Ltd., London, 1979.
8. Gold, B. and Rader, C. M., Digital Processing of Signals, McGraw-Hill, New York, 1969.
9. Oppenheim, A. V., Ed., Application of Digital Signal Processing, Prentice-Hall, Englewood Cliffs, N.J., 1978.
10. Chen, C. T., One-Dimensional Digital Signal Processing, Marcel Dekker, New York, 1979.
11. Tretter, S. A., Introduction to Discrete-Time Signal Processing, John Wiley & Sons, New York, 1976.
12. Bracewell, R. N., The Fourier Transform and Its Applications, McGraw-Hill, New York, 1978.
13. Reichenberger, H. and Pfeiler, M., Objectives and approaches in biomedical signal processing, in Signal Processing II: Theories and Applications, Schussler, H. W., Ed., Elsevier/North Holland, Amsterdam, 1983.
Chapter 2
THE ORIGIN OF THE BIOELECTRIC SIGNAL
I. INTRODUCTION
The most important information processing mechanism in the living biological system is
the neural network.1 The biological system has several means for information transfer.
Probably the most important is the neural information transfer. Neurophysiology, the study
of neural function, has been the key field for understanding internal communication and
control in the biological system. Basic and applied neurophysiological research heavily relies
on our ability to measure chemical and electrochemical activities taking place in the single
cell, or collectively by groups of cells.
Many functions of neural and muscular cells are chemical in nature. These functions,
however, produce changes in the electric field which can be monitored by electrodes. The
so-called bioelectric potentials help the neurophysiologist study cell function. Direct
measurements of the chemical phenomena, e.g., ion concentration changes,2 can be performed
by means of special transducers (ion selective electrodes, for example). However, these
measurements are more difficult to perform.
The source of the bioelectric signal is the single neural or muscular cell. These, however,
do not function alone but in large groups. The accumulated effects of all active cells in the
vicinity produce an electric field which propagates in the volume conductor3 consisting of
the various tissues of the body. The activity of a muscle, or some neural network, can thus
indirectly be measured by means of electrodes placed, say, on the skin. The acquisition of
this type of information is easy; electrodes can be conveniently placed on the skin. The
information, however, is difficult to analyze. It is the result of all neural and muscular
activity in unknown locations transmitted through an inhomogeneous medium. In spite of
these difficulties, electrical signals, monitored on the skin surface, are of enormous clinical
and physiological importance. Electroencephalographic (EEG), electrocardiographic (ECG),
electromyographic (EMG), and other such signals are routinely used for the diagnosis of
neural and muscular systems in the clinic. The interpretation of the information is based
mainly on the large statistical experience collected throughout the years.
This chapter explains the basic bioelectric phenomena on the cell level both in neural
cells and in muscle cells. A brief discussion on the volume conductor problem is then
presented to provide a link to the gross surface electric signals.
A. Introduction
The basic processing unit in the neurophysiological system is the nerve cell — the neuron.
Its task is information processing, transfer, and acquisition. Neurons which are used for
information transfer are usually long, and serve to transmit information to and from the
central processing body. Special nerve cells have evolved that serve as sensors.
A variety of mechanisms and sensor shapes exist to transduce many kinds of stimuli
(pressure, light, temperature, etc.) into electrical and chemical signals. The central nervous
system deals with the task of information processing and control. Though there are many
types of neurons, the basic structure of these cells can be generally discussed. Figure 1
schematically depicts the structure of a nerve cell.
The important parts of the neuron are the cell body (soma), the dendrites, and the axon.
The cell body consists of the intracellular fluid with the various bodies required for the
functioning of a cell. There is a great variance in the size of nerve cells. The diameter can
be as small as a few microns or as large as a few tens of microns. The cell body is surrounded by an
excitable membrane, the thickness of which is in the range of 50 to 150 Å. The cell membrane
is extended in various places to generate root-like structures called dendrites. These
extensions are used for interconnections with other nerve cells.
The axon serves as the output of the nerve unit. It is an extension of the cell with a length
that can be about 50 μm (in the cerebral cortex) or up to several meters (in peripheral nerves
of large mammals). The diameter of the axon can range from less than 0.5 μm to about 1
mm (in the squid giant nerve fibers). Some axons are covered with an interrupted myelin sheath
which increases the velocity of information transfer.
Information into the neuron from other neurons is introduced through a junction called a
synapse. Synapses are located on the dendrites or on the soma. The synapses can cause an
increase or decrease of the voltage across the membrane. The cell function is based on the
integrative (in time and space) effects of these potential changes.
The tips of the axon serve as inputs to other neurons through synapses, or to activate
muscles through special synapses — the neuromuscular junctions. Peripheral nerves are
bundled together into a nerve trunk. The electrical activities of the single nerve cell will be
discussed in the following sections. The signals that are picked up from the nerve trunk are
the result of the electric field generated by the various nerves in the trunk.
B. The Excitable Membrane
The cell membrane can be considered as a dividing medium between the extracellular
and intracellular fluids. These two fluids have different ionic concentrations. The membrane
has different permeability to the various ions in the solutions. As a result of ion transfer,
by means of diffusion and other mechanisms, a voltage is generated across the membrane.
If we consider the effects of only the three main ions, potassium [K+], sodium [Na+],
and chloride [Cl−], we get the membrane potential, E, from the Nernst equation.
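In the Goldman–Hodgkin–Katz form, written with the permeabilities and concentrations defined below, the membrane potential is:

E = \frac{RT}{F}\,\ln\frac{P_{K}[K^{+}]_{o} + P_{Na}[Na^{+}]_{o} + P_{Cl}[Cl^{-}]_{i}}{P_{K}[K^{+}]_{i} + P_{Na}[Na^{+}]_{i} + P_{Cl}[Cl^{-}]_{o}}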
where R, T, and F are the universal gas constant, the absolute temperature, and the Faraday
Volume I: Time and Frequency Domains Analysis 11
constant, respectively. P_X is the permeability of the resting membrane to the ion X, and
[X]_o and [X]_i are the concentrations of the ion X in the extracellular and intracellular fluids.
The calculated resting cross-membrane potential is approximately 80 mV (the inside of
the cell being negative with respect to its outside). This value agrees well with
neurophysiological measurements.
Some membranes have excitability characteristics. When the membrane is excited by
means of an electrical, mechanical, or chemical stimulus, the permeabilities of the membrane
to ionic transfer undergo some changes. These changes cause the resting potential of the
membrane to increase, become positive for a short period of time, and later, when the
membrane repolarizes, to return to its normal resting potential. The time course of the
potential change, the action potential, is depicted in Figure 1.
Nerve and muscle cells have excitable membranes. The shape and time duration of the
action potentials differ in the various cells. Muscle action potentials are usually much longer
in duration.
The excitation of the membrane is caused only if the stimulus exceeds a threshold level
(about 20 mV). Once the threshold has been crossed and an action potential elicited, the
threshold changes. Following the initiation of the action potential there is a certain period (of the
order of 1 to 2 msec) when the threshold level becomes infinite. This period is called the
total refractory period, in which no new action potential can be initiated. The threshold then
returns to its resting value according to some decaying function. The period in which the
threshold decays to its resting level is called the relative refractory period. In this period, a
new action potential can be elicited provided the stimulus is strong enough to cross the
relatively high threshold.
D. The Synapse
The axon of a neuron terminates with junctions to other neurons or to muscles. One axon
can be connected by means of such junctions to many neurons or muscle fibers.
The synapse is the junction between one of the axon endings of one neuron and the
dendrite or soma of another. The presynaptic region is the axon's ending. It does not actually
touch the dendrite (or soma); a spacing of about 200 Å, known as the synaptic cleft, exists.
The region in the dendrite (or soma) on the other side of the cleft is called the postsynaptic
region.
When an action potential arrives at the presynaptic region, it causes the membrane
characteristics to change. This change increases the ability of certain chemical substances
(transmitters) to diffuse from the presynaptic region into the cleft. The transmitters that cross the
cleft are captured by receptors in the postsynaptic region and cause membrane potential
change. The change may be excitatory (excitatory postsynaptic potential, EPSP) or inhibitory
(inhibitory postsynaptic potential, IPSP) depending on the type of transmitter released.
The complete process of transmitter release, cleft crossing, and postsynaptic receiving is
relatively slow and is of the order of 0.5 msec. The transmission of information through
the nervous system, though fast when compared with other biological mechanisms (hormones),
may be considered slow when compared with electronic or optical systems.
III. THE MUSCLE
A. Muscle Structure
The skeletal muscle consists of cells with excitable membranes. The membrane is similar
in principle to the neuron's membrane. Its function, though, is not to transfer or process
information but to generate tension. The muscle is constructed from many separate fibers.
The fibers contain two kinds of protein filaments, actin and myosin. These are arranged in
parallel interlacing layers which can slide one into the other causing shortening of the muscle
length. The sliding of the fibers is caused by chemical reactions that are not yet fully
understood.
The generation of motion or force by the muscle is activated when the fiber membrane
is excited. An action potential then propagates along the surface membrane of the fiber,
triggering chemical reactions that, in turn, cause fiber contraction.
When a muscle contracts, the action potentials generate an electric field that can be
monitored by means of surface (skin) electrodes. This field is a result of the contribution
of many fibers at different times and with different rates. The signal (EMG) monitored this
way will thus be a random signal with statistical properties that depend on the muscle
function.
B. Muscle Contraction
The neuron that activates the muscle is called the motor nerve. The axon endings of the motor
nerve are similar to synapses, but rather than activate another neuron, they are connected to
muscle fibers. The motor neuron-muscle connection is called the neuromuscular junction or end
plate.
The chemical substance that serves as a transmitter in the end plate is acetylcholine (ACh).5
It is released from the axon endings when an action potential has arrived, diffuses toward
the muscle membrane, and is absorbed there at the receptor sites, causing a muscle membrane
potential change. When the change is sufficiently high and the threshold level is crossed, an
action potential is generated and propagates along the muscle membrane.
The process of transmitter release, diffusion, and reception at the muscle lasts about 0.5
to 1.0 msec. Additional delay in contraction is due to the dynamic properties of the muscle
itself.
IV. VOLUME CONDUCTORS
The sources of the bioelectric signals are the action potentials generated by single neurons
and muscle fibers. The current densities generated by the membrane activity cause current
changes in the surrounding medium. The surrounding tissues, in which the induced current
changes occur, are called the volume conductor.
REFERENCES
Chapter 3
RANDOM PROCESSES
I. INTRODUCTION
Randomness appears in biomedical signals in two major ways: the source itself may be
stochastic (as are indeed all information conveying signals) or the measurement system
introduces external, additive or multiplicative, noise to the signal. Whether a signal is
considered stochastic or deterministic is a matter of definition. An ECG signal can be
considered deterministic, and even "almost" periodic, when some characteristics of the QRS
are of interest, or it can be considered stochastic, when R-R interval variations are of
interest.
Probability theory plays an important underlying role in the analysis of random signals.
Therefore, we provide a brief review of probability theory in the opening of this chapter.
The concepts of probability theory are then extended to the characterization and analysis of
random signals. The emphasis in this chapter is on definitions and basic presentation of
material directly required for the understanding of the topics discussed in later chapters. For
a more detailed and rigorous presentation of the material, the reader is referred to the many
textbooks available.1-3
Special attention is given, in this chapter, to the topic of correlation analysis, since it has
importance as a detection method often used in biomedical signal processing. The
multidimensional gaussian process is introduced at the end of the chapter. In several analysis
methods discussed in the course of this book, the assumption is made that the signal is
gaussian, and reference is made to its distribution and other characteristics.
II. ELEMENTS OF PROBABILITY THEORY
A. Introduction
Consider an experiment, the outcome of which can be one of several events. The outcome
of the experiment depends upon the combination of many factors which are unpredictable.
The events are called discrete random events. We cannot predict the exact result of such
an experiment; we can, however, comment about the average outcome of a large number
of experiments. A throw of a die serves as a popular example of such "experiments", where
the events are the numbers on the face of the thrown die.
Assume we have performed the experiment N times. Out of the N resulting events, the
event A_i has occurred n_i times. We define the relative frequency, f_i, as:

f_i = n_i / N

The probability of event A_i, P(A_i), is then given as the limit of the relative frequency:

P(A_i) = lim_{N→∞} f_i = lim_{N→∞} n_i / N     (3.2A)

with

0 ≤ P(A_i) ≤ 1     (3.2B)
Note that we have assumed that the limit in Equation 3.2A does exist.
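A small numerical illustration of probability as the limiting relative frequency, using a simulated fair die (the event, sample sizes, and seed are arbitrary illustrative choices), can be written in Python as:

import numpy as np

rng = np.random.default_rng(0)

# Relative frequency of the event "die shows 6" for increasing N;
# it should approach P = 1/6 as N grows.
for N in (100, 10_000, 1_000_000):
    throws = rng.integers(1, 7, size=N)     # fair die: values 1..6
    f = np.count_nonzero(throws == 6) / N   # relative frequency f_i = n_i / N
    print(N, f)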
Two events are called mutually exclusive events if the occurrence of one makes the
appearance of the second impossible. If A_i and A_j are mutually exclusive, then the probability
that A_i or A_j will occur is P(A_i or A_j), with

P(A_i or A_j) = P(A_i) + P(A_j)

and more generally, if the events A_i, i = 1, 2, ..., M, are mutually exclusive, then

P(A_1 or A_2 or ... or A_M) = Σ_{i=1}^{M} P(A_i)
B. Joint Probabilities
When an experiment has many (rather than single) outcomes, we speak about joint
probabilities. Consider, for example, the result of a blood test. The test outcome consists of
several parameters. We can talk about the probability that the outcome of the blood test will
be some given values for all the parameters; the probability of this happening is the joint
probability. We denote the joint probability of the events A, B, C, ..., J by
P(A, B, C, ..., J), with the meaning: the probability that A and B and C and ... and J will
occur.
Often the probability of one event is influenced by another event. We may want to consider
the probability of one event occurring, given that the other one has already occurred. This
is known as conditional probability. The probability of event A occurring, given event B
has occurred, is written as P(A|B).
As an example, consider the following experiment: two cards are successively drawn from
a deck (without returning the drawn card to the deck) and the probability of the first being
an ace and the second a king is sought. The problem can be posed as follows: what is the
probability of drawing a king given an ace was previously drawn?
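A short Python computation of this card example, using the conditional probability relations of Equation 3.6 (the fractions are exact, not estimated):

from fractions import Fraction

# Exact conditional probability for the card example:
# P(second card is a king | first card was an ace).
# After removing one ace, 4 of the remaining 51 cards are kings.
p_king_given_ace = Fraction(4, 51)

# Joint probability of "ace then king" via P(AB) = P(B|A) * P(A).
p_ace_first = Fraction(4, 52)
p_ace_then_king = p_king_given_ace * p_ace_first

print(p_king_given_ace, float(p_king_given_ace))
print(p_ace_then_king, float(p_ace_then_king))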
Consider now the relationship between the joint and conditional probabilities. Assume an
experiment, the result of which is given by two simultaneous events, performed N times. Let
n_A denote the number of times event A appeared in the outcome and n_AB the number of
times the events A and B appeared together. The probability of the joint event AB is

P(AB) = lim_{N→∞} n_AB / N     (3.6A)

Assuming that the number of experiments is sufficiently large such that n_A is also very large,
we can rewrite Equation 3.6A as:

P(AB) = lim_{N→∞} (n_AB/n_A)(n_A/N) = P(B|A) P(A)     (3.6B)

so that

P(B|A) = P(AB)/P(A)     (for P(A) ≠ 0)     (3.6C)
and

P(A|B) = P(AB)/P(B)     (for P(B) ≠ 0)     (3.6D)

also:

P(AB) = P(A|B) P(B) = P(B|A) P(A)     (3.6E)

and

P(B|A) = P(A|B) P(B) / P(A)     (3.6F)
If

P(B|A) = P(B)     (3.7)

then the information about the occurrence of event A has added nothing to the knowledge
about event B. The two events are said to be statistically independent. Introducing Equation
3.7 into Equation 3.6C, we get:

P(AB) = P(A) P(B)

namely, the joint probability of statistically independent events equals the product of
the individual probabilities. In general, for the case of n statistically independent events
A_i, i = 1, 2, ..., n, we have:

P(A_1 A_2 ... A_n) = P(A_1) P(A_2) ... P(A_n)
D. Random Variables
The outcome of an experiment can be a number or some other description of the event.
Let us assign a real number, or a set of numbers to each possible outcome of the experiment.
We may have some n discrete values for the description of the outcomes, in which case the
experiment will be described by a discrete set of numbers (vectors); we denote the set as a
discrete random variable. In other cases, continuous values are required to describe the
outcomes; this set is termed continuous random variable.
Consider the discrete random variable x. The values x_1, x_2, ..., x_n are n discrete values
constituting the random variable x. The probability of the event to which the number x_i was
assigned is P_x(x = x_i), which denotes the probability that the random variable x will have
the value x_i. If, for example, the n outcomes of the experiment are mutually exclusive,
then:

Σ_{i=1}^{n} P_x(x = x_i) = 1     (3.9)
since Equation 3.9 describes the probability of the certain event. When the outcome of the
experiment consists of two events, x and y, we define as before the joint probability P_xy(x
= x_i, y = y_j), the probability that the random variable x will get the value x_i and the
random variable y the value y_j. The certain event is the summation over all i and j,
hence:

Σ_i Σ_j P_xy(x = x_i, y = y_j) = 1

The conditional probabilities and the Bayes' rule are similar to Equation 3.6:

P(x_i | y_j) = P_xy(x = x_i, y = y_j) / P_y(y = y_j)     (3.10C)

P(y_j | x_i) = P_xy(x = x_i, y = y_j) / P_x(x = x_i)     (3.10D)

P(x_i | y_j) = P(y_j | x_i) P_x(x = x_i) / P_y(y = y_j)     (3.10E)

The probability distribution function of the random variable x is defined as

P(x ≤ X)     (3.11)

the probability that x takes a value smaller than or equal to X. Clearly,

P(x ≤ ∞) = 1     (3.12)
Also, the probability that the random variable x will get a value in the range X_2 < x ≤
X_1 is nonnegative and is given by:

P(X_2 < x ≤ X_1) = P(x ≤ X_1) − P(x ≤ X_2) ≥ 0

We conclude that for every X_1 > X_2 we get P(x ≤ X_1) ≥ P(x ≤ X_2); hence the probability
distribution function is a nonnegative, nondecreasing function bounded by zero and 1 (see
Figure 1).
Consider the case where two random variables, x and y, are defined over the range
(−∞ ≤ x ≤ ∞, −∞ ≤ y ≤ ∞). Define the joint probability distribution function, P(x ≤
X, y ≤ Y). The joint probability distribution function of x and y is the probability that the
random variable x will get a value x ≤ X and the variable y will get a value y ≤ Y. The
following relations are obvious:

P(x ≤ ∞, y ≤ ∞) = 1

P(x ≤ −∞, y ≤ Y) = P(x ≤ X, y ≤ −∞) = 0

P(x ≤ ∞, y ≤ Y) = P(y ≤ Y);   P(x ≤ X, y ≤ ∞) = P(x ≤ X)
FIGURE 1. The probability density function p(x); the area under p(x) between X_2 and X_1 equals the probability that x falls in that range.
The probability density function, p(X), is defined as the derivative of the probability
distribution function:

p(X) = lim_{ΔX→0} [P(x ≤ X) − P(x ≤ X − ΔX)] / ΔX     (3.15)

or

p(X) = d/dX P(x ≤ X)     (3.16A)
In Equation 3.16A, we have assumed that the derivative exists, or can be expressed in
terms of delta functions.2 The following relations are obvious:

p(X) ≥ 0

∫_{−∞}^{∞} p(x) dx = 1

P(x ≤ X) = ∫_{−∞}^{X} p(x) dx
Note also that for a continuous random variable, the probability of getting one particular
value, say X_0, is exactly zero:

P(x = X_0) = 0
When the experiment consists of more than one random variable, we shall use the joint
probability density function, defined from the joint distribution function in a similar manner to
Equation 3.16:

p(X, Y) = ∂²/(∂X ∂Y) P(x ≤ X, y ≤ Y)     (3.18A)

In Equation 3.18A, we have assumed that the partial derivatives exist, or can be expressed
in terms of delta functions.2
We shall be interested in situations where a random variable is to be investigated given
some condition on the other random variable. Consider the probability of the random variable
y being less than or equal to some Y, given that x is in the range X − ΔX < x ≤ X:

P(y ≤ Y | X − ΔX < x ≤ X)     (3.19)

We define Equation 3.19 as the conditional probability distribution function. If this function
has derivatives, we define the conditional probability density function, p(Y|X), by:

p(Y|X) = ∂/∂Y P(y ≤ Y | X − ΔX < x ≤ X)     (3.20A)
The following relationships are easily shown from the last definitions and previous relations:

p(Y|X) ≥ 0     (3.21A)

∫ p(y|X) dy = 1     (3.21C)

p(Y|X) = p(X, Y)/p(X)     (3.21D)
III. RANDOM SIGNALS CHARACTERIZATION
A. Random Processes
Up until now we have considered random variables that were "the outcome of an
experiment". We now consider a time function, x(t). The value of the function at any time t_1,
x(t_1), is a random variable. The variable t is chosen since most of the signals that will be
considered here are time-dependent signals. In general, of course, x may be a function of
distance or any other variable.
We assume that we have a source generating the random function x(t), which is denoted
as a sample function. The source generates many sample functions which together are known
as the ensemble. At any time, t_1, we can observe the values of all sample functions, to get
many "outcomes of the experiment". Figure 2 depicts n sample functions out of the ensemble
of the random process x.
For exam ple, consider the EEG signal (see Appendix A) taken by means of surface
electrodes located at a certain location on the scalp. We want to investigate the properties
of the EEG, recorded at the particular location, of a given population segment, say, healthy
children in a certain age group. We hypothesize that the EEG recorded is a sample function
of a common random process. After recording n sample functions, we have, for each time
t_i, n values of the random variable x(t_i). We can use these values to estimate the probability
distribution function of x(t_i).
Assume we have an ensemble of n sample functions of x(t). For large n we can use the
ensemble for estimating the probability, P(x(t) ≤ X). We can also estimate the joint
probability, P(x(t_1) ≤ X_1, x(t_2) ≤ X_2, ..., x(t_N) ≤ X_N); we shall denote this as P(x_1, x_2, ..., x_N) and
the joint probability density function of the random process x(t) by p(x_1, x_2, ..., x_N). It is easy
to show that the probability density function has the following properties:

p(x_1, x_2, ..., x_N) ≥ 0     (3.22A)
Suppose we have two random processes x(t) and y(t). These are said to be statistically
independent random processes if:

p(x_1, x_2, ..., x_N, y_1, y_2, ..., y_M) = p(x_1, x_2, ..., x_N) p(y_1, y_2, ..., y_M)
The joint probability density function of x(t) was estimated at the time t = t_1 (refer to Figure
2). Let us also estimate the joint probability density function at another time, t = t_1 + τ.
A process in which the joint probability density functions are identical at all times, and
for all N, is called a stationary process.
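A minimal Python sketch of estimating ensemble statistics at fixed times is given below; the smoothed white noise used as the ensemble and the two observation times are arbitrary illustrative choices:

import numpy as np

rng = np.random.default_rng(1)

# Ensemble of n sample functions of a (stationary) random process:
# here simply white noise smoothed by a short moving average.
n, length = 200, 1000
ensemble = rng.standard_normal((n, length))
kernel = np.ones(5) / 5.0
ensemble = np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="same"),
                               1, ensemble)

# Estimate the first-order statistics of x(t) across the ensemble
# at two different times; for a stationary process they should agree.
for t_idx in (100, 700):
    values = ensemble[:, t_idx]
    print(t_idx, values.mean(), values.var())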
The expectation is also known as the statistical average or mean. If x and y are discrete,
such that x gets the values x_i, i = 1, 2, ..., I, and y gets the values y_j, j = 1, 2, ..., J, then the
expectation is

E{z} = E{f(x,y)} = Σ_{i=1}^{I} Σ_{j=1}^{J} f(x_i, y_j) P(x_i, y_j)     (3.25)
The first moment of x, m = E{x}, is called the mean. In a stationary random process it is
the "dc" component. The nth central moment, μ_n, is defined by

μ_n = E{(x − m)^n}

The second central moment has a special importance; it is called the variance and is denoted
by σ_x²:

σ_x² = μ_2 = E{(x − m)²}

The square root of the variance is called the standard deviation. The (n + m)th order joint
moments, E{x^n y^m}, and joint central moments, μ_nm, are similarly defined.
IV. CORRELATION ANALYSIS
y_p = ax + b     (3.32)

The random points (x, y) will not fall exactly on the line (points (x, y_p)) but will be scattered
in its vicinity. The mean square error, e, between the random points and the line is given
by:

e = E{(y − y_p)²} = E{(y − ax − b)²}     (3.33)

The best line that can be drawn through the scattered points, in such a way that e is minimized,
is known as the regression line. The parameters a and b of the regression line are given by
minimizing Equation 3.33:

a = E{(x − m_x)(y − m_y)} / σ_x²;   b = m_y − a m_x
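A short Python sketch of fitting the regression line by minimizing the mean square error is given below; the underlying line (a = 2, b = 1), the noise level, and the number of points are arbitrary:

import numpy as np

rng = np.random.default_rng(2)

# Scattered points around a line y = a*x + b (a = 2, b = 1 chosen for the example).
x = rng.uniform(-1, 1, 500)
y = 2.0 * x + 1.0 + 0.3 * rng.standard_normal(500)

# Regression (least mean square error) line: minimize E{(y - a*x - b)^2}.
a_hat = np.cov(x, y, bias=True)[0, 1] / x.var()   # a = cov(x, y)/var(x)
b_hat = y.mean() - a_hat * x.mean()               # b = E{y} - a*E{x}

print(a_hat, b_hat)   # should be close to 2 and 1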
A gaussian random process is a random process in which, for every set of time instants t_i,
the random variables x(t_i), i = 1, 2, ..., N, are jointly gaussian distributed.
Since the gaussian process is completely described by the first and second moments, it
follows that a wide sense stationary gaussian process is also stationary in the strict sense.
It can also be shown that a linear transformation of a gaussian random process yields a
gaussian random process. This property of the gaussian process is important in signal
processing since any linear amplification and filtering will not alter the gaussian nature of
the process.
REFERENCES
1. Papoulis, A., Probability, Random Variables and Stochastic Processes, McGraw-Hill, New York, 1965.
2. Davenport, W. B. and Root, W. L., An Introduction to the Theory of Random Signals and Noise, McGraw-Hill, New York, 1958.
3. Bendat, J. S. and Piersol, A. G., Random Data: Analysis and Measurement Procedures, Wiley-Interscience, New York, 1971.
Chapter 4
I. INTRODUCTION
Most of the signals of interest in biomedicine are continuous (analog); namely, they are
defined over a continuous range of the variable (usually time). It is important, however, to
analyze discrete signals, namely, signals that are defined only at discrete instants. Modern
digital technology, both in terms of hardware and software, makes discrete time processing
advantageous over analog processing. The advantages are such that usually it is worthwhile
to convert the analog signal into a discrete one so that discrete processing can be applied.
The conversion is done by analog to digital (A/D) conversion systems that sample and
quantize the signal at discrete times. Usually the sampling is performed uniformly; however,
sometimes nonuniform sampling is used.
Some of the main tasks of digital signal processing are to apply filtering, to estimate
various signal parameters, and to perform transformations of the signal (e.g., the Fourier
transform). When the results of the processing are not required immediately following the signal,
off-line processing methods are used. When results are required during and immediately
after the signal sample has been acquired, real-time or on-line processing methods are used.
Depending on the application, the processing time and the size of memory required are
of importance in digital signal processing. Off-line processing can be performed on general
purpose computers. Real-time processing usually requires special dedicated machines or
systems such as fast correlators, dedicated Fourier transform machines, array processors, or
hardware multiply-accumulate circuits.
This chapter briefly discusses the problem of sampling and quantization. The z transform
is introduced and applied to digital signal processing.
The material presented in this chapter is used as a background for later chapters. For a
comprehensive discussion of this material and other important topics such as the fast Fourier
transform (FFT) and digital filtering, the reader is referred to the literature on digital signal
processing.1
II. SAMPLING
A. Introduction
Consider a band-limited continuous signal, x(t), which is bounded in amplitude by |x(t)| ≤
A. A band-limited signal has a Fourier transform, X(w), with X(w) = 0 for all |w| ≥
w_max. It is desired to process the signal by digital processing means, in general, by a digital
computer. The digital machine requires the conversion of the continuous signal, x(t), into
a series of discrete numbers {x(k)} (refer to Figure 1). The first stage of the conversion is
sampling, or time discretization. Assume that we apply uniform sampling with interval T_s,
namely, with sampling frequency f_s = 1/T_s. We then generate out of the signal a sequence
of sampled data {x(nT_s)}, n = 0, 1, ... Note that each sample, x(iT_s), is continuous in
amplitude in the range −A ≤ x(iT_s) ≤ A. A discretization of amplitude is now required to
get the necessary sequence of discrete numbers. A quantizer device is used for this purpose.
The combined process of sampling and quantization is known as analog-to-digital conversion
(A/D). We consider the quantization as a transformation of the amplitude continuous quantity,
x(iT_s), into a discrete number, x_q(iT_s). A detailed discussion on the quantization process
is given in the next section.
In the general processing system, the signal is required in an analog continuous form after
processing. Consider (see Figure 1) the processed sequence of numbers, y_q(n). To get an
analog signal, we perform the digital-to-analog conversion (D/A). The D/A most often
consists of a zero order hold (ZOH) circuit, followed by a low pass filter (LPF).
B. Uniform Sampling
The ideal sampler, described in Figure 1, consists of a switch that is closed for an infinitely
short duration every T_s seconds. It can be considered as an ideal switch driven by a train
of impulse functions, δ_T(t), where:

δ_T(t) = Σ_{n=−∞}^{∞} δ(t − nT_s)     (4.1)

and where δ(t) is the delta (or Dirac) function. The sampled signal can be considered as the
multiplication of the continuous signal, x(t), with the train δ_T(t). The time discrete sampled
signal is written, thus, as a time function, denoted by x*(t):

x*(t) = x(t) δ_T(t) = Σ_{n=−∞}^{∞} x(nT_s) δ(t − nT_s)     (4.2)
It can be shown that if x(t), the band-limited signal, has a Fourier transform given by X(w),
then the Fourier transform of the sampled signal, X*(w), is given by:

X*(w) = (1/T_s) Σ_{n=−∞}^{∞} X(w − n w_s)     (4.3)

where w_s = 2πf_s = 2π/T_s. Figure 2 shows an example of the transform for two cases. In
Figure 2b, the sampling frequency obeys w_s > 2w_max, where w_max is the largest frequency
of the signal x(t). We note that the sampled signal in the frequency domain consists of
nonoverlapping functions. Consider the effect of a low pass filter that will pass all frequencies
in the range −w_max ≤ w ≤ w_max undistorted, while zeroing all frequencies outside this
range. The Fourier transform of the signal at the output of the filter equals that of x(t). Since
the Fourier transform is unique, we can restore the original signal from its samples by such
a low pass filtering operation, provided the sampling frequency obeys:

w_s ≥ 2w_max     (4.4)
This is known as the sampling theorem. Condition 4.4 is known as the Nyquist rate. Figure
2c shows the Fourier transform of the sampled signal when the sampling frequency is less
than the Nyquist rate. In this case, the functions in the frequency domain overlap and low
pass filtering cannot restore the signal without distortions. The phenomenon of overlapping
is called aliasing in frequency. Note that when sampling a continuous signal that is not band
limited, aliasing always occurs no matter how large w_s is.
In practical cases, when a signal has a large w_max, it is often preprocessed by an analog low
FIGURE 2. Sampled band-limited signal in the frequency domain. (a) Spectrum of the
band-limited signal; (b) spectrum of the sampled signal, w_s > 2w_max; (c) spectrum of the
sampled signal, w_s < 2w_max. Note the aliasing.
pass filter in such a way that the high frequencies are eliminated so as not to cause aliasing
problems. In theory, the signal can be sampled at the lowest Nyquist rate, w_s = 2w_max.
The reconstruction of a signal so sampled requires an ideal rectangular low pass filter,
which is impossible to implement. The need to use realizable filters for the reconstruction
of the signal makes it necessary to sample at frequencies higher than the Nyquist rate.
Sampling frequencies of 2.5 to 10 times w_max are often used.
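A small Python illustration of aliasing: a 60 Hz sinusoid sampled at 100 Hz (below its Nyquist rate of 120 Hz) appears at 40 Hz. The frequencies and record length are arbitrary choices:

import numpy as np

# A 60 Hz sinusoid sampled at 100 Hz (below the Nyquist rate of 120 Hz)
# appears at the alias frequency |60 - 100| = 40 Hz.
f_signal, fs = 60.0, 100.0
n = np.arange(512)
x = np.sin(2 * np.pi * f_signal * n / fs)

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(n.size, d=1.0 / fs)
print("apparent frequency:", freqs[np.argmax(spectrum)], "Hz")  # ~40 Hz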
C. Nonuniform Sampling
A uniform sampling rate is convenient since the information is contained in the value of
the sample only. No time information is required since it is known a priori that samples
are equally spaced by T_s seconds. Sometimes, however, the signal consists of intermittent
occurrences of fast changing and relatively quiescent intervals. One would then tend
to sample at a high rate during fast changing periods, while reducing the sampling rate during
the quiescent intervals. This calls for an adaptive, nonuniform sampling. The ECG signal
is an example where such a sampling scheme may be effective.
Two main reasons exist for using nonuniform, adaptive sampling. The first is when
effective storage is required. The problem is to store the signal using minimum storage size,
retaining the ability to reconstruct the signal within a given error. The second is when
effective transmission is required. The problem here is to reduce the transmission rate (bits
per second), retaining the ability to reconstruct the signal at the receiver side within a given
error.
Several data compression techniques to reduce transmission rate and storage requirements
have been developed for communications applications. The differential pulse code modulation
(DPCM) is one of the most popular schemes. An error signal is generated and nonuniformly
quantized. The error is the difference between the original signal and a signal estimated
from the output of the quantizer. Thus, only the error is quantized and transmitted, reducing
the amount of information. The output of the quantizer is uniformly sampled. An
improvement to the accuracy of the above scheme is the introduction of an adaptive quantizer that
automatically adapts the step size of the quantizer, q, according to the signal. Adaptive delta
modulation (ADM) is such a scheme, often used in synchronous communication systems.
A significant data compression can be achieved by using nonuniform sampling.7,8
Consider a scheme in which information is sent only when the source signal crosses
a threshold level. This will cause periods in the signal where fast changes exist to be
sampled at a higher rate than periods with slow variations. Note, however, that the
transmission is now asynchronous, since the receiver does not know a priori the exact location
of the sample on the time axis. In storage applications, information must be added to indicate
the time of the sample.
Since the signal is assumed to be band limited by w_max, there is no use in sampling it at a
rate higher than w_s = k·w_max (where k is a constant in the (empirical) range 2 ≤ k ≤ 10).
When Equation 4.5 yields a sampling interval T_i < 2π/(k·w_max), we replace it by T_s = 2π/
(k·w_max). Thus the maximum instantaneous sampling frequency of the adaptive scheme is
bounded by k·w_max. An example of the voltage-triggered nonuniform sampling of the ECG
is given in Figure 3.
The first order method is known also as the two points projection method. Here, the first
samples are used to estimate the slope of the signal. As long as subsequent samples fall
within some specified error of this slope, they are ignored. The first sample that falls outside
the error tolerance is stored (or transmitted) and used to estimate the next slope. Denote the
derivative of the signal at time t_i by ẋ(t_i). Assume that at t_i the sample x(t_i) has been
stored. The next sample to be stored is the sample at time t_i + τ_j, x(t_i + τ_j), for which the
absolute value of the slopes' difference first crosses the threshold R_1:

|ẋ(t_i + τ_j) − ẋ(t_i)| ≥ R_1

Note that here we compare the slope at time (t_i + τ_j) with the slope at the last
point to be stored. When R_1 is crossed, we store the sample x(t_i + τ_j) and use the new
slope ẋ(t_i + τ_j) as a new reference.
The slope of the signal has to be estimated. Consider the uniform sampling of the signal
at a maximum rate of w_s = k·w_max, yielding the samples {x(nT_s)}, n = 0, 1, ... The slope can
be estimated by:

ẋ(nT_s) ≈ [x(nT_s) − x((n − 1)T_s)] / T_s     (4.7A)
FIGURE 3. Nonuniform sampling of ECG. Synthesized ECG, sampling instances,
and reconstructed signal for the zero, first, and second order adaptive sampling methods.
If the signal contains additive noise, the estimate 4.7A can be modified by smoothing the
data over a window of samples (Equation 4.7B), where (2M − 1) is the number of samples
used to smooth the data. The slope is then estimated every (2M − 1)T_s seconds.
The application of the two points projection method to the ECG is demonstrated in Figure
3.
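A minimal Python sketch of the two points projection idea described above is given below; the synthetic pulse used in place of the ECG, the sampling interval, and the threshold value are illustrative assumptions, and the slope is estimated by the simple first difference of Equation 4.7A:

import numpy as np

def two_point_projection(x, ts, r1):
    """First-order (two points projection) nonuniform sampling sketch.

    Keeps a sample whenever the slope estimated at the candidate point
    differs from the slope at the last stored point by at least r1.
    """
    slope = np.diff(x) / ts                 # simple first-difference slope estimate
    kept = [0]                              # always keep the first sample
    ref_slope = slope[0]
    for i in range(1, slope.size):
        if abs(slope[i] - ref_slope) >= r1:
            kept.append(i)
            ref_slope = slope[i]            # new reference slope
    return np.array(kept)

# Synthetic "ECG-like" test signal: mostly flat with a sharp pulse.
ts = 1e-3
t = np.arange(0, 1.0, ts)
x = np.exp(-((t - 0.5) ** 2) / (2 * 0.01 ** 2))

kept = two_point_projection(x, ts, r1=5.0)
print("kept", kept.size, "of", x.size, "samples")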
The second order nonuniform sampling method is known as the second differences method.10
It examines the slope just before the current sample and just after it. If the absolute value
of the difference of these two adjacent slopes is larger than a given threshold, R_2, the sample
is stored. Hence, here we are considering the local change of slopes. The method is formulated
as follows: the sample x(t_i) is stored if:

|ẋ(t_i⁺) − ẋ(t_i⁻)| ≥ R_2     (4.8)
In practice, we have to estimate the time derivatives. This can be done, again, by uniformly
sampling the signal at a maximum rate. The examination of the sample x(nT_s) at time t_n
= nT_s by Equation 4.8 can be done by using the slope estimation of Equation 4.7B. If we
choose again a window of (2M − 1) samples for smoothing the data, we get (assuming M
is odd):
Equations 4.9A and 4.9B give the smoothed estimates of the two adjacent slopes (just before
and just after the sample x(nT_s)), each computed from averages taken over the (2M − 1)-sample
smoothing window.
The application of the second differences method to the ECG is shown in Figure 3. Blanchard
and Barr9 showed that the voltage-triggered and second differences methods had serious defects
in sampling their synthetic ECG. They concluded that the two points projection method best
suited the signal, yielding an average compression ratio of about 1:14.
Each interval τ between successive threshold crossings is expressed as an integer number of
time quanta, [τ/t_q + 0.5], with [·] being the largest integer smaller than or equal to the
argument (Equation 4.10). We shall encode each sample as follows: denote each time quantum,
t_q, in τ by a pair of zeroes
(00), an upward crossing by (01), and a downward crossing by (10). A digital word is now
generated by placing the pairs of zeroes (minus one pair) followed by the crossing code.
For example, a downward crossing that occurred 3t_q sec after the previous crossing will be
denoted by 00 00 10. We now code this word by letting the first digits be the number of
pairs of zeroes, followed by a single bit — one for a downward and zero for an upward crossing.
For the example above, we shall get the coded number 101. This coding is the run length coding
of the initial word.7 The length of the encoded word in the example is n = 3 bits. This
allows two digits for the registration of the interval. If no crossing occurs in a period
of n·t_q, a word of n pairs of "00" is generated (for n = 3 this word is coded into 110, denoting
3 groups of "00"). Long quiescent periods with no level crossings will be coded into a
succession of the words 110.
Consider the compression ratio provided by the method. Assume that the probability of a
downward crossing equals the probability of an upward crossing. We then get Equation 4.12,
where m = 2^(n−1) − 1 is the number of time quanta that can be coded with a block length of
n bits. The rate of information is given by the required bits per second. If the rate of the
coded information is denoted by r_c, then:

r_c = n / (m · t_q)     (4.13)
Consider now the case where the signal is sampled uniformly with sampling frequency
f_s = 1/T_s, and where each sample is described by a word length of N bits. The rate of
information, r, is

r = N · f_s     (4.14)
The last two terms on the right side of Equation 4.20A are due to the saturation parts of
the quantizer. In Equation 4.20A it was assumed that x_0 = −∞ and x_N = ∞. In general,
we shall adjust the input signal such that p(x) = 0 for x > x_{N−1} and x < x_1; hence, the
last two terms of Equation 4.20A are zero and:

σ_nq² = (1/12) Σ_i p(x_qi) q_i³     (4.20B)
The most common quantizer is the uniform quantizer, for which q_i = q, i = 2, ..., N −
1. For these quantizers we can further simplify the expression for the noise variance:

σ_nq² = (q²/12) Σ_i p(x_qi) q = q²/12     (4.21)

since:

Σ_i p(x_qi) q ≈ ∫ p(x) dx = 1
The approximate value for the quantization noise variance given by Equation 4.21 is the
one most often used. Uniform quantizers, with a small quantization step q and with saturation
levels that can be ignored, have uniformly distributed noise12 in the range −q/2 ≤ n_q ≤
q/2, with variance σ² = q²/12.
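A quick numerical check of Equation 4.21 in Python, with an arbitrary gaussian input and a step size chosen so that saturation can be ignored:

import numpy as np

rng = np.random.default_rng(3)

# Uniform quantizer with step q applied to a signal that stays well
# inside the quantizer range, so saturation can be ignored.
q = 0.05
x = rng.standard_normal(200_000)
x_q = q * np.round(x / q)          # uniform (rounding) quantizer

noise = x_q - x
print(noise.var(), q ** 2 / 12)    # empirical variance vs. q^2/12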
To justify the assumption that saturation effects can be neglected, we make sure the
extreme quantization levels, x_1 and x_{N−1}, are some multiple of the input signal standard deviation.
Define the loading factor, L_q (for symmetric uniform quantizers):

L_q = x_{N−1} / σ_x     (4.22)

where σ_x is the standard deviation (rms) of the input signal. A common choice for the
loading factor is L_q = 4 (four sigma loading). For such a quantizer, we have N − 2 levels
for −4σ_x ≤ x ≤ 4σ_x; hence:

q = 8σ_x / (N − 2)     (4.23)
The SNR of the symmetric uniform quantizer is given by introducing Equations 4.21 and 4.23
into Equation 4.24:

SNR = σ_x² / σ_nq² = 3(N − 2)²/16 ≈ (3/16)·2^(2n)     (4.25)

In Equation 4.25, we have used the relation N = 2^n and the assumption N ≫ 2. In practical
cases, we may use a 10-bit word, which results in a high SNR of more than 50 dB.
D. Rough Quantization
In the previous section, the quantization noise was analyzed under the assumption that
the number of quantization levels is high and quantization step is small. In many applications,
these assumptions are not valid; we then talk about rough quantization. A more detailed
analysis is required for the noise generated by rough quantization. Figure 7 shows two rough
quantizers. The first is a one bit quantizer (N = 2) in which the quantized sample is the
sign of the input, x_q = sgn(x). The second rough quantizer has N = 3, where the description
of the quantized output requires 2 bits. The importance of these two rough quantizers (also
known as clippers) is due to the fact that processing of the quantized data is extremely
simple. Digital correlation, for example, of signals quantized by these quantizers requires
no multiplication. Such correlators have been suggested13 and applied14 to biomedical signal
processing.
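A minimal Python sketch of correlating one bit (clipped) signals is shown below; the correlated gaussian pair and the arcsine-law correction used to recover the correlation coefficient are standard illustrations and are not taken from the correlators of References 13 and 14:

import numpy as np

rng = np.random.default_rng(4)

# Two correlated gaussian signals (true correlation coefficient rho = 0.6).
rho = 0.6
n = 200_000
g1 = rng.standard_normal(n)
g2 = rng.standard_normal(n)
x = g1
y = rho * g1 + np.sqrt(1 - rho ** 2) * g2

# One-bit ("clipped") versions: only the sign of each sample is kept,
# so correlating them needs no multiplications of full-precision data.
sx, sy = np.sign(x), np.sign(y)
polarity = np.mean(sx * sy)                      # polarity-coincidence correlation

# For gaussian signals the arcsine law relates the two estimates:
# E{sgn(x) sgn(y)} = (2/pi) * arcsin(rho).
rho_from_clipped = np.sin(np.pi / 2 * polarity)
print(polarity, rho_from_clipped, np.corrcoef(x, y)[0, 1])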
The statistical analysis of rough uniform quantization noise has been given by
Widrow.15,16 Widrow has proven the quantization theorem, which is, in some sense, analogous to
the Nyquist sampling theorem. The time samples of x(t) have a continuous amplitude
probability density function p(x). The quantized output, x_q, assumes only discrete amplitudes
and thus has a discrete probability density function p_q(x_q). This function consists of a series
of uniformly spaced impulses, each one centered in a quantization region. Figure 8
depicts the two density functions.
Widrow has considered the output density function as the sampled form of the input
density function. If the input density, p(x), is bounded in frequency (namely, its Fourier
transform P(u) has the property P(u) = 0 for all |u| ≥ u_max), then there exists a quantization
level, q_s, such that quantized signals, with quantization levels q ≤ q_s, contain all the
information on the original distribution, p(x). In other words, we can generate the original
probability density function of the input signal from the quantized one, provided the quantizer
obeys the quantization law, q ≤ q_s. The quantization law states that in order to retain all the
information on the probability density function, the quantization step must obey:

q ≤ q_s = π/u_max     (4.26)
Having developed the probability density function of the quantized signal, the noise
statistics (such as the variance and the correlation) can be calculated in general. The variance
of the quantized noise of Equation 4.21 is a special case of the general result given by
Widrow.
IV. DISCRETE METHODS
A. The Z Transform
Consider the sampled signal, x*(t), given by Equation 4.2. If its (one-sided) Laplace
transform is denoted by X*(S), then:17

X*(S) = Σ_{n=0}^{∞} x(nT) e^(−SnT)     (4.27)

Define the complex variable Z:

Z = exp(ST)     (4.28)

so that

X(Z) = Σ_{n=0}^{∞} x(nT) Z^(−n)     (4.29)

Equation 4.29 is known as the one-sided Z transform, in which we assume that the signal
x(t) = 0 for t < 0. It is easily shown that the Z transform is a linear operator. Several
important properties of the transform make this operator an important tool for the solution
of difference equations and the analysis of sampled data systems.
One of the important properties is the shift property. It can easily be shown (assuming zero
initial conditions) that:

Z{x((n + m)T)} = Z^m X(Z)     (4.30)

For example, for m = −1, Equation 4.30 yields the Z transform of the sampled signal,
x(nT), delayed by one interval, in terms of the Z transform of the original signal:

Z{x((n − 1)T)} = Z^(−1) X(Z)     (4.31)

The inverse transform, x(nT), can often be determined by inspection. The inverse transform can
also be determined analytically by the residue theorem, through an integration in the complex
plane.17
B. Difference Equations
A time invariant linear system, with input u(t) and output y(t) which are defined only at
discrete instants t = kT, can be described by a difference equation:

y(kT) + a_1 y((k − 1)T) + ... + a_n y((k − n)T) = b_0 u(kT) + b_1 u((k − 1)T) + ... + b_m u((k − m)T)     (4.33)

The difference equation 4.33 can be solved by means of the Z transform, in a similar
manner to that in which differential equations are solved via the Laplace transform. Denote Z{y(t)}
= Y(Z), Z{u(t)} = U(Z), and transfer both sides of Equation 4.33 into the Z domain using
the shift property. Assuming all initial conditions to be zero, we get:

(1 + a_1 Z^(−1) + ... + a_n Z^(−n)) Y(Z) = (b_0 + b_1 Z^(−1) + ... + b_m Z^(−m)) U(Z)     (4.34A)

or:

Y(Z) = H(Z) U(Z),   H(Z) = (b_0 + b_1 Z^(−1) + ... + b_m Z^(−m)) / (1 + a_1 Z^(−1) + ... + a_n Z^(−n))     (4.34B)
The output signal is given in the Z domain by the product of the ratio of the two polynomials,
H(Z) (the Z domain transfer function describing the system), and the input, U(Z). Applying the
inverse transform operation to Equation 4.34B will give the required output signal in the time
domain.
The transfer function, H(Z), can represent a digital filter operating on the signal, u(t), to
improve its quality in some sense. We shall also see in later chapters that we sometimes
use H(Z) as a means for effective description of the signal, y(t). In these cases, we assume
that y(t) is the output of a linear system driven by u(t), a white noise source. We identify
H(Z) and use the parameters a_i and b_i to represent the signal y(t).
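A minimal Python sketch of solving a difference equation of the form 4.33 by direct recursion is given below; the first order coefficients and the unit step input are arbitrary illustrative choices:

import numpy as np

def difference_equation(b, a, u):
    """Filter the input u through y(k) = -sum a_i*y(k-i) + sum b_j*u(k-j),
    i.e. the transfer function H(Z) = B(Z)/A(Z) with a[0] assumed to be 1."""
    y = np.zeros(len(u))
    for k in range(len(u)):
        acc = 0.0
        for j, bj in enumerate(b):
            if k - j >= 0:
                acc += bj * u[k - j]
        for i, ai in enumerate(a[1:], start=1):
            if k - i >= 0:
                acc -= ai * y[k - i]
        y[k] = acc
    return y

# Example: simple first-order low pass, H(Z) = 0.1 / (1 - 0.9 Z^-1),
# driven by a unit step (coefficients chosen only for illustration).
u = np.ones(50)
y = difference_equation(b=[0.1], a=[1.0, -0.9], u=u)
print(y[:5], y[-1])   # approaches the d.c. gain 0.1/(1 - 0.9) = 1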
REFERENCES
1. Gold, B. and Rader, C. M., Digital Processing of Signals, McGraw-Hill, New York, 1969.
2. Beauchamp, K. G. and Yuen, C. K., Digital Methods for Signal Analysis, George Allen and Unwin, Ltd., London, 1979.
3. Tretter, S. A., Introduction to Discrete Time Signal Processing, John Wiley & Sons, New York, 1976.
4. Chen, C.-T., One Dimensional Digital Signal Processing, Marcel Dekker, New York, 1979.
5. Oppenheim, A. V., Ed., Application of Digital Signal Processing, Prentice-Hall, Englewood Cliffs, N.J., 1978.
6. Ahmed, N. and Rao, K. R., Orthogonal Transforms for Digital Signal Processing, Springer-Verlag, Berlin, 1975.
7. Mark, J. W. and Todd, T. D., A nonuniform sampling approach to data compression, IEEE Trans. Commun., 29, 24, 1981.
8. Plotkin, E., Roytman, L., and Swamy, M. N. S., Nonuniform sampling of band limited modulated signals, Signal Process., 4, 295, 1982.
9. Blanchard, S. M. and Barr, R. C., Zero, first and second order adaptive sampling from ECG's, in Proc. of the 35th ACEMB, Philadelphia, 1982, 209.
10. Pahlm, O., Borjesson, P. O., and Werner, O., Compact digital storage of ECG's, Comput. Programs Biomed., 9, 293, 1979.
11. Gersho, A., Principles of quantization, IEEE Trans. Circuits Syst., 25, 427, 1978.
12. Sripad, A. B. and Snyder, D. L., A necessary and sufficient condition for quantization errors to be uniform and white, IEEE Trans. Acoust. Speech Signal Process., 25, 442, 1977.
13. Landsberg, D. and Cohen, A., Fast correlation estimation by a random reference correlator, IEEE Trans. Instrum. Meas., 32, 438, 1983.
14. Cohen, A. and Landsberg, D., Adaptive real-time wavelet detection, IEEE Trans. Biomed. Eng., 30, 332, 1983.
15. Widrow, B., A study of rough amplitude quantization by means of Nyquist sampling theory, IRE Trans. Circuit Theory, 3, 266, 1956.
16. Widrow, B., Statistical analysis of amplitude quantized sampled-data systems, AIEE Trans. (Applications and Industry), II, 555, 1961.
17. DeRusso, P. M., Roy, R. J., and Close, C. M., State Variables for Engineers, John Wiley & Sons, New York, 1965.
Chapter 5
I. INTRODUCTION
Availability of long records — Often only short time records are available for processing. This may be due to the fact that the phenomenon monitored existed only for a short time
or due to the fact that the acquisition system has allocated only a given time slot to the
signal at hand.
Stationarity — Most often the signal to be processed is nonstationary. It is convenient,
however, to assume stationarity so that powerful (stationary) signal processing techniques
can be employed. The signal, therefore, is divided into segments, such that each can be
considered stationary. Rather than estimating the statistics of a nonstationary signal, the
problem now is to estimate the statistics of several “ stationary” signals represented by finite
time segments.
This chapter deals with the problems associated with finite time estimation. The errors
involved with these types of estimators are discussed, as well as the improvement in signal-
to-noise ratio achieved by the estimation.14
An important case, when the signal to be processed is a repetitive one, is analyzed. In
this case, synchronous averaging5-9 (known also as coherent averaging) techniques are employed in order to estimate the averaged waveshape of the repetitive signal.
EEG evoked potentials (EP) are classical examples of a signal treated by means of
synchronous averaging.
Finite time averaging techniques are implemented by software on general purpose computers, on dedicated computers, and on special digital circuits.10-12 In practice, all signal
processing is time bounded, hence the importance of the knowledge of the estimation errors
involved.
II. FINITE TIME ESTIMATION OF THE MEAN VALUE
The mean of the stationary process {x(t)} is

μ_x = E{x(t)}    (5.2)

Given a finite record of length T, the mean is estimated by the time average

μ̂_x = (1/T) ∫_0^T x(t) dt    (5.3)

Clearly, the estimator is unbiased; its mean square error (variance) is

Var[μ̂_x] = E{μ̂_x²} − μ_x²    (5.5)
The first term on the right side of Equation 5.5 can be rewritten, using Equation 5.3:

E{μ̂_x²} = (1/T²) ∫_0^T ∫_0^T r_x(η − ξ) dη dξ    (5.6)

where τ = η − ξ. Since stationarity was assumed, r_x(τ) is independent of η and ξ, is an even function of τ, and has a maximum at τ = 0. Equation 5.6, written in terms of the new variable τ, becomes

E{μ̂_x²} = (1/T²) ∫∫ r_x(τ) dτ dξ    (5.8)

Note that the integration of Equation 5.8 is carried out on the variables τ and ξ over the region shown in Figure 1. Changing the order of integration leads to:

E{μ̂_x²} = (1/T) ∫_{−T}^{T} (1 − |τ|/T) r_x(τ) dτ    (5.10)
In order to evaluate the mean square error of the estimated quantity, μ̂_x, one has to have the autocorrelation function r_x(τ). This function is, however, usually unavailable. Note that when Equation 5.1 is used, the variance of the estimate vanishes as T increases; for gaussian processes and most physical random processes, Equation 5.3 is therefore a consistent estimator. We shall deal first with two general cases.
which states the trivial conclusion that for very short observation times, the variance of the estimated quantity equals that of the record itself; hence no improvement in signal-to-noise ratio is achieved. The second case is that of very long observation times, for which we assume the noise autocorrelation obeys:

lim_{T→∞} (1/T) ∫_{−T}^{T} |r_n(τ)| dτ = 0    (5.16)
The signal-to-noise ratio may be defined as the ratio between the expectation E{x(t)} = μ_x (the "signal") and the variance, Var[x(t)]. The signal-to-noise ratio of the observed signal is thus SNR_i = μ_x / Var[x(t)]. The ratio between the output and input SNRs can be considered a figure of merit for the estimator. Thus, the expected improvement achieved by the estimator 5.3 is

SNR_o / SNR_i = Var[n(t)] / [ (1/T) ∫_{−T}^{T} (1 − |τ|/T) r_n(τ) dτ ]

For large observation times, the improvement in SNR approaches infinity.
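The behaviour of the finite time mean estimator can be checked numerically. The sketch below (Python/NumPy assumed; the correlation parameter and record length are illustrative values, not from the text) averages an exponentially correlated, unit variance noise over a record of length T and compares the variance of the estimate with the record variance.

```python
import numpy as np

alpha, dt, T = 5.0, 1e-3, 2.0        # correlation parameter, sample step, record length
n_steps = int(T / dt)
rho = np.exp(-alpha * dt)            # first-order (Markov) step-to-step correlation

rng = np.random.default_rng(1)
trials = 500
means = np.empty(trials)
for k in range(trials):
    # generate exponentially correlated, unit-variance noise
    n = np.empty(n_steps)
    n[0] = rng.standard_normal()
    w = rng.standard_normal(n_steps) * np.sqrt(1.0 - rho**2)
    for i in range(1, n_steps):
        n[i] = rho * n[i - 1] + w[i]
    means[k] = n.mean()              # finite-time estimate of the (zero) mean

print("variance of the record  :", 1.0)
print("variance of the estimate:", means.var())   # roughly 2/(alpha*T) for large alpha*T
```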
Example 5.1
The resting potential of a cell membrane is to be estimated by means of the system depicted
in Figure 2. The cell membrane potential is measured by the two high impedance glass
electrodes and the amplifier. The signal s(t) is the amplified membrane potential and the
noise n(t) is the noise (referred to the output) of the electrodes, the amplifier, and the noise
picked up by the electrodes.
We assume that the additive noise is zero mean with autocorrelation function

r_n(τ) = σ_n² exp(−α|τ|)    (5.21)
SNR_o / SNR_i = αT / { 2 [ 1 − (1/αT)(1 − exp(−αT)) ] }    (5.23)

lim_{T→∞} SNR_o / SNR_i = αT / 2    (5.24)
Define a new filter impulse response, h(t,T), which is equal to the low-pass filter's impulse
response throughout the observation interval and is zero elsewhere; hence,
h(t,T) = h(t),  0 ≤ t ≤ T
h(t,T) = 0,  elsewhere    (5.26)
We would like to investigate the mean square error of the estimator. The expectation of the filter's output is μ_x ∫_0^T h(λ) dλ, so that the output, normalized by the area of the impulse response, is an unbiased estimator of the mean. Defining the finite time autocorrelation of the impulse response,

r_h(τ,T) = ∫_{−∞}^{∞} h(t,T) h(t + τ,T) dt    (5.31)

the variance of the estimator becomes

Var[μ̂_x] = ∫_{−∞}^{∞} r_n(τ) r_h(τ,T) dτ    (5.32)

Comparing Equations 5.32 and 5.13, we see that when an ideal integrator is used, the variance is calculated by integrating r_n(τ) with the weighting function (1/T)(1 − |τ|/T), while here, with a low-pass filter, the weighting function becomes r_h(τ,T).
Example 5.2
Repeat the problem discussed in Example 5.1 with a simple RC low-pass filter replacing
the integrator.
Denoting (RC)^{−1} = β, the impulse response of the filter is given by

h(t,T) = β exp(−βt),  0 ≤ t ≤ T;   h(t,T) = 0,  elsewhere    (5.34)

and, from Equation 5.31,

r_h(τ,T) = (β/2) {exp(−βτ) − exp[−β(2T − τ)]},   0 ≤ τ ≤ T
r_h(τ,T) = (β/2) {exp(βτ) − exp[−β(2T + τ)]},   −T ≤ τ ≤ 0
r_h(τ,T) = 0,   |τ| > T    (5.35)
Assume, again, that the noise is exponentially correlated (Equation 5.21); then, from Equation 5.32, with ψ = α/β and φ = βT,

SNR_o / SNR_i = (ψ² − 1) / { (ψ − 1)[1 − exp(−φ(ψ + 1))] − (ψ + 1) exp(−2φ)[1 − exp(−φ(ψ − 1))] }    (5.37)

and, for long observation times,

lim_{T→∞} SNR_o / SNR_i = ψ + 1 = α/β + 1    (5.38)
Comparing Equations 5.24 and 5.38, we note that while the signal-to-noise ratio can be
infinitely improved by the integrator, it is bounded when using the low-pass filter estimator.
Note also that in both cases the improvement is linearly proportional to the noise correlation exponential factor α.
where {x_k} and {n_k} are the signal and noise sample sequences.
The estimate of the mean μ_x at time kΔt is defined as the average over the current and M − 1 previous samples:

μ̂_x(k,M) = (1/M) Σ_{i=0}^{M−1} x_{k−i}    (5.40)
The estimator is clearly unbiased. The variance of the estimator (Equation 5.5) is given by

Var[μ̂_x(k,M)] = (1/M²) Σ_{i=0}^{M−1} Σ_{j=0}^{M−1} E{x_{k−i} x_{k−j}} − μ_x²    (5.41)

Assuming {x_k} is a stationary process, the last equation can be expressed in terms of the correlation function r_τ = E{x_k x_{k+τ}}.
Define the correlation matrix

R_x =
[ r_0       r_1       ...  r_{M−1} ]
[ r_1       r_0       ...  r_{M−2} ]
[ ...                             ]
[ r_{M−1}   r_{M−2}   ...  r_0     ]    (5.43)

the elements of which are the correlation coefficients used in the first term on the right side of Equation 5.42. This term, therefore, is the sum of the matrix elements. Due to the symmetry of the matrix, the variance becomes
Var[μ̂_x(k,M)] = (1/M²) [ M r_0 + 2 Σ_{τ=1}^{M−1} (M − τ) r_τ ] − μ_x²    (5.45)
The last equation is the equivalent of Equation 5.13 for the discrete case.
The improvement in SNR for the discrete case is thus given by
SNR_o / SNR_i = Var[x_k] / Var[μ̂_x(k,M)] = M / [ 1 + 2 Σ_{τ=1}^{M−1} (1 − τ/M)(r_τ/r_0) ]    (5.46)
Example 5.3
Consider again the problem given in Example 5.1. Assume now that the noisy signal x(t) is given in terms of its finite sample sequence {x_t}; t = k − M + 1, k − M + 2, ..., k − 1, k.
Assume also that the noise process has an exponential correlation function as in Example 5.1 (Equation 5.21). From Equations 5.18, 5.19, and 5.46, we get the improvement in signal-to-noise ratio due to estimator 5.40:

SNR_o / SNR_i = M / [ 1 + 2 Σ_{τ=1}^{M−1} (1 − τ/M) exp(−ατΔt) ]    (5.47)
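A small sketch (Python/NumPy assumed; α and the sampling interval are illustrative values) evaluates the improvement of Equation 5.47 for several window lengths M:

```python
import numpy as np

def snr_improvement(M, alpha, dt):
    """Discrete-time SNR improvement of an M-sample average over
    exponentially correlated noise (Equation 5.47 form)."""
    tau = np.arange(1, M)
    return M / (1.0 + 2.0 * np.sum((1.0 - tau / M) * np.exp(-alpha * tau * dt)))

for M in (2, 16, 64):
    print(M, snr_improvement(M, alpha=5.0, dt=0.01))
```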
S_x(T) = (1/T) ∫_0^T (x(t) − μ̂_x)² dt    (5.50)
where μ̂_x is the estimator for the expectation of {x(t)} given by Equation 5.3. It can be shown, using Equations 5.8 and 5.12, that the expectation of Estimator 5.50 can be written as

E{S_x(T)} = σ_x² − Var[μ̂_x]    (5.51)

The variance estimator (Equation 5.50) is thus a biased estimator. In most practical cases, however, the noise process is such that Var[μ̂_x] vanishes with increasing T; for these cases, the bias term approaches zero for large observation times.
FIGURE 3. Finite time averaging of membrane voltage. (A) M = 2, SNR_o = 3.58; (B) M = 16, SNR_o = 28.15; (C) M = 64, SNR_o = 52.72.
The variance of the estimator (Equation 5.50) is a function of the fourth moment of the
process {x(t)} and thus its calculation is of a limited practical value.
Given the finite sample sequence {x_t}; t = k − M + 1, k − M + 2, ..., k − 1, k, the discrete variance estimator (Equation 5.53) is formed by replacing the time integral of Equation 5.50 with a sum over the M samples. Its expectation is

E{S_x} = σ_x² − [2/(M(M − 1))] Σ_{τ=1}^{M−1} (M − τ) c_τ    (5.54)

where c_τ is the autocovariance of the samples.
Equation 5.54 is the discrete equivalent of Equation 5.51. The estimator is a biased one.
FIGURE 3B.

FIGURE 3C.
For most practical noise processes, however, the bias term in Equation 5.54 approaches zero for long observation times. As in the continuous case, the variance of the estimator (Equation 5.53) depends on the fourth moment of the process. Its calculation is thus of very limited practical value.
Note that for an uncorrelated noise process, both the continuous estimator (Equation 5.50) and the discrete one (Equation 5.53) are unbiased.
C. Correlation Estimation1
The autocorrelation function r_x(τ) of the ergodic process {x(t)} and the function r_xy(τ) (the cross-correlation function between the two ergodic processes {x(t)} and {y(t)}) are of great importance in signal processing. It is often required to estimate the correlation functions given only finite sample functions of the processes.
When the sample functions x(t), y(t) are given in the finite time t ∈ (0, T + τ), the estimator for the cross-correlation can be defined by

r̂_xy(τ) = (1/T) ∫_0^T x(t) y(t + τ) dt    (5.55)
Hence, the estimator (Equation 5.55) is an unbiased estimator of the correlation function r_xy(τ). The mean square error is given by the variance of the estimator. The variance depends
on the fourth moment of the processes.
It can be shown that for jointly gaussian processes with zero means the variance becomes

Var[r̂_xy(τ)] ≈ (1/T) ∫_{−∞}^{∞} [ r_x(ξ) r_y(ξ) + r_xy(ξ + τ) r_yx(ξ − τ) ] dξ    (5.57)
If r_x(ξ), r_y(ξ), and r_xy(ξ) are absolutely integrable over (−∞, ∞), the variance of the estimator approaches zero as the observation time increases. Thus, for these cases, the estimator is a
consistent one.
The estimation of the autocorrelation function is performed in a similar manner with results
given by Equations 5.56 and 5.57 with {y} replaced by {x}. The correlation function can be
efficiently estimated indirectly: the power spectral density function is first estimated (Chapter 8) and then subjected to an inverse FFT to yield the correlation estimate.
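The indirect route can be sketched as follows (Python/NumPy assumed; a crude periodogram is used here in place of the spectral estimators of Chapter 8):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1024)

X = np.fft.rfft(x - x.mean())
psd = (np.abs(X) ** 2) / len(x)        # crude periodogram estimate of the PSD
r_hat = np.fft.irfft(psd, n=len(x))    # inverse FFT gives a (biased, circular) autocorrelation estimate
print(r_hat[:5])                        # r_hat[0] approximates the variance
```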
IV. SYNCHRONOUS AVERAGING (CAT-COMPUTED AVERAGED TRANSIENTS)
A. Introduction
In many biomedical applications, the problem of detecting a repetitive random signal
heavily corrupted with noise arises. It is often possible to trigger the signal by controlling
the cause. This may be the case when analyzing evoked potentials (EP) in the EEG. The
stimulus can be repeated at known times, and the response can be analyzed by methods
discussed in this chapter.
The ith response is given by

s_i(t − t_i) = s_i(τ),  0 ≤ τ ≤ T_i;   s_i(t − t_i) = 0,  otherwise    (5.58)
where t_i is the time of initiation of the ith stimulus and T_i is the length of the ith response. The observed signal to be analyzed, z(t), can be expressed as a series of the responses s_i(t) corrupted with additive noise:

z(t) = Σ_i s_i(t − t_i) + n(t)    (5.59)
where n(t) is a zero mean process, statistically independent of s_i(t). Let the observation z_i(t) be defined as

z_i(t − t_i) = s_i(t − t_i) + n_i(t − t_i);   0 ≤ t − t_i ≤ T    (5.60)

where n_i(t) = n(t); t ∈ [t_i, t_i + T], and T is larger than or equal to the length of the longest response. The average response is defined as

s̄(N,t) = (1/N) Σ_{i=1}^{N} s_i(t − t_i)    (5.62)

and is estimated by the synchronous average of the observations:

ŝ(N,t) = (1/N) Σ_{i=1}^{N} z_i(t − t_i)    (5.63)
Assuming a stationary zero mean noise process, with variance σ_n², the expectation of the estimator is

E{ŝ(N,t)} = (1/N) Σ_{i=1}^{N} E{s_i(t − t_i)}    (5.64)

and the variance of the estimator is

Var[ŝ(N,t)] = (1/N²) E{ Σ_i z_i²(t − t_i) + Σ_i Σ_{j≠i} z_i(t − t_i) z_j(t − t_j) } − (E{ŝ(N,t)})²    (5.65)
In general, the responses s_i(t) are dependent on one another. This may be (in the case of
EP analysis) due to phenomena like learning or fatigue.
Define the nonstationary spatial cross-correlation function r_s(k,t) by

r_s(k,t) = E{s_i(t − t_i) s_j(t − t_j)};   k = j − i    (5.66)
Noting the independence of the noise and the responses, and substituting Equations 5.60 and 5.66 into Equation 5.65, the variance can be expressed in terms of r_s(k,t) and σ_n² (Equation 5.72).
Equation 5.72 gives the variance of the estimator in the general case, with the assumption
of statistical independence between noise and response.
Consider now two important cases.
First assume that the responses are statistically independent; then

r_s(k,t) = E{s²(t − t_i)},   k = 0
r_s(k,t) = m²(t),   otherwise    (5.73)

where m(t) is the mean of the response.
Using Equation 5.73 in Equation 5.72, the variance becomes

Var[ŝ(N,t)] = (1/N) [ σ_s² + σ_n² ]    (5.74)

which states that the variance of the estimator can be made as small as required by increasing the number of responses participating in the averaging window. In most practical cases, the variance cannot be made zero due to the limit on N.
The signal-to-noise ratio of the observation z_i(t) is defined by the ratio of the mean response (the "signal") to the variance of the observation:

SNR_i = m(t) / Var[z_i(t)]    (5.75)

while at the output of the estimator

SNR_o = m(t) / Var[ŝ(N,t)]    (5.76)

Using Equations 5.74, 5.75, and 5.76, the improvement in signal-to-noise ratios can be written as

SNR_o / SNR_i = N    (5.77)

Note that the improvement is independent of the signal-to-noise ratio of the observation z(t).
In the second case, the responses are assumed to be statistically dependent from trial to trial; the variance of the estimator then becomes

Var[ŝ(N,t)] = σ_s² + (1/N) σ_n²    (5.79)
Note that in this case the estimator is not a consistent estimator. Its variance can only
approach the variance of the signal: as N approaches infinity, it will not approach zero.
The improvement in signal-to-noise ratios in this case can be written as

SNR_o / SNR_i = N (ρ + 1) / (Nρ + 1)    (5.80)

where ρ = σ_s²/σ_n² is the ratio of the variances of signal and noise in z_i(t). Note that in this case the improvement does depend on the signal-to-noise ratio of the observation. The improvement is always greater than 1. For very noisy observations (ρ approaching zero), the improvement approaches N.
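A minimal synchronous averaging sketch (Python/NumPy assumed; the response template and noise level are invented for illustration) shows the variance of the averaged response falling roughly as 1/N for independent, zero mean noise:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 200)
s = np.exp(-((t - 0.3) / 0.05) ** 2)        # illustrative evoked-response template

N = 200
trials = s + rng.normal(scale=1.0, size=(N, t.size))   # N noisy observations z_i(t)
s_hat = trials.mean(axis=0)                             # synchronous (coherent) average

# residual noise variance is about 1/N of the raw noise variance
print("residual noise variance:", np.var(s_hat - s))
```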
In many cases (EP analysis is a classical example), the response does not begin exactly at the stimulus time but after an unknown, trial-varying latency λ_i, so that the observation is

z_i(t − t_i) = s_i(t − t_i − λ_i) + n_i(t − t_i)    (5.81)

where λ_i is the latency of the ith response. Synchronizing the averaging process in Equation 5.63 to the known stimulus times t_i means that the averaged responses will not be properly aligned with one another. The estimator will thus yield a "smeared" template of the response. To overcome this problem, one must estimate the latency and synchronize the observations z_i to the times (t_i + λ_i) rather than t_i. One way to overcome this problem is discussed in the following section.
In some applications, the knowledge of the average response in Equation 5.62 is not
sufficient. In these cases, one is required to analyze the single EP. Techniques such as
sophisticated adaptive filtering (Chapter 9) and waveform detection (Chapter 1, Volume II)
have to be applied.
Example 5.4
The estimator 5.63 is a random variable with a mean that equals the desired quantity s̄(N,t). The probability distribution of the estimator is unknown; however, with the use of well-known bounds, confidence limits can be set for the design of the synchronized averaging.
Consider the Chebyshev inequality:

P{ |ŝ(N,t) − m(t)| ≥ kσ_ŝ } ≤ 1/k²    (5.82)

which states that the probability that the estimate falls outside the range of ±kσ_ŝ from the mean m(t) is less than or equal to 1/k². Hence, the probability of an estimate outside the range ±3σ_ŝ is less than or equal to 0.11. With a probability (confidence) of 0.889 (about 0.9, or 90%), the error in the estimate will be within the range of ±3σ_ŝ. The experimental requirement can be phrased as follows: determine the number of trials (N) required such that, with a certainty of 90%, the error in the estimate will be no larger than the required bound. For statistically independent responses, we have (Equation 5.74)
SNR_o / SNR_i = N    (5.84)
For the second case, where the responses are statistically dependent, Equation 5.79 is used to get
FIGURE 4. Synchronous averaging. (A) The signal (SNR = ∞); (B) raw data, signal with additive noise, M = 1; (C) averaging with M = 200; (D) averaging with M = 800; (E) signal-to-noise ratio vs. M. (See pages 62 and 63.)
SNR_o / SNR_i = N (ρ + 1) / (Nρ + 1)    (5.85)
The simulated responses were defined over 0 ≤ (t − t_i) ≤ T (Equation 5.86). The observation signal z(t) was generated by adding uniformly distributed pseudorandom noise. Figure 4 shows the signal s(t) (SNR = ∞) and the estimator output for various N. The signals were sampled and the estimator (Equation 5.63) was implemented on the computer.
We shall use ŝ_0(N,t) as the first estimate of the average response. To improve on this
estimate, each of the N responses will be cross-correlated with the estimate. Consider the correlation of the ith response:
R_i(λ) = (1/T) ∫_0^T ŝ_0(N,t) z_i(t − λ) dt    (5.88)

We shall look for the time λ_i for which the cross-correlation is maximum:

R_i(λ_i) = max_λ R_i(λ)    (5.89)

The time λ_i is the time shift required to best align the observation z_i with the estimate of
the average response. Therefore, we shall shift the observation z_i(t) by that amount. The next estimate of the average is given by:

ŝ_1(N,t) = (1/N) Σ_{i=1}^{N} [ s_i(t − t_i − λ_i) + n(t − t_i − λ_i) ]    (5.90)
We shall now repeat the correlation process of Equation 5.88, replacing ŝ_0(N,t) by the new estimate ŝ_1(N,t). The correlation of all N responses with the new estimate will provide new estimates λ_i for realignment. This procedure is repeated until some halting criterion is met. One such criterion can be the difference in the areas under two successive correlations, stopping when this difference falls below a preset threshold (Equation 5.91).
It has been shown that for a correlation function and noise which are well behaved, the
above procedure results in an asymptotically stable estimate.14 In other cases, the procedure
must be stopped after some arbitrary number of iterations; visual examination of the data
may be recommended.
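The iterative realignment described above can be sketched as follows (Python/NumPy assumed; sampled records are used, and the number of iterations is fixed rather than tested against a halting criterion):

```python
import numpy as np

def realign_average(z, n_iter=5):
    """z: (N, L) array of sampled observations; returns a latency-corrected average."""
    s_hat = z.mean(axis=0)                        # initial synchronous average
    for _ in range(n_iter):
        aligned = np.empty_like(z)
        for i, zi in enumerate(z):
            # cross-correlate the ith response with the current average estimate
            c = np.correlate(zi - zi.mean(), s_hat - s_hat.mean(), mode="full")
            lag = c.argmax() - (len(s_hat) - 1)   # lambda_i: best-alignment shift (samples)
            aligned[i] = np.roll(zi, -lag)        # shift the observation into alignment
        s_hat = aligned.mean(axis=0)              # new estimate of the average response
    return s_hat
```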
REFERENCES
1. Bendat, J. S. and Piersol, A. G., Random Data: Analysis and Measurement Procedures, Wiley-Interscience, New York, 1971, chap. 6.
2. Ernst, R. R., Sensitivity enhancement in magnetic resonance. I. Analysis of the method of time averaging, Rev. Sci. Instrum., 32(12), 1689, 1965.
3. Bendat, J. S., Interpretation and application of statistical analysis for random physical phenomena, IRE Trans. Bio-Med. Electron., 9, 31, 1962.
4. Davenport, W. B., Johnson, R. A., and Middleton, D., Statistical errors in measurements on random time functions, J. Appl. Phys., 23(4), 377, 1952.
5. Ruchkin, D. S., An analysis of average response computations based upon aperiodic stimuli, IEEE Trans. Biomed. Eng., 12, 87, 1965.
6. Bendat, J. S., Mathematical analysis of average response values for nonstationary data, IEEE Trans. Biomed. Eng., 11, 72, 1964.
7. Kovacs, Z. L., On the enhancement of the SNR of repetitive signals by digital averaging, IEEE Trans. Instrum. Meas., 28(2), 152, 1979.
8. Shiavi, R. and Green, N., Ensemble averaging of locomotor EMG patterns using interpolations, Med. Biol. Eng. Comput., 21, 573, 1983.
9. Volkers, A. C. W., Van der Schee, E. J., and Grashuis, J. L., Electrogastrography in the dog: waveform analysis by a coherent averaging technique, Med. Biol. Eng. Comput., 21, 56, 1983.
10. Shvartsman, V., Barnes, G. R., Shvartsman, L., and Flowers, N. C., Multichannel signal processing based on logic averaging, IEEE Trans. Biomed. Eng., 29, 531, 1982.
11. Gethner, J. S., Woodin, R. L., Rabinowitz, P., and Kaldor, A., Multiparameter matrix signal averaging, Rev. Sci. Instrum., 53(9), 1398, 1982.
12. Thomas, C. W., Rzeszotarski, M. S., and Isenstein, B. S., Signal averaging by parallel digital filters, IEEE Trans. Acoust. Speech Signal Process., 30, 338, 1982.
13. Woody, C. D., Characterization of an adaptive filter for the analysis of variable latency neuroelectric signals, Med. Biol. Eng. Comput., 5, 539, 1967.
14. Senmoto, S. and Childers, D. G., Adaptive decomposition of a composite signal of identical unknown wavelets in noise, IEEE Trans. Syst. Man Cybern., 2, 59, 1972.
Chapter 6
I. INTRODUCTION
The Fourier transform (FT) of the signal x(t) is defined by

X(w) = F{x(t)} = ∫_{−∞}^{∞} x(t) exp(−jwt) dt = |X(w)| exp(jθ(w))    (6.1)

where |X(w)| is the amplitude and θ(w) is the phase of the frequency domain representation.
Not all real functions can be transformed into the frequency domain by Equation 6.1. Sufficient conditions for the existence of X(w) are given by the Dirichlet conditions:
1. x(t) is absolutely integrable, ∫_{−∞}^{∞} |x(t)| dt < ∞.
2. x(t) has a finite number of discontinuities and a finite number of extrema in any finite interval.
There are some very useful functions, such as the impulse (delta) function, step function, or even sine and cosine functions, which do not obey the Dirichlet conditions. These functions do not have a Fourier transform; they do, however, have a transform in the limit. We shall talk about the FT of these functions where, in fact, we shall mean the limit that does exist.
The FT possesses many interesting properties that help in the calculation of direct and inverse transformations. The reader is referred to references and textbooks dealing in detail with the subject.1,2 We shall deal here only with some important properties that bear a direct relation to the material presented in later sections.
x(t) = x_1(t) * x_2(t) = ∫_{−∞}^{∞} x_1(λ) x_2(t − λ) dλ    (6.5A)

The importance of Equation 6.5 in systems analysis stems from the fact that the output of a linear system is given by the convolution of the input with the impulse response of the system. It can easily be shown that the FT of x(t) is

X(w) = X_1(w) X_2(w)

where X_i(w) = F{x_i(t)}. Hence the convolution operation in the time domain, which is a relatively complex operation, becomes a simple multiplication operation in the frequency domain.
Consider now the convolution of two functions, X_1(w) and X_2(w), in the frequency domain:

X_1(w) * X_2(w) = 2Π F{x_1(t) x_2(t)}

Hence, convolution of two functions in the frequency domain is given by 2Π times the FT of the product of the two functions in the time domain.
2. Parseval's Theorem
Consider the energy, E, of the time function x(t):

E = ∫_{−∞}^{∞} x²(t) dt    (6.9)

Expressing x(t) by the inverse FT and changing the order of integration yields

E = (1/2Π) ∫_{−∞}^{∞} |X(w)|² dw    (6.10)

Equation 6.10 is known as Parseval's theorem, which states that the total energy of the signal can be calculated by the integral in the time domain or in the frequency domain. Note that if the energy in a band of frequencies w_1 ≤ w ≤ w_2 is required, we have to integrate over this band and over the band of negative frequencies −w_2 ≤ w ≤ −w_1. Since |X(w)|² is an even function in w, we get:

E(w_1, w_2) = (1/Π) ∫_{w_1}^{w_2} |X(w)|² dw    (6.11)
A periodic signal x(t) can be expanded into the (complex) Fourier series

x(t) = Σ_{n=−∞}^{∞} a_n exp(jnw_0 t)    (6.15)

where T = 2Π/w_0 is the period of the signal, and a_n are the coefficients of the expansion. Applying the FT to Equation 6.15 yields:

X(w) = 2Π Σ_{n=−∞}^{∞} a_n δ(w − nw_0)    (6.16)

Hence, the FT of a periodic signal is a train of impulse functions located at the harmonics of the fundamental frequency, w_0, each multiplied by 2Π times the corresponding coefficient of the Fourier series.
The sequence {X(k w_s/N)} is, in general, a complex sequence. The transformation that maps the complex sequence {X(k w_s/N)} back into the sequence {x(nT_s)} is called the inverse DFT. Note that T_s and w_s/N are constants; hence we denote x_n = x(nT_s) and X_k = X(k w_s/N).
Let us now consider the sampled sequence {x(nT_s)} as a time function generated by multiplying the signal x(t) with the ideal sampler operator, δ_T(t) (see Equation 4.2). We denote the sampled signal by x*(t) and get:

x*(t) = Σ_{n=0}^{N−1} x(t) δ(t − nT_s)    (6.20)
The FT of the signal x*(t), namely X*(w), is easily calculated, noting that the FT of δ(t − nT_s) is exp(−jwnT_s):

X*(w) = Σ_{n=0}^{N−1} x(nT_s) exp(−jwnT_s)    (6.22)

Equation 6.22 describes the FT of the sampled signal. This FT, X*(w), is a continuous function of w. Let us sample, in the frequency domain, this continuous function at the frequencies w = k w_s/N, k = ..., −1, 0, 1, ...:

X*(k w_s/N) = Σ_{n=0}^{N−1} x(nT_s) exp(−j2Πkn/N),   k = ..., −2, −1, 0, 1, 2, ...    (6.23)
FIGURE 1. The relations between the Fourier transform (FT) of x(t), the FT of x*(t), and the DFT. (A) The FT of x(t); (B) the FT of x*(t); (C) the DFT.
Note that the set of N members k = 0, 1, ..., N − 1 of the infinite sequence (Equation 6.23) equals the DFT of Equation 6.17.
We recall also from the discussion in Chapter 4 (Equation 4.4 and Figure 4.2) that the FT, X*(w), of the sampled signal is the repetition of the FT of the continuous signal, X(w), centered at ℓw_s. When we sample the FT, the samples −N/2, ..., −1, 0, 1, ..., (N/2 − 1) are samples of the FT centered at w = 0. The rest of the samples of the sequence convey no new information, since they represent the same samples shifted to (w + ℓw_s), ℓ = ±1, ±2, .... This can also easily be seen from Equation 6.23. The functions exp(−j2Πkn/N) are periodic functions with period N. Hence X*(k w_s/N) = X*((k + ℓN) w_s/N) for any integer ℓ. Since the FT of real signals is symmetric, we can represent the samples of the FT by the sequence X*(k w_s/N), k = 0, 1, ..., N − 1.
The samples k = N/2, ..., N − 1 are samples of the negative frequencies of the FT centered at w_s. Since, in our case, the FT is symmetric, these samples contribute no additional information. From the sequence of N samples of the DFT, we require only the first N/2 + 1 (or the last N/2 + 1); the rest are redundant. In the example given in Figure 1, w_s = 2w_max, and hence for k = N/2 we get the sample at w = w_max.
The convolution of two signals was defined in Equation 6.5. A similar operation can be defined for two sequences. Let us define the cyclic convolution of the two sequences {x_1(n)} and {x_2(n)} by:

x(n) = x_1(n) ⊛ x_2(n) = Σ_{m=0}^{N−1} x_1(m) x_2(n − m)    (6.25)

where the indices are taken modulo N. The energy of a sequence is given by

E = Σ_{n=0}^{N−1} x²(n)    (6.27)
The DFT is an important tool for discrete signal processing for the same reasons the FT was important for continuous signal processing. The direct computation of the DFT requires approximately N² complex multiplication and addition operations. In 1965, Cooley and Tukey, in their famous paper, presented an efficient method for calculating the DFT. Their method, known as the fast Fourier transform (FFT), requires only N log_2 N operations (where N is a power of 2). For N = 1024, the number of operations required by the FFT is roughly one hundred times smaller than the number required for direct computation.
Many different FFT algorithms have been derived for software and hardware implementations. Two commonly used algorithms are known as the decimation in time and decimation in frequency algorithms. The interested reader is referred to the vast literature on this subject.3-5
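The equivalence of the direct DFT and the FFT, and the purely computational nature of the FFT's advantage, can be verified with a short check (Python/NumPy assumed):

```python
import numpy as np

N = 1024
x = np.random.default_rng(4).standard_normal(N)

n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # N x N DFT matrix: ~N^2 operations
X_direct = W @ x
X_fft = np.fft.fft(x)                          # ~N log2 N operations

print(np.allclose(X_direct, X_fft))            # True: same transform, far fewer operations
```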
The power spectral density (PSD) function of a stationary random signal x(t) is defined by

S_x(w) = lim_{T→∞} (1/T) E{ |X_T(w)|² }    (6.29)

where X_T(w) is the FT of the signal observed over the finite interval T. In Equation 6.29, we use the expected value of the random variable |X_T(w)|², which is the energy. By introducing the FT we get:
S_x(w) = ∫_{−∞}^{∞} r_x(τ) exp(−jwτ) dτ    (6.31A)

and

r_x(τ) = (1/2Π) ∫_{−∞}^{∞} S_x(w) exp(jwτ) dw    (6.31B)
Equations 6.31A and B state that the PSD and the autocorrelation are a Fourier pair; they are known as the Wiener-Khinchin relations. Note that since the autocorrelation
function is an even function, the PSD is real.
Consider a stationary random signal x(t) which has an autocorrelation function:

r_x(τ) = a δ(τ)    (6.33A)

Namely, the values of the signal x(t) at the time t and at the time t + τ (for all τ not equal to zero) are uncorrelated. The PSD function of such a process is

S(w) = a    (6.33B)
The power is equally distributed along the frequency axis, hence the process is called white
noise. A random process with power unequally distributed is called colored noise.
One principal application of the PSD function is related to the analysis of linear systems. Consider a linear system with an impulse response, h(t), driven at its input by a random sample function, u(t). The output of the system, x(t), is given by the convolution:

x(t) = h(t) * u(t) = ∫_{−∞}^{∞} h(λ) u(t − λ) dλ
Consider an input signal that is stationary in the wide sense. We then calculate the autocorrelation of x(t) and take its FT; the result is:7

S_x(w) = |H(w)|² S_u(w)

Hence, when the input, u(t), is white, S_u(w) is constant. The output signal x(t) is then nonwhite noise, colored by the frequency response of the system.
The cross-correlation function is not necessarily even; hence the cross-spectral density is, in general, a complex function:

S_xy(w) = ∫_{−∞}^{∞} r_xy(τ) exp(−jwτ) dτ

For the linear system above, the cross spectrum between the input and the output is S_ux(w) = H(w) S_u(w). Hence, the frequency response of the system can be calculated from the cross spectrum and the input spectrum.
A convenient real-valued, bounded quantity is defined, named the coherence function:

γ²_xy(w) = |S_xy(w)|² / [ S_x(w) S_y(w) ],    0 ≤ γ²_xy(w) ≤ 1    (6.40)
When γ²_xy(w) = 1 for all frequencies, x(t) and y(t) are said to be fully coherent; when for some w = w_0, γ²_xy(w_0) = 0, x(t) and y(t) are said to be incoherent at w_0. When x(t) and y(t) are statistically independent, then γ²_xy(w) = 0 for all w.
The coherence function is useful in the investigation of signals which are only slightly correlated.
Hence, it is the low coherence values that are of interest. In practice, the exact values of
the various spectra are not known and must be estimated (see Chapter 8); hence, the coherence function is always given in terms of its estimates. Common and novel10,11 estimation methods
exist for the coherence function. Estimation may cause large inaccuracies in the coherence
function, and its application must be carefully considered.9
The coherence function has been applied to EEG analysis9,12,13 for the investigation of
brain asymmetry, localizing epileptic focus, the study of relations between cortical and
thalamic activity, and more.
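In practice the coherence is always computed from spectral estimates; a minimal sketch (Python/SciPy assumed; the two channels are synthetic, and the sampling rate is an arbitrary choice) is:

```python
import numpy as np
from scipy.signal import coherence

fs = 250.0                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(5)
common = rng.standard_normal(10_000)         # shared component between the two channels
x = common + 0.5 * rng.standard_normal(10_000)
y = common + 0.5 * rng.standard_normal(10_000)

# Welch-type estimate of the magnitude-squared coherence, bounded between 0 and 1
f, Cxy = coherence(x, y, fs=fs, nperseg=256)
```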
A. Introduction
In the design of signal acquisition and processing systems we must often alter a given signal such that some parts of it are enhanced or attenuated, its phase is changed, or parts of it are delayed, smoothed, or "predicted". The signal may be deterministic, random, continuous, or discrete. Many of the desired alterations can be achieved by linear transformation.
We then design a linear system, or filter, that operates on the signal with the required
transformation.
The basic filter is the time invariant filter, or fixed parameters filter. It is usually designed
to meet the required specifications, given some a priori information concerning the signals
and noise involved. Filters can be designed to meet the required specifications while optimizing some performance criterion; these are called optimal filters. One example, the Wiener filter, is discussed in this chapter. Filters in which the values of the parameters are functions of time are called time varying filters. An important class of time varying filters is the class of adaptive filters, which is discussed in Chapter 9.
Consider the signal u(t) which is to be processed. It is desirable to apply a linear transformation such that its outcome will be x(t). We can consider the linear system depicted in Figure 2 to be the filter driven by the input signal u(t), with the output signal x(t).
The relations between the two signals are generally given in terms of the differential equation:

a_n d^n x(t)/dt^n + ... + a_1 dx(t)/dt + a_0 x(t) = b_m d^m u(t)/dt^m + ... + b_1 du(t)/dt + b_0 u(t)    (6.41)
A general solution to Equation 6.41 is given in terms of the impulse response, h(t). Since the system is linear, its output is composed of the linear combination of the responses to the impulse function.14 It can be shown that the output, x(t), is given by the convolution of the impulse response with the input:

x(t) = h(t) * u(t)    (6.42A)

or, in the frequency domain,

X(w) = H(w) U(w)    (6.42B)
We see that U(w) can be shaped into a desired X(w) by designing the right filter H(w). The
advantages of the design in the frequency domain become obvious from Equation 6.42B.
Only the basics of digital and optimal filter design will be discussed here. For a detailed discussion of the material, the reader is referred to the literature on these topics.15 The topic
FIGURE 2. A linear filter: input u(t), U(w); impulse response h(t), H(w); output x(t), X(w).
of cepstral analysis and homomorphic filtering, with applications to biomedical signal processing, is discussed in detail at the end of this section.
B. Digital Filters
The availability of low cost and efficient digital computers and dedicated processing circuits has made the implementation of filtering by digital means very attractive. Even when working in analog environments, where both input and output signals are continuous, it is very often worthwhile to apply analog-to-digital conversion, perform the required filtering digitally, and convert the discrete filtered output back into a continuous signal.
Digital filters are linear discrete systems governed by difference equations (see Chapter 4). Two classes of digital filters are used: finite impulse response (FIR) and infinite impulse response (IIR).
FIR filters are characterized by a finite duration impulse response which, in the Z domain, means:

H(Z) = X(Z)/U(Z) = b_0 + b_1 Z^{−1} + ... + b_m Z^{−m}    (6.43)

where X(Z) and U(Z) are the Z transforms of the output and input sequences. Equation 6.43 states that the FIR filter is a moving average (MA) filter (see Chapter 7), or an all-zero filter. FIR filters are always stable.
IIR filters have, in general, an infinite duration impulse response; they possess zeroes and poles (ARMA filters, see Chapter 7), and their transfer function in the Z domain is

H(Z) = (b_0 + b_1 Z^{−1} + ... + b_m Z^{−m}) / (1 + a_1 Z^{−1} + ... + a_n Z^{−n})    (6.44)

IIR filters are stable if all the poles of H(Z) are within the unit circle in the Z domain.
IIR and FIR filters can be synthesized recursively via the difference equations, or by
means of the FFT. Since continuous filter design is well established, one of the approaches
for designing digital filters is to find a difference equation, with the associated H(Z), that
yields an output sequence close to the samples of the analog output signal. This approach
is termed the impulse invariant method. Another approach is to transform the analog filter,
by means of the bilinear transformation, into the Z domain yielding a digital filter, H(Z).
The resultant filter will not possess the same impulse response since the transformation
introduces frequency scale distortions. This method is known as the bilinear transformation
method. A third approach for digital filter design is the frequency sampling method. This
method is based on the approximation of a function by a sum of sine functions. Detailed
discussion of the design steps can be found for example in Gold and Rader 4 and Chen.K
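As an illustration of the bilinear transformation route (Python/SciPy assumed; the order, cutoff frequency, and sampling rate are arbitrary choices, not values from the text), an analog Butterworth prototype is mapped to a digital IIR filter H(Z) and applied to a sequence:

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 500.0                                        # assumed sampling rate (Hz)
b, a = butter(N=4, Wn=40.0, btype="low", fs=fs)   # digital IIR low-pass (bilinear mapping internally)

rng = np.random.default_rng(6)
u = rng.standard_normal(2048)                     # input sequence u(nT)
x = lfilter(b, a, u)                              # filtered output x(nT)
```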
C. The Wiener Filter
Consider now the problem of optimal filter design. Assume, for example, that a signal s(t) is corrupted with additive noise n(t); it is required to estimate, by linear operations, the value of the signal s(t + η), η ≥ 0, from the observation signal x(t):

x(t) = s(t) + n(t)    (6.45)
We assume that both s(t) and n(t) are stationary in the wide sense. Note that for η = 0 the problem is that of smoothing, namely, extracting the current value of s(t) from current and past values of the observation signal. For η > 0 the problem is that of prediction, namely, extracting the future value of s(t + η) from current and past values of the observation signal.
Assume we have a linear filter, h(t). We apply the signal x(t) to its input. Let us denote the output of the filter by ŝ(t + η). We shall look for the optimal filter in the sense of minimization of the mean square error between the output of the filter and the actual desired quantity:

ε² = E{ [s(t + η) − ŝ(t + η)]² }    (6.46)

It is required to minimize ε² over all possible, realizable h(t). Performing the minimization6 yields the condition:
r_sx(τ + η) = ∫_0^∞ h(σ) r_x(τ − σ) dσ,   τ ≥ 0    (6.47)

where r_sx and r_x are the cross-correlation of the observation signal with the desired signal s(t) and the autocorrelation of x(t), respectively. Condition 6.47 is known as the Wiener-Hopf condition (see also Equation 9.10).
When the optimal filter, given by Condition 6.47, is used, the minimum squared error is6

ε²_min = r_s(0) − ∫_0^∞ h(σ) r_sx(σ + η) dσ    (6.48)

and

E{e(t) x(t − τ)} = 0,   τ ≥ 0    (6.49)

The last result states that under optimal conditions, the error and the observations are uncorrelated; since E{e(t)} = 0, the two are also orthogonal. If we remove the realizability constraint, the solution of Equation 6.46 will be similar to Equation 6.47, but with a lower integration boundary including all negative values; namely, the integration boundaries will be minus to plus infinity. Note that the right-hand side of Equation 6.47 is the convolution of r_x(τ) with h(τ). Taking the FT of the equation yields:
S_sx(w) exp(jwη) = H(w) S_x(w)    (6.50)

The exponent on the left side of the last equation is due to the time shift present in r_sx of Equation 6.47. The required optimal filter is thus given in the frequency domain by:

H(w) = S_sx(w) exp(jwη) / S_x(w)    (6.51)
In general, the filter of Equation 6.51 is not realizable: a system with poles in the right half plane must have a nonzero impulse response, h(t), at t < 0, which is not causal. The optimal realizable Wiener filter can be calculated6 from Equation 6.47. Its error will be larger than or equal to that of the optimal (unconstrained) filter. Similar arguments can be applied to digital filters. Optimal Wiener FIR and IIR filters can be designed.
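The unconstrained (noncausal) Wiener filter of Equation 6.51 can be sketched directly in the frequency domain (Python/NumPy assumed; the signal and noise spectra are treated as known, which is an idealization, and the test signal is invented):

```python
import numpy as np

rng = np.random.default_rng(7)
n_pts = 4096
t = np.arange(n_pts) / 1000.0
s = np.sin(2 * np.pi * 5 * t)                    # illustrative narrow-band signal s(t)
x = s + rng.normal(scale=1.0, size=n_pts)        # observation x(t) = s(t) + n(t)

S_s = np.abs(np.fft.rfft(s)) ** 2                # "known" signal spectrum (assumed available)
S_n = np.full_like(S_s, float(n_pts))            # white-noise spectrum (unit variance, periodogram scale)
H = S_s / (S_s + S_n)                            # Wiener smoothing filter, eta = 0

s_hat = np.fft.irfft(H * np.fft.rfft(x), n=n_pts)   # smoothed estimate of s(t)
```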
VI. CEPSTRAL ANALYSIS AND HOMOMORPHIC FILTERING
A. Introduction
The concept of the cepstrum was first introduced in the early sixties in an attempt to analyze a signal containing echoes. The power cepstrum was defined as "the power spectrum of the logarithm of the power spectrum". Later the definition was changed to make its connection with the correlation function clearer and to provide it with units of time. The new definition became "the inverse transform of the log of the power spectrum". The term "cepstrum" was derived from "spectrum" by reversing the order of the first four letters. The domain of the cepstrum was termed quefrency, a term derived from frequency. Additional terms have been defined, such as "lifter" (derived from "filter"), but these were not accepted well in the literature.
Cepstral analysis is applied mainly in cases where the signal contains echoes of some
fundamental wavelet. By means of the power cepstrum, the times of the wavelet and the
echoes can be determined. The complex cepstrum is used to determine the shape of the
wavelet. These techniques15-21 have been discussed in the literature with various applications. They have been applied to the analysis of EEG signals,17,21 to ECG signals,20 and to the speech signal.19
B. The Cepstra
The complex cepstrum, x̂(t), of the real signal x(t) is given by:

x̂(t) = F^{−1}{ log[X(w)] } = (1/2Π) ∫_{−∞}^{∞} log[X(w)] exp(jwt) dw    (6.52)
Since the argument of the logarithm in Equation 6.52 is complex and may be negative, we shall introduce the complex logarithm of a complex function V:

log V = log|V| + j arg(V)    (6.53)

We shall also need to perform the inverse operation, namely exponentiation; therefore, let us define the complex exponentiation of V:

exp(V) = exp(Re{V}) [ cos(Im{V}) + j sin(Im{V}) ]    (6.54)
In the discrete case, when the data are presented in terms of the sequence {x(nT)}, the cepstra are defined16 by means of the Z transform.
The power cepstrum of the sequence {x(nT)} is the square of the inverse Z transform of the logarithm of the magnitude squared of X(Z). Thus, we write the power cepstrum x_p(nT):

x_p(nT) = [ Z^{−1}{ log|X(Z)|² } ]²    (6.55)
The final squaring in Equation 6.55 adds no information, and the power cepstrum does not retain phase information. The complex cepstrum of the sequence {x(nT)} is the inverse Z transform of the complex logarithm of X(Z):

x̂(nT) = Z^{−1}{ log[X(Z)] }    (6.57)
If the sequence x(nT) is the convolution of two sequences u(nT) and h(nT), namely, x(nT) = u(nT) * h(nT), then:

X(Z) = U(Z) H(Z)    (6.58A)

log[X(Z)] = log[U(Z)] + log[H(Z)]    (6.58B)

and since the inverse transform is a linear operation, the complex cepstrum is

x̂(nT) = û(nT) + ĥ(nT)    (6.58C)
Hence, the complex cepstrum of the convolution of two sequences equals the sum of their cepstra. The complex cepstrum is thus an operator converting convolution into summation. Its application to deconvolution problems becomes apparent. Assume that x(nT), u(nT), and h(nT) are the output, input, and impulse response sequences of a discrete linear system, respectively. If û(nT) and ĥ(nT) occupy different quefrency ranges, then the complex cepstrum can be liftered (filtered) to remove one. In the complex cepstrum, phase information is retained; therefore, it can be inverted to yield the deconvolved h(nT) or u(nT).
The computation of the complex cepstrum in Equation 6.57 has to be carefully considered, since the complex logarithm is not single valued. The imaginary part of the complex logarithm (Equation 6.53) is the phase. If it is presented in modulo 2Π form (principal value), then discontinuities will appear in the phase term. This will occur due to the jump from 2Π to zero when the phase is increased over 2Π. Phase unwrapping algorithms must be employed to overcome this problem. A simple solution is to compute the relative phase between adjacent samples and add these differences together in order to get a cumulative, unwrapped phase.
The complex cepstrum can be implemented22 by means of the DFT replacing the Z transform. This is true since the sequences are of finite length. The region of convergence for the Z transform includes the unit circle, allowing the Z transform and its inverse to be evaluated for Z = exp(jw); therefore:

x̂(nT) = IDFT{ log[ DFT{x(nT)} ] }    (6.59)

Equation 6.59 is of great computational importance since the DFT and IDFT can be very effectively calculated by the FFT algorithm.
The upper part of Figure 3 depicts schematically the operations involved in the complex cepstrum computation.
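Equation 6.59 translates into a few lines of code (Python/NumPy assumed); the phase is unwrapped before the inverse DFT, as discussed above:

```python
import numpy as np

def complex_cepstrum(x):
    """Complex cepstrum via the DFT: IDFT of the complex logarithm of the DFT."""
    X = np.fft.fft(x)
    log_X = np.log(np.abs(X)) + 1j * np.unwrap(np.angle(X))   # complex logarithm, unwrapped phase
    return np.fft.ifft(log_X).real

rng = np.random.default_rng(8)
x = rng.standard_normal(512)
x_hat = complex_cepstrum(x)
```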
C. Homomorphic Filtering
Let us consider again the example given by Equations 6.58A through C. Here the sequence {x(nT)} can be the samples of a speech signal, the sequence {h(nT)} the weighting sequence of the vocal tract, and {u(nT)} the samples of the pressure wave exciting the vocal tract during a voiced utterance, when the vocal cords are vibrating. The pressure {u(nT)} can be
modeled as a train of very narrow pulses appearing at a frequency known as the fundamental frequency or the pitch. We are interested both in the sequence {h(nT)}, in order to learn about the vocal tract characteristics, and in the sequence {u(nT)}, in order to estimate the pitch.
Equation 6.58C gives the complex cepstrum as the sum of the cepstra of the input and the vocal tract responses. Assume that in the quefrency range we have:

ĥ(nT) = 0 for n ≥ n_0    (6.60A)

and

û(nT) = 0 for n < n_0    (6.60B)
Therefore, these are separable in the quefrency domain. Consider two lifters, a short-pass lifter, Y_1(nT), given by:

Y_1(nT) = 1,  n < n_0;   Y_1(nT) = 0,  otherwise    (6.61A)

and a long-pass lifter

Y_2(nT) = 1 − Y_1(nT)    (6.61B)
When x̂(nT) is fed into the input of these two lifters, the output of Y_1 will be ĥ(nT) and that of Y_2 will be û(nT). We now want to transfer û(nT) and ĥ(nT) from the quefrency domain back into the time domain. We have to subject the sequences to the inverse operation. This involves first the DFT, followed by complex exponentiation (Equation 6.54) and the IDFT. The complete operation of the homomorphic filtering is depicted in Figure 3.
Homomorphic filtering has been applied20 to the automatic classification of the ECG. Normal, inverted T-wave, and two types of premature ventricular contraction (PVC) beats have been considered. It has been found that feature selection for diagnostic purposes could be more efficient using homomorphic filtering than by conventional methods. It has also been demonstrated that the basic wavelet of the normal ECG signal evaluated by the homomorphic filtering closely approximates the action potential spike in the cardiac muscle fibers.
Senmoto and Childers21 have used homomorphic filtering to decompose visual evoked response (VER) potentials. It has been suggested that the recorded VER signals can be expressed as an aggregate of overlapping signals generated by multiple disparate sources whose basic signal waveforms are unknown and have to be estimated. The assumption, therefore, is that the wavelets are identical in waveshape. We shall consider here the decomposition of two wavelets. The extension to the multiple case can be easily done.
Let x(t) be the composite signal and s(t) the wavelet; then:

x(nT) = s(nT) + a s(nT − n_0T)    (6.62)

where the shape of s(t), the delay n_0, and the echo amplitude a < 1 are unknown. x(nT) can be written in terms of the convolution:

x(nT) = s(nT) * d(nT)    (6.63)

where

d(nT) = δ(nT) + a δ(nT − n_0T)    (6.64)

Taking the Z transform and the complex logarithm:

log[X(Z)] = log[S(Z)] + log[1 + a Z^{−n_0}]    (6.65)

with a < 1.
The second term on the right side of Equation 6.65 can be expanded in a power series, yielding

log[X(Z)] = log[S(Z)] + a Z^{−n_0} − (a²/2) Z^{−2n_0} + (a³/3) Z^{−3n_0} − ...    (6.66)

so that

x̂(nT) = Z^{−1}{log[X(Z)]} = ŝ(nT) + a δ(nT − n_0T) − (a²/2) δ(nT − 2n_0T) + (a³/3) δ(nT − 3n_0T) − ...    (6.67)
Thus, the complex cepstrum of the composite signal consists of the complex cepstrum of the wavelet plus a train of δ functions located at positive quefrencies at the echo delay and its multiples. A comb notch lifter can be used to remove the train of delta functions. After smoothing, the wavelet is reconstructed by inverting the operations used for the computation of the complex cepstrum, as shown in Figure 3.
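The echo removal idea can be sketched as follows (Python/NumPy assumed; the wavelet, delay, and echo amplitude are invented, and the cepstral inversion is only approximate):

```python
import numpy as np

def complex_cepstrum(x):
    X = np.fft.fft(x)
    return np.fft.ifft(np.log(np.abs(X)) + 1j * np.unwrap(np.angle(X))).real

def inverse_cepstrum(c):
    # DFT, complex exponentiation, IDFT: the inverse of the cepstrum computation
    return np.fft.ifft(np.exp(np.fft.fft(c))).real

n, n0, a = 512, 40, 0.5                          # illustrative length, echo delay, echo amplitude
t = np.arange(n)
s = np.exp(-((t - 30) / 6.0) ** 2)               # hypothetical wavelet
x = s + a * np.roll(s, n0)                       # composite signal with one echo

c = complex_cepstrum(x)
c[n0 : n // 2 : n0] = 0.0                        # comb "lifter": notch the delta train at the echo delay
s_hat = inverse_cepstrum(c)                      # approximate reconstruction of the wavelet
```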
A similar procedure can be used for the processing of dye dilution curves (see Appendix
A, Volume II).
REFERENCES
1. Bracewell, R. N., The Fourier Transform and Its Applications, McGraw-Hill Kogakusha, Tokyo, 1978.
2. Papoulis, A., Signal Analysis, McGraw-Hill Int., Auckland, 1977.
3. Tretter, S. A., Introduction to Discrete Time Signal Processing, John Wiley & Sons, New York, 1976.
4. Gold, B. and Rader, C. M., Digital Processing of Signals, McGraw-Hill, New York, 1969.
5. Brigham, E. O., The Fast Fourier Transform, Prentice-Hall, Englewood Cliffs, N.J., 1974.
6. Lathi, B. P., An Introduction to Random Signals and Communication Theory, Int. Textbook Co., Scranton, Pa., 1968.
7. Davenport, W. B. and Root, W. L., An Introduction to the Theory of Random Signals and Noise, McGraw-Hill, New York, 1958.
8. Chen, C. T., One Dimensional Digital Signal Processing, Marcel Dekker, New York, 1979.
9. Glaser, E. M. and Ruchkin, D. S., Principles of Neurobiological Signal Analysis, Academic Press, New York, 1976.
10. Nuttall, A. H., Direct coherence estimation via a constrained least-squares linear predictive fast algorithm, Proc. of ICASSP, IEEE, Paris, 1982, 1104.
11. Youn, D. H., Ahmed, N., and Carter, G. C., Magnitude squared coherence function estimation: an adaptive approach, IEEE Trans. Acoust. Speech Signal Process., 31, 137, 1983.
12. Shaw, J. C., Brooks, S., Colter, N., and O'Connor, K. P., A comparison of schizophrenic and neurotic patients using EEG power and coherence spectra, in Hemisphere Asymmetries of Function in Psychopathology, Gruzelier, J. and Flor-Henry, P., Eds., Elsevier-North Holland, Amsterdam, 1979.
13. Beaumont, J. G., Mayes, A. R., and Rugg, M. D., Asymmetry in EEG alpha coherence and power: effect of task and sex, Electroencephalogr. Clin. Neurophysiol., 45, 393, 1978.
14. Derusso, P. M., Roy, R. J., and Close, C. M., State Variables for Engineers, John Wiley & Sons, New York, 1967.
15. Oppenheim, A. V., Generalized linear filtering, in Digital Processing of Signals, Gold, B. and Rader, C. M., Eds., McGraw-Hill, New York, 1969.
16. Childers, D. G., Skinner, D. P., and Kemerait, R. C., The cepstrum: a guide to processing, Proc. IEEE, 65, 1428, 1977.
17. Kemerait, R. C. and Childers, D. G., Signal detection and extraction by cepstrum techniques, IEEE Trans. Inf. Theory, 18, 745, 1972.
18. Oppenheim, A. V., Kopec, G. E., and Tribolet, J. M., Signal analysis by homomorphic prediction, IEEE Trans. Acoust. Speech Signal Process., 24, 327, 1976.
19. Kopec, G. E., Oppenheim, A. V., and Tribolet, J. M., Speech analysis by homomorphic prediction, IEEE Trans. Acoust. Speech Signal Process., 25, 40, 1977.
20. Murthy, I. S. N., Rangaraj, M. R., Udupa, K. J., and Goyal, A. K., Homomorphic analysis and modeling of ECG signals, IEEE Trans. Biomed. Eng., 26, 330, 1979.
21. Senmoto, S. and Childers, D. G., Adaptive decomposition of a composite signal of identical unknown wavelets in noise, IEEE Trans. Syst. Man Cybern., 2, 59, 1972.
22. Oppenheim, A. V. and Schafer, R. W., Digital Signal Processing, Prentice-Hall, Englewood Cliffs, N.J., 1975.
Chapter 7
I. INTRODUCTION
Modern signal processing techniques are applied to a variety of fields such as econometrics, speech, seismology, communications, and biomedicine. A major problem in these applications is the need to analyze and process finite time samples of random processes. In general, the processes are nonstationary2 and nonlinear. The theoretical basis for modern time series analysis has been developed by mathematicians and statisticians such as Mann and Wald.1
Recent developments in both the theory and the computational algorithms of linear stationary signal analysis provide powerful tools for signal processing. Some of the techniques are well established, with computer program packages available (Reference 11, for example). When a nonstationary signal is to be processed, it is usually segmented in such a way that each segment can be considered stationary. Stationary signal processing methods can then be applied.
A favorable approach for stationary signal processing is the parametric modeling procedure. The process is modeled by some causal rational parametric model. The signal is then
represented by means of the model parameters. Such a procedure is attractive from the point
of view of data compression. Rather than handling (for processing, storing, or transmitting)
the complete time sample, or sequence, only a reduced number of parameters is used.
Consider, for example, the problem of analyzing and storing EEG data12 in neurological
clinics. It would be of great help if these data could be reduced and compressed for storage
purposes in such a way that the signal can be regenerated at will. Another example may be
the storing of compressed ECG data (or a complete medical file) on a personal credit card
carried by the patient in such a way that it can be reproduced at will anywhere.
Signal compression is also attractive from the point of view of classification (diagnosis).
Effective algorithms for the automatic classification of signals typically representing various pathological states are available.
Since most modern signal processing is implemented by digital computers, we consider the sampled signal S*(t), sampled at the frequency f_s = 1/T (Equation 7.1). The finite time windowed sampled signal is given by the sequence

{S(kT)};   k = 0, 1, ..., N − 1    (7.2)
For the sake of brevity, we shall denote the sequence by {S_k} without loss of generality. The sequence in Equation 7.2 is to be modeled by a parametric model.
A very effective parametric model is that of the transfer function (TF). The sampled signal (the sequence) {S_k} is assumed to be the output of a linear system driven by an (inaccessible) input sequence {U_k} and corrupted by an additive noise. The sequence {S_k} is thus given as the solution to the difference equation

S_k = − Σ_{i=1}^{p} a_i S_{k−i} + Σ_{i=0}^{q} b_i U_{k−i} + ξ_k    (7.3)
where T is the sampling interval (Equation 7.1) and {ξ_k} is the additive noise sequence. It is usually convenient to work with noise sequences which are white, with zero mean. Consider, therefore, the sequence {ξ_k} to be the output of a noise filter driven by white noise n_k (Equation 7.4). Defining the polynomials

A(z^{−1}) = 1 + Σ_{i=1}^{p} a_i z^{−i},   B(z^{−1}) = Σ_{i=0}^{q} b_i z^{−i}

and, similarly, the noise polynomials C(z^{−1}) and D(z^{−1}),
and transforming Equations 7.3 and 7.4 into the z domain, we get

S(z) = [B(z^{−1})/A(z^{−1})] U(z) + [D(z^{−1})/C(z^{−1})] N(z);   (TF)    (7.6)
In Equation 7.6 the sequence {S_k} is modeled by means of the system parameter vector, β_s, and the noise parameter vector, β_n, which contain the coefficients of A(z^{−1}), B(z^{−1}) and of C(z^{−1}), D(z^{−1}), respectively (Equation 7.7).
The problem of identifying the above parameters when the input is available is well covered in the literature on system identification.13 In signal processing modeling, the input sequence {U_k} is assumed to be a white, inaccessible sequence. The parameter vector β_s thus describes a linear transformation transferring the white sequence into the (colored) signal sequence. The transfer function model can be decoupled into the deterministic and noise models
(see, for example, Reference 11). The ARMAX model can thus be written as

A(z^{−1}) S(z) = B(z^{−1}) U(z) + D(z^{−1}) N(z);   (ARMAX)    (7.10)
The autoregressive moving average (ARMA) model is derived from Equation 7.6 by assuming there is no external noise; hence

S(z) = [B(z^{−1})/A(z^{−1})] U(z);   (ARMA)    (7.11)

or, in the time domain,

S_k = − Σ_{i=1}^{p} a_i S_{k−i} + Σ_{i=0}^{q} b_i U_{k−i}    (7.12)
The autoregressive (AR), all-pole, model is derived by assuming B(z^{−1}) = G, a simple gain; hence

S(z) = [G/A(z^{−1})] U(z);   (AR)    (7.13)

S_k = − Σ_{i=1}^{p} a_i S_{k−i} + G U_k    (7.14)

while the MA, all-zero, model is derived from the ARMA model by assuming A(z^{−1}) = 1:

S_k = Σ_{i=0}^{q} b_i U_{k−i}    (7.16)
FIGURE 3. The AR model and the MA model (block diagrams).
Figure 3 shows, schematically, the AR and MA models in the frequency and time domains.
Other models such as the IMA or ARIMA3 (autoregressive integrated moving average)
are used when homogeneous nonstationary signals are to be modeled (see Section VIII).
The basic idea behind linear predictive modeling is that by assuming the sequence {S_k} to be the output of a linear system, one can express the sequence (in a reduced parametric manner) by means of the system parameters. Given a sequence {S_k}, one has to identify the
parameters of the system. Since the input is inaccessible, well-known algorithms for system
identification13 cannot be directly applied. The common approach has been to model the sequence by means of a system driven by an input with a white spectrum (i.e., white noise, impulse). Algorithms for the estimation of the system parameters (without the need to have access to the input) are available. These will be discussed in the following sections.
access to the input) are available. These will be discussed in the following sections.
The AR and ARMA models are the most commonly used. These will be discussed in
more detail in following sections.
A. Introduction
The AR model17 is very often used because of its simplicity and because of the fact that effective algorithms for the estimation of the AR parameters are available. Note that an ARMA model can be approximated by means of an AR model. Assume an ARMA model given by the polynomials B(z^{−1}) and A(z^{−1}), of order q and p. By long division, we get
B(z^{−1})/A(z^{−1}) = 1/Â(z^{−1})    (7.17)
hence, the ARMA model can be expressed by means of an AR model. In general, however, the polynomial Â(z^{−1}) will be of infinite dimension. It is possible to approximate the ARMA model by an AR model having the first p coefficients of the Â(z^{−1}) polynomial.
The choice of the order p of the AR model depends on the accuracy required.
When modeling a stationary process, one must make sure that the model is stationary. In order to ensure weak stationarity, the autocovariance and autocorrelations of the modeled sequence must satisfy a set of conditions.3 For a linear process, these are ensured if the complex roots of the characteristic equation

A(z^{−1}) = 1 + a_1 z^{−1} + a_2 z^{−2} + ... + a_p z^{−p} = 0    (7.18)

are all inside the unit circle in the z plane or, equivalently, outside the unit circle in the z^{−1} plane.
The inverse filter

H^{−1}(z^{−1}) = A(z^{−1})/G    (7.19)

is known as the AR whitening filter. When the sequence {S_k} serves as the input to the AR whitening filter, the resultant output will have a white spectrum. The simplest AR process is that of the first order. It is known as the Markov process, given by the difference equation

S_k = −a_1 S_{k−1} + G U_k    (7.20)

Given the sequence {S_k}, the linear prediction of the current sample from the previous p samples is

Ŝ_k = − Σ_{i=1}^{p} â_i S_{k−i}    (7.21)
where a circumflex (ˆ) denotes an estimated value. For the time being, we shall assume that p is given (for example, by guessing). At time t = kT we can calculate the error e_k (known as the "residual") between the actual sequence sample and the predicted one:

e_k = S_k − Ŝ_k = S_k + Σ_{i=1}^{p} â_i S_{k−i}    (7.22)
Note that the residuals {e_k} are the estimates of the inaccessible input {GU_k}. The least squares method determines the estimated parameters by minimizing the expectation of the squared error, E{e_k²} (Equation 7.23), by setting

∂E{e_k²}/∂â_i = 0;   i = 1, 2, ..., p    (7.24)
This yields the set of normal equations

Σ_{j=1}^{p} â_j r_{|i−j|} = −r_i;   i = 1, 2, ..., p    (7.25)

with the minimal squared error

E_p = r_0 + Σ_{i=1}^{p} â_i r_i    (7.26)
1- 1
The correlation coefficients are not given; hence, they have to be estimated from the given
finite seqnence {S(k)}. Assume the sequence [ S j is given. For k = 0, 1, 2, ..., (N - 1)
we can estimate the correlation coefficients by
In Equation 7.27, we have assumed all samples of {SJ to be zero outside the given range.
These estimations (Equation 7.27), known as the autocorrelation method, will be used instead
o f the correlation coefficients of Equation 7.25. For sake of convenience, we shall continue
to use the symbol rs where indeed f; must be used. Equation 7.23 can be written in a matrix
form
[ r_0       r_1       ...  r_{p−1} ] [ â_1 ]     [ r_1 ]
[ r_1       r_0       ...  r_{p−2} ] [ â_2 ]  = −[ r_2 ]
[ ...                             ] [ ... ]     [ ... ]
[ r_{p−1}   r_{p−2}   ...  r_0     ] [ â_p ]     [ r_p ]

or

R â = −r    (7.28)
where the correlation matrix R, vector r, and the AR coefficients vector a are defined in
Equation 7.28. It can be shown17 that, for the deterministic case, a similar equation exists
for the estimation of the AR parameters vector.
The direct solution of Equation 7.28 is given by inversion of the correlation matrix:

â = −R^{−1} r    (7.29)
The correlation matrix is symmetric and, in general, positive semidefinite. Efficient algorithms for the solution of Equation 7.28 exist. Note that the correlation matrix is a Toeplitz matrix (the elements along any diagonal are identical). Durbin18 has developed an efficient recursive procedure:

E_0 = r_0    (7.30A)

K_i = −[ r_i + Σ_{j=1}^{i−1} â_j^{(i−1)} r_{i−j} ] / E_{i−1}    (7.30B)

â_i^{(i)} = K_i    (7.30C)

â_j^{(i)} = â_j^{(i−1)} + K_i â_{i−j}^{(i−1)};   j = 1, 2, ..., i − 1    (7.30D)

E_i = (1 − K_i²) E_{i−1}    (7.30E)

The recursion is carried out for i = 1, 2, ..., p, and the solution is â_j = â_j^{(p)}, j = 1, 2, ..., p. Hence, the Durbin procedure for a model of order p also yields all models of order less than p. A flow chart for the calculation of Equations 7.30A through E is given in Figure 4.
An additional byproduct of the Durbin’s algorithm is the minimal average error of the ith
order model E,. It can easily be shown that
0 E; (7.32)
E0 - r(0) (7.33)
One way of determining the model's order is to evaluate Equations 7.30 for some large order p̄ and then choose the model of order p < p̄ for which the minimal average error is small enough.
The coefficients k_j, j = 1, 2, ..., p calculated by Equation 7.30B are known as the reflection coefficients or the partial correlation coefficients (PARCOR).3,8 Sufficient conditions for the stability of the model are -1 < k_j < 1, j = 1, 2, ..., p. Since the Durbin procedure yields the PARCOR at no extra calculational cost, stability is easily determined without the need to solve the pth order Equation 7.18.
It can be shown17 that the estimated gain Ĝ is related to the correlation coefficients by

Ĝ² = E_p = r̂_0 + Σ_{i=1}^{p} â_i r̂_i    (7.35)

Several methods have been suggested for the estimation of the model's order, p. One of the well-known methods is the one suggested by Akaike,19-22 which will be discussed later in this chapter. An important application of AR analysis is that of spectral estimation; this will be discussed in detail in Chapter 8.
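As a numerical illustration of the autocorrelation method (Equation 7.27) and the Durbin recursion (Equations 7.30A through E), a minimal Python sketch is given below. The function names, the synthetic test sequence, and the model order are illustrative assumptions, not part of the text.

```python
import numpy as np

def biased_autocorr(s, max_lag):
    # Biased autocorrelation estimates (Equation 7.27): r_j = (1/N) sum_k s_k s_{k+j}
    N = len(s)
    return np.array([np.dot(s[:N - j], s[j:]) / N for j in range(max_lag + 1)])

def durbin(r, p):
    # Levinson-Durbin recursion (Equations 7.30A through E).
    a = np.zeros(p + 1); a[0] = 1.0           # a[0] is kept at 1
    k = np.zeros(p + 1)
    E = np.zeros(p + 1)
    E[0] = r[0]                               # Equation 7.30A
    for i in range(1, p + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k[i] = -acc / E[i - 1]                # reflection (PARCOR) coefficient, Eq. 7.30B
        a_new = a.copy()
        a_new[i] = k[i]                       # Equation 7.30C
        a_new[1:i] = a[1:i] + k[i] * a[i - 1:0:-1]   # Equation 7.30D
        a = a_new
        E[i] = (1.0 - k[i] ** 2) * E[i - 1]   # Equation 7.30E
    return a[1:], k[1:], E

# Example: fit an AR(4) model to a synthetic (hypothetical) sequence.
rng = np.random.default_rng(0)
s = np.convolve(rng.standard_normal(2000), [1.0, 0.5, 0.2], mode="full")[:2000]
r = biased_autocorr(s, 4)
a_hat, parcor, E = durbin(r, 4)
gain_sq = E[-1]                               # Ghat^2 = E_p (Equation 7.35)
print(a_hat, parcor, gain_sq)
```

The recursion also returns the PARCOR sequence, so the stability condition |k_j| < 1 can be checked at no extra cost, as noted above.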
Consider the MA(q) model of Equation 7.15. The autocorrelation of the MA process is

r_i = E{S_k S_{k+i}} = G² Σ_{j=0}^{q} Σ_{l=0}^{q} b_j b_l r^u_{i+j-l}    (7.36)

where r^u_i is the autocorrelation of the white input,

r^u_i = δ(i)    (7.37)

so that

r_i = 0;  i > q    (7.38A)

r_i = G² Σ_{j=0}^{q-i} b_j b_{j+i};  0 ≤ i ≤ q    (7.38B)

The correlation coefficients are estimated from the N available samples by

r̂_i = (1/N) Σ_{k=0}^{N-1-i} S_k S_{k+i}    (7.39)

Absorbing the gain into the coefficients, Equation 7.38B becomes

r_i = Σ_{j=0}^{q-i} b_j b_{j+i};  0 ≤ i ≤ q    (7.40)

which can be solved for the MA parameters by the iterations

b_i = b_0^{-1} ( r̂_i - Σ_{j=1}^{q-i} b_j b_{j+i} );  i = 1, 2, ..., q    (7.41)
b_0 = ( r̂_0 - Σ_{j=1}^{q} b_j² )^{1/2}

or, written explicitly in terms of the iteration index m,

b_i^{(m)} = (b_0^{(m-1)})^{-1} ( r̂_i - Σ_{j=1}^{q-i} b_j^{(m-1)} b_{j+i}^{(m-1)} );  i = 1, 2, ..., q    (7.42)
b_0^{(m)} = ( r̂_0 - Σ_{j=1}^{q} (b_j^{(m-1)})² )^{1/2}
m = 0, 1, 2, ...
where (·)^{(m)} is the value of the mth iteration of (·) and b_j^{(m)} = 0 for m < 0.
Equations 7.42 are iteratively solved until some convergence criterion is satisfied. Such a criterion may be

Σ_{i=0}^{q} ( b_i^{(m)} - b_i^{(m-1)} )² ≤ ε    (7.43)
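A rough Python sketch of this fixed-point iteration is given below. The update order, the initial guess, and the function name are assumptions on my part (the equations above are themselves a reconstruction), so the fragment should be read as an illustration of the idea rather than the book's exact procedure.

```python
import numpy as np

def estimate_ma(r_hat, q, eps=1e-8, max_iter=200):
    """Iteratively solve r_i = sum_j b_j b_{j+i} for b_0..b_q
    (cf. Equations 7.41/7.42), given estimated autocorrelations r_hat[0..q]."""
    b = np.zeros(q + 1)
    b[0] = np.sqrt(max(r_hat[0], 0.0))        # crude initial guess
    for _ in range(max_iter):
        b_prev = b.copy()
        # b_0 update: b_0 = sqrt(r_0 - sum_{j=1}^{q} b_j^2)
        b[0] = np.sqrt(max(r_hat[0] - np.sum(b_prev[1:] ** 2), 0.0))
        # b_i update: b_i = (r_i - sum_{j=1}^{q-i} b_j b_{j+i}) / b_0
        for i in range(1, q + 1):
            cross = np.dot(b_prev[1:q - i + 1], b_prev[1 + i:q + 1])
            b[i] = (r_hat[i] - cross) / b[0]
        # convergence criterion (cf. Equation 7.43)
        if np.sum((b - b_prev) ** 2) <= eps:
            break
    return b

# Quick check with the exact autocorrelations of the MA(2) model b = [1, 0.6, 0.3]:
b_true = np.array([1.0, 0.6, 0.3])
r = np.array([np.dot(b_true[:3 - i], b_true[i:]) for i in range(3)])
print(estimate_ma(r, 2))
```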
IV. MIXED AUTOREGRESSIVE MOVING AVERAGE (ARMA) MODELS

A. Introduction
In cases where the process does have a number of influential zeroes, the approximation of Equation 7.17 will require a very large dimension for the AR model. This may become undesirable for signal compression applications. An ARMA model can then be used.
Assume an ARMA(p,q) process given by Equation 7.12. The process is stationary if the roots of the characteristic Equation 7.18 are outside the unit circle in the z^{-1} plane. The inverse filter H^{-1}(z^{-1}) = A(z^{-1}) / [G B(z^{-1})] is called the ARMA whitening filter since, when driven with the given sequence {S_k}, it will yield white noise as its output.23
The ARMA parameters can be estimated in two stages. First, the process is approximated by a high-order AR model,

s_k = - Σ_{i=1}^{r} α_i s_{k-i} + G U_k    (7.46)

In Equation 7.46, r is the order of the Ā(z^{-1}) polynomial. It is chosen to be sufficiently high (see Section II). The AR parameters α_i, i = 1, 2, ..., r are identified using Durbin's algorithm (Equation 7.30). By cross multiplication we get from Equation 7.45

a_1 = α_1 + b_1
a_2 = α_2 + α_1 b_1 + b_2    (7.48)
a_3 = α_3 + α_2 b_1 + α_1 b_2 + b_3
...
0 = α_j + Σ_{i=1}^{q̂} α_{j-i} b_i;  j > p̂    (7.49)

In Equations 7.48 and 7.49, α_i, i = 1, 2, ..., r are known, and the ARMA parameters a_j, j = 1, 2, ..., p̂ and b_k, k = 1, 2, ..., q̂ are to be determined. The order (p,q) of the ARMA model is estimated by (p̂,q̂). Equation 7.49 provides a set of q̂ equations with the q̂ MA parameters, b_i, as unknowns:
Σ_{i=1}^{q̂} α_{p̂+j-i} b_i = -α_{p̂+j};  j = 1, 2, ..., q̂    (7.50)

or, in matrix notation,

A_α b = -α^{(p̂+1)}    (7.51)

where the q̂ × q̂ matrix A_α, with elements A_α(j,i) = α_{p̂+j-i}, the vector b = [b_1, ..., b_q̂]^T, and the vector α^{(p̂+1)} = [α_{p̂+1}, ..., α_{p̂+q̂}]^T are defined by Equation 7.50. The solution for the MA parameters b is thus given by

b = -A_α^{-1} α^{(p̂+1)}    (7.52)

The AR parameters of the ARMA model are then obtained from Equation 7.48. Define the length-p̂ vector b_Δ^T = [b_1, b_2, ..., b_q̂, 0, ..., 0] and the lower triangular matrix

A_Δ(j,i) = α_{j-i};  α_0 = 1, α_m = 0 for m < 0;  1 ≤ i, j ≤ p̂    (7.55)

(i.e., A_Δ has ones on its diagonal and the coefficients α_1, α_2, ... below it). Then

a = α^{(1)} + A_Δ b_Δ    (7.56)

where α^{(1)} = [α_1, α_2, ..., α_p̂]^T.
Equations 7.52 and 7.56 constitute the ARMA estimation. The estimation, however, is very sensitive to noise and to ill conditioning of the matrix A_α. An alternative is to work directly with the autocorrelation sequence of the ARMA process, which obeys

r_j + Σ_{i=1}^{p} a_i r_{j-i} = G² Σ_{i=j}^{q} b_i h(i - j);  0 ≤ j ≤ q
r_j + Σ_{i=1}^{p} a_i r_{j-i} = 0;  j > q    (7.57)

where the sequence h(k) is the weighting sequence (the response to a Kronecker delta) of the ARMA filter. Equations 7.57 show that the AR parameters, a_i, appear in a linear fashion for all j > q. Consider a set of t such equations; evaluated over the range q + 1 ≤ j ≤ q + t, these yield
[ r_{q+1}   r_q        ...  r_{q+1-p} ] [ 1   ]     [ 0 ]
[ r_{q+2}   r_{q+1}    ...  r_{q+2-p} ] [ a_1 ]  =  [ 0 ]    (7.58)
[ ...                                 ] [ ... ]     [...]
[ r_{q+t}   r_{q+t-1}  ...  r_{q+t-p} ] [ a_p ]     [ 0 ]

or, in matrix notation,

R_t ā = 0    (7.59)

ā^T = [1 : a^T]    (7.60)
If the order (p,q) is known and the correct (and not estimated) correlations are used to form R_t, Equation 7.59 will have a unique solution for all values of t ≥ p. This is true since the rank of R_t is equal to min(p,t). However, both (p,q) and the exact correlation coefficients are not given; thus, estimation methods must be applied.
The unbiased estimates of the autocorrelation coefficients are adopted here:

r̂_j = [1/(M - j)] Σ_{k=0}^{M-j-1} S_k S_{k+j}    (7.61)
where M is the number of samples in the finite sequence over which the estimation is performed. Estimating the ARMA order (p,q) by (p̂,q̂), the elements of the estimated correlation matrix R̂_t are given by

R̂_t(i,j) = r̂_{q̂+i+1-j}    (7.62)
The left side of Equation 7.59 with the estimate of the correlation matrix will no longer equal 0. We can define an error vector e such that

R̂_t ā = e    (7.63)

Since the error is the result of the sum of many random variable products, its joint density function can be assumed gaussian with zero mean and covariance matrix W. The estimation of the AR parameter vector a can be performed by maximizing the joint density function. The maximum likelihood3 estimate of a is given by the equation

R̂_t^T W^{-1} R̂_t ā = ρ    (7.66)
ρ is a constant vector selected such that the first component of ā is one. Cadzow16 has suggested taking the (p × p) covariance matrix as the identity matrix, W = I; hence the estimate of the AR parameters becomes R̂_t^T R̂_t ā = ρ.
Once the AR part has been estimated, the given sequence {S_k} is filtered by the estimated polynomial Â(z^{-1}) to yield a residual sequence {Ŝ(k)}. Assuming Â(z^{-1}) ≈ A(z^{-1}), the MA model of Equation 7.69 is close to the MA part of the ARMA model to be identified. The sequence {Ŝ(k)} is therefore assumed to be an MA(q̂) process. Its parameters b̂ are identified using the techniques discussed in Section III.
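The two-stage flow of Equations 7.46 through 7.56 can be sketched in a few lines of Python. The high-order AR estimator, the indexing conventions, and the test model below are illustrative assumptions that follow the equations as reconstructed here, not the book's own program.

```python
import numpy as np

def fit_high_order_ar(s, r_order):
    """High-order AR fit (cf. Equation 7.46) via the Yule-Walker normal equations."""
    N = len(s)
    r = np.array([np.dot(s[:N - j], s[j:]) / N for j in range(r_order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(r_order)] for i in range(r_order)])
    return np.linalg.solve(R, -r[1:r_order + 1])      # alpha_1 .. alpha_r

def arma_from_ar(alpha, p, q):
    """Recover ARMA(p,q) parameters from high-order AR coefficients
    (cf. Equations 7.50, 7.51, 7.55, 7.56)."""
    al = np.concatenate(([1.0], alpha))               # al[m] = alpha_m, alpha_0 = 1
    # A_alpha(j, i) = alpha_{p + j - i}  (Equation 7.50), zero for negative index
    A = np.array([[al[p + i - j] if 0 <= p + i - j < len(al) else 0.0
                   for j in range(q)] for i in range(q)])
    b = np.linalg.solve(A, -al[p + 1:p + 1 + q])      # Equation 7.52
    b_delta = np.zeros(p); b_delta[:q] = b            # padded MA vector
    A_delta = np.array([[al[i - j] if 0 <= i - j < len(al) else 0.0
                         for j in range(p)] for i in range(p)])   # Equation 7.55
    a = al[1:p + 1] + A_delta @ b_delta               # Equation 7.56
    return a, b

# Example (hypothetical): estimate an ARMA(2,1) model, true a = [-0.75, 0.5], b = [0.4].
rng = np.random.default_rng(1)
u = rng.standard_normal(5000)
s = np.zeros(5000)
for k in range(2, 5000):
    s[k] = 0.75 * s[k - 1] - 0.5 * s[k - 2] + u[k] + 0.4 * u[k - 1]
alpha = fit_high_order_ar(s, r_order=20)
print(arma_from_ar(alpha, p=2, q=1))
```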
V. MODEL ORDER ESTIMATION

A. Introduction
One o f the important decisions that has to be made when modeling data with parametric
models is that of estimating the order. In ARMA models, this means the estimation of (p,q),
the AR and MA orders.
Order estimation is usually done by means of some optimization technique. It is desirable
to have the minimum values for p and q to reduce the amount of calculations, storage, and
ill conditioning problems. It is required, however, to have large enough values of p and q
to adequately represent the process.
Currently, many methods are available for choosing the order of AR or ARMA models
(see, for example, References 3, 17, 19 through 22, and 28 through 37). Several basic ideas
are used for order determination. Some check the correlation or spectral flatness of the
residuals; others use decision rules based on Bayesian approach, maximum likelihood ap
proach, and amount of information measures.
B. Residuals Flatness
For the estimation of the order of an AR process, a measure of residuals correlation has generally been used.17 As noted by Equation 7.22, the residuals are the estimates of the inaccessible input sequence {GU_k}, which is an uncorrelated (white) sequence. Hence, for an adequate model, the residuals must be uncorrelated or, in other words, possess a flat power spectrum.
Consider an AR model with order p, and define the normalized error V_p,

V_p = E_p / r_0    (7.70)

where E_p is the minimum average error (Equation 7.34) and r_0 is the total energy in the sequence (Equation 7.27). From Equations 7.32 and 7.70, it is clear that V_p is a monotonically decreasing function of p. It has been shown17 that V_p is bounded by

V_0 = 1 ≥ V_p ≥ V_min    (7.71)

where V_min, the minimum attainable normalized error, is determined by the spectral flatness of the process (Equation 7.73).
FIGURE 5. Estimation of AR model’s order. Data was synthesized from an AR difference equation of order ten.
The test is based on hypothesis testing methods, rather than on some procedure where multiple decisions are used. This has been a major reason for criticizing the method20 and for developing new methods for model order estimation.
In Figure 5, the method for order estimation is demonstrated by means of synthesized data. A sequence {S_k} was synthesized from an AR model of order 10. The curve of V_p vs. p is plotted.
It can be shown19 that the expected value of the minimum error, when estimated from N samples of an AR(p) process, is

E{E_p} = (1 - (p + 1)/N) G²    (7.75)
Consider now another sequence {y_k}, infinitely long, with the same statistical properties as {S_k}. The predictor for this sequence will be

ŷ_k = - Σ_{j=1}^{p} â_j y_{k-j}    (7.76)

where â_j, j = 1, 2, ..., p are functions of {S_k}. The variance of the residuals tends asymptotically to (1 + (p + 1)N^{-1}) G² as N approaches infinity. Hence, it is logical to estimate the final prediction error (FPE) of the predictor (Equation 7.76) by
FPE(p) = [ (1 + (p + 1)/N) / (1 - (p + 1)/N) ] Ê_p    (7.77)

The estimated order p̂ is the one minimizing the final prediction error,

FPE(p̂) = min_p FPE(p)    (7.79)
An alternate method, which very often gives the same results, has been suggested by Parzen.6 The method is known as the criterion autoregressive transfer function (CAT). It is based on minimizing the average normalized spectrum error between a model of infinite order and one of order p.
Akaike20 has suggested an information criterion (AIC) of the form AIC = -2 ln(maximum likelihood) + 2k, where k is the number of independently adjusted parameters within the model. The AIC is defined as an estimate of twice the negentropy (known also as the Kullback information) of the true structure with respect to the fitted model. The estimates of a and E_p by the Yule-Walker equations are approximately maximum likelihood estimates; thus the AIC for an AR model is given by Equation 7.81.
In Equation 7.81, ρ_w is a factor that compensates for windowing effects. Its value is taken as the ratio of the energy under the window function used in the data handling to that of a rectangular window. For a Hamming window, ρ_w = 0.4.
It has been suggested7 that the search for the absolute minimum of Equation 7.81 be carried out over the range 1 ≤ p ≤ 3√N.
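The order-selection logic can be summarized in a few lines of Python. The sketch below is a simplified illustration, not the book's exact procedure: it uses the common AIC variant N·ln(E_p) + 2p (ignoring the windowing correction ρ_w mentioned above) together with Akaike's FPE, and the test signal and order range are assumed.

```python
import numpy as np

def prediction_errors(s, p_max):
    """Minimum average errors E_p for p = 0..p_max via the Durbin recursion."""
    N = len(s)
    r = np.array([np.dot(s[:N - j], s[j:]) / N for j in range(p_max + 1)])
    a, E = np.array([1.0]), [r[0]]
    for i in range(1, p_max + 1):
        k = -(r[i] + np.dot(a[1:], r[i - 1:0:-1])) / E[-1]
        a = np.concatenate((a + k * np.r_[0.0, a[:0:-1]], [k]))
        E.append((1 - k * k) * E[-1])
    return np.array(E)

def select_order(s, p_max):
    N = len(s)
    E = prediction_errors(s, p_max)
    p_axis = np.arange(p_max + 1)
    fpe = E * (N + p_axis + 1) / (N - p_axis - 1)    # Akaike's FPE (cf. Eq. 7.77)
    aic = N * np.log(E) + 2 * p_axis                 # simplified AIC
    return int(np.argmin(fpe)), int(np.argmin(aic))

# Example: data synthesized from a stable AR(4) model (two damped resonators).
rng = np.random.default_rng(2)
x = rng.standard_normal(4000)
s = np.zeros(4000)
for k in range(4, 4000):
    s[k] = (2.0 * s[k-1] - 2.09 * s[k-2] + 1.068 * s[k-3]
            - 0.2952 * s[k-4] + x[k])
print(select_order(s, p_max=int(3 * np.sqrt(4000))))
```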
The AIC has also been successfully used in determining the order of ARMA models. Here the AIC is defined in terms of the estimated residual variance of the ARMA(p,q) model, with k = p + q adjusted parameters.
The matrix A becomes ill conditioned when its normalized determinant becomes much smaller than one. The optimal order, p, can be defined as the model order for which the normalized determinant |A|_n(p + 1) exhibits a significant falloff in magnitude.
In the case where the correlations are calculated with no error, the rank of R_t equals the true AR order, p (for t ≥ p, p̂ ≥ p). In the practical case, the matrices will have full rank for all estimated orders, p̂, larger than the correct one. However, it is found that (p̂ - p) eigenvalues have values "close" to zero. A procedure for order estimation can thus be formulated as follows: the optimal order, p, is the value for which the estimated correlation matrix R̂_t has (p̂ - p) of its eigenvalues sufficiently close to zero (for all p̂ > p). A method for implementing the above procedure, based on singular value decomposition (SVD), has been suggested by Cadzow.27 Efficient algorithms for calculating the SVD are available.38
The PARCOR coefficients can be calculated from the LPC coefficients by the backward recursion

k_i = a_i^{(i)}

a_j^{(i-1)} = [ a_j^{(i)} - k_i a_{i-j}^{(i)} ] / (1 - k_i²);  1 ≤ j ≤ (i - 1)    (7.87)

Equation 7.87 is calculated recursively for i = p, (p - 1), ..., 1 (in that order), where initially a_j^{(p)} = â_j, 1 ≤ j ≤ p.
The LPC coefficients can be calculated from the PARCORs by the recursive equations

a_i^{(i)} = k_i

a_j^{(i)} = a_j^{(i-1)} + k_i a_{i-j}^{(i-1)};  1 ≤ j ≤ (i - 1)

evaluated for i = 1, 2, ..., p.
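The two recursions are easy to check numerically. The short Python sketch below (an illustration under the indexing conventions used here, with hypothetical function names) converts LPC coefficients to PARCORs and back; the round trip should reproduce the original coefficients.

```python
def lpc_to_parcor(a):
    """Backward (step-down) recursion, cf. Equation 7.87: a_1..a_p -> k_1..k_p."""
    a = list(a)
    p = len(a)
    k = [0.0] * p
    for i in range(p, 0, -1):
        k[i - 1] = a[i - 1]                          # k_i = a_i^{(i)}
        if i > 1:
            ki = k[i - 1]
            a = [(a[j] - ki * a[i - 2 - j]) / (1.0 - ki * ki) for j in range(i - 1)]
    return k

def parcor_to_lpc(k):
    """Forward (step-up) recursion, cf. Equation 7.88: k_1..k_p -> a_1..a_p."""
    a = []
    for i, ki in enumerate(k, start=1):
        a = [a[j] + ki * a[i - 2 - j] for j in range(i - 1)] + [ki]
    return a

# Round-trip check on an arbitrary coefficient set:
a0 = [-0.9, 0.64, -0.1]
print(lpc_to_parcor(a0))
print(parcor_to_lpc(lpc_to_parcor(a0)))              # should reproduce a0
```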
Consider the signal

y(t) = x(t) + m(t)    (7.89)

where x(t) is a zero mean stationary time series and m(t) is a continuous trend represented by a polynomial of degree (d - 1).
Signals like the one described in Equation 7.89 frequently occur in biomedical applications. The trend, m(t), in these cases may be, for example, a baseline shift due to undesirable electrode movement in bioelectric signals.
The trend can be removed from the signal y(t) by the repeated difference filter ∇^d. To illustrate this, consider for example a trend polynomial of degree 2, m(t) = m_2 t² + m_1 t + m_0. Applying the difference operator once,

∇y(t) = m_2 t² + m_1 t + m_0 - m_2(t - 1)² - m_1(t - 1) - m_0 + ∇x(t)
      = 2 m_2 t - m_2 + m_1 + ∇x(t)

applying the difference operator again,

∇²y(t) = 2 m_2 + ∇²x(t)

and once more,

∇³y(t) = ∇³x(t)
The differenced signal S(t) = ∇^d y(t) is therefore stationary, and stationary modeling techniques (such as ARMA modeling) can be applied to its processing. Consider now the discrete sequence {S_k}, consisting of the samples of S(t), modeled by an ARMA model (Equation 7.91).
Note that the original process can be retrieved from Equation 7.91 by d successive summations,

Y_k = Σ_{j_d=-∞}^{k} Σ_{j_{d-1}=-∞}^{j_d} ... Σ_{j_1=-∞}^{j_2} S_{j_1}    (7.97)
The nonstationary sequence {Yk} can be retrieved from the stationary sequence {Sk} by d
summations (or “ integrations” ); the process in Equation 7.94 is therefore called an auto
regressive integrated moving average or ARIMA (p,d,q) process. The indexes p,d,q denote
the order of the AR process, the order of the difference operator, and the order of the MA
process, respectively.
The ARIMA(p,d,q) model of Equation 7.94 has p (stable) poles outside the unit circle in the z^{-1} plane, due to the polynomial A(z^{-1}), and d poles on the unit circle at z = 1. When solving Equation 7.94 for {Y_k} by the inverse transformation, the dth order pole at z = 1 is responsible for the trend form present in {Y_k}, while the stable poles are responsible for the stationary part of {Y_k}.
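The differencing argument above is easy to verify numerically. The short Python sketch below is illustrative (numpy's diff plays the role of the ∇ operator; the trend coefficients are arbitrary assumptions): a quadratic trend is annihilated by three differences, and cumulative summation ("integration") rebuilds a trend-bearing signal up to its unknown initial conditions.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(500, dtype=float)
x = rng.standard_normal(500)                  # stationary part x(t)
m = 0.002 * t**2 - 0.3 * t + 5.0              # quadratic trend (degree d - 1 = 2)
y = x + m                                     # cf. Equation 7.89

d3_y = np.diff(y, n=3)                        # grad^3 y
d3_x = np.diff(x, n=3)                        # grad^3 x
print(np.allclose(d3_y, d3_x))                # True: the trend has been removed

# "Integration": d = 3 summations recover a trend-bearing sequence
# (equal to y only up to the lost initial conditions).
y_rebuilt = d3_y.copy()
for _ in range(3):
    y_rebuilt = np.cumsum(y_rebuilt)
```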
B. Seasonal Processes3
Many biological signals exhibit periodic behavior. Some show a period of 12 months,
some (like menstrual secretions) have a period of about a month, others have a period of
about a day (Circadian rhythms), and others exhibit shorter periods of 30 min, 7 min, and
shorter.
Assume a nonstationary time series {Y_k} with seasonal behavior of period λ. The analysis of the sequence Y_k, Y_{k-λ}, Y_{k-2λ}, ... is of interest. This sequence may possess a trend of order (d - 1). The stationary sequence {S_k} is thus given by

S_k = ∇_λ^d Y_k

with

S(z) = ∇_λ^d Y(z)

where

∇_λ^d = (1 - z^{-λ})^d    (7.100)
VIII. ADAPTIVE SEGMENTATION

A. Introduction
The assumption usually made that the signal under test is stationary does not generally
hold for real signals. A more practical assumption is that of taking the signal to be stationary
within a certain time window. The length of the window is determined taking into account
the nonstationary dynamics. For the speech signal, stationarity can be assumed for windows
of about 10 to 20 msec. The short time segments are due to the relatively fast changes in
the spectral characteristics of the speech signal. Other, slower changing signals, like the
EEG, can be considered stationary for much longer durations.
Thus, it is often convenient to treat the nonstationary signal as piecewise-stationary. That
is to say, a signal which is stationary within every given segment. The nonstationarity
manifests itself by the changes in spectral characteristics between the various segments.
Thus, the continuous changes in the statistical characteristics of the process are modeled by a series of jump changes.
A simple method of segmentation is to divide the signal into a sequence of constant length
segments. The length of the segments is determined a priori such that it will be short enough
to be considered stationary, yet long enough for lowest frequencies of its spectrum to be
estimated. Such segmentation is usually used in speech processing.
A better segmentation procedure is the one in which the segments are determined adap
tively. Here, the segment length depends upon the dynamics of nonstationarity of the process.
Adaptive segmentation procedures are especially important when classification and effective
storing and transmission are required.
Adaptive segmentation procedures have been applied to the detection of failures in linear systems.41-43 There they have been applied mainly to the detection of abrupt, step-wise changes in the signal statistics due to failure of some system components. In signal analysis, however, the usual case is the presence of "trend-like" nonstationarity, where the statistical characteristics of the process are slowly changing.
Adaptive segmentation procedures have been applied to biomedical signal processing44 and, in particular, to the analysis of EEG signals.45-52 In these applications, it is important that the procedure can be implemented on-line and be immune to isolated short-time nonstationarities caused by noise.
The autocorrelation measure (ACM) method48 compares a sliding test window with a fixed reference window by means of two distances: an amplitude distance, D_A, and a spectral distance, D_S. The amplitude distance is a function of r_t(0) and r_0(0), the zero-lag correlation coefficients of the sliding and reference windows, respectively. Since the zero-lag correlation is the energy of the signal, D_A is a normalized distance measure of amplitudes. For the spectral distance measure, we shall consider only the correlation coefficients from lag zero up to the lag for which the correlation first becomes negative. Define the modified normalized correlation, r*(m),

r*(m) = r(m)/r(0);  m = 0, 1, ..., m° - 1
r*(m) = 0;          m = m°, m° + 1, ...    (7.103)
where m° is the lag for which the correlation first becomes negative. The spectral distance is

D_S = Σ_{m=1}^{q} |r*_t(m) - r*_0(m)| / (0.5 + min{r*_t(m), r*_0(m)})    (7.104)
where q is chosen such that the correlations for lag larger than q can be neglected.
The rationale behind Equation 7.104 is that for pure sine waves of different frequencies
in the reference and sliding window, the measure yields the difference between the cosine
of the frequencies divided by their minimum.
The autocorrelation measure is now defined by a linear combination of the two distances,

ACM = R_A D_A + R_S D_S
FIGURE 7. Adaptive segmentation. Upper trace: piecewise stationary simulated EEG. First 2.5 sec were
simulated from an AR process and last 2.5 sec from a different AR process. Fixed and growing reference
windows are demonstrated. Lower trace: the SEM. A new segment has been detected at t = 2.5 sec.
where RA and Rs are constants. To determine a new segment, ACM is compared with a
threshold. In Reference 48, the constants were determined such that the threshold was unity.
The boundary of the new segment is located within the sliding window (for which ACM
is greater than the threshold) proportionally to the steepness of the autocorrelation measure.
The ACM method is based on somewhat heuristic measures. Its calculation load is, however, relatively low, and it is general in the sense that it does not assume a model for the signal under test.
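A small Python sketch of the distance computations is given below. The amplitude distance D_A is written here as a simple normalized energy difference, which is an assumption on my part since its defining equation is not reproduced above; D_S follows Equation 7.104, and the constants and window contents are illustrative.

```python
import numpy as np

def autocorr(x, max_lag):
    x = x - np.mean(x)
    return np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(max_lag + 1)])

def modified_corr(r):
    """Modified normalized correlation r*(m) of Equation 7.103."""
    rn = r / r[0]
    neg = np.where(rn < 0)[0]
    m0 = neg[0] if len(neg) else len(rn)
    out = np.zeros_like(rn)
    out[:m0] = rn[:m0]
    return out

def acm(ref, sld, q=20, R_A=1.0, R_S=1.0):
    r0, rt = autocorr(ref, q), autocorr(sld, q)
    # D_A: normalized amplitude (energy) distance -- assumed form, not from the text
    D_A = abs(rt[0] - r0[0]) / min(rt[0], r0[0])
    r0s, rts = modified_corr(r0), modified_corr(rt)
    # D_S: spectral distance, Equation 7.104
    D_S = np.sum(np.abs(rts[1:] - r0s[1:]) /
                 (0.5 + np.minimum(rts[1:], r0s[1:])))
    return R_A * D_A + R_S * D_S              # linear combination (ACM)

rng = np.random.default_rng(4)
ref = np.convolve(rng.standard_normal(512), [1, 0.9], mode="same")
sld = np.convolve(rng.standard_normal(512), [1, -0.9], mode="same")
print(acm(ref, sld))
```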
The spectral error measure (SEM) method45 employs reference and sliding windows (Figure 7). The basic idea is to estimate an AR model for the signal in the reference window and to observe the variations of the spectrum in the sliding window. Variations in spectrum are measured by means of the SEM criterion.
Consider the spectrum of the signal estimated through the reference window by means of AR spectral estimation (see Chapter 8). An AR model is fitted to the signal using the reference window data; denote it by Â_0^{-1}(z). The estimated signal samples are given by

Ŝ_0(z) = G U(z) / Â_0(z)    (7.106)

The power spectrum of the signal at the reference, Ŝ_0(ω), is given by

Ŝ_0(ω) = G² / |Â_0(e^{jω})|²    (7.107)

since the input signal {U_k} is assumed to be white noise with unit variance.
Consider now the inverse filter, Â_0(z), applied to the samples, {s_t(k)}, of the sliding window. The output of the filter, the residuals {e_t(k)}, transformed into the z domain is

E_t(z) = Â_0(z) S_t(z)    (7.108)

where E_t(z) and S_t(z) are the Z transforms of the residuals and sliding window samples, respectively. The power spectral density function of the residuals can be expressed in terms of the spectrum of the reference, Ŝ_0(ω), by

S_{e_t}(ω) = |Â_0(e^{jω})|² S_t(ω) = G² S_t(ω) / Ŝ_0(ω)    (7.109)

Note that here all functions of ω denote power spectral density functions. The power spectral density of the residuals is the Fourier transform of the residuals autocorrelation {r_t(k)},

S_{e_t}(ω) = Σ_k r_t(k) exp(-jωk)    (7.110)
Equations 7.109 and 7.110 give the relation between the residuals autocorrelations and the ratio of spectra.
Since we are interested in the relative spectrum error, it is logical to define an average error criterion (Equation 7.111). Introducing Equations 7.109 and 7.110 into Equation 7.111 and noting the orthogonality of the cosine functions, we obtain Equation 7.112.
When comparing the spectra in the reference and the sliding windows, G is constant. Therefore, Equation 7.112 can be multiplied by G⁴ without loss of meaning. Bodenstein and Praetorius45 also claim that for the EEG application it was shown that normalization to r_t²(0) was appropriate. Therefore, the spectral error measure (SEM) is defined as Equation 7.112 scaled by G⁴/r_t²(0) (Equation 7.113).
Since G is not given and only M autocorrelation coefficients are estimated, a more practical measure is used (Equation 7.114), in which Ĝ, the estimate of the reference AR filter gain given by Equation 7.35, replaces G.
signal contains short-term, isolated nonstationarities caused by disturbances, the system may
introduce short, meaningless segments. It has been shown45 that clipping the residual signal
to a predetermined level before calculating the correlation, rt(k), removes most of the effect
of these nonstationarities.
Segmentation is determined by placing a threshold, T_SEM, on the SEM. As long as SEM ≤ T_SEM, we continue to slide the window and include the data in the current segment. For the first window for which SEM > T_SEM, we declare a new segment. The boundary between segments is placed at the middle of the window. The complete process is then repeated with a reference window initiated at the boundary, serving for the new AR coefficients estimation.
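The sliding-window loop can be sketched in Python as below. Because the exact SEM expression (Equations 7.113 and 7.114) is not reproduced here, the sketch uses a whiteness measure of the reference-filter residuals in the sliding window as a stand-in for the SEM; the window lengths, the threshold, and all function names are illustrative assumptions rather than the book's procedure.

```python
import numpy as np

def ar_fit(x, p):
    """Yule-Walker AR fit of the reference window; returns [1, a_1..a_p]."""
    N = len(x)
    r = np.array([np.dot(x[:N - j], x[j:]) / N for j in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.concatenate(([1.0], np.linalg.solve(R, -r[1:])))

def residual_whiteness(x, a, m_lags=10):
    """Stand-in for the SEM: normalized residual autocorrelation energy."""
    e = np.convolve(x, a, mode="valid")            # whitening filter A_0(z)
    r = np.array([np.dot(e[:len(e) - m], e[m:]) for m in range(m_lags + 1)])
    return np.sum((r[1:] / r[0]) ** 2)

def segment(signal, ref_len=250, win_len=250, step=50, p=6, threshold=0.2):
    bounds = [0]
    a = ar_fit(signal[:ref_len], p)
    pos = ref_len
    while pos + win_len <= len(signal):
        window = signal[pos:pos + win_len]
        if residual_whiteness(window, a) > threshold:
            boundary = pos + win_len // 2          # boundary placed mid-window
            bounds.append(boundary)
            if boundary + ref_len > len(signal):
                break
            a = ar_fit(signal[boundary:boundary + ref_len], p)  # new reference
            pos = boundary + ref_len
        else:
            pos += step
    return bounds

rng = np.random.default_rng(5)
sig = np.concatenate([np.convolve(rng.standard_normal(1000), [1, 1.2, 0.8], "same"),
                      np.convolve(rng.standard_normal(1000), [1, -1.2, 0.8], "same")])
print(segment(sig))
```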
The complete segmentation is given by the following steps:
an advantage when abrupt, step-wise changes are expected since the data up to the change
does indeed belong to a stationary process. When the nonstationarity is “ trend” wise,
namely, gradual changes occur, the advantage of this arrangement over the constant length
reference is doubtful. The GLR method puts emphasis on the detection of the boundary
within the window in which a change has been detected. This is again important in the case
of abrupt changes. Its significance is lost when the change is gradual and boundaries of
segments are arbitrarily defined by means of some average error allowed.
Basseville and Benveniste55 have suggested a procedure based on two AR models with
distance measure between them such as the log likelihood ratio and Kullback’s divergence
between conditional probability laws. They have suggested a continuously growing reference
window but with "forgetting" weights that will decrease the influence of past samples. This may cause even more severe problems with trend-like nonstationarities than the previous growing reference window.
A method based on parametric families of distributions was suggested by Sclove.53 The
assumption here is that each segment is the sample function of a random process with
different probability distribution.
A comparison between three segmentation methods — the ACM, SEM, and GLR — has
been conducted56 on simulated data with parameter jumps and real EEG data. The results
show that for the simulated signal (with "jump" nonstationarities), the GLR method is
superior. For the EEG data, however, the SEM method may be recommended due to its
lower calculations load and satisfactory segmentation properties.
Segmentation approaches based on syntactic methods (see Chapter 13) also have been
suggested and applied, for example, to the analysis of speech signals.57
REFERENCES
1. Mann, H. B. and Wald, A., On the statistical treatment of linear stochastic difference equations, Econometrica, 11, 173, 1943.
2. Kitagawa, G., A nonstationary time series model and its fitting by a recursive filter, J. Time Series Anal., 2, 103, 1981.
3. Box, G. E. P. and Jenkins, G. M., Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, 1970.
4. Hannan, E. J., Time series analysis, IEEE Trans. Autom. Control, 19, 706, 1974.
5. Robinson, E., Physical Application of Stationary Time Series, Macmillan, New York, 1980.
6. Parzen, E., Some recent advances in time series analysis, IEEE Trans. Autom. Control, 19, 723, 1974.
7. Sawaragi, Y., Soeda, T., and Nakamizo, T., "Classical" methods and time series analysis, in Trends and Progress in System Identification, Eykhoff, P., Ed., Pergamon Press, Oxford, 1981, chap. 3.
8. Anderson, T. W., The Statistical Analysis of Time Series, John Wiley & Sons, New York, 1971.
9. Parzen, E., Time series model identification and prediction variance horizon, in Applied Time Series Analysis, Vol. 2, Findley, D., Ed., Academic Press, New York, 1981, 415.
10. Granger, C. W. J., Acronyms in time series analysis, J. Time Series Anal., 3, 103, 1982.
11. Wang, D. C. C. and Vagnucci, A. H., TSAN: a package for time series analysis, Comput. Programs Biomed., 11, 132, 1980.
12. Isaksson, A., Wennberg, A., and Zetterberg, L. H., Computer analysis of EEG signals with parametric models, Proc. IEEE, 69(4), 451, 1981.
13. Eykhoff, P., System Identification: Parameter and State Estimation, John Wiley & Sons, London, 1974.
14. Jakeman, A. J. and Young, P. C., Advanced methods of recursive time series analysis, Int. J. Control, 37, 1291, 1983.
15. Cadzow, J. A., ARMA time series modeling: an effective method, IEEE Trans. Aerosp. Electron. Syst., 19, 49, 1983.
16. Cadzow, J. A., ARMA modeling of time series, IEEE Trans. Pattern Anal. Mach. Intelligence, 4, 124, 1982.
17. Makhoul, J., Linear prediction: a tutorial review, Proc. IEEE, 63, 561, 1975.
18. Durbin, J., The fitting of time series models, Rev. Inst. Int. Stat., 28(3), 233, 1961.
19. Akaike, H., Statistical prediction identification, Ann. Inst. Stat. Math., 22, 203, 1974.
20. Akaike, H., A new look at the statistical model identification, IEEE Trans. Autom. Control, AC-19, 716, 1974.
21. Akaike, H., Fitting autoregressive models for prediction, Ann. Inst. Stat. Math., 21, 243, 1969.
22. Akaike, H., Likelihood of a model and information criteria, J. Econ., 16, 3, 1981.
23. Bittani, S., Is the prediction of regression model white?, J. Franklin Inst., 315, 239, 1983.
24. Trelter, S. A. and Steiglitz, K., Power spectrum identification in terms of rational models, IEEE Trans. Autom. Control, AC-12, 185, 1967.
25. Mayne, D. Q. and Firoozan, F., Linear identification of ARMA processes, Automatica, 18, 461, 1982.
26. Graupe, D., Krause, D. J., and Moore, J. B., Identification of autoregressive moving average parameters of time series, IEEE Trans. Autom. Control, AC-20, 104, 1975.
27. Cadzow, J. A., Spectral estimation: an overdetermined rational model equation approach, Proc. IEEE, 70(9), 907, 1982.
28. Pao, Y. and Lee, D. T., Performance characteristics of the Cadzow modified direct ARMA method for spectrum estimation, Proc. 1st Acoust. Speech and Signal Process. Workshop on Spectral Estimation, McMaster University, Hamilton, Ontario, Canada, Aug. 1981, 2.5.1-2.5.10.
29. Akaike, H., Modern development of statistical methods, in Trends and Progress in System Identification, Eykhoff, P., Ed., Pergamon Press, Oxford, 1981, chap. 3.
30. Suzumura, N. and Ishii, N., Estimation of the order of autoregressive process, Int. J. Syst. Sci., 8, 905, 1977.
31. Kashyap, R. L., Optimal choice of AR and MA parts in ARMA models, IEEE Trans. Pattern Anal. Mach. Intelligence, 4, 99, 1982.
32. Chaure, C. and Benveniste, A., AR and ARMA identification algorithms of Levinson type: an innovation approach, IEEE Trans. Autom. Control, 26, 1243, 1981.
33. Hannan, E. J., The estimation of the order of an ARMA process, Ann. Stat., 8, 1071, 1980.
34. Schwarz, G., Estimating the dimension of a model, Ann. Stat., 6, 461, 1978.
35. Rissanen, J., Modelling by shortest data description, Automatica, 14, 465, 1978.
36. Ishii, N., Iwata, A., and Suzumura, N., Evaluation of an autoregressive process by information measure, Int. J. Syst. Sci., 9, 743, 1978.
37. Moden, Y., Yamada, M., and Arimoto, S., Fast algorithm for identification of an ARX model and its order determination, IEEE Trans. Acoust. Speech Signal Process., 30, 390, 1982.
38. Haimi-Cohen, R. and Cohen, A., A rapid gradient search algorithm for computing partial SVD and principal component decomposition, IEEE Trans. Pattern Anal. Mach. Intelligence, in press.
39. Makhoul, J., Stable and efficient lattice methods for linear prediction, IEEE Trans. Acoust. Speech Signal Process., 25, 423, 1977.
40. Friedlander, B., Instrumental variable methods for ARMA spectral estimation, IEEE Trans. Acoust. Speech Signal Process., 31, 404, 1983.
41. Willsky, A. S., A survey of design methods for failure detection in dynamic systems, Automatica, 12, 601, 1976.
42. Willsky, A. S. and Jones, H. L., A generalized likelihood ratio approach to the detection and estimation of jumps in linear systems, IEEE Trans. Autom. Control, 21, 108, 1976.
43. Rao, T. S., A cumulative sum test for detecting change in time series, Int. J. Control, 34, 285, 1981.
44. Vasseur, C. P. A., Rajagopalan, C. V., Couvreur, M., Toulottes, J. M., and Dubois, O., A microprocessor oriented segmentation technique: an efficient tool for electrophysiological signal analysis, IEEE Trans. Instrum. Meas., 28, 259, 1979.
45. Bodenstein, G. and Praetorius, H. M., Feature extraction from electroencephalogram by adaptive segmentation, Proc. IEEE, 65, 642, 1977.
46. Sanderson, A. C., Segen, J., and Richey, E., Hierarchical modeling of EEG signals, IEEE Trans. Pattern Anal. Mach. Intelligence, 2, 405, 1980.
47. Jansen, B. H., Hasman, A., and Lenten, R., Piecewise analysis of EEG using AR modeling and clustering, Comput. Biomed. Res., 14, 168, 1981.
48. Michael, D. and Houchin, J., Automatic EEG analysis: a segmentation procedure based on autocorrelation function, Electroencephalogr. Clin. Neurophysiol., 46, 232, 1979.
49. Barlow, J. S., Creutzfeldt, O. D., Michael, D., Houchin, J., and Epellbaum, H., Automatic adaptive segmentation of clinical EEGs, Electroencephalogr. Clin. Neurophysiol., 51, 512, 1981.
50. Bodenstein, G. and Schneider, W., Pattern recognition of clinical electroencephalograms, Proc. Int. Conf. Dig. Signal Process., Florence, Italy, 1981, 206.
51. Praetorius, H. M., Bodenstein, G., and Creutzfeldt, O. D., Adaptive segmentation of EEG records: a new approach to automatic EEG analysis, Electroencephalogr. Clin. Neurophysiol., 42, 84, 1977.
52. Appel, U. and Brandt, A. V., Adaptive sequential segmentation of piecewise stationary series, Inf. Sci. N.Y., 29, 27, 1983.
53. Sclove, S. L., Time series segmentation: a model and a method, Inf. Sci. N.Y., 29, 7, 1983.
54. Segen, J. and Sanderson, A. C., Detecting changes in time-series, IEEE Trans. Inf. Theory, 26, 249, 1980.
55. Basseville, M. and Benveniste, A., Sequential segmentation of nonstationary digital signals using spectral analysis, Inf. Sci. N.Y., 29, 57, 1983.
56. Appel, U. and Brandt, A. V., A comparative study of three sequential time series segmentation algorithms, Signal Process., 6, 45, 1984.
57. De Mori, R., Computer Models of Speech Using Fuzzy Algorithms, Plenum Press, New York, 1983.
Chapter 8
I. INTRODUCTION
Biomedical signals most often are the result of processes that take place in the time
domain. The analysis of such signals, however, may be more convenient and effective in
the frequency domain. This may be the case both in the deterministic and in the stochastic
cases. In most cases, power spectral density function (PSD), known also as “ the spectrum ” ,
is of interest (see Chapter 6).
Spectral analysis of the EEG1 has been used both for clinical and research purposes. Detailed spectral analysis can be applied to the automatic classification of sleep states2 and of the depth of anaesthesia3 as well as to the classification of a variety of neurological disorders. Spectral analysis of EMG signals serves also in the clinic and in research work. It has been shown, for example, that muscle fatigue4 can be characterized and, to some extent, predicted by processing the EMG spectrum. Power spectral analysis of speech signals has been used to assist in the diagnosis of laryngeal disorders.5 The analysis of hand tremors,
pressure and flow waveforms are but a few examples of the application of PSD processing
in biomedical signals.
The exact PSD function cannot, in general, be calculated. The given signal is time limited,
nonstationary, and corrupted by noise. It is necessary, therefore, to estimate the PSD from the given, short data record. Earlier methods of PSD estimation were based on the estimation of the Fourier transform. An important step in modern spectral analysis was Wiener's work6 which established the theoretical framework for the treatment of stochastic processes. Wiener and, independently, Khinchin7 have shown the Fourier transform relationship between the
autocorrelation function (of a stationary process) and its PSD. This is often referred to as
the Wiener-Khinchin relationship.
Prior to the introduction of the FFT algorithm in 1965, the accepted method for estimating the PSD was based on the implementation of the Wiener-Khinchin relationship, suggested by Blackman and Tukey.8 According to this method, the discrete autocorrelation coefficients
are first estimated using the sequence of windowed data. The windowed correlation is then
Fourier transformed to provide the estimated PSD. The Blackman-Tukey estimation pro
cedure suffers from poor resolution, high cost of computation, and poor accuracy mainly
due to sidelobes of the window which may even produce negative estimates for the positive
PSD.
Since the introduction of the FFT algorithm, a lot of progress has been made in spectral
estimation. Methods based on the FFT and on time series analysis have been extensively
researched and reported in books9-14 and articles.15-25 Attempts to provide a uniform approach to the various estimation methods or to categorize them have been reported.22,23,31
This chapter describes several methods of PSD estimation. The Blackman-Tukey procedure is briefly discussed, followed by the more modern approach of the periodograms. Time series analysis techniques based on AR (and the maximum entropy method), MA, and ARMA models are presented in more detail. More specialized cases of the Pisarenko harmonic decomposition (PHD) and Prony's methods are discussed, as well as the maximum likelihood method (MLM).
Finally, a comparison of the various techniques is made with the goal of providing some
help in choosing the right method for a given application.
II. FOURIER TRANSFORM METHODS

A. Introduction
We shall consider two approaches for PSD estimation based on the Fourier transform.
The first is the direct method suggested by Tukey and Blackman8 which uses the discrete
form o f the Wiener-Khinchin relation. This method requires first the estimation of correlation
coefficients and second the application of DFT (to the correlation sequence) to get the desired
PSD estimation. The second approach is the indirect approach known as the periodogram.
Here the estimation is achieved by applying the DFT operator directly to the (windowed)
data and then smoothing or averaging the absolute values of the DFT.
In general, these two methods do not yield identical results. However, if a certain biased
estimator is used for the correlation estimation, and as many correlation coefficients as data
samples are used, then the two methods do yield identical results.26
One major problem with Fourier transform PSD estimation methods is due to the finite time data sequence used. The PSD estimated is not that of the process, but that of a sample function multiplied by a window. In the frequency domain, this yields the PSD of the process (the desired function) convolved with the Fourier transform of the window. When the power of the signal is concentrated in a narrow bandwidth, the convolution operation will "spread" the power into adjacent frequency regions, a phenomenon known as leakage. The leakage causes both lack of resolution and inaccuracies. The power due to weak sinusoidal components in the signal can be completely masked by the sidelobes of adjacent stronger sinusoids. For a rectangular window, for example, the resolution (the 3 dB main lobe width) is approximately15 the inverse of the observation time (NΔt). Better weighting windows are available27,28 (see also Appendix B, Volume II).
One way to reduce the finite observation time problem is to use extrapolation techniques
to estimate the data outside the observation window.85 Another problem arises when the
number of data samples is small. A common procedure, known as zero padding, is often used. Zeroes are added to the given data sequence (the autocorrelation in the case of the Blackman-Tukey algorithm or the actual signal in the case of the periodogram) before applying the DFT. The result is the estimation of the PSD with additional interpolated values within the given frequency range. The basic resolution of the estimation is not improved by the zero-padding technique; the results are, however, much smoother and in some cases show fewer ambiguities in spectral peak determination.
Consider a data sequence {x_k}, k = 0, 1, ..., P - 1. Its DFT is given by

X_m = Δt Σ_{k=0}^{P-1} x_k exp(-j2π m k / P);  m = 0, 1, ..., P - 1    (8.1)

where Δt is the sampling interval. The estimation range in the frequency domain is

0 ≤ f_m = m/(PΔt) < 1/Δt    (8.2)
If we now pad the data with L zeroes, we get a "new" data sequence {x̃_k}, k = 0, 1, ..., P - 1 + L, with

x̃_k = x_k;  k = 0, 1, ..., P - 1
x̃_k = 0;  k = P, P + 1, ..., P - 1 + L    (8.3)

Its DFT is

X̃_m = Δt Σ_{k=0}^{P-1+L} x̃_k exp(-j2π m k / (P + L));  m = 0, 1, ..., P - 1 + L    (8.4)
The DFT (Equation 8.4) of the padded sequence has the same frequency region (Equation 8.2) as that of the nonpadded signal (Equation 8.1), but with P + L spectral points rather than P spectral points. If we choose L = qP, where q is an integer, then Equations 8.1 and 8.4 are equal at the frequency samples m̃ = (1 + q)m. An efficient FFT algorithm for zero-padded sequences is available.
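The effect of zero padding is easy to demonstrate with an FFT routine. The short Python fragment below is a minimal illustration (the sampling interval and test tone are assumed, and it is not tied to any particular figure in the text): the padded spectrum is evaluated on a denser grid over the same frequency range, while the underlying resolution is unchanged.

```python
import numpy as np

dt = 0.01                                    # sampling interval (s), assumed
t = np.arange(128) * dt
x = np.sin(2 * np.pi * 12.3 * t)             # tone between two DFT bin centers

X_raw = dt * np.fft.rfft(x)                        # P points, frequency step 1/(P*dt)
X_pad = dt * np.fft.rfft(x, n=4 * len(x))          # same range, 4x denser grid

f_raw = np.fft.rfftfreq(len(x), dt)
f_pad = np.fft.rfftfreq(4 * len(x), dt)
# The padded spectrum interpolates the same underlying estimate;
# the basic resolution (about 1/(P*dt)) is unchanged.
print(f_raw[np.argmax(np.abs(X_raw))], f_pad[np.argmax(np.abs(X_pad))])
```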
Ŝ(w) = Δt Σ_{m=-M}^{M} r̂_x(m) exp(-jwmΔt)    (8.6)

where Δt is the sampling interval and r̂_x(m), m = -M, ..., M are the discrete estimates of the correlation function.
The correlation coefficients are estimated from the sampled data sequence {x_k}, k = 0, 1, ..., N - 1. Rather than using the unbiased estimator

r̂_x(m) = [1/(N - m)] Σ_{n=0}^{N-m-1} x_{n+m} x_n    (8.7)

the biased estimator

r̂_x(m) = (1/N) Σ_{n=0}^{N-m-1} x_{n+m} x_n    (8.8)

is used, whose expected value is the true autocorrelation function weighted by a triangular weighting window (Bartlett window). Estimator 8.8 tends to have a lower mean square error than the unbiased one for many finite data sequences. Note that this type of estimator has been used for the estimation of the AR coefficients (see Equation 7.27).
The Blackman-Tukey8 PSD estimation is given by using the estimates of Equation 8.8 in Equation 8.6. Equation 8.6 can be solved by means of the FFT. The sequence r̂_x(m) used in Equation 8.6 can be zero-padded as discussed in the previous section.
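A compact Python sketch of the Blackman-Tukey estimator (Equations 8.6 and 8.8) follows. The number of lags M, the lag window, and the three-tone test signal are illustrative assumptions and do not reproduce the figures in the text.

```python
import numpy as np

def blackman_tukey_psd(x, M, dt=1.0, nf=512):
    """PSD estimate from the first M biased autocorrelation lags (Eqs. 8.6, 8.8)."""
    N = len(x)
    r = np.array([np.dot(x[:N - m], x[m:]) / N for m in range(M + 1)])   # Eq. 8.8
    r = r * np.hamming(2 * M + 1)[M:]            # optional lag-window taper
    f = np.linspace(0.0, 0.5 / dt, nf)
    w = 2 * np.pi * f
    m = np.arange(1, M + 1)
    # Eq. 8.6 for a real signal: S(w) = dt * [r(0) + 2 * sum_m r(m) cos(w m dt)]
    S = dt * (r[0] + 2.0 * np.cos(np.outer(w * dt, m)) @ r[1:])
    return f, S

# Example: three sinusoids in white noise (a synthetic stand-in, cf. Figure 1).
rng = np.random.default_rng(6)
dt = 1.0 / 200.0
t = np.arange(2048) * dt
x = (np.sin(2*np.pi*20*t) + 0.5*np.sin(2*np.pi*45*t)
     + 0.25*np.sin(2*np.pi*60*t) + rng.standard_normal(t.size))
f, S = blackman_tukey_psd(x, M=128, dt=dt)
print(f[np.argmax(S)])                           # should be near the strongest tone
```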
Figure 1 shows the Blackman-Tukey estimated PSD of synthesized sine waves corrupted with PRBN. Three sine waves were used and the estimation was performed with various numbers of data points, observation times, and zero padding. Figure 2 shows the estimation of an EMG signal by the Blackman-Tukey method.

FIGURE 1. Power spectral density function estimation by the Blackman-Tukey method. Synthesized signal consisting of three sinusoidals with additive white noise. Upper trace: 32 correlation coefficients and 480 padding zeroes. Middle trace: 256 correlation coefficients and 416 padding zeroes. Lower trace: 256 correlation coefficients and 256 padding zeroes.
C. The Periodogram

1. Introduction
The periodogram is a method for PSD estimation using the data sequence without the
need to first estimate the correlation coefficient.
It can be shown9 that the Wiener-Khinchin relation (Equation 8.5) can be rewritten as

S(w) = lim_{T→∞} E{ (1/2T) | ∫_{-T}^{T} x(t) exp(-jwt) dt |² }    (8.10)

Implementation of Equation 8.10 is impractical since it requires both infinite time integration and statistical expectation. The periodogram is thus an estimator of Equation 8.10 which can be practically implemented.
FIGURE 2. Power spectral density function estimation by the Blackman-Tukey method. Surface EMG recorded over the respiratory diaphragmatic muscle, sampled at 400 Hz. Traces are the same as in Figure 1.

Consider the wide sense stationary process x(t) with the windowed data sequence {x_k} such that

x_k = w(k) x(kΔt);  k = 0, 1, ..., N - 1
x_k = 0;  otherwise    (8.11)

where w(k) is the window function and N is the number of samples in the sequence. Use the following estimator for the autocorrelation:

r̂_x(m) = (1/N) Σ_{k=-∞}^{∞} x_k x_{k+m}    (8.12)

Note that since we have defined an infinite sequence in Equation 8.11, we can talk about infinite dimensions for the correlation function estimates. Introducing the estimates of Equation 8.12 into the PSD estimation Equation 8.6 yields
Ŝ(w) = [1/(NΔt)] X(w) X*(w) = |X(w)|² / (NΔt)    (8.15)

where X(w) is the DFT of x_k given by Equation 8.1 and an asterisk (*) denotes the conjugate. Note that |X(w)|² is the energy distribution function. The division by Δt was required in order to get the PSD.
The major advantage of the periodogram method is the fact that one can use the efficient FFT algorithm to compute the DFT. When calculating the square absolute value of the DFT of the sequence {x_k} by means of the FFT we get

X^F_m = Σ_{k=0}^{N-1} x_k exp(-j2π m k / N)    (8.16)

where X^F_m is the FFT result, which is scaled differently than the quantity calculated from Equation 8.1. When using the FFT, a scaling factor must be used such that

Ŝ(w_m) = (Δt / N) |X^F_m|²    (8.17)
The periodogram (Equation 8.17), though an efficient estimator from the calculations point of view, has been shown to have a large variance. Examples of periodograms of a white noise process, in which the variance does not decrease even when longer sequences are used, have been demonstrated.26 The large variance comes as no surprise since in the estimator of Equation 8.15 the expectation operator present in Equation 8.10 has been ignored.
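The FFT scaling of Equation 8.17 and the non-decreasing variance are easy to reproduce numerically. The Python fragment below is a minimal illustration with an assumed sampling interval and white-noise test signal.

```python
import numpy as np

def periodogram(x, dt=1.0):
    """Periodogram via the FFT with the scaling of Equation 8.17."""
    N = len(x)
    Xf = np.fft.rfft(x)                       # unscaled FFT, Equation 8.16
    S = (dt / N) * np.abs(Xf) ** 2            # Equation 8.17
    f = np.fft.rfftfreq(N, dt)
    return f, S

# White-noise check: the mean level approximates the true PSD (variance * dt),
# but the spread of the estimate does not shrink as N grows.
rng = np.random.default_rng(7)
for N in (256, 4096):
    f, S = periodogram(rng.standard_normal(N), dt=0.005)
    print(N, S.mean(), S.std())
```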
The expectation and variance of the periodogram will be discussed followed by several
methods of smoothing and averaging.
The expected value of the correlation depends on the correlation estimation, namely, the window used in Equation 8.11. The expectation of the periodogram is

E{Ŝ(w)} = Δt Σ_{m=-∞}^{∞} E{r̂_x(m)} exp(-jwmΔt)    (8.18)

and the expectation of the correlation estimate is

E{r̂_x(m)} = r_x(m) r_w(m)    (8.19)

where r_x(m) are the true process correlation coefficients and r_w(m) are the correlation coefficients of the window. Introducing Equation 8.19 into Equation 8.18 we get

E{Ŝ(w)} = Δt Σ_{m=-∞}^{∞} r_x(m) r_w(m) exp(-jwmΔt)    (8.20)

The periodogram is thus a biased estimator, with bias resulting from the window r_w(m). In the windows used, the bias vanishes as N approaches infinity.
When a rectangular window is used,

w(k) = 1;  0 ≤ k ≤ N - 1
w(k) = 0;  otherwise    (8.21)

the correlation window is

r_w(m) = 1 - |m|/N;  |m| ≤ N
r_w(m) = 0;  otherwise    (8.22)
Note that Equation 8.20 gives the expected value of the PSD estimator in terms of the DFT of the product of the two sequences r_x(m) and r_w(m). Applying the complex convolution theorem we get

E{Ŝ(w)} = (Δt / 2π) ∫_{-π/Δt}^{π/Δt} S(η) R_w(w - η) dη    (8.24)
where S(w) is the true PSD and R_w(w) is the DFT of r_w(m). Equation 8.24 states that the expected value of the periodogram is the true PSD "viewed" through the filter R_w(w).
Consider, for example, the use of a rectangular data window (Equation 8.21). The correlation window is then given by Equation 8.22, and R_w(w) is

R_w(w) = (1/N) [ sin(wNΔt/2) / sin(wΔt/2) ]²    (8.25)
Figure 3 shows the data window, the correlation (lag) window in the time and in the frequency domains, and a schematic description of the frequency convolution (Equation 8.24). Here the process under investigation, whose true PSD is S(w), was chosen as a random process with a strong spectral peak at w = w_1. Figure 3D depicts the layout for the estimation of S(w_0) by Equation 8.24. The expected value of the estimation at this frequency is given by the area under the product of the two curves. In this particular example, the estimation of the PSD at w = w_0 may be largely due to the effect of the first sidelobe of R_w, which coincides with the peak of S(w). Hence, the leakage due to the peak and first sidelobe will cause Ŝ(w_0) to be large even though S(w_0) is very small.
FIGURE 3. Power spectral density estimation with a rectangular data window. (A) Data window; (B) correlation (lag) window in time domain; (C) correlation window in frequency domain; (D) frequency convolution.
The main lobe of this window has a width of 4π/(NΔt). As N increases, the window
becomes narrower reducing the leakage. At the limit, as N approaches infinity, the window
becomes a delta function having no leakage at all. A detailed discussion on the various
windows used for signal analysis is presented in the appendix.
FIGURE 4. Power spectral density function estimation by means of the periodogram. Synthesized noisy sinusoidals as in Figure 1. Upper trace: 512 samples and 512 padding zeroes. Lower trace: 1024 samples, no padding zeroes.
If we define the standard deviation of the periodogram as the measure for noise, then the
signal-to-noise ratio for the periodogram of a gaussian process with an infinitely large data
window is only 1. Nongaussian processes yield approximately the same results.
The main reason for the large variance of the periodogram is the fact that the estimator
(Equation 8.15) does not take into account averaging as indicated by the expectation operation
in Equation 8.10.
A detailed discussion on the variance of the periodogram and on confidence limits of the
PSD estimation is given by Koopmans.9
An estimation o f the PSD with periodograms is demonstrated in Figures 4 and 5. The
same signals used for the Blackman-Tukey PSD estimation (Figures 1 and 2) are used here.
In the weighted overlapped segment averaging (WOSA) method, the data sequence {x_k}, k = 0, 1, ..., N - 1, is divided into I overlapping segments of length L each,

x_k^{(i)} = x_{k+(i-1)D};  k = 0, 1, ..., L - 1;  i = 1, 2, ..., I    (8.27)
FIGURE 5. Power spectral density function estimation by means of the periodogram. Surface
EMG as in Figure 2. Traces as in Figure 4.
FIGURE 6. EMG signal segmented with 25% overlapping for WOSA spectral estimation. Upper trace: EMG. Lower trace: Hamming windowed overlapping segments.
Each two adjacent segments overlap with D samples. The I segments cover the given data sequence {x_k} such that the last sample of the Ith segment obeys L + (I - 1)D = N. An EMG signal, so segmented, is shown in Figure 6.
We shall now calculate the periodograms of the I overlapped segments, each multiplied by a data window w(k). Denote the normalized periodogram of the ith segment by Ŝ^i(w); hence

Ŝ^i(w) = | Δt Σ_{k=0}^{L-1} w(k) x_k^{(i)} exp(-jwkΔt) |² / ( Δt Σ_{k=0}^{L-1} w²(k) );  i = 1, 2, ..., I    (8.28)
The segment periodograms are averaged to give the WOSA spectral estimate,

Ŝ(w) = (1/I) Σ_{i=1}^{I} Ŝ^i(w)    (8.29)

The expectation of the estimator is similar in nature to the one calculated for the periodogram (Equation 8.24). The variance, however, is improved.
Define the normalized covariance, ρ(j), between two normalized periodograms (Equation 8.31); the variance of the averaged estimator can then be expressed in terms of ρ(j) (Equation 8.32). Note that for D > L, such that the covariance (Equation 8.31) approaches zero, we get ρ(i) = 0 for all i > 0. Hence the improvement in the variance (over a periodogram with L samples) is I times. Nonoverlapping segmentation, therefore, should be employed if N is large enough. If the total number of samples, N, is not large, it is recommended30 to overlap the segments by one half of their length (D = L/2).
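The averaging described above is essentially what library routines such as scipy.signal.welch implement; the hand-rolled Python sketch below makes the segment bookkeeping explicit (segment length L, shift D, window-power normalization, and averaging as in Equations 8.28 and 8.29). The normalization convention and the test signal are assumptions for illustration.

```python
import numpy as np

def wosa_psd(x, L=256, D=128, dt=1.0):
    """Windowed overlapped segment averaging (WOSA/Welch) PSD estimate."""
    w = np.hamming(L)
    U = np.sum(w ** 2)                             # window power (normalization)
    segments = []
    start = 0
    while start + L <= len(x):
        seg = x[start:start + L] * w
        segments.append((dt / U) * np.abs(np.fft.rfft(seg)) ** 2)  # cf. Eq. 8.28
        start += D
    S = np.mean(segments, axis=0)                  # Eq. 8.29: average of periodograms
    f = np.fft.rfftfreq(L, dt)
    return f, S, len(segments)

# Example: more (weakly correlated) segments reduce the estimate's spread,
# at the cost of the coarser resolution set by the segment length L.
rng = np.random.default_rng(8)
x = rng.standard_normal(8192)
for D in (256, 128):                               # no overlap vs. 50% overlap
    f, S, I = wosa_psd(x, L=256, D=D, dt=0.01)
    print(D, I, S.std())
```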
Several attempts have been made recently22,23,31 to provide a general framework for PSD
estimation. It has been argued that both the Blackman-Tukey and the WOSA estimators are
special cases o f a general estimator. Figures 7 and 8 show an example of the PSD estimation
by means o f WOSA. The reader is referred to Figures 1 ,2 ,4 , and 5 for comparison.
An alternative way of reducing the variance is to smooth the periodogram with a spectral window H; for example, the rectangular smoothing window

H(w - η) = 1/(2B);  |w - η| < B
H(w - η) = 0;  otherwise    (8.33)
FIGURE 7A. Power spectral density function estimation by means of WOSA with 25% overlapping. Synthesized sinusoidal signal as in Figure 1. Averaging over one segment of 1024 samples.
The smoothed estimate at frequency w_j is then

⟨Ŝ(w_j)⟩ = (1/2B) ∫_{w_j-B}^{w_j+B} Ŝ(η) dη    (8.34)

FIGURE 8A. Power spectral density function estimation by means of WOSA with 25% overlapping. EMG signal as in Figure 2. Averaging over one segment of 1024 samples.
E{⟨Ŝ(w)⟩} = ∫_{-π/Δt}^{π/Δt} E{Ŝ(η)} H(w - η) dη    (8.37)
where the expectation term in the integrand is given by Equation 8.20. For very large N we get

E{⟨Ŝ(w)⟩} ≈ ∫_{-π/Δt}^{π/Δt} S(η) H(w - η) dη    (8.38)
Hence, the smoothing operation has introduced a bias, even for large N.
A triangular smoothing window has often been used. This window results in less bias than the rectangular one, since it gives less weight to remote periodogram samples. However, the variance obtained with the triangular window is larger than that achieved with the rectangular one.
III. MAXIMUM ENTROPY METHOD (MEM) AND THE AR METHOD

The MEM is also known as the Maximum Entropy Spectral Estimation (MESE)15 and the Maximum Entropy Spectrum Analysis (MESA).33 The method was originally proposed by Burg34 for geophysical applications and has since been dealt with by many researchers. Several special problems have been investigated.38-40 More recently, attention has been placed on the application of the method to multidimensional PSD estimation.41-43 Modifications and extensions to the original method have also been suggested.44,45
In the one-dimensional case, when consecutive correlation values are available, the MEM method is identical to the AR PSD estimation method.46 These two methods will be discussed jointly in this section.
The MEM PSD estimation can be posed as follows: given p + 1 consecutive estimates of the correlation coefficients of the process {x(t)}, r_x(i), i = 0, 1, ..., p, estimate the PSD of the process. Clearly, what is needed for the estimation are the unknown correlation coefficients r_x(i), i > p. Burg34 has suggested that these autocorrelation coefficients be extrapolated so that the time series characterized by the correlation has maximum entropy. Out of all time series having the given first p + 1 autocorrelation coefficients, the time series that yields maximum entropy will be the most random one or, in other words, the estimated PSD will be the flattest among all PSDs having the given p + 1 correlation coefficients.
The entropy is a measure of the amount of information (or "ignorance") we have on a process. Consider a discrete process with J states, each having the probability of occurrence p_j, j = 1, 2, ..., J. Assume that initially no knowledge is available on the probabilities p_j. The information we possess on the system is limited. If we are now given the value of a certain p_i, we have gained a certain amount of information on the system, or our state of ignorance has been reduced. The amount of information added, ΔI, is defined (in "bits") by

ΔI = log₂(1/p_i)    (8.39)

The average information per time interval, H, is known as the entropy and is given from Equation 8.39 by

H = Σ_{j=1}^{J} p_j log₂(1/p_j)    (8.40)

In the deterministic case, where we know that event i has happened with probability one,
all other p_j's are zero and the entropy is zero. In the random case, the entropy is always positive. The entropy is thus a measure of the randomness of the system.
In the continuous case the relative entropy is given by

H = - ∫ p(s) ln p(s) ds    (8.41)

where p(s) is the amplitude distribution of the signal s(t). We would like to write down a relation between the estimated spectrum of the signal, S(w), and the entropy. Assume the signal s(t) was generated by passing a white signal through a linear filter having a transfer function S(w). It can be shown47 that the difference in entropies, ΔH, between that of the signal s(t) and that of the input is given by Equation 8.42.
Since there are an infinite number of signals with white spectrum, the exact input is unknown. However, we know that we want to maximize the entropy (Equation 8.42) subject to the constraints that

r(n) = ∫_{-1/(2Δt)}^{1/(2Δt)} S(w) exp(j2πwΔt n) dw;  n = 0, 1, ..., p    (8.43)

This constrained maximization will ensure that the estimated spectrum is the spectrum of the process having the flattest spectrum of all processes with the given p + 1 correlation coefficients.
The maximization of Equation 8.42 with the constraints (Equation 8.43) can be solved using the Lagrange multiplier technique. The result is found to be

Ŝ(w) = σ²Δt / |1 + Σ_{i=1}^{p} a_i exp(-jwiΔt)|²    (8.44)

where σ² and a_i, i = 1, ..., p are determined from the data. It has been shown48,49 that the MEM PSD estimation (Equation 8.44) is identical with the estimation of the PSD of an AR model.
Recall that the AR model (Equation 7.13) represents a random sampled signal x_k as the output of a filter H(z) = A^{-1}(z) with a white sequence as the input. The estimated PSD of x_k is thus

Ŝ(w) = Ĝ²Δt / |1 + Σ_{i=1}^{p} â_i exp(-jwiΔt)|²    (8.45)

where Ĝ² equals the mean square error, E_p, and is given in terms of the correlation coefficients by Equation 7.35. The estimated AR coefficients â_i are given by Equation 7.30 and the order, p, by one of the methods discussed in Chapter 7. Equation 8.45 can be rewritten as
Ŝ(w) = [ r(0) + Σ_{i=1}^{p} â_i r(i) ] / [ ρ(0) + 2 Σ_{i=1}^{p} ρ(i) cos(iwΔt) ]    (8.46)

with

ρ(i) = Σ_{k=0}^{p-i} â_k â_{k+i},  â_0 = 1;  i = 0, 1, ..., p    (8.47)
Note that in the Blackman-Tukey PSD estimator (Equation 8.6) the power spectrum is estimated by means of the first p + 1 correlation coefficients, calculated from the data, while the correlation coefficients r(i), i > p are assumed to be zero. In the MEM and AR PSD estimation, the first p + 1 correlation coefficients are calculated from the data and are identical to the ones used in the Blackman-Tukey algorithm. The coefficients r(i), i > p are not assumed zero but are calculated by the maximization of Equation 8.42 with the constraint (Equation 8.43). This yields the same result as the AR model, where

r(i) = - Σ_{k=1}^{p} â_k r(i - k);  i > p    (8.48)
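Given the AR coefficients and gain from the Durbin recursion of Chapter 7, the MEM/AR spectrum of Equation 8.45 is straightforward to evaluate. The self-contained Python sketch below is an illustration; the model order, frequency grid, and two-tone test signal are assumptions.

```python
import numpy as np

def ar_psd(x, p, dt=1.0, nf=512):
    """AR (maximum entropy) PSD estimate, Equation 8.45."""
    N = len(x)
    r = np.array([np.dot(x[:N - m], x[m:]) / N for m in range(p + 1)])
    # Levinson-Durbin recursion (Equations 7.30A through E)
    a, E = np.array([1.0]), r[0]
    for i in range(1, p + 1):
        k = -(r[i] + np.dot(a[1:], r[i - 1:0:-1])) / E
        a = np.concatenate((a + k * np.r_[0.0, a[:0:-1]], [k]))
        E = (1 - k * k) * E
    G2 = E                                        # gain squared, Equation 7.35
    f = np.linspace(0.0, 0.5 / dt, nf)
    z = np.exp(-1j * 2 * np.pi * np.outer(f, np.arange(p + 1)) * dt)
    A = z @ a                                     # A(exp(-jw dt)) on the grid
    return f, G2 * dt / np.abs(A) ** 2            # Equation 8.45

# Example: two close sinusoids in noise; a modest AR order resolves the peaks.
rng = np.random.default_rng(9)
dt = 1.0 / 100.0
t = np.arange(512) * dt
x = np.sin(2*np.pi*20*t) + np.sin(2*np.pi*23*t) + 0.5 * rng.standard_normal(t.size)
f, S = ar_psd(x, p=20, dt=dt)
print(f[np.argmax(S)])
```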
The statistical properties of the MEM and AR spectral estimators have been investigated
by several authors.50-53 It has been demonstrated that the variance of the estimators is inversely
proportional to the data length.
The MEM estimator has been experimentally compared54-55 with the WOSA estimator
(Equation 8.30). The MEM estimator has been found to have superior frequency resolution
capability, especially for short records and to have larger dynamic range. Hung and Herring,55
however, report that when considering the detection of sinusoid in additive white noise, the
DFT based detector consistently provided higher signal detection probability and a more
accurate estimate of signal frequency than the MEM. Dyson and Rao33 have concluded that
the MEM methods show promise of achieving the detection performance of long observation
interval DFT analysis, at a reduced observation time, for useful SNR range.
The MEM PSD estimator is known to yield incorrect results for sinusoidal signals in additive white noise, a phenomenon sometimes called line splitting.54,56 Line splitting is the occurrence of two or more closely spaced peaks in the estimated PSD where only one should be present. Line splitting is most likely to occur when the SNR is high, the initial phase is some odd multiple of 45°, the time duration is such that the sine components have an odd number of quarter cycles, and the number of AR coefficients (the order of the model) is a large percentage of the number of data samples.
The correct model (with white noise as an input) for describing an N-pole complex sinusoidal signal in additive white noise is an N-pole and N-zero system with equal gain weights for its pole and zero parts. When an AR model is forced to describe such a signal, an infinite number of poles is required. The use of a finite number of poles is the source of the line splitting and line shifting inaccuracies. Several methods have been suggested to overcome the line splitting problem.56-58
A second problem associated with the MEM method is that of the bias in the positioning of the spectral peaks with respect to the true frequency of the peaks. This shift is sometimes known as the frequency estimation bias. Swingler60 has shown that this bias can be of the order of 16% of the frequency resolution (1/NΔt). Methods for overcoming this problem were also suggested.58
FIGURE 9. Power spectral density function estimation by the MEM method. Synthesized sinusoidals as in Figure 1. Upper trace: AR model order is 10. Lower trace: AR model order is 40.
Yet another problem exists when estimating the PSD of sinusoids in noise with MEM. It has been shown59 that the peak amplitudes in the MEM are not linearly proportional to
the power. In high SNR the peak is proportional to the square of the power.
Recently, many modifications and improvements for the MEM have been suggested for
AR and multivariate AR spectral estimation (see, for example, References 61 through 63).
Experimental results with MEM PSD estimation are shown in Figures 9 and 10.
The process whose PSD is to be estimated can be modeled by an MA(q) model (Equation 7.15). Its estimated PSD is given by

Ŝ(w) = |B(exp(-jwΔt))|² S_n(w)    (8.49)

where S_n(w) is the PSD of the input white noise. The coefficients b_j, j = 1, 2, ..., q of the MA model can be estimated as discussed in Chapter 7, Section III. In terms of these coefficients, the estimator becomes

Ŝ(w) = | Σ_{i=0}^{q} b_i exp(-jwiΔt) |² = Σ_{n=0}^{q} Σ_{m=0}^{q} b_m b_n exp(-j(m - n)wΔt)    (8.50)
FIGURE 10. Power spectral density function estimation by the MEM method. EMG signal as in
Figure 2. Traces are as in Figure 9.
with

Ŝ(w) = Σ_{n=-q}^{q} r(n) exp(-jwnΔt),  r(n) = Σ_{k=0}^{q} b_k b_{k-n};  -q ≤ n ≤ q    (8.51)

It can easily be shown that the quantities r(n) in Equation 8.51 are the autocorrelation coefficients of the MA model.
Hence, the MA spectral estimator (Equation 8.51) is the same as the Blackman-Tukey
estimator (Equation 8.6). The Blackman-Tukey spectral estimator can thus be considered a
special case of the MA spectral estimator. The MA estimator will be effective when the
spectra to be estimated contain sharply defined notches and do not contain sharply defined
peaks.
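As a rough illustration of Equations 8.50 and 8.51 (a sketch of mine, not code from the text), the following Python fragment evaluates the MA spectral estimate from an already estimated coefficient vector b = [b_0, ..., b_q]; the frequency grid and the omission of the noise-variance scaling follow the form of Equation 8.50 and are illustrative choices.

import numpy as np

def ma_psd(b, dt, n_freq=512):
    # MA(q) spectral estimate via Equations 8.50/8.51: form the "autocorrelation"
    # r(n) of the MA coefficients, then sum the exponentials (Blackman-Tukey form).
    b = np.asarray(b, dtype=float)          # [b0, b1, ..., bq], b0 = 1
    q = len(b) - 1
    r = np.array([np.sum(b[n:] * b[:len(b) - n]) for n in range(q + 1)])
    w = np.linspace(0.0, np.pi / dt, n_freq)          # 0 ... Nyquist (rad/s)
    n = np.arange(-q, q + 1)
    r_full = np.concatenate((r[:0:-1], r))            # r(-q) ... r(q), r(-n) = r(n)
    S = np.real(np.exp(-1j * np.outer(w, n) * dt) @ r_full)
    return w, S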
Several methods have been suggested for ARMA(p,q) spectral estimation (e.g., References 64 to 70). The
statistical properties of the ARMA(p,q) estimator have been investigated71-73 and some
preprocessing techniques for the improvement of the estimation have been suggested.74 Cadzow,
in a comprehensive paper,75 has presented various methods for AR, MA, and ARMA power
spectral estimation.
Once the order p,q and the coefficients of the ARMA model have been estimated, the PSD
estimate is obtained by

S(w) = |H(exp(−jwΔt))|² S_n(w) = σ_n²Δt |1 + Σ_{i=1}^{q} b_i exp(−jwiΔt)|² / |1 + Σ_{i=1}^{p} a_i exp(−jwiΔt)|²   (8.52)

where H is given in Equation 7.44 and S_n(w) is the PSD function of the input white
noise having a variance σ_n². The order of the ARMA model, p,q, and its estimation are
discussed in Chapter 7, Section IV.
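A minimal sketch of how Equation 8.52 can be evaluated numerically, assuming the AR coefficients a, the MA coefficients b, and the noise variance have already been estimated (the function name and the frequency grid are my own; nothing here is prescribed by the text):

import numpy as np

def arma_psd(a, b, sigma2, dt, n_freq=512):
    # ARMA(p,q) spectral estimate, Equation 8.52.
    # a, b : AR coefficients [a1..ap] and MA coefficients [b1..bq]
    w = np.linspace(0.0, np.pi / dt, n_freq)
    def poly(c):
        k = np.arange(1, len(c) + 1)
        return 1.0 + np.exp(-1j * np.outer(w, k) * dt) @ np.asarray(c, float)
    num = np.abs(poly(b)) ** 2
    den = np.abs(poly(a)) ** 2
    return w, sigma2 * dt * num / den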
Cadzow75 has shown that when the ARMA coefficients are evaluated by the overdetermined
rational model equation approach, the resultant PSD estimate is less sensitive to errors in the
coefficient estimates. Cadzow's approach calls for the determination of the coefficients
from an overdetermined set of the Yule-Walker equations (see Chapter 7, Section II), for
example by means of the singular value decomposition (SVD) technique (for SVD analysis,
see Chapter 3, Volume II).
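The overdetermined idea can be illustrated roughly as follows. This is not Cadzow's algorithm, only a sketch of its central step for the AR part: more Yule-Walker equations than unknowns are formed from estimated autocorrelation lags and solved in the least squares sense (NumPy's solver uses the SVD internally).

import numpy as np

def overdetermined_yule_walker(r, p, M):
    # Estimate AR(p) coefficients from the lags r(0)..r(M), using M > p equations
    # r(k) = -sum_m a_m r(k-m), k = 1..M, solved in the least squares sense.
    rows, rhs = [], []
    for k in range(1, M + 1):
        rows.append([r[abs(k - m)] for m in range(1, p + 1)])
        rhs.append(-r[k])
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return a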
Assume the signal consists of p/2 sinusoids in additive white noise:

x_n = Σ_{i=1}^{p/2} A_i sin(w_i nΔt + φ_i) + n_n   (8.53)

where A_i and φ_i, i = 1, ..., p/2, are the amplitudes and phases of the sinusoids and {n_n} is
a sequence from the white noise process having zero mean and a variance of σ_n². The noise
is uncorrelated with the sinusoids.
Using the trigonometric identity sin(wnΔt) = 2cos(wΔt) sin(w(n − 1)Δt) − sin(w(n − 2)Δt),
a noise-free sinusoid obeys x_n = 2cos(wΔt)x_{n−1} − x_{n−2}. Hence, the samples of a deterministic sinusoid can be described by means of a second order
AR(2) equation.
In general, for the deterministic summation of p/2 sinusoids,15 the resultant difference
equation is an AR(p) equation

x_n = −Σ_{m=1}^{p} a_m x_{n−m}   (8.56)

Transferring Equation 8.56 into the Z domain yields the characteristic equation

1 + Σ_{m=1}^{p} a_m z^{−m} = 0   (8.57)

There are p roots of the characteristic Equation 8.57, arranged in conjugate pairs
z_k = exp(±jw_kΔt), k = 1, 2, ..., p/2. The roots are all located on the unit circle, where the frequencies w_k are
the frequencies of the sinusoids present in the signal.
Returning now to the noisy case (Equation 8.53), we denote the noise-free signal by x̄_n, so that
x_n = x̄_n + n_n, and get

x_n − n_n = −Σ_{m=1}^{p} a_m (x_{n−m} − n_{n−m})   (8.58)

x_n = −Σ_{m=1}^{p} a_m x_{n−m} + Σ_{m=0}^{p} a_m n_{n−m},   a_0 = 1   (8.59)
Equation 8.59 states that the signal represented by Equation 8.53 is a special ARMA(p,p)
process. In this process, the AR(p) and MA(p) coefficients are identical. Due to this property,
the identification of this special ARMA(p,p) process is less complicated than in the general
case. Techniques simpler than the ones discussed in Chapter 7, Section IV, can be applied;
one method is presented here.
The ARMA(p,p) Equation 8.59 can be written in matrix form. Define

xᵀ = [x_n, x_{n−1}, ..., x_{n−p}],   nᵀ = [n_n, n_{n−1}, ..., n_{n−p}],   aᵀ = [1, a_1, ..., a_p]

so that Equation 8.59 becomes

xᵀa = nᵀa   (8.61)

Premultiplying both sides of Equation 8.61 by x and taking the expectation, we get

E{x xᵀ}a = R_x a = E{x nᵀ}a

noting that (because of the assumptions made on the noise) the cross correlation between
the noisy observation x and the noise n is

E{x nᵀ} = E{n nᵀ} = σ_n² I

where R_x is the (p + 1) × (p + 1) autocorrelation matrix of the observations, with elements
r_x(0), ..., r_x(p). Hence, we get

R_x a = σ_n² a   (8.65)

Equation 8.65 states that the coefficient vector a is an eigenvector of the correlation matrix
R_x and the noise variance is the corresponding eigenvalue. The eigenvector must be scaled
such that its first component equals one.
It can be shown15 that the eigenvalue σ_n² is the minimum eigenvalue of the correlation
matrix with the correct dimension (p + 1) × (p + 1). In the overdetermined case, where the
correlation matrix is generated by more than (p + 1) lags, the minimum eigenvalue is
repeated.
The autocorrelation of the signal x_n given by Equation 8.53 is

r_x(0) = Σ_{i=1}^{p/2} A_i²/2 + σ_n²   (8.66A)

r_x(k) = Σ_{i=1}^{p/2} (A_i²/2) cos(w_i kΔt),   k ≠ 0   (8.66B)

Assuming the frequencies w_i, i = 1, 2, ..., p/2, and the correlation coefficients r_x(k), k =
0, 1, ..., are known, the sinusoid powers A_i²/2 can be determined. Define the power vector
Pᵀ = [A_1²/2, ..., A_{p/2}²/2] (Equation 8.67), the correlation vector r_xᵀ = [r_x(1), ..., r_x(p/2)]
(Equation 8.68), and the matrix C whose (k,i) element is cos(w_i kΔt) (Equation 8.69). The power
coefficients are given by introducing Equations 8.67, 8.68, and 8.69 into Equation 8.66B:

P = C⁻¹ r_x   (8.70)
Pisarenko's method can thus be summarized by the following steps:

1. Estimate the order, p, of the model, which is twice the number of sinusoids present
in the signal;
2. Estimate, from the data, p + 1 terms of the autocorrelation function using the biased
estimator (Equation 7.27);
3. Solve the eigenvector equation (Equation 8.65);
4. Repeat steps 2 and 3 with increasing order until the minimal eigenvalue remains
unchanged;
5. The order, p, and the noise variance (the minimal eigenvalue) are thus determined. The vector a
is taken to be the eigenvector corresponding to the minimal eigenvalue;
6. Solve Equation 8.57 to get the roots and the frequencies, w_k;
7. Solve Equation 8.70 to get the power of the various sinusoids;
8. Solve Equation 8.66A to get the noise power.
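A compact sketch of steps 2 through 7 (illustrative only; the order p is assumed known, the biased autocorrelation estimator is used as in step 2, and practical issues such as the repeated minimum eigenvalue of the overdetermined case are ignored):

import numpy as np

def pisarenko(x, p, dt):
    N = len(x)
    # biased autocorrelation estimates r(0)..r(p)  (cf. Equation 7.27)
    r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(p + 1)])
    Rx = np.array([[r[abs(i - j)] for j in range(p + 1)] for i in range(p + 1)])
    # Equation 8.65: Rx a = sigma_n^2 a, with the minimum eigenvalue
    vals, vecs = np.linalg.eigh(Rx)
    sigma2 = vals[0]
    a = vecs[:, 0] / vecs[0, 0]                 # scale so that a[0] = 1
    # Equation 8.57: roots of 1 + a1 z^-1 + ... + ap z^-p = 0
    roots = np.roots(a)
    w = np.sort(np.abs(np.angle(roots))) / dt   # sinusoid frequencies (rad/s)
    w = w[::2]                                  # keep one of each conjugate pair (rough)
    # Equation 8.70: powers from r(k) = sum_i (A_i^2/2) cos(w_i k dt), k = 1..p/2
    k = np.arange(1, len(w) + 1)
    C = np.cos(np.outer(k, w * dt))
    P = np.linalg.solve(C, r[1:len(w) + 1])
    return w, P, sigma2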
An efficient method for solving the eigenvector equation is discussed in Chapter 3, Volume
II. When a priori knowledge exists stating that the signal consists of sinusoids in
additive noise, Pisarenko's method has the advantage of providing an estimate consisting of δ
functions. Other methods, such as the AR spectral estimator, will "smear" the spectrum.
The order, however, is not known exactly. If it is estimated too high, spurious
components may be introduced into the PSD estimate; if too low, the spectral components
will usually appear at incorrect frequencies. Another source of inaccuracy of the method
is the fact that the autocorrelation coefficients are estimated by means of the biased estimator.
This is done in order to ensure that the autocorrelation matrix is positive definite. The biased
estimation, however, causes inaccuracies in both frequency and power estimation.
The technique has been extended78 to include the case of colored additive noise.
C. Prony's Method
Prony's method15,79-82 is mainly applied to the analysis of transients. It has been extended,
however, to provide PSD function estimation.
Assume the sequence x̄_n consists of samples of a signal composed of damped (complex) sinusoids:

x̄_n = Σ_{m=1}^{p} A_m exp(α_m nΔt) exp(j(w_m nΔt + φ_m)),   n = 0, 1, ..., N − 1   (8.71)

Equation 8.71 describes the signal x̄_n as a sum of p sinusoids with frequencies w_m, phases
φ_m, and amplitudes A_m, m = 1, 2, ..., p, exponentially decaying with rates α_m (α_m < 0).
For x̄_n to be real, it is required that the roots of the characteristic equation occur in complex
conjugate pairs of the type exp(j(w_m nΔt + φ_m)) and exp(−j(w_m nΔt + φ_m)). The energy
spectral distribution function of Equation 8.71 is given by Equation 8.72.
To use Equation 8.72 as a "spectral" estimator, the parameters p, A_m, α_m, φ_m, and w_m, m
= 1, 2, ..., p, must be identified. In order to do that, rewrite Equation 8.71 as

x̄_n = Σ_{m=1}^{p} b_m z_m^n,   n = 0, 1, ..., N − 1

b_m = A_m exp(jφ_m)

z_m = exp((α_m + jw_m)Δt)   (8.73)
The last equation is the homogeneous solution of a constant coefficient linear difference
equation15

x̄_n = −Σ_{m=1}^{p} a_m x̄_{n−m}   (8.74)

Transferring Equation 8.74 into the Z domain yields

X̄(z)(1 + Σ_{m=1}^{p} a_m z^{−m}) = 0   (8.75)

with the characteristic equation

1 + Σ_{m=1}^{p} a_m z^{−m} = 0   (8.76)

The roots z_k given by the solution of Equation 8.76 are the exponents of Equation 8.71.
In the more practical case the signal is noisy, hence the observation x_n can be described
by the Prony model as

x_n = x̄_n + n_n,   n = 0, 1, ..., N − 1   (8.77)

where n_n is a sequence of white noise with zero mean and variance σ_n². Introducing Equation
8.77 into Equation 8.74 yields

x_n = −Σ_{m=1}^{p} a_m x_{n−m} + Σ_{m=0}^{p} a_m n_{n−m}   (8.78)

The signal is thus modeled as an ARMA(p,p) model with equal coefficients for the AR(p)
and MA(p) parts. Unlike the Pisarenko model, the roots of the characteristic equation of Equation 8.78
are not restricted to the unit circle. The model describes, in general, decaying sinusoids
rather than pure sinusoids.
The ARMA coefficients of Equation 8.78 can be solved for by the methods discussed in
Chapter 7, Section IV. Note that only the AR (or MA) part has to be identified.
Once the a_k have been identified, Equation 8.76 can be solved to provide the roots z_k.
The coefficients required for the estimator (Equation 8.71) are computed from the solution
of the set of linear equations (Equation 8.73). Define the model vector

x̄ᵀ = [x̄_0, x̄_1, ..., x̄_{N−1}]   (8.79)

with the observation vector x defined similarly. Define the complex coefficient vector

bᵀ = [b_1, b_2, ..., b_p]   (8.80)

and the N × p matrix

Z = [z_m^n],   n = 0, 1, ..., N − 1;  m = 1, 2, ..., p   (8.81)

The set of N equations (Equation 8.73) with the unknown vector b can be written in matrix form:

x̄ = Zb   (8.82)
Recall that x̄ is the model of the observations x. Our aim is to choose the model parameters
such that the model will best fit (in some sense) the observations. Assume we want a least
squares minimization of

e = (x − x̄)ᴴ(x − x̄)   (8.83)

Introducing Equation 8.82 into Equation 8.83 and performing the minimization yields the
well-known least squares estimate of b:

b̂ = (ZᴴZ)⁻¹Zᴴx

The parameters of Equation 8.71 are then obtained from

A_m = |b_m|
φ_m = tan⁻¹(Im(b_m)/Re(b_m))
α_m = (1/Δt) ln|z_m|
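Assuming the AR coefficients a_k of Equation 8.78 have already been estimated, the remaining steps (roots of Equation 8.76, the least squares solution of Equation 8.82, and the parameter recovery above) might be sketched as follows; this is an illustration of mine, not the author's implementation.

import numpy as np

def prony_parameters(x, a, dt):
    # x : observed samples x_0 ... x_{N-1}
    # a : estimated AR coefficients [a1, ..., ap] of Equation 8.78
    x = np.asarray(x, dtype=complex)
    N, p = len(x), len(a)
    z = np.roots(np.concatenate(([1.0], a)))             # roots of Equation 8.76
    Z = z[np.newaxis, :] ** np.arange(N)[:, np.newaxis]  # Equation 8.81 (N x p)
    b, *_ = np.linalg.lstsq(Z, x, rcond=None)            # least squares fit of x = Zb
    A     = np.abs(b)                                    # amplitudes
    phi   = np.arctan2(b.imag, b.real)                   # phases
    alpha = np.log(np.abs(z)) / dt                       # damping factors
    w     = np.angle(z) / dt                             # frequencies
    return A, phi, alpha, w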
In summary, Prony's method for energy spectral density estimation is given by the
following steps:
where {n_k} is the sequence of "noise" appearing due to the leakage. It is our desire to design
the MA(N) filter in such a way that the signal A exp(jwkΔt) will pass with no distortion,
while the second term in Equation 8.86 is minimized.
Following the arguments presented in Chapter 7, we "predict" the kth value of the output
of the MA filter (Equation 8.87), where R_n is the noise autocorrelation matrix and
bᵀ = [b_0, b_1, ..., b_{N−1}].
Note that we want the sinusoidal component to pass the filter undistorted; hence, for the
noise-free case we require the filter output to reproduce the sinusoid (Equation 8.90).
Dividing both sides of Equation 8.90 by the term on the left, we get the constraint

1 = (e*)ᵀb   (8.91A)
It can be shown that the minimization of Equation 8.89 subject to the constraint (Equation
8.91) yields the optimal filter

b_opt = R_x⁻¹e / ((e*)ᵀR_x⁻¹e)

so that the MLM spectral estimate is

S_MLM(w) = Δt / ((e*)ᵀR_x⁻¹e)   (8.93)

Note that R_x has to be inverted only once. The evaluation of the quadratic form (the
denominator of Equation 8.93) for each frequency can be done by means of the FFT
algorithm. It is easily shown that the quadratic form can be written as a weighted sum of
complex exponentials (Equation 8.94), so that the FFT applies directly.
The MLM estimator is related to the AR (MEM) estimator of order p by84

1/S_MLM(w) = (1/(p + 1)) Σ_{m=0}^{p} 1/S_AR^(m)(w)   (8.95)

where S_AR^(m)(w) is the mth order AR spectral estimate. In Equation 8.95 both estimators
were assumed to have the correct autocorrelation matrix of order p.
The MLM has lower resolution as compared with the AR estimator. This can be explained
intuitively by Equation 8.95, where it is seen that the high resolution of the pth order AR
estimator is reduced by the "inverse" averaging with lower order estimators. The MLM,
however, exhibits less variance than the AR estimator.50 This can also be explained intuitively
by Equation 8.95.
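A sketch of the MLM estimator in the form of Equation 8.93 (the direct matrix inversion and the Δt scaling are simplifications of mine; R_x is assumed to be an already estimated autocorrelation matrix):

import numpy as np

def mlm_psd(Rx, dt, n_freq=512):
    # Capon / maximum likelihood spectral estimate: S(w) = dt / (e^H Rx^{-1} e).
    N = Rx.shape[0]
    Rinv = np.linalg.inv(Rx)                 # Rx is inverted only once
    w = np.linspace(0.0, np.pi / dt, n_freq)
    n = np.arange(N)
    E = np.exp(1j * np.outer(w, n) * dt)     # rows are the vectors e(w)
    quad = np.einsum('fi,ij,fj->f', E.conj(), Rinv, E).real
    return w, dt / quad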
VII. DISCUSSION AND COMPARISON OF SEVERAL METHODS
Only the most commonly used techniques have been presented in this chapter. Some
methods, such as the Walsh spectral estimator89,90 and other specific methods, have not been
presented, either because of lack of space or due to the fact that they have not been used in
biomedical applications. For the same reasons, many interesting approaches85,86 for PSD
estimation have not been discussed.
An important topic not discussed in this chapter, but one that must be taken into account when
estimating the PSD, is that of preprocessing the signal. Care must be taken to low-pass filter
the data before sampling, so that an (almost) band-limited spectrum is processed, and the
sampling rate should be chosen such that no aliasing problems arise. Prewhitening filters
are often used (especially when employing an AR estimator). Prewhitening filters have been
reported to improve AR estimator results and to reduce the window bias in the Blackman-
Tukey method. A more general preprocessing filter was suggested by Lagunas-Hernandez
et al.74 They have reported that the use of their preprocessing method reduces the
estimator order and the computational complexity. Preprocessing filters, however, require
some a priori knowledge about the signal.
Several adaptive and recursive methods for PSD estimation have been suggested; most
adapt the coefficients of the appropriate filter (e.g., References 75, 87, and 88). Adaptive
estimation is important when the signal under test is nonstationary, as is most often the case
with biomedical signals. The most commonly used method to overcome the nonstationarity
problem is that of segmenting the signal. In this approach, the signal is segmented into
"almost" stationary segments. The PSD function of each segment is then estimated by
means of nonadaptive methods. Segmentation may be done a priori into segments of predetermined
length, as is commonly done in speech processing, or by an adaptive algorithm,
as is done, for example, in EEG processing (see Chapter 7).
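A schematic illustration of the segmentation approach, using fixed-length segments and a plain periodogram per segment (both are illustrative choices of mine, not recommendations from the text):

import numpy as np

def segmented_psd(x, seg_len, dt):
    # Split the signal into fixed-length, "almost" stationary segments and
    # estimate the PSD of each segment separately with a periodogram.
    n_seg = len(x) // seg_len
    psds = []
    for i in range(n_seg):
        seg = x[i * seg_len:(i + 1) * seg_len]
        seg = seg - np.mean(seg)                    # remove the segment mean
        X = np.fft.rfft(seg)
        psds.append(dt * np.abs(X) ** 2 / seg_len)  # periodogram of the segment
    freqs = np.fft.rfftfreq(seg_len, d=dt)
    return freqs, np.array(psds)                    # one PSD estimate per segment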
It is sometimes required to estimate the PSD function with unequal resolution, when
certain regions of the frequency axis are of specific interest. Methods based both on the FFT92
and on AR models15 have been developed.
The various methods for PSD function estimation discussed here, and others, differ from
one another in the assumptions made on the process under test, in their statistical characteristics,
and in their computational complexity. It is therefore not to be expected that one can define a
"universal" criterion by means of which the various methods can be graded. The best
method to use depends heavily on the application, the type of signal, the available computation
facility, and the accuracy and time constraints. It is obvious that the estimation method to choose when
the computation facility consists of a mini- or microcomputer may be different from the one
chosen when a "number crunching" machine (e.g., an array processor) is available.
Table 1 presents, in a concise manner, some of the main characteristics of the various
methods discussed in the chapter. This table may thus serve as a guideline for choosing an
appropriate estimation method for a given problem (see Kay and Marple15 for more details).
REFERENCES
1. Boston, J . R ., Spectra of auditory brainstem responses and spontaneous EEG, IEEE Trans. Biomed. Eng.,
28, 344, 1981. •
2. W illiam s, R . L ., Karacan, I., and Hursch, C. Y ., EEG o f Human Sleep: Clinical Applications, John
Wiley & Sons, New York, 1974.
3. Butter, L. A ., A real time software system on the PDP-11 for two channel EEG spectral analysis during
surgery', Comput. Programs Biomed., 6, I. 1976.
4. G ross, D ., G rassino, A ., Ross, W. R. D., and Macklem, P. T ., Electromyogram pattern o f diaphragmatic
fatigue, J. Appl. Physiol.. 46, 1, 1979.
5. M ezzalam a, M ., Prinetto, P., and Morra, B., Experiments in automatic classification o f laryngeal
pathology. M ed. Biol. Eng Comput., 21. 603. 1983.
6. W iener, N ., Generalized harmonic analysis. Acta M ath.. 55. 117, 1930.
7. Khinchin, A. Ya., Korrelationstheorie der stationären stochastischen Prozesse, Math. Annalen, 109, 604,
1934.
8. Tukey. J. W. and Blackman, R. B., The Measurement o f Power Spectra From the Point o f View o f
Communications Engineering, Dover, New York. 1959.
9. K oopm ans, L. H ., The Spectral Analysis o f Time Series, Academic Press, New York, 1974.
10. Schw artz, M. and Shaw, L ., Signal Processing Discrete Spectral Analysis Detection an d Estimation,
McGraw-Hill, New York. 1975.
11. C hilders, D. G ., Ed., Modern Spectrum Analysis, IEEE Press, New York, 1978.
12. Box, G . E. P. and Jenkins, G. M ., Time Series Analysis: Forecasting and Control, Holden-Day, San
Francisco. 1970.
13. H aykin. S. S ., E d ., Nonlinear Methods o f Spectral Analysis, Springer-Verlag, New York. 1979.
14. Jenkins, G. M. and Watts, D. G., Spectral Analysis and its Applications, Holden-Day, San Francisco,
1968.
15. Kay, S. M. and Marple, S. L., Spectrum analysis - a modern perspective, Proc. IEEE, 69, 1380, 1981.
16. Spectral estimation, special issue, Proc. IEEE, 70(9), 1982.
17. Robinson, E. A., A historical perspective of spectrum estimation, Proc. IEEE, 70(9), 885, 1982.
18. Friedlander, B., Lattice methods for spectral estimation, Proc. IEEE, 70(9), 990, 1982.
19. McClellan, J. H., Multidimensional spectral estimation, Proc. IEEE, 70(9), 1029, 1982.
20. Papoulis,' A ., Maximum entropy and spectral estimation: a review, IEEE Trans. Acoust. Speech Signal
Process.. 29, 1176. 1981.
21. Roberts, J. B. G., Moule, G. L., and Parry, G., Design and application of a real time spectrum analyser
system, IEE Proc., 127(2), 70, 1980.
22. Carter, G. C. and Nuttall, A. H., A generalized framework for spectral estimation, IEE Proc.,
130(3), 239, 1983.
23. Allen. J. B. and Rabiner, L. R., A unified approach to short time Fourier analysis and synthesis. Proc.
IEEE. 65(11). 1558. 1977.
24. Linkens, D. A., Short time series spectral analysis of biomedical data, IEE Proc., 129(9), 663, 1982.
25. Makhoul, J., Spectral linear prediction: properties and applications, IEEE Trans. Acoust. Speech Signal
Process., 23, 283, 1975.
26. Otnes, R. K. and Enochson, L., Digital Time Series Analysis, John Wiley & Sons, New York, 1972.
27. Harris, F. J., On the use of windows for harmonic analysis with the DFT, Proc. IEEE, 66, 51, 1978.
28. Webster, R. J., Leakage regulation in the DFT spectrum, Proc. IEEE, 68, 1339, 1980.
29. Markel, J. D., FFT pruning, IEEE Trans. Audio Electroacoust., 19, 305, 1971.
30. Welch, P. D., The use of the fast Fourier transform for the estimation of power spectra: a method based on
time averaging over short modified periodograms, IEEE Trans. Audio Electroacoust., 15, 70, 1967.
31. Carter. G. C. and Nuttall, A. H ., A brief summary of generalized framework for power spectral estimation.
Signal Process., 2. 387, 1980.
32. Morf, M., Vieira, A., Lee, D. T. L., and Kailath, T., Multichannel maximum entropy spectral estimation,
IEEE Trans. Geosci. Electron., 16, 85, 1978.
33. Dyson, T. and Rao, S. S., Equal observation interval comparison of maximum entropy and weighted
overlapped segment averaging spectrum estimation techniques, IEEE Trans. Acoust. Speech Signal Process.,
29, 919, 1981.
34. Burg, J. P., Maximum Entropy Spectral Analysis, Proc. 37th Annu. Int. Meeting Soc. Explor. Geophys.,
Oklahoma City, 1967.
35. Ulrych. T . Y. and Bishop, T. N ., Maximum entropy spectral analysis and autoregressive decomposition.
Rev. G eophys. Space Phys.. 13, 183, 1975.
36. Jaynes. E. T ., On the rationale of maximum entropy method . Proc. IEEE, 70, 939, 1982.
37. Lang, S. W. and McClellan, J. H ., Frequency estimation with maximum entropy spectra! estimators,
IEEE Trans. Acoust: Speech Signal Process.. 28. 716, 1980.
38. Theodoridis, S. and Cooper, D. C ., Application of the maximum entropy spectrum analysis technique
to signals with spectral peaks o f finite width, Signal Process., 3, 109, 1981.
39. Herring, R. W ., The cause o f line splitting in Burg maximum entropy spectral analysis, IEEE Trans.
Acoust. Speech Signal Process., 28, 692, 1980. j
40. W u, N .-L ., An explicit solution and data extension in the maximum entropy method, IEEE Trans. Acoust.
Speech Process., 31, 486, 1983. ;
41. Lang, S. W .’and M cClellan, J . H ., Multidimensional MEM spectral estimation, IEEE Trans. Acoust.
Speech Signal Process., 30, 280, 1982.
42. M alik, N. A. and Lim , J. S ., Properties of two dimensional maximum entropy power spectrum estimates,
IEEE Trans. Acoust. Speech Signal Process., 30, 788, 1982.
43. McClellan, J. H. and Lang, S. W., Duality for multidimensional MEM spectral analysis, IEE Proc., 130(F),
230, 1983.
44. Newman, W. I., Extension to the maximum entropy method, IEEE Trans. Inf. Theory, 23. 89. 1977.
45. Johnson, R. W. and Shore, J. E ., Minimum cross entropy spectral analysis of multiple signals. IEEE
Trans. Acoust. Speech Signal Process., 31, 574, 1983.
46. M akhoul, J ., Linear prediction: a tutorial review, Proc. IEEE, 63. 561, 1975.
47. Bartlett, M. S ., An Introduction to Stochastic Processes, 2nd ed., Cambridge University Press. New York,
1966.
48. Van den Bos, A., Alternative interpretation of maximum entropy spectral analysis, IEEE Trans. Inf. Theory,
17, 493, 1971.
49. Grande, J ., II, Hamrud, M ., and Toll, P., A remark on the correspondence between the maximum
entropy method and the AR model, IEEE Trans. Inf. Theory, 26, 750, 1980.
50. Baggeroer, A. B ., Confidence intervals for regression (MEM) spectral estimates. IEEE Trans. Inf. Theory,
22, 534. 1976.
51. Huzii, M ., On spectral estimate obtained by an AR Model fitting , Am. Inst. Statist. M ath., 29. 415. 1977.
52. Sakai, H ., Statistical properties o f AR spectral analysis, IEEE Trans. Acoust. Speech Signal Process., 21,
402, 1979.
53. Toomey, J. P., High resolution frequency measurement by linear prediction, IEEE Trans. Aerosp. Electron.
Syst., 16, 517, 1980.
54. Fougere, P. F., Zawalick, E. J., and Radoski, H. R., Spontaneous line splitting in maximum entropy
power spectrum analysis, Phys. Earth Planet. Inter., 12, 201, 1976.
55. Hung, E. K. L. and H erring, R. W ., Simulation experiments to compare the signal detection properties
of DFT and MEM spectra, IEEE Trans. Acoust. Speech Signal Process., 29. 1084, 1981.
56. Kay, S. M. and M arple, S. L ., J r., Sources of and remedies for spectral line splitting in AR spectrum
analysis. Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 151. 1979.
57. Fougere, P. F., A solution to the problem of spontaneous line splitting in maximum entropy power spectrum
analysis. J. Geophys. Res.. 82. 1051, 1977.
58. M arple, L., A new AR spectrum analysis algorithm, IEEE Trans. Acoust. Speech Signal Process., 28.
441, 1980.
59. Lacoss, R. T., Data adaptive spectral analysis methods, Geophysics, 36, 661, 1971.
60. Swingler, D. N ., A comparison between Burg’s maximum entropy method and a nonrecursive technique
for the spectral analysis o f deterministic signals, J. Geophys. Res.. 84, 679, 1979.
61. Scott, P. D. and Nikias, C. L ., Energy-weighted linear predictive spectral estimation: a new method
combining robustness and high resolution, IEEE Trans. Acoust. Speech Signal Process., 30, 287, 1983.
62. Quirk, M. P. and Liu, B ., Improving resolution for AR spectral estimation by decimation, IEEE Trans.
Acoust. Speech Signal P ro cess., 31, 630, 1983.
63. Lee, T. S., Large sample identification and spectral estimation of noisy multivariate AR processes. IEEE
Trans. Acoust. Speech Signal Process., 31, 76, 1983.
64. Cadzow, J. A ., High performance spectral estimation — a new ARMA method, IEEE Trans. Acoust.
Speech Signal Process., 28, 524, 1980.
65. Cadzow, J. A ., ARMA spectral estimation: a model equation error procedure. IEEE Trans. Geosci. Rem.
Sens., 19, 24, 1981.
66. K ay, S. M ., A new ARMA spectral estimator, IEEE Trans. Acoust. Speech Signal Process., 28, 585,
1980.
67. Friedlander, B ., Efficient algorithm for ARMA spectral estimation. IEE Proc., 130 (F.3), 195, 1983.
68. Friedlander, B ., Instrumental variable methods for ARMA spectral estimation, IEEE Trans. Acoust. Speech
Signal Process., 31, 404, 1983.
69. Bruzzone, S. and Kaveh, M ., On some suboptimal ARMA spectral estimators, IEEE Trans. Acoust.
Speech Signal Process., 28, 753, 1980.
70. Porat, B ., ARMA spectral estimation based on partial autocorrelations, Circuits, Syst. Signal Process.,
2(3), 341, 1983.
71. Kaveh, M . and Bruzzone, S. P., Statistical efficiency of correlation based methods for ARMA spectral
estimation. IEE Proc.. 130 (F-3), 211, 1983.
72. Fischer, J. and Wilfert, H. H., Determination of the statistical errors in the estimation of the power
spectrum by means of univariate time series analysis, Proc. 5th IFAC Symp. Ident. Syst. Param.
Estimation, 1235, 1979.
73. Takeuchi, M., Decision of order in ARMA model on power spectrum estimation and accuracy of estimated
power spectrum - in fitting ARMA model to time series, Syst. Comput. Controls, 12(1), 18, 1981.
74. Lagunas-Hernandez, M. A., Figueiras-Vidal, A. R., Marino-Acebal, J. B., and Vilanova, A. C., A
linear transform for spectral estimation, IEEE Trans. Acoust. Speech Signal Process., 29, 989, 1981.
75. C adzow , J . A ., Spectral estimation: an overdetermined rational model equation approach. Proc. IEEE,
70. 907. 1982.
76. Pisarenko, V. F ., The retrieval of harmonics from a covariance function, Geophys. J. R. Astron. Soc.,
33.' 247. 1973.
77. Pisarenko, V. F., On the estimation of spectra by means of non linear functions of the covariance matrix,
Geophys. J. R. Astron. Soc., 28. 511. 1972.
78. Satorius, E. H. and Alexander, J. T., High resolution spectral analysis of sinusoids in correlated
noise, Rec. 1978 ICASSP, Tulsa, 1978.
79. Bucker, H. P ., Comparison of FFT and Prony algorithms for bearing estimation of narrow band signals
in a realistic ocean environment. J. Acoust. Soc. Am ., 61, 756. 1977.
80. Schaubert, D. H., Application of Prony's method to time domain reflectometer data and equivalent circuit
synthesis, IEEE Trans. Antennas Propag., 27, 180, 1979.
81. Weiss, L. and McDonough, R. N., Prony's method, Z-transforms, and Padé approximation, SIAM Rev.,
5, 145, 1963.
82. Kumaresan, R. and Tufts, D. W., Improved spectral resolution III: efficient realization, Proc. IEEE, 68,
1354, 1980.
83. Capon, J., High resolution frequency wavenumber spectrum analysis, Proc. IEEE, 57, 1408, 1969.
84. Burg, J. P., The relationship between maximum entropy spectra and maximum likelihood spectra, Geophysics,
37, 375, 1972.
85. Jain, A. K. and Ranganath, S., Extrapolation algorithms for discrete signals with applications in spectral
estimation, IEEE Trans. Acoust. Speech Signal Process., 29, 830, 1981.
86. Beex, A. A. and Scharf, L. L., Covariance sequence approximation for parametric spectrum modeling,
IEEE Trans. Acoust. Speech Signal Process., 29, 1042, 1981.
87. Andrews, M., An adaptive filter for spectrum analysis, Comput. Electr. Eng., 6, 1979.
88. Friedlander. B ., Recursive lattice forms for spectral estimation, IEEE Trans. Acoust. Speech'Signal
P rocess.. 30. 920, 1982.
89. Larsen, R. D., Crawford, E. F., and Howard, G. K., Walsh analysis of signals, Math. Biosci., 31,
237, 1976.
90. Smith, W. D., Walsh versus Fourier estimators of the EEG power spectrum, IEEE Trans. Biomed. Eng.,
28, 790, 1981.
91. Lai, D. C. and Larsen, H., Walsh spectral estimates with applications to the classification of EEG signals,
IEEE Trans. Biomed. Eng., 28, 790, 1981.
92. Oppenheim, A., Johnson, D., and Steiglitz, K., Computation of spectra with unequal resolution using the
FFT, Proc. IEEE, 64, 299, 1971.
Chapter 9
ADAPTIVE FILTERING
I. INTRODUCTION
Filtering is used to process a signal in such a way that the signal-to-noise ratio is enhanced,
noise of a certain type is eliminated, the signal is smoothed or "predicted", or classification
of the signal is achieved. When the signal and noise are stationary and their characteristics
are approximately known or can be assumed, an optimal filter can be designed a priori.
Such are the Wiener filter discussed in Chapter 6 and the matched filter presented in
Chapter 1, Volume II.
When no a priori information on the signal or noise is available, or when the signal or
noise is nonstationary, a priori optimal filter design is not possible. Adaptive optimal filters
are filters that can automatically adjust their own parameters, based on the incoming signal.
The adaptation process is conducted such that the filter uses incoming signal information in
order to adapt its own parameters so that a given performance index is optimized. Adaptive
filters thus require little or no a priori knowledge of the signal and noise characteristics.
The application of adaptive filtering to signal processing in general, and to biomedical
signal processing in particular, has been preceded by the development and use of adaptive
algorithms1-3 in control theory. Although adaptive filters and algorithms used in signal
processing are basically similar to those used in control systems, some differences do exist,
which demand new design approaches.4
Since no (or almost no) a priori information is available, the adaptive filter requires an
initial period for learning or adaptation. During this period, its performance is unsatisfactory.
The time of adaptation is clearly an important characteristic of the filter. Signals in which fast
changes are expected require filters that adapt rapidly. Care should be taken when
designing such filters, since the filter may track rapid artifacts. After initial adaptation, the
filter is supposed to act optimally, while tracking the nonstationary changes in signal and
noise. The imperfect ability of the filter to estimate signal and noise statistics prevents it
from being truly optimal. In practical design, however, this loss of performance can be
made quite small.5
The adaptive filter is required to perform calculations to satisfy the performance index
and must have provision for changing its own parameters. Digital techniques, with or without
a computing device, have clear advantages here over analog techniques. It is mainly for
this reason that most adaptive filter implementations are performed by discrete systems.
We shall only consider here discrete adaptive filters operating on sampled signals.
The next section in this chapter will present the general structure of an adaptive filter.
This will be followed by a detailed discussion of the least mean square (LMS) adaptive filter.5-8 The
use of the LMS adaptive filter for line enhancement9-14 and noise cancellation7,15 will be
discussed with some biomedical applications.7,15-17 Finally, the multichannel20 and the time-
sequenced21,22 adaptive filters will be introduced. The discussions are based mainly on Widrow's papers.
The LMS filter discussed here is by no means the only type of adaptive filter available.
Other types are discussed in the literature,23-28 in which various performance criteria or structures
are used (e.g., the lattice structure25-27), or in which the structure as well as the weights are
adaptable.28
A. Introduction
The adaptive filter consists of three main parts: the performance index which is to be
optimized, the algorithm that recomputes the parameters of the filter, and the structure of
the filter which actually performs the required operations on the signal. Claasen4 has suggested
a classification of adaptive filters according to their major parts and to their goals.
We follow this approach here.
The performance index is best determined by the application. When using adaptive filtering
for the elimination of the maternal ECG in automatic fetal ECG monitoring, the performance index
may be the minimization of false detections. This, however, may be a difficult criterion to
implement, since we do not know when a false detection has happened. Therefore, we look
for performance criteria that can easily be implemented. In most applications, the minimization
of the square of an output error is found to be a satisfactory criterion.
The algorithm is the mechanism by means of which the parameters optimizing the
criterion are calculated. Two basic types of algorithms are to be considered. The first is the
nonrecursive algorithm. It requires the collection of all the data in a given time window and
the solution of the necessary equations. The exact least squares method is such an algorithm. The
algorithm usually requires the solution of a set of linear equations by the inversion of a
matrix, and the results are not available in real time. The second type is the
recursive algorithm, which updates itself with every incoming signal sample or small group
of samples. This algorithm usually requires gradient methods, and convergence must be
checked. Results are available immediately and tracking of signal nonstationarities is possible.
The structure of the filter depends to some extent on the algorithm and the application.
Most often a transversal filter is used because of its straightforward hardware structure and
its robustness in combination with iterative algorithms. The lattice structure, though somewhat
more complex, has been found to possess better convergence and sensitivity.26
We shall proceed by considering various goals of adaptive filtering.
A. Introduction
We shall discuss here adaptive filters using the least mean square (LMS) algorithm
developed5 by Widrow and Hoff in 1960. The filter consists of reference inputs, variable
gain multipliers (weights), an adaptation algorithm, and an additional input signal denoted
the "primary input". We shall present the principal component of the adaptive filter, namely,
the adaptive linear combiner, and then discuss various structures of the adaptive filter for
adaptive noise cancelling and line enhancement.
Define the reference input vector at time j:

x_jᵀ = [x_0j, x_1j, ..., x_nj]   (9.1)

where x_0j is usually a constant set to the value 1. Its role is to take care of biases in the
inputs. We also define a vector of the variable gains (weights):

Wᵀ = [w_0, w_1, ..., w_n]   (9.2)

The output of the combiner is

ŝ_j = Σ_{i=0}^{n} w_i x_ij = Wᵀx_j = x_jᵀW   (9.3)

We consider ŝ_j to be the estimate of a signal. This signal will depend on the problem to
which the combiner is applied. We shall define the error signal ε_j as the difference between
the primary input d_j and the combiner output:

ε_j = d_j − ŝ_j = d_j − Wᵀx_j   (9.4)
C. The LMS Adaptive Algorithm
The performance index for our algorithm is the mean square error. The task of the LMS
adaptive algorithm is to adjust the weights, W, in such a way as to minimize the mean square
error. The mean square error is calculated by squaring Equation 9.4 and taking the expectation.
Assuming the reference and primary inputs to be stationary and the weights fixed, and
defining the cross correlation vector P = E{d_j x_j} and the correlation matrix of the reference
inputs

R = E{x_j x_jᵀ}   (9.7)

the mean square error can be expressed as the quadratic function of the weights:

E{ε_j²} = E{d_j²} − 2PᵀW + WᵀRW   (9.8)
In the stationary case, the minimization of Equation 9.8 means the adjustment of the
weights, descending along the surface (Equation 9.8) until the minimum is reached. In the
nonstationary case, the minimum is drifting and the algorithm has to adapt the weights such
that they track the minimum.
To find the minimum of Equation 9.8 we have to calculate the gradient of the squared
error:

∇_j = ∂E{ε_j²}/∂W = −2P + 2RW   (9.9)

The weighting vector, W_opt, is the vector that zeroes the gradient; hence:

W_opt = R⁻¹P   (9.10)

which is the matrix form of the Wiener-Hopf equation (see Chapter 6).
The LMS algorithm does not use Equation 9.10 directly for the optimal solution. Rather,
it uses the method of steepest descent. We calculate the optimal vector iteratively, where
in each step we change the vector proportionally to the negative of the gradient vector.
Hence:

W_{j+1} = W_j − μ∇_j   (9.11)

where μ is a scalar that controls the stability and rate of convergence of the algorithm. We
have added a subscript to the weighting vector to denote the iteration number. Note that
using Equation 9.11 requires neither the calculation of the correlations nor the inversion
of the correlation matrix. The gradient with subscript j in Equation 9.11 is given by Equation
9.9, where the derivatives are taken at W = W_j.
In practice, it is impossible to implement Equation 9.11, since the gradient involves
expectations. For practical implementation we have to replace the gradient, ∇_j, with some
kind of estimate, ∇̂_j. Widrow has suggested the crude estimate:

∇̂_j = [∂ε_j²/∂w_0, ∂ε_j²/∂w_1, ..., ∂ε_j²/∂w_n]ᵀ   (9.12)
namely, to estimate the expectation of ε_j² by the value of ε_j² itself. This means that we
estimate the mean over a very short (finite) time. The derivatives of Equation 9.12 become:

∇̂_j = −2ε_j x_j   (9.13)

The right side of Equation 9.13 is calculated by taking the derivative of Equation 9.4 with
respect to W. Introducing the estimate of the gradient of Equation 9.13 into Equation 9.11
yields:

W_{j+1} = W_j + 2με_j x_j   (9.14)

The last equation is known as the Widrow-Hoff LMS algorithm. It has been shown that the
expected value of the weight vector (Equation 9.14) converges to the Wiener weight vector
(Equation 9.10) if the reference inputs are uncorrelated over time.
A necessary and sufficient condition for convergence8 is

0 < μ < 1/λ_max   (9.15)

where λ_max is the largest eigenvalue of the correlation matrix R. The eigenvalues, however,
are usually not known. It has been suggested,8 therefore, to use a sufficient condition for
convergence:

0 < μ < 1/tr(R)   (9.16)
Since R is positive definite, tr(R) > λ_max. The trace is easy to estimate, since it is the total
power in the reference signals. Widrow has shown that the learning curve (the curve describing
the convergence of the weights W_j to the Wiener weights) can be approximated by
a decaying single exponential curve with time constant

τ ≅ (n + 1)/(4μ tr(R))
The LMS adaptive algorithm (Equation 9.14) is easy to implement and does not require
differentiation or matrix inversion. For each iteration, it requires n + 2 multiplications and
n additions.
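As a rough illustration (a sketch of mine, not the author's code), the Widrow-Hoff recursion of Equations 9.3, 9.4, and 9.14 can be written in a few lines of Python; the data layout is an assumption, and μ is taken to satisfy the sufficient condition of Equation 9.16.

import numpy as np

def lms(d, X, mu):
    # Widrow-Hoff LMS algorithm.
    # d  : primary (desired) input, length J
    # X  : reference input vectors, shape (J, n+1); column 0 may be the constant 1
    # mu : adaptation constant, assumed to satisfy 0 < mu < 1/tr(R) (Equation 9.16)
    J, n1 = X.shape
    W = np.zeros(n1)
    e = np.zeros(J)
    for j in range(J):
        s_hat = W @ X[j]               # combiner output (Equation 9.3)
        e[j] = d[j] - s_hat            # error (Equation 9.4)
        W = W + 2 * mu * e[j] * X[j]   # weight update (Equation 9.14)
    return W, e

In practice tr(R) can be approximated by the average power of the reference vectors, e.g., mu = 0.1 / np.mean(np.sum(X**2, axis=1)).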
and the output of the summer is the estimated signal ŝ_j (Equation 9.3), given in this configuration
by Equation 9.18. Note that ŝ_j is the autoregressive (AR) estimate (see Chapter 7). The LMS filter with
the reference of Equation 9.17 is an adaptive AR filter. The AR coefficients (LPC) are optimally
adapted in such a way that the output of the filter and the desired input have minimum
mean square error.
If we set d_j = x_j, w_0 = 0, and denote ŝ_j = x̂_j, we get from Equation 9.18:

x̂_j = −Σ_{i=1}^{n−1} w_i x_{j−i}   (9.19)

which is the AR equation. The filter, under these conditions, can be used to estimate and
track the LPC (AR) coefficients of a nonstationary signal.
Adaptive LMS filters have been successfully implemented on small machines. The errors
due to finite word length have been analyzed.40,41 Adaptive filtering can also be implemented
in the frequency domain,16,33 with some advantages over the time domain.16
A. Introduction
Consider the following problem (depicted in Figure 6). A signal s(t) is contaminated with
an additive noise n_0(t) and with another noise, η(t); we assume that s, n_0, and η are
uncorrelated. The noise n_0 is generated by a white noise process n(t) that has passed through an
unknown linear filter, H_1. The additive noise n_0(t) is therefore a colored noise. Assume also
that we have a reference signal x(t) consisting of a white noise ξ(t) and another noise n_r(t).
The second noise is the result of the same noise process n(t) that contributes to the primary noise,
but after another unknown linear filter, H_2. Note that here we have:

N_0(z⁻¹) = H_1(z⁻¹)N(z⁻¹),   N_r(z⁻¹) = H_2(z⁻¹)N(z⁻¹)
where N_0, N_r, H_1(z⁻¹), and H_2(z⁻¹) are the z transforms of n_0(t), n_r(t), h_1(t), and h_2(t),
respectively. We assume that the auxiliary noises η(t) and ξ(t) are white and uncorrelated
with one another, with n(t), and with the signal s(t). The concept of the adaptive noise
canceller is as follows. An adaptive estimate of n_0(j), denoted by n̂_0(j), is calculated by the
adaptive LMS filter. As shown before, this filter is an adaptive AR filter estimating the
unknown filter H_2⁻¹(z⁻¹)H_1(z⁻¹) by means of the reference input n_r(j) and the error. Note
also that the adaptive filter does not operate as Durbin's algorithm described in Chapter
5. There, the AR coefficients were optimized in such a way as to whiten the
output of the filter (the residuals); the estimated AR coefficients were the coefficients of
the filter H_2⁻¹(z⁻¹). Here the criterion is the minimization of E{ε_j²}, so that the estimated
AR coefficients are the coefficients of H_2⁻¹(z⁻¹)H_1(z⁻¹).
Adaptive noise cancelling filters7 have been extensively used in biomedical signal processing
and in many other applications.
The adaptive algorithm will change the weights so that Equation 9.22 is minimized.
However, changing the filter weights affects only n̂_0 and does not affect the term E{s²}.
Therefore, minimization of Equation 9.22 is equivalent to the minimization of
E{((n_0 + η) − n̂_0)²}. From Equation 9.21 we get

s − ŝ = n̂_0 − (n_0 + η)

hence the minimization of E{((n_0 + η) − n̂_0)²} also minimizes E{(s − ŝ)²}. The
adaptive noise canceller provides the estimate ŝ, which is the best least squares estimate
of s.
The reference input must be correlated with the primary noise, n_0. It is this correlation
that allows the LMS noise canceller to function effectively. To demonstrate this, assume
that n_r is not correlated with n_0 (Figure 6 does not hold for this example). The minimization
of Equation 9.22 then yields

E{ε_j²} = E{s²} + E{(n_0 + η)²} + E{n̂_0²}   (9.23)

The algorithm will minimize Equation 9.23 by adjusting all the weights W to zero, thus bringing
the last term to its minimum, namely zero. The adaptive noise canceller has been applied
to a variety of biomedical7,15 and many other applications, such as echo cancellation in
communication networks.34,35
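As a rough usage illustration (entirely synthetic; the filters H_1 and H_2 and all constants below are invented for the example, not taken from the text), the canceller of Figure 6 can be simulated as follows; the canceller output e approximates the signal s.

import numpy as np

rng = np.random.default_rng(0)
J, L = 4000, 8
s = np.sin(2 * np.pi * 0.01 * np.arange(J))            # wanted signal
n = rng.standard_normal(J)                              # common noise source n(t)
n0 = np.convolve(n, [1.0, 0.8, 0.3], mode='same')       # primary noise (through H1)
nr = np.convolve(n, [1.0, -0.5], mode='same')           # reference noise (through H2)
d = s + n0                                               # primary input

W = np.zeros(L)
e = np.zeros(J)
mu = 0.05 / (L * np.mean(nr ** 2))                       # well below 1/tr(R)
for j in range(J):
    x = np.array([nr[j - k] if j - k >= 0 else 0.0 for k in range(L)])
    n0_hat = W @ x                                       # estimate of the primary noise
    e[j] = d[j] - n0_hat                                 # canceller output, approximately s
    W = W + 2 * mu * e[j] * x                            # LMS update (Equation 9.14)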
where the amplitude, A, and the phase, φ, are unknown. The frequency of the line voltage,
w_0, can vary around its nominal value. Its exact value, at any moment, is unknown a priori.
This is a common problem in biomedical signal processing. A fixed band stop filter can be
designed with a notch at the nominal value of w_0 and with sufficient width to cover the
expected variations in the frequency. In cases where meaningful portions of the PSD function
of the signal are in the vicinity of w_0, this will cause distortions to the processed signal.
Typical examples are ECG, EMG, and EEG signals, all having meaningful information in
the region of 50 to 60 Hz.
We shall now see that the adaptive LMS noise canceller can operate as an adaptive narrow
notch filter with its central frequency automatically tracking the variations of w_0.
For the reference signal we take a signal which is directly proportional to the power line
voltage; we choose

x_1(j) = B cos(w_0 jΔT + ψ)   (9.25A)

Here B, w_0, and ψ are known. This can simply be the voltage taken directly from the wall
outlet. The second reference is derived from x_1 by shifting it 90°. Hence:

x_2(j) = B sin(w_0 jΔT + ψ)   (9.25B)
where ΔT is the sampling interval. Note that the output of the LMS combiner is a linear
combination of the two normal phasors (Equation 9.25C). It is clear that we can represent
the cosine primary noise as a linear combination of the phasors, given the right weights.
Any change in w_0 will appear both in the primary and the reference signals. The LMS will thus
track the variations in w_0. Widrow7 has shown that the adaptive noise canceller described
in Figure 7 is equivalent to a notch filter, with the notch frequency always at w_0, and Q (the ratio
of center frequency to bandwidth) given by:
Q = w_0ΔT/(2μB²)   (9.26)
This configuration was used7 to cancel 60-Hz interference in the ECG. Figure 8 shows the
cancellation effects. Note the adaptation of the filter. Other types of learning filters for
power line interference removal are available.17
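A sketch of the two-reference canceller acting as an adaptive notch (the frequency, amplitude, and step size would be application dependent; this is an illustration of the configuration described above, not Widrow's implementation):

import numpy as np

def adaptive_notch(d, f0, B, mu, fs):
    # Two quadrature references at the nominal line frequency f0 (Hz); the LMS
    # canceller then behaves as a notch that tracks the actual line frequency,
    # with a Q controlled by mu and B^2 (cf. Equation 9.26).
    J = len(d)
    t = np.arange(J) / fs
    x1 = B * np.cos(2 * np.pi * f0 * t)      # in-phase reference (wall outlet voltage)
    x2 = B * np.sin(2 * np.pi * f0 * t)      # reference shifted by 90 degrees
    w = np.zeros(2)
    out = np.zeros(J)
    for j in range(J):
        x = np.array([x1[j], x2[j]])
        n0_hat = w @ x                       # estimated interference
        out[j] = d[j] - n0_hat               # notch-filtered output
        w = w + 2 * mu * out[j] * x          # LMS update
    return out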
monitoring the ECG of a patient after such a heart transplantation, both "old" and "new"
QRS complexes are present. It is of interest to the physician to be able to separate the
two and to be able to analyze the "old" ECG without the interference of the "new"
ECG. Adaptive noise cancelling has been applied to this problem by Widrow and his co-workers.
The primary input was supplied by a catheter, inserted through the left brachial
vein and the vena cava into the atrium of the "old" heart. The reference input was supplied
by ordinary chest electrodes, which carried mainly the "new" heart's signal. Figure 8B
shows the improvement in the signal after the application of adaptive noise cancelling.
FIGURE 8. Noise cancellation in ECG. (A) Adaptive cancellation of power line
interference; (B) removal of "new" ECG in a heart transplant patient; (C) cancellation
of electrosurgical noise. (From Widrow, B., Glover, J. R., Jr., McCool, J. M.,
Kaunitz, J., Williams, C. S., Hearn, R. H., Zeidler, J. R., Dong, E., Jr., and Goodlin,
R. C., Proc. IEEE, 63, 1692, 1975. With permission.)
cancelling filter to the problem of noise reduction in pilot radio communications. The noise
in the cockpit is highly nonstationary due to variations in engine speed and load. A second
microphone was placed in a suitable location in the cockpit to produce the reference signal.
In Widrow's experiments, simulated cockpit noise made the unprocessed speech unintelligible.
After LMS processing, the output power of the interference was reduced by 20 to
25 dB, rendering the interference barely perceptible to the listener. No noticeable distortion
was introduced to the speech signal itself. Other algorithms, which use only one microphone,
have been suggested.38 These, however, were ineffective for low signal-to-noise ratios.
The LMS noise canceller has also been applied to the problem of improving speech communication
for the hearing impaired.37 Noise cancellation is important, for example, in cases wherein
a hearing impaired child must function in an educational setting. A special amplification
system can be used to amplify the teacher's voice for the child. The teacher's microphone,
however, picks up not only the teacher's voice but also the environmental noise of the classroom.
A reference microphone can be placed apart from the teacher to pick up the reference noise
(which will also, in this case, contain some of the primary signal).
Applying the LMS noise canceller in a controlled environment using very noisy speech
improved the intelligibility of the speech from near zero to about 30 to 50%.
d_j = s_j + n_j   (9.27)

Let the autocorrelation of the signal be R_s(τ). We choose τ_r such that R_s(τ_r) < ε, where ε
is some small number. The delayed signal s(j − τ_r) in the reference will then be (almost)
uncorrelated with the primary signal. The reference n(j − τ_r) remains correlated with the primary
noise, since the noise is periodic.
Adaptive noise cancellers can be implemented also in the frequency dom ain.16-33
A multichannel recording of evoked potentials, where each channel represents the voltage
monitored at a different location on the scalp, may serve as an example of such a problem.
Consider the ith channel's output. Here we describe the relationships between the channels in the z
domain by means of the unknown linear filters, H_i(z⁻¹). Refer to Figure 11 for the signal model
and the adaptive filter. Each signal x(i) is used as the desired input for an LMS adaptive filter
(see Figure 5). The left side of Figure 11 is just an imaginary model. In practice, the M given
channels are used directly as primary and reference inputs to the multichannel adaptive filter.
In the process of adaptation, the weights of each LMS filter are simultaneously adjusted to
minimize the power of ε_j. After convergence of all the LMS filters, the output of the combiner
is the best least squares estimate of the delayed primary signal.
The delay in the primary channel is required to account for possible delays in the filters
H_i(z) and in the LMS filters. The distortions in the estimated signal, as well as the noise
power spectra of the output of the multichannel adaptive filter, have been calculated.20 The
multichannel adaptive signal enhancer yields a substantial reduction in background noise,
but often at the expense of considerable signal distortion21 and computation load.
FIGURE 11. Signal and noise model; multichannel adaptive filter.
The signal d(k) consists of M noisy processes s_i(k), i = 1, 2, ..., M, each one appearing
in the signal at times k_i to k_{i+1}. The assumption is made that at each time only one process
is present in the signal. As in the conventional LMS filter, a reference signal, x(k), is given.
An additional input required here is called the sequence number, σ_k. This signal provides
information concerning the type of signal, s, currently present in the signal d. Namely,
when σ_k = i, we assume that x(k) = s_i(k) + n(k). Figure 12 shows the time-sequenced
adaptive filter. For each one of the M processes appearing in the signal we set an adaptive
filter. Each filter will adapt and find the minimum point appropriate for its own process.
The filters are switched by two synchronized switches controlled by the sequence number,
σ_k. Consider the case where σ_k = i, depicted in Figure 12. The output of the LMS algorithm
is connected to the ith LMS filter, adjusting its weight vector Wⁱ. This is done using the
constant μⁱ, the ith element of the vector μ.
The output of the ith LMS filter is connected to the summer to provide the error, ε_k. Note
that when the sequence number changes, another filter will be selected. The same LMS
algorithm will now adapt the weights of the new filter using the appropriate constant,
μ. Hence the adaptation process can be written as:

Wⁱ_{k+1} = Wⁱ_k + 2μⁱε_k x_k,   for σ_k = i
Wᵐ_{k+1} = Wᵐ_k,   for m ≠ i
The computational load of the time-sequenced adaptive filter is almost the same as that
required by the conventional LMS filter, since in both cases only one vector is being
adjusted at a given time. The time-sequenced filter requires some additional operations for the
switching and the selection of the proper constants and weights. The convergence time of the filter
is longer than that of the conventional LMS filter, since here each one of the filters is
adapted only in the periods when the appropriate signal is present. Also, the memory required
for the time-sequenced filter is larger than that required by the conventional one.

FIGURE 13. Enhancement of fetal ECG: abdominal lead with the maternal ECG cancelled. (From Ferrara, E. R. and Widrow, B., IEEE Trans. Biomed. Eng., 29, 458, 1982. © 1982 IEEE. With permission.)
When σ_k is given with no error, the time-sequenced filter converges to the correct optimal
time-varying filter. In practical applications, however, the sequence number, σ, is not perfect
but is subject to "jittering", which causes the filter to be less than optimal.
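A minimal sketch of the time-sequenced adaptation described above, with one weight vector per process and only the selected vector updated at each step (the data layout and names are my own, not the author's):

import numpy as np

def time_sequenced_lms(d, X, sigma, M, mu):
    # d     : primary input, length K
    # X     : reference vectors, shape (K, L)
    # sigma : sequence numbers sigma_k in {0, ..., M-1}, indicating which of the
    #         M processes is present at time k
    # mu    : per-process adaptation constants, length M
    K, L = X.shape
    W = np.zeros((M, L))                        # one weight vector per process
    e = np.zeros(K)
    for k in range(K):
        i = sigma[k]                            # select the filter for the current process
        e[k] = d[k] - W[i] @ X[k]
        W[i] = W[i] + 2 * mu[i] * e[k] * X[k]   # only the selected filter adapts
    return W, e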
REFERENCES
1. Davisson, L. D ., A theory of adaptive filtering, IEEE Trans. Inf. Theory. 12, 97, 1966.
2. M ehra, R. K ., Approaches to adaptive filtering, IEEE Trans. Autom. Control, 17. 693, 1972.
3. Johnson, C. R ., The common parameter estimation basis of adaptive filtering, identification and control,
IEEE Trans. Acoust. Speech Signal Process.. 30, 587, 1982.
35. Verhoeckx, N . A. M ., van den Elzen, H. C ., Snijders, F. A. M ., and van Gerwen, P. J ., Digital echo
cancellation for baseband data transmission, IEEE Trans. Acoust. Speech Signal Process., 27, 768, 1979.
36. Lower, R. R., Stofer, R. C., and Shumway, N. E., Homovital transplantation of the heart, J. Thoracic
Cardiovasc. Surg., 41, 196, 1961.
37. Chabries, D. M., Christiansen, R. W., Brey, R. H., and Robinette, M. S., Application of the LMS
adaptive filter to improve speech communication in the presence of noise, Proc. ICASSP, IEEE, New York,
1982, 148.
38. L im , J. S. and Oppenheim, A. V ., Enhancement and bandwidth compression o f noisy speech, Proc.
IEEE, 67, 1586, 1979.
39. Widrow, B., Mantey, P. E., Griffiths, L. J., and Goode, B. B., Adaptive antenna systems, Proc. IEEE,
55, 2143, 1967.
40. Caraiscos, C. and Liu, B., A roundoff error analysis of the LMS adaptive algorithm, IEEE Trans. Acoust.
Speech Signal Process., 32, 34, 1984.
41. Normile, J. O. and Boland, F. M., Adaptive filtering with finite wordlength constraints, IEE Proc.,
130(E), 42, 1983.
INDEX

A

Acetylcholine (ACh), 12
ACM, see Autocorrelation method
Acquisition, 5-6
Actin, 12
Action potential, 11-12
Adaptation, 141
Adaptive algorithm, 135, 142
Adaptive delta modulation (ADM), 32
Adaptive estimation, 135
Adaptive filtering, 7, 141-160
  adaptation time, 141
  adaptive signal correction, 143-144
  adaptive signal estimation, 142-143
  adaptive system parameter identification, 142-143
  algorithm, 142
  frequency domain, 147
  general structure of filters, 142-143
  improved, 154-158
  lattice structure, 141-142
  least mean square, 141, 143-147
  line enhancement, 141
  multichannel, 141
  multichannel adaptive signal enhancement, 154-156
  noise cancellation, 141, 147-154
  performance index, 142
  time-sequenced, 141, 156-158
Adaptive filters, 73
Adaptive linear combiner, 144-145
Adaptive line enhancer (ALE), 143
  noise cancellation, 154-155
Adaptive segmentation, 101-106
ADM, see Adaptive delta modulation
Akaike information theoretic criterion (AIC), 97
ALE, see Adaptive line enhancer
Algorithms, 81
Aliasing in frequency, 30
All zero model, 83
"Almost" periodic signals, 3-4, 15
Almost stationary segments, 135
Amount of information, 122
Anaesthesia, 109
Analog processing, 29
Analog systems, 4
Analog to digital (A/D) conversion systems, 29
A priori information, 5, 7, 73, 130, 135, 141
A priori optimal filter design, 141
AR, see Autoregressive model
ARIMA, see Autoregressive integrated moving average
ARMA, see Autoregressive moving average
ARMAX, see Autoregressive moving average exogenous variables
Autocorrelation, 26, 85, 89
Autocorrelation coefficients, 93
Autocorrelation function, 5, 25
Autocorrelation matrix, 93
Autocorrelation measure (ACM) method, 102-103
Autocorrelation method, 86, 106
Autocovariance, 85
Autoregressive integrated moving average (ARIMA), 84, 99-100
Autoregressive (AR) model, 83-88
  comparison with other methods, 136
  least squares model, 85-88
  order estimation, 95-96
  spectral estimation, 109, 122-125
Autoregressive moving average (ARMA), 83, 85, 100
  Akaike information theoretic criterion, 97
  comparison with other methods, 136
  filters, 74
  lattice filters, 99
  order estimation, 95
  spectral estimation, 109, 126-132
Autoregressive moving average exogenous variables (ARMAX), 82-83
Availability of long records, 45
AV node, 150
Axon, 9-11

B

Background broad band noise, 154
Band-limited continuous signal, 29, 31
Band-limited spectrum, 135
Bartlett window, 111
Bayes' rule, 17-18
Bias, 124
Bilinear transformation method, 74
Bioelectric potentials, 9
Bioelectric signals, 6
  muscle, 12
  nerve cell, see also Nerve cell, 9-12
  neuron, 9
  origin of, 9-13
  volume conductors, 12-13
Biomedical signals, see also specific topics, 1
Blackman-Tukey algorithm, 110, 124
Blackman-Tukey procedure, 109, 111-113, 117, 124, 126, 135
  comparison with other methods, 136
Block quantizer, 36
Brain activities, 13

C

Capon's spectral estimation, 133-134, 136
CAT-computed averaged transients, see Synchronous averaging
Cell body, 9-11
Central limit theorem, 26-27

X-ray analysis, 6
Volume II
Compression and Automatic
Recognition
Author
C R C P ress, Inc.
Boca R ato n, Florida
THE AUTHOR
Volume I

Chapter 1
Introduction
I.   General Measurement and Diagnostic System ... 1
II.  Classification of Signals ... 3
III. Fundamentals of Signal Processing
IV.  Biomedical Signal Acquisition and Processing ... 5
V.   The Book ... 6
References ... 7

Chapter 2
The Origin of the Bioelectric Signal
I.   Introduction ... 9
II.  The Nerve Cell ... 9
     A.  Introduction ... 9
     B.  The Excitable Membrane ... 10
     C.  Action Potential Initiation and Propagation ... 11
     D.  The Synapse ... 11
III. The Muscle ... 12
     A.  Muscle Structure ... 12
     B.  Muscle Contraction ... 12
IV.  Volume Conductors ... 12
References ... 13

Chapter 3
Random Processes
I.   Introduction ... 15
II.  Elements of Probability Theory ... 15
     A.  Introduction ... 15
     B.  Joint Probabilities ... 16
     C.  Statistically Independent Events ... 17
     D.  Random Variables ... 17
     E.  Probability Distribution Functions ... 18
     F.  Probability Density Functions ... 19
III. Random Signals Characterization ... 21
     A.  Random Processes ... 21
     B.  Statistical Averages (Expectations)
IV.  Correlation Analysis ... 23
     A.  The Correlation Coefficient ... 23
     B.  The Correlation Function ... 25
     C.  Ergodicity ... 26
V.   The Gaussian Process ... 26
     A.  The Central Limit Theorem ... 26
     B.  Multivariate Gaussian Process ... 27
References ... 28
Chapter 4
Digital Signal Processing
I.   Introduction ... 29
II.  Sampling ... 29
     A.  Introduction ... 29
     B.  Uniform Sampling ... 30
     C.  Nonuniform Sampling ... 31
         1.  Zero, First, and Second Order Adaptive Sampling ... 32
         2.  Nonuniform Sampling with Run Length Encoding ... 34
III. Quantization ... 36
     A.  Introduction ... 36
     B.  Zero Memory Quantization
     C.  Analysis of Quantization Noise ... 39
     D.  Rough Quantization ... 40
IV.  Discrete Methods ... 42
     A.  The Z Transform ... 42
     B.  Difference Equations ... 43
References ... 44

Chapter 5
Finite Time Averaging
I.   Introduction ... 45
II.  Finite Time Estimation of the Mean Value ... 45
     A.  The Continuous Case ... 46
         1.  Short Observation Time ... 47
         2.  Long Observation Time ... 48
     B.  The Discrete Case ... 51
III. Estimation of the Variance and Correlation ... 53
     A.  Variance Estimation — The Continuous Case ... 53
     B.  Variance Estimation — The Discrete Case ... 54
     C.  Correlation Estimation ... 56
IV.  Synchronous Averaging (CAT-Computed Averaged Transients) ... 56
     A.  Introduction ... 56
     B.  Statistically Independent Responses ... 58
     C.  Totally Dependent Responses ... 59
     D.  The General Case ... 60
     E.  Records Alignment, Estimation of Latencies ... 61
References ... 64
Chapter 6
Frequency Domain Analysis
I.   Introduction ... 65
     A.  Frequency Domain Representation ... 65
     B.  Some Properties of the Fourier Transform ... 65
         1.  The Convolution Theorem ... 66
         2.  Parseval's Theorem ... 66
         3.  Fourier Transform of Periodic Signals ... 67
     C.  Discrete and Fast Fourier Transforms (DFT, FFT) ... 68
II.  Spectral Analysis ... 71
     A.  The Power Spectral Density Function ... 71
     B.  Cross-Spectral Density and Coherence Functions
III. Linear Filtering ... 73
     A.  Introduction ... 73
     B.  Digital Filters ... 74
     C.  The Wiener Filter ... 74
IV.  Cepstral Analysis and Homomorphic Filtering ... 76
     A.  Introduction ... 76
     B.  The Cepstra ... 76
     C.  Homomorphic Filtering ... 77
References ... 80
Chapter 7
Time Series Analysis-Linear Prediction
I.    Introduction ... 81
II.   Autoregressive (AR) Models ... 85
      A.  Introduction ... 85
      B.  Estimation of AR Parameters — Least Squares Method ... 85
III.  Moving Average (MA) Models ... 89
      A.  Autocorrelation Function of MA Process ... 89
      B.  Iterative Estimate of the MA Parameters ... 89
IV.   Mixed Autoregressive Moving Average (ARMA) Models ... 90
      A.  Introduction ... 90
      B.  Parameter Estimation of ARMA Models — Direct Method ... 90
      C.  Parameter Estimation of ARMA Models — Maximum Likelihood Method ... 93
V.    Process Order Estimation ... 95
      A.  Introduction ... 95
      B.  Residuals Flatness ... 95
      C.  Final Prediction Error (FPE) ... 96
      D.  Akaike Information Theoretic Criterion (AIC) ... 97
      E.  Ill Conditioning of Correlation Matrix ... 98
VI.   Lattice Representation ... 98
VII.  Nonstationary Processes ... 99
      A.  Trend Nonstationarity — ARIMA ... 99
      B.  Seasonal Processes ... 101
VIII. Adaptive Segmentation ... 101
      A.  Introduction ... 101
      B.  The Autocorrelation Measure (ACM) Method ... 102
      C.  Spectral Error Measure (SEM) Method ... 103
      D.  Other Segmentation Methods ... 105
References ... 106
Chapter 8
Spectral Estimation
I.   Introduction ... 109
II.  Methods Based on the Fourier Transform ... 110
     A.  Introduction ... 110
     B.  The Blackman-Tukey Method ... 111
     C.  The Periodogram ... 112
         1.  Introduction ... 112
         2.  The Expected Value of the Periodogram ... 114
         3.  Variance of the Periodogram ... 116
         4.  Weighted Overlapped Segment Averaging (WOSA) ... 117
         5.  Smoothing the Periodogram ... 119
III. Maximum Entropy Method (MEM) and the AR Method ... 122
IV.  The Moving Average (MA) Method ... 125
V.   Autoregressive Moving Average (ARMA) Methods ... 126
     A.  The General Case ... 126
     B.  Pisarenko's Harmonic Decomposition (PHD) ... 127
     C.  Prony's Method ... 130
VI.  Maximum Likelihood Method (MLM) — Capon's Spectral Estimation ... 133
VII. Discussion and Comparison of Several Methods ... 134
References ... 137

Chapter 9
Adaptive Filtering
I.   Introduction ... 141
II.  General Structure of Adaptive Filters ... 142
     A.  Introduction ... 142
     B.  Adaptive System Parameter Identification ... 142
     C.  Adaptive Signal Estimation ... 142
     D.  Adaptive Signal Correction ... 143
III. Least Mean Squares (LMS) Adaptive Filter ... 143
     A.  Introduction ... 143
     B.  Adaptive Linear Combiner ... 144
     C.  The LMS Adaptive Algorithm ... 145
     D.  The LMS Adaptive Filter ... 147
IV.  Adaptive Noise Cancelling ... 147
     A.  Introduction ... 147
     B.  Noise Canceller with Reference Input ... 148
     C.  Noise Canceller without Reference Input ... 153
     D.  Adaptive Line Enhancer (ALE) ... 154
V.   Improved Adaptive Filtering ... 154
     A.  Multichannel Adaptive Signal Enhancement ... 154
     B.  Time-Sequenced Adaptive Filtering ... 156
References ... 158

Index ... 161
Volume II

Chapter 1
Wavelet Detection
I.   Introduction ... 1
II.  Detection by Structural Features ... 2
     A.  Simple Structural Algorithms ... 2
     B.  Contour Limiting ... 5
III. Matched Filtering ... 6
IV.  Adaptive Wavelet Detection ... 9
     A.  Introduction ... 9
     B.  Template Adaptation ... 10
     C.  Tracking a Slowly Changing Wavelet ... 12
     D.  Correction of Initial Template ... 12
V.   Detection of Overlapping Wavelets ... 14
     A.  Statement of the Problem ... 14
     B.  Initial Detection and Composite Hypothesis Formulation ... 15
     C.  Error Criterion and Minimization ... 16
References ... 17

Chapter 2
Point Processes
I.   Introduction ... 19
II.  Statistical Preliminaries ... 20
III. Spectral Analysis ... 24
     A.  Introduction ... 24
     B.  Interevent Intervals Spectral Analysis ... 24
     C.  Counts Spectral Analysis ... 25
IV.  Some Commonly Used Models ... 26
     A.  Introduction ... 26
     B.  Renewal Processes ... 26
         1.  Serial Correlogram ... 27
         2.  Flatness of Spectrum ... 27
         3.  A Nonparametric Trend Test ... 28
     C.  Poisson Processes ... 28
     D.  Other Distributions ... 31
         1.  The Weibull Distribution ... 31
         2.  The Erlang (Gamma) Distribution ... 32
         3.  Exponential Autoregressive Moving Average (EARMA) ... 32
         4.  Semi-Markov Processes ... 32
V.   Multivariate Point Processes ... 33
     A.  Introduction ... 33
     B.  Characterization of Multivariate Point Processes ... 33
     C.  Marked Processes ... 35
References ... 35

Chapter 3
Signal Classification and Recognition
I.   Introduction ... 37
II.  Statistical Signal Classification ... 39
     A.  Introduction ... 39
     B.  Bayes Decision Theory and Classification ... 39
     C.  k-Nearest Neighbor (k-NN) Classification ... 50
III. Linear Discriminant Functions ... 53
     A.  Introduction ... 53
     B.  Generalized Linear Discriminant Functions ... 55
     C.  Minimum Squared Error Method ... 56
     D.  Minimum Distance Classifiers ... 58
     E.  Entropy Criteria Methods ... 60
         1.  Introduction ... 60
         2.  Minimization of Entropy ... 60
         3.  Maximization of Entropy ... 62
IV.  Fisher's Linear Discriminant ... 63
V.   Karhunen-Loeve Expansions (KLE) ... 66
     A.  Introduction ... 66
     B.  Karhunen-Loeve Transformation (KLT) — Principal Components Analysis (PCA) ... 67
     C.  Singular Value Decomposition (SVD) ... 69
VI.  Direct Feature Selection and Ordering ... 75
     A.  Introduction ... 75
     B.  The Divergence ... 76
     C.  Dynamic Programming Methods ... 77
VII. Time Warping ... 79
References ... 84
Chapter 4
Syntactic Methods
I.   Introduction ... 87
II.  Basic Definitions of Formal Languages ... 89
III. Syntactic Recognizers ... 92
     A.  Introduction ... 92
     B.  Finite State Automata ... 92
     C.  Context-Free Push-Down Automata (PDA) ... 95
     D.  Simple Syntax-Directed Translation ... 100
     E.  Parsing ... 100
IV.  Stochastic Languages and Syntax Analysis ... 101
     A.  Introduction ... 101
     B.  Stochastic Recognizers ... 102
V.   Grammatical Inference ... 104
VI.  Examples ... 104
     A.  Syntactic Analysis of Carotid Blood Pressure ... 104
     B.  Syntactic Analysis of ECG ... 106
     C.  Syntactic Analysis of EEG ... 110
References ... 111
Appendix A
Characteristics of Some Dynamic Biomedical Signals
I.    Introduction ... 113
II.   Bioelectric Signals ... 113
      A.  Action Potential ... 113
      B.  Electroneurogram (ENG) ... 113
      C.  Electroretinogram (ERG) ... 113
      D.  Electro-Oculogram (EOG) ... 114
      E.  Electroencephalogram (EEG) ... 114
      F.  Evoked Potentials (EP) ... 117
      G.  Electromyography (EMG) ... 119
      H.  Electrocardiography (ECG, EKG) ... 121
          1.  The Signal ... 121
          2.  High-Frequency Electrocardiography ... 124
          3.  Fetal Electrocardiography (FECG) ... 124
          4.  His Bundle Electrocardiography (HBE) ... 124
          5.  Vector Electrocardiography (VCG) ... 124
      I.  Electrogastrography (EGG) ... 124
      J.  Galvanic Skin Reflex (GSR), Electrodermal Response (EDR) ... 125
III.  Impedance ... 125
      A.  Bioimpedance ... 125
      B.  Impedance Plethysmography ... 126
      C.  Rheoencephalography (REG) ... 126
      D.  Impedance Pneumography ... 126
      E.  Impedance Oculography (ZOG) ... 126
      F.  Electroglottography ... 126
IV.   Acoustical Signals ... 126
      A.  Phonocardiography ... 126
          1.  The First Heart Sound ... 126
          2.  The Second Heart Sound ... 127
          3.  The Third Heart Sound ... 127
          4.  The Fourth Heart Sound ... 127
          5.  Abnormalities of the Heart Sound ... 127
      B.  Auscultation ... 127
      C.  Voice ... 128
      D.  Korotkoff Sounds ... 129
V.    Mechanical Signals ... 130
      A.  Pressure Signals ... 130
      B.  Apexcardiography (ACG) ... 130
      C.  Pneumotachography ... 130
      D.  Dye and Thermal Dilution ... 130
      E.  Fetal Movements ... 131
VI.   Biomagnetic Signals ... 131
      A.  Magnetoencephalography (MEG) ... 131
      B.  Magnetocardiography (MCG) ... 131
      C.  Magnetopneumography (MPG) ... 131
VII.  Biochemical Signals ... 131
VIII. Two-Dimensional Signals ... 132
References ... 134
Appendix B
Data Lag Windows
I.   Introduction ... 139
II.  Some Classical Windows ... 139
     A.  Introduction ... 139
     B.  Rectangular (Dirichlet) Window ... 140
     C.  Triangle (Bartlett) Window ... 140
     D.  Cosine Windows ... 141
     E.  Hamming Window ... 143
     F.  Dolph-Chebyshev Window ... 145
References ... 151
Appendix C
Computer Programs
I.   Introduction ... 153
II.  Main Programs ... 154
     •  NUSAMP (Nonuniform Sampling) ... 154
     •  SEGMNT (Adaptive Segmentation) ... 158
     •  PERSPT (Periodogram Power Spectral Density Estimation) ... 162
     •  WOSA (WOSA Power Spectral Density Estimation) ... 162
     •  MEMSPT (Maximum Entropy [MEM] Power Spectral Density Estimation) ... 165
     •  NOICAN (Adaptive Noise Cancelling) ... 167
     •  CONLIM (Wavelet Detection by the Contour Limiting Method) ... 169
     •  COMPRS (Reduction of Signal Dimensionality by Three Methods: Karhunen-Loeve [KL], Entropy [ENT], and Fisher Discriminant [FI]) ... 171
III. Subroutines ... 174
     •  LMS (Adaptive Linear Combiner, Widrow's Algorithm) ... 174
     •  NACOR (Normalized Autocorrelation Sequence) ... 175
     •  DLPC (LPC, PARCOR, and Prediction Error of AR Model of Order P) ... 176
     •  DLPC20 (LPC, PARCOR, and Prediction Error of All AR Models of Order 2 to 20) ... 177
     •  FTOIA (Fast Fourier Transform [FFT]) ... 178
     •  XTERM (Maximum and Minimum Values of a Vector) ... 180
     •  ADD (Addition and Subtraction of Matrices) ... 180
     •  MUL (Matrix Multiplication) ... 181
     •  MEAN (Mean of a Set of Vectors) ... 181
     •  COVA (Covariance Matrix of a Cluster of Vectors) ... 182
     •  INVER (Inversion of a Real Symmetric Matrix) ... 183
     •  SYMINV (Inversion of a Real Symmetric Matrix, Original Matrix Destroyed) ... 183
     •  RFILE (Read Data Vector From Unformatted File) ... 185
     •  WFILE (Write Data Vector on Unformatted File) ... 186
     •  RFILEM (Read Data Matrix From Unformatted File) ... 187
     •  WFILEM (Write Data Matrix on Unformatted File) ... 188

Index ... 189
Volume II: Compression and Automatic Recognition
Chapter 1
WAVELET DETECTION
I. INTRODUCTION
where
S(t) = Σ_i G_i S_i(t - t_i)    (1.1B)
with
0 ≤ t_i ≤ T    (1.1C)
and
(1/T_s) ∫_{t_i}^{t_i + T_s} S_i²(t - t_i) dt = 1    for all i    (1.1D)
In general, the wavelets S_i(t) are stochastic and our goal is to determine the expectation E{S_i(t)}. In some cases the wavelets are deterministic and we are interested in the exact shape of each one (for example, consider the problem of single evoked potential estimation). Sometimes it will be sufficient to get the mean of several wavelets, say
S̄(τ) = (1/I) Σ_{i=1}^{I} S_i(τ)
where the gains, G_i's, are now functions of time and the noise process, n(t), is nonstationary.
The signal, S(t), consists of a series of wavelets S_i, each with gain G_i, appearing at times t_i. Each wavelet is finite in duration and its energy is normalized to 1. We shall also assume, for the time being, that the wavelets do not overlap, namely,
A priori knowledge of the wavelets, S_i(t), is given in terms of the estimate S̄(t), which is termed "the template". The template S̄(t) is zero outside the range 0 ≤ t ≤ T_s < T.
The problem of wavelet detection can now be formulated as follows. Given the initial template, S̄(t), estimate the occurrence times, t_i, the exact shapes, S_i(t), and the gains, G_i. This is, of course, the general problem. For some applications it may be sufficient just to determine whether a wavelet was present in a given time window.
The sampled version of Equation 1.2 is given by:
with
and
(1/M) Σ_{k=k_i}^{k_i+M-1} S_i²(k - k_i) = 1    (1.4C)
where the sampling interval was assumed to be one, without loss of generality. (If the sampling interval is important, simply replace k and M by Δt·k and Δt·M, etc.)
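To make the sampled model concrete, the short sketch below builds a synthetic record from one fixed wavelet shape placed at known occurrence times k_i with random gains G_i and additive noise, following the non-overlap and unit-energy assumptions above. The template shape, gain statistics, and noise level are arbitrary choices for illustration, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 64                                       # wavelet duration in samples
template = np.hanning(M) * np.sin(2 * np.pi * np.arange(M) / M)
template /= np.sqrt(np.mean(template ** 2))  # normalize so that (1/M) * sum(S^2) = 1, as in Eq. 1.4C

N = 2000                                     # record length
k_i = [200, 700, 1250, 1700]                 # assumed occurrence times (non-overlapping)
G_i = rng.normal(1.0, 0.2, size=len(k_i))    # assumed random gains

signal = np.zeros(N)
for k, g in zip(k_i, G_i):
    signal[k:k + M] += g * template          # place each wavelet at its occurrence time

x = signal + rng.normal(0.0, 0.3, size=N)    # observed signal: wavelets plus noise
```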
Sophisticated algorithms are available to detect the presence of a wavelet by analyzing its structure. These syntactic methods are discussed in Chapter 4. Several methods exist for shape analysis of waveforms. General descriptors like the Fourier descriptors,20 polygonal approximations,21 and others are used mainly for the analysis of two-dimensional pictures, but also for one-dimensional signals. Most often, however, simpler methods are used, specifically designed for the application at hand. The advantage of these methods is their simplicity and the ability to implement them on relatively inexpensive, dedicated hardware. The main disadvantage, however, is the fact that each method is specific to a given wavelet and cannot be generally applied. These schemes are usually rigid and do not lend themselves to adaptation. They are applied mainly to QRS detection22 and to the detection of wavelets in the EEG.23 Since methods of the type discussed here depend on the wavelet, the best way to describe them is by an example.
large positive or negative bias in the signal. Second, line frequency noise may sometimes interfere. To overcome these problems, we shall use the sequence of first differences of the observed ECG signal (Equation 1.4). Assume the line frequency is 60 Hz and that we sample the ECG at a rate of 1800 samples per second.
Consider the first difference:
Note that the difference interval is 30/1800 = 1/60 sec, namely, one period of the line frequency. The first difference is thus synchronized to the line interference in such a way that the interference does not appear in x(k) (see the discussion of seasonal time series in Chapter 7, Volume I). Note also that baseline shifts, which are usually much slower than the power line interference, will also be eliminated by Equation 1.5.
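The cancellation can be checked numerically: a 60 Hz sinusoid sampled at 1800 samples/sec repeats exactly every 30 samples, so the 30-sample difference removes it, and a slow drift is reduced to a negligible constant step. The amplitudes below are arbitrary.

```python
import numpy as np

fs, f_line = 1800, 60
k = np.arange(4 * fs)                            # a few seconds of samples
interference = 0.5 * np.sin(2 * np.pi * f_line * k / fs)
drift = 0.2 * k / k[-1]                          # slow baseline wander
y = interference + drift

lag = fs // f_line                               # 30 samples = one full line period
x = y[lag:] - y[:-lag]                           # lag-30 first difference (assumed form of Eq. 1.5)
print(np.max(np.abs(x)))                         # ~1e-3: interference gone, only a tiny drift step remains
```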
The wavelet present in the signal, x(k), will now be transformed to

Equation 1.6 may increase the sensitivity of the algorithm to noise, since it is analogous to differentiation.24 Since the difference operator has removed most of the baseline shift, we can now apply threshold techniques. Consider the following threshold procedure.25 Let
X_M = min_k x(k)

The presence of the ith QRS wavelet is determined at k_i, the ith time that the threshold is crossed:

x(k) ≤ THR    (1.9)
This algorithm has not used all the structural characteristics of the QRS. The R wave consists of an upslope of typical slope and duration26 followed immediately by a downslope with characteristic slope and duration. Condition 1.9 can be considered a hypothesis for a QRS complex. The hypothesis can be accepted if the upslope and downslope in the neighborhood of k_i meet the R wave specifications.
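A minimal sketch of the detector just described: form the lag-30 difference, set a threshold as a fixed fraction of the extreme (most negative) value of the differenced signal, and declare a QRS each time the threshold is crossed, with a refractory period to avoid multiple detections per beat. The fraction, the refractory period, and the use of the global minimum are assumptions, since Equations 1.5 to 1.8 are not reproduced in this copy.

```python
import numpy as np

def detect_qrs(y, fs=1800, line_freq=60, thr_fraction=0.7, refractory_s=0.2):
    """Threshold detection on the lag difference of an ECG record y (1-D array)."""
    lag = fs // line_freq                     # 30 samples, one period of the line frequency
    x = y[lag:] - y[:-lag]                    # difference signal, free of 60 Hz and baseline drift
    thr = thr_fraction * x.min()              # assumed rule: fraction of the most negative excursion
    refractory = int(refractory_s * fs)

    detections, last = [], -refractory
    for k, v in enumerate(x):
        if v <= thr and k - last > refractory:    # threshold crossing, as in Eq. 1.9
            detections.append(k + lag)            # index referred back to the original record
            last = k
    return np.array(detections)
```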
The accuracy of the QRS detector is important, especially when high frequency ECG is of interest (see Appendix A). The inaccuracies in the QRS time detection, known as jitter, in simple threshold detection systems were discussed by Uijen and co-workers.30
Many other algorithms for QRS detection have been suggested.27-29 A well-known algorithm is the amplitude zone time epoch coding (AZTEC).38 This algorithm has been developed for real time ECG analysis and compression. AZTEC analyzes the ECG waveform and extracts two basic structural features: plateaus and slopes. A plateau is given in terms of its amplitude and duration, and a slope by its final elevation and duration. Another algorithm, the coordinate reduction time encoding system (CORTES),39 has been suggested as an improvement on AZTEC.
B. Contour Limiting
The methods discussed in the previous section suffer from the fact that they are "tailored" to a specific wavelet. The method of contour limiting31,32 is more flexible in the sense that it can easily be adapted to various wavelets. Contour limiting uses a template, which is some typical wavelet. The template can be constructed from a priori knowledge about the wavelet or by averaging given wavelets that were detected and aligned manually. Knowledge about the expected variation of each sample of the wavelet is also required. From this knowledge upper and lower limits are constructed. Consider the observation vector x(k)
of Equation 1.4. The upper limit S+(k) and lower limit S-(k) are given by

S+(k) = GS̄(k) + L+(k)    (1.10A)
S-(k) = GS̄(k) - L-(k),    k = 1, 2, ..., M    (1.10B)

where GS̄(k) is the template and L+(k) and L-(k) are functions derived from the variations of the template. These can be taken, for example, to be the estimated variance at each point.
Detection is performed as follows. At time k, an observation vector x(k) is formed from the observation signal x(k), such that

In the same manner, the upper and lower limit vectors S+ and S- are defined:
FIGURE 2. Detection of QRS complex by the contour limiting method: upper trace, noisy ECG signal; lower trace, PQRST complex detection.
At each time k, the observation vector is compared with the limits. A wavelet is assumed to be present in the observation window if

S- ≤ x(k) ≤ S+    (1.14)

If Equation 1.14 does not hold, the observation window is assumed not to contain a wavelet. The data in the observation vector is shifted by one sample and the new vector x(k + 1) is checked by Equation 1.14.
No general rules can be given concerning the exact construction of the limit functions L+(k) and L-(k). Making these limits large will sometimes allow noise to be recognized
as a wavelet (false positive). Making the limits small will cause some wavelets to be rejected
(false negative). The decision as to the required safety margin depends on the relative
importance of the above two errors to the particular application. Equation 1.14 can be relaxed
by requiring that only a certain fraction, say 90%, of the M elements of the observation
vector x(k) obey Equation 1.14. This will reduce the sensitivity to noise.
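The contour-limiting test of Equation 1.14, including the relaxed version that accepts a window when a chosen fraction of its M samples lie inside the limits, can be sketched as follows. Here the limit functions L+(k) and L-(k) are taken as a multiple of the per-sample standard deviation of aligned training wavelets, which is one of the options mentioned above; the multiplier and the fraction are assumptions.

```python
import numpy as np

def build_limits(training_wavelets, n_sigma=3.0):
    """training_wavelets: array of shape (n_wavelets, M), detected and aligned manually."""
    template = training_wavelets.mean(axis=0)            # the template GS(k)
    spread = n_sigma * training_wavelets.std(axis=0)     # L+(k) = L-(k), assumed symmetric
    return template + spread, template - spread          # S+(k), S-(k) of Eq. 1.10

def contains_wavelet(x_window, s_plus, s_minus, fraction=0.9):
    """Relaxed Equation 1.14: require `fraction` of the samples to fall between the limits."""
    inside = (x_window >= s_minus) & (x_window <= s_plus)
    return inside.mean() >= fraction
```

Sliding a length-M window over the signal one sample at a time and applying contains_wavelet reproduces the detection loop described above.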
and detect the QRS complex of the signal of Example 1.1 by means of Equation 1.14. Figure 2 shows the detection results.
III. MATCHED FILTERING
require that the filter will cause the wavelet (when present) to be amplified while relatively attenuating the noise, thus increasing the signal-to-noise ratio. The output of this filter will be subjected to a threshold to detect the presence of a wavelet. Assume that we want to consider an MA filter (Chapter 7, Volume I). The output of the filter is given by the observation vector y^T(k) = [y(k - 1), ..., y(k - M + 1)], where
y(m) = Σ_{j=0}^{M-1} b_j x(m - j)    (1.15)
We have chosen a filter of order M so that if a wavelet is present, the output of the filter will contain information about the complete wavelet. Consider now the following signal-to-noise ratio, SNR_0, at the output of the filter.

The numerator of Equation 1.16 is the power of the difference between the filter's output with and without a wavelet at the input. The variance of the output noise serves as a normalization factor. Let us arbitrarily choose the ith wavelet to represent the time frame including a wavelet; then m = k_i + M - 1 and
E{y(m) | x(m) = G_i S_i + n} = Σ_{j=0}^{M-1} b_j E{G_i S_i(M - 1 - j)}    (1.17)
where we have used the assumption that the noise has zero mean:

E{y(m) | x(m) = n} = 0    (1.18)

Var{y(m) | x(m) = n} = σ_n² Σ_{j=0}^{M-1} b_j²    (1.19)
(1.20)
The right-hand term in Equation 1.20 is due to the Schwarz inequality. The signal-to-noise ratio of Equation 1.20 is maximized when equality occurs. This takes place for

GS̄(M - 1 - j) = E{G_i S_i(M - 1 - j)},    j = 0, 1, ..., M - 1    (1.22)

b_j = K·GS̄(M - 1 - j),    j = 0, 1, ..., M - 1    (1.23)

where K is an arbitrary constant; we shall choose K = 1. This optimal filter is known as the matched filter.
We can rewrite the filter's coefficients in vector form:

b = GS̄    (1.26)
where x(m) is defined in Equation 1.11. The last equation states that the matched filter is equivalent to a cross correlator, cross correlating the observation window x(m) with the template. The maximum signal-to-noise ratio for the matched filter is obtained by introducing Equation 1.26 into Equation 1.20.
The matched filter procedure can be summarized as follows. We estimate the template GS̄ and store it in memory. For each new sample of the incoming signal, x(k), we form the observation vector x(k) (by opening a window of M samples). We cross correlate the template and the observation window to get the kth sample of the output. This we compare with the threshold. The observation window for which y(k) has crossed the threshold is considered to contain a wavelet. Correlation-based detection procedures have been applied to biomedical signals.34-37 Note that here, as in the previous discussion, we only determine the presence or absence of a wavelet in the observation window, but not its exact shape.
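Since the matched filter reduces to cross correlating the observation window with the stored template and thresholding the output, the whole procedure fits in a few lines. The threshold is left as a free parameter; choosing it (for example from a training record) is outside this sketch.

```python
import numpy as np

def matched_filter_detect(x, template, threshold):
    """Cross correlate x with the template (Eq. 1.26) and threshold the output."""
    y = np.correlate(x, template, mode="valid")   # y(k) = sum_j template[j] * x[k + j]
    hits = np.flatnonzero(y >= threshold)         # window positions whose output crosses the threshold
    return y, hits
```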
IV. ADAPTIVE WAVELET DETECTION
A. Introduction
We shall consider now the problem of wavelet detection while estimating and adapting the template.34 This is required when the a priori information is insufficient or when the wavelets are nonstationary and the template has to track the slow variations in the wavelets. We consider here a modification of the filter discussed in the previous section.
Consider the average squared error e²(k, M) between the signal x(k) and the estimated wavelet:

The best least squares estimate of the gain, G_0(k), is the one that minimizes the error of Equation 1.29. This optimal estimated gain is given by

G_0(k) = x^T(k)S̄ / (S̄^T S̄)    (1.30)
At time k_j, when the observation vector contains the jth wavelet, S_j, Equation 1.30 becomes
For a stationary noise process and large M, the last term of Equation 1.31B is almost constant. Since the noise and wavelets are independent, the third term becomes small. The first two terms of Equation 1.31B denote the error between the energy of the wavelet G_jS_j and its estimate G(k_j)S̄. This term will yield a minimum at times k_j. Detection of the time of occurrence k_j is thus achieved by finding the local minima of the error e².
The minima at times k_j are local minima. For example, in the noise-free case, the estimated gain and the error will be zero in segments with no wavelet, while the minimum error at times k_j will be some local minimum, probably above zero. To produce an improved error function we shall introduce a weighting function. This function will ensure higher error values for data associated with improbable gains. Suppose the gains G_j are random variables with known probability distribution P(G). Define a weighting function W(G):
w^ith known probability distribution P(G). Define a weighting function W(G):
P(E{G})
W(G) = f ( G ) P(G) (L32)
«
The weighted error e_w²(k, M) is

and is inversely proportional to the gain probability distribution. The function f(G) in Equation 1.32 is chosen such that

The weighting function assures high errors for very low gains and reduces the error for the more probable ones. Since for low gains the error approaches zero as G², the function f(G) must obey:
For the examples described in this section, the gains were assumed to be Gaussian distributed and the function f(G) was chosen as

f(G) = [exp(G/E{G} - 1) / (G/E{G})]^γ,    γ > 2    (1.36)
The parameter γ is heuristically determined; the other parameters of Equation 1.36 are determined from a priori knowledge of the wavelet statistics or by sample estimation during a training stage.
Assuming the correlation window is large enough that R_{S̄n}(k) ≈ 0, it can be shown that the expectation of the weighted error can be approximated by
E{e;.(k,M)} = ( G k | ) ( G k( R s - ^ ) )
@ k = kj (1.37)
The detection of the presence of a wavelet in the signal is performed by placing a threshold on the weighted error function. The value of the threshold level LIM is experimentally determined using a training signal. Assume an initial sample of the analyzed signal serves as a training signal. This record is analyzed, for example visually by a trained person, with L wavelets detected at times k_i, i = 1, 2, ..., L. The unweighted error is calculated and its mean is estimated by

Ê{e²(k, M)}_min = (1/L) Σ_{i=1}^{L} e²(k_i, M)    (1.38)
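Putting the pieces of this subsection together, a rough sketch of the weighted-error detector looks as follows: at each shift the gain is estimated by least squares against the template (Equation 1.30), the residual error e²(k, M) is formed, and it is multiplied by a weight that penalizes improbable gains in the spirit of Equations 1.32 and 1.36. Detections are the local minima of the weighted error below the trained threshold LIM. The weight used here is a stand-in with the right qualitative behavior, not the book's exact function.

```python
import numpy as np

def weighted_error_sequence(x, template, mean_gain=1.0, gamma=3.0):
    """Gain estimate and weighted squared error at every window position."""
    M = template.size
    s_energy = np.dot(template, template)
    err_w = np.empty(x.size - M + 1)
    for k in range(err_w.size):
        window = x[k:k + M]
        g = np.dot(window, template) / s_energy        # least squares gain estimate, Eq. 1.30
        e2 = np.mean((window - g * template) ** 2)     # average squared error e^2(k, M)
        r = max(abs(g) / mean_gain, 1e-6)              # gain relative to its expected value
        weight = (np.exp(r - 1.0) / r) ** gamma        # assumed stand-in for W(G): large for unlikely gains
        err_w[k] = weight * e2
    return err_w
```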
B. Template Adaptation
The need to adapt the template arises in two cases: when the initial information concerning the wavelet is insufficient, and thus S̄_0 has to be improved, and when the wavelets are slowly changing in time so that tracking action is required. Template adaptation is performed each time a wavelet is detected. Assume that at time m_k the kth wavelet was detected; then adaptation is achieved according to
where S̄_k is the adapted template, S̄_{k-1} is the previous template, and p(k) and ψ(k) are weights. The current template is thus a linear combination of the last template and the current observation signal. The kth template can be expressed in terms of the (k - N)th template:
S̄_k = P_{N-1}(k) S̄_{k-N} + S_k + n_k,    k ≥ N    (1.41A)

where

P_x(k) = ∏_{j=0}^{x} p(k - j)  for x ≥ 0;    P_x(k) = 0  for x < 0    (1.41B)

S_k = Σ_{j=0}^{N-1} P_{j-1}(k) ψ(k - j) G_{k-j} S_{k-j}    (1.41C)
The signal-to-noise ratio of the observation vector at time m_k is similarly given by:

SNR_x = E{G² S^T S} / E{n^T n} = R_S(m_k) E{G²} / σ_n²    (1.45)

The ratio SNR_S̄/SNR_x given by Equations 1.44 and 1.45 is a measure of the relative noisiness of the adapted template. We shall now deal separately with the two cases of adaptation.
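The two cases below differ mainly in how the weights p(k) and ψ(k) are scheduled; the update step itself, a weighted combination of the previous template and the newly detected wavelet window, can be sketched as a single function. The renormalization line is an assumption added to keep the template consistent with the unit-energy convention of Equation 1.4C.

```python
import numpy as np

def adapt_template(prev_template, wavelet_window, p=0.9, psi=0.1):
    """One adaptation step: new template = p * previous template + psi * detected wavelet window."""
    new_template = p * prev_template + psi * wavelet_window
    new_template /= np.sqrt(np.mean(new_template ** 2))   # assumed renormalization to unit average power
    return new_template
```

For slow tracking, p and psi are kept fixed; for correcting a poor initial template, p(k) can follow the exponential schedule given below so that adaptation effectively stops after about 5/α detections.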
C. Tracking a Slowly Changing Wavelet
Since only slow changes in the wavelets are allowed, one can assume "almost" stationary conditions with

Taking the expectation of Equation 1.40, with E{n} = 0, and the assumption of Equation 1.47 yields

The kth estimate of the template is given from Equations 1.41 and 1.46:
The relative noisiness of the estimate is given from Equations 1.44 to 1.46 by
Q ^ p C l (1.50)
In selecting the adaptation coefficients p and ψ, both the tracking rate and the noisiness of the template have to be considered. Define the tracking coefficient T_c to be the ratio between the template (GS̄) part and the initial estimate (S̄_{k-N}) part of Equation 1.49; hence

T_c = ψG [(1 - p^{N+1}) / (1 - p)] p^{-N},    0 ≤ p < 1    (1.51)
In order to optimally select the adaptation coefficients, we shall define a cost function, I_c:

I_c = η(T_c) (SNR_S̄ - SNR_x) / SNR_x    (1.52)
estimate of the template. Here, the adaptation coefficients are time dependent. An exponential decay has been chosen such that

p(k) = 1 - (1 - p_0) e^{-αk},    0 ≤ p_0 ≤ 1

The adapted template is thus a weighted average of about K = 5/α initial templates. After this period, ψ(k) → 0 and the adaptation process is terminated. A priori knowledge and estimation of the goodness of the initial template S̄_0 allow the determination of α.
In this case, the convergence rate coefficient C_c will be defined instead of the tracking coefficient (Equation 1.51):

C_c(k) = [ΔE(0) - ΔE(k)] / ΔE(0)    (1.54)

where
ΔE(k) is thus the mean square error between the current template and the wavelet. The relation between the adaptation coefficients α, p_0, ψ_0 is given by the maximization of the cost function

I_c = [SNR_S̄(k) - SNR_x] / SNR_x    (1.56)
Note that here, the assumption in Equation 1.47 can be used only with great care since
templates may be highly nonstationary.
FIGURE 3. Detection error for real ECG signal. (A) Template; (B), (C) signal and weighted error, SNR = infinity; (D), (E) signal and weighted error, SNR = 2.48; (F), (G) signal and weighted error, SNR = 1.24. (From Cohen, A. and Landsberg, D., IEEE Trans. Biomed. Eng., BME-30, 332, 1983. © 1983 IEEE. With permission.)
τ ∈ (t, t - T)    (1.57)
As before, the G_j's and τ_j's are unknown and n(t) is white noise. De Figueiredo's algorithm is implemented in two steps.
Z_j(τ) = ∫ |G_j S̄_j(t - τ) - x(t)| dt,    j = 1, 2, ..., n    (1.58)
The output of the filter is achieved by integrating (over the window) the absolute value of the difference between the observation and the template. This procedure is repeated while shifting the template with various delays, τ. Note that these filters require no multiplications. The point where Z_j(τ) attains its minimum is the most likely location of S_j in the composite wavelet and serves as the first estimate for τ_j and for the actual participation of S_j. The minima points of all Z_j(τ), j = 1, 2, ..., n, are compared with a threshold. All templates whose corresponding filters provide a minimum below the threshold are hypothesized to be present in the composite wavelet.
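The first step can be sketched in a few lines of Python (all names and the toy signal below are illustrative, not taken from the original work): each template is slid across the observation, the absolute difference is summed over the window, and every template whose minimum falls below a threshold is hypothesized to be present.

```python
import numpy as np

def sad_filter(x, template):
    """Sum of absolute differences between a sliding template and the observation.
    Returns Z(tau) for every admissible delay tau (no multiplications needed)."""
    n, m = len(x), len(template)
    return np.array([np.sum(np.abs(x[tau:tau + m] - template))
                     for tau in range(n - m + 1)])

def detect_templates(x, templates, threshold):
    """First step of the two-step separation: hypothesize which templates are
    present in the composite wavelet and estimate their delays."""
    hypotheses = []
    for j, s in enumerate(templates):
        z = sad_filter(x, s)
        tau_hat = int(np.argmin(z))          # most likely location of s_j
        if z[tau_hat] < threshold:           # compare the minimum with a threshold
            hypotheses.append((j, tau_hat, float(z[tau_hat])))
    return hypotheses

# toy composite wavelet: two shifted, scaled templates plus white noise
rng = np.random.default_rng(0)
t = np.arange(30)
s1 = np.exp(-0.5 * ((t - 15) / 2.0) ** 2)
s2 = np.sin(2 * np.pi * t / 30)
x = np.zeros(200)
x[20:50] += 1.2 * s1
x[90:120] += 0.9 * s2
x += 0.05 * rng.standard_normal(x.size)
print(detect_templates(x, [s1, s2], threshold=5.0))
```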
J(G, τ) is differentiated with respect to G and τ, and the result is set equal to zero. This leads to the set of 2n normal equations

Σ_(i=1)^(n) R_(S_iS_j)(τ_j − τ_i) G_i = R_(S_jx)(τ_j) ;   j = 1, 2, ..., n    (1.60A)
and the corresponding set of n equations obtained from the derivatives with respect to the delays τ_j (Equation 1.60B),

where
REFERENCES
21. Lewis, J. W. and Graham, A. H., High speed algorithms and damped spline regression and electrocardiogram feature extraction, paper presented at the IEEE Workshop on Pattern Recognition and Artificial Intelligence, Princeton, N.J., 1978.
22. Holsinger, W. P., Kempner, K. M., and Miller, M. H., A QRS processor based on digital differentiation, IEEE Trans. Biomed. Eng., 18, 212, 1971.
23. De Vries, J., Wisman, T., and Binnie, C. D., Evaluation of a simple spike wave recognition system, Electroencephalogr. Clin. Neurophysiol., 51, 328, 1981.
24. Goldberger, A. L. and Bhargava, V., Computerized measurement of the first derivative of the QRS complex: theoretical and practical considerations, Comput. Biomed. Res., 14, 464, 1981.
25. Haywood, L. J., Murthy, V. K., Harvey, G., and Saltzberg, S., On line real time computer algorithm for monitoring ECG waveforms, Comput. Biomed. Res., 3, 15, 1970.
26. Fischhof, T. J., Electrocardiographic diagnosis using digital differentiation, Int. J. Bio-Med. Comput., 13, 441, 1982.
27. Colman, J. D. and Bolton, M. P., Microprocessor detection of electrocardiogram R-waves, J. Med. Eng. Technol., 3, 235, 1979.
28. Talmon, J. L. and Kasman, A., A new approach to QRS detection and typification, IEEE Comput. Cardiol., 479, 1981.
29. Nygards, M. E. and Sornmo, L., Delineation of the QRS complex using the envelope of the ECG, Med. Biol. Eng. Comput., 21, 538, 1983.
30. Uijen, G. J. H., De Weerd, J. P. C., and Vendrik, A. J. H., Accuracy of QRS detection in relation to the analysis of high frequency components in the ECG, Med. Biol. Eng. Comput., 17, 492, 1979.
31. Van den Akker, T. J., Ros, H. H., Koelman, A. S. M., and Dekker, C., An on-line method for reliable detection of waveforms and subsequent estimation of events in physiological signals, Comput. Biomed. Res., 15, 405, 1982.
32. Goovaerts, H. G., Ros, H. H., Van den Akker, T. J., and Schneider, H. A., A digital QRS detector based on the principle of contour limiting, IEEE Trans. Biomed. Eng., 23, 154, 1976.
33. Papoulis, A., Signal Analysis, McGraw-Hill Kogakusha, Auckland, 1981.
34. Cohen, A. and Landsberg, D., Adaptive real time wavelet detection, IEEE Trans. Biomed. Eng., 30, 332, 1983.
35. Collins, S. M. and Arzbaecher, R. C., An efficient algorithm for waveform analysis using the correlation coefficient, Comput. Biomed. Res., 14, 381, 1981.
36. Fraden, J. and Neuman, M. R., QRS wave detection, Med. Biol. Eng. Comput., 18, 125, 1980.
37. De Figueiredo, R. J. P. and Gerber, A., Separation of superimposed signals by a cross-correlation method, IEEE Trans. Acoust. Speech Signal Process., 31, 1084, 1983.
38. Cox, J. R., Nolle, F. M., Fozzard, H. A., and Oliver, G. C., AZTEC: a preprocessing program for real time ECG rhythm analysis, IEEE Trans. Biomed. Eng., 15, 128, 1968.
39. Abenstein, J. P. and Tompkins, W. J., A new data reduction algorithm for real-time ECG analysis, IEEE Trans. Biomed. Eng., 29, 43, 1982.
Chapter 2
POINT PROCESSES
I. INTRODUCTION
Point processes1-3 are random processes which produce random collections of point occurrences, or series of events, usually (but not necessarily) along the time axis. In univariate point process analysis, the exact shape of the event is of no interest. The "time" of occurrence (or the intervals between occurrences) is the only information required. A more general case is the multivariate point process in which several classes of points are distinguished. In the multivariate case, the shape of the event serves only to classify it. The statistics of the process, however, are given in terms of the intervals only. Point processes can be viewed as a special case of general random processes4 and can be dealt with as a type of time series.
Point process analysis has been applied to a variety of applications ranging from the analysis of radioactive emission to road traffic studies and to queuing and inventory control problems. Point process theory has been applied to the analysis of various biomedical signals.4-6 The main application, however, has been in the field of neurophysiology.7-16
A neural spike train is the sequence of action potentials picked up by an electrode from several neighboring neurons. The neurophysiologist is interested in the underlying cellular mechanisms producing the spikes. He may investigate, for example, the effects of environmental conditions such as temperature, pressure, or various ion concentrations, or the effects of pharmacological agents. The analysis of the spike train may be used for the description, comparison, and classification of neural cells. Different interval patterns may result from the same cell under different conditions. Interneural connections may be investigated by analyzing the corresponding spike trains. Multivariate point process analysis is sometimes required10 when the spike train contains action potentials from more than one neuron. Classification of each spike into one of the classes to be considered in the analysis is needed. Classification of spikes is done by means of the methods discussed in Chapter 1, Volume II. Figure 1 shows a record of a neural spike train.
Analysis of myoelectric activities has also been performed by point process methods.17-19 Here the motor unit action potential train has been modeled as a point process. Characteristic deviations from normal motor unit firing patterns were suggested to serve as a diagnostic tool in neuromuscular diseases.20 It was found, for example, that both firing rate and SD of the interpotential intervals increase in patients with myopathy.
The ECG signal can be considered a point process21,22 when only the rhythm of the heartbeat is of interest and not the detailed time course of polarization and depolarization of the heart muscle. The occurrence of the R wave is defined as an event and the R-R interval statistics are of interest. Figure 2 shows a record of an ECG signal. The high signal-to-noise ratio allows the detection of R waves with a simple threshold device, thus generating a point process record.
The occurrence of glottal pulses during voiced segments of speech23 can be analyzed as a point process. The time interval between consecutive glottal pulses, known as the pitch period, is a function of the vocal cords' anatomy. Laryngeal disorders can be diagnosed24 by means of speech signal analysis. Here the detection of the event is not an easy task. Several algorithms have been suggested23 for pitch extraction. Figure 3 shows a sample of voiced speech where the pitch events are clearly seen.
Once the events of the process have been defined, the data are fitted25 into a point process model. The most often used models are the renewal process, Poisson distribution, Erlang (Gamma) distribution, Weibull distribution, and AR and MA processes. Analysis of the
FIGURE 1. Neural spike train. Spikes recorded from a photoreceptor stimulated by a light step. (From Alkon, P. and Grossman, Y., J. Neurol., 41, 1978. With permission.)
process includes statistical tests for stationarity, trends and periodicities, and correlation and
spectral analysis. These will be discussed in the following sections.
The point process is completely characterized by one or two of the canonical forms:1 the interval process and the counting process. These are schematically described in Figure 4.
The interval process describes the time behavior of the events. The random times, t_i, i = 1, 2, ..., M, at which the ith event occurs, are one way to describe the process. Here an arbitrary point in time is chosen as a reference. At this origin point, an event may or may not have occurred. The time intervals between two adjacent events, T_i, i = 1, 2, ..., M − 1, can also describe the process.
Of interest also are the higher-order intervals. The nth order interval is defined as the
FIGURE 4. The canonical forms of a point process: events along the time axis, event counts, and order intervals (schematic).
elapsed time between an event and the nth following event. Denote the nth order interval by T_i^(n); then:

T_i^(n) = Σ_(j=0)^(n−1) T_(i+j) ;   i = 1, 2, ...    (2.1)
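A minimal sketch of Equation 2.1 in Python (numpy assumed; variable names are hypothetical): the nth order intervals are simply moving sums of n consecutive first-order intervals.

```python
import numpy as np

def higher_order_intervals(intervals, n):
    """nth order intervals: elapsed time between an event and the nth following
    event, i.e., the sum of n consecutive first-order intervals (Equation 2.1)."""
    T = np.asarray(intervals, dtype=float)
    return np.convolve(T, np.ones(n), mode="valid")   # moving sum of n intervals

intervals = np.array([1.2, 0.8, 1.5, 0.9, 1.1])
print(higher_order_intervals(intervals, 2))            # [2.0, 2.3, 2.4, 2.0]
```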
We shall define the quantity N(t₁,t₂) to be the random number of events in the range (t₁,t₂). We shall require that there are essentially no multiple simultaneous occurrences; namely, the following condition exists:

Prob{N(t, t + Δt) > 1} = o(Δt)    (2.2)

A process for which Equation 2.2 holds is called an "orderly process". The random quantity N(0,t) yields the counting canonical form of the process. The various random variables defined above are drawn from the underlying probability distribution of the point process under test. Their statistics, usually first- and second-order statistics, are used to characterize the investigated process.
Note that for the case of higher-order intervals, as the order n becomes larger there is substantial overlapping of the original intervals. The central limit theorem suggests that for most distributions of the original interval, the nth order interval distribution will tend toward gaussian.
The random variable t_i (or T_i) is described by one of several equivalent functions. The probability density function, p(t), describing the random variable t_i, is defined such that p(t)Δt is the probability that an event occurs between t and t + Δt. The probability density function (PDF) may be expressed as:
with:

∫₀^∞ p_t(T) dT = 1    (2.4)
The interval histogram is often used as an estimator for the interval PDF. The cumulative distribution function, P_t(T), is the probability that the random variable T_i is not greater than T; hence,

P_t(T) = Prob{T_i ≤ T}    (2.5)

The probability that the random variable is indeed greater than T is termed the survivor function, R_t(T):

R_t(T) = 1 − P_t(T) = Prob{T_i > T}    (2.6)

The hazard function, φ(t), is the probability density of an event occurring at t given that no event has occurred since the previous event; it is the ratio of the PDF to the survivor function:

φ(t) = p(t) / R_t(t)    (2.7)
The hazard function is also known as the "postevent probability", "age specific failure rate", "conditional probability", or "conditional density function". The hazard function may be constant (as in the Poisson process) or may vary with t. Pacemaker neurons, for example, exhibit interspike interval distributions with a positive hazard function. Some neurons in the auditory system, for example, exhibit interval distributions with negative hazard functions. A similar function is the "intensity function". The complete intensity function,3 h₀(t), is defined as:
The conditional intensity function, h(τ), is defined such that h(τ)Δt is the probability that an event occurs at time (t + τ), given that an event has occurred at time t (Equation 2.9). The hazard function is conditioned upon having the previous event at t, namely, no event has occurred in the interval, while the intensity function is conditioned only on the occurrence of an event at t.
The point process can also be described by means of the counting process (Figure 4). The counting process, N(t), represents the cumulative number of events in the time interval (0,t) (Equation 2.10). The relationship between the two forms, the counting and the interval form, is as follows:1,4

N(t) < i   if and only if   t < t_i    (2.11)

Equation 2.11 states that at all times smaller than t_i, the cumulative event count must be smaller than i. This is true since no simultaneous events are allowed (Equation 2.2).
Equation 2.11 yields (using Equation 2.6) the relations between the counting and interval distributions given in Equations 2.12 to 2.14.
The last equations show that a direct relationship between the counting and interval forms exists. The two processes are equivalent only by way of their complete probability distributions.1 In usual practice the analysis is based only on the first- and second-order properties of the process. Such an analysis, based on the first and second order of a counting process, is not equivalent to the analysis based on the interval process, and information is gained by considering both forms.
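The duality between the two canonical forms can be illustrated with a short sketch (Python/numpy; the event times are invented): the counting form N(0,t) is obtained from the event times, and the relation N(t) < i ⇔ t < t_i of Equation 2.11 can be checked directly.

```python
import numpy as np

def counting_process(event_times, t):
    """N(0,t): cumulative number of events occurring up to (and including) time t."""
    event_times = np.sort(np.asarray(event_times, dtype=float))
    return int(np.searchsorted(event_times, t, side="right"))

# event times of an orderly process
t_events = np.array([0.7, 1.9, 2.4, 4.1, 5.0])
# equivalence of the two forms: N(t) < i exactly when t < t_i
i, t = 3, 2.0
print(counting_process(t_events, t) < i, t < t_events[i - 1])   # True True
```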
III. SPECTRAL ANALYSIS
A. Introduction
In general, the intervals (counts or event times) are statistically dependent. Hence the joint PDF, p(T₁,T₂,...,T_n), rather than Equation 2.3, has to be considered. The dependency is usually experimentally analyzed by means of joint interval histograms (or scattering diagrams) where two-dimensional plots describing the relations between p(T_i) and p(T_(i+j)) are given.
The second-order statistics are very often analyzed by means of the correlation and power spectral density functions. In the analysis of point processes, two different types of frequency domains have been introduced, that of the intervals and that of the event counts.
where μ_T = E{T} is the expectation of the stationary interval process. The expectation operator in Equation 2.15 means integration over the joint PDF. Let the variance of the interval process be σ_T² (Equation 2.16). The serial correlation coefficients are the autocovariances of Equation 2.15 normalized by the variance:

ρ_k = C_k / σ_T² ;   k = ..., −1, 0, 1, ...    (2.17)
The sequence {ρ_k} is known as the serial correlogram. It is easily shown that −1 ≤ ρ_k ≤ 1. The serial correlation coefficients have been used extensively to describe statistical properties of neural spike intervals. In practice the serial correlation coefficients have to be estimated from a finite sample with N intervals. A commonly used estimate15 for ρ_k is given by Equation 2.18A, with:

μ̂_T(k) = [1 / (N − k)] Σ_(j=1)^(N−k) T_(j+k)    (2.18B)
The interval power spectral density (PSD), S_I(ω), is given by the Fourier transform of the serial correlation (Equation 2.19). The local rate of the counting process is

λ(t) = lim_(Δt→0) E{N(t, t + Δt)} / Δt    (2.20)

λ(t) is thus the local number of events per unit time. In general, for a nonstationary process, the local rate is a function of time. The counts PSD function, S_c(ω), of a stationary process (λ(t) = λ) is given by:3
(2.21)
where h(τ) is the conditional intensity function given in Equation 2.9. S_c(ω) is the Fourier transform of the counts autocovariance. Methods for estimating the PSD function have been reported in the literature.9-11
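A rough sketch of a serial-correlogram estimate (Python/numpy; this uses a plain sample autocovariance normalized by the sample variance, which may differ in detail from the estimator of Equation 2.18):

```python
import numpy as np

def serial_correlogram(intervals, max_lag):
    """Sample serial correlation coefficients rho_k of an interval sequence."""
    T = np.asarray(intervals, dtype=float)
    N, mu = len(T), np.mean(intervals)
    var = np.mean((T - mu) ** 2)
    rho = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        rho[k] = np.mean((T[:N - k] - mu) * (T[k:] - mu)) / var
    return rho

rng = np.random.default_rng(1)
renewal_intervals = rng.exponential(scale=1.0, size=500)     # independent intervals
print(np.round(serial_correlogram(renewal_intervals, 5), 3))  # near zero for k >= 1
```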
A. Introduction
The event generating process is usually to be estimated, or modeled, with the aid of the finite time observed data. The various models are given in terms of the probability distribution functions. The motivation for modeling the point process is mainly to represent the event generating process in a concise parametric form. This allows the detection of changes in the process (due to pathology, for example) and comparison of samples from various processes.
In a stationary point process, the underlying probability distributions do not vary with time. Hence phenomena, common in biological signals, such as fatigue and adaptation, produce nonstationarities. Testing stationarity and detecting trends are important steps in the investigation of the point process; in fact, the initial step of analysis must be the testing of the validity of the stationarity hypothesis. In the remainder of this section, various distribution models will be discussed. These models have been used extensively for modeling neural spike trains, EMG, R-R intervals, and other biological signals.
B. Renewal Processes
An important class of point processes often used in modeling biological signals is the class of renewal processes. Renewal processes are processes in which the intervals between events are independently distributed with an identical probability distribution function, say g(t). In neural modeling it is commonly assumed8 that the spike occurrences are of a regenerative type, which means that the spike train is assumed to be a renewal process. This is used, however, only in cases of spontaneous activity. In the stimulated spike train, the neuron reacts and adapts to the stimuli so that the interval independency is violated.
Consider the intensity function, h(t) (Equation 2.9), of the renewal process. Recall that h(t)Δt is the probability of an event occurring in the interval (t, t + Δt) given that an event has occurred at t = 0. The event can be the first, second, third, etc. occurrence during the time interval (0,t).
It can be shown14 that when k events have occurred during the interval (0,t), the intensity function of the renewal process becomes:

h(t) = g(t) + g(t)*g(t) + ... + g(t)*g(t)*...*g(t)    (2.22)

where (*) denotes convolution and the last term contains (k − 1) convolutions. Equation 2.22 is better represented via the Laplace transformation. Define

G(s) = L[g(t)] ;   H(s) = L[h(t)]    (2.23)

H(s) = Σ_(i=1)^(k) (G(s))^i = G(s)[1 − G^k(s)] / [1 − G(s)]    (2.24)
1. Serial Correlogram
The assumption of interval independency (in the sense of weak stationarity) can be tested using the estimation of the serial correlation coefficients defined in Equation 2.17 and estimated by Equation 2.18. The exact distribution of ρ̂_k is, of course, unknown. However, under the assumption that the process is a renewal process and for sufficiently large N, the random variable ρ̂_k(N − 1)^(1/2) (k > 0) has approximately a normal distribution,15 with zero mean and unit variance. The null hypothesis H₀ is that the interval sequence {T₁, T₂, ..., T_N} is drawn from a renewal process. The alternative hypothesis, H₁, is that the intervals are identically distributed, but are not independent.
A test based on ρ̂_k will be to reject the renewal hypothesis H₀ if:

|ρ̂_k| (N − 1)^(1/2) > z_(α/2)    (2.25)

where α is a predetermined significance level and z_(α/2) is given by the integral over the normalized (0,1) gaussian distribution:

(1/√(2π)) ∫_(z_(α/2))^∞ exp(−x²/2) dx = α/2    (2.26)
(e.g., see Bendat and Piersol,27 Chapter 4). It has been argued that measurement errors (in the case of neural spike trains)14,28 may introduce trends and dependencies between intervals, thus rendering the serial correlogram test unreliable.
Perkel et al.13 have suggested subjecting the sequence of intervals to random shuffling and recomputing the correlation coefficients. Serial correlation due to the process (if it exists) will be destroyed by the random shuffling. Computational errors, however, exist in the estimation of both original and shuffled correlations. A test for independence can then be constructed from the comparison of the two correlograms (e.g., by means of the sum of squares of the difference between corresponding correlation coefficients). Other tests have been suggested.35
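The shuffling idea can be sketched as follows (Python/numpy; the statistic, the number of shuffles, and the data are illustrative choices, not the exact procedure of Perkel et al.):

```python
import numpy as np

def serial_corr(T, k):
    """Sample serial correlation coefficient of the interval sequence at lag k."""
    T = np.asarray(T, dtype=float)
    mu, var = T.mean(), T.var()
    return float(np.mean((T[:len(T) - k] - mu) * (T[k:] - mu)) / var)

def shuffle_independence_test(intervals, max_lag=5, n_shuffles=200, seed=0):
    """Serial correlation produced by the process is destroyed by random shuffling,
    so the correlogram of the data is compared with correlograms of shuffled copies.
    The sum of squared coefficients is used here as the comparison quantity."""
    rng = np.random.default_rng(seed)
    ssq = lambda x: sum(serial_corr(x, k) ** 2 for k in range(1, max_lag + 1))
    observed = ssq(intervals)
    shuffled = [ssq(rng.permutation(intervals)) for _ in range(n_shuffles)]
    p_value = float(np.mean(np.asarray(shuffled) >= observed))
    return observed, p_value

intervals = np.random.default_rng(2).exponential(size=300)   # renewal-like data
print(shuffle_independence_test(intervals))
```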
2. Flatness of Spectrum
A renewal process has a flat interval PSD function. Deviations from a flat spectrum can be used as a test for interval independence.1 When the spectrum is estimated by the periodogram (Chapter 8, Volume I), the flatness can be tested by the quantities C_i:

C_i = Σ_(j=1)^(i) c_j / Σ_(j=1)^(N/2−1) c_j    (2.27)

where c_i, i = 1, 2, ..., N/2 − 1, are the elements of the periodogram. Under the renewal hypothesis, the quantities C_i of Equation 2.27 represent the order statistics from a uniform distribution. The Kolmogorov-Smirnov statistics29 may be used to test the C_i's.
The presence of a trend in the intervals can be tested with the statistic D (Equations 2.28A and 2.28B). If the intervals between events tend to increase with time, the T_i's will increase with their subscripts, causing the statistic D to be large. It can be shown that for sufficiently large n, given H₀ is true, D is approximately normally distributed with:

Var{D|H₀} = n²(n + 1)²(n − 1) / 36    (2.29)

The test, therefore, calls for rejecting the H₀ assumption of no trend (hence the necessary requirement for a renewal process) for large values of D.
C. Poisson Processes
Poisson processes are a special case o f renewal processes, in which the identical interval
distribution is a Poisson distribution. In the theory of point processes, the Poisson process,
due to its simplicity, pla^ > a ^umewhat analogous role to that of normal distribution in the
study o f random variables.
The Poisson process, with rate λ, is defined by the requirement that for all t, the following holds as Δt → 0:

Prob{N(t, t + Δt) = 1} = λΔt ;   Prob{N(t, t + Δt) > 1} = o(Δt)    (2.30)

The constant rate, λ, denotes the average number of events per unit time. An important aspect of the definition (Equation 2.30) is that the probability does not depend on time. The probability of having an event in (t, t + Δt) does not depend on the past at all.
It is well known that for a random variable for which Equation 2.30 holds, the probability of r events occurring in t (starting from some arbitrary time origin) is

Prob{N(t) = r} = [(λt)^r / r!] exp(−λt)    (2.31)

which is the Poisson probability. The probability of having zero events in time T, followed by one event in the interval T + dT, is given by the joint probability of the two. However, the two probabilities are independent, due to the nature of the Poisson process. Also, the probability of having one event in the interval T + dT is, by Equation 2.30, λdT; hence,

p(T) dT = exp(−λT) λ dT    (2.32A)

or

p(T) = λ exp(−λT)    (2.32B)

The probability that in the following time interval of T + dT one and only one event will occur is λdT. Since the two are independent, the joint probability of their occurrence is given by the product of the two. Repeating this argument for the nth order interval yields

p_(T(n))(T) = [λ^n / Γ(n)] T^(n−1) exp(−λT)    (2.34B)

The PDF of the nth order interval given by Equation 2.34B is known as the Gamma distribution.
The survivor function (Equation 2.6) for the Poisson process is given by integrating Equation 2.32B:

R(T) = exp(−λT)    (2.35)

Consider now the autocovariance and the spectrum of the Poisson process. Since the interval, T_i, is independent of T_j for all i ≠ j, the autocovariance of the process (Equation 2.15) becomes a delta function. Its Fourier transform, the interval power spectral density function (Equation 2.18), is thus constant (flat):

S_I(ω) = σ_T² / 2π    (2.36)

It can also be shown that for the Poisson process the conditional intensity function h(τ) = λ. Hence the counts' power spectral density function (Equation 2.20) is also flat, with:

S_c(ω) = λ / 2π    (2.37)
Several statistics to test the hypothesis that a given sequence of intervals was drawn from a Poisson process have been suggested. For the Poisson process, the quantities

ρ_i = t_i / t_N    (2.38)

(Figure 4) represent the order statistics of a random sample of size N drawn from a uniform distribution.14 A modification to Equation 2.38 shows1 that when rearranging the interval sequence to generate a new ordered sequence {T_i*} in which T*_(i+1) ≥ T_i*, the resulting quantities ρ_i' (Equation 2.39) also represent a similar order statistics. The Kolmogorov-Smirnov1,29 statistics can then be used to test the Poisson hypothesis.
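A sketch of the order-statistics test (Python, with scipy assumed available; under the Poisson hypothesis the normalized event times t_i/t_N behave as uniform order statistics, so a standard Kolmogorov-Smirnov test against the uniform distribution can be applied):

```python
import numpy as np
from scipy import stats

def poisson_ks_test(event_times):
    """Test the Poisson hypothesis via the normalized event times t_i / t_N,
    which should be uniform order statistics on (0,1) under a homogeneous
    Poisson process.  This is a sketch; the book's variant also tests a
    rearranged (ordered) interval sequence."""
    t = np.sort(np.asarray(event_times, dtype=float))
    rho = t[:-1] / t[-1]                      # drop the last point (equal to 1)
    return stats.kstest(rho, "uniform")

rng = np.random.default_rng(3)
t_events = np.cumsum(rng.exponential(scale=0.5, size=400))   # Poisson process
print(poisson_ks_test(t_events))
```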
Other tests, based, for example, on the coefficient of variation,15 have been suggested. It is sometimes of interest to test whether the Poisson process under investigation is a homogeneous or nonhomogeneous Poisson process. A nonhomogeneous Poisson process is one in which the rate of occurrence, λ, is not constant but time dependent; in other words, a Poisson process with a trend in the rate of occurrence. The Wald-Wolfowitz run test27 may be used for this task. For this test we define a set of equal arbitrary time interval lengths (TIL). If the number of events in the TIL exceeds the expected number for this interval, a (+) sign is attached to the TIL. If the number of events is below the expected number, a (−) sign is attached. When the number of events equals the expected number, the TIL is discarded. A sequence of (+) and (−) signs is thus generated. The number of runs, r, is determined by counting each uninterrupted sequence of (+) or (−). The sequence (+ + − − − + − + +) yields r = 5.
Under H₀, with n₁ (+) signs and n₂ (−) signs, r is approximately normally distributed with

E{r|H₀} = 2n₁n₂ / (n₁ + n₂) + 1    (2.40A)

and

Var{r|H₀} = 2n₁n₂(2n₁n₂ − n₁ − n₂) / [(n₁ + n₂)²(n₁ + n₂ − 1)]    (2.40B)
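A sketch of the run test (Python/numpy; the bin width, the data, and the use of the standard Wald-Wolfowitz mean and variance for r are assumptions of this example):

```python
import numpy as np

def runs_test_signs(event_times, n_bins):
    """Divide the record into equal time interval lengths (TIL), attach (+) when
    the event count exceeds the expected count, (-) when it is below, and
    discard ties."""
    t = np.asarray(event_times, dtype=float)
    edges = np.linspace(0.0, t.max(), n_bins + 1)
    counts = np.histogram(t, bins=edges)[0]
    expected = len(t) / n_bins
    signs = np.sign(counts - expected)
    return signs[signs != 0]                 # discarded TILs removed

def number_of_runs(signs):
    """Number of uninterrupted sequences of equal signs, e.g. (+ + - - - + - + +) -> 5."""
    return int(1 + np.sum(signs[1:] != signs[:-1])) if len(signs) else 0

rng = np.random.default_rng(4)
homogeneous = np.cumsum(rng.exponential(scale=1.0, size=300))
signs = runs_test_signs(homogeneous, n_bins=30)
r = number_of_runs(signs)
n1, n2 = int(np.sum(signs > 0)), int(np.sum(signs < 0))
mean_r = 2 * n1 * n2 / (n1 + n2) + 1                                 # Eq. 2.40A
var_r = 2*n1*n2*(2*n1*n2 - n1 - n2) / ((n1 + n2)**2 * (n1 + n2 - 1)) # Eq. 2.40B
print(r, (r - mean_r) / np.sqrt(var_r))      # standardized run count under H0
```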
D. Other Distributions
In some cases the process under investigation does not fit the simple Poisson distribution. Other distributions have been found useful in describing biological point processes. The more commonly used ones are discussed here.
The Weibull density, with shape parameter k, scale parameter v, and location parameter ε, is

p(T; v, ε, k) = [k / (v − ε)] [(T − ε)/(v − ε)]^(k−1) exp{−[(T − ε)/(v − ε)]^k} ;   T > ε
p(T; v, ε, k) = 0 ;   T < ε
k > 0 ;   v > ε    (2.42)

and the corresponding cumulative distribution function is

P(T; v, ε, k) = 1 − exp{−[(T − ε)/(v − ε)]^k} ;   T > ε
P(T; v, ε, k) = 0 ;   T < ε
k > 1 ;   v ≥ ε    (2.43)
A random variable, T, with a Weibull distribution has the expectation and variance given by:29

E{T} = (v − ε) Γ(1 + k⁻¹)    (2.44A)

where Γ(·) is the gamma function. Note that for k = 1 the Weibull density reduces to the exponential density.
The Erlang (Gamma) density, with rate λ and order k, is

p(T; λ, k) = [λ^k T^(k−1) / Γ(k)] exp(−λT) ;   T ≥ 0
p(T; λ, k) = 0 ;   T < 0
k > 0    (2.45)

where Γ(·) is the gamma function. A random variable, T, with an Erlang distribution has the expectation and variance:

E{T} = k / λ ;   Var{T} = k / λ²    (2.46)
4. Semi-Markov Processes
A sequence of random variables, x_n, is called Markov if for any n we have the conditional probability:

Prob{x_n | x_(n−1), x_(n−2), ...} = Prob{x_n | x_(n−1)}    (2.47)

namely, the probability of the current event depends only on the event preceding it. Assume now that the random variable, x_n, is a discrete random variable taking the values a₁, a₂, ..., a_n. The sequence {x_n} is then called a Markov chain. A semi-Markov process is a process in which the intervals are randomly drawn from any one of a given set of distribution functions.3 The switching from one probability function to another is controlled by a Markov chain.
Consider the case with k "classes" or "types" and a set of k² distribution functions F_(i,j), i,j = 1, 2, ..., k. Assume that each interval of the point process is assigned a "class" type, 1, 2, ..., k. The assignment is determined by a Markov chain with transition matrix P = (P_(i,j)). An interval beginning with type i and ending with type j is drawn from the distribution F_(i,j). The transition matrix P is such that when an interval has been assigned a class i, the probability of the next interval to get the class j is P_(i,j).
A special case of the semi-Markov process is the two-state semi-Markov model (TSSM). Here the transition matrix P is

P = [ P₁       1 − P₁ ]
    [ 1 − P₂    P₂    ]    (2.48)

F_(1,1) = F_(2,1) = F₁ ;   F_(1,2) = F_(2,2) = F₂    (2.49)

Equation 2.49 states that the interval probability distribution depends only on the type of the interval and not on adjacent types. In a semi-Markov process for which Equation 2.49 holds, the number of consecutive intervals which have the same distribution is geometrically distributed.1
The TSSM model has been applied to spike train activity analysis. It was found, however,11 that it can be used only for a limited part of all experimental stationary data. The more general nongeometric semi-Markov model has been applied with more success.32 The nongeometric two-state semi-Markov process has also been applied to neural analysis by De Kwaadsteniet.11 The term semialternating renewal (SAR) is used there.
A. Introduction
In multivariate point processes, two or more types of points are observed. This may be the case, for example, when two or more univariate point processes are investigated and the relationship or dependence between them is sought. Another example may be where several different processes are recorded together. Multispike trains are common in neurophysiology10,33 when an electrode picks up the spikes of the neuron under investigation together with spikes from neighboring neurons. It is often possible to distinguish between the various neuron spikes based on the difference in pulse shapes.10 The record thus can be considered a multivariate point process. A similar process occurs when recording muscle activity. Several motor units form a multivariate point process.
In general, the various types of the multivariate process are dependent on one another; it is therefore necessary to have the conditional probability functions in order to characterize the process.
The relationship between the counting and interval forms of each type is, as in Equation 2.11,

¹N(t₁) < n₁   if and only if   Σ_(i=1)^(n₁) ¹T(i) > t₁    (2.50A)

²N(t₂) < n₂   if and only if   Σ_(i=1)^(n₂) ²T(i) > t₂    (2.50B)
Similar to the univariate process, we shall define the cross intensity function ₂h₁(τ) as a generalization of Equation 2.9. The cross intensity, ₂h₁(τ)Δt, yields the probability of having an event of type 1 at time τ, given that an event of type 2 has occurred at the origin. The cross intensity function ₁h₂(τ) is similarly defined. Note that ₁h₁(τ) is the univariate conditional intensity. The complete intensity function of the multivariate process is defined as in Equation 2.8, and the complete intensity function of simultaneous occurrence of the two types at time τ is defined similarly (Equations 2.51 to 2.53).
It is sometimes required that one ignore the different types of the multivariate process and consider it a univariate process. The conditional intensity function of such a process is given, in terms of the intensities of the multivariate process:3

h(τ) = [λ₁ / (λ₁ + λ₂)] [₁h₁(τ) + ₁h₂(τ)] + [λ₂ / (λ₁ + λ₂)] [₂h₁(τ) + ₂h₂(τ)]    (2.54)
The cross covariance density, ₂C₁(τ), is defined in terms of the cross intensity (Equation 2.55), with the cross covariance density ₁C₂(τ) defined similarly. The cross spectral density function, ₂S₁(ω), is the Fourier transform of Equation 2.55:

₂S₁(ω) = ∫ ₂C₁(τ) exp(−jωτ) dτ    (2.56)

with the cross spectral density ₁S₂(ω) defined similarly. A discussion concerning the application of the cross spectral density function to neural spike processing has been given by Glaser and Ruchkin.33
C. Marked Processes
A multivariate point process can be expressed as a univariate process with a marker attached to each point marking its type. Such is often the case in neurophysiological recordings of action potentials. A microelectrode may record the action potentials generated by several neurons in its vicinity. Action potentials (spikes) of the various neurons, as recorded by the microelectrode, differ from one another in shape and can be classified10 by means of wavelet detection methods (see also Chapter 1, Volume II). The recording of the microelectrode, known as a multispike train,10 can thus be analyzed as a marked point process.
We shall denote the mark of the ith point by M_i and the accumulated mark at time t by M(t); hence,

M(t) = Σ_i M_i    (2.57)
where the summation takes place over all points in the interval (0,t). M(t) is a random variable with statistics to be estimated from the given data.
In the general case we shall be interested in the joint probability distribution of (M(t),N(t)). From it we can derive the dependence (if any) of the different types on one another. Consider the simpler case where the point process has a rate λ, and the marks are independent (of one another and of the point process) and are identically distributed. Assume a record of length |A| with N(A) = n events and M(A), the sum of the n independent marks. It can be shown1 that:

E{M(A)} = λ|A| E{M}
Var{M(A)} = λ|A| Var{M} + (E{M})² Var{N(A)}    (2.58)
Cov{M(A),N(A)} = E{M} Var{N(A)}
Methods for the analysis of the marked point process with dependencies between marks and
process have been reported in the literature; these are, however, outside the scope of this
book.
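The moment relations of Equation 2.58 can be checked numerically. The sketch below (Python/numpy) simulates a Poisson process with independent, identically distributed marks; all parameter values are illustrative:

```python
import numpy as np

# Empirical check of Equation 2.58 for a Poisson process with i.i.d. marks.
rng = np.random.default_rng(5)
lam, A, n_rec = 2.0, 10.0, 4000          # rate, record length |A|, no. of records
M_A, N_A = np.empty(n_rec), np.empty(n_rec)
for i in range(n_rec):
    n = rng.poisson(lam * A)             # N(A): number of events in the record
    marks = rng.normal(loc=1.5, scale=0.4, size=n)   # independent marks
    N_A[i], M_A[i] = n, marks.sum()      # M(A): accumulated mark
print(M_A.mean(), lam * A * 1.5)                          # E{M(A)} = lam|A|E{M}
print(M_A.var(), lam * A * 0.4**2 + 1.5**2 * N_A.var())   # Var{M(A)}
print(np.cov(M_A, N_A)[0, 1], 1.5 * N_A.var())            # Cov{M(A),N(A)}
```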
REFERENCES
1. Cox, D. R. and Lewis, P. A. W., The Statistical Analysis of Series of Events, Methuen, London, 1966.
2. Lewis, P. A. W., Ed., Stochastic Point Processes: Statistical Theory and Applications, Wiley-Interscience, New York, 1972.
3. Cox, D. R. and Isham, V., Point Processes, Chapman and Hall, London, 1980.
4. Brillinger, D. R., Comparative aspects of the study of ordinary time series and of point processes, in Developments in Statistics, Krishnaiah, P. R., Ed., Academic Press, New York, 1978, 33.
5. Sayers, B. McA., Inferring significance from biological signals, in Biomedical Engineering Systems, Clynes, M. and Milsum, J. H., Eds., McGraw-Hill, New York, 1970, chap. 4.
6. Anderson, D. J. and Correia, M. J., The detection and analysis of point processes in biological signals, Proc. IEEE, 65(5), 773, 1977.
7. Ten Hoopen, M. and Reuver, H. A., Analysis of sequences of events with random displacements applied to biological systems, Math. Biosci., 1, 599, 1967.
8. Fienberg, S. E., Stochastic models for single neuron firing trains: a survey, Biometrics, 30, 399, 1974.
9. Lago, P. J. and Jones, N. B., A note on the spectral analysis of neural spike trains, Med. Biol. Eng. Comput., 20, 44, 1982.
10. Abeles, M. and Goldstein, M. H., Multispike train analysis, Proc. IEEE, 65(5), 762, 1977.
11. De Kwaadsteniet, J. W., Statistical analysis and stochastic modeling of neural spike train activity, Math. Biosci., 60, 17, 1982.
12. Ten Hoopen, M., The correlation operator and pulse trains, Med. Biol. Eng., 8, 187, 1970.
13. Perkel, D. H., Gerstein, G. L., and Moore, G. P., Neuronal spike trains and stochastic point processes. I. The single spike train, II. Simultaneous spike trains, Biophys. J., 7, 391, 419, 1967.
14. Landolt, J. P. and Correia, M. J., Neuromathematical concepts of point processes theory, IEEE Trans. Biomed. Eng., 25(1), 1, 1978.
15. Yang, G. L. and Chen, T., On statistical methods in neuronal spike train analysis, Math. Biosci., 38, 1, 1978.
16. Sampath, G. and Srinivasan, S. K., Stochastic Models for Spike Trains of Single Neurons, Lecture Notes in Biomathematics, Vol. 16, Springer-Verlag, Berlin, 1977.
17. Clamann, H. P., Statistical analysis of motor unit firing patterns in human skeletal muscle, Biophys. J., 9, 1233, 1969.
18. Parker, P. A. and Scott, R. N., Statistics of the myoelectric signal from monopolar and bipolar electrodes, Med. Biol. Eng., 11, 591, 1973.
19. Lago, P. J. A. and Jones, N. B., Turning points spectral analysis of the interference myoelectric activity, Med. Biol. Eng. Comput., 21, 333, 1983.
20. Andreassen, S., Computerized analysis of motor unit firing, in Progress in Clinical Neurophysiology, Vol. 10, Computer Aided Electromyography, Desmedt, J. E., Ed., S. Karger, Basel, 1983, 150.
21. Ten Hoopen, M., R-wave sequences treated as a point process, progress report 3, Inst. Med. Phys., TNO, Utrecht, Netherlands, 1972, 124.
22. Goldstein, R. E. and Barnett, G. O., A statistical study of the ventricular irregularity of atrial fibrillation, Comput. Biomed. Res., 1, 146, 1967.
23. Schafer, R. W. and Markel, J. D., Eds., Speech Analysis, IEEE Press, New York, 1979.
24. Kasuya, H., Kobayashi, Y., Kobayashi, T., and Ebihara, S., Characterization of pitch period and amplitude perturbation in pathological voice, in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., IEEE, New York, 1983, 1372.
25. Brassard, J. R., Correia, M. J., and Landolt, J. P., A computer program for graphical and iterative fitting of probability density functions to biological data, Comput. Prog. Biomed., 5, 11, 1975.
26. Lehmann, E., Nonparametrics: Statistical Methods Based on Ranks, Holden-Day, San Francisco, 1975.
27. Bendat, J. S. and Piersol, A. G., Random Data: Analysis and Measurement Procedures, Wiley-Interscience, New York, 1971.
28. Shiavi, R. and Negin, M., The effect of measurement errors on correlation estimates in spike interval sequences, IEEE Trans. Biomed. Eng., 20, 374, 1973.
29. Mood, A. M., Graybill, F. A., and Boes, D. C., Introduction to the Theory of Statistics, 3rd ed., McGraw-Hill Kogakusha, Tokyo, 1974.
30. Parzen, E., Stochastic Processes, Holden-Day, San Francisco, 1962.
31. Mann, N. R., Schafer, R. E., and Singpurwalla, N. S., Methods for Statistical Analysis of Reliability and Life Data, Wiley-Interscience, New York, 1974.
32. Ekholm, A., A generalization of the two state two interval semi-Markov model, in Stochastic Point Processes, Lewis, P. A. W., Ed., Wiley-Interscience, New York, 1972.
33. Glaser, E. M. and Ruchkin, D. S., Principles of Neurobiological Signal Analysis, Academic Press, New York, 1976.
34. Bartlett, M. S., The spectral analysis of point processes, J. R. Stat. Soc. Ser. B, 25, 264, 1963.
35. Li, H. F. and Chan, F. H. Y., Microprocessor based spike train analysis, Comput. Prog. Biomed., 13, 61, 1981.
Chapter 3
I. INTRODUCTION
Modern biomedical signal processing requires the handling of large quantities of data. In the neurological clinic, for example, routine examinations of electroencephalograms are usually performed with eight or more channels, each lasting several tens of seconds. In more elaborate examinations for sleep disorders analysis, hours-long records may be taken. Several hours of electrocardiographic recordings are sometimes required from patients recovering from heart surgery. Various screening programs are faced with the problem of handling a large number of short-term ECG and other signals.
Storing and analyzing such large quantities of information have become a severe problem. In some cases manual analysis is cost prohibitive; in others, it is completely impossible. The problem has therefore been recognized as an important part of any modern signal and pattern analysis system. Signals are in essence one-dimensional patterns. The methods and algorithms developed for pattern recognition are in general applicable to signal analysis. The topics discussed in this chapter are based on the decision-theoretic approach to pattern recognition. A different approach, the syntactic method, is discussed in Chapter 4.
The signal to be analyzed, stored, or transmitted quite often contains some redundancies. This may be due to some built-in redundancies, added noise, or the fact that the application at hand does not require all the information carried by the signal. The first step for sophisticated processing will be that of data compression. Irrelevant information is taken out such that the signal can be represented more effectively. One accepted method for data compression is by features extraction (refer to Figure 2, Chapter 1, Volume I). Based on some a priori knowledge about the signal, features are defined and extracted. The signal is then discarded, with the features being its representation. Features must thus be carefully selected. They must contain most of the relevant information while most of the redundancies are discarded. Optimal feature selection routines are available. For some applications compression is required for storage or transmission purposes, so that the signal must, at a later stage, be reconstructed. Features based on time series analysis (ARMA, AR; see Chapter 7, Volume I) can be used for such applications where the reconstructed signal has the same spectral characteristics as the original one. In other applications, automatic classification is required. Compression is then performed with features that do not necessarily allow reconstruction, but provide distinction between classes.
Any linear or nonlinear transformation of the original measurement can be considered as features, provided they allow reconstruction or give discriminant power. Various transformations, optimal in some sense, have been used to compress signal data. These transformations can be used without the need for a priori knowledge of the signal. Other features require some assumptions on signal properties; these may be, for example, the order of the ARMA model or the range of allowable peaks of a waveform.
In many cases, the features extracted are statistically dependent on one another; some methods, however, provide independent features. The computational cost and time required for the feature extraction process usually dictate the need to reduce the number of features as much as possible. A compromise has to be taken between that demand and the accuracy (in reconstruction or classification) requirement. Methods for (sub)optimal determination of the number of features are available; some are discussed in this chapter.
The material covered in this chapter is based on the vast literature on pattern and signal recognition: textbooks,1-5 reference books,6,7 and papers.8 Signal classification methods have
Consider the simple example depicted in Figure 2. Here the features vector is of dimension three, βᵀ = [β₁, β₂, β₃], and two classes, w₁, w₂, are given. The two clusters of features of w₁ and w₂ belong to signals known in advance to be in either w₁ or w₂. These are known as the training set. The clusters of features may be considered as an estimate for the probability distribution of the features. The projections of the clusters in the three-dimensional feature space are shown in Figure 2. It is clearly seen that classification of the two classes can be made with feature β₂ alone, since the projections of w₁ and w₂ (w₁⁽²⁾ and w₂⁽²⁾) do not overlap. The projections on the other feature axes show that overlapping exists between the two classes. A linear decision function can be drawn in the (β₁,β₂) or (β₂,β₃) planes to discriminate between the two classes.
An example for the procedure described above can be that of automatic classification of
ECG signals. Here samples of records of ECGs of normal and pathological states are given.
These have been diagnosed manually by the physician and constitute the training set. From
this given set, templates (for the normal state and each one of the pathological states) are
generated and the statistics of each class are estimated. It is clear that the more information
there is in the training set, the better is the training process and the probability of correct
classification.
In some cases training sets are not a priori classified. The system must then "train" itself by means of unsupervised learning. Cluster-seeking algorithms have to be used in order to automatically identify groups that can be considered classes. Unsupervised recognition systems require a great deal of intuition and experimentation. The interested reader is referred
to the pattern recognition literature.
Two important topics are discussed in this chapter: features selection and signal classification. Both were included in one chapter since in many cases there are similarities in the discussions of the two problems. It would probably be logical to open with the discussion on features selection and signal compression, since in most cases these are done prior to classification (Figure 2). It was found, however, that from the point of view of material presentation it is more convenient to discuss the topic of classification first.
A. Introduction
We may look at the signal classification problem in probabilistic terms. Assume that we have M classes and an unknown signal to be classified. We define the hypothesis, H_k, that the signal belongs to the w_k class. The problem then becomes a problem of hypothesis testing. In the case of two classes we have the null hypothesis (usually denoted by H₀, but for convenience denoted here by H₁) that the signal belongs to w₁ and the alternative hypothesis, H₂, that the signal belongs to w₂. It is the task of the classifier to accept or reject the null hypothesis. For this we need the probability density functions of the various classes, which are usually a priori unknown. The methods for statistical decision learning and classification are discussed in this section.
Assume a signal represented by the vector of features, β. The conditional probability that this signal belongs to the jth class, P(w_j|β), is given by Bayes rule:

P(w_j|β) = p(β|w_j) P(w_j) / p(β)    (3.1)

where p(β|w_j) is the conditional probability density of getting β given the class is w_j, p(β) is the probability density function of β, and P(w_j) is the probability of the jth class. Note also that (for the two classes case):

p(β) = Σ_(j=1)^(2) p(β|w_j) P(w_j)    (3.2)

Equation 3.1 gives the a posteriori probability P(w_j|β) in terms of the a priori probability P(w_j). It is logical to classify the signal β as follows: if P(w₁|β) > P(w₂|β) we decide β ∈ w₁, and if P(w₂|β) > P(w₁|β) we decide β ∈ w₂. If P(w₁|β) = P(w₂|β) we remain undecided.
Analyzing all possibilities, we see that a correct classification occurs when the decision agrees with the true class (Equation 3.3). In hypothesis testing language the errors are called the error of the "first kind" and the "second kind", or "false positive" and "false negative". The probability of an error is given by Equation 3.4. It can easily be shown that the intuitive decision rule we have chosen minimizes the average error probability. The Bayes decision rule can be written by means of the conditional probabilities:

p(β|w₁) / p(β|w₂)   ≷   P(w₂) / P(w₁)    (3.5)
Volume II: Compression and Automatic Recognition 41
which means that when the left side of the inequality (Equation 3.5) is larger than the right side, we classify β into w₁; when it is smaller, β is classified into w₂.
We want now to generalize the decision rule of Equation 3.5. Assume we have M classes. For this case we shall have the probability of β, p(β), given by Equation 3.2, but with the summation index running j = 1, ..., M. We also want to introduce a weight on the various errors. Suppose that when making a classification β ∈ w_i, we take a certain action, a_i. This may be, for example, the administration of certain medication after classifying the signal as belonging to some illness w_i. We want to attach a certain loss, or punishment, when we take an action a_i when indeed β ∈ w_j. Denote this loss by λ(a_i|w_j) = λ_ij.
Suppose that we observe a signal with features vector β and consider taking the action a_i. If indeed β ∈ w_j, we will incur the loss λ(a_i|w_j). The expected loss associated with taking the action a_i (also known as the conditional risk) is

R(a_i|β) = Σ_(j=1)^(M) λ(a_i|w_j) P(w_j|β) = Σ_(j=1)^(M) λ_ij P(w_j|β)    (3.6)
The classification can be formulated as follows. Given a feature vector β, compute all conditional risks R(a_i|β), i = 1, 2, ..., M, and choose the action a_i (classification into w_i) that minimizes the conditional risk (Equation 3.6).
Consider, for example, the two classes case. Equation 3.6 for this case is given by Equation 3.7. Note that λ₁₁ and λ₂₂ are the losses for correct classification; these are less than the losses for making an error (λ₁₂, λ₂₁). We classify β into w₁ if R(a₁|β) < R(a₂|β). From Equations 3.7 and 3.1 we get classification into w₁ when Equation 3.8 holds, or

p(β|w₁) / p(β|w₂)  >  (λ₁₂ − λ₂₂) P(w₂) / [(λ₂₁ − λ₁₁) P(w₁)]   →   β ∈ w₁
p(β|w₁) / p(β|w₂)  <  (λ₁₂ − λ₂₂) P(w₂) / [(λ₂₁ − λ₁₁) P(w₁)]   →   β ∈ w₂    (3.9)

The left side of the inequality (Equation 3.9) is called the likelihood ratio. The right side can be considered a decision threshold.
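The likelihood ratio test of Equation 3.9 is compact to implement. In the sketch below (Python, with scipy assumed available for the Gaussian densities), the class-conditional densities, priors, and loss values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import multivariate_normal

def bayes_two_class(beta, pdf1, pdf2, P1, P2, loss):
    """Two-class Bayes test in the likelihood ratio form of Equation 3.9.
    loss[i][j] is the loss for taking action a_(i+1) when the true class is
    w_(j+1); pdf1/pdf2 are the class-conditional densities."""
    likelihood_ratio = pdf1(beta) / pdf2(beta)
    threshold = (loss[0][1] - loss[1][1]) * P2 / ((loss[1][0] - loss[0][0]) * P1)
    return 1 if likelihood_ratio > threshold else 2

pdf1 = multivariate_normal(mean=[0, 0], cov=np.eye(2)).pdf
pdf2 = multivariate_normal(mean=[2, 1], cov=np.eye(2)).pdf
loss = [[0.0, 1.0],      # lambda_11, lambda_12
        [5.0, 0.0]]      # lambda_21, lambda_22 (errors cost more than hits)
print(bayes_two_class(np.array([1.2, 0.3]), pdf1, pdf2, P1=0.5, P2=0.5, loss=loss))
```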
In general, classification is performed with discriminant functions. A classification machine computes M discriminant functions, one for each class, and chooses the class yielding the largest discriminant function. The Bayes rule calls for the minimization of Equation 3.6; we can then define the discriminant function of the ith class, d_i(β), by:

d_i(β) = −R(a_i|β)    (3.10)

and the classification rule becomes: assign the signal with feature vector β to class w_i if:

d_i(β) > d_j(β)   for all j ≠ i    (3.11)

Since the logarithm function is a monotonically increasing function, we can also take the logarithm of Equation 3.11 without changing the rule. Figure 3 shows the general classifier scheme.
Consider now a simple loss function:

λ(a_i|w_j) = 0 ,  i = j ;   λ(a_i|w_j) = 1 ,  i ≠ j ;   i, j = 1, 2, ..., M    (3.12)

To minimize the risk, we want to choose that w_i for which P(w_i|β) is maximum. Hence, for this case, known as the "minimum error rate", the classification rule becomes: classify β into w_i if P(w_i|β) > P(w_j|β) for all j ≠ i (Equation 3.13).
Note that the last term of the discriminant of Equation 3.15A depends only on β and not on w_i. This term will be present in all discriminants d_i(β), i = 1, 2, ..., M. Since we are looking for the largest d_i(β), any common term can be ignored. We shall therefore define the discriminant without the last term:
Consider the case where the features are normally distributed. The probability distribution of a signal belonging to the ith class, w_i, represented by β, is

p(β|w_i) = (2π)^(−n/2) |Σ_i|^(−1/2) exp{−(1/2)(β − μ_i)ᵀ Σ_i⁻¹ (β − μ_i)}    (3.16)

where μ_i = E{β} is the expectation of the ith class and Σ_i its n × n covariance matrix:

Σ_i = E{(β − μ_i)(β − μ_i)ᵀ}    (3.17)
The term −(n/2) ln(2π) was dropped for the same reasons discussed above. Equation 3.18A can be rewritten in the form of a quadratic equation (Equation 3.18B). The solution of d_i(β) = 0 yields the decision surfaces in the features space. In the general case, these are hyperquadratic surfaces. In the special case where

Σ_i = Σ   for all i

we are dealing with M classes equally distributed, but with different expectations μ_i. In this case the first term of Equation 3.18A can be ignored, as well as the first term of Equation 3.18B. The discriminant function becomes linear, with

d_i(β) = μ_iᵀ Σ⁻¹ β + b_i    (3.19A)

where

b_i = −(1/2) μ_iᵀ Σ⁻¹ μ_i + ln P(w_i)    (3.19B)

The first term of the right side of Equation 3.19B is the square of the Mahalanobis distance. If, in addition, the a priori probabilities P(w_i) are equal, they can be ignored, and classification is performed by choosing the class with the minimum Mahalanobis distance (between the signal β and the class mean μ_i). If the P(w_i) are not equal, the distance is biased in favor of the more probable class.
The simplest case is the case where not only all classes have the same covariance matrices, but also the features are statistically independent. For this case,

Σ_i = σ²I   for all i    (3.20)

d_i(β) = −(1/(2σ²)) (β − μ_i)ᵀ (β − μ_i) + ln P(w_i)    (3.21A)

Here we have the discriminant given in terms of the Euclidean distance. The discriminant can also be written as in Equation 3.21B, where terms which are common to all classes were ignored. Note that the only calculations required in Equation 3.21B are the vector multiplications in the first term. The rest are precalculated and stored as constants in the classifier.
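The Gaussian discriminants can be evaluated directly. The sketch below (Python/numpy) implements the quadratic discriminant with the common term −(n/2)ln(2π) dropped; the class statistics are invented for illustration:

```python
import numpy as np

def gaussian_discriminants(beta, means, covs, priors):
    """Quadratic discriminants d_i(beta) = -1/2 ln|Sigma_i|
    - 1/2 (beta - mu_i)^T Sigma_i^{-1} (beta - mu_i) + ln P(w_i)."""
    d = []
    for mu, S, P in zip(means, covs, priors):
        diff = beta - mu
        d.append(-0.5 * np.log(np.linalg.det(S))
                 - 0.5 * diff @ np.linalg.solve(S, diff)
                 + np.log(P))
    return np.array(d)

# two hypothetical classes with different covariance matrices
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs = [np.array([[1.0, 0.2], [0.2, 0.5]]), np.array([[0.6, 0.0], [0.0, 1.2]])]
priors = [0.5, 0.5]
beta = np.array([1.5, 0.2])
d = gaussian_discriminants(beta, means, covs, priors)
print(d, "-> class", int(np.argmax(d)) + 1)
```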
The training set consists of two-dimensional feature vectors, β_(j,1) (class w₁) and β_(j,2) (class w₂), among them [−0.5, 1], [−2, 0], [−1, 0], [0, 2], [−0.5, −0.5], [0, 0], [−1, −1.5], [2, −2], [0.5, 0.5], [0.5, −1], [2, 0], [1, −0.5], and [0, −2]. From the training set, the class means μ̂_i and covariance matrices Σ̂_i, i = 1, 2, are estimated.
The discriminant functions d₁(β) and d₂(β) are calculated from Equation 3.19A. The discriminant functions for the data set are given below:
d₁(β_(1,1)) = −0.869 ;   d₂(β_(1,1)) = −4.671
d₁(β_(4,1)) = 0.218 ;    d₂(β_(4,1)) = −6.236
...
d₁(β_(5,2)) = −6.754 ;   d₂(β_(5,2)) = 2.975
The discriminant functions have correctly classified all training data, since d₁(β_(j,1)) > d₂(β_(j,1)) and d₁(β_(j,2)) < d₂(β_(j,2)) for all j's. Consider now the four unknown signals β_(j,x):

d₁(β_(1,x)) = −4.335 ;   d₂(β_(1,x)) = −0.057   →   β_(1,x) ∈ w₂
d₁(β_(2,x)) = −2.079 ;   d₂(β_(2,x)) = 3.155    →   β_(2,x) ∈ w₂
d₁(β_(3,x)) = −2.243 ;   d₂(β_(3,x)) = −3.220   →   β_(3,x) ∈ w₁
Hunger and pain cry records from five infants were used as a training set. The mean feature vectors for the hunger cry, μ_H, and for the pain cry, μ_P, were estimated by:

μ̂_i = (1/N_i) Σ_(j=1)^(N_i) β_j ;   i = H, P    (3.23)

where β_j is the jth training vector of the i ∈ (H,P) class. The covariance matrices Σ_H and Σ_P were estimated by:
Σ̂_i = (1/N_i) Σ_(j=1)^(N_i) (β_j − μ̂_i)(β_j − μ̂_i)ᵀ ;   i = H, P    (3.24)
The Bayes rule (Equation 3.5) in its log form, with the probability of Equation 3.16, becomes for this case Equation 3.25, where w_H and w_P denote the hunger and pain classes, respectively. Define the quadratic distance, D_Mj (the Mahalanobis distance):

D_Mj = (β − μ_j)ᵀ Σ_j⁻¹ (β − μ_j) ;   j = H, P    (3.26)
and the threshold

THR = 2 ln[P(w_H) / P(w_P)] − ln[|Σ_H| / |Σ_P|]    (3.28)

The test then becomes

D_MH − D_MP ≤ THR   →   β ∈ w_H ;   D_MH − D_MP > THR   →   β ∈ w_P    (3.29)
Equation 3.29 is the quadratic Bayes test for minimum error. This classifier does not allow rejects. Each data record is forced to be classified even if it is a "bad" record in the sense that it includes artifacts or does not belong to either one of the two classes. Consider now the case where we introduce two threshold levels, R₁ and R₂, such that:

D = D_MH − D_MP :   D ≤ R₁ → β ∈ w_H ;   D ≥ R₂ → β ∈ w_P ;   otherwise → β ∈ Reject    (3.30)
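The rejection rule of Equation 3.30 can be sketched as follows (Python/numpy); the class statistics and the thresholds R₁ and R₂ are illustrative, not those of the cry-classification study:

```python
import numpy as np

def mahalanobis(beta, mu, cov):
    diff = beta - mu
    return float(diff @ np.linalg.solve(cov, diff))

def classify_with_reject(beta, mu_H, cov_H, mu_P, cov_P, R1, R2):
    """D = D_MH - D_MP is compared with two thresholds; records falling between
    R1 and R2 are rejected rather than forced into one of the two classes."""
    D = mahalanobis(beta, mu_H, cov_H) - mahalanobis(beta, mu_P, cov_P)
    if D <= R1:
        return "hunger"
    if D >= R2:
        return "pain"
    return "reject"

mu_H, cov_H = np.array([0.0, 0.0]), np.eye(2)
mu_P, cov_P = np.array([3.0, 1.0]), np.eye(2)
for beta in (np.array([0.2, -0.1]), np.array([2.9, 1.2]), np.array([1.5, 0.5])):
    print(beta, classify_with_reject(beta, mu_H, cov_H, mu_P, cov_P, R1=-2.0, R2=2.0))
```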
FIGURE 5. Typical hunger and pain cry. (A) Time records; (B) power spectral density estimated by FFT.
With no rejections, two types of classification errors were present: (1) errors due to the classification of pain cry as hunger cry (denote the probability of this error by ε_H|P):

ε_H|P = ∫_(−∞)^(THR) p(D|β ∈ w_P) dD    (3.31A)

and (2) errors due to the classification of hunger cry as pain cry (with the probability ε_P|H).
With rejection, using the decision rule of Equation 3.30, the errors are

ε'_H|P = ∫_(−∞)^(R₁) p(D|β ∈ w_P) dD ≤ ε_H|P    (3.31C)

and the corresponding expression for ε'_P|H. However, rejections will be present. Some of the rejections were correctly classified by Equation 3.29. The probability of a hunger cry record, correctly classified by Equation 3.29 and rejected by Equation 3.30, is
ε^r_H|H = ∫_(R₁)^(THR) p(D|β ∈ w_H) dD    (3.32)
The probability densities, p(D|·), are estimated from the training data. We shall choose the rejection threshold R₁ so as to minimize a linear combination of the error probabilities ε^r_H|H and ε'_H|P. Similar considerations will dictate the decision for R₂. Hence (Equation 3.33), where φ_i, i = 1, 2, 3, 4, are weights determined by the relative importance of each one of the errors. Figure 7 depicts the training and classification system. Automatic classification of infants' cries was performed by the classifier (Equation 3.30), with an error rate of less than 5%.38 The quadratic classifier (Equation 3.29), with no rejection, was also applied to the classification of single evoked potentials39 with similar results.
p̂(β) = k / (mV)    (3.34)

In Equation 3.34 one must decide the size of V to be used. Clearly V cannot be allowed to grow, since this will cause the estimation to be "smoothed". On the other hand, V cannot be too small, since then the variance of the estimate will increase. One method for choosing the volume V is to determine k as some function of m such that, for example, k_m ∝ √m. The volume chosen, denoted by V_m, is determined by increasing V until it contains k_m neighbors of β. This is known as the k_m-nearest neighbor estimation. Note that the volume chosen for the estimation of the probability functions becomes a function of the data. If the features are densely distributed around β, the volume containing k_m neighbors will be small.
Assume that the training set consists of a total of N = Σ_(i=1)^(M) N_i samples of feature vectors, where N_i is the number of samples belonging to the class w_i. An unknown signal, represented by the vector β, is to be classified. We find the volume V_m around β that includes k_m training samples. Out of the k_m samples, k_i samples belong to the class w_i. An estimate for the conditional probability becomes:

P̂(w_i|β) = k_i / k_m    (3.35)
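A sketch of the k_m-nearest-neighbor estimate of Equation 3.35 (Python/numpy; the training data and the choice k_m = √N are illustrative):

```python
import numpy as np

def knn_posterior(beta, training, labels, k_m):
    """k_m-nearest-neighbor estimate of the conditional probabilities: the
    neighborhood around beta is grown until it contains k_m training samples,
    and P(w_i|beta) is estimated by k_i / k_m."""
    training = np.asarray(training, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(training - beta, axis=1)
    neighbors = labels[np.argsort(dist)[:k_m]]
    return {int(c): float(np.mean(neighbors == c)) for c in np.unique(labels)}

rng = np.random.default_rng(7)
class1 = rng.normal([0, 0], 0.8, size=(40, 2))
class2 = rng.normal([2, 2], 0.8, size=(40, 2))
training = np.vstack([class1, class2])
labels = np.array([1] * 40 + [2] * 40)
post = knn_posterior(np.array([1.6, 1.8]), training, labels, k_m=int(np.sqrt(80)))
print(post, "-> class", max(post, key=post.get))
```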
d(β) = Σ_j a_j f_j(β) = aᵀ f(β)    (3.48)

Note that while the functions f_j(β) can be nonlinear, Equation 3.48 is linear with respect to the weights a_j.
Consider, for example, the case where the functions f_j(β) are quadratic functions. In this case, and for the two-dimensional feature space, the discriminant is

d(β) = a₁₁β₁² + a₂₂β₂² + a₁₂β₁β₂ + a₁β₁ + a₂β₂ + a₃    (3.49)

which can be written in matrix form (Equation 3.50); the matrix A, the vector b, and the scalar c of Equation 3.50 determine the decision surface. Equation 3.50 uses weights and functions for the discriminant. For a given problem, a set of functions has to be determined, and the proper weights found.
and β_j ∈ w₂ if the reversed inequality holds (Equation 3.51B). We can replace all β's belonging to w₂ with (−β); the classification rule for w₂ then becomes similar to Equation 3.51A.
Several methods are used to find the optimal weighting vector for Equation 3.51A. Among them are gradient descent procedures, the perceptron criterion function, and various relaxation procedures. We shall present here the method of minimum squared error. Instead of solving the linear inequalities (Equation 3.51A), we shall look for the solution of the set of linear equations:

β_Aᵀ β_(x,i) = b_i ;   i = 1, 2, ..., N    (3.52)

where b_i, i = 1, 2, ..., N, are some arbitrarily chosen positive constants known as the "margins". The set of N equations (Equation 3.52) is solved to determine the weighting vector β_A in such a way that the N known samples will be classified with minimum error.
Define the N × (n + 1) matrix F:

Fᵀ = [β_(x,1), β_(x,2), ..., β_(x,N)]    (3.53)

and define the constants vector bᵀ = (b₁, b₂, ..., b_N); then Equation 3.52 becomes:

Fβ_A = b    (3.54)

Note that the matrix F is not square; hence Equation 3.54 cannot be solved directly. The pseudoinverse of F must be used. Define the error vector:

e = Fβ_A − b    (3.55)

Minimization of eᵀe (Equation 3.56) yields

β_A = (FᵀF)⁻¹ Fᵀ b    (3.57A)

Note that the (n + 1) × (n + 1) matrix FᵀF is square. It may, however, be ill conditioned; in such cases the ridge regression method should be used, such that:

β_A = lim_(ε→0) (FᵀF + εI)⁻¹ Fᵀ b    (3.57B)

The solution (Equation 3.57) depends on the margin vector b. Different choices of b will lead to different decision surfaces.
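The minimum-squared-error solution can be written compactly (Python/numpy). The sketch below follows Equations 3.52 to 3.57: class-2 samples are sign-reversed, the augmented vectors form the rows of F, and β_A is obtained from the (optionally ridge-regularized) pseudoinverse; data and margins are illustrative:

```python
import numpy as np

def mse_weights(samples1, samples2, margins=None, ridge=0.0):
    """Minimum-squared-error linear discriminant: samples of w2 are sign-reversed,
    each augmented vector [beta^T, 1] forms a row of F, and the weight vector is
    the (ridge-regularized) pseudoinverse solution of F beta_A = b."""
    X = np.vstack([np.asarray(samples1, float), -np.asarray(samples2, float)])
    ones = np.concatenate([np.ones(len(samples1)), -np.ones(len(samples2))])
    F = np.hstack([X, ones[:, None]])        # N x (n+1) matrix of Equation 3.53
    b = np.ones(len(F)) if margins is None else np.asarray(margins, float)
    A = F.T @ F + ridge * np.eye(F.shape[1])
    return np.linalg.solve(A, F.T @ b)       # Equations 3.57A / 3.57B

rng = np.random.default_rng(8)
w1 = rng.normal([0, 0], 0.6, size=(30, 2))
w2 = rng.normal([2, 2], 0.6, size=(30, 2))
beta_A = mse_weights(w1, w2, ridge=1e-6)
decide = lambda beta: 1 if beta_A @ np.append(beta, 1.0) > 0 else 2
print(beta_A, decide(np.array([0.2, 0.1])), decide(np.array([2.1, 1.8])))
```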
We can choose D_i² to be the distance measure, since all distances are positive. To classify β, we calculate Equation 3.58 for all i and choose the class j that yields the smallest distance (Equation 3.59).
A closer look4 at Equation 3.58 will show the relation between the minimum distance classifier and linear discrimination (Equation 3.60). The first term on the right side is independent of i and thus can be ignored in the minimization process. Minimization of D_i² is the same as maximization of the second term of the right side. Hence, we can define an equivalent decision function:

d_i(β) = μ_iᵀ β − (1/2) μ_iᵀ μ_i ;   i = 1, 2, ..., M    (3.61)

which can be written in the augmented form

β_(A,i) = [μ_iᵀ : −(1/2) μ_iᵀ μ_i]ᵀ    (3.62A)
β_x = [βᵀ : 1]ᵀ    (3.62B)

d_i(β) = β_(A,i)ᵀ β_x ;   i = 1, 2, ..., M    (3.63)
V o l u m e II: C o m p r e s s i o n a n d Au tomatic Recognition 59
m,
which is a linear discriminant function discussed in Section III. Figure 12 shows a simple
case of two classes in a two-dimensional features space. It can be shown that in this case
the decision surface. d(g) - 0. is a hyperplane normal to the line segment joining the two
templates, located at an equal distance from them.
Note that the Euclidean distance of Equation 3.58 gives equal importance to each one of
the elements of the features space. If. however, we have some a priori knowledge on the
statistics of the classes, w>e may want to place weights on the features. If, for example, it
is a priori known that some of the features have large variances, we may want to consider
them ‘Tes^ reliable'' when defining the proximity measure. This leads intuitive!) to a distance
measure where the weights are inversely proportional to the features covariance matrix,
namely.
Example 3.4
Consider the minimum distance classification for the data of Example 1 (Figure 4). The
Euclidean distance (Equation 3.58) for the four unknown signals is calculated below:
Df D;
g 3 v ~+ 0.8 1 .3 6 -* g ,.s € w,
3.2 7.76 —» g 4 , € w,
60 Bi omedical Signal Processing
Consider M classes. \v„ i ~ 1 , 2 , . . . ,M. We shall make the restrictive assumption that
the signals of all M classes are normally distributed, The signals belonging the \th class
have the expectation jx! and covariance matrix, We assume therefore that all classes
have the same covariance matrix. This may be the case, for example, where the classes are
deterministic vectors jx=, i — i,2 , . . . ,M, and the measured signals are noisy signals from
zero mean normally distributed noise process common to all measurements.
The conditional probabilities p(g|w;), i = 1,2, . . . ,M, are given by Equation 3.16. The
entropy o f \vt is
where the integration is performed over the features space. We would like now to find a
linear transformation:
y = TTg (3.67)
that transfers the n dimensional feature vector, g , into a reduced dimensional vector .y. The
transformation matrix T is thus of dimension n x d. This is a procedure commonly taken
when the goal is signal compression.
2. Minimization o f Entropy
Here we shall look for the transformation that not only reduces the dimensionality of the
problem, but mainly preserves or even enhances the discrimination properties between
classes. The entropy is a measure of uncertainty since features which reduce the uncertainty
of a given situation are considered more informative. Tou4 46 has suggested that minimization
of the entropy is equivalent to minimizing the dispersion of the various classes, while not
effecting much the interclass dispersion. It is thus reasonable to expect that the minimization
will have clustering properties.
The d x d covariance matrix, of the reduced vector y, whose expectation is jl, is
given by
Since the new vector, y, is the result of a linear transformation of a gaussian process, it is
also gaussian. The conditional probability density in the reduced space is thus:
The determinant o f the eovariance matrix in Equation 3.70A is equivalent to the product
o f its eigenvalues; hence.
In Equation 3.70C we have assumed that the eigenvalues of had been arranged in
decreasing order.
0.54 - X - 0 .3 5
d e K i - XI) = \1 ~ k\ = 0
- 0 .3 5 0 .6 8 - X
The solution yields th- two eigenvalues. X, = 0.96693 and X, = 0.25307. The corresponding
eigenvectors u ; are given by:
lu , - XjUj
i - 1,2
1 = [0.77334, 0.63399]£
All signals in the new one-dimensional features space are the projections of on the line
along the eigenvector u2. This projection is shown in Figure 13. This figure clearly dem
onstrates that the clustering in the reduced, one-dimensional space was preserved. Note that
projections on the eigenvector (corresponding to the largest eigenvalue) do not preserve
62 Biomedical Signal Processing
the discrimination between classes. For classification we need to find a decision surface (a
threshold number in the one-dimensional case). If, for example, we take y = 0 to be the
threshold, we see that all the training data are classified correctly. The unknown data is
classified as follows:
Comparing these results with Examples 3 .1 ,3 .3 , and 3.4, we note that the various classifiers
differ in the classification of signals g 2;x and x which are indeed borderline cases.
3. Maximization o f Entropy
We shall consider now a transformation similar to Equation 3 .67, but rather than requiring
the minimization of the dispersion, we shall require that maximum information be trans
formed. This means we want to maximize the entropy, H(y) (rather than minimize it as was
done in the previous section). Let us assume that the probabilities involved are normal, with
identical covariance matrix, for all classes. The transformation T that maximizes the
entropy will be the transformation, the columns of which are d eigenvectors of Xp, corre
sponding to the largest eigenvalues.
Example 3.6
Consider the data given in Example 3.1 . Find the transformation into the one-dimensional
space that will preserve maximum information in the sense of the entropy. Clearly the
Volume 11: Compression and Automatic Recognition 63
transformation is the vector u, of Example 3.5. The vector u, is shown in Figure 13. Note
that the projections o f the signals onto u, completely destroy the discrimination between the
classes.
Y, = fi'g , (3.71)
The n dimensional vector, £, can be considered a line in the n dimensional space: then y,
is the projection o f g* on this line (scaled by ||g||). Let ^ be the mean of the N, samples of
class w. in the n dimensional space:
* - s j , 6 o m
and the mean o f the projected points yj on the line g, p.;, is the projection of {!,:
It can be made as large as required by scaling £. This by itself is, of course, meaningless
since the separation o f the two classes must include the variance of the samples.
Define the n x n scatter matrix, W h of the ith class as:
% = 2 (g - jy < g -
i = 1,2 (3 .7 5 )
64 Biomedical Signal Processing
W. is the estimation o f the covariance o f the ir/i class in the n-dimensional feature space. It
represents a measure of the dispersion o fihe signals belonging to \vr The within-ciass scatter
matrix, W, is defined as: '
W W.j (3.76)
of = V (y - = £ ;o !|? - p ’ii.!-
Consider row the variance between the means of the two clashes. Denote the matrix. B,
the ' “-between class scatter m atrix", in the'original n-dimensionai features space:
B - (jx, ~ M - ■
“ jfc)T (3.79)
.T his matrix represents the dispersion between the means of the various classes. Also the
variance of the means in the one-dimensiona! projection is
Noxe that for every n-dimensionai vector v, we have from Equation 3.79:
Since (|Xt — jx2)Tu is a scalar denoting the projection o fv of (§Xt - j l 2), we conclude that
Bv is always a vector in the direction of (p., — &>).
A criterion of separation can now be formulated’in terms o f the new scatter matrices. For
good separation we require that the variance of the populations o f each class be smaii. Hence
a good separation measure, J(p), is
J(E> = H (382)
W sBp — Xp (3.83)
where the optimal weighting vectors., g, are the eigenvectors of W !3 . In this case, however,
we need not solve for the eigenvectors. Recall that Bp is a vector in the direction of
(£-! “ £ 2) k t length be X. We can do this without loss of generality since the length of
the required vector is of no importance; its direction is what we look for. Hence, we get
p = W - ‘(£, - £ 2) (3.84)
Volume II: Compression and Amomoik Rero^ihicp (IS
The line along the vector £ given by Equation 3.84 is the optimal line in the sense that
the projections o f w, and w-> on it will have the maximum ratio of “ between”’ 1 wiiUm”
class scatter. The classification problem has now been reduced io iha‘. findiiig a decision
surface (threshold number) on the line £ to discriminate between the projections of w, and
w?. I
Example 3.7
Consider again the data given in Example 3.1 (Figure 4). It is de>sr:J t**. reduce the
dimensions of the feature vector, (3, to one dimension using the Fisher \ 'i i 'u n i n a n t . We
thus have to calculate the transformation vector, p. of Equation 3.71. The direction of the
vector is given by Equation 3.84. For this example (see Example 3.1),
5-0(H) j 1
£ = - PlJ = 5.353 i ~ 5 353
1.009j 1
which yields a transformation very close to u: o f Example 3.5 (Figure 13). The classification
results will be similar to those of Example 3.5 with signal g 3 x undefined.
Consider now' the general case where M classes are present. The within class scatter
matrix (Equation 3.76) will become:
M
W = ^ W : (3.85)
I vA
= - 2 (3.85)
N i ~
M
B = X Ni<Pri “ _ Jfr>T <3 -87>
i- I
y. = t f i
i = 1,2,.,.,M - ! (3.88)
The M -- 1 equations can be written in a matrix form using the M - 1 dimensional vector
y “ [ y i • • • ,yM-i n x (M - I) matrix T whose columns are the weighting
vectors Pj.
i - T 'g (3.89)
66 Bi omedical Signal Processing
Equation 3.89 gives the transformation onto the M — 1 space. The optimal transformation
matrix T is to be calculated.
The within class seatter matrix in the reduced M — I space is denoted by Wy and the
between class scatter matrix in that space is denoted by By. Similar to the two-classes case,
we have i
Wy = T fWT
By = T TBT (3.90)
We heed now a criterion for separability. The ratio of scalar m easures used in the reduced,
one-dimensional case cannot be used here since ratio of matrices is not defined. We could
have used the criterion tr(Wy !By) using the same logic as before. Another criterion can be
the ratio o f determinants:
« r > - E a M l)
The matrix, T, that maximizes1 Equation 3.91 is the one whose columns are the solution
o f the equation:
Bp{ = XiWp,
i — 1,2,...,M - 1 (3.92A)
which can be solved either by inverting W and solving the eigenvalue problem:
W ^ B fc - Xjgi (3.92B)
|B - XjW| = 0
(B - XjWlpi = 0 (3.92C)
The transformation (Equation 3.89) that transforms the n-dimensional features vector {3 into
a reduced, M - 1, dimensional vector while maximizing Equation 3.91, is given by
Equation 3.92. The optimal transformation is thus the matrix, T, whose columns, p,, i =
1,2, . . . ,M — 1, are the eigenvectors of W ~ 'B . The Fisher's discriminant method is
therefore useful for signal compression when classification in the reduced space is required.
V. K AR H U N E N -L O EV E EX PA N SIO N S (KLE)
A. Introduction
The problem of dimensionality reduction is well known in statistics and in communication
theory. A variety of methods have been developed, employing linear transformation, that
transform the original feature space intn a lower order space while optimizing some given
performance index. Two classical methods in statistics are the principal components analy
sis3'5-40 (PCA), known in the communication theory literature as Karhunen-Loeve Expansion
Volume //: Co m p r e s s i o n a n d Automatic Recognition 67
(KLE), and factor analysis (FA). The PCA optimizes the variance of the features while FA
optimizes the correlations among the features.
The KLE has been used to extract important features for representing sample signals taken
from a given distriBution. To this end the method is well suited for signal compression. In
classification, however, we wish to represent the features which possess the maximum'
discriminatory information between given classes and not to faithfully represent each class
by itself. There may be indeed cases where two classes may share the same (or similar)
important features, but also have some different features (which may be less important in
terms of representing each class). If we reduce the dimensions of the classes by keeping
only the important features, we lose all discriminatory information. It has been shown40 that
if a certain transformation is applied to the data, prior to KLE. discrimination is preserved.
The KLE applied to a vector, representing the time samples, can be extended to include
several signals. W e arrange the vectors representing a group of signals into a matrix form
and tr y' u, ;c;.. _ .......Jata” matrix in lower dimension, namely., by means o f a lower
rank matrix. This extension to the KLE (PCA) is known as singular value decomposition
(SVD).
Principal components analysis (PCA, KLE) has been widely applied.to biomedical signal
processing. ,4-2u x<> SVD methods45-47 50 have also been applied to the biomedical signal
processing, in particular to ECG51 and EEG processing.47
(3.93)
T ’ = TT (3.94)
and
<j n
jj(d) = V y.<j). + V b <j, (3 % )
68 Biomedical Signal Processing
since b;, i = d -f i, . . . ,n, are preselected constants, the vector y describing the signal
is d dimensional. The reconstruction error, Aj3(d), is
The mean square reconstruction error, €<d), is given from Equations 3.97 and 3.93 by
We shall choose b, and 4>j that minimize the mean square error of Equation 3.98. To get
the optimal b /s , derive Equation 3.98 with respect to h,:
^ = -2 (E { y J - b.) =■ 0
OD: (3.99)
P '
b, = E{yj =
= S - E{p})ig - E{p})TH>,
i = d-+ i
= 2 (3.100)
l = d -rl
The optimal vectors, and thus the optimal transformation T, are given by the minimization
of Equation 3.100 with the constraint - 1. Using the Lagrange multipliers. k„ the
constraint minimization result is3
The solution of Equation 3.101 provides the optimal <+>( which are the eigenvectors of the
original covariance matrix corresponding to the eigenvalues X,. Substituting Equation
3.101 into Equation 3.100 yields the minimum square error:
e(d)mi„ = y X, (3.102)
Note that since e(d) is positive, the eigenvalues are nonnegative. For data compression we
shall choose d columns in the transformation, T, in such a way that the error (Equation
V o l u m e II: C o m p r e s s i o n a n d Automatic Recognition 69
3.102) is minimal. W e shall choose the d eigenvectors, corresponding to the largest eigen
values for the transformation and delete the rest (n - d) eigenvectors. The error (Equation
3.102) will then consist of the sum of <n - d) smallest eigenvalues.
We can arrange the eigenvalues such that: j
A; 5? X, 25 X, > 3* xn s* 0
Then the required transformation matrix. Tx\consists of the columns, <J>is i = 1,2. . . . ,d.
Note also that
since T is the modal matrix of The covariance matrix of v is diagonal which means that
the features y, are unconeiated.
The calculation o f the eigenvectors of the matrix, is not an easy task. Several algorithms
for this calculation have been suggested.141
Some results o f KLE as applied to biomedical signals are shown in Figure 14. The KLE
as presented in this section is very effective for signal compression when the application is
effective storing or reconstruction. This is obvious since it is optimal in the sense of minimum
square of reconstruction error. It is, however, less attractive when the goal is classification.
Fukunaga and Koontz40 have suggested a method whereby a modified KLE can be used
for two-class discrimination. Their method is not optimal and indeed a counter example has
been presented52 thai shows poor discriminant behavior of the method.
F = (3.104)
and expand the matrix. If the rank of the matrix B is k, it can be expressed as54
F - US,-VT (3.105)
where the n x n matrix U and L x L matrix V are unitary matrices (VT = V ~ !) and the
rectangular n x L matrix, Sr . is a diagonal matrix with real nonnegative diagonal elements,
s,, known as the singular values. It is obvious that k is less than or equal to the minimum
of n and L.
The singular values are conventionally ordered in decreasing order s, ^ s, 2 = ............ sk
s* 0. with the largest one, s,, in the upper left hand comer of Sf;. The singular values are
the nonnegative square roots of the eigenvalues of FFT and F^F. The n x 1 dimensional
vector Uj. i = 1,2. . . . ,n (the columns of U), and the L x 1 dimensional vectors, x>,. i
= 1.2. . . . ,L (the columns of V), are the orthonormal eigenvectors of F F and FHF.
respectively.
The representation given in Equation 3.105 states that any k rank matrix can be expressed
by means of a rectangular diagonal matrix, multiplied from right and left by unitary matrices.
Equation 3.105 can also be written as
k
F = 2 s,UjU7 (3.106)
i= I
70 Biomedical Signal Processing
o r ig in a l j A _ J . . ^ w n r
..400»sec
... -J
N O R M A L IZ E D -
-JL A Ii _
1ST e ig e n v
j i ;V i -
LJ 12 L 3
FIGURE 1 4 . Karhunen-Loeve analysis, most significant eigenvectors o f (A) ECG signal and ( B ) pain evoked
potentials.
in Equation 3.106 the matrix F is expressed in terms of the sum of k matrices with rank -
one. The matrices u ^ are called singular planes or eigenplanes.
Since the eigenvectors are orthonormal, it follows from Equation 3.106 that
1
Ui = - Fv; (3.107 A)
Volume II: Compression and Automatic Recognition 11
(3.I07B)
Recall that the L columns of the features matrix F are the signals fij, j = 1,2, . . . ,L. Each
vector then can be expressed from Equation 3.106 as
k
§ , = X SiD.jfci
i* I
j = 1 ,2 ,...,L (3.108)
vj = ...... v lL]
Q.j = w j
i = l,2 ,...,k
j = 1,2,...,L
In Equation 3.108 each signal j3j is expressed in terms of eigenvectors of F F . The SVD is
thus, in principle, equivalent to the PCA.
We want to use the SVD for compression. Consider now the expansion of Equation 3.106
with summation index running until d k. Note that since we have arranged the singular
values in decreasing order, the reduced expansion will include the d largest values. If we
thus denote the estimate of F by F:
a
F = X s,u{v j = USfVt
1=1
d ^k (3.110)
Here the norm is in the sense of the sum o f squares of all entries:
In Equation 3.110 the singular values are the d largest ones and SF is obtained from SFby
setting to zero all but the d largest singular values. Equation 3.111 states that the estimated
reduced matrix F is the best least squares estimates of F by rank d matrices.
Analogous .o PCA, it can be shown here that the estimation error (residual error), expressed
72 Biomedical Signal Processing
m terms of the norm of Equation 3. \ H , equals the sum of squares o f the discarded singular
The reason toi ^joosisu: d Largest, singular values fo? iiEc i ;- d e a r from Equation
3.113* ■
Methods for computing the singular values from the eigenvalue^ ot F F 1 require heavy
computation load and a*e time consuming. In ad o n is., conventional methods v*eid ai! k
singular values where indeed onh the d laigesl «m<- arc required. A w tb ix J for the com
putation of-the vngulaf values, -.'tie at a time, hc^irmn;! v tih the 'Urczhi one, h-;?> b-.-ai
suggested:51" it is known a;> the power method. The computation eaa be sk pped when tlic
already acquired singular values yield residual error (Equation 3.113 j below a certain thresh
old. The method is especially attractive in cases where the data matrix is large, but its iank
is low. The computation method h briefly presented here. Note also that this method operates
on the data matrix F directly and not on the correlation matrix PFT.
The compulation is based on the solution of the two equations (Equation 3.107):
su = Fu (3.114 A)
SH ~ * 14 B >
*•
Using an arbitrary starting vector u<0), we form' the following iterative solution:
( 3. 115A I
F ru<k: u
where e is some predetermined stopping vector. The first (largest) singular value, s,» is then
estimated by taking the nonn of Equation 3 .1 14B and recalling that the length of u i s unity:
u, = u(k+J)
V, = v (k;i) (3.U 7B )
for the last k. To obtain the next singular value and eigenvectors, the estimated singular
plane is removed from F by
and the same iterations (Equations 3.115 to 3.118) are repeated for F<n to get s2, ju2, v 2 and
V o l u m e 11: C o m p r e s s i o n a n d Automatic Recognition 73
so on untfj.^,.- u t, i\,. The last (dth) singular value so be calculated is determined by the
thresh-ofcf- e oo the residua* error:
!|F ~ £ S M II . (3 .U 9 )
i J
0.5 0 1
i 2 1
1.25 0.5
FF*
0.5 6
0.10356 -0 .9 9 4 6 2
U = [ujiuj
0.99462 0.10356
where u, and u2 are the orthornormai eigenvectors o f FFT corresponding to X, and X2.
The eigenvalues arid eigenvectors of F rF are
1.25 2 0.5
FF = 2 4 2
0.5 2 2
F - ^ SjU.v^ —
j-i
0.5976 - 0 .2 0 6 0.8861
-0.5 0 1
1 2 2
If we desire to reduce the dimensions, we shall take only the first term in the expansion,
namely, the projection on the first eigenplane:
-0.5976 - 0 .2 0 6 0.8861
l|F ~ F||2 - 1.19756 = s; = k 2
0.0622 0.02144 -0 .0 9 2 2
Let us repeat this example with Shlien’s power method. Choose the initial unit length vector
to be
p ( 0) _ £3 — i /2 3 - 1/2 3 - M2]
v <3> =
{0.3831, 0.8085, 0.4467]
Expressing the matrix F by its reduced, rank one matrix, with largest singular value yields:
which is very close to the reduced matrix calculated by the direct method.
V I. D IR E C T F E A T U R E S E L E C T IO N A N D O R D E R IN G
A. In tro d u ctio n
In classification problems with two or more classes, it is often required to choose a subset
of d best features out of the given n or to arrange the features in order of importance. To
do this we require a measure of class separability. The optimal measure of features effec
tiveness is the probability of error. In practice one can use the training data and use the
percentage of classification error as a measure. This approach is often used: it is, however,
experimental and requires a relatively large training data.
Scatter matrices can be used to form a separability criterion. Recall that the within class
scatter m atrix, W (Equations 3.75 and 3.85) is the covariance matrix of the features in a
given class. The between class scatter matrix, B (Equation 3.87), is the covariance of the
means of the classes.
A criterion for separability can be any criterion which is proportional to the between
scatter matrix and also proportional to the inverse of the within scatter matrix. Maximization
of such a criterion will ensure that while maximizing the “ distance’’ between classes, we
do not amplify (with the same rate) the scatter of the classes, thus causing no improvement
to the separability. A criterion like this was used in Equations 3.82 and 3.91.
Several such criteria were suggested (e.g., see Fukunaga):3
76 Biomedical Signal Processing
(c) J , = (r{B)/tr(W) (3 .1 2 0 0
T h e v enters are relatively simple to use. However, they do not have a direct relationship
u> the piobabi»i!> o f error. More complicated criteria that can be related to the error probability
such i-s the Chernoff bound and Bhattacharyva distance are know n.’
%<
Similarly , the average discriminating information of class w, with respect to w, is
Lj( = j
-a
pipiwJin dfi
p*g;-v.) ~
(3.1215!
' The- tola! average information in Equations 3.121 A and B is known as the divergence; let
us denote it by D- y
For normal distributions with \xk and £ k. k = i,j, the expectations and covariance matrices,
respectively, the divergence becomes
When the two classes possess the same covariance matrix, Sj = 'X-. = X, the divergence
becomes
D Ki = ~ ~ &) (3.124}
w hich is the Mahalanobis distance. Methods for selecting optimal features by the maximi
zation of the divergence have been suggested.4
V o l u m e 11: C o mp re ss io n a n d Automatic Recognition 11
n i- i = ( ft,
Here jrjj.... is the j//7, (i — 1) dimensional vector and {3jr q - 1,2............i - 1, are the set
of (i - 1) features selected from the given n features (g). In the \th step we shall increase
the dimension o f the n, tj).. , vectors by adding one feature (that is not already included in
the vector) to each one of them, such that
Pi * Hi «
j - 1,2...... n (3.126)
The added feature |3| will be selected from the available n — (i - 1) features such that the
criterion will be maximized
78 Biomedical Signal Processing
A ^) « Max ( 3 .127)
(M s!-,
denotes the maximum of the divergences. The algorithm proceeds up to the d step. At
that step the set with maximum divergence is selected:
th
where D,(if}) is the value of,the divergence evaluated with the ith (dimensional vector tq- X,
= M a x O ^ )) (3.128)
j
The number o f searches required by the algorithm53 is n(d - l)(n - d/2) which is sub
stantially less than the exhaustive search (for large n and intermediate values of d). For the
example chosen previously of n = 40 and d = 10, the dynamic programming algorithm
requires 13,600 searches.
NS
Dj = tr((W|) !Bj)
Superscript V denotes average features for voiced (see Appendix A) segments and U denotes
unvoiced segments. For a discussion o f the features see Chapter 7, Volume I.
Volume II: Compression and Automatic Recognition 79
As an exam ple, the results of the dynamic programming search for one speaker (AC) is
shown here:
d Features Divergence
: [P.LE'l 250.67
? 1P.LE1. pi] 528.23
- IP.LE1 .pi.pM 684.06
5 IP.LE1 .pK
v,p;„.E ! 783.72
6 (P.p'.Ep.LE1 .pi.k; J 941.52
IP.LE1, pi .pJ„.E:.pM0j 1096.6
8 IP.p7.Ep.kf.LE' .k'.pV.Cu,] 1325.9
y (P.pV.E(N
,,k''.LE- .k i.Pt,Ep,kVi 1563.9
10 (P.p'.Ep.kJ.LE ,khpV.E;.kl.asv] 2036.9
Note that the pitch feature (P) was chosen in all subsets. The pitch is indeed known as an
important feature tor speaker identification. Note also that for low orders of features vectors
an increase in dimensions changed the features (for example, note the suboptimal vectors
of order 5 and 6). For larger orders, the main features did not change (e.g., see orders 9
and 10).
For actual speaker verification the Mahalanobis distance was used. Figure 15 shows the
distances from segments of 15-sec speech of speaker (IM) to the templates of speakers IM,
AC. and MH. The suboptimal feature vector of dimension 10 evaluated by the dynamic
programming method was used.
V II. T IM E W A R P IN G
One of the fundamental problems that arises when comparing two patterns is that of time
scaling. Up to now we have assumed that both pattern and template (reference) to be compared
share the same tim e base. This is not always correct. The problem is especially severe in
speech analysis. It has been found that when a speaker utters the same word several
times, he does so. in general, with different time bases for each utterance. Each word is
spoken such that parts of it are uttered faster, and parts are uttered slower. The human brain,
it seems, can easily overcome these differences and recognize the word. Machine recognition,
however, finds this a severe difficulty. To overcome the problem, algorithms that map all
patterns onto a common time base, thus allowing comparison, have been developed. These
80 Biomedical Signal Processing
F iG L R r i6 T ln ;c w a rp m g p la n e w u h s e a rc h a r j
are known as "tim e warping" algorithms. The basic idea of lime warping is depicted in
Figure 16. Assume we are given two signals, x(t;), xu,}:
X(t,;) , t; € (3.129A)
i-
x(t,) . I: € (t . t,.j (3.1298)
each with its own time base, t s and tj. We assume that the beginning and end o f each signal
are known. These are denoted (t}s, xi() and (t^;tK), respectively. We shall consider the discrete
case^-hdre>-bOth'sigflalsrwere sampled at the samr rate. Assume al:o that the samples have
been shifted sttchthat both signals begin at sample i - j = 1. Without the loss of generality
we have now:
if the two time bases were linearly related, the mapping function relating them was just i
= j * I/J. In general, however, the relation is nonlinear and one has to find the nonlinear
time w arping function. We shall make several assumptions on the warping function before
actual calculations.
The warping function, W(k), is defined as a sequence of points:
c (l), c(2),...,c(K )
where c(k) = (i(k),j(k)) is the matching of the point i(k) on the first time base and the
point j(k) on the second time base. The warping, W(k), thus allows us to compare the
appropriate parts of x(t,) with these of x(tj).
We shall impose a set of monotonic and continuity conditions on the warping function:60
Clk-0= C(k)*(i»])
'i he left side inequality ensures increasing monotony-; the right side incqu.ali.ty is a continuity
condition ibai restricts fine jumps in the warping. This restriction is important since
discontinuities can cause the elimination of parts of the signal, ii has been suggested 60 to
choose pi = p, ~ i; we shall adapt this here. As a resuit of conditions (Equation 3 .i3 0 ).
vve have restricted the relations between two consecutive warping points c(k) and c(k — 1 )
to be
Ui(k) , j(k) - !)
c(k — 1) = < (i(k) ~ 1 . j(k) - i) (3.131)
(i(k) -- 1 . j<k)
Figure 17 depicts the meaning of the last equation. Due to the constraints, ihere are only
three ways to get to the point c(k) ~ (i.j). These are given in Equation 3.13! and in Figure
17.
We also require boundary conditions. These will be defined as:
By the boundary condition, we mean that we match the beginning and end'of the signals.
This is not always a good condition 10 impose since we may not have the endpoints of the
signals accurately.
The warping function will be estimated by some type of search algorithm. We would like
to limit the area over which the search is performed. We shall restrict the search window 62
to:
\\ - j • I/Jj ^ 7 (3.133)
where y is some given constant. The last condition limits the window to an area between
lines parallel to the line j = iJ/I (see area bounded by solid lines in Figure 16).
Constraints on the slope can also be imposed. If we impose such conditions that limit the
maximum allowable slope, and minimum slope. ° f the warping function, we
end up with a parallelogram search window (see area bounded by broken lines in Figure
16).
We shall now proceed with the dynamic programming search. We recall now that the
signals nre represented, at each point, by their feature vectors, J3j(k) and P /k ). Here f3j(k)
denotes the feature vector of the signal x(t,) at the time i(k) with similar denotation forj^(k).
Define a distance measure between the two feature vectors by
82 Biomedical Signai Processing
We will search for (he warping function that will minimize a performance index, D(x(t,).x(tj)).
We shall use the normalized average weighted distance as the performance index; hence.
where p(k) are the weights. We shall simplify the calculation by employing weights, the
sum o f which is independent of W. Sakoe and Chiba61 have suggested the use .of
which yields
K
2 ) P<k ' = I + J (3 .1 3 6 B )
The weights are shown in Figure 17 . The performance index (Equation 3.135) becomes:
1
D(x(ti),x(tj)) — M in f X d(c(k >)p<k
w '
The dynamic programming65 (DP) equation proves the g(i(m).j(m)) measure by means of
g(i(m - 1) j(m - 1)):
The point, c(m), on the warping function will be determined by considering all allowable
routes into the point c(m) The route that minimizes g(i(m).j(m)) is chosen. It is clear that
the more constraints we pose on the warping function the less routes we shall have to check.
The procedure starts with the initial conditions;
gK(c(K)) (3.139B)
Volume 11: Compression and Automatic Recognition 83
Hence, for the weights and constraints we have imposed, we get the algorithm initial
conditions:
the DP equation:
f g (ij - 1) + d(i,j)
g<i,j) = Min{g(i - l.j - 1) + 2d(i,j) (3.140B)
\jl {\ - l.j) + d(i,j)
1 I .
- (j “ 7) ^ i ^ ~j (J + 7) (3.140C)
Equations 3.140A to D are recurrent! \ calculated in ascending order, scanning the search
window in the order:
j = 1 .2 ... J
The search is conducted, row by row . beginning with j = 1 and i = 1,2, . . . .(1 + -y)!/
J, followed by j = 2, i = 1,2............(2 + 7 )I/J, thus scanning all o f the search window.
For each point scanned. g<i,j ) is calculated, until the endpoint (I,J) is reached. The algorithm
yields d i r e c t l y the distance measure between the time warped words.
Note that when calculating the measures of the ith row, previously calculated measures
from the same row or from the row <i - 1) only are needed. Hence, one has to store the
measures of current and previous rows only. At the most, this means the storage of 27I/J
numbers.
The procedure described above yields the distance measure between the two warped
signals. It does not yield the warping function itself. This is so because the optimal route
W(c(k)) can be evaluated only after the search window has been completely scanned. In the
procedure described above we have not stored the measures g(i,j). The optimal route can
thus not be retrived. Consider the following modification to the described algorithm. After
calculating the measure g(i*,j) by Equation 3 .140B, we record and store the choice made by
w(i,j) = q
where
84 Biomedical Signal Processing
the-value of w(?,j) fells us from what allowable point we have reached the current point
(i.j). Hence, each point in the search window has attached to it information about the optimal
route to reach it. After scanning has terminated, we can reconstruct the optimal route W(c(k) J
by going backward irom w{I,J) to w(l ,1). For example, if \v(c(K}) ~ vv(M) =.2* we know
that c(K — 1) i\ — i J ) . To proceed wc check w(i ~ i J ? and ->o on.
The storage requirement for this procedure is much higher, oi u»urse For each point in
the search window wc require a storage of 0.25 b>?e* (q requires onK two bits). The total
slumber of points in the search window h -yH2 - y j ) (for t!v ca>c 1 ^ y il) ami required
storage is thu> 0.25 yi(2 - y j ) bvtes.
Time warping by means oi or^icd giaph search. (OGS) technique has been developed/’3
It has been shown chat the OGS algoutnm c m solve the time warping problem with essentially
the same accuracy as the DP itgorimrn w»th computations reduced by a factor o f about
2.5. This reduction in computation, however, is attained at the expense of a more complicated
combinatoric effort, h has been ^rgueu6’ therefore that when special high-speed hardware
is used for the computation, the OGS may have no advantage over the DP.
REFERENCES
17. Lam, C. F .. /.im m erm ann, K ., Simpson, K. K ., Katz, S ., and Blackburn, J . G ., Clarification of
somatic evoked potentials through maximum enuopy spectra! analysis, Electroencephalogr Clin. Neuro-
nhy.su>>.. 53. 491, 1982.
18. Gersch. W ., M aritnelli. F., and Yonvnioto, J., A u t o m a t i c c l a s s i f i c a t i o n o f E L G , K u l l b a c k - L e i b k i n e a r e s t
n e i g h b o r r u l e s . S a e m r . 2 ( '5 { 4 4 0 2 ‘. 1 9 3 . 1 9 7 9 .
1 9 . R u t t i m a n i i . L . L . , C o m p r e s s i o n o f i h e l.;C G b y p r e d i c t i o n o r f n t e q 'o l a t i o n a n d e n u o p y e n c o d i n g , IEEE
Trans. Butt. Med. tiny., 2 6 ( ! l j . 6 1 3 . 1 9 7 9 ,
2 0 . M e t a x a i k s - K o s s i d r t i d c s . C . , A t i u m a i o s , S . S., a n d C a r o u b a l o s , C . A . . A m e t h o d f o r c o m p r e s s i o n r e
c o n s t r u c t i o n o f !:.C C i s i g n a l s . ./ iUomcd. Enx.. 3 . 2 1 4 , 1 9 S 1 .
2 1 . A b e n s t e i n , J . i \ a n d T o m p M n s , W . J . , A n e w d a t a r e d u c t i o n a l g o r i t h m f o r r e a l ti m e E C G a n a l y s i s ,
IEEE Trons. S ign ed. Eng.. 2 9 . a ' 1 9 8 2 .
2 2 . Jain, I 1., K autfharju, 1*. M . , a n ‘ Warren, ,1., S e l e c t i o n o j o p t i m a l f e a t u r e s l o r c l a s s i f i c a t i o n o f e l e c
t r o c a r d i o g r a m s . J. E iectroratdiorr.. 1 4 . 2 3 9 , 1 9 8 1 .
2 3 . K u k l i n s k i , \ \ . S . , F a s t W a l s h tr,u iN f o tu s d a t a — e o m p r e v M o n a l g o r i t h m : E C G a p p l i c a t i o n '. Med. Biol.
Eng. C am pin.. 2 S . 4 6 5 , { 9 > :i.
2 4 . N v g a r d s , M . 1 .. a n d H u l t i n g . J . . A n a u t o m a t e d s y s t e m t o r H O G m o n i t o r i n g , Comput. Bioinsd. Res . . 1 2 .
181, 1979.
2 5 . Shridar. M . and Steven*, M . F., A n a l y s i s o f L O G d a t a f o r d a t a c o m p r e s s i o n , ini. J. B 'u w d . Comput..
10. 113. 197*
2 b . Pahhn, ( ) . , Borjesson, I*. ( ) . , a n d W e r n e r , ( ) . , C o m p a c t d i g i t a l s t o r a g e o f E C G ’s , Comput. Prog.
Riomed.. 9 . 2 9 3 . 1 9 7 9 .
2 7 . Cashman. I*. A p a t t e r n r e c o g n i t i o n p r o g r a m f o r c o n t i n u o u s E C G p r o c e s s i n g in a c c e l e r a t e d t i m e ,
Comput. B itw id . fit v .. ! 1 , 3 1 ' , 1 9 7 8 .
2 8 . G u s t a f s o n , D . f - . . W i l l s k y , A . S.. W a n g . J . V .. L a n c e s t e r . M . C . „ a n d T r i e b > v a s s e r , J . H . . E C G / V C G
r h y t h m d i a g n o - K u s i n g s t a t i s t i c a l s i g n a l a n a l y s i s . 1. i d e n t i f i c a t i o n o f p e r s i s t e n t r h y t h m s . I I . i d e n t i f i c a t i o n
o f t r a n s i e n t r h ;» ih n ix . IEEE Irons. Biomcd. Eng.. 2 5 . 3 4 4 . 1 9 7 8 .
2 l». WombSc, M. E .. Halliday. J. S.. Mitter, S . K ., Lancester, M. and Triebwasser, J . H ., D a ta
c o m p r e s s i o n f • s t o r i n g a n d t r a n s m i t t i n g E C G 's V C G ’s . P n v . IEEE. 6 5 . 7 0 2 , 1 9 7 7 .
3 0 . A h m e d , N . . M i i n e . P . J . . a n d H a r r i s , S . G , . l : i e c { r t v a r d i o g - : a p h i c d a ta c o m p r e s s i o n \ h s o r t h o g o n a l
t m n s f o n n s . IEEE Trans. Burned. Eng.. 2 2 . 4 8 4 , 1 9 7 5 .
3 1 . Y o u n g . T . V . a n d H u g g i n s , W . H . . C o m p u t e r a n a l y s i s o f e l e c t r o c a r d i o g r a m s u s in g a h r e a r r e g r e s s i o n
t e c h n i q u e . IEEE Inins. Blamed. Eng., 2 1 . 6 0 . 1 9 6 -1.
3 2 . Marcus, M .. H am m erm an. H .. a n d Inbar, G, F., E C G c l a s s i f i c a t i o n b y s ig n a l e x p a n s i o n o n o r t h o g o n a l
K - L b a s e s . P , r - e r 9 . 2 5 . in P r o c . IE L F . SId e to n Conf.. T e l - A v i v , k r a e l . M a y 1 9 8 1 .
33. iwata. A ., Suzum ura, N . . a n d Ikegaja, K., P a t t e r n c l a s s i f i c a t i o n o f th e p h o n o c a r d i o g r a m s u s i n g lin e a r
p r e d i c t i o n a n a ! y > i s . Med. Biol. Eng. Comput.. 1 5 . 4 0 7 . 1 9 7 7 .
3 4 Urquhari, K. B ., McGhee, .}.. Macleod, J. F.. S . . Banbam, S . W ,, and Moran, F ., T b e d i a g n o s t i c
v a l u e o f p u l m o n a r y s o u n d s ; a p r e l i m i n a r y s t u d y b y c o m p u t e r a i d e d a n a l y s i s . Comput. Biol. Mt J .. 1 ! . 1 2 9 .
1981.
35. Cohen, A. and Landsberg, B ., Analysis and automatic classification of breath sounds. IEEE Turns.
Biomcd. Eng . 31. 585. 1984.
3 6 . inbar, G. F. and Noujaim, A. E . . O n s u r f a c e 1 £ \1 G s p e c t r a l c h a r a c t e r i z a t i o n a n d its a p p l i c a t i o n t o d i a g n o s t i c
c l a s s i f i c a t i o n . IEEE Trims. Biomcd. Eng.. 3 1 . 5 9 7 , 1 9 8 4 .
3 7 . Childers, D. G .. L a r y n g e a l p a th o l o g y d e t e c t i o n , CRC Crit. Rev. Bioeng... 2 . 3 7 5 . 1 9 7 7 .
3 8 . Cohen, A. and Zm ora, E., A u t o m a t i c c l a s s i f i c a t i o n o f i n f a n t s ' h u n g e r a n d p a i n c r y . in P '- < v . Int. Conf
Digital Signal P rocess . . C a p p e l l i n i . V . a n d C o n s t a n t i n i d e s . A . G . . E d s . . E l s e v i e r . A m s t e r d a m . 1 9 8 4 .
3 9 . Annon, J. I . a n d McGilfen, C . G ., O n th e c l a s s i f i c a t i o n o f s i n g l e e v o k e d p o te n t i a l u s i n g a q u a d r a tic -
c l a s s i H e r . Comput. Prog. Biomed., 1 4 . 2 9 . 19 8 2 .
40. Fukunaga, K. and Koontz, W. L. (J., A p p l i c a t i o n o f t h e K a r h u n e n - L o e v e e x p a n s i o n . IEEE Trans.
Comput., 19. 311. 1970.
41. Mausher, M . J . and Landgrebe, D. A ., T h e K-L e x p a ^ i o n a s a n e f f e c t i v e f e a tu r e o r d e r i n g te c h n i q u e
f o r l i m i t e d t r a i n i n g s a m p l e s i z e . IEF.L Trans. Geosci. Rem. Sens.. 2 1 . 4 3 8 , 1 9 8 3 .
42. Fernando, K. V. M . and Nicholson, H ., D i s c r e t e d o u b l e s i d e d K - L e x p a n s i o n . IEE Proc.. 127, 155,
1980.
4 3 . Bromm, B. a n d S c h a r e i n , E . , P r i n c i p a l c o m p o n e n t a n a l y s i s o f p a i n r e l a t e d c e r e b r a l p o t e n t i a l s t o m e c h a n i c a l
a n d e l e c t r i c a l s i m u l a t i o n in m a n . Electroencephalogr. Clin. Xcurophysiol., 5 3 . 9 4 , 1 9 8 2 .
4 4 . O ja, E . and K arhunen. J . , R e c u r s i v e c o n s t r u c t i o n o f K a r h u n c n - L o e v e e x p a n s i o n s f o r p a tte - m r e c o g n i t i o n
p u r p o s e s , in Proc. IEEE Pattern Recog. Conf.. M i a m i . 1 9 8 0 . 1 2 1 5 .
4 5 . Klemma, V. C . and Laub, A. J . , T h e S V D . its c o m p u t a t i o n a n d s o m e a p p l i c a t i o n s , IEEE Trans. Autom.
Control . 2 5 . 1 6 4 . 1 9 8 0 .
4 6 . Tou, J. T. and Heydorn, R . P., S o m e a p p r o a c h e s t o o p t i m u m f. itu r e e x t r a c t i o n , in Computers and
Information Sciences, V o l . 2 , T o u . J . T . E u . . A c a d e m i c P r e s s . N e w Y o r k , 1 9 6 7 .
86 Bi om ed ic al Signal Processing
47. Haimi-Cohen, R. and Cohen, A ., A microcomputer controlled system for stimulation and acquisition o f
evoked potentials, Comput. Biomed. Res,, in press.
48. Tufts, R. W „ Kumaresan, R ., and K irsteins, 1., Data adaptive signal estimation by SVD o f data matrix,
Proc. IEEE, 70, 684, {982
49- Tom inaga, S ., Analysis o f experimental curves using SV D, I E E E T r a n s . A c o u s t. Speech S ig n a l P ro ce ss . ,
2 9 ,4 2 9 . 3981.
50. Shlien, S ., A method for computing the partial iV D , IE E E T ra n s . P a t te r n A n a !. M a c k . In te llig e n c e . 4,
6 7 1 ,1 9 8 2 .
51. Ditnten. A. A. and van der Kam, J ., The use o f the SVD in electrocardiography, Med. Biol. Eng.
Comput.. 2 0 .4 7 3 , 1982.
52. Foley, D . H. and Sam mon, J . W ., An optimal set o f discriminant vectors, I E E E T ra n s . C o m p u t., 24,
28 !, 1975.
.53. Cox, J. R ., Nolle, IF. M ., and Arthur, R. Digital analysis o f the EEG. the blood pressure wave and
the ECG. Proc. IEEE, 60, 1137, 1972.
54. Noble, B. and Daniel, j . W,». Applied Linear A t zebra. 2nd ed., Prentice-Hall, Englewood Cliffs, N.J..
1977.
55. Haimi-Cohen, R. and Cohen, A ,, On-the-computation of partial-SVD. ... V......... .... d ig ita l Sig.
Proc.. Cappellini. V. and Conslantinides, A. G ., Eds., Elsevier, Amsterdam. 1984.
56. Cheung, R. S. and Eisenstein, B. A ., Feature selection via dynamic programming for text-independent
speaker identification. IEEE Trans. Acoust, Speech. Signal Process., 26. 397, 1978.
57. Chang. C. V., Dynamic programming as applied to feature subset selection in pattern recognition system.
IEEE Tram. Syst. Man Cvbern . 3. 166„ 1973.
5$: Shrdhar, M ., Baramecki, M .. and M ohanlerishm an, N ., A unified approach to speaker verification.
Speech Conunun.. 1. 103, 1982.
59. Cohen, A. and Froind, T ., Software package for interactive text-independent speaker verification. Paper
6.2.3. in ProcJ IEEE MELECON Conf., Tel-Aviv, Israel, 1981.
60 Sakoe. H. and Chiba, S ., Dynamic programming algorithm optimization for spoken word recognition.
IEEE Trans. Acoust. Spcech Signal Process., 26, 43, 1978.
61 . Sakoe, H .. Two level DP matching — a dynamic programming ba<ed pattern matching algorithm for
connected work recognition, IEEE Trans. Acoust. Speech Signal Process.. 27. 588, 1979.
62. Paliwal, K. K ., Agarwal, A ., and Sinha, S. S ., A modification over Sakoe and Chiba’s dynamic time
warping algorithm for isolated word recognition. Signal Process., 4 . 329. 1982.
63. Brown. M. K. and Rabiner, L. R ., An adaptive, ordered, graph search technique for dynamic ti ne
warping for isolated word recognition, / £ £ £ 7r««s. Acoust. Speech Signal Process., 30, 535, 1982.
64. Rabiner. L. R ., Rosenberg, A. £ . , and Levinson, S . E ., Considerations in dynamic time warping
algorithms for discrete word recognition. IEEE Trars. Acoust. Speech Signal Process., 26, 575, 1978.
65. Bellman. R. and Dreyfus, S ., E ds., Applied Dynamic Programm ir”, Princeton University Press, Princeton.
N.J., 1962.
Volume II: Compression and Automatic Recognition 87
Chapter 4
SYNTACTIC METHODS
I. INTRODUCTION
Two general approaches are known for the problem of pattern (and signal) recognition.
The first, and better know r one, is the decision-theoretic, or discriminant, approach (see
Chapter 3) and the second i:. the syntactic, or structural, approach.
In the first approach the signal is represented by a set of features describing the charac
teristics of the signal which are of particular interest. For example, when analyzing the
speech signal for the purpose of detecting laryngial disorders, features must be defined and
extracted which are independent of the text (as much as possible) and are dependent on the
anatomy of the physiological system under test. The features set serves to compiess the data
and reduce redundancy. This general method with its biomedical applications is discussed
in Chapter 3.
The syntactic m ethod1'* uses structural information to define and classify patterns. Syn
tactic methods have been applied to the general problem of scene analysis, e.g ., to the
automatic recognition of chromosomes and finger prints. It has also been applied to signal
analysis,4 y with applications to such areas as seismology10 and biomedical signal processing.
Syntactic m ethods have been applied to EEG analysis,1, M ECG analysis.15 " ^ the classi
fication of the carotid waveform.2' and to speech processing.-4
The syntactic approach has gained a lot of attention from researchers in the field of scene
analysis since this approach possesses the structure-handling capability which seems to be
important in analyzing patterns and scenes. For exactly the same reasons, this approach
seems to have a good potential in analyzing complex biological signals. The human interpreter
of biological signals, the electrocardiographer or electroencephalographer, e.g .. when ana
lyzing the signals, observes the structure of the waveforms for his diagnostic decision. The
human diagnosis process is thus more closely related to the syntactic approach than to the
more conventional decision-theoretic approach.
The syntactic approach is also known bv the terms linguistic, structural, and grammatical
approach. An analog between the structure o f a pattern and the syntax of a language can
be drawn. A pattern is described by the relationships between simple subpattems from which
it is composed. The rules describing the composition of the subpattems are expressed in a
similar m anner to grammar rules in linguistics.
The basic ideas behind the syntactic approach are similar in principal to those of the
decision-theoretic approach. In order to classify a signal into several known classes, the
structure o f each class must be learned. In a supervised learning mode, known samples of
signals from each class are provided such that the structural features (primitives) and the
rules of their combination into the given signal (grammar) can be estimated. These are stored
in the system. An unknown signal, to be analyzed and classified, is subjected to some
preprocessing in order to reduce noise, and its primitives are extracted. Classification is
made by applying the syntax of each one of the classes. By means of some measure, a
decision is m ade as to the best syntax that fits the signal. Figure 1 shows schematically a
general syntactic signal recognition system.
Example 4.1
As an exam ple consider the signal in Figure 2a with the primitives (features) defined in
Figure 2b. The signal can be described by a string of primitives in the example:
aabbccdeffgggfccccccaacbbcccc. The string representation may be sufficient to describe
simple waveform s. More complex waveforms are described by means of an hierarchical
88 Biomedical Signal Processing
TRAINING
Samptesfc—5“ “\
of classified]?; f \ P re - i Pnmiiive .-..-JK,
Gromniatsca!
signets Proccessmd m Extract inference
L U
Grammars
C L A S S IF IC A T IO N
Unknown i
^ ] _fejfP'8- i ._ „ Prs rmiive *•—-------zA Syntax
Stgndi ~ ; Proccessinaj Extract Analysis
\ __ l \
/
a b c
structural tree. Consider the signal in Figure 3a. It has been segmented into six sections.
Each section is to be described by the primitives presented in Figure 3b and the complete
waveform described by the structural tree o f Figure 3c. Another possible set c f • ^ ves
for this example is shown in Figure 3d.
V o l u m e H : C o m p r e s s i o n a n d Automatic Recognition 89
<a.)
/ \ / \ — - - ~
PSLM N SLM P S L H NSLH HOR CUP PEK CAP
lb .)
P -Q QRS S -T
T V—s
/ V
\1 ^ 1 / \ V x HOR • PSLM CAP NSLM HOR
PSL M CAP N SLM HOR ,
(C.1
(d.)
F I G U R E 3 . S y n ta c tic re p r e s e n ta tio n o f F C G . (a ) E C G m o d e l; (b ) a se t o f e ig h t
p r im itiv e s : (c ) r e la tio n a l tre e : (d ) a n a lte rn a tiv e s e t o f p rim itiv e s .
The description o f the structure of the signal is performed by grammars or syntax rules.
Grammar can also be used to describe all the signals (sentences) belonging to a given class
(language). Usually a class is represented by a given set of known signals (training set). It
is required Jo estimate the class generating grammar from the training set — a process known
as grammatical inference.
90 Biomedical Signal Processing
We shall denote the grammar, G , as the grammar that can model the source generating
the signal set \ \ ] the sentence that can be genera ed by the grammar G constitute the set
L(G) — the language generated bv G . A phrase structure gram m ar,1 G, is a quadruple given
by:
G = (VN,VT,R,<r) <4,1)
where:
1. Unrestricted grammars (type 0) are grammars with no restrictions placed on the pro
ductions. Type 0 grammars are too general and have not been applied much.
2. Context-sensitive grammars (type 1) are grammars in which the productions are re
stricted to the form: , _
<4.2)
where A e VN and £*,£>>3 € V*. The languages generated by type 1 grammars are
called context sensitive languages.
3. Context-free grammars (type 2) are grammars in which the productions are restricted
to the form:
A —> p (4.3)
V n2 = M
VT = {a,b}
and R2: t —» a c
Volume 11: Compression and Automatic Recognition 91
Example 4.3 . (
Consider the grammar Gc , = (V ^.V T .R ^a) where VN3 = {a,A}, VT = {a,b}, and
R3: a —* Ab
A - * Aa
A —> a
This is a context-free grammar, since it obeys Equation 4.3. The language generated by it
is the language:
which is the language consisting of strings with n “ a V followed by one *‘b*\ Note that
this language is the same as the finite state language L ( G , o f the previous example. Different
grammars can generate the same language.
Example 4.4
Consider another exam ple1 wiih Gi : = (VN4,V,-.R4.<r), where VN4 — {or.A.B}. Vr =
{a,b}, and
(2) (r -+ bA (6) B b
Grammar Gc: is context free since it obeys Equation 4.3, namely, each production in R4
has a nonterminal to its lefi and to its right a string of terminal and nonterminal symbols.
Examples o f sentences generated by G< 2 are {(ab)"} by activation (n - 1) times rules i and
7 followed by 8, or {ba} by activating rules 2 and 5. In general, the language L(GcO is the
set of all words with an equal number of a’s and b’s.
An alternate method to describe a finite state grammar is by a graph known also as the
state transition diagram. The graph consists of nodes and paths. The nodes correspond to
states, nam ely, the nonterminal symbols of VN. and a special node T (the terminal node).
Paths exist between nodes N; and N, for every production if R of the type N, —» aN,. Paths
to the terminal node T from node A, exist for each production A, —» a.
Example 4.5
Consider the finite state grammar Gl2 = {VN5,VT,Rs,a} with VN5 = {o\A,B}, VT =
{a,b}, and
(2) cj —» b ; (6) B aB
O) A —> bA ; (7) B b
(4) A —» aB :
F IG U R E 4 . F in ite s ta te g ra m m a r . G R .
HI. S Y N T A C T IC RECOGNT7F.RS
A. Liiroducium
The signals under resting are represented by strings that were generated by a grammar.
Each class has its own grammar. It is the task of the recognizer to determine which of the
grammars has produced the* given unknown string. Consider the case where M classes are
given. \v,, i - 1,2, . . ,M, each with its grammar, G,. The process known as syntax
analysis, oi parsing, is the process that decides whether the unclassified string x belongs to
■the. language L(G ). i = 1.2, . . . ,M. If it has been determined that x € U G j), then x is
classified into w;. _ .
We shall consider first th.e recognition of strings by automata. Recognizing automata have
octrn developed for the various types of phrase structure grammars. O f interest here are the
iinite automaton, used to recognize finite state grammars and the push-down automaton used
to recognize context-free grammars. The discussion of more general parsing methods will
follow.
where '2 is the alphabet — a final set of input symbols, Q is a final set of states, b is the
mapping operator. q0 is the start state, and F is a set of final states.
The automaton operation can be envisioned as a device, reading data from a tape. The
device is initially at state q0. The sequence, x, written.-on the tape is read, symbol by symbol,
by the device. The device moves to another state by the mapping operator:
5(q ,.,(*) = q2
£€£ (4.6A)
which is interpreted as: the automaton is in state q, and upon reading the symbol £ moves
to state q2. The string x is said to be accepted.by the automaton A, if upon reading the
complete string x, the automaton is in one of the final states.
The transformation operation (Equation 4.6) can be extended to include strings. The string
x will thus be accepted dr recognized by automaton A, if:
namely, starling from stale q0 and scanning the complete string x, the automaton A will
follow a sequence o f slates and will hah at state p which is one of the final states.
Example 4 .6
Consider a deterministic finite state automaton, A ,, given by
Q. • and r , « {q,,q4}
- Id,} 8 ( q ,.b ) ~ { q j
The state transition diagram of A, is given in Figure 5. Note that terminal slates are denoted
bv two concentric circles. The strings x, ~ {(abfb}, x2 = {(ab)ma2} are recognized by A,
since Siq^.x. j {q3} € F, i - 1,2; the string x, - {(ab)"ab} is recognized since 5(q0.x3)
= { q j g F.
A nondeterministic finite stale automaton is the same as the deterministic one. except for
the fact that ihe transformation §(q,£) is a set of state-, rather than a single slate, as indicated
in Equation 4.6. Hence, for the uondcicrminisiic automaton:
Example 4,7
Consider the nondeterministic finite state automaton A2 = ( 2 MQ2,8,q0,F,) with X, =
{a,b}, Q 2 - {q0,q ; ,q2,q.}. and F, = {q,}. The state transition mapping of A2 is given by
the transformations:
The state transition diagram of A2 is given in Figure 6. Note that the transformation S(q,,a)
= (q2,q3} makes the automaton a nondeterministic one.
The strings x4 = {abaab}, x5 = jab”’a"b} will be recognized by A2, since 5(q0,X;) = {q3},
i - 4.5. Note that the automaton A, will recognize aii strings generated by the grammar
GH2 given in Figure 4. it can be show n1 that for any nondeterministic finite state automaton,
accepting a set of strings, L, there exists a deterministic automaton recognizing the same
set of strings. It can also be shown that for any finite state grammar, G, there exists a finite
state automaton that recognizes the language generated by G.
94 B i om ed ic al Signal Processing
F IG U R E 6 . S ta te tra n s itio n d ia g ra m c f n o n -
- d e te r m in is tic f in ite s ta te a u to m a to n . A ,. —
Example 4.8
Consider now the ECG signal shown in Figure 3A with the primitives of Figure 3D (the
example is based on an example given by Gonzalez and Thompson2). A regular grammar
describing the normal ECG is given by
B —> qrs C C ^ bD D - » tF
D —» bE E tF F -* b
H —» pA
The normal ECG is defined here as the one having a basic complex (p qrs b t b) with normal
variations including one b between the p and qrs waves, an additional b between the qrs
Volume II: Compression and Automatic Recognition 95
and Cwaves, and additional one or two b’s between the t and the next p waves. A deterministic
finite state automaton that recognizes normal ECG is shown in Figure 7. In this diagram a
state q r has been added to denote the terminal state.
with 2 , Q, q(„ and F the same as in Equation 4.5 and with T a finite set o f push-down
symbols; Z0 € T, a start symbol initially appearing on the push-down storage. The operator
8(q,£,Z) is a mapping operator:
where q ,q ,,q 2> • . - ,qfll € Q are states, £ e 2 is an input symbol, Z is the current symbol at
the top of the stack, and y x,y 2............. ^ T are strings of push-down symbols. The trans
formation (Equation 4.9) is interpreted as follows: the control is in state q, with the symbol
% Biomedical Signal Processing
\npu1 String
F IG U R E 8 . P u sh -d o w n a
Z at the top of the push-down stack, and the input symbol ξ is read from the input string.
The control will choose one of the pairs (qi,γi), i = 1,2, . . ., m, say (qj,γj). It will replace
the symbol Z in the stack by the string γj, such that its leftmost symbol appears at the top
of the stack, will move to state qj, and will read the next symbol of the input string.
If ξ = λ (the null symbol), then (independent of the input string) the automaton, in state
q, will replace Z by γ in the stack and will remain in state q. When γ = λ the upper
symbol of the stack is cleared. If the automaton reaches a step δ(qi,ξ,ξ), namely, the
uppermost symbol of the stack matches the input symbol, ξ is popped from the stack,
exposing the next-in-line stack symbol, η, with the next transformation δ(qi,γ,η). If the
automaton reaches a combination of state, input, and stack symbols for which no transfor-
mation is defined, it halts and rejects the input.
Acceptance of strings by the push-down automaton can be expressed in two ways: (1)
the automaton reads all symbols of the input string without being halted and, after the final
input symbol has been read, moves into a state q belonging to the final set F; and (2) the automaton
reads all input symbols without being halted, and the transformation taken after reading the last
input symbol moves the automaton into a state q ∈ Q with an empty stack, namely, γ = λ.
This type of acceptance is called "acceptance by empty store". For this case it is convenient
to define the set of final states as the null set, F = ∅.
Example 4.9
Consider a signal with primitives {a,b,c,d} as shown in Figure 9. Consider a class of
signals generated with these primitives such that the language describing the class is the
nonregular (nonfinite state) context-free language {x | x = ab^n cd^n, n ≥ 0}. The members of
the class of signals are depicted in Figure 9. The context-free grammar that generates the
class of signals is
GC3 = (VN3,VT3,R3,σ) = ({σ,A},{a,b,c,d},R3,σ)
R3: σ → aA
A → bAd
A → c          (4.10)
FIGURE 9. Primitives and samples of L(GC3).
The push-down automaton that recognizes the language L(GC3) is the PDA M1 given by
M1 = (Σ,Q,Γ,δ,q0,Z0,F) = ({a,b,c,d},{q0},{σ,A,B,C,D},δ,q0,σ,∅)
Suppose that an input string x1 = {abc}, which does not belong to L(GC3), is checked by the
automaton M1. The following steps take place. Initially the stack holds the string σ; the
input symbol read first is a; hence the first transformation is invoked. The automaton has
to choose between replacing σ by DAB or by C (since this is a nondeterministic machine).
If there is a right choice, it is assumed the automaton will take it.
δ(q0,a,σ) = {(q0,DAB)}
The automaton remains in state q0; its stack now holds the string DAB with the symbol D
in the uppermost (reading) location and the next input symbol (b) is read. The second
transformation is invoked.
δ(q0,b,D) = {(q0,λ)}
The automaton remains in state q0; the symbol D is removed from the stack (replaced by
the null symbol λ); the stack contains AB with the symbol A pushed into the uppermost
stack location. The next input symbol, c, is read and the next transformation is
This transformation leaves the stack non-empty, holding the symbol B. The input string has been
read in full. It, however, cannot yet be rejected, since the first choice made may have been the
wrong choice.
Return now to the first step and take the other choice:
δ(q0,a,σ) = {(q0,C)}
Reading the next input symbol (b) with C at the top of the stack calls for the transformation δ(q0,b,C),
which is undefined, thus causing the automaton to halt. The string {abc} is not recognized
by the PDA, M1.
Consider now the input string x2 = {ab^2cd^2}, which does belong to the language L(GC3).
The following steps will be taken by M1:
δ(q0,a,σ) = {(q0,DAB)}
The last transformation replaces the symbol B in the stack by the null symbol λ and leaves
the stack clear at the end of the input string. The string x2 = {ab^2cd^2} is therefore accepted by M1.
In most applications of syntactic signal processing, a class of signals is given (by means
of their primitives and grammar) and a recognizer has to be designed to recognize the class
of signals of interest. It has been proven2 that for each context-free grammar, G, a PDA
can be constructed to recognize L(G). The inverse statement, namely, that for each PDA there
is a grammar generating the language recognized by the automaton, is also correct.
Consider the case where a context-free grammar, G, is given and it is desired to obtain
the PDA that recognizes it. One relatively simple algorithm2 is as follows: let G = (VN,VT,R,σ)
be the given context-free grammar. A PDA recognizing L(G) is A = (VT,{q0},Γ,δ,q0,σ,∅)
with the push-down symbols Γ being the union of VT and VN and with the transformations
δ obtained from the productions R by:
(1) If A → α is in R, then δ(q0,λ,A) contains (q0,α).
(2) For every terminal a in VT, δ(q0,a,a) = {(q0,λ)}.          (4.12)
Example 4.10
Consider the context-free grammar GC3 (Equation 4.10). The PDA, M2, designed to
recognize L(GC3) by the rules (Equation 4.12), is
M2 = ({a,b,c,d},{q0},{σ,a,b,c,d,A},δ,q0,σ,∅)
By rule (1):
δ(q0,λ,σ) → {(q0,aA)}
δ(q0,λ,A) → {(q0,bAd),(q0,c)}
By rule (2):
δ(q0,a,a) → {(q0,λ)}
δ(q0,b,b) → {(q0,λ)}
δ(q0,c,c) → {(q0,λ)}
δ(q0,d,d) → {(q0,λ)}
Consider again the input string x2 = {ab^2cd^2} belonging to L(GC3). The PDA will proceed
as follows:
δ(q0,λ,σ) = {(q0,aA)}
stack = bAd
stack = Ad
stack = bAdd
b is popped from the stack,
stack = Add
stack = dd
stack = d
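The construction of Equation 4.12 and the trace above lend themselves to a short simulation. The following Python sketch is illustrative only; it drives a nondeterministic push-down recognizer, accepting by empty store, from a transition table derived from GC3 in the manner of Equation 4.12 (the symbol S stands for the start symbol σ, and all names are assumptions).

    # Sketch of a nondeterministic push-down recognizer, accepting by empty store.
    # delta maps (state, input symbol or "" for a null move, top-of-stack) to a list
    # of (next state, string pushed in place of the popped top; "" clears it).
    DELTA = {
        ("q0", "", "S"): [("q0", "aA")],                  # S -> aA
        ("q0", "", "A"): [("q0", "bAd"), ("q0", "c")],    # A -> bAd | c
        ("q0", "a", "a"): [("q0", "")],
        ("q0", "b", "b"): [("q0", "")],
        ("q0", "c", "c"): [("q0", "")],
        ("q0", "d", "d"): [("q0", "")],
    }

    def accepts(string, state="q0", stack="S"):
        """Try every nondeterministic choice; accept when input and stack are empty."""
        if not stack:
            return not string                   # empty store acceptance
        top, rest = stack[0], stack[1:]
        # null moves (no input consumed)
        for nxt, push in DELTA.get((state, "", top), []):
            if accepts(string, nxt, push + rest):
                return True
        # moves that consume one input symbol
        if string:
            for nxt, push in DELTA.get((state, string[0], top), []):
                if accepts(string[1:], nxt, push + rest):
                    return True
        return False

    print(accepts("abbcdd"))   # ab^2cd^2 -> True
    print(accepts("abc"))      # -> False
    print(accepts("ac"))       # n = 0 -> True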
AT = (Σ,Q,Δ,δ,q0,F)          (4.13)
where Σ, Q, q0, and F are the same as in Equation 4.5, Δ is a finite output alphabet, and
δ is the mapping operator. The translator can be seen as a finite state automaton with an
additional output tape onto which the output mapping is written.
E. Parsing
General techniques for determining the sequence of productions used to derive a given
string x, of context-free language L(G), exist. These techniques are called parsing or syntax
"*/ X' * . #
^ S/Ab 7 \ y / \ \
>-^L-
\ ,b /^ i
qr^
^ 1>^s/Afe___ {qc / ?qc K"qrs/H
A Yu * ......>:,; 5
b/N\ /b/N
^ N ' x / ^ N \5 ^ A b / /
(q j ' \ / ; /
/p .b .q r s /A b ,
i
..t____
qT
-f '
analysis techniques. Two general approaches are known for parsing the string x. Bottom-
up parsing starts from the string x and applies productions of G, in reverse fashion, in order
to get to the starting symbol σ. Top-down parsing starts from the symbol σ and, by applying
the productions of G, tries to get to the string x. Efficient parsing algorithms, such as the
Cocke-Younger-Kasami algorithm, have been developed. The interested reader is referred to
the pattern recognition literature (e.g., References 1 and 2).
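As an illustration of bottom-up parsing, the following Python sketch implements the Cocke-Younger-Kasami table-filling scheme. It assumes the grammar has first been converted to Chomsky normal form (every production of the form A → BC or A → a); the toy grammar at the end is an assumption chosen for brevity and is not taken from the text.

    # Sketch of the Cocke-Younger-Kasami (CYK) bottom-up parser for CNF grammars.
    def cyk_accepts(tokens, terminal_rules, binary_rules, start="S"):
        """terminal_rules: set of (A, a); binary_rules: set of (A, B, C); tokens: terminals."""
        n = len(tokens)
        if n == 0:
            return False
        # table[i][j] = set of nonterminals deriving tokens[i : i + j + 1]
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, tok in enumerate(tokens):
            table[i][0] = {A for (A, a) in terminal_rules if a == tok}
        for span in range(2, n + 1):                 # substring length
            for i in range(n - span + 1):            # start position
                for split in range(1, span):         # split point
                    left = table[i][split - 1]
                    right = table[i + split][span - split - 1]
                    for A, B, C in binary_rules:
                        if B in left and C in right:
                            table[i][span - 1].add(A)
        return start in table[0][n - 1]

    # Toy CNF grammar for { a^n b^n : n >= 1 }: S -> A T | A B, T -> S B, A -> a, B -> b
    terminal = {("A", "a"), ("B", "b")}
    binary = {("S", "A", "T"), ("S", "A", "B"), ("T", "S", "B")}
    print(cyk_accepts(list("aabb"), terminal, binary))   # True
    print(cyk_accepts(list("aab"), terminal, binary))    # False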
A. Introduction
Most often signal processing is done in a stochastic environment. Noise and uncertainties
are introduced either due to the stochastic nature of the signal under test or due to the
acquisition and primitive extraction processes. The classes of signals to be analyzed may
overlap in the sense that a given signal may belong to several classes. Stochastic languages
are used to solve such problems. If n grammars, Gi, i = 1,2, . . ., n, are considered for the
string x (representing the signal), the conditional probabilities p(x|Gi), i = 1,2, . . ., n, are
required. The grammar that most likely produced the string x is the grammar for which
p(Gi|x) is the maximum.
A stochastic grammar is one in which probability values are assigned to the various
productions. The stochastic grammar Gs is a quintuple Gs = (VN,VT,R,P,σ), where VN,
VT, R, and σ are the same as in Equation 4.1 and P is a set of probabilities assigned to the
productions of R.
We shall deal here only with unrestricted and proper stochastic grammars. An unrestricted
stochastic grammar is one in which the probability assigned to a production does not depend
on previous productions. Consider a nonterminal, Ti, for which there are m productions: Ti
→ α1, Ti → α2, . . ., Ti → αm; the productions are assigned probabilities Pij, j = 1,2, . . ., m.
A proper stochastic grammar is one in which the probabilities assigned to the m productions
of each nonterminal sum to one.
Stochastic grammars are divided into four types, in a similar manner to nonstochastic
grammars (Equations 4.2 and 4.3). Therefore we speak of stochastic context-free and sto-
chastic finite state grammars and languages.
Example 4.12
Consider the proper stochastic context-free grammar:
Gs1 = ({σ},{a,b},R,{p1,(1 - p1)},σ)
R: (p1)        σ → aσa
(1 - p1)    σ → bb
where the first production is assigned the probability p1 and the second is assigned (1 -
p1). The grammar is clearly a proper stochastic grammar. The grammar Gs1 generates strings
of the form xn = a^n bb a^n, n ≥ 0. The probability of the string is p(xn) = p1^n(1 - p1).
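The following short Python sketch, with illustrative parameter values, shows how strings may be drawn from Gs1 and how the string probability p(xn) = p1^n(1 - p1) arises from the production probabilities.

    import random

    def sample_gs1(p1, rng):
        """Draw one string from Gs1: sigma -> a sigma a with probability p1,
        sigma -> bb with probability 1 - p1."""
        n = 0
        while rng.random() < p1:        # apply sigma -> a sigma a
            n += 1
        return "a" * n + "bb" + "a" * n, n

    def prob_gs1(n, p1):
        """Probability of the string x_n = a^n bb a^n under Gs1."""
        return (p1 ** n) * (1.0 - p1)

    p1 = 0.3
    strings = [sample_gs1(p1, random.Random(seed))[0] for seed in range(5)]
    print(strings)
    print(prob_gs1(2, p1))   # probability of "aabbaa" = 0.3^2 * 0.7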
B. Stochastic Recognizers
Finite state stochastic grammars can be recognized by a stochastic finite automaton. The
automaton is defined by the sextuple:
As = (Σ,Q,δ,q0,F,P)          (4.15)
where Σ, Q, q0, and F are the same as in Equation 4.5, P is a set of probabilities, and δ is
the mapping operator to which probabilities are assigned. The stochastic finite automaton
operates in a similar way to that of the finite one, except that the transition from one state
to another is a random process with given probabilities. For an unrestricted automaton the
probabilities do not depend on a previous state.
Example 4.13
Consider the finite state automaton o f Figure 7, designed to recognize normal ECG.
Assume that there is a probability o f 0.1 that there will be no “ t ” wave present. We would
still want to recognize this as a normal ECG. The automaton designed to recognize the
signal is a modification of the finite state one. Its state transition diagram is shown in Figure
11. Note that another path has been added between states qD and qG and that each path is
assigned an input symbol and a probability. When the automaton is in state qD and the input
symbol is “ b ” , it can move (with probability of 0.9) to qE or (with probability 0.1) to state
qG. The stochastic state transitions of the automaton are as follows:
δ(qA,qrs) = {qC}          p(qC|qrs,qA) = 1
FIGURE 11. State transition diagram of stochastic finite state
automaton for ECG analysis.
δ(qD,b) = {qE,qG}          p(qE|b,qD) = 0.9 ;  p(qG|b,qD) = 0.1
δ(qG,p) = {qA}          p(qA|p,qG) = 1
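A string probability under such an automaton can be computed by summing the probabilities of all state paths that consume the string and end in a final state. The Python sketch below reproduces only the transitions recoverable from the text (the 0.9/0.1 split at qD and the return path qG → qA); the remaining entries and state names are illustrative assumptions, not the complete machine of Figure 11.

    # Sketch: probability that a stochastic finite automaton accepts a string,
    # summed over all state paths. Entries beyond the 0.9/0.1 split at qD are assumptions.
    TRANS = {
        ("qD", "b"): [("qE", 0.9), ("qG", 0.1)],  # 0.1 chance the t wave is absent
        ("qE", "t"): [("qF", 1.0)],
        ("qF", "b"): [("qG", 1.0)],
        ("qG", "p"): [("qA", 1.0)],
    }

    def string_probability(symbols, state, final_states, trans=TRANS):
        """Sum of path probabilities that consume `symbols` and end in a final state."""
        if not symbols:
            return 1.0 if state in final_states else 0.0
        total = 0.0
        for nxt, p in trans.get((state, symbols[0]), []):
            total += p * string_probability(symbols[1:], nxt, final_states, trans)
        return total

    print(string_probability(["b", "t", "b"], "qD", {"qG"}))  # normal complex tail -> 0.9
    print(string_probability(["b"], "qD", {"qG"}))            # missing t wave     -> 0.1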
V. GRAMMATICAL INFERENCE
VI. EXAMPLES
FIGURE 12. Carotid blood pressure wave with relational tree.
A typical systole part may contain the following primitives: LLP, WPP, WPV, WPP, MLN;
and a typical diastole may contain NPP, WPP, TE.
A context-free grammar, Gp, has been chosen5 to describe the signal with
where:
VN = {(Carotid pulse), (Systole), (Diastole), (Maxima), (M1), (M2), (M3), (Di-
crotic wave), (Pos wave), (Neg wave)};
B. Syntactic Analysis of ECG
Several syntactic algorithms have been suggested for the analysis of the ECG signal,
especially for the problem of QRS complex detection.19-22 A simple syntactic QRS detection
algorithm, implemented on a small portable device, was suggested by Furno and Tompkins.21
A simple finite state automaton, AE, has been designed, given by
where:
The two terminal states, qQ and qN, correspond to a QRS wave and to noise. The state
transition rules of AE are
The state transition diagram of the automaton AE is depicted in Figure 13. The primitives
(normup, normdown, zero, and other) are calculated as follows. The ECG signal, x(t), is
sampled with sampling interval, T. The derivative of x(t) is approximated by the first
difference, s(k):
s(k) = [x(kT) - x(kT - T)]/T          (4.18)
The samples {s(k)} are grouped together into sequences. Each sequence consists of consec-
utive samples with the same sign. Consider, for example, the case where s(n - 1) < 0 and
s(n) > 0.
A new sequence of positive first differences is generated:
{s(n), s(n + 1), . . ., s(m)}          (4.19)
where s(m + 1) is the first sample to become negative. Two numbers are associated with
the sequence (Equation 4.19), the sequence length, SL, and the sequence sum, SM:
SL = m - n + 1
SM = s(n) + s(n + 1) + . . . + s(m)
Using predetermined thresholds on SL and SM, the primitives are extracted. The algorithm
has been reported to operate at about ten times real time.
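A sketch of the primitive-extraction step is given below in Python: first differences (Equation 4.18) are grouped into same-sign sequences, each sequence is characterized by its length SL and sum SM, and thresholds map it to a primitive. The threshold values, the test samples, and the exact mapping are placeholders, not those of the original device.

    # Sketch of primitive extraction for syntactic QRS detection.
    def first_difference(x, T):
        """s(k) = [x(kT) - x(kT - T)] / T, Equation 4.18."""
        return [(x[k] - x[k - 1]) / T for k in range(1, len(x))]

    def extract_primitives(s, sl_max=10, sm_min=50.0):
        """Map same-sign sequences of first differences to primitives via placeholder thresholds."""
        primitives = []
        k = 0
        while k < len(s):
            m = k
            while m + 1 < len(s) and (s[m + 1] >= 0) == (s[k] >= 0):
                m += 1                       # extend the same-sign sequence
            SL = m - k + 1                   # sequence length
            SM = sum(s[k:m + 1])             # sequence sum
            if abs(SM) < sm_min:
                primitives.append("zero")
            elif SL <= sl_max and SM > 0:
                primitives.append("normup")   # short, steep positive slope
            elif SL <= sl_max and SM < 0:
                primitives.append("normdown") # short, steep negative slope
            else:
                primitives.append("other")
            k = m + 1
        return primitives

    x = [0.0, 0.01, 0.0, 0.9, 1.8, 0.4, -0.9, -0.1, 0.0]   # crude QRS-like samples
    s = first_difference(x, T=0.004)
    print(extract_primitives(s))   # e.g. ['zero', 'zero', 'normup', 'normdown', 'normup']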
Table 1
PRIMITIVE EXTRACTION FOR QRS DETECTION
A more elaborate syntactic QRS detection algorithm has been suggested by Belforte et
al.19 Here a three-lead ECG was used. The first differences of the three signals were computed
(Equation 4.18), yielding si(k), i = 1,2,3. The energies of the first differences, si^2(k), i =
1,2,3, were used to extract the primitives. A threshold was determined for the energy and
only the pulses above this threshold were considered. The peak of a pulse was denoted a, and its
duration (time above threshold) was denoted d. The quantities a and d were roughly
quantized by means of Table 1, yielding the primitives a, b, c. Peaks were considered as
belonging to different events every time the interval between them was longer than 80 msec.
Strings were thus separated by the end-of-string symbol, w.
A sample o f one lead of the ECG, derivative, and energy are shown in Figure 14. Pulses
above threshold may belong to a QRS complex or may be the result of noise. A string,
from lead i. that may be the result o f a QRS complex is called a QRS hypothesis and is
denoted Q,. A grammar has been inferred from training samples that always appeared with
QRS complexes. This grammar was denoted GQ. Another grammar, Gz , has been introduced
representing strings that in the training set sometimes were from QRS complexes and some
times were not. The two grammars are given by
where
VNQ = {U1,U2,U3,U4,QRS}
VTQ = {a,b,c}
FIGURE 14. Syntactic QRS detection - ECG, derivative, and energy. (From Belforte, G., De-Mori, R.,
and Ferraris, F., IEEE Trans. Biomed. Eng., BME-26, 125, 1979 (© 1979, IEEE). With permission.)
U4 → a;  U4 → b;  U4 → c
and
GZ = (VNZ,VTZ,RZ,Z)          (4.22)
where
VNZ = {Y1,Y2,Z}
VTZ = {b,c}
Y1 → cY1    Y1 → bY2
Y2 → cY1    Y2 → b
For example, the strings {bcbcaa}, {bn}, and {bcnaa} are generated by GQ, and {cbb} and {bcnb}
are generated by GZ.
FIGURE 15. Syntactic QRS detection - three leads. (From Belforte, G., De-Mori,
R., and Ferraris, F., IEEE Trans. Biomed. Eng., BME-26, 125, 1979 (© 1979,
IEEE). With permission.)
The rule suggested by Belforte et al.19 for recognizing a QRS event is as follows. Let Qi,
i = 1,2,3, be a QRS hypothesis emitted under the control of grammar GQ in the time interval
{ti,1, ti,2}, where i denotes the lead number. Let also Zj, j = 1,2,3, be the hypothesis emitted
under the grammar GZ in the time interval {tj,1, tj,2}. For a given lead and time interval, only one
hypothesis can be emitted since the grammars GQ and GZ generate disjoint languages.
The hypotheses Qi and Zj, i,j = 1,2,3, whose time intervals partially overlap are used to
determine the presence or absence of a QRS complex. The decision rule suggested19 is
i ≠ j          (4.23)
where ∧ and ∨ are the logical "and" and "inclusive or" operators. A QRS is declared if
h = 1. The algorithm was checked, in real time, with a data base of 620 QRSs from 16
healthy and ill patients with no errors and less than 0.5% false alarm errors. Examples of
the three-lead ECG and detection results are shown in Figure 15.
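Since Equation 4.23 is not reproduced above, the following Python sketch is only an illustration of the kind of rule described: it assumes, purely as an example, that a QRS is declared when a GQ hypothesis from one lead overlaps in time with a GQ or GZ hypothesis from a different lead.

    # Illustrative sketch only: the exact decision rule (Equation 4.23) is not given above,
    # so the rule below is an assumption capturing its verbal description.
    def overlaps(a, b):
        """True if the time intervals a = (t1, t2) and b = (t1, t2) partially overlap."""
        return a[0] < b[1] and b[0] < a[1]

    def qrs_declared(q_hyp, z_hyp):
        """q_hyp, z_hyp: dicts lead -> list of (t_start, t_end) hypothesis intervals."""
        for i, q_intervals in q_hyp.items():
            for qi in q_intervals:
                for j in set(q_hyp) | set(z_hyp):
                    if j == i:
                        continue
                    others = q_hyp.get(j, []) + z_hyp.get(j, [])
                    if any(overlaps(qi, o) for o in others):
                        return 1       # h = 1 -> QRS declared
        return 0

    Q = {1: [(0.40, 0.48)], 2: [(0.41, 0.47)]}    # GQ hypotheses per lead (seconds)
    Z = {3: [(0.42, 0.46)]}                       # GZ hypotheses per lead
    print(qrs_declared(Q, Z))                     # -> 1
    print(qrs_declared({1: [(0.40, 0.48)]}, {}))  # single-lead hypothesis only -> 0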
Syntactic methods may have a good potential for EEG analysis since they utilize this information.
Syntactic analysis of EEG spectra has been suggested.11,14
The EEG was divided into nonoverlapping segments o f 1-sec duration. The spectrum of
each epoch was estimated (by AR modeling). Discriminant analysis of the training set
generated seven discriminant functions:
{AL,A,SL,S,L,NL,N} (4.24)
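A minimal Python sketch of the epoch-by-epoch AR spectral estimation mentioned above is given below, using the Yule-Walker equations solved by the Levinson-Durbin recursion. The sampling rate, model order, and test signal are illustrative assumptions, not those of the cited studies.

    # Sketch: AR-model spectrum of nonoverlapping 1-sec EEG epochs (Yule-Walker/Levinson-Durbin).
    import numpy as np

    def ar_yule_walker(x, order):
        """Return AR coefficients a[1..p] and driving-noise variance for one epoch."""
        x = np.asarray(x, dtype=float) - np.mean(x)
        r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(order + 1)])
        a = np.zeros(order)
        err = r[0]
        for k in range(order):                       # Levinson-Durbin recursion
            acc = r[k + 1] - np.dot(a[:k], r[k:0:-1])
            refl = acc / err
            if k > 0:
                a[:k] = a[:k] - refl * a[k - 1::-1]
            a[k] = refl
            err *= (1.0 - refl ** 2)
        return a, err

    def ar_psd(a, var, fs, nfreq=256):
        """AR power spectral density on a frequency grid 0..fs/2."""
        freqs = np.linspace(0, fs / 2, nfreq)
        z = np.exp(-2j * np.pi * freqs / fs)
        denom = 1.0 - sum(a[k] * z ** (k + 1) for k in range(len(a)))
        return freqs, var / np.abs(denom) ** 2

    fs = 128                                          # assumed sampling rate (Hz)
    t = np.arange(10 * fs) / fs
    eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)   # alpha-like test signal
    for epoch in eeg.reshape(-1, fs):                 # nonoverlapping 1-sec epochs
        a, var = ar_yule_walker(epoch, order=6)
        freqs, psd = ar_psd(a, var, fs)
        print("dominant frequency: %.1f Hz" % freqs[np.argmax(psd)])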
REFERENCES
Appendix A
I. INTRODUCTION
The typical levels and frequency ranges of various biomedical signals are briefly discussed
in this appendix. Only rough ranges are given because of the large variances that exist in
these types of signals, and the strong dependence on the acquisition method. Records of
typical signals are shown for most of the signals discussed here. A brief discussion on the
main processing methods and problems is presented. Because of the large amounts of
information available concerning the effects of various abnormalities on the signals, espe-
cially on the more important ones such as the ECG or EEG, it was impossible to present a
detailed discussion. Selected references are given that refer the reader to a more detailed
discussion for each signal. The signals have been divided into groups according to inherent
characteristics. In some cases, however, the division is not perfectly clear.
A. Action P otential
T his is the potential generated b\ the excitab le mem brane o f a nerve or m u sc le cell
(C hapter 2 . V o lu m e I). T he action potential generated by a sin gle cel! can be m easu red by
m eans o f a m icroelectrod e inserted into the cell and a reference electrode located ui the
extracellular flu id . T h e m icroeiccir^Je has a very high input im pedance. An a m p lifier w ith
a very lo w n o ise figu re and input capacitance m ust be u se d .1
In m ost a p p lica tio n s the shape oi the action potential is o f no interest. It is the interspike
intervals that are o f interest (Figure 1. Chapter 2). T he time o f occurrence o f the sp ik e is
detected and point p rocess m ethods are used (C hapter 2).
W hen the action p oten tials fiom m ore than on e unit are m onitored by the electro d e.
m u itisp ik e: train a n alysis techniques are required. T he action potentials from the various
neurons can be id en tified by tem plate m atching m ethods (Chapter 1) and m arked point
p ro cesses a n a ly sis can be applied. T ypical level range o f the action potential is 100 m V .
The band w id th required i.> about 2 kHz.
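The template matching step mentioned above can be sketched as follows; the normalized correlation score, the threshold, and the toy templates are illustrative assumptions rather than a prescribed method.

    # Sketch: assigning detected action potentials to units by template matching.
    import numpy as np

    def normalized_correlation(snippet, template):
        a = snippet - snippet.mean()
        b = template - template.mean()
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def classify_spike(snippet, templates, threshold=0.8):
        """Return the best-matching unit label, or None if no template matches well."""
        scores = {unit: normalized_correlation(snippet, tpl) for unit, tpl in templates.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] >= threshold else None

    t = np.linspace(0, 1, 40)
    templates = {
        "unit A": np.exp(-((t - 0.3) ** 2) / 0.005) - 0.4 * np.exp(-((t - 0.5) ** 2) / 0.01),
        "unit B": np.exp(-((t - 0.5) ** 2) / 0.02),
    }
    spike = templates["unit A"] + 0.1 * np.random.randn(t.size)   # noisy occurrence of unit A
    print(classify_spike(spike, templates))                       # typically "unit A"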
FIGURE 1. Sensory nerve action potentials evoked from the median nerve at the elbow and wrist
after stimulation of the index finger. (From Lenman, J. A. R. and Ritchie, A. E., Clinical Elec-
tromyography, Pitman Medical and Scientific, London, 1970. With permission.)
E. Electroencephalogram (EEG)
The recording of the electrical activity of the brain is known as electroencephalography
(EEG). It is widely used11,12 for clinical and research purposes. Methods have been developed
to investigate the functioning of the various parts of the brain by means of the EEG. Three
types of recordings are used. Depth recording is done by the insertion of needle electrodes
into the neural tissue of the brain. Electrodes can be placed on the exposed surface of the
brain, a method known as electrocorticogram. The most generally used method is the
noninvasive recording from the scalp by means of surface electrodes.
The investigation of the electrical activity of the brain is generally divided into two modes.
The first is the recording of spontaneous activity of the brain, which is the result of the
electrical field generated by the brain with no specific task assigned to it. The second is the
evoked potentials (EP). These are the potentials generated by the brain as a result of a
specific stimulus (such as a flash of light, an audio click, etc.). EPs are described in the
next section.
The surface recording of the EEG depends on the locations of the electrodes. In routine
clinical multiple EEG recordings, the electrodes are placed in agreed-upon locations in the
frontal (F), central (C), temporal (T), parietal (P), and occipital (O) regions, with two
common electrodes placed on the earlobes. Between 6 and 32 channels are employed, with
8 or 16 being the numbers most often used. Potential differences between the various electrodes
are recorded. There are three modes of recording: the unipolar, averaging reference, and
bipolar recordings (e.g., see Strong).17
The bandwidth range of the scalp EEG is DC to 100 Hz, with the major power distributed
in the range of 0.5 to 60 Hz. Amplitudes of the scalp EEG range from 2 to 100 μV. The
EEG power spectral density varies greatly with physical and behavioral states. EEG frequency
analysis has been a major processing tool in neurological diagnosis for many years. It has
been used for the diagnosis of epilepsy, head injuries, psychiatric malfunctions, sleep dis-
orders, and others. The major portion of the EEG spectrum has been subdivided into the following
bands.
The delta range - The part of the spectrum that occupies the frequency range of 0.5
to 4 Hz is the delta range. Delta waves appear in young children, in deep sleep, and in some
brain diseases. In the alert adult, delta activity is considered abnormal.
The theta range - The theta range is the part of the spectrum that occupies the frequency
range of 4 to 8 Hz. Transient components of theta activity have been found in normal
adult subjects in the alert state. The theta activity occurs mainly in the temporal and central
areas and is more common in children.
The alpha range - The alpha range is the part of the spectrum that occupies the range
of 8 to 13 Hz. These types of rhythms are common in normal subjects, best seen when the
subject is awake, with closed eyes, under conditions of relaxation. The source of the alpha
waves is believed to be in the occipital lobes. An example of the alpha activity can be seen
in Figure 2.
The beta range - The beta range is the part of the spectrum that occupies the range 13
to 22 Hz. The beta rhythms are recorded in the normal adult subject mainly from the precentral
regions, but may appear in other regions as well. The beta range has been subdivided into
two: beta I is the higher frequency range and beta II is the lower frequency range. Beta II
is present during intense activation of the CNS, while beta I is diminished by such activation.
Sedatives and various barbiturates cause an increase of beta activity, often up to amplitudes
of 100 μV.
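A short Python sketch of band-power computation over the ranges quoted above is given below; the sampling rate and the synthetic test record are assumptions, and the Welch estimator is used only as one convenient choice of power spectral density estimate.

    # Sketch: relative power in the EEG bands quoted above, from a Welch PSD estimate.
    import numpy as np
    from scipy.signal import welch

    BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 22)}

    def band_powers(x, fs):
        freqs, psd = welch(x, fs=fs, nperseg=min(len(x), 4 * fs))
        in_range = (freqs >= 0.5) & (freqs <= 22)
        total = np.trapz(psd[in_range], freqs[in_range])
        powers = {}
        for name, (lo, hi) in BANDS.items():
            mask = (freqs >= lo) & (freqs < hi)
            powers[name] = np.trapz(psd[mask], freqs[mask]) / total
        return powers

    fs = 128                                           # assumed sampling rate (Hz)
    t = np.arange(30 * fs) / fs
    x = 40e-6 * np.sin(2 * np.pi * 10 * t) + 10e-6 * np.random.randn(t.size)  # alpha-dominated
    print(band_powers(x, fs))   # relative power should peak in the alpha band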
Time domain analysis is also used for EEG processing, to detect short wavelets. This
has been applied mainly in sleep analysis. Sleep is a dynamic process which consists of
various stages. At the beginning of the process the subject is in a state of drowsiness where
widespread alpha activity appears. Light sleep, stage 1, is characterized by low voltages of
mixed frequencies. Sharp waves may appear in the EEG. These are the result of a response
to stimuli and are known as V-waves (Figure 3b). The spectrum at stage 1 of sleep is
dominated by theta waves. In stage 2, the slow activity is increased and sleep spindles appear.
These are bursts of about 3 to 5 cycles of alpha-like activity with amplitude of about 50 to
100 μV. In stages 3 (moderate sleep) and 4 (deep sleep), there is an increase in irregular
FIGURE 2. EEG recordings. (a) Subject with complete absence of alpha waves; (b) subject with alpha waves,
diminished for only about 1 sec following eye opening. (From Kiloh, L. G., McComas, A. J., Osselton, J. W.,
and Upton, A. R. M., Clinical Electroencephalography, 4th ed., Butterworths, London, 1981. With permission.)
FIGURE 3. EEG recordings, stages of drowsiness and sleep. (a) Early drowsiness, widespread alpha rhythm;
(b) light sleep (stage 1), note vertex sharp waves in response to sound stimulus at X; (c) light sleep, theta-dominant
stage; (d) stage 2, emergence of sleep spindles; (e) and (f) stages 3 and 4, increasing irregular delta activity, K-
complex responses to sound stimuli at X. (From Kiloh, L. G., McComas, A. J., Osselton, J. W., and Upton, A.
R. M., Clinical Electroencephalography, 4th ed., Butterworths, London, 1981. With permission.)
delta activity and the appearance of K-complexes. These complexes, most readily evoked
by an auditory stimulus, consist of a burst of one or two high-voltage (100 to 200 μV) slow
waves, sometimes accompanied or followed by a short episode of 12- to 14-Hz activity11
(Figure 3). Another sleep stage has been defined, the rapid eye movement (REM) stage.
The EEG of the REM stage is similar to that of stage 1 and early stage 2, but in which
REMs appear. It has also been termed the paradoxical sleep state (Figure 4).
FIGURE 4. Stages of wakefulness and sleep. Upper channel of each pair, eye movements plus
submental EMG; lower channel, EEG. (From Kiloh, L. G., McComas, A. J., Osselton, J. W.,
and Upton, A. R. M., Clinical Electroencephalography, 4th ed., Butterworths, London, 1981. With
permission.)
Several abnormalities are seen in the EEG. Epilepsy is a condition where uncontrolled
neural discharges take place in some location in the CNS. Such a seizure involuntarily
activates various muscles and other functions while inhibiting others. Several types of
epilepsies are known, among them the grand and petit mal, myoclonic epilepsy, and
others (Figure 5).
transient, having its maximum value in the region of the vertex. The response is similar for
all types of stimuli. It becomes less marked when the same stimulus is repeated. The V-
wave and K-complex discussed in the previous section are nonspecific EPs. The specific
response is initiated with some latency after the stimulus has been applied. It has its maximum
in a cortical area appropriate to the modality of stimulation.
The EP is very low in amplitude, in the range of 0.1 to 10 μV. The ongoing
EEG in which the EP is buried may be an order of magnitude larger. Synchronized averaging
techniques are usually used to detect the average evoked potential (AEP) (an abbreviation
used also for auditory evoked potentials). When the single EP is required,19 other methods
of signal to noise enhancement must be used (Chapter 1). There are essentially three major
types of evoked potentials in common use.
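Synchronized (stimulus-locked) averaging can be sketched as follows in Python; under the usual assumptions (a repeatable EP and uncorrelated ongoing EEG), the noise is reduced roughly by the square root of the number of repetitions. All numerical values below are illustrative.

    # Sketch: synchronized (stimulus-locked) averaging of evoked potentials.
    import numpy as np

    def synchronized_average(x, stim_samples, pre, post):
        """Average epochs of x taken from stim - pre to stim + post samples."""
        epochs = [x[s - pre: s + post] for s in stim_samples
                  if s - pre >= 0 and s + post <= len(x)]
        return np.mean(epochs, axis=0)

    fs = 1000                                    # assumed sampling rate (Hz)
    ep = 10e-6 * np.exp(-((np.arange(300) - 120) / 40.0) ** 2)   # 10 uV "evoked response"
    x = 50e-6 * np.random.randn(fs * 60)         # ongoing EEG, much larger than the EP
    stims = np.arange(500, len(x) - 500, 700)    # stimulus instants (samples)
    for s in stims:
        x[s: s + 300] += ep                      # bury the EP at each stimulus

    aep = synchronized_average(x, stims, pre=0, post=300)
    print("epochs averaged:", len(stims), "peak of AEP estimate (uV):", 1e6 * aep.max())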
Visual evoked potential (VEP) - The VEP is recorded19 from the scalp over the occipital
lobe. The stimuli are light flashes or visual patterns. The VEP has an amplitude range of 1
to 20 μV with a bandwidth of 1 to 300 Hz. The duration of the VEP is about 200 msec. The VEP
has been used for the diagnosis of multiple sclerosis (the optical nerve is commonly affected
by the disease), to check color blindness, to assess visual field deficits, and to check visual
acuity. Figure 6 shows a typical VEP.
Somatosensory evoked potential (SEP, SSEP) - The SEP is recorded23 with surface
electrodes placed over the sensory cortex. The stimulus may be electrical or mechanical.
The duration of the cortical SEP is about 25 to 50 msec, with a bandwidth of 2 to 3000 Hz.
Subcortical SEP is much longer and lasts about 200 msec. Figure 7 depicts cortical and
subcortical SEPs. The SEP is used to provide information concerning the dorsal column pathway
between the peripheral nerve fibers and the cortex.
Auditory evoked potential (AEP) - AEPs are recorded by electrodes placed at the
vertex.21,22 The auditory stimulus can be a click, tone burst, white noise, and others. The
AEP is divided15 into the first potential (latency of about a millisecond), the early potential
(eighth nerve and brainstem, 8 msec), the middle potential (8 to 50 msec), and the late
potential (50 to 500 msec). The initial 10-msec response has been associated with brainstem
activities. These brainstem auditory evoked potentials (BAEP) are very low in amplitude
(about 0.5 μV). The AEP has a bandwidth of 100 to 3000 Hz. AEPs have been used to
check hearing deficiencies, especially in children. Figure 7 depicts a typical cortical and
subcortical AEP.
Other evoked potentials - Potentials evoked by pain25 stimuli have been recorded. Such
a stimulus can be an intense thermal pulse from an IR laser beam (Figure 7B). Olfactory
evoked potentials have been reported, as well as vestibulospinal potentials.
The processing of the EEG and EP requires many of the methods discussed in this book.
The EEG is usually recorded in several channels for relatively long periods of time. Large
amounts of data are thus collected. Automatic analysis and data compression techniques are
needed. Time series analysis methods (Chapter 7, Volume I) have widely been applied to
EEG analysis. Most often the EEG is modeled by an AR model,26 and adaptive segmentation
methods27 are employed. The estimation of the EEG power spectral density (Chapter 8,
Volume I) is an important part in both clinical and research oriented EEG analysis. Automatic
classification methods (Chapter 3) have been applied to the EEG for automatic sleep staging,
depth of anesthesia monitoring, and others. Wavelet detection methods (Chapter 1) have
been used to detect K-complexes and spindles in the ongoing EEG. Principal components
and singular value decomposition methods (Chapter 3) have been used28 to analyze evoked
potentials.
G. Electromyography (EMG)
EMG is the recording of the electrical potential generated by the muscle.3,29 The activity
of the muscle can be monitored by means of surface electrodes placed on the skin. The
signal received yields information concerning the total electrical activity associated with the
muscle contraction. More detailed information is often needed for clinical diagnosis. Con-
centric needle electrodes are then inserted through the skin into the muscle. The signal
received is known as the motor unit action potential (MUAP). Higher resolution can be
achieved by the use of microelectrodes, by means of which single muscle fiber action
potentials are recorded. The three types of EMG signals are briefly discussed here.30
Single fiber electromyography (SFEMG) - The action potentials recorded from a single
muscle fiber have a duration of about 1 msec, with amplitudes of a few millivolts. The
bandwidth used to process the SFEMG is 500 Hz to 10 kHz. Although the SFEMG contains
low frequencies, it is advisable to cut off the low band so that contributions from more
distant fibers (having most of their power in the low range due to the volume conductor) can
a difficult task mainly due to the complex volume conductor. Most ECG analysis and
diagnosis,46 however, are performed directly from the surface recordings.
Conventional ECG consists of the PQRST complex with amplitudes of several millivolts.
It is usually processed in the frequency band of 0.05 to 100 Hz,47 where most of the energy
of the ECG is included.
The first step in ECG processing is the identification of the R wave. This is done in order
to synchronize consecutive complexes and for R-R interval (heart rhythm) analysis. Various
techniques of wavelet detection have been employed48 (Chapter 1, Volume II); the problem
is particularly severe when recording the ECG under active conditions where muscle signals
and other noise sources obscure the QRS complex. The analysis of the R-R interval is an
important part of heart patient monitoring. Several methods have been employed for the
analysis, among them autoregressive prediction49 and state estimation.50
Much effort has been placed on the development of algorithms for automatic processing51,52
of the ECG for monitoring, data compression, and classification. Optimal features53 of the
ECG have been discussed and a variety of methods,54-57 including linear prediction54,55 and
Karhunen-Loeve expansion,56 have been employed for compression and classification.
2. High-Frequency Electrocardiography
It has been found that the higher-frequency band of 100 to 1000 Hz, filtered out in the
normal ECG, does contain additional information.58-60 Waveforms known as notches and
slurs which are superimposed on the slowly varying QRS complexes have been recorded.
I. Electrogastrography (EGG)
The stomach, like the heart, possesses a pacemaker that generates a sequence of electrical
potentials. Unlike the heart, defined pacemaker cells have not been found in the stomach.
The cyclic electrical potentials are transmitted through the smooth muscle fibers, causing a
slow rhythmic (of the order of 0.05 Hz) mechanical motion. This motion is responsible for
mixing, grinding, and propelling the absorbed food.
Electrical potential changes generated by the stomach can be picked up65 by means of
FIGURE 11. Power spectral density functions of a dog's electrogastrogram (EGG), calculated from
a record of 107.7 min. The frequency at about 0.32 Hz is of duodenal origin. (From van der Schee,
E. J., Electrogastrography Signal Analytical Aspects and Interpretation, Doctoral thesis, University
of Rotterdam, The Netherlands, 1984. With permission.)
surface electrodes. The signal has a dominant frequency equal to the frequency of the gastric
electric control activity (ECA), which is about 0.05 Hz in man.65 The frequency bandwidth
of the signal is about 0.01 to 0.5 Hz. Optimal locations of electrodes for best recordings
have been suggested.66 Interferences due to the electrode-skin interface and motion and breathing
artifacts require signal to noise enhancement techniques. Correlation67 and adaptive filtering68
methods have been suggested. Autoregressive analysis of the signal69 has been used. Using
duodenally implanted electrodes, the automatic classification of the ingestion of three different
test meals was successfully demonstrated.70 Pattern recognition methods discussed in Chapter
3 were employed. Figure 11 shows an example of an EGG power spectral density function.
III. IMPEDANCE
A. Bioimpedance
Biological tissue obeys Ohm's law for current densities73 below about 1 mA/cm2.
The impedance of the tissue changes with time due to various phenomena such as blood
volume change, blood distribution change, blood impedance change with velocity, and tissue
impedance changes due to pressure, endocrine, or autonomic nervous system activity. Im-
portant information on the resistance of various tissues74 has been collected through the
years.
Bioimpedance measurements75 are usually performed with four electrodes: two for current
injection and two for the impedance measurement. At low frequencies, electrode polarization
causes some measurement problems. The range of 50 kHz to 1 MHz is usually employed.
Current densities must be kept low so as not to cause changes due to heating. The range of
currents used in practice is 20 μA to 20 mA.
B. Impedance Plethysmography
The use of impedance changes for the recording of peripheral volume pulses is known as
impedance plethysmography. The method has been applied75 to various locations of the body
such as the digits, limbs, head, thorax, and kidney. Since calibration of the impedance in
terms of blood flow is difficult, the method has been mainly used for relative monitoring.
An experiment on a dog at 50 kHz showed that a 1% change in blood volume generates
a change of about 0.16% in resistance, with an almost linear relationship over a range of ±30%
blood flow change.
C. Rheoencephalography (REG)
The measurement of impedance changes between electrodes placed on the scalp is known
as the rheoencephalogram (REG). The frequencies used are in the range of 1 to 500 kHz,
yielding a transcranial impedance of about 100 Ω. The pulsatile impedance change is on
the order of 0.1 Ω.
D. Impedance Pneumography
Electrodes placed on the surface of the chest are used to monitor respiration in the frequency
range of 50 to 600 kHz; the change in transthoracic impedance, from full inspiration to
maximum expiration, is almost entirely resistive, with a value of about 20 Ω. The changes
in impedance are related to the changes in lung air volumes. The method is also used as an
apnea monitor, to detect pauses in breathing.
F. Electroglottography
The measurement of the impedance across the neck is known as electroglottography.72
Variations in glottis size, as the vocal cords vibrate, cause impedance changes. The method
can thus be used to measure the pitch frequency.
B. Auscultation
The monitoring of sounds heard over the chest walls is known as auscultation. It has long
been used as one of the means by which pulmonary dysfunctions were diagnosed.81 During
respiration, gases flow through the various airways, emitting acoustical energy. This energy
is transmitted through the airway walls, the lung tissue, and the chest walls.
FIGURE 14. Abnormal heart sounds. (A) Midsystolic click; upper trace, ECG; lower trace, apexcardiogram
(ACG); (B) systolic ejection murmur. (Reproduced with permission from Tavel, M. E., Clinical Phonocardiography
and External Pulse Recording, 3rd ed., Copyright © 1978 by Year Book Medical Publishing, Chicago.)
Breath sounds are generated by the air entering the alveoli during inspiration (local or
vesicular noise) and while passing through the larynx (laryngeal or glottic hiss). Four types
of normal breath sounds have been defined: vesicular breath sounds (VBS), bronchial breath
sounds (BBS), broncho-vesicular breath sounds (BVBS), and tracheal breath sounds (TBS).
Each one of the above breath sounds is normally heard over certain areas of the thorax.
When heard over other than its normal place, it is considered abnormal. Figure 15 depicts
the characteristics of the four types of normal breath sounds.
There are several types of breath sounds which, when present, always indicate abnormality.
The abnormal breath sounds are known as the cogwheel breath sound (CO), the asthmatic
breath sound (AS), the amphoric breath sound (AM), and the cavernous breath sound (CA).
Verbal descriptions of the characteristics of the various breath sounds are used.81 A parametric
description and an automatic classification method have been suggested.82 Another type of
abnormal sounds are the adventitious sounds. These are called musical rales or wheezes and
nonmusical rales.
Auscultation is usually performed with the stethoscope. To get the full frequency range
and an electrical signal that can be processed, microphones are used. The frequency range
required is 20 Hz to 2 kHz.
C. Voice
Speech is produced by expelling air from the lungs through the trachea to the vocal cords.83
When uttering voiced sounds, the vocal cords are forced open by the air pressure.
The opening slit is known as the glottis. The pulse of air propagates through the vocal tract.
FIGURE 15. Typical time and frequency plots of normal breath sounds. Left: energy envelope;
middle: power spectral density, estimated by FFT (upper trace, midinspiration; lower trace, beginning of
inspiration); right: power density, estimated by LPC (midinspiration). (From Cohen, A. and Lands-
berg, D., IEEE Trans. Biol. Med. Eng., BME-31, 35, 1984 (© 1984, IEEE). With permission.)
The generated sound depends on the acoustical characteristics of the various tubes and
cavities of the vocal system. These are changing during the speech process by moving the
tongue, the lips, or the velum.
The frequency of oscillation of the vocal cords during voiced speech is called the fun-
damental frequency or pitch. This frequency is determined by the subglottal pressure and
by the characteristics of the cords: their elasticity, compliance, mass, length, and thickness.
When uttering unvoiced sounds, the vocal cords are kept open and do not take part in
the sound generation. Figure 16 depicts a record of speech signals including silent, unvoiced,
and voiced segments. The speech signal has been used as a diagnostic aid for laryngeal
pathology or disorder;80,84,85 among these are laryngitis, hyperplasia, cancer, paralysis, and
more. It has also been used as a diagnostic aid for some neurological diseases86 and as an
indicator of emotional states.87 The infant's cry has also been suggested as a diagnostic aid88
(e.g., see Figure 2, Chapter 3).
D. Korotkoff Sounds
The most common method for indirect blood pressure measurement is by means of the
sphygmomanometer. An inflatable cuff placed around the upper arm is used to occlude blood flow
to the arm. The pressure exerted by the cuff causes the artery to collapse. When the cuff pressure
is gradually released to the point where it is just below the arterial pressure, blood starts to flow
through the compressed artery. The turbulent blood flow generates sounds known as Ko-
rotkoff sounds. These are picked up by a microphone (or a stethoscope) placed over the
artery. The sounds continue, while decreasing the pressure, until no constriction is exerted
FIGURE 16. A sample of a speech signal demonstrating silent, unvoiced, and voiced segments.
on the artery. Most of the sound's power is in the frequency range of 150 to 500 Hz. Usually
piezoelectric microphones are used, yielding amplitudes of about 100 mV (peak to peak).
V. MECHANICAL SIGNALS
A. Pressure Signals
Blood pressure measurements53 are taken from the critically ill patient by the insertion of
a pressure transducer somewhere in the circulatory system. Figure 13 and Figure 12 in
Chapter 4 give typical examples of the carotid blood pressure signal. Pattern recognition
methods have been applied to the analysis of the blood pressure wave (Chapter 4). The
frequency bandwidth required is about DC to 50 Hz. Other biological pressure signals are
of clinical importance. Figure 17, for example, depicts the intrauterine pressure of a woman
in labor.
B. Apexcardiography (ACG)
Tavel77 has suggested the term apexcardiography to include a variety of methods used for
recording the movements of the precordium. Among the various methods are vibrocardio
graphy, kinetocardiography, ballistocardiography, and impulse cardiography. The motion
is detected by various transducers, accelerometers, strain gauges, or displacement devices
(LVDT). The frequency bandwidth required is about DC to 40 Hz. An example of the ACG
is shown in Figure 14A.
C. Pneumotachography
Pneumotachography is a method used to analyze flow rate for respiratory function eval-
uation. The flow rate signal has a bandwidth of about DC to 40 Hz.
FIGURE 17. Recording during labor. Upper trace: fetal heart rate; middle: abdominal pressure; lower: intrauterine
pressure. (Courtesy of Dr. Yarkoni, Soroka Medical Center.)
dilutions, the first curve must be estimated. The techniques for echo cancellation (Chapter 9,
Volume I) can be employed here. A similar technique, using the injection of a fluid having a
temperature different from that of the blood, is sometimes employed. It is known as thermal
dilution.
VI. BIOMAGNETIC SIGNALS
A. Magnetoencephalography (MEG)
Various organs such as the heart, lungs, and brain produce extremely weak magnetic fields.
The measurement of these magnetic fields is difficult. Magnetic measurements have been made
on nerve cells89 and from the brain.90 The MEG was reported to be different from the EEG and
to provide additional information.90 An example of the MEG signal is shown in Figure 18.
C. Magnetopneumography (MPG)
The monitoring of the magnetic fields generated over the lungs has also been suggested.92
VII. BIOCHEMICAL SIGNALS
Biochemical measurements1 are usually performed in the clinical laboratory. Blood gas
and acid-base measurements are routinely performed to evaluate partial pressure of oxygen
(pO2), partial pressure of CO2 (pCO2), and concentration of hydrogen ions (pH). These
measurements are usually done by means of electrodes. Other methods for the measurement
of organic and nonorganic chemical substances are used, such as chromatography, electro-
phoresis, flame photometry, atomic emission and absorption fluorometry, nuclear magnetic
resonance (NMR), and more. These methods most often provide DC signals. The problems
associated are mainly in the instrumentation and acquisition systems rather than in the
processing. Some processing problems do exist, for example, in methods like chromatog-
raphy, where sometimes close or overlapping peaks have to be identified.
Biochemical measurements are performed also in the clinic and in the research laboratory.
Specific ion microelectrodes have been developed which allow the recording of ion con-
centration variations of neural cells. Figure 20 is an example of such a signal.
Noninvasive, transcutaneous monitoring of pO2 and pCO2 can be conveniently performed
by means of special electrodes. This measurement is used in the clinic. Noninvasive blood
oxygenation monitoring is done by optical means (oximetry). These signals are very low-
frequency signals and usually require no special processing.
VIII. TWO-DIMENSIONAL SIGNALS
FIGURE 19. Magnetocardiogram obtained across the chest with 12-lead ECG and Frank
x,y,z leads. (From Cohen, D. and McCaughan, D., Am. J. Cardiol., 29, 678, 1972. With
permission.)
REFERENCES
1. Webster, J. G., Ed., Medical Instrumentation: Application and Design, Houghton Mifflin, Boston, 1978.
2. Abeles, M. and Goldstein, M. H., Multispike train analysis, Proc. IEEE, 65(5), 762, 1977.
3. Lenman, J. A. R. and Ritchie, A. E., Clinical Electromyography, Pitman Medical and Scientific, London,
1970.
4. Armington, J., The Electroretinogram, Academic Press, New York, 1974.
5. Gouras, P., Electroretinography: some basic principles, Invest. Ophthalmol., 9, 557, 1970.
6. Chatrian, G. E., Computer assisted ERG. I. Standardized method, Am. J. EEG Technol., 20(2), 57, 1980.
7. Larkin, R. M., Klein, S., Ogden, T. E., and Fender, D. H., Non-linear kernels of the human ERG,
Biol. Cybern., 35, 145, 1979.
8. Krill, A. E ., The electroretinogram and electro-oculogram: clinical applications. Invest. Ophthalmol., 9.
600, 1970.
9. North, A. W ., Accuracy and precision o f electro-oculographic recordings, Invest. Ophthalmol., 4, 343,
1965.
10. Kris, C ., Vision: electro-oculography, in M edical Physics, Vol. 3, Glasser, O ., Ed., Year Book Medical
Publishing, Chicago, 1960.
11. Kiloh, L. G., McComas, A. J., Osselton, J. W., and Upton, A. R. M., Clinical Electroencephalography,
4th ed., Butterworths, London, 1981.
12. Basar, E., EEG-Brain Dynamics, Elsevier/North-Holland, Amsterdam, 1980.
13. Cox, J. R ., Nolle, F. M ., and Arthur, R. M ., Digital analysis of the EEG, the blood pressure and the
ECG, Proc. IEEE, 60, 1137, 1972.
14. Barlow, J. S ., Computerized clinical EEG in perspective, IEEE Trans. Biol. M ed. E ng., 26, 277, 1979.
15. Gevins, A. S ., Pattern recognition o f human brain electrical potentials, IEEE Trans. Pattern Anal. Mach.
Intelligence, 2, 383, 1980.
16. Isaksson, A ., W ennberg, A ., and Zetterberg, L. H ., Computer analysis o f EEG signals with parametric
models, Proc. IEEE, 69, 451, 1981.
17. Strong, P ., Biophysical M easurements, Tektronix, Beaverton, Ore., 1970.
18. Childers, D. G ., Evoked responses: electrogenesis, models, methodology and wavefront reconstruction
and tracking analysis, Proc. IEEE, 65(5), 611, 1977.
19. McGillem, C. D. and Aunon, J. I., Measurements of signal components in single visually evoked brain
potentials, IEEE Trans. Biol. Med. Eng., 24, 232, 1977.
20. Sayers, B. McA., Beagley, H. A., and Riha, J., Pattern analysis of auditory evoked EEG potentials,
Audiology, 18, 1, 1979.
21. Jervis, B . W ., N ichols, M . J ., Johnson, T . ! ., Allen, E ., and Hudson, N. R ., A fundamental investigation
of the composition of auditory evoked potentials, IEEE Trans. Biol. Med. Eng., 30, 43, 1983.
22. Boston, J. R., Spectra of auditory brainstem responses and spontaneous EEG, IEEE Trans. Biol. Med.
Eng., 28, 334, 1981.
23. Sclabassi, R. J., Risch, H. A., Hinman, C. L., Kroin, J. S., Enns, N. F., and Namerow, N. S.,
Complex pattern evoked somatosensory responses in the study of multiple sclerosis, Proc. IEEE, 65(5),
626, 1977.
24. Berger, M. D., Analysis of sensory evoked potentials using normalized cross-correlation functions, Med.
Biol. Eng. Comput., 21, 149, 1983.
25. Carmon, A., Consideration of the cerebral response to painful stimulation: stimulus transduction versus
perceptual event, Bull. N.Y. Acad. Med., 55, 313, 1979.
26. Zetterberg, L. H., Estimation of parameters for a linear difference equation with application to EEG
analysis, Math. Biosci., 5, 227, 1969.
27. Praetorius, H. M., Bodenstein, G., and Creutzfeldt, O. D., Adaptive segmentation of EEG records: a
new approach to automatic EEG analysis, Electroencephalogr. Clin. Neurophysiol., 42, 84, 1977.
28. Haimi-Cohen, R. and Cohen, A., A microprocessor controlled system for stimulation and acquisition of
evoked potentials, Comput. Biomed. Res., in press.
29. Basmajian, J. V., Clifford, H., McLeod, W., and Nunnaly, H., Eds., Computers in Electromyography,
Butterworths, London, 1975.
30. Stalberg, E. and Antoni, L., Computer aided EMG analysis, in Computer Aided Electromyography,
Progress in Clinical Electromyography, Vol. 10, Desmedt, J. E., Ed., S. Karger, Basel, 1983, 186.
31. LeFever, R. S. and DeLuca, C. J., A procedure for decomposing the myoelectric signal into its constituent
action potentials, I. Technique, theory and implementation, IEEE Trans. Biol. Med. Eng., 29, 149, 1982.
32. LeFever, R. S ., Xenakis, A. P ., and Dei.uca, C , J ., A procedure for decomposing the myoelectric signal
into its constituent action potentials, II. Execution and test for accuracy. IEEE Trans Biol. Med. Eng.,
29, 158. 1982.
33. Nandedkar, S. D. and Sanders, D. B., Special purpose orthonormal basis functions - application to
motor unit action potentials, IEEE Trans. Biol. Med. Eng., 31, 374, 1984.
34. Berzuini, C., Maranzana-Figini, M., and Bernardinelli, C., Effective use of EMG parameters in the
assessment of neuromuscular diseases, Int. J. Biol. Med. Comput., 13, 481, 1982.
35. Kranz, H., Williams, A. M., Cassell, J., Caddy, D. J., and Silberstein, R. B., Factors determining
the frequency content of the EMG, J. Appl. Physiol. Respir. Environ. Exerc. Physiol., 55(2), 392, 1983.
36. Lindstrom, L. H. and Magnusson, R. I., Interpretation of myoelectric power spectra: a model and its
application, Proc. IEEE, 65, 653, 1977.
37. Inbar, G. F. and Noujaim, A. E., On surface EMG spectral characterization and its application to diagnostic
classifications, IEEE Trans. Biol. Med. Eng., 31, 597, 1984.
38. Journee, J. L., van Manen, J., and van der Meer, J. J., Demodulation of EMG's of pathological
tremors. Development and testing of a demodulator for clinical use, Med. Biol. Eng. Comput., 21, 172,
1983.
39. Stulen, F. B. and DeLuca, C. J., Muscle fatigue monitor: a non-invasive device for observing localized
muscular fatigue, IEEE Trans. Biol. Med. Eng., 29, 760, 1982.
40. G ross, D ., G rassino, A ., Ross, W . R. D ., and M acklem , P. T ., Electromyogram pattern o f diaphragmatic
fatigue. J. Appl. Physiol. Respir. Environ. Exerc. Physiol., 46(1), 1, 1979.
41. G raupe, D. and Cline, W. K ., Functional separation o f EMG signals via ARMA identification methods
for prosthesis control purposes, IEEE Trans. Syst. M an Cybern., 5, 252, 1975.
42. Doerschuk, P. C „ Gustafson, D. E ., and Willsky, A. S ., Upper extremity limb function, discrimination
using EMG signal analysis, IEEE Trans. Biol. Med. E ng., 30, 18. 1983.
43. Saridis, G . N. and Gootee, P. T ., EMG pattern analysis and classification for a prosthetic arm. IEEE
Trans. Biol. M ed. Eng,, 29, 403, 1982.
44. M artin, R . O ., Pilkington, T . C ., and Marrow, M . N ., Statistically constrained inverse electrocardiog
raphy, IEEE Trans. Biol. M ed. Eng.. 22. 487. 1975.
45. Yam ashita, Y ., Theoretical studies on the inverse problem in electrocardiography and the uniqueness of
the solution, IE E E Trans. Biol. M ed. Eng.. 29. 719, 1982.
46. Friedman, H . H ., Diagnostic Electrocardiography an d Vectorcardiography, McGraw-Hill, New York.
1977.
47. Riggs, T ., Isenstein, B ., and Thom as, C ., Spectral analysis of the normal ECG in children and adults.
J . E lectrocardiol. , 12(4). 377, 1979.
48. Ligtenberg, A. and Kunt, M., A robust digital QRS detection algorithm for arrhythmia monitoring,
Comput. Biomed. Res., 16, 273, 1983.
49. Haywood, L. Y., Saltzberg, S. A., Murthy, V. K., Huss, R., Harvey, G. A., and Kalaba, R., Clinical
use of R-R interval prediction for ECG monitoring: time series analysis by autoregressive models, Med.
Inst., 6, 111, 1972.
50. Ciocloda, G. H., Digital analysis of the R-R intervals for identification of cardiac arrhythmia, Int. J. Biol.
Med. Comput., 14, 155, 1983.
51. Caceres, C. A. and Dreifus, L. S., Eds., Clinical Electrocardiography and Computers, Academic Press,
New York, 1970.
52. Wolf, H. K. and MacFarlane, P. W., Eds., Optimization of Computer ECG Processing, North-Holland,
Amsterdam, 1980.
53. Jain, U., Rautaharju, P. M., and Warren, J., Selection of optimal features for classification of elec-
trocardiograms, J. Electrocardiol., 14(3), 239, 1981.
54. Shridhar, M. and Stevens, M. F., Analysis of ECG data for data compression, Int. J. Biol. Med.
Comput., 10, 113, 1979.
55. Ruttimann, U. E. and Pipberger, H. V., Compression of the ECG by prediction or interpolation and
entropy encoding, IEEE Trans. Biol. Med. Eng., 26, 163, 1979.
56. Womble, M. E., Halliday, J. S., Mitter, S. K., Lancaster, M. C., and Triebwasser, J. H., Data
compression for storing and transmitting ECG's/VCG's, Proc. IEEE, 65(5), 702, 1977.
57. Jain, U., Rautaharju, P. M., and Horacek, B. M., The stability of decision theoretic electrocardiographic
classifiers based on the use of discretized features, Comput. Biomed. Res., 13, 132, 1980.
58. Santopietro, R. F., The origin and characteristics of the primary signal, noise and interference sources in
the high frequency ECG, Proc. IEEE, 65(5), 707, 1977.
59. Chein, I. C., Tompkins, W. J., et al., Computer methods for analysing the high frequency ECG,
Med. Biol. Eng. Comput., 18, 303, 1980.
60. Kim, Y. and Tompkins, W. J., Forward and inverse high frequency ECG, Med. Biol. Eng. Comput.,
19, 11, 1981.
61. Wheeler, T., Murrills, A., and Shelley, T., Measurement of the fetal heart rate during pregnancy by a
new electrocardiographic technique, Br. J. Obstet. Gynaecol., 85, 12, 1978.
62. Bergveld, P. and Meijer, W. J. H., A new technique for the suppression of the MECG, IEEE Trans.
Biol. Med. Eng., 28, 348, 1981.
63. Flowers, N. C., Hand, R. C ., Orander, P. C. , Miller, C. R. , and Walden, M . O ., Surface recording
of electrical activity from the region of the bundle of His, Am. J. Cardiol., 33, 384, 1974.
64. Peper, A., Jonges, R., Losekoot, T. G., and Grimbergen, C., Separation of His-Purkinje potentials
from coinciding atrium signals: removal of the P-wave from the electrocardiogram, Med. Biol. Eng. Comput.,
20, 195, 1982.
65. van der Schee, E. J., Electrogastrography Signal Analytical Aspects and Interpretation, Doctoral thesis,
University of Rotterdam, The Netherlands, 1984.
66. Mirizzi, N. and Scafoglieri, U., Optimal direction of the EGG signal in man, M ed. Biol. Eng. Comput.,
2 1 , 3 8 5 ,1 9 8 3 .
67. Postaire, J. G ., van Houtte, N., and Devroede, G., A computer system for quantitative analysis of
gastrointestinal signals, Comput. Biol. M ed., 9, 295, 1979.
68. Kentie, M. A., van der Schee, E. J., Grashuis, J. L., and Smout, A. J. P. M ., Adaptive filtering of
canine EGG signals. II. Filter performance, Med. Biol. Eng. Comput., 19. 765, 1981.
69. Kwok, H. H. L., Autoregressive analysis applied to surface and serosal measurements of the human
stomach, IEEE Trans. Biol. M ed. Eng., 26, 405, 1979.
70. Reddy, S. N., Dumpala, S. R. , Sarna, S. K., and Northeott, P. G., Pattern recognition of canine
duodenal contractile activity, IEEE Trans. Biol. Med. Eng., 28, 696, 1981.
71. Vdow, M. R., Erwin, C. W., and Cipolat, A. L., Biofeedback control o f skin potential level, Biofeedback
Self Regul., 4(2), 133, 1979.
72. Askenfeld, A., A comparison of contact microphone and electroglottograph for the measurement of vocal
fundamental frequency, J. Speech Hearing Res., 23, 258, 1980.
73. Schwan, H. P., Alternating cuirent spectroscopy of biological substances, Proc. IRE, 47(11), 1941, 1959.
74. Geddes, L. A. and Baker, L. E., The specific resistance of biological material. A compendium of data
for the biomedical engineer and physiologist, Med. Biol. Eng. Comput., 5, 271, 1967.
75. Geddes, L. A. and Baker, L. E., Principles o f Applied Biomedical Instrumentation, John Wiley & Sons,
New York, 1968.
76. Lifshitz, K., Electrical impedance cephalography (rheoencephalography), in Biomedical Engineering Sys
tems, Clynes, M. and Milsum, J. H., Eds., McGraw-Hill, New York, 1970.
77. Tavel, M. E., Clinical Phonocardiography and External Pulse Recording, 3rd ed., Year Book Medical
Publishing, Chicago, 1967.
78. Iwata, A., Suzumura, N. , and Ikegaya, K., Pattern classification o f the phonocardiogram using linear
prediction analysis, Med. Biol. Eng. Comput., 15 , 407, 1977.
79. Joo, T. H., McClellan, J. H., Foaie, R. A., Myers, G. S., and Lees, R. S. , Pole-zero modeling and
classification of PCG, IEEE Trans. Biol. Med. Eng., 30, 110, 1983.
80. Childers, D. G., Laryngial pathology detection, CRC Crit. Rev. Bioeng., 2, 375, 1977.
81. Druger, G., The Chest: Its Signs and Sounds, Humetrics Corp., Los A.ngeles, 1973.
82. Cohen, A. and Landsberg, D., Analysis and automatic classification of breath sounds, IEEE Trans. Biol.
Med. Eng., 3 1 , 3 5 ,1 9 8 4 .
Volume II: Compression and Automatic Recognition 137
83. Schafer, R. W . and Market, J. D., Eds., S peech A n a ly s is . IEEE Press, New York, 1978.
84. Mezzalama, M ., Prinetto, P., and Morra, B., Experiments in automatic classification of laryngeal
pathology, M e d . B io l. E n g . C o m p u t., 21, 603, 1983.
85. Detier, J. R. and Anderson, D. J., Automatic classification of laryngeal dysfunction using the roots of
the digital inverse filler, I E E E T ra n s . B io l. M e d . E n g .. 27. 714, 1980.
86. Okada, M ., Measurement of speech patterns in neurological disease, M e d . B io l. E n g . C o m p u t., 21, 145,
1983.
87. Streeter, L. A., Macdonald, N. H., Apple, W., Krauss, R. M., and Galott, K. M., Acoustic and
perceptual indicators of emotional stress, J . A coust. S o c. A m ., 73(4), 1354, 1983.
88. Cohen, A. and Zmora, E., Automatic classification o f infants’ hunger and pain cry, in P r o c . In t. Conf.
D i g i t a l S ig n a l P ro c e s s ., Cappelini, V. and Constantinidcs. A. G., Eds,, Elsevier, Amsterdam, 1984.
89. Wlkswo, J. P., Barach, J. P., and Freeman, J. A., Magnetic field of a nerve impulse: first measurements,
S c ie n c e , 208, 53, 1980.
90. Cohen, D. and Cuffin, B. N., Demonstration of useful differences between magnetoencephaiogram and
electroencephalogram, Electroencephalogr. Clin. Neurophys., 56, 1983.
91. Cohen, D. and McCaughan, D., Magnetocardiograms and their variation over the chest in normal subjects.
Am. J. C ardiol., 29, 678, 1972.
92. Robinson, S. E., Magnetopneumography non-invasive imaging of magnetic particulate in the lung and
other organs. I E E E T ra n s . N u c l. S c i.. 28, 171, 1981.
93. Heinemann, U. and Gutnick, M. J., Relation between extracellular potassium concentration and neuronal
activities in cat thalamus (VPL) during projection o f cortical epileptiform discharge, Electroencephalogr.
Clin. N europhys., 47. 345, 1979.
V o l u m e II: C o m p r e s s i o n a n d Automatic Recognition 139
Appendix B
DATA AND LAG WINDOWS
I. INTRODUCTION
Any practical signal processing problem requires the use of a window. Since we cannot
process an infinitely long record, we must multiply it by a window that zeroes the signal
outside the observation period. The topics of window design and window application are
dealt with by most signal processing books1-8 and by many papers.9-20
A window,1 w(t), is a real and even function of time whose Fourier transform, W(ω) =
F{w(t)}, is also real and even. We also require that a window be normalized:
w(0) = 1
Windows are used for a variety of applications in continuous and discrete signal processing,
e.g., in the design6 of nonrecursive digital filters, in the application of the FFT, and in
power spectral density (PSD) function estimation (Chapter 8, Volume I).
In PSD estimation, a window is required to reduce spectral leakage.
Several figures of merit have been defined to evaluate and compare windows. To cancel
leakage completely we would need a window that behaves as a delta function in the frequency
domain; such a window is of course unrealizable. For a practical window
(Figure 1) we require that the main half width10 (MHW) of the main lobe and the side
lobe level10 (SLL) be as small as possible. Other criteria, such as the equivalent noise
bandwidth9 (ENBW), processing gain9 (PG), maximum energy concentration,1 and minimum
amplitude moment,1 have also been used.
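To make one of these figures of merit concrete, the equivalent noise bandwidth of a sampled window can be computed directly from its coefficients as N·Σw²(n)/(Σw(n))², expressed in DFT bins. The short sketch below (modern free-form Fortran, not part of the book's package; the 0.54/0.46 Hamming coefficients are an assumed example) evaluates this quantity:

      program enbw_demo
      implicit none
      integer, parameter :: n = 64
      real :: w(n), enbw, pi
      integer :: i
      pi = 4.0*atan(1.0)
      ! Hamming window samples (0.54/0.46 coefficients assumed)
      do i = 1, n
         w(i) = 0.54 - 0.46*cos(2.0*pi*real(i-1)/real(n-1))
      end do
      ! equivalent noise bandwidth in DFT bins: N*sum(w**2)/(sum(w))**2
      enbw = real(n)*sum(w*w)/(sum(w)**2)
      print '(a,f6.3,a)', 'ENBW = ', enbw, ' bins'
      end program enbw_demo

For the Hamming window the result is about 1.36 bins; the rectangular window, by comparison, gives exactly 1 bin.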
When considering PSD estimation, a window can be applied directly to the data (data
window or taper window) or to the autocorrelation function; the latter is known as the lag
window or quadratic window. Note that the data window does not preserve the energy of
the signal. The lag window, however, does preserve the energy, since r(0), the signal's
energy, is multiplied by w(0) = 1.
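The distinction can be checked numerically: tapering the data changes the lag-zero autocorrelation (the energy), whereas a lag window leaves it untouched because only r(0)·w(0) = r(0) enters. The following sketch (free-form Fortran, an illustration rather than package code; the sinusoidal test record and Hamming taper are assumptions) prints the three values of r(0):

      program window_energy
      implicit none
      integer, parameter :: n = 256
      real :: x(n), w(n), pi
      integer :: i
      pi = 4.0*atan(1.0)
      ! test record: a single sinusoid (arbitrary choice)
      do i = 1, n
         x(i) = sin(2.0*pi*10.0*real(i-1)/real(n))
      end do
      ! Hamming data window (assumed coefficients)
      do i = 1, n
         w(i) = 0.54 - 0.46*cos(2.0*pi*real(i-1)/real(n-1))
      end do
      ! r(0) of the raw record, of the tapered record, and after a lag window;
      ! a lag window multiplies r(m) by w(m), and since w(0)=1, r(0) is unchanged
      print *, 'r(0), raw record       :', sum(x*x)/real(n)
      print *, 'r(0), data (taper) win.:', sum((w*x)**2)/real(n)
      print *, 'r(0), lag window       :', sum(x*x)/real(n)
      end program window_energy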
A. Introduction
In this section we shall list a number of windows with their appropriate parameters. Plots
of the windows in the time and frequency domains are also given. To demonstrate the relative
behavior of the windows in the PSD estimation application, a simple experiment was conducted
by Harris.9 A signal was synthesized, composed of two sinusoids, one with frequency
10.5 fs/N and amplitude 1.00 and the other with frequency 16.0 fs/N and amplitude 0.01
(40.0 dB below the first), with N being the number of samples in the window. The PSD
of this signal was then estimated with the various windows (see, e.g., Figures 5B and 6B).
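A reader who wishes to repeat the experiment without the package can do so with a few lines of code. The sketch below (free-form Fortran; it uses a direct DFT instead of the FFT and assumes the usual 0.54/0.46 Hamming coefficients, so it reproduces only the flavor of Harris' figures) synthesizes the two-tone signal and prints the rectangular- and Hamming-windowed spectra in decibels relative to their peaks:

      program leakage_demo
      implicit none
      integer, parameter :: n = 64
      real :: x(0:n-1), w(0:n-1), prect(0:n/2), pham(0:n/2)
      real :: pi, dbr, dbh
      integer :: i, k
      pi = 4.0*atan(1.0)
      ! two-tone test signal: 10.5 bins at amplitude 1.0, 16 bins at 0.01 (-40 dB)
      do i = 0, n-1
         x(i) = sin(2.0*pi*10.5*real(i)/real(n)) + 0.01*sin(2.0*pi*16.0*real(i)/real(n))
         w(i) = 0.54 - 0.46*cos(2.0*pi*real(i)/real(n-1))   ! Hamming (assumed)
      end do
      do k = 0, n/2
         prect(k) = binpow(x, k)
         pham(k)  = binpow(w*x, k)
      end do
      ! bin number, rectangular and Hamming spectra in dB relative to the peak
      do k = 0, n/2
         dbr = 10.0*log10(max(prect(k)/maxval(prect), 1.0e-12))
         dbh = 10.0*log10(max(pham(k)/maxval(pham), 1.0e-12))
         print '(i4,2f10.2)', k, dbr, dbh
      end do
      contains
      real function binpow(y, k)
      ! squared magnitude of DFT bin k, computed by a direct (slow) DFT
      real, intent(in) :: y(0:n-1)
      integer, intent(in) :: k
      real :: re, im
      integer :: j
      re = 0.0
      im = 0.0
      do j = 0, n-1
         re = re + y(j)*cos(2.0*pi*real(k*j)/real(n))
         im = im - y(j)*sin(2.0*pi*real(k*j)/real(n))
      end do
      binpow = re*re + im*im
      end function binpow
      end program leakage_demo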
FIGURE 5A. The Hamming window. Upper trace, the window in the time domain; middle
trace, the window in the frequency domain, linear scale; lower trace, the window in the
frequency domain, logarithmic scale. (From Harris, F. J., Trigonometric Transforms, A
Unique Introduction to the FFT, Tech. Publ. DSP-005 (8-81), Spectral Dynamics Division,
Scientific Atlanta, San Diego, 1981. With permission.)
FIGURE 5B. The Hamming window. FFT power spectral density function estimation of synthesized
signal consisting of two sinewaves with frequencies of 10.5 and 16 fs/N and amplitudes
of 1.00 and 0.01, respectively. Data window was used (fs, sampling frequency; N, number of
samples in the window). (From Harris, F. J., Trigonometric Transforms, A Unique Introduction
to the FFT, Tech. Publ. DSP-005 (8-81), Spectral Dynamics Division, Scientific Atlanta, San
Diego, 1981. With permission.)
FIGURE 6B. The Dolph-Chebyshev window. FFT power spectral density function
estimation of synthesized signal consisting of two sinewaves with frequencies of 10.5 and 16
fs/N and amplitudes of 1.00 and 0.01, respectively. Data window was used (fs, sampling
frequency; N, number of samples in the window). (From Harris, F. J., Trigonometric Transforms,
A Unique Introduction to the FFT, Tech. Publ. DSP-005 (8-81), Spectral Dynamics Division,
Scientific Atlanta, San Diego, 1981. With permission.)
REFERENCES
Appendix C
COMPUTER PROGRAMS
I. INTRODUCTION
This appendix contains a number of computer programs and subroutines for biomedical
signal processing. The programs are written in the FORTRAN IV language and are used on the
VAX 11/750 computer, under the VMS operating system. The input to the various programs
are vectors containing the samples of the signals to be processed. These are read from data
files generated by the A/D converter. The files are therefore unformatted, integer files. In
order to make the software as compatible as possible with other machines, input and output
statements, file definitions, and structuring are done with input-output subroutines (RFILE,
WFILE, RFILEM, WFILEM). The user can adapt the software to another computer or use
different data files by just replacing these subroutines.
The programs use several mathematical subroutines, mainly for matrix operations. All of
these subroutines are given in this appendix except for the subroutine EIGEN, which computes
the eigenvalues and corresponding eigenvectors of a matrix. The listing of this subroutine
was not included in the appendix due to its length. Subroutines for eigenvalue and
eigenvector computations can be found in one of the well-known software libraries such
as the IBM System/360 Scientific Subroutine Package (SSP), the International Mathematical
and Statistical Libraries (IMSL), or the CERN Library.
The programs presented here are taken from the Bio-Medical Signal Processing Package
(BMSPP) of the Center for Bio-Medical Engineering, Ben Gurion University. The listing
of the complete package could not be presented here due to space limitations. The few
programs presented here were selected to allow the interested reader to implement some of
the processing methods discussed in this book.
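For readers porting the package, the only machine-dependent pieces are these input-output subroutines. The sketch below is a hypothetical, portable stand-in for RFILE written in modern free-form Fortran; its argument list (file name, integer sample vector, returned sample count, auxiliary buffer) is inferred from the calls that appear in the listings that follow, and it reads one integer per line from a plain text file instead of a VAX unformatted file:

      ! Hypothetical, portable replacement for the package's RFILE.
      ! The original takes the file name as a BYTE array and reads VAX
      ! unformatted records; this sketch takes a character string and a
      ! plain text file, and leaves the auxiliary buffer untouched.
      subroutine rfile(fname, isamp, nts, iaux)
      implicit none
      character(len=*), intent(in)  :: fname
      integer, intent(out)   :: isamp(*)   ! signal samples read from the file
      integer, intent(out)   :: nts        ! number of samples read
      integer, intent(inout) :: iaux(*)    ! unused here; kept for interface compatibility
      integer :: ios, val, lun
      nts = 0
      open(newunit=lun, file=fname, status='old', action='read')
      do
         read(lun, *, iostat=ios) val
         if (ios /= 0) exit
         nts = nts + 1
         isamp(nts) = val
      end do
      close(lun)
      end subroutine rfile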
II. MAIN PROGRAMS
PROGRAM NUSAMP
(VAX VMS VERSION)
C
C THIS PROGRAM PROVIDES THREE TYPES OF NON UNIFORM
C SAMPLING WITH APPLICATIONS TO BIOMEDICAL SIGNALS.
C DATA IS READ FROM UNFORMATTED INTEGER FILE.
C THE USER HAS A CHOICE OF ONE OUT OF THREE NON UNIFORM
C SAMPLING METHODS FOR DATA COMPRESSION:
C REFERENCES:
C READ INPUT FILE
10 CONTINUE
C
C OPEN AVERAGING WINDOW
C
JJ=0
REF=0
DO 14 I=1,IAW
14 REF=REF+IVEC(I)
REF=REF/IAW
XFIRST=REF ! INITIAL CONDITION TO BE SENT FOR RECON.
DO 11 I=1,(NOP-IAW),IAW
IAVER(1)=0
AVER(1)=0
DO 12 II=1,IAW
12 AVER(1)=AVER(1)+IVEC(II+I-1)
AVER(1)=AVER(1)/IAW
IF (ABS(AVER(1)-REF).LE.RMOD) GO TO 11
C
C SAMPLING POINT IS NEEDED
C
JJ=JJ+1
REF=AVER(1)
K(JJ)=I+IAW
11 CONTINUE
C
C PREPARING OUTPUT FOR PLOTTING-RECONSTRUCTION OF SIGNAL
C
JJ=1
IREC(1)=XFIRST
KP(1)=512
DO 701 II=2,NOP
KP(II)=0
IF (K(JJ).NE.II) GO TO 700
KP(II)=512
IREC(II)=IVEC(II)
JJ=JJ+1
GO TO 701
700 IREC(II)=IREC(II-1)
701 CONTINUE
GO TO 777
C
40 CONTINUE
C
C
C SAMPLING POINT IS NEEDED
JJ=JJ+1
K(JJ)=I+IAW
KP(I+IAW)=512
XDOTP=XDOT
60 L vO 600 I I » I * < I 0 * I ’
600 If* EC-, 11 > >+XD0Tf
nvfc&a>*-AVER<2*
46 XDOTP=XDOT
C
C OUTPUT FILE HAS 3 RECORDS:
C 1. ORIGINAL SIGNAL
C 2. LOCATIONS OF NON UNIF. SAMPLES
C 3. RECONSTRUCTED SIGNAL
C
C
777 CONTINUE
C
c
C WRITE RESULTS ON OUTPUT FILE
C
TYPE 211
211 FORMAT(1H$,'ENTER OUTPUT FILE NAME: ')
ACCEPT 119,NCH,(NAME1(I),I=1,11)
119 FORMAT(Q,11A1)
NOP2=NOP*2
CALL ASSIGN(2,NAME1,11)
DEFINE FILE 2(3,NOP2,U,IVAR)
WRITE(2'1) (IVEC(II),II=1,NOP)
WRITE(2'2) (KP(II),II=1,NOP)
WRITE(2'3) (IREC(II),II=1,NOP)
CALL CLOSE(2)
C
C PRINT PROGRAM'S STATISTICS
C
PRINT 900
900 FORMAT(/25X'RESULTS OF NUSAMP PROGRAM')
PRINT 901
901 FORMAT(25X'****************************'/)
PRINT 902,(NAME(I),I=1,11)
902 FORMAT(15X'INPUT FILE NAME: ',11A1)
PRINT 905,IAW
905 FORMAT(25X'NO. OF SAMPLES IN AVERAGING WINDOW= 'I4)
PRINT 908,R
908 FORMAT(25X'THRESHOLD LEVEL= 'E10.3)
PRINT 901
PRINT 906,ISM
906 FORMAT(15X'NON UNIFORM SAMPLING OF ORDER 'I1)
PRINT 907,(NAME2(I),I=1,11)
907 FORMAT(/15X'NAME OF OUTPUT FILE: '11A1)
PRINT 909,(JJ-1)
909 FORMAT(25X'NO. OF SAMPLES USED = 'I6)
C
C COMPRESSION RATIO (CR) IS THE RATIO BETWEEN
C NO. OF SAMPLES (12 BITS) OF ORIGINAL SIGNAL
C AND NO. OF SAMPLES OF NON UNIFORMLY SAMPLED
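The zero-order branch of NUSAMP can be summarized as follows: the record is scanned in short averaging windows, and a sample is retained only when the windowed mean departs from the last retained value by more than a threshold. The sketch below is a free-form Fortran reconstruction of that idea, with variable names echoing the listing; it is an interpretation of the fragment above, not the package code:

      ! Zero-order nonuniform (adaptive) sampling in the spirit of NUSAMP:
      ! transmit a sample only when the mean over a short averaging window
      ! departs from the last transmitted reference by more than RMOD.
      subroutine nusamp0(x, nop, iaw, rmod, kept, nkept)
      implicit none
      integer, intent(in)  :: nop, iaw    ! record length, averaging-window length
      real,    intent(in)  :: x(nop)      ! input samples
      real,    intent(in)  :: rmod        ! amplitude threshold
      integer, intent(out) :: kept(nop)   ! indices of retained samples
      integer, intent(out) :: nkept       ! number of retained samples
      real :: ref, avg
      integer :: i
      ref   = sum(x(1:iaw))/real(iaw)     ! initial reference (sent as side information)
      nkept = 0
      do i = 1, nop - iaw, iaw
         avg = sum(x(i:i+iaw-1))/real(iaw)
         if (abs(avg - ref) > rmod) then  ! reference exceeded: keep a sample
            nkept = nkept + 1
            kept(nkept) = i + iaw
            ref = avg
         end if
      end do
      end subroutine nusamp0

The compression ratio is then the number of original samples divided by nkept, as the statistics printed by NUSAMP indicate.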
PROGRAM SEGMNT
(VAX VMS VERSION)
C
C THIS PROGRAM PROVIDES ADAPTIVE SEGMENTATION OF
C A SAMPLED FUNCTION. SEGMENTATION IS PERFORMED BY
C ESTIMATING AN AR FILTER FOR AN INITIAL REFERENCE
C
C REFERENCE:
C
C
C READ INPUT FILE
C
CALL RFILE(NAME,ISMP,NTS,IAUX)
100 CONTINUE
C
C
C OPEN LPC & PAR OUTPUT FILE
DO 708 I=1,INN1
708 CORRES(I)=CORRES(I)*ENGRES
CTH=0.9*ENGRES
GO TO 706
C
C CALCULATE NEW RESIDUAL
C
705 CONTINUE
RESW=SMPR(IW)
DO 733 J=1,INN
733 RESW=RESW+SMPR(IW-J)*LPC(J)
CORRES(1)=CORRES(1)-RES(1)*RES(1)+RESW*RESW
C
C FIND CORRELATIONS ITERATIVELY FOR ALL SLIDING
C WINDOWS EXCEPT THE FIRST ONE
DO 707 J=2,INN1
707 CORRES(J)=CORRES(J)-RES(1)*RES(J)+RES(IW-J+2)*RESW
C
C SHIFT RESIDUALS VECTOR
C
DO 734 I=1,IW-1
734 RES(I)=RES(I+1)
RES(IW)=RESW
706 CONTINUE
C
C CLIP CORRELATIONS TO REMOVE SHORT TRANSIENT
C ARTIFACTS
C
DO 709 I=1,INN1
709 IF (CORRES(I).GT.CTH) CORRES(I)=CTH
C
C CALCULATION OF SEM
C
SUM=0.0
DO 710 I=2,INN1
710 SUM=SUM+(CORRES(I))*(CORRES(I))
SEM=(ERR/CORRES(1)-1)**2+2*SUM/(CORRES(1)*CORRES(1))
ISEM(ICW)=INT((SEM*409.6)+0.5)
C
C COMPARE SEM WITH THRESHOLD
C
IF (SEM.GT.SEMTH) GO TO 103 ! START A NEW SEGMENT
GO TO 104 ! STAY IN CURRENT SEGMENT
C SHIFT SLIDING WINDOW
711 CONTINUE ! END OF DATA VECTOR
C
C CLOSE LPC OUTPUT FILE
C
CALL CLOSE(2)
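The quantity driving the segmentation decision is the spectral error measure (SEM) computed from the autocorrelation of the prediction residual. The function below isolates that computation in free-form Fortran; the expression follows the SEM= statement in the listing, while the argument names are illustrative. A new segment is opened when the returned value exceeds the threshold (SEMTH in the listing):

      ! Spectral error measure used by SEGMNT to decide when the reference
      ! AR model no longer fits the incoming signal.  corres(1..m1) holds
      ! the autocorrelation of the current window's prediction residual
      ! (lag 0 first) and err is the residual energy of the reference window.
      real function sem_measure(corres, m1, err)
      implicit none
      integer, intent(in) :: m1           ! number of residual-correlation lags (INN1)
      real,    intent(in) :: corres(m1)   ! residual autocorrelation, lag 0 first
      real,    intent(in) :: err          ! reference residual energy
      real :: s
      integer :: i
      s = 0.0
      do i = 2, m1
         s = s + corres(i)*corres(i)
      end do
      sem_measure = (err/corres(1) - 1.0)**2 + 2.0*s/(corres(1)*corres(1))
      end function sem_measure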
NLS=NTS+NZPAD
N=1
NP=0
1 N=N*2
NP=NP+1
IF (N.LT.NLS) GO TO 1
TYPE 105,NLS
105 FORMAT(X'LENGTH OF PADDED DATA VECTOR (POWER OF 2): 'I4)
IF (NLS.GT.2048) TYPE 109
109 FORMAT(X'MAX. LENGTH OF DATA VECTOR IS 2048!!')
C
C
DO 3 II=1,NTS
3 COR(II)=ISAMP(II)
C ZERO PADDING
DO 5 I=NTS+1,NLS
5 COR(I)=0.0
C
C FFT CALCULATIONS
C
CALL FT01A(NLS,2,COR,CORI)
NLSH=NLS/2
DO 6 I=1,NLSH
6 COR(I)=SQRT(COR(I)*COR(I)+CORI(I)*CORI(I))
C
C ********* NORMALIZATION OF ESTIMATED PSD FUNCTION **********
C
CALL XTERM(COR,NLSH,CMAX,CMIN)
DO 7 I=1,NLSH
7 ISAMP(I)=INT((COR(I)/CMAX)*1024+0.5)
C
C ********* OUTPUT PROCEDURES *********
C
CALL WFILE(NAME1,ISAMP,NLSH,NORO)
PRINT 110
110 FORMAT(20X'RESULTS OF PROGRAM PERSPT'/25X
* 'ESTIMATION OF PSD BY THE PERIODOGRAM')
PRINT 111,(NAME(I),I=1,11)
PROGRAM PERSPT
(VAX VMS VERSION)
C
C REFERENCES:
C 1. COHEN,A., BIOMEDICAL SIGNAL PROCESSING,
C    CRC PRESS, CHAPTER 8
C 2. OTNES,R.K. AND ENOCHSON,L., DIGITAL
C    TIME SERIES ANALYSIS, WILEY, 1972
C
C LINKING: FT01A,XTERM,RFILE,WFILE
C
INTEGER ISAMP(2048),IAUX(2048)
REAL SAMP(4096),COR(2048),CORI(2048)
BYTE NAME(11),NAME1(11)
C READ INPUT FILE
CALL RFILE(NAME,ISAMP,NTS,IAUX)
TYPE 102
102 FORMAT(1H$,'GIVE NO. OF PADDING ZEROES: ')
ACCEPT *,NZPAD
PROGRAM WOSA
(VAX VMS VERSION)
C
C REFERENCES:
C
C INPUT FILE:
C
C OUTPUT FILE:
C
C LINKING: FT01A,XTERM,RFILE,WFILE
C
DIMENSION ISMP(16384),IAUX(2048)
INTEGER PEROV,ISPACE,NEWSIZ,NOOREC,NOS,NLS,NLSP,NOREC
BYTE NAME(13),AA(9),NAMEO(13)
REAL FRE(2048),FIM(2048),SPCT(2048)
DO 199 I=1,16384
ISMP(I)=0
199 CONTINUE
C
C READ INPUT FILE
C
CALL RFILE(NAME,ISMP,NOS,IAUX)
TYPE 3
3 FORMAT(1H$,'GIVE LENGTH OF SEGMENT (POWER OF 2) FOR FFT: ')
ACCEPT *,NLS
C
C CHECK IF POWER OF 2
C
N=1
NP=0
40 N=N*2
NP=NP+1
IF (N.LE.NLS) GOTO 40
NLS=N/2
TYPE 4,NLS
4 FORMAT(1X,'LENGTH OF SEGMENT (CLOSEST POWER OF 2) IS: ',I4)
TYPE 5
5 FORMAT(1H$,'GIVE PERCENT OVERLAPPING BETWEEN SEGMENTS: ')
ACCEPT *,PEROV
TYPE 8
8 FORMAT(1H$,'GIVE WINDOW: RECTAN.=1, TRIA.=2, HAMMING=3: ')
ACCEPT *,IW
C
C
C CALCULATING WITH PERCENTAGE OF OVERLAP (PEROV) AND SEGMENT
C LENGTH NLS THE NO. OF SEGMENTS AVAILABLE.
C
NLSP=(NLS*PEROV)/100
ITEMP=NLS
NOREC=1
29 IF (ITEMP.GT.NOS) GOTO 30
ITEMP=ITEMP-NLSP+NLS
NOREC=NOREC+1
GOTO 29
30 NOREC=NOREC-1 ! NOREC IS THE NO. OF OVERLAPPED SEGMENTS
C
C CALCULATE FFT OF EACH SEGMENT AND AVERAGE
C
NLSH=NLS/2
IB=NLSP-NLS+2
DO 197 I=1,NOREC
IB=IB+NLS-NLSP
DO 196 J=1,NLS
196 FRE(J)=ISMP(IB+J-1)
CALL FT01A(NLS,2,FRE,FIM)
DO 195 J=1,NLSH
195 SPCT(J)=SPCT(J)+SQRT(FRE(J)*FRE(J)+FIM(J)*FIM(J))
197 CONTINUE
DO 194 J=1,NLSH
194 SPCT(J)=SPCT(J)/NOREC
CALL XTERM(SPCT,NLSH,SMAX,SMIN)
DO 192 I=1,NLSH
192 ISMP(I)=INT((SPCT(I)/SMAX)*1024+0.5)
C
C WRITE OUTPUT FILE
C
CALL WFILE(NAMEO,ISMP,NLSH,NOR)
191 FORMAT(E12.5)
CALL DATE(AA)
PRINT 602,(AA(I),I=1,9)
602 FORMAT(/20X,'RESULTS OF "WOSA" PROGRAM, DATE: ',9A1)
C
PRINT 702
702 FORMAT(19X,'***************************************')
C
C
C
PRINT 604,(NAME(I),I=1,11)
604 FORMAT(/10X,'*****INPUT ORIGINAL DATA FILE: ',11A1)
C
C
PRINT 607,PEROV
607 FORMAT(/2X,'PERCENTAGE OF OVERLAP: ',I3)
C
PRINT 608,NLSP
608 FORMAT(/2X,'NUMBER OF POINTS OVERLAPPED BETWEEN SEGMENTS: ',I4)
C
PRINT 609,NOREC
609 FORMAT(/2X,'NUMBER OF SEGMENTS WILL BE USED IN "WOSA": ',I4)
C
C**********************************************************************
C
IF (IW-2) 610,620,630
C
610 PRINT 611
611 FORMAT(/5X,'TYPE OF WINDOW: RECTANGULAR')
GO TO 640
C
620 PRINT 621
621 FORMAT(/5X,'TYPE OF WINDOW: TRIANGULAR')
GO TO 640
C
630 PRINT 631
631 FORMAT(/5X,'TYPE OF WINDOW: HAMMING')
C
640 PRINT 641,(NAMEO(I),I=1,11)
641 FORMAT(/10X,'OUTPUT FILE NAME: ',11A1)
C
C
999 STOP
END
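In outline, WOSA forms the PSD estimate by windowing overlapping segments, transforming each, and averaging the segment spectra. The subroutine below restates that procedure in free-form Fortran with a direct DFT so it can be run without FT01A; the Hamming coefficients are assumed, the overlap must be smaller than the segment length, and, like the listing, it averages spectral magnitudes and leaves absolute scaling to the caller:

      ! Welch-style (WOSA) spectral estimate: window overlapping segments,
      ! take the magnitude of each segment's transform, and average.
      subroutine wosa_psd(x, n, nls, novl, psd)
      implicit none
      integer, intent(in) :: n, nls, novl     ! record length, segment length, overlap (novl < nls)
      real, intent(in)  :: x(n)
      real, intent(out) :: psd(nls/2)
      real :: w(nls), seg(nls), re, im, pi
      integer :: istart, k, i, nseg
      pi = 4.0*atan(1.0)
      do i = 1, nls                           ! Hamming data window (assumed)
         w(i) = 0.54 - 0.46*cos(2.0*pi*real(i-1)/real(nls-1))
      end do
      psd  = 0.0
      nseg = 0
      istart = 1
      do while (istart + nls - 1 <= n)
         seg = w*x(istart:istart+nls-1)
         do k = 1, nls/2
            re = 0.0
            im = 0.0
            do i = 1, nls                     ! direct DFT of the windowed segment
               re = re + seg(i)*cos(2.0*pi*real((k-1)*(i-1))/real(nls))
               im = im - seg(i)*sin(2.0*pi*real((k-1)*(i-1))/real(nls))
            end do
            psd(k) = psd(k) + sqrt(re*re + im*im)   ! magnitude, summed as in the listing
         end do
         nseg = nseg + 1
         istart = istart + (nls - novl)       ! advance by segment length minus overlap
      end do
      if (nseg > 0) psd = psd/real(nseg)
      end subroutine wosa_psd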
PROGRAM MEMSPT
(VAX VMS VERSION)
C REFERENCE:
C 1. COHEN,A., BIOMEDICAL SIGNAL PROCESSING,
C    CRC PRESS, CHAPTER 8
C INPUT:
C    UNFORMATTED INTEGER DATA FILE
C    NO. OF RECORDS AND SAMPLES DETERMINED
C    BY USER
C LINKING:
C    NACOR,DLPC,XTERM,RFILE,WFILE
INTEGER ISAMP(2048),IAUX(2048)
REAL SAMP(4096),COR(41),LPC(41),PAR(41),RHO(41),AUX(41)
BYTE NAME(11),NAME1(11)
DO 3 I=1,NTS
3 SAMP(I)=ISAMP(I)
887 TYPE 103
103 FORMAT(1H$,'GIVE ORDER OF AR MODEL: ')
ACCEPT *,NAR
IF (NAR.LE.40) GO TO 888
TYPE 109
C CALCULATE RHOI
C
DO 360 I=1,NAR-1
RHOI=LPC(I)
DO 340 J=1,NAR-I
340 RHOI=RHOI+(LPC(J))*(LPC(J+I))
RHO(I)=RHOI
360 CONTINUE
RHO(NAR)=LPC(NAR)
C
C CALCULATE THE DISCRETE SPECTRUM
C
PI2=8.0*ATAN(1.0)
IT2=2*IT
DO 400 K=1,IT
SIGMA=0.
DO 380 I=1,NAR
SIGMA=SIGMA+(RHO(I))*COS(PI2*I*K/IT2)
380 CONTINUE
SAMP(K)=ERR/(RHO0+2*SIGMA)
400 CONTINUE
C
C ********* NORMALIZATION OF ESTIMATED PSD FUNCTION **********
C
CALL XTERM(SAMP,IT,CMAX,CMIN)
ACMIN=ABS(CMIN)
IF (CMAX.LT.ACMIN) CMAX=ACMIN
DO 7 I=1,IT
7 ISAMP(I)=INT((SAMP(I)/CMAX)*1024+0.5)
C
C ********* OUTPUT PROCEDURES *********
C
CALL WFILE(NAME1,ISAMP,IT,NORX)
PRINT 110
110 FORMAT(20X'RESULTS OF PROGRAM MEMSPT'/25X
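The MEMSPT fragment above evaluates the AR (maximum entropy) spectrum as ERR/|A(e^jw)|^2, with the denominator expanded through the autocorrelation of the prediction-error filter coefficients. The sketch below repeats that computation in free-form Fortran; the assumed filter convention A(z) = 1 + a(1)z^-1 + ... + a(p)z^-p and the frequency grid are illustrative choices, not taken from the listing:

      ! AR (maximum entropy) spectrum from the LPC coefficients a(1..p)
      ! and the prediction-error power err, in the manner of MEMSPT.
      subroutine mem_spectrum(a, p, err, nfreq, psd)
      implicit none
      integer, intent(in) :: p, nfreq
      real, intent(in)  :: a(p)        ! AR (LPC) coefficients
      real, intent(in)  :: err         ! prediction-error power
      real, intent(out) :: psd(nfreq)  ! spectrum on nfreq points up to half the sampling rate
      real :: rho(0:p), sigma, pi2
      integer :: i, j, k
      rho(0) = 1.0 + sum(a*a)          ! lag-0 autocorrelation of {1, a(1), ..., a(p)}
      do i = 1, p-1
         rho(i) = a(i)
         do j = 1, p-i
            rho(i) = rho(i) + a(j)*a(j+i)
         end do
      end do
      rho(p) = a(p)
      pi2 = 8.0*atan(1.0)
      do k = 1, nfreq
         sigma = 0.0
         do i = 1, p
            sigma = sigma + rho(i)*cos(pi2*real(i*k)/real(2*nfreq))
         end do
         psd(k) = err/(rho(0) + 2.0*sigma)
      end do
      end subroutine mem_spectrum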
C INPUT:
C 1. UNFORMATTED INTEGER DATA FILE HOLDING
C    SIGNAL SAMPLES (PRIMARY INPUT)
C OUTPUT:
C LINK: LMS,WFILE,RFILE
C REFERENCES:
TYPE 5
5 FORMAT(1H$,'GIVE FILTERS PARAMETERS: MU, ORDER AND GAIN: ')
ACCEPT *,MU,ORD,GAIN
DO 102 I=1,ORD-1
102 X(I)=ISR(ORD-I)*GAIN
C INITIATE WEIGHTING VECTOR & EPSI
C
DO 103 I=1,ORD
103 W(I)=0.
EPSI=ISI(ORD-1)*GAIN
C INITIATE OUTPUT VECTOR
C
DO 106 I=1,ORD
106 ISO(I)=ISI(I)
C
104 CONTINUE
DO 105 J=ORD,NSAMI
C
C UPDATE REFERENCE VECTOR
C
DO 107 I=ORD,2,-1
107 X(I)=X(I-1)
X(1)=ISR(J)*GAIN
C GET NEW DESIRED OUTPUT SAMPLE
C
D=ISI(J)*GAIN
CALL LMS(X,ORD,W,MU,EPSI,Y,D)
ISO(J)=INT(EPSI/GAIN+0.5)
105 CONTINUE
C
C WRITE OUTPUT FILE
C
CALL WFILE(NAMEO,ISO,NSAMI,NORO)
C
C PROGRAM'S DETAILS
C
TYPE 201
201 FORMAT(//10X'RESULTS OF ADPCAN PROGRAM'/)
TYPE 200,(NAMEI(I),I=1,11)
200 FORMAT(5X'INPUT FILE NAME: '11A1)
TYPE 202,(NAMER(I),I=1,11)
202 FORMAT(/5X'REFERENCE FILE NAME: '11A1)
TYPE 203,(NAMEO(I),I=1,11)
203 FORMAT(/5X'OUTPUT FILE NAME: '11A1)
STOP
END
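ADPCAN delegates the adaptation itself to the subroutine LMS, called once per sample as CALL LMS(X,ORD,W,MU,EPSI,Y,D). A minimal realization of such a routine is sketched below in free-form Fortran; the weight update is the standard Widrow-Hoff recursion and is offered as an illustration of what the call performs, not as the package listing:

      ! One LMS step: filter the reference vector with the current weights,
      ! form the error against the desired (primary) sample, and move the
      ! weights along the instantaneous gradient (Widrow-Hoff recursion).
      subroutine lms(x, n, w, mu, epsi, y, d)
      implicit none
      integer, intent(in)    :: n        ! filter order
      real,    intent(in)    :: x(n)     ! reference vector, newest sample first
      real,    intent(inout) :: w(n)     ! adaptive weights
      real,    intent(in)    :: mu       ! adaptation step size
      real,    intent(in)    :: d        ! desired (primary) sample
      real,    intent(out)   :: y, epsi  ! filter output and error (canceller output)
      integer :: i
      y = 0.0
      do i = 1, n
         y = y + w(i)*x(i)
      end do
      epsi = d - y                       ! error = primary minus filtered reference
      do i = 1, n
         w(i) = w(i) + 2.0*mu*epsi*x(i)  ! weight update
      end do
      end subroutine lms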
c
PROGRAM CONLIM
C (VAX VMS VERSION)
C
C
C THIS PROGRAM DETECTS WAVELETS BY MEANS OF THE CONTOUR LIMITING
C METHOD. THE PROGRAM READS THE TEMPLATE SIGNAL (ITEMP) FROM
C A FILE.
C UPPER AND LOWER CONTOURS ARE DEFINED:
C
C UPPER CONTOUR(I)=ITEMP(I)+((EPSI)*ITEMP(I)+CON)
C LOWER CONTOUR(I)=ITEMP(I)-((EPSI)*ITEMP(I)+CON)
C
C INPUT FILES:
C 1. UNFORMATTED INTEGER TEMPLATE FILE WITH
C    ONE RECORD AND NOPT SAMPLES.
C 2. UNFORMATTED INTEGER SIGNAL FILE WITH NREC
C    RECORDS AND NOPS SAMPLES PER RECORD.
C OUTPUT FILES:
C 1. UNFORMATTED INTEGER FILE WITH 3 RECORDS
C    AND NOPT SAMPLES PER RECORD. THE RECORDS:
C    1. THE TEMPLATE
C    2. UPPER CONTOUR
C    3. LOWER CONTOUR
C 2. UNFORMATTED INTEGER FILE WITH NREC RECORDS,
C    NOPS SAMPLES PER RECORD. THE FILE CONTAINS
C    THE LOCATIONS OF DETECTED WAVELETS (DETECTED
C    WAVELET IS DENOTED BY A PULSE OF AMPLITUDE
C    OF 512)
C REFERENCE:
TYPE 100
100 FORMAT(1H$,'ENTER INPUT TEMPLATE FILE NAME: ')
ACCEPT 119,NCH1,(NAME1(I),I=1,11)
119 FORMAT(Q,11A1)
TYPE 101
TYPE 102
102 FORMAT(1H$,'ENTER NAME OF SIGNAL FILE: ')
ACCEPT 119,NCH2,(NAME2(I),I=1,11)
TYPE 103
103 FORMAT(1H$,'ENTER NO. OF RECORDS AND SAMPLES PER RECORD: ')
ACCEPT *,NREC,NOPS
NOPS2=NOPS*2
C PREPARE OUTPUT FILES
TYPE 104
104 FORMAT(1H$,'ENTER CONTOURS OUTPUT FILE NAME: ')
ACCEPT 119,NCH3,(NAME3(I),I=1,11)
TYPE 105
105 FORMAT(1H$,'ENTER SIGNAL OUTPUT FILE NAME: ')
ACCEPT 119,NCH4,(NAME4(I),I=1,11)
CALL ASSIGN(2,NAME2,11)
TYPE 106
106 FORMAT(1H$,'ENTER CONSTANT AND RELATIVE CONTOUR PARAMETERS: ')
ACCEPT *,CON,EPSI
NOPTH=INT(NOPT/2+0.5)
NOPTL=INT(NOPT*0.9+0.5)
NMODS=NOPS+NOPT-1 !NO. OF SAMPLES IN BUFFER
DO 9 K=1,NREC
9 CONTINUE
C
C END DETECTION
C
CALL CLOSE(2)
CALL CLOSE(4)
CALL ASSIGN(3,NAME3,11)
DEFINE FILE 3(3,NOPT2,U,IVAR)
WRITE(3'1) (ITEMP(I),I=1,NOPT)
DO 10 I=1,NOPT
10 ITEMP(I)=ITEMP(I)*(1.+EPSI)+CON
C
C
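The detection rule implied by the CONLIM listing can be restated compactly: build upper and lower contours around the template and declare a wavelet wherever a sufficient fraction of the aligned signal samples falls between them. The sketch below (free-form Fortran) is a simplified reconstruction; the 90% acceptance fraction mirrors NOPTL = INT(NOPT*0.9+0.5) above, and the use of the absolute template value in the tolerance band is an added safeguard for negative samples, not part of the listing:

      ! Contour-limiting wavelet detection in the spirit of CONLIM.
      subroutine conlim_detect(sig, ns, templ, nopt, epsi, con, detect)
      implicit none
      integer, intent(in) :: ns, nopt
      real, intent(in)  :: sig(ns), templ(nopt)
      real, intent(in)  :: epsi, con         ! relative and constant tolerances
      logical, intent(out) :: detect(ns)     ! .true. where a wavelet is declared
      real :: upper(nopt), lower(nopt)
      integer :: i, j, inside, noptl
      do i = 1, nopt                         ! contours around the template
         upper(i) = templ(i) + (epsi*abs(templ(i)) + con)
         lower(i) = templ(i) - (epsi*abs(templ(i)) + con)
      end do
      noptl = int(real(nopt)*0.9 + 0.5)      ! minimum number of in-band points
      detect = .false.
      do j = 1, ns - nopt + 1                ! slide the template along the signal
         inside = 0
         do i = 1, nopt
            if (sig(j+i-1) >= lower(i) .and. sig(j+i-1) <= upper(i)) inside = inside + 1
         end do
         if (inside >= noptl) detect(j) = .true.
      end do
      end subroutine conlim_detect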
PROGRAM COMPRS
C
C INPUT:
C 1. A DATA FILE HOLDING A MATRIX OF L1 VECTORS
C    OF DIMENSION N, CORRESPONDING TO THE MEMBERS
C    OF THE FIRST CLUSTER.
C
C OUTPUT:
C 1. A DATA FILE HOLDING M VECTORS OF DIMENSION N,
C    CORRESPONDING TO THE TRANSFORMATION MATRIX
C    OF THE COMPRESSION. IN THE CASE OF FISHER
C    METHOD M=1.
C
C
C REFERENCES:
C
C 1. COHEN,A., BIOMEDICAL SIGNAL PROCESSING,
C    CRC PRESS, CHAPTER 12
C
C 2. DUDA,R.O. AND HART,P.E., PATTERN
C    CLASSIFICATION AND SCENE ANALYSIS, WILEY
C    INTERSCIENCE, N.Y., 1973
C
C 3. FUKUNAGA,K., INTRODUCTION TO STATISTICAL
C    PATTERN RECOGNITION, ACADEMIC PRESS,
C    N.Y., 1972
C
C 4. TOU,J.T. AND GONZALEZ,R.C., PATTERN
C    RECOGNITION PRINCIPLES, ADDISON-WESLEY,
C    READING,MA., 1974
C
C
C LINKING: EIGEN,RFILEM,WFILEM,MEAN,COVA,ADD,INVER,MUL,SYMINV
C
C
DIMENSION X1(40,500),X2(40,500),XM(40),XM1(40),XM2(40)
DIMENSION R2(40,40),R1(40,40),C1(40,40),C2(40,40)
DIMENSION CINV(40,40),COR(40,40),COV(40,40)
DIMENSION DELTA(40),DEL(40,40),A(40,40),WR(40),WI(40)
INTEGER IAUX(80)
BYTE NAME1(11),NAME2(11)
TYPE 777
CALL RFILEM(NAME2,X2,40,500,L2,N)
C MEAN CALCULATIONS
CALL MEAN(X2,40,500,N,L2,XM2)
C
C COVARIANCE CALCULATION
C
CALL COVA(X1,40,500,N,L1,XM1,C1)
CALL COVA(X2,40,500,N,L2,XM2,C2)
C
C
C COMMON COVARIANCE
FA=0.5
CALL ADD(C1,C2,COV,40,40,N,N,1)
DO 480 I=1,N
DO 480 J=1,N
480 COV(I,J)=FA*COV(I,J)
C
C INVERSE OF COMMON COVARIANCE
C
CALL INVER(COV,40,40,N,CINV) ! CINV IS INVERSE OF COMMON COVARIANCE
N1=1
C
C
C FISHER METHOD
C
IF (ME.NE.3) GO TO 800
C
C PREPARE MEAN DIFFERENCE
C
CALL ADD(XM1,XM2,DELTA,40,1,N,N1,-1) ! DELTA IS
C THE DIFFERENCE IN CLUSTERS MEANS.
C
C CALCULATE FISHER VECTOR (CINV*DELTA)
C
CALL MUL(CINV,40,40,DELTA,40,1,DEL,40,1,N,N,N1)
C
C NORMALIZE FISHER VECTOR
XXN=0.
DO 810 J=1,N
810 XXN=XXN+DEL(J,1)*DEL(J,1)
XXN=SQRT(XXN)
DO 811 J=1,N
811 A(1,J)=DEL(J,1)/XXN ! FIRST ROW OF A HOLDS NORM. FISHER
GO TO 778
800 CONTINUE
C
C MINIMUM ENTROPY METHOD
C
IF (ME.NE.2) GO TO 801
CALL EIGEN(40,N,COV,WR,WI,A,IERR,WO)
GO TO 778
801 CONTINUE
C
C K-L METHOD
C
IF (ME.NE.1) GOTO 77
C
C COMMON CORRELATION
C
DO 600 I=1,N
600 XM(I)=0. ! XM IS A NULL VECTOR DUMMY MEAN
CALL COVA(X1,40,500,N,L1,XM,R1) !R1 IS CLUSTER 1 CORRELATION
CALL COVA(X2,40,500,N,L2,XM,R2) !R2 IS CLUSTER 2 CORRELATION
CALL ADD(R1,R2,COR,40,40,N,N,1) !COR IS THE COMMON CORRELATION
DO 451 I=1,N
DO 451 J=1,N
COR(I,J)=COR(I,J)/2.
451 CONTINUE
CALL EIGEN(40,N,COR,WR,WI,A,IERR,WO)
C
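In the Fisher branch of COMPRS the whole "transformation matrix" reduces to a single direction: the inverse of the pooled covariance applied to the difference of the class means, normalized to unit length. The sketch below computes that direction in free-form Fortran, replacing the package's INVER and MUL calls with a simple Gaussian elimination (no pivoting, which is adequate for a well-conditioned covariance matrix); it is an illustration, not the book's code:

      ! Fisher direction w = C**(-1)*(m1 - m2), normalized to unit length.
      subroutine fisher_direction(cov, delta, n, w)
      implicit none
      integer, intent(in) :: n
      real, intent(in)  :: cov(n,n)   ! pooled (common) covariance matrix
      real, intent(in)  :: delta(n)   ! difference of the class mean vectors
      real, intent(out) :: w(n)       ! normalized Fisher direction
      real :: a(n,n), b(n), f, s
      integer :: i, j, k
      a = cov
      b = delta
      do k = 1, n-1                   ! forward elimination (no pivoting)
         do i = k+1, n
            f = a(i,k)/a(k,k)
            a(i,k:n) = a(i,k:n) - f*a(k,k:n)
            b(i) = b(i) - f*b(k)
         end do
      end do
      do i = n, 1, -1                 ! back substitution
         s = b(i)
         do j = i+1, n
            s = s - a(i,j)*w(j)
         end do
         w(i) = s/a(i,i)
      end do
      w = w/sqrt(sum(w*w))            ! normalize, as the listing does
      end subroutine fisher_direction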
III. SUBROUTINES
SUBROUTINE LMS(X,N,W,MU,EPSI,Y,D)
C
C THIS SUBROUTINE COMPUTES THE LPC,
C THE PARCOR COEF. AND THE TOTAL SQUARED ERROR
C OF A SEQUENCE, OUT OF THE AUTOCORRELATION SEQUENCE.
C
C DESCRIPTION OF PARAMETERS:
C P.........DIMENSION OF 'LPC', 'PAR', 'AUX'
C COR.......P+1 AUTO-CORR. COEF. VECTOR.
C LPC.......LPC COEF. VECTOR.
C PAR.......PARCOR COEF. VECTOR.
C AUX.......WORKING AREA.
C ERR.......NORMALIZED PREDICTION ERROR.
C
C REFERENCE:
C LINK: NONE
RETURN
END
C
C THIS ROUTINE CALCULATES THE DISCRETE FOURIER TRANSFORM OF
C THE SEQUENCE F(N), N=0,1,...,IT-1.
C THE DATA IS TAKEN TO BE PERIODIC, NAMELY F(N+IT)=F(N),
C FOR WHICH:
C
C REFERENCE:
C
C 2 -- DIRECT TRANSFORM.
C
12 ‘ -K+l
I F ' K - I l > 1 1 »6 »6
INDEX OUTER LOOP ,
6 13=13+13
10^10-1
I 1 “ 1 1. / 2
: c ( I 1 ) 51 , 5 1 f 10
VNSCRAMBLE
Classification o f signals, 37— 86 Data windows, see also specific types, 139— 151
alternative hypothesis, 39 Decision rule, 41, 47, 49
applications, 38 Decision-theoretic approach, see also Classification
Bayes decision theory, 39—50 o f signals, 37— 86 j
feature selection, 75— 79 wavelet detection, 1
Fisher’s linear discriminant, 63— 66 Decision threshold, 41 j
Karhunen-Loeve expansions, 66— 75 Delta range, 115
k-nearest neighbor, 50—53 Depth recording, 115
linear discriminant functions, see also Linear dis Diastolic phase o f heart, 104
criminant functions, 53—63 Dicrotic notch. 104
null hypothesis, 39    Dirichlet window, 140, 142—143
statistical, 39— 53 Discriminant approach, 87
time warping, 79— 84 Discriminant functions. 38. 41— 43, 45— 46, 111
Class .separability, see Separability linear, see Linear discriminant functions
Cluster-seeking algorithms, 38 Divergence, 76, 78
Cogwheel breath sound (CO). 128 Dolph-Chebyshev window, 145. 150— 151
Color blindness, 118 DP, see Dynamic programming topics
Compression, 37, 60, 62, 66—69, 71, 124    Dye dilution, 130
Compression ratio, 63    Dye dilution curves, 14
Computer programs, 153— 188 Dynamic biomedical signals, characteristics of, see
Conditional density function, 23 also specific topics, 113— 137
Conditional intensity function, 23— 24, 34 Dynamic programming (DP) equation, 82— 83
Conditional probability, 23 Dynamic programming (DP) methods, 77— 79,
Conditional risk, 41—42 81— 84
Context-free grammar, 90— 92, 96, 98— 99
stochastic, 102— 103
Context-free languages, 95
E
Context-free push-down automata, 92, 95— 100
Context-sensitive grammar, 90 ECG, see Electro-cardiograms
Contour limiting EEG. see Electroencephalograms
QRS detection, 6 EGG, see Electrogastrography
wavelet detection, 5— 6 Eigenplanes, 70, 74
Convergence properties, 73 Eigenvalues, 61— 62. 66. 69. 72— 73
Coordinate reduction time encoding system Eigenvectors, 61, 66. 69— 73
(CORTES), 5 Ejection clicks, 127
Cornea, 114 EKG, see Electrocardiograms: Electrocardiography
Corneal-retinal potential, 114
Correlation, 125 Electrocardiograms (ECG), 37
Correlation analysis, point process, 20 adaptive wavelet detection o f QRS complexes,
Correlation coefficients, renewal process, 27 13— 16
Correlation function, spectral analysis, 24 analysis, 87— 89
CORTES, see Coordinate reduction time encoding finite transducer for, 100— 101
system high-frequency, 124
Cosine windows, 141, 143, 146— 147 point process, 19. 21
Counting canonical form, 22— 24 QRS complex, 1— 5
Counting process, 20, 22— 24 signal, 121— 124
Counts autocovariance, 26 finite state automata, 94— 95
Counts PSD function, 26 syntactic analysis of, 106— 110
Counts spectral analysis, 24— 26 Electrocardiography (ECG). see also Electrocardi
Cross correlator, 8 ograms, 1, 38. 121— 124
Cross covariance density, 34— 35 inverse problem. 123
Cross intensity function, 34 Electrocorticogram. 115
Cross spectral density function, 35 Electrodermal response (EDR), 125
Cube vectorcardiograms, 124 Electroencephalograms (EEG). 37, 114— 118
Cumulative distribution function, 23 alpha range, 115
Weibull distribution, 31 analysis, 37
aperiodic wavelets, 1, 3
beta range, 115
D delta range, 115
depth recording, 115
Data compression, see Compression k-nearest neighbor classification, 53
J    classifiers, 58—60
error rate, 42, 52
Jitters, 5    Minimum squared error method, 56—57
Joint interval histograms, 24    Motor unit, 120
Motor unit action potential (MUAP, MUP), 119—
122
K    point process model, 19
Majority decision, 51
Karhunen-Loeve Expansion (KLE), 66—75
Karhunen-Loeve Transformation (KLT)
K-complexes, 1, 116, 118—119
Kinetocardiography, 130
KLE, see Karhunen-Loeve Expansions
KLT, see Karhunen-Loeve Transformation
k-nearest neighbor (k-NN) classification, 1, 50—53
Knock-out method, 77
Kolmogorov-Smirnov statistics, 28, 30
Korotkoff sounds, 129—130
Myopathies, 19, 120
L
N
Lag windows, see also specific types, 139—151
Laplace transformation, renewal process, 27    Nerve conduction velocity, 119
Laryngeal disorders, 19, 87, 129    Nerve fiber damage, 113
Laryngitis, 129    Nerve fiber recordings, 119
Least squares method, adaptive wavelet detection, 9    Neural signals, 38
Lie detector, 125    Neural spike train, counts spectral analysis, 25
Light sleep, 115    Neural spike trains, see also Renewal processes,
Likelihood ratio, 41, 76    19—20
Linear discriminant functions, 53—63, 121    Gamma distribution, 32
advantages, 53    point processes, 26
entropy criteria methods, 60—63    regenerative type, 26
generalized, 55—56
geometrical meaning, 54    Neurogenic lesions, 120
minimum distance classifiers, 58—60    Neurological diseases, 129
minimum squared error method, 56—57    Neuromuscular diseases, 19, 120
Linear prediction, 124    Neurons
Linguistic approach, 87    action potentials, 33
Linguistics, 87    overlapping wavelets detection, 14
Logarithmic survivor function, 23    Nonhomogeneous Poisson process, 30
LPC, newborn's cry, 46, 50
Nonparametric trend test, 28
Nonstationarities, 26
M    Normal distribution, 27, 29
Normalized autocovariance, spectral analysis, 25
Machine recognition, 79    Notches, 124
Magnetocardiography (MCG), 131, 133    nth order interval, 20, 22
Magnetoencephalography (MEG), 131—132    Null hypothesis, 27, 39
Magnetopneumography (MPG), 131
Mahalanobis distance, 43, 47, 59, 76, 79
Main half width (MHW), 139
O
Mann-Whitney statistic, 28
Markov, 32    Observation window, 8, 14
context-free push-down automata. 92, 95— 100 Vector electrocardiography (VCG), 124
finite stale automata, 92— 95 Ventricular fibrillation, 123
parsing, 92. 100— 101 VEP, see Visual evoked potential
syntax-directed translation, 100 VER, see Visual evoked responses
Syntax, B7, 89 Vesicular breath sounds (VBS), 128
Syntax analysis, 92, 101— 104 Vestibulospinal potentials, 119
Syntax-directed translation, 100 Vibrocardiography, 130
Systolic phase of heart, 104 Visual acuity, 118
Visual evoked potential (VEP), 118
Visual evoked responses (VER), overlapping wave
T lets detection. 14
Visual fields deficits, 118
Tachycardia, 122 Vocal cord, 19. 126. 128
Taper window. 139 Vocal tract. 78. 129
TA wave, 121 Voice. 38. 128— 129
Template, 1, 3. 5, 8— 9, 58, 79 V-waves, 115, 118
correction, 12— 16
Template adaptation, wavelet detection. 10— 11
Template matching, 1    W
Terminal symbols, 90— 91
Tetrahedron vectorcardiograms. 124
Thermal dilution, 130— 131 Wald-Woliowitz run test, 30
Theta range. 115 Warping function, see Time warping function
Time interval lengths (TIL), 30 Waveform. 37
Time series analysis, 37, 119, 121 Wavelet detection. 1— 18, 119— 120. 124
Time warping. 1. 79— 84 adaptive, see also Adaptive wavelet detection.
Time warping algorithms. 80 9— 16
Time warping function. 80— 83 algorithms. 1— 5
Tracheal breath sounds (TBS), 128 alignment, 1. 5
Training set, 38, 44, 46. 51. 58. 89. 104. 107. 11 i amplitude zone time epoch coding. 5
Transmission, 37 baseline shifts. 3— 4
Tremors, 121 contour limiting. 5— 6
Trends, 20, 26 coordinate reduction time encoding system. 5
Triangle window, 140—141, 144—145    decision-theoretic, 1
T wave, 121    Fourier descriptors, 3
Two-class discrimination, 69 jitters. 5
Two-dimensional signals, 132 matched filtering. I. 6— 8
Two-state semi-Markov model (TSSM). 32— 33 multivariate point processes, 35
overlapping wavelets, see also Overlapping wave
lets detection, 14—17
U    pattern recognition, 1
polygonal approximations, 3
Univariate conditional intensity. 34 probability density functions, 1
Univariate point process analysis, 19 QRS complex, 1— 5
Univariate point processes, 33— 35 random variables. 1
Unrestricted grammar, 90    structural features, 1—6
Unrestricted stochastic grammar. 101— 102 syntactic, 1. 3
Unsupervised learning, 38 template. 1. 3, 5. 8— 9
U wave, 121 template matching. 1
time warping, 1
Weibull distribution, 19, 31
V Wheezes. 128
Windows, see specific types
Vectorcardiograms, 124 Within-class scatter matrix, 64— 66, 75