(Ebook) The Digital Signal Processing Handbook: Video, Speech, and Audio Signal Processing by Vijay K. Madisetti ISBN 9781420046083, 142004608X 2025 PDF Download
The Digital Signal Processing Handbook
SECOND EDITION
EDITOR-IN-CHIEF
Vijay K. Madisetti
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to
publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials
or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any
form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming,
and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400.
CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been
granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Video, speech, and audio signal processing and associated standards / Vijay K. Madisetti.
p. cm.
“Second edition of the DSP Handbook has been divided into three parts.”
Includes bibliographical references and index.
ISBN 978-1-4200-4608-3 (alk. paper)
1. Signal processing--Digital techniques--Standards. 2. Digital video--Standards. 3. Image
processing--Digital techniques--Standards. 4. Speech processing systems--Standards. 5. Sound--Recording
and reproducing--Digital techniques--Standards I. Madisetti, V. (Vijay) II. Digital signal processing
handbook. III. Title.
TK5102.9.V493 2009
621.382’2--dc22 2009022594
Digital signal processing (DSP) is concerned with the theoretical and practical aspects of representing
information-bearing signals in a digital form and with using computers, special-purpose hardware and
software, or similar platforms to extract information, process it, or transform it in useful ways. Areas
where DSP has made a significant impact include telecommunications, wireless and mobile communications, multimedia applications, user interfaces, medical technology, digital entertainment, radar and
sonar, seismic signal processing, and remote sensing, to name just a few.
Given the widespread use of DSP, a need developed for an authoritative reference, written by the top
experts in the world, that would provide information on both theoretical and practical aspects in a
manner that was suitable for a broad audience—ranging from professionals in electrical engineering,
computer science, and related engineering and scientific professions to managers involved in technical
marketing, and to graduate students and scholars in the field. Given the abundance of basic and
introductory texts on DSP, it was important to focus on topics that were useful to engineers and scholars
without overemphasizing those topics that were already widely accessible. In short, the DSP handbook
was created to be relevant to the needs of the engineering community.
A task of this magnitude could only be possible through the cooperation of some of the foremost DSP
researchers and practitioners. That collaboration, over 10 years ago, produced the first edition of the
successful DSP handbook that contained a comprehensive range of DSP topics presented with a clarity of
vision and a depth of coverage to inform, educate, and guide the reader. Indeed, many of the chapters,
written by leaders in their field, have guided readers through a unique vision and perception garnered by
the authors through years of experience.
The second edition of the DSP handbook is divided into three parts: Digital Signal Processing Fundamentals; Video, Speech, and Audio Signal Processing and Associated Standards; and Wireless, Networking, Radar, Sensor Array Processing, and Nonlinear Signal Processing. This division ensures that each part is dealt with in adequate detail and that each part is able to develop its own identity and role in terms of its educational mission and audience. I expect each part to be frequently updated with chapters that reflect the changes and new developments in the technology and in the field. The distribution model for the DSP handbook also reflects the increasing need by professionals to access content in electronic form anywhere and at any time.
Video, Speech, and Audio Signal Processing and Associated Standards, as the name implies, provides comprehensive coverage of the basic foundations of speech, audio, image, and video processing and of their applications to broadcast, storage, search and retrieval, and communications.
This book needs to be continuously updated to include newer aspects of these technologies, and I look
forward to suggestions on how this handbook can be improved to serve you better.
MATLAB® is a registered trademark of The MathWorks, Inc. For product information, please
contact:
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098 USA
Tel: 508 647 7000
Fax: 508-647-7001
E-mail: info@mathworks.com
Web: www.mathworks.com
2 MPEG Digital Audio Coding Standards, Schuyler R. Quackenbush and Peter Noll, 2-1
Introduction · Key Technologies in Audio Coding · MPEG-1/Audio Coding · MPEG-2/Audio Multichannel Coding · MPEG-4/Audio Coding · MPEG-D/Audio Coding · Applications · Conclusions · References
3 Dolby Digital Audio Coding Standards, Robert L. Andersen and Grant A. Davidson, 3-1
Introduction · AC-3 Audio Coding · Enhanced AC-3 Audio Coding · Conclusions · References
4 The Perceptual Audio Coder, Deepen Sinha, James D. Johnston, Sean Dorward, and Schuyler R. Quackenbush, 4-1
Introduction · Applications and Test Results · Perceptual Coding · Multichannel PAC ·
As I predicted in the section introduction for the 1997 version of this book, digital audio communications has become nearly as prevalent as digital speech communications. In particular, new technologies for audio storage and transmission have made music and wideband signals available in a flexible variety of standard formats.
The fundamental underpinning for these technologies is audio compression based on perceptually
tuned shaping of the quantization noise. Chapter 1 in this part describes aspects of psychoacoustics that
have led to the general foundations of ‘‘perceptual audio coding.’’ Succeeding chapters in this part cover
established examples of ‘‘perceptual audio coders.’’ These include MPEG standards, and coders developed
by Dolby, Sony, and Bell Laboratories.
The dimensions of coder performance are quality, bit rate, delay, and complexity. The quality vs. bit
rate trade-offs are particularly important.
Audio Quality
The three parameters of digital audio quality are ‘‘signal bandwidth,’’ ‘‘fidelity,’’ and ‘‘spatial realism.’’
Compact-disc (CD) signals have a bandwidth of 20–20,000 Hz, while traditional telephone speech has
a bandwidth of 200–3400 Hz. Intermediate bandwidths characterize various grades of wideband speech
and audio, including roughly defined ranges of quality referred to as AM radio and FM radio quality
(bandwidths on the order of 7–10 and 12–15 kHz, respectively).
In the context of digital coding, fidelity refers to the level of perceptibility of quantization or reconstruction noise. The highest level of fidelity is one where the noise is imperceptible in formal listening tests. Lower levels of fidelity are acceptable in some applications if the noise is not annoying, although for a given coding bit rate it is generally good practice to sacrifice some bandwidth in the interest of greater fidelity. Five-point scales of signal fidelity are common in both speech and audio coding.
Spatial realism is generally provided by increasing the number of coded (and reproduced) spatial
channels. Common formats are 1-channel (mono), 2-channel (stereo), 5-channel (3 front, 2 rear),
5.1-channel (5-channel plus subwoofer), and 8-channel (6 front, 2 rear). For given constraints on
bandwidth and fidelity, the required bit rate in coding increases as a function of the number of channels;
but the increase is slower than linear, because of the presence of interchannel redundancy. The notion of
perceptual coding originally developed for exploiting the perceptual irrelevancies of a single-channel
audio signal extends also to the methods used in exploiting interchannel redundancy.
Bit Rate
The CD-stereo signal has a digital representation rate of about 1411 kilobits per second (kbps), that is, 44.1 kHz sampling × 16 bits per sample × 2 channels. Current technology for perceptual audio coding, notably MP3 audio, reproduces CD-stereo with near-perfect fidelity at bit rates as low as 128 kbps, depending on the input signal. CD-like reproduction is possible at bit rates as low as 64 kbps for stereo. Single-channel reproduction of FM-radio-like music is possible at 32 kbps. The single-channel reproduction of AM-radio-like music and wideband speech is possible at rates approaching 16 kbps for all but the most demanding signals. Techniques for so-called pseudo-stereo can provide additional enhancement of digital single-channel audio.
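As a rough check on the rates above, the sketch below uses hypothetical helper names and assumes the standard CD format (44.1 kHz sampling, 16 bits per sample, 2 channels). It computes the raw PCM rate and the compression ratio implied by each coded stereo rate quoted in the text. It is plain arithmetic, not a model of any particular coder.

```python
def pcm_bit_rate_kbps(sample_rate_hz: float, bits_per_sample: int, channels: int) -> float:
    """Raw (uncompressed) PCM bit rate in kbps, with 1 kbps = 1000 bit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(source_kbps: float, coded_kbps: float) -> float:
    """How many times smaller the coded bit stream is than the source."""
    return source_kbps / coded_kbps


# Assumed standard CD parameters: 44.1 kHz sampling, 16-bit samples, stereo.
cd_kbps = pcm_bit_rate_kbps(44_100, 16, 2)
print(f"CD stereo PCM: {cd_kbps:.1f} kbps")

# Coded stereo rates quoted in the text.
for coded_kbps in (128, 64):
    print(f"{coded_kbps} kbps stereo -> about {compression_ratio(cd_kbps, coded_kbps):.0f}:1")
```

Under these assumptions the raw rate is 1411.2 kbps, so 128 kbps and 64 kbps stereo correspond to compression ratios of roughly 11:1 and 22:1.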
1 Auditory Psychophysics for Coding Applications
In this chapter, we review properties of auditory perception that are relevant to the design of coders for
acoustic signals. The chapter begins with a general definition of a perceptual coder, then considers what
the ‘‘ideal’’ psychophysical model would consist of, and what use a coder could be expected to make of
this model. We then present some basic definitions and concepts. The chapter continues with a review
of relevant psychophysical data, including results on threshold, just-noticeable differences (JNDs),
masking, and loudness. Finally, we attempt to summarize the present state of the art, the capabilities
and limitations of present-day perceptual coders for audio and speech, and what areas most need work.
1.1 Introduction
A coded signal differs in some respect from the original signal. One task in designing a coder is to
minimize some measure of this difference under the constraints imposed by bit rate, complexity, or cost.
What is the appropriate measure of difference? The most straightforward approach is to minimize some
physical measure of the difference between original and coded signal. The designer might attempt to
minimize RMS difference between the original and coded waveform, or perhaps the difference between
original and coded power spectra on a frame-by-frame basis. However, if the purpose of the coder is
to encode acoustic signals that are eventually to be listened to* by people, these physical measures do
not directly address the appropriate issue. For signals that are to be listened to by people, the ‘‘best’’ coder
is the one that sounds the best. There is a very clear distinction between ‘‘physical’’ and ‘‘perceptual’’
measures of a signal (frequency vs. pitch, intensity vs. loudness, for example). A perceptual coder can be
defined as a coder that minimizes some measure of the difference between original and coded signal so
as to minimize the perceptual impact of the coding noise. We can define the best coder given a particular
set of constraints as the one in which the coding noise is least objectionable.
* Perceptual coding is not limited to speech and audio. It can be applied also to image and video [16]. In this chapter we
consider only coders for acoustic signals.
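The two physical measures mentioned above, RMS waveform difference and frame-by-frame power-spectrum difference, can be sketched in a few lines. The details are assumptions made for illustration (NumPy, non-overlapping unwindowed frames of a hypothetical length `frame_len`, spectra compared in dB); the point is that these are exactly the kind of measures the text argues are not perceptual ones.

```python
import numpy as np


def rms_difference(original, coded) -> float:
    """Root-mean-square difference between two equal-length waveforms."""
    err = np.asarray(original, dtype=float) - np.asarray(coded, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def framewise_spectral_difference(original, coded, frame_len=256, eps=1e-12):
    """Mean absolute difference (dB) between power spectra, one value per frame.

    Frames are non-overlapping and unwindowed; trailing samples that do not
    fill a whole frame are ignored.
    """
    x = np.asarray(original, dtype=float)
    y = np.asarray(coded, dtype=float)
    n_frames = len(x) // frame_len
    diffs = []
    for i in range(n_frames):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        px = np.abs(np.fft.rfft(x[seg])) ** 2
        py = np.abs(np.fft.rfft(y[seg])) ** 2
        # eps avoids taking the log of zero in empty spectral bins
        diffs.append(np.mean(np.abs(10 * np.log10(px + eps) - 10 * np.log10(py + eps))))
    return np.array(diffs)


# Example: a 440 Hz tone at 8 kHz sampling and a crudely requantized copy.
t = np.arange(8000) / 8000.0
original = np.sin(2 * np.pi * 440 * t)
coded = np.round(original * 127) / 127
print(rms_difference(original, coded))
print(framewise_spectral_difference(original, coded).shape)  # 31 full frames
```

Both numbers can be driven down without any guarantee that the coded signal sounds better, which is the distinction the section draws.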
It follows that the designer of a perceptual coder needs some way to determine the perceptual quality
of a coded signal. ‘‘Perceptual quality’’ is a poorly defined concept, and it will be seen that in some
sense it cannot be uniquely defined. We can, however, attempt to provide a partial answer to the question
of how it can be determined. We can present something of what is known about human auditory
perception from psychophysical listening experiments and show how these phenomena relate to the
design of a coder.
One requirement for successful design of a perceptual coder is a satisfactory model for the signal-
dependent sensitivity of the auditory system. Present-day models are incomplete, but we can attempt to
specify what the properties of a complete model would be. One possible specification is that, for any
given waveform (the signal), it accurately predicts the loudness, as a function of pitch and of time, of
any added waveform (the noise). If we had such a complete model, then we would in principle be able to
build a transparent coder, defined as one in which the coded signal is indistinguishable from the original
signal, or at least we would be able to determine whether or not a given coder was transparent. It is
relatively simple to design a psychophysical listening experiment to determine whether the coding noise
is audible, or equivalently, whether the subject can distinguish between original and coded signal.
Any subject with normal hearing could be expected to give similar results to this experiment. While
present-day models are far from complete, we can at least describe the properties of a complete model.
There is a second requirement that is more difficult to satisfy. This is the need to be able to determine
which of two coded samples, each of which has audible coding noise, is preferable. While a satisfactory
model for the signal-dependent sensitivity of the auditory system is in principle sufficient for the design
of a transparent coder, the question of how to build the best nontransparent coder does not have a unique
answer. Often, design constraints preclude building a transparent coder. Even the best coder built under
these constraints will result in audible coding noise, and it is under some conditions impossible to specify
uniquely how best to distribute this noise. One listener may prefer the more intelligible version, while
another may prefer the more natural sounding version. The preferences of even a single listener might
very well depend on the application. In the absence of any better criterion, we can attempt to minimize
the loudness of the coding noise, but it must be understood that this is an incomplete solution.
Our purpose in this chapter is to present something of what is known about human auditory
perception in a form that may be useful to the designer of a perceptual coder. We do not attempt to
answer the question of how this knowledge is to be utilized, that is, how to build a coder. Present-day perceptual
coders for the most part utilize a ‘‘feedforward’’ paradigm: analysis of the signal to be coded produces
specifications for allowable coding noise. Perhaps a more general method is a ‘‘feedback’’ paradigm, in
which the perceptual model somehow makes possible a decision as to which of two coded signals is
‘‘better.’’ This decision process can then be iterated to arrive at some optimum solution. It will be seen
that for proper exploitation of some aspects of auditory perception the feedforward paradigm may be
inadequate and the potentially more time-consuming feedback paradigm may be required. How this is to
be done is part of the challenge facing the designer.
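The feedback paradigm can be made concrete with a toy example. In the sketch below all names are hypothetical: the coder generates several alternative codings with the same quantizer step (so roughly the same bit budget), and a comparison function decides which of two coded signals is ‘‘better,’’ keeping the winner at each step. The comparator is only a placeholder that prefers lower noise energy; a real perceptual model would replace it, and that model is exactly what the chapter says is still incomplete.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def noise_energy(original: Signal, coded: Signal) -> float:
    """Sum of squared differences between original and coded samples."""
    return sum((a - b) ** 2 for a, b in zip(original, coded))


def placeholder_is_better(original: Signal, a: Signal, b: Signal) -> bool:
    """Hypothetical stand-in for a perceptual model: True if `a` is judged
    better than `b`. It only compares noise energy, which is a physical,
    not a perceptual, measure."""
    return noise_energy(original, a) < noise_energy(original, b)


def feedback_select(original: Signal,
                    candidates: Sequence[Signal],
                    is_better: Callable[[Signal, Signal, Signal], bool]) -> Signal:
    """Feedback paradigm: repeated pairwise decisions, keeping the winner."""
    if not candidates:
        raise ValueError("need at least one candidate")
    best = candidates[0]
    for cand in candidates[1:]:
        if is_better(original, cand, best):
            best = cand
    return best


def quantize(x: Signal, step: float, offset: float = 0.0) -> Signal:
    """Uniform quantizer with a fixed step and an adjustable offset,
    used here only to generate alternative codings of the same signal."""
    return [offset + step * round((v - offset) / step) for v in x]


original = [0.12, -0.53, 0.97, 0.31, -0.76]
# Same step size for every candidate, so each uses roughly the same bit budget.
candidates = [quantize(original, 0.25, off) for off in (0.0, 0.05, 0.10)]
best = feedback_select(original, candidates, placeholder_is_better)
print(best)
```

A feedforward coder, by contrast, would compute an allowable-noise specification once from the input and quantize to meet it, with no comparison step.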
1.2 Definitions
In this section, we define some fundamental terms and concepts and clarify the distinction between
physical and perceptual measures.
1.2.1 Loudness
When we increase the intensity of a stimulus, its loudness increases, but that does not mean that intensity
and loudness are the same thing. ‘‘Intensity’’ is a physical measure. We can measure the intensity of a
signal with an appropriate measuring instrument, and if the measuring instrument is standardized
and calibrated correctly, anyone else anywhere in the world can measure the same signal and get the
same result. ‘‘Loudness’’ is ‘‘perceptual magnitude.’’ It can be defined as ‘‘that attribute of auditory
sensation in terms of which sounds can be ordered on a scale extending from quiet to loud’’ [23, p. 47].
We cannot measure it directly. All we can do is ask questions of a subject and from the responses attempt
to infer something about loudness. Furthermore, we have no guarantee that a particular stimulus will be
as loud for one subject as for another. The best we can do is assume that, for a particular stimulus,
loudness judgments for one group of normal-hearing people will be similar to loudness judgments for
another group.
There are two commonly used measures of loudness. One is ‘‘loudness level’’ (unit phon) and the other
is ‘‘loudness’’ (unit sone). These two measures differ in what they describe and how they are obtained.
The loudness level of a sound in phons is defined as the intensity, in dB sound pressure level (SPL), of an equally loud 1 kHz tone. The
sone is defined in terms of subjectively measured loudness ratios. A stimulus half as loud as a one-sone
stimulus has a loudness of 0.5 sones, a stimulus 10 times as loud has a loudness of 10 sones, etc. A 1 kHz
tone at 40 dB SPL is arbitrarily defined to have a loudness of one sone.
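The two scales can be related by a widely used rule of thumb that this chapter does not state and that is an assumption here: above about 40 phons, loudness in sones doubles for every 10-phon increase in loudness level, anchored so that 40 phons corresponds to the one-sone reference defined above. The sketch below implements that rule and its inverse; it is only an approximation and is not meant for low loudness levels.

```python
import math


def phon_to_sone(loudness_level_phon: float) -> float:
    """Approximate loudness (sones) from loudness level (phons).

    Assumes the common rule N = 2 ** ((L - 40) / 10), reasonable only for
    loudness levels above roughly 40 phons.
    """
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)


def sone_to_phon(loudness_sone: float) -> float:
    """Inverse of phon_to_sone under the same assumption."""
    if loudness_sone <= 0:
        raise ValueError("loudness must be positive")
    return 40.0 + 10.0 * math.log2(loudness_sone)


for level in (40, 50, 60, 80):
    print(level, "phon ->", phon_to_sone(level), "sone")
```

Under this rule 40, 50, 60, and 80 phons map to 1, 2, 4, and 16 sones.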
The argument can be made that loudness matching, the procedure used to obtain the phon scale, is a
less subjective procedure than loudness scaling, the procedure used to obtain the sone scale. This
argument would lead to the conclusion that the phon is the more objective of the two measures and
that the sone is more subject to individual variability. This argument breaks down on two counts: first, for
dissimilar stimuli even the supposedly straightforward loudness-matching task is subject to large and
poorly understood order and bias effects that can only be described as subjective. While loudness
matching of two equal-frequency tone bursts generally gives stable and repeatable results, the task
becomes more difficult when the frequencies of the two tone bursts differ. Loudness matching between
two dissimilar stimuli, as for example between a pure tone and a multicomponent complex signal, is even
more difficult and yields less stable results. Loudness-matching experiments have to be designed
carefully, and results from these experiments have to be interpreted with caution. Second, it is possible
to measure loudness in sones, at least approximately, by means of a loudness-matching procedure.
Fletcher [6] states that under some conditions loudness adds. Binaural presentation of a stimulus results
in loudness doubling; and two equally loud stimuli, far enough apart in frequency that they do not mask
each other, are twice as loud as one. If loudness additivity holds, then it follows that the sone scale can be
generated by matching loudness of a test stimulus to binaural stimuli or to pairs of tones. This approach
must be treated with caution. As Fletcher states, ‘‘However, this method [scaling] is related more directly
to the scale we are seeking (the sone scale) than the two preceding ones (binaural or monaural loudness
additivity)’’ [6, p. 278]. The loudness additivity approach relies on the assumption that loudness
summation is perfect, and there is some more recent evidence [28,33] that loudness summation, at
least for binaural vs. monaural presentation, is not perfect.
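Fletcher’s additivity argument can be written out under its own assumption of perfect summation. The sketch below uses hypothetical helper names and, as an additional assumption, the common rule that loudness doubles per 10 phons above 40 phons. It adds the sone values of components assumed not to mask each other and converts the total back to a loudness level. Because measured summation is not perfect, real totals may fall short of these predictions.

```python
import math
from typing import Iterable


def phon_to_sone(level_phon: float) -> float:
    """Assumed rule: loudness doubles per 10 phons above 40 phons (1 sone)."""
    return 2.0 ** ((level_phon - 40.0) / 10.0)


def sone_to_phon(loudness_sone: float) -> float:
    """Inverse of phon_to_sone under the same assumption."""
    return 40.0 + 10.0 * math.log2(loudness_sone)


def additive_loudness_level(levels_phon: Iterable[float]) -> float:
    """Loudness level (phons) of components that do not mask each other,
    assuming perfect loudness additivity."""
    total_sones = sum(phon_to_sone(p) for p in levels_phon)
    return sone_to_phon(total_sones)


# Two equally loud 60-phon tones, far apart in frequency.
print(additive_loudness_level([60, 60]))
```

This prints 70.0: doubling the loudness (4 + 4 = 8 sones) corresponds to a 10-phon rise under the assumed rule.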
1.2.2 Pitch
The American Standards Association defines pitch as ‘‘that attribute of auditory sensation in which
sounds may be ordered on a musical scale.’’ Pitch bears much the same relationship to frequency
as loudness does to intensity: frequency is an objective physical measure, while pitch is a subjective
perceptual measure. Just as there is not a one-to-one relationship between intensity and loudness,
so also there is not a one-to-one relationship between frequency and pitch. Under some conditions, for
example, loudness can be shown to decrease with decreasing frequency with intensity held constant, and
pitch can be shown to decrease with increasing intensity with frequency held constant [40, p. 409].
By the simplest definition, the threshold of hearing (equivalently, auditory threshold) is the lowest
intensity that the listener can hear. This definition is inadequate because we cannot directly measure
the listener’s perception. A first-order correction, therefore, is that the threshold of hearing is the lowest
intensity that elicits from the listener the response that the sound is audible. Given this definition, we can
present a stimulus to the listener and ask whether he or she can hear it. If we do this, we soon find that
identical stimuli do not always elicit identical responses. In general, the probability of a positive response
increases with increasing stimulus intensity and can be described by a ‘‘psychometric function’’ such as
that shown for a hypothetical experiment in Figure 1.1. Here the stimulus intensity (in dB) appears on
the abscissa and the probability P(C) of a positive response appears on the ordinate. The yes–no
experiment could be described by a psychometric function that ranges from zero to one, and threshold
could be defined as the stimulus intensity that elicits a positive response in 50% of the trials.
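Given measured proportions of positive responses at several stimulus levels, the 50% threshold can be estimated by interpolating the psychometric function. The sketch below assumes plain linear interpolation between the two measured points that bracket the target probability. This is a choice made for illustration; practical work usually fits a smooth function, for example a cumulative normal. The data are hypothetical. The same routine gives the 2IFC threshold discussed below if `target` is set to 0.75.

```python
from typing import Sequence


def interpolate_threshold(levels_db: Sequence[float],
                          p_positive: Sequence[float],
                          target: float = 0.5) -> float:
    """Stimulus level at which the psychometric function crosses `target`.

    Assumes `levels_db` is sorted in increasing order and that the data
    bracket the target; interpolates linearly between bracketing points.
    """
    points = list(zip(levels_db, p_positive))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= target <= p1 and p1 > p0:
            return x0 + (target - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("target probability is not bracketed by the data")


# Hypothetical yes-no data (levels in dB re an arbitrary reference).
levels = [-4, -2, 0, 2, 4]
probs = [0.02, 0.16, 0.45, 0.85, 0.98]
print(interpolate_threshold(levels, probs))
```

For these made-up data the 50% point falls a quarter of a dB above the 0 dB level.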
A difficulty with the simple yes–no experiment is that we have no control over the subject’s ‘‘criterion
level.’’ The subject may be using a strict criterion (‘‘yes’’ only if the signal is definitely present) or a lax
criterion (‘‘yes’’ if the signal might be present). The subject can respond correctly either by a positive
response in the presence of a stimulus (‘‘hit’’) or by a negative response in the absence of a stimulus
(‘‘correct rejection’’). Similarly, the subject can respond incorrectly either by a negative response in the
presence of a stimulus (‘‘miss’’) or by a positive response in the absence of a stimulus (‘‘false alarm’’).
Unless the experimenter is willing to use an elaborate and time-consuming procedure that involves
assigning rewards to correct responses and penalties to incorrect responses, the criterion level is
uncontrolled.
The field of psychophysics that deals with this complication is called ‘‘detection theory.’’ The field of
psychophysical detection theory is highly developed [12] and a complete description is far beyond the
scope of this chapter. Very briefly, the subject’s response is considered to be based on an internal
‘‘decision variable,’’ a random variable drawn from a distribution with mean and standard deviation
that depend on the stimulus. If we assume that the decision variable is normally distributed with a fixed standard deviation $\sigma$ and a mean that depends only on stimulus intensity, then we can define an ‘‘index of sensitivity’’ $d'$ for a given stimulus intensity as the difference between $\mu_s$ (the mean in the presence of the stimulus) and $\mu_0$ (the mean in the absence of the stimulus), divided by $\sigma$:
$$d' = \frac{\mu_s - \mu_0}{\sigma}.$$
[Figure 1.1: probability of positive response vs. stimulus intensity (dB re ‘‘threshold’’), −4 to +4 dB]
FIGURE 1.1 Idealized psychometric functions for hypothetical yes–no experiment (0–1) and for hypothetical 2IFC experiment (0.5–1).
An ‘‘ideal observer’’ (a hypothetical subject who does the best possible job for the task at hand) gives a
positive response if and only if the decision variable exceeds an internal criterion level. An increase in
criterion level decreases the probability of a false alarm and increases the probability of a miss.
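Under the equal-variance Gaussian assumption, the quantities in this paragraph can be computed directly. The sketch below is a standard signal-detection calculation rather than anything specific to this chapter. It sets $\mu_0 = 0$ and $\sigma = 1$, so the stimulus mean equals $d'$, and derives hit and false-alarm rates for a given criterion. It then recovers $d'$ from those rates as $z(H) - z(F)$ using Python’s `statistics.NormalDist`.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def rates_for_criterion(d_prime: float, criterion: float):
    """Hit and false-alarm rates for an ideal observer who says "yes" when the
    decision variable exceeds `criterion` (noise mean 0, signal mean d_prime, sigma 1)."""
    hit = 1.0 - STD_NORMAL.cdf(criterion - d_prime)
    false_alarm = 1.0 - STD_NORMAL.cdf(criterion)
    return hit, false_alarm


def d_prime_from_rates(hit: float, false_alarm: float) -> float:
    """Sensitivity index d' = z(H) - z(F); rates must lie strictly between 0 and 1."""
    return STD_NORMAL.inv_cdf(hit) - STD_NORMAL.inv_cdf(false_alarm)


for c in (0.0, 0.5, 1.0, 1.5):
    h, f = rates_for_criterion(1.0, c)
    print(f"criterion {c:.1f}: hit {h:.3f}, false alarm {f:.3f}, "
          f"miss {1 - h:.3f}, d' {d_prime_from_rates(h, f):.3f}")
```

Raising the criterion lowers the false-alarm rate and raises the miss rate, as stated above, while the recovered $d'$ stays at 1.0. This is why $d'$ separates sensitivity from the uncontrolled criterion level.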
A simple and satisfactory way to deal with the problem of uncontrolled criterion level is to use a
‘‘criterion-free’’ experimental paradigm. The simplest is perhaps the two-interval forced choice (2IFC)
paradigm, in which the stimulus is presented at random in one of two observation intervals. The subject’s
task is to determine which of the two intervals contained the stimulus. The ideal observer selects the
interval that elicits the larger decision variable, and criterion level is no longer a factor. Now, the subject
has a 50% chance of choosing the correct interval even in the absence of any stimulus, so the
psychometric function goes from 0.5 to 1.0 as shown in Figure 1.1. A reasonable definition of threshold
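is P(C) = 0.75, halfway between the chance level of 0.5 and 1. If the decision variable is normally distributed with a fixed standard deviation, the ideal observer’s proportion correct in 2IFC is $P(C) = \Phi(d'/\sqrt{2})$, where $\Phi$ is the standard normal cumulative distribution function, so this definition of threshold corresponds to $d' = \sqrt{2}\,\Phi^{-1}(0.75) \approx 0.95$.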
The number of intervals can be increased beyond two. In this case, the ideal observer responds
correctly if the decision variable for the interval containing the stimulus is larger than the largest of
the $N - 1$ decision variables for the intervals not containing the stimulus. A common practice is, for an N-interval forced choice paradigm (NIFC), to define threshold as the point halfway between the chance level of $1/N$ and one. This is a perfectly acceptable practice so long as it is recognized that the measured threshold is influenced by the number of alternatives. For a 3IFC paradigm this definition of threshold corresponds to a $d'$ of 1.12 and for a 4IFC paradigm it corresponds to a $d'$ of 1.24.
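The $d'$ values quoted for 2IFC, 3IFC, and 4IFC thresholds can be checked numerically. Under the same equal-variance Gaussian assumption, the ideal observer in an N-interval task is correct with probability $P(C) = \int_{-\infty}^{\infty} \phi(x - d')\,\Phi(x)^{N-1}\,dx$, where $\phi$ and $\Phi$ are the standard normal density and distribution functions. The sketch below solves for the $d'$ at which $P(C)$ is halfway between $1/N$ and 1. The numerical method is our own choice: a trapezoid rule on a fixed grid plus bisection. It should give values close to those quoted above, to within the accuracy of this simple scheme.

```python
from statistics import NormalDist

_N01 = NormalDist()  # standard normal: pdf and cdf


def p_correct(d_prime: float, n_intervals: int,
              lo: float = -10.0, hi: float = 10.0, steps: int = 2000) -> float:
    """Ideal-observer proportion correct in an N-interval forced-choice task."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        f = _N01.pdf(x - d_prime) * _N01.cdf(x) ** (n_intervals - 1)
        total += f if 0 < i < steps else 0.5 * f  # trapezoid end-point weights
    return total * h


def threshold_d_prime(n_intervals: int, tol: float = 1e-4) -> float:
    """d' at which P(C) is halfway between chance (1/N) and 1, by bisection."""
    target = 0.5 * (1.0 + 1.0 / n_intervals)
    a, b = 0.0, 5.0  # P(C) rises monotonically with d' over this range
    while b - a > tol:
        mid = 0.5 * (a + b)
        if p_correct(mid, n_intervals) < target:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


for n in (2, 3, 4):
    print(f"{n}IFC: d' = {threshold_d_prime(n):.2f}")
```

For N = 2 the result matches the closed form $\sqrt{2}\,\Phi^{-1}(0.75)$. For larger N the integral has no simple closed form, which is why a numerical check is convenient.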
There are more general methods that do not assume a knowledge of the relationship between the
physical attribute being measured and a perceptual attribute. The most useful, perhaps, is the NIFC
method: N stimuli are presented, one of which differs from the other $N - 1$ along the dimension being measured. The subject’s task is to specify which one of the N stimuli is different from the other $N - 1$.
Note that there is a close parallel between the differential threshold and the auditory threshold
described in Section 1.2.4. The auditory threshold can be regarded as a special case of the JND for
intensity, where the question is by how much the intensity has to differ from zero in order to be
detectable.
Scharf defines the empirical critical bandwidth as ‘‘that bandwidth at which subjective responses rather
abruptly change.’’ Simply put, for some psychophysical tasks the auditory system behaves as if it consisted
of a bank of band-pass filters (the critical bands) followed by energy detectors. Examples of critical-band
behavior that are particularly relevant for the designer of a coder include the relationship between
bandwidth and loudness (Figure 1.5) and the relationship between bandwidth and masking (Figure 1.10).
Another example of critical-band behavior is phase sensitivity: in experiments measuring the detectability
of amplitude and of frequency modulation, the auditory system appears to be sensitive to the relative
phase of the components of a complex sound only so long as the components are within a critical band
[9,45].
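The ‘‘bank of band-pass filters followed by energy detectors’’ description can be illustrated with an idealized model. The sketch below assumes rectangular, non-overlapping bands whose edges follow Zwicker’s commonly tabulated critical-band boundaries, which are not given in this chapter. It represents the input as a list of sinusoidal components with known frequency and power rather than as a waveform, and each ‘‘detector’’ simply sums the power falling in its band. Real auditory filters overlap and have sloping skirts, so this is only a caricature.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Assumed band edges in Hz (Zwicker's tabulated critical-band boundaries).
BAND_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]


def band_index(freq_hz: float) -> int:
    """Index of the band containing freq_hz (each band is [lower, upper))."""
    if not 0 <= freq_hz < BAND_EDGES_HZ[-1]:
        raise ValueError("frequency outside the modeled range")
    return bisect_right(BAND_EDGES_HZ, freq_hz) - 1


def band_energies(components: Sequence[Tuple[float, float]]) -> List[float]:
    """Sum component powers within each band: an idealized filter bank
    followed by energy detectors."""
    energies = [0.0] * (len(BAND_EDGES_HZ) - 1)
    for freq_hz, power in components:
        energies[band_index(freq_hz)] += power
    return energies


# Two tones 40 Hz apart fall in the same band; a third is several bands away.
e = band_energies([(1000.0, 1.0), (1040.0, 1.0), (3000.0, 1.0)])
print(band_index(1000.0), e[band_index(1000.0)], band_index(3000.0))
```

This prints `8 2.0 15`: the two nearby tones are pooled by a single detector, while the 3 kHz tone is registered separately.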
The concept of the critical band was introduced more than a half-century ago by Fletcher [6], and
since that time it has been studied extensively. Fletcher’s pioneering contribution is ably documented by
Allen [1], and Scharf’s 1970 review article [33] gives references to some later work. More recently, Moore
and his co-workers have made extensive measurements of peripheral auditory filters [24].
The value of critical bandwidths has been the subject of some discussion, because of questions of
definition and method of measurement. Figure 1.2 [31, Fig. 1] shows critical bandwidth as a function
of frequency for Scharf’s empirical definition (the bandwidth at which subjective responses undergo
some sort of change). Results from several experiments are superimposed here, and they are in
substantial agreement with each other. Moore and Glasberg [26] argue that the bandwidths shown in
Figure 1.2 are determined not only by the bandwidth of peripheral auditory filters but also by changes
in processing efficiency. By their argument, the bandwidth of peripheral auditory filters is somewhat
smaller than the values shown in Figure 1.2 at frequencies above 1 kHz and substantially smaller, by as
much as an octave, at lower frequencies.
[Figure 1.2: empirical critical bandwidth vs. frequency (Hz), 50 Hz to 10 kHz]
FIGURE 1.2 Empirical critical bandwidth. (From Scharf, B., Critical bands, in Foundations of Modern Auditory Theory, Vol. 1, Chap. 5, Tobias, J.V., Ed., Academic Press, New York, 1970. With permission.)
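The disagreement over critical bandwidths can be illustrated with two widely cited analytic approximations. Neither appears in this chapter, so both are assumptions here: Zwicker and Terhardt’s expression for the critical bandwidth, $CB \approx 25 + 75(1 + 1.4F^2)^{0.69}$ Hz, and Glasberg and Moore’s equivalent rectangular bandwidth of the auditory filter, $ERB \approx 24.7(4.37F + 1)$ Hz, both with $F$ in kHz. The sketch tabulates both and their ratio. These formulas are not the curves plotted in Figure 1.2, but they show the same qualitative pattern: the filter-based ERB is smaller than the critical bandwidth throughout the range, and the gap is largest at low frequencies.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Zwicker & Terhardt approximation to the critical bandwidth (Hz)."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def erb_hz(f_hz: float) -> float:
    """Glasberg & Moore equivalent rectangular bandwidth (Hz)."""
    f_khz = f_hz / 1000.0
    return 24.7 * (4.37 * f_khz + 1.0)


print(f"{'f (Hz)':>8} {'CB (Hz)':>9} {'ERB (Hz)':>9} {'CB/ERB':>7}")
for f in (50, 100, 200, 500, 1000, 2000, 5000, 10000):
    cb, erb = critical_bandwidth_hz(f), erb_hz(f)
    print(f"{f:>8} {cb:>9.1f} {erb:>9.1f} {cb / erb:>7.2f}")
```

At 1 kHz the two expressions give roughly 162 Hz and 133 Hz. At 100 Hz they give about 101 Hz and 35 Hz, a ratio of nearly three.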
1.3.1 Loudness
1.3.1.1 Loudness Level and Frequency
For pure tones, loudness depends on both intensity and frequency. Figure 1.3 (modified from [37, p. 124])
shows loudness level contours. The curves are labeled in phons and, in parentheses, sones. These curves
have been remeasured many times since, with some variation in the results, but the basic conclusions
remain unchanged. The most sensitive region is around 2–3 kHz. The low-frequency slope of the
loudness level contours is flatter at high loudness levels than at low. It follows that loudness level
grows more rapidly with intensity at low frequencies than at high. The 38- and 48-phon contours are
(by definition) separated by 10 dB at 1 kHz, but they are only about 5 dB apart at 100 Hz.
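The contour spacing quoted above translates directly into a rate of growth of loudness level with intensity. A short worked calculation, using only the two contour separations given in the text:

```latex
\left.\frac{\Delta L_N}{\Delta L}\right|_{1\,\mathrm{kHz}}
  = \frac{(48 - 38)\ \mathrm{phon}}{10\ \mathrm{dB}} = 1\ \mathrm{phon/dB},
\qquad
\left.\frac{\Delta L_N}{\Delta L}\right|_{100\,\mathrm{Hz}}
  \approx \frac{(48 - 38)\ \mathrm{phon}}{5\ \mathrm{dB}} = 2\ \mathrm{phon/dB}.
```

In this range of levels, a 1 dB increase in intensity at 100 Hz therefore raises loudness level about twice as much as the same increase at 1 kHz.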
Figure 1.3 also shows contours that specify the dynamic range of hearing. Tones below the 8-phon
contour are inaudible, and tones above the dotted line are uncomfortable. The dynamic range of hearing,
the distance between these two contours, is greatest around 2–3 kHz and decreases at lower and
higher frequencies. In practice, the useful dynamic range is substantially less. We know today that extended exposure to sounds at much lower levels than the dotted line in Figure 1.3 can result in temporary or permanent damage to the ear. It has been suggested that extended exposure to sounds as low as 70–75 dB(A) may produce permanent high-frequency threshold shifts in some individuals [39].
[Figure 1.3: loudness level contours labeled in phons (sones); ordinate: intensity level (dB SPL)]
FIGURE 1.3 Loudness level contours. Parameters: phons (sones). The bottom curve (8 phons) is at the threshold of hearing. The dotted line shows Wegel’s 1932 results for ‘‘threshold of feeling.’’ This line is many dB above levels that are known today to produce permanent damage to the auditory system. (Modified from Stevens, S.S. and Davis, H.W., Hearing, John Wiley & Sons, New York, 1938.)
[Figure 1.4 plot: loudness (sones), 0.02 to 200, vs. sound pressure level (dB SPL), 0 to 120; curves for 100, 250, 500, 1000, 4000, and 8000 Hz]
FIGURE 1.4 Loudness growth functions. (Modified from Scharf, B., Loudness, in Handbook of Perception, Vol. IV,
Hearing, Chap. 6, Carterette, E.C. and Friedman, M.P., Eds., Academic Press, New York, 1978.)
* This power-law relationship between physical and perceptual measures of a stimulus was studied in great detail by S.S. Stevens and is now commonly referred to as Stevens’ Law. Stevens measured exponents for many sensory modalities, ranging from a low of 0.33 for loudness and brightness to a high of 3.5 for electric shock produced by a 60 Hz electric current delivered to the skin.
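Stevens’ Law states that perceived magnitude grows as a power of stimulus magnitude, ψ = kφ^a. Because k cancels in a ratio, a level change of ΔL dB (an intensity ratio of 10^(ΔL/10)) multiplies predicted loudness by 10^(aΔL/10). The sketch below assumes the exponent is defined with respect to intensity, which is how the footnote's loudness value of 0.33 is read here; an exponent defined with respect to sound pressure would use 10^(ΔL/20) instead. Function names are illustrative.

```python
import math


def loudness_ratio(delta_db: float, exponent: float) -> float:
    """Predicted loudness ratio for a level change of `delta_db` decibels
    under Stevens' Law psi = k * I**exponent, with I the intensity.
    The constant k cancels in the ratio, so only the exponent matters."""
    intensity_ratio = 10.0 ** (delta_db / 10.0)
    return intensity_ratio ** exponent


def db_to_double(exponent: float) -> float:
    """Level change (dB) that doubles predicted loudness:
    solve 10**(exponent * dL / 10) = 2 for dL."""
    return 10.0 * math.log10(2.0) / exponent


for a in (0.30, 0.33):
    print(f"exponent {a:.2f}: +10 dB -> loudness x {loudness_ratio(10.0, a):.2f}, "
          f"doubling needs {db_to_double(a):.1f} dB")
```

With an exponent of 0.3, a 10 dB increase very nearly doubles loudness, which is the commonly quoted rule of thumb; the footnote's 0.33 gives a slightly steeper growth.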
[Figure 1.5 plot: sound pressure level of comparison tone, 10 to 90, vs. overall spacing (ΔF), 10 to 2000; 1000 c.p.s.; separate T and C data points]
FIGURE 1.5 Loudness vs. bandwidth of tone complex. (From Zwicker, E. et al., J. Acoust. Soc. Am., 29, 548, 1957.
With permission.)
* These data were obtained by comparing the loudness of a single 1 kHz tone with the loudness of a four-tone complex of the specified bandwidth centered at 1 kHz. The systematic difference between the results obtained when the tone was adjusted (“T” symbol) and when the complex was adjusted (“C” symbol) is an example of the bias effects mentioned in Section 1.2.
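The four-tone stimulus described in this footnote can be generated in a few lines. The sketch below assumes the four components are equally spaced and equal in amplitude, with the overall spacing ΔF measured from the lowest to the highest component; the footnote specifies only the number of tones, the overall spacing, and the 1 kHz center, so the spacing rule, the sample rate, and the function names are all assumptions.

```python
import math


def four_tone_frequencies(center_hz: float, overall_spacing_hz: float) -> list[float]:
    """Frequencies of four equally spaced tones (an assumption) spanning
    `overall_spacing_hz` from lowest to highest and centered on `center_hz`."""
    lowest = center_hz - overall_spacing_hz / 2.0
    if lowest <= 0:
        raise ValueError("spacing too wide for this center frequency")
    step = overall_spacing_hz / 3.0  # three gaps between four tones
    return [lowest + i * step for i in range(4)]


def synthesize(freqs: list[float], duration_s: float, fs: int = 48000) -> list[float]:
    """Equal-amplitude sum of sinusoids, scaled by 1/len(freqs) so the
    peak magnitude cannot exceed 1."""
    n = int(round(duration_s * fs))
    scale = 1.0 / len(freqs)
    return [scale * sum(math.sin(2.0 * math.pi * f * t / fs) for f in freqs)
            for t in range(n)]


freqs = four_tone_frequencies(1000.0, 300.0)
print(freqs)  # [850.0, 950.0, 1050.0, 1150.0]
signal = synthesize(freqs, duration_s=0.5)
```

The 1/4 amplitude scaling only keeps the floating-point samples within ±1; it is not a calibration to the sound pressure levels in the figure.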
there is a host of psychophysical data concerning the temporal structure of the signal that are less well understood and less well modeled. The temporal dynamics of auditory perception is an area with a great deal of room for improvement in the models used by perceptual auditory coders. One example is the relationship between loudness and duration discussed here. Other examples appear in a later section on temporal aspects of masking.
There is general agreement that, for fixed intensity, loudness increases with duration up to stimulus durations of a few hundred milliseconds. (Other factors, usually discussed under the terms adaptation or fatigue, come into play for longer durations of many seconds or minutes. We will not discuss these factors here.) The duration below which loudness increases with increasing duration is sometimes referred to as “the critical duration.” Scharf [32] provides an excellent summary of studies of the relationship between loudness and duration. In his survey, he cites values of critical duration ranging from 10 to over 500 ms. About half the studies in Scharf’s survey show that, for constant loudness, total energy (intensity × duration) stays constant below the critical duration, while the remaining studies are about evenly split between total energy increasing and total energy decreasing with increasing duration.
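For the studies in which total energy stays constant, the trade between level and duration follows from energy = intensity × duration: shortening a stimulus from T₁ to T₂ requires multiplying intensity by T₁/T₂, that is, raising the level by 10 log₁₀(T₁/T₂) dB. A minimal sketch, assuming both durations lie below the critical duration; the function name is hypothetical.

```python
import math


def level_change_for_equal_energy(t_ref_ms: float, t_new_ms: float) -> float:
    """Intensity-level change (dB) that keeps total energy, intensity x duration,
    constant when duration changes from t_ref_ms to t_new_ms.
    Positive values mean the shorter stimulus must be more intense."""
    if t_ref_ms <= 0 or t_new_ms <= 0:
        raise ValueError("durations must be positive")
    return 10.0 * math.log10(t_ref_ms / t_new_ms)


for t in (200.0, 100.0, 50.0, 20.0, 10.0):
    print(f"{t:6.1f} ms: {level_change_for_equal_energy(200.0, t):+5.1f} dB")
```

Halving the duration thus calls for about 3 dB more intensity. Because the surveyed studies disagree on whether loudness actually follows this rule, the function expresses the constant-energy hypothesis rather than a loudness model.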
One possible explanation for this confused state of affairs is the inherent difficulty of making loudness
matches between dissimilar stimuli, discussed in Section 1.2.1. Two stimuli of different durations differ by more than “loudness,” and, depending on a variety of poorly understood experimental or individual factors, what appears to be the same experiment may yield different results in different laboratories or with different subjects.
Some support for this explanation comes from the fact that studies of threshold intensity as a function
of duration are generally in better agreement with each other than studies of loudness as a function of
duration. As discussed in Section 1.2.3, measurements of auditory threshold depend to some extent on the
method of measurement, but it is still possible to establish an internally consistent criterion-free measure.
The exact results depend to some extent on signal frequency, but there is reasonable agreement among
various studies that total energy at threshold remains approximately constant between about 10 and
100 ms. (See [41] for a survey of studies of threshold intensity as a function of duration.)
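The constant-energy result for thresholds can be checked against a set of measurements by converting each (duration, threshold) pair to an energy level, threshold level + 10 log₁₀(duration), and looking at how much those energy levels vary between 10 and 100 ms. A sketch, assuming thresholds in dB and durations in ms; the function name and the example values are hypothetical and are not data from [41].

```python
import math


def energy_spread_db(points: list[tuple[float, float]],
                     t_min_ms: float = 10.0, t_max_ms: float = 100.0) -> float:
    """Range (max - min, in dB) of threshold energy levels for the
    (duration_ms, threshold_db) pairs whose duration lies in [t_min_ms, t_max_ms].

    Energy level is threshold_db + 10*log10(duration_ms), i.e. dB relative to
    a 1 ms stimulus at 0 dB. A small spread is consistent with constant
    energy at threshold."""
    levels = [db + 10.0 * math.log10(t) for t, db in points
              if t_min_ms <= t <= t_max_ms]
    if len(levels) < 2:
        raise ValueError("need at least two points inside the duration window")
    return max(levels) - min(levels)


# Hypothetical thresholds constructed to follow constant energy exactly.
example = [(10.0, 30.0), (20.0, 30.0 - 10.0 * math.log10(2.0)), (100.0, 20.0),
           (300.0, 18.0)]  # the 300 ms point lies outside the window and is ignored
print(f"spread: {energy_spread_db(example):.2f} dB")
```

Restricting the check to a duration window mirrors the text's claim, which covers only roughly 10 to 100 ms; outside that range the energy levels are not expected to agree.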