
The Digital Signal Processing Handbook: Video, Speech, and Audio Signal Processing, by Vijay K. Madisetti (ISBN 9781420046083, 142004608X)

The Digital Signal Processing Handbook
SECOND EDITION

Video, Speech, and Audio Signal Processing
and Associated Standards

EDITOR-IN-CHIEF

Vijay K. Madisetti

Boca Raton London New York

CRC Press is an imprint of the
Taylor & Francis Group, an Informa business
The Electrical Engineering Handbook Series
Series Editor
Richard C. Dorf
University of California, Davis

Titles Included in the Series

The Handbook of Ad Hoc Wireless Networks, Mohammad Ilyas
The Avionics Handbook, Second Edition, Cary R. Spitzer
The Biomedical Engineering Handbook, Third Edition, Joseph D. Bronzino
The Circuits and Filters Handbook, Second Edition, Wai-Kai Chen
The Communications Handbook, Second Edition, Jerry Gibson
The Computer Engineering Handbook, Vojin G. Oklobdzija
The Control Handbook, William S. Levine
The CRC Handbook of Engineering Tables, Richard C. Dorf
The Digital Avionics Handbook, Second Edition, Cary R. Spitzer
The Digital Signal Processing Handbook, Second Edition, Vijay K. Madisetti
The Electrical Engineering Handbook, Second Edition, Richard C. Dorf
The Electric Power Engineering Handbook, Second Edition, Leonard L. Grigsby
The Electronics Handbook, Second Edition, Jerry C. Whitaker
The Engineering Handbook, Third Edition, Richard C. Dorf
The Handbook of Formulas and Tables for Signal Processing, Alexander D. Poularikas
The Handbook of Nanoscience, Engineering, and Technology, Second Edition, William A. Goddard, III, Donald W. Brenner, Sergey E. Lyshevski, and Gerald J. Iafrate
The Handbook of Optical Communication Networks, Mohammad Ilyas and Hussein T. Mouftah
The Industrial Electronics Handbook, J. David Irwin
The Measurement, Instrumentation, and Sensors Handbook, John G. Webster
The Mechanical Systems Design Handbook, Osita D.I. Nwokah and Yildirim Hurmuzlu
The Mechatronics Handbook, Second Edition, Robert H. Bishop
The Mobile Communications Handbook, Second Edition, Jerry D. Gibson
The Ocean Engineering Handbook, Ferial El-Hawary
The RF and Microwave Handbook, Second Edition, Mike Golio
The Technology Management Handbook, Richard C. Dorf
The Transforms and Applications Handbook, Second Edition, Alexander D. Poularikas
The VLSI Handbook, Second Edition, Wai-Kai Chen
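The Digital Signal Processing Handbook, Second Edition

Digital Signal Processing Fundamentals
Video, Speech, and Audio Signal Processing and Associated Standards
Wireless, Networking, Radar, Sensor Array Processing, and Nonlinear Signal Processing

MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy
of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products does not
constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the
MATLAB® software.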
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy
of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products does not consti-
tute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB®
software.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2010 by Taylor and Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper


10 9 8 7 6 5 4 3 2 1

International Standard Book Number: 978-1-4200-4608-3 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to
publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials
or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material
reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any
form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming,
and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/)
or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400.
CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been
granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification
and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Video, speech, and audio signal processing and associated standards / Vijay K. Madisetti.
p. cm.
“Second edition of the DSP Handbook has been divided into three parts.”
Includes bibliographical references and index.
ISBN 978-1-4200-4608-3 (alk. paper)
1. Signal processing--Digital techniques--Standards. 2. Digital video--Standards. 3. Image
processing--Digital techniques--Standards. 4. Speech processing systems--Standards. 5. Sound--Recording
and reproducing--Digital techniques--Standards. I. Madisetti, V. (Vijay) II. Digital signal processing
handbook. III. Title.

TK5102.9.V493 2009
621.382’2--dc22 2009022594

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com

and the CRC Press Web site at
http://www.crcpress.com
Contents

Preface
Editor
Contributors

PART I Digital Audio Communications
Nikil Jayant
1 Auditory Psychophysics for Coding Applications
Joseph L. Hall
2 MPEG Digital Audio Coding Standards
Schuyler R. Quackenbush and Peter Noll
3 Dolby Digital Audio Coding Standards
Robert L. Andersen and Grant A. Davidson
4 The Perceptual Audio Coder
Deepen Sinha, James D. Johnston, Sean Dorward, and Schuyler R. Quackenbush
5 Sony Systems
Kenzo Akagiri, Masayuki Katakura, H. Yamauchi, E. Saito, M. Kohut,
Masayuki Nishiguchi, Kyoya Tsutsui, and Keisuke Toyama

PART II Speech Processing
Richard V. Cox and Lawrence R. Rabiner
6 Speech Production Models and Their Digital Implementations
M. Mohan Sondhi and Juergen Schroeter
7 Speech Coding
Richard V. Cox
8 Text-to-Speech Synthesis
Richard Sproat and Joseph Olive
9 Speech Recognition by Machine
Lawrence R. Rabiner and Biing-Hwang Juang
10 Speaker Verification
Sadaoki Furui and Aaron E. Rosenberg
11 DSP Implementations of Speech Processing
Kurt Baudendistel
12 Software Tools for Speech Research and Development
John Shore

PART III Image and Video Processing
Jan Biemond and Russell M. Mersereau
13 Fundamentals of Image Processing
Ian T. Young, Jan J. Gerbrands, and Lucas J. van Vliet
14 Still Image Compression
Tor A. Ramstad
15 Image and Video Restoration
A. Murat Tekalp
16 Video Scanning Format Conversion and Motion Estimation
Gerard de Haan and Ralph Braspenning
17 Document Modeling and Source Representation in Content-Based Image Retrieval
Soo Hyun Bae and Biing-Hwang Juang
18 Technologies for Context-Based Video Search over the World Wide Web
Arshdeep Bahga and Vijay K. Madisetti
19 Image Interpolation
Yucel Altunbasak
20 Video Sequence Compression
Osama Al-Shaykh, Ralph Neff, David Taubman, and Avideh Zakhor
21 Digital Television
Kou-Hu Tzou
22 Stereoscopic Image Processing
Reginald L. Lagendijk, Ruggero E. H. Franich, and Emile A. Hendriks
23 A Survey of Image Processing Software and Image Databases
Stanley J. Reeves
24 VLSI Architectures for Image Communications
P. Pirsch and W. Gehrke
Index
Preface

Digital signal processing (DSP) is concerned with the theoretical and practical aspects of representing
information-bearing signals in a digital form and with using computers, special-purpose hardware and
software, or similar platforms to extract information, process it, or transform it in useful ways. Areas
where DSP has made a significant impact include telecommunications, wireless and mobile communications,
multimedia applications, user interfaces, medical technology, digital entertainment, radar and
sonar, seismic signal processing, and remote sensing, to name just a few.
Given the widespread use of DSP, a need developed for an authoritative reference, written by the top
experts in the world, that would provide information on both theoretical and practical aspects in a
manner that was suitable for a broad audience—ranging from professionals in electrical engineering,
computer science, and related engineering and scientific professions to managers involved in technical
marketing, and to graduate students and scholars in the field. Given the abundance of basic and
introductory texts on DSP, it was important to focus on topics that were useful to engineers and scholars
without overemphasizing those topics that were already widely accessible. In short, the DSP handbook
was created to be relevant to the needs of the engineering community.
A task of this magnitude could only be possible through the cooperation of some of the foremost DSP
researchers and practitioners. That collaboration, over 10 years ago, produced the first edition of the
successful DSP handbook that contained a comprehensive range of DSP topics presented with a clarity of
vision and a depth of coverage to inform, educate, and guide the reader. Indeed, many of the chapters,
written by leaders in their field, have guided readers through a unique vision and perception garnered by
the authors through years of experience.
The second edition of the DSP handbook consists of three parts: Digital Signal Processing Fundamentals;
Video, Speech, and Audio Signal Processing and Associated Standards; and Wireless, Networking, Radar,
Sensor Array Processing, and Nonlinear Signal Processing. This division ensures that each part is dealt
with in adequate detail and is able to develop its own identity and role in terms of its educational
mission and audience. I expect each part to be frequently updated with chapters that reflect the changes
and new developments in the technology and in the field. The distribution model for the DSP handbook
also reflects the increasing need by professionals to access content in electronic form anywhere and
at any time.
Video, Speech, and Audio Signal Processing and Associated Standards, as the name implies, provides
comprehensive coverage of the basic foundations of speech, audio, image, and video processing and
associated applications to broadcast, storage, search and retrieval, and communications.
This book needs to be continuously updated to include newer aspects of these technologies, and I look
forward to suggestions on how this handbook can be improved to serve you better.


MATLAB® is a registered trademark of The MathWorks, Inc. For product information, please
contact:
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098 USA
Tel: 508 647 7000
Fax: 508-647-7001
E-mail: [email protected]
Web: www.mathworks.com
Editor

Vijay K. Madisetti is a professor in the School of Electrical and Computer Engineering at the Georgia
Institute of Technology in Atlanta. He teaches graduate and undergraduate courses in digital signal
processing and computer engineering, and leads a strong research program in digital signal processing,
telecommunications, and computer engineering.
Dr. Madisetti received his BTech (Hons) in electronics and electrical communications engineering in 1984
from the Indian Institute of Technology, Kharagpur, India, and his PhD in electrical engineering and
computer sciences in 1989 from the University of California at Berkeley. He has authored or edited
several books in the areas of digital signal processing, computer engineering, and software systems, and
has served extensively as a consultant to industry and the government. He is a fellow of the IEEE and
received the 2006 Frederick Emmons Terman Medal from the American Society for Engineering Education
for his contributions to electrical engineering.

Contributors

Kenzo Akagiri, Sony Corporation, Tokyo, Japan
Osama Al-Shaykh, Packet Video, San Diego, California
Yucel Altunbasak, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia
Robert L. Andersen, Dolby Laboratories, Inc., San Francisco, California
Soo Hyun Bae, Sony U.S. Research Center, San Jose, California
Arshdeep Bahga, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia
Kurt Baudendistel, Momentum Data Systems, Fountain Valley, California
Jan Biemond, Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, Delft, the Netherlands
Ralph Braspenning, Philips Research Laboratories, Eindhoven, the Netherlands
Richard V. Cox, AT&T Research Labs, Florham Park, New Jersey
Grant A. Davidson, Dolby Laboratories, Inc., San Francisco, California
Sean Dorward, Bell Laboratories, Lucent Technologies, Murray Hill, New Jersey
Ruggero E. H. Franich, AEA Technology, Culham Laboratory, Oxfordshire, United Kingdom
Sadaoki Furui, Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
W. Gehrke, Philips Semiconductors, Hamburg, Germany
Jan J. Gerbrands, Department of Electrical Engineering, Delft University of Technology, Delft, the Netherlands
Gerard de Haan, Philips Research Laboratories, Eindhoven, the Netherlands, and Information Communication Systems Group, Eindhoven University of Technology, Eindhoven, the Netherlands
Joseph L. Hall, Bell Laboratories, Lucent Technologies, Murray Hill, New Jersey
Emile A. Hendriks, Information and Communication Theory Group, Delft University of Technology, Delft, the Netherlands
Nikil Jayant, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia
James D. Johnston, AT&T Research Labs, Florham Park, New Jersey
Biing-Hwang Juang, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia
Masayuki Katakura, Sony Corporation, Kanagawa, Japan
M. Kohut, Sony Corporation, San Diego, California
Reginald L. Lagendijk, Information and Communication Theory Group, Delft University of Technology, Delft, the Netherlands
Vijay K. Madisetti, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia
Russell M. Mersereau, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia
Ralph Neff, Video and Image Processing Laboratory, University of California, Berkeley, California
Masayuki Nishiguchi, Sony Corporation, Tokyo, Japan
Peter Noll, Institute for Telecommunications, Technical University of Berlin, Berlin, Germany
Joseph Olive, Bell Laboratories, Lucent Technologies, Murray Hill, New Jersey
P. Pirsch, Laboratory for Information Technology, University of Hannover, Hannover, Germany
Schuyler R. Quackenbush, Audio Research Labs, Scotch Plains, New Jersey, and AT&T Research Labs, Florham Park, New Jersey
Lawrence R. Rabiner, Department of Electrical and Computer Engineering, Rutgers University, New Brunswick, New Jersey, and AT&T Research Labs, Florham Park, New Jersey
Tor A. Ramstad, Department of Electronics and Telecommunications, Norwegian University of Science and Technology, Trondheim, Norway
Stanley J. Reeves, Electrical and Computer Engineering Department, Auburn University, Auburn, Alabama
Aaron E. Rosenberg, Center for Advanced Information Processing, Rutgers University, Piscataway, New Jersey
E. Saito, Sony Corporation, Kanagawa, Japan
Juergen Schroeter, AT&T Research Labs, Florham Park, New Jersey
John Shore, Entropic Research Laboratory, Inc., Washington, District of Columbia
Deepen Sinha, Bell Laboratories, Lucent Technologies, Murray Hill, New Jersey
M. Mohan Sondhi, Bell Laboratories, Lucent Technologies, Murray Hill, New Jersey
Richard Sproat, Bell Laboratories, Lucent Technologies, Murray Hill, New Jersey
David Taubman, Hewlett Packard, Palo Alto, California
A. Murat Tekalp, Department of Electrical and Electronics Engineering, Koç University, Istanbul, Turkey
Keisuke Toyama, Sony Corporation, Tokyo, Japan
Kyoya Tsutsui, Sony Corporation, Tokyo, Japan
Kou-Hu Tzou, Hyundai Network Systems, Seoul, Korea
Lucas J. van Vliet, Department of Imaging Science and Technology, Delft University of Technology, Delft, the Netherlands
H. Yamauchi, Sony Corporation, Kanagawa, Japan
Ian T. Young, Department of Imaging Science and Technology, Delft University of Technology, Delft, the Netherlands
Avideh Zakhor, Video and Image Processing Laboratory, University of California, Berkeley, California
I
Digital Audio Communications
Nikil Jayant
Georgia Institute of Technology

1 Auditory Psychophysics for Coding Applications, Joseph L. Hall
Introduction · Definitions · Summary of Relevant Psychophysical Data · Conclusions · References
2 MPEG Digital Audio Coding Standards, Schuyler R. Quackenbush and Peter Noll
Introduction · Key Technologies in Audio Coding · MPEG-1/Audio Coding · MPEG-2/Audio Multichannel Coding ·
MPEG-4/Audio Coding · MPEG-D/Audio Coding · Applications · Conclusions · References
3 Dolby Digital Audio Coding Standards, Robert L. Andersen and Grant A. Davidson
Introduction · AC-3 Audio Coding · Enhanced AC-3 Audio Coding · Conclusions · References
4 The Perceptual Audio Coder, Deepen Sinha, James D. Johnston, Sean Dorward, and Schuyler R. Quackenbush
Introduction · Applications and Test Results · Perceptual Coding · Multichannel PAC · Bitstream Formatter ·
Decoder Complexity · Conclusions · References
5 Sony Systems, Kenzo Akagiri, Masayuki Katakura, H. Yamauchi, E. Saito, M. Kohut,
Masayuki Nishiguchi, Kyoya Tsutsui, and Keisuke Toyama
Introduction · Oversampling AD and DA Conversion Principle · The SDDS System for Digitizing Film Sound ·
Switched Predictive Coding of Audio Signals for the CD-I and CD-ROM XA Format · ATRAC Family · References

As I predicted in the section introduction for the 1997 version of this book, digital audio communications
has become nearly as prevalent as digital speech communications. In particular, new technologies for audio
storage and transmission have made music and wideband signals available in a flexible variety of standard formats.


The fundamental underpinning for these technologies is audio compression based on perceptually
tuned shaping of the quantization noise. Chapter 1 in this part describes aspects of psychoacoustics that
have led to the general foundations of “perceptual audio coding.” Succeeding chapters in this part cover
established examples of “perceptual audio coders.” These include the MPEG standards and coders developed
by Dolby, Sony, and Bell Laboratories.
The dimensions of coder performance are quality, bit rate, delay, and complexity. The quality vs. bit
rate trade-offs are particularly important.
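The perceptually tuned noise shaping mentioned above can be illustrated with a deliberately simplified bit-allocation sketch. It assumes a uniform quantizer, for which each additional bit lowers the quantization noise by about 6.02 dB, and it takes per-band signal-to-mask ratios (SMRs) as given; the SMR values and the `allocate_bits` helper are hypothetical, and this is not the allocation procedure of MPEG, Dolby, Sony, or PAC coders, which work iteratively under a fixed bit budget.

```python
import math

# Approximate SNR gained per bit by a uniform quantizer: 20*log10(2) ~ 6.02 dB.
DB_PER_BIT = 20 * math.log10(2)


def allocate_bits(smr_db, max_bits=16):
    """Give each subband just enough bits to push its noise below the mask.

    smr_db: hypothetical signal-to-mask ratios in dB, one per subband.
    A band with SMR <= 0 dB is assumed fully masked and receives no bits.
    """
    bits = []
    for smr in smr_db:
        if smr <= 0:
            bits.append(0)  # quantization noise would be masked anyway
        else:
            # Smallest bit count whose noise reduction covers the SMR.
            bits.append(min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return bits


print(allocate_bits([25.0, 12.0, 3.0, -4.0]))
```

Bands with a high SMR (little masking) get more bits, while masked bands get none; the noise is thereby shaped to follow the masking threshold rather than being spread evenly across frequency.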

Audio Quality
The three parameters of digital audio quality are “signal bandwidth,” “fidelity,” and “spatial realism.”
Compact-disc (CD) signals have a bandwidth of 20–20,000 Hz, while traditional telephone speech has
a bandwidth of 200–3400 Hz. Intermediate bandwidths characterize various grades of wideband speech
and audio, including roughly defined ranges of quality referred to as AM radio and FM radio quality
(bandwidths on the order of 7–10 and 12–15 kHz, respectively).
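Each bandwidth grade above implies a minimum sampling rate, because a band-limited signal must be sampled at more than twice its highest frequency (the Nyquist rate). The sketch below tabulates that bound; taking the upper end of the quoted AM and FM ranges is an assumption made for illustration.

```python
# Upper band edges (Hz) for the quality grades quoted in the text. The AM and
# FM entries use the upper end of the quoted ranges (an illustrative assumption).
BANDWIDTHS_HZ = {
    "telephone speech": 3_400,
    "AM-radio quality": 10_000,
    "FM-radio quality": 15_000,
    "compact disc": 20_000,
}


def min_sampling_rate(upper_edge_hz):
    """Nyquist rate: twice the highest frequency present in the signal.

    A band-limited signal must be sampled faster than this to be
    reconstructed without aliasing.
    """
    return 2 * upper_edge_hz


for grade, edge in BANDWIDTHS_HZ.items():
    print(f"{grade:<18} upper edge {edge:>6} Hz -> Nyquist rate {min_sampling_rate(edge):>6} Hz")
```

Practical systems sample somewhat above the Nyquist rate to leave room for anti-aliasing filters; telephone speech is sampled at 8 kHz and the compact disc at 44.1 kHz.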
In the context of digital coding, fidelity refers to the level of perceptibility of quantization or
reconstruction noise. The highest level of fidelity is one where the noise is imperceptible in formal
listening tests. Lower levels of fidelity are acceptable in some applications if the noise is not annoying,
although in general it is good practice, for a given coding bit rate, to sacrifice some bandwidth in the
interest of greater fidelity. Five-point scales of signal fidelity are common in both speech and audio coding.
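One widely used five-point scale in audio is the five-grade impairment scale used in ITU-R BS.1116 listening tests, whose wording matches the "not annoying" criterion above. The sketch averages a set of listener grades and reports the nearest scale label; the grades in the usage example and the `summarize` helper are hypothetical, and formal tests add reference conditions and statistical analysis that are omitted here.

```python
from statistics import mean

# Five-grade impairment scale (as used, e.g., in ITU-R BS.1116 listening tests).
IMPAIRMENT_SCALE = {
    5: "imperceptible",
    4: "perceptible, but not annoying",
    3: "slightly annoying",
    2: "annoying",
    1: "very annoying",
}


def summarize(grades):
    """Return the mean grade and the label of the nearest scale point."""
    if not grades:
        raise ValueError("need at least one grade")
    if any(g < 1 or g > 5 for g in grades):
        raise ValueError("grades must lie on the 1-5 scale")
    avg = mean(grades)
    nearest = min(IMPAIRMENT_SCALE, key=lambda point: abs(point - avg))
    return avg, IMPAIRMENT_SCALE[nearest]


# Hypothetical grades from five listeners for one coded excerpt.
print(summarize([5, 4, 4, 5, 4]))
```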
Spatial realism is generally provided by increasing the number of coded (and reproduced) spatial
channels. Common formats are 1-channel (mono), 2-channel (stereo), 5-channel (3 front, 2 rear),
5.1-channel (5-channel plus subwoofer), and 8-channel (6 front, 2 rear). For given constraints on
bandwidth and fidelity, the required bit rate increases with the number of channels, but more slowly than
linearly because of interchannel redundancy. The notion of perceptual coding, originally developed to exploit
the perceptual irrelevancies of a single-channel audio signal, also extends to the methods used to exploit
interchannel redundancy.
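One widely used way to exploit interchannel redundancy is sum/difference (mid/side, M/S) stereo coding, which MPEG-1 Layer III offers in its joint-stereo mode. The sketch below is a minimal illustration on a hypothetical, strongly correlated stereo pair; it shows only the energy compaction, not the method of any particular coder discussed in this part.

```python
import math


def mid_side(left, right):
    """Sum/difference transform: M = (L + R) / 2, S = (L - R) / 2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def inverse_mid_side(mid, side):
    """Exact inverse of the transform above: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(x):
    return sum(v * v for v in x)


# Hypothetical, strongly correlated stereo pair: the same tone in both
# channels, with the right channel 10% quieter.
N = 64
left = [math.sin(2 * math.pi * 5 * k / N) for k in range(N)]
right = [0.9 * v for v in left]

mid, side = mid_side(left, right)
print(f"L/R energies: {energy(left):.2f}, {energy(right):.2f}")
print(f"M/S energies: {energy(mid):.2f}, {energy(side):.4f}")
```

Almost all of the energy moves into M, so S can be quantized coarsely; since the transform is invertible, nothing is lost before quantization. Coders using this technique generally choose between L/R and M/S coding adaptively, because M/S gains little when the channels are weakly correlated.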

Bit Rate
The CD-stereo signal has a digital representation rate of about 1411 kilobits per second (kbps), that is,
44.1 kHz sampling × 16 bits per sample × 2 channels. Current technology for perceptual audio coding,
notably MP3 audio, reproduces CD-stereo with near-perfect fidelity at bit rates as low as 128 kbps,
depending on the input signal. CD-like reproduction is possible at bit rates as low as 64 kbps for stereo.
Single-channel reproduction of FM-radio-like music is possible at 32 kbps. Single-channel reproduction of
AM-radio-like music and wideband speech is possible at rates approaching 16 kbps for all but the most
demanding signals. Techniques for so-called pseudo-stereo can provide additional enhancement of digital
single-channel audio.
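The bit-rate arithmetic can be checked directly. The sketch assumes the standard CD format (44.1 kHz sampling, 16-bit samples) and expresses each coded rate quoted above as a compression ratio relative to uncompressed CD-format PCM with the same number of channels; the row labels are paraphrases of the text, not standard names.

```python
def pcm_bit_rate_kbps(fs_hz=44_100, bits_per_sample=16, channels=2):
    """Uncompressed PCM bit rate in kbps (1 kbps = 1000 bit/s)."""
    return fs_hz * bits_per_sample * channels / 1000


def compression_ratio(source_kbps, coded_kbps):
    return source_kbps / coded_kbps


stereo = pcm_bit_rate_kbps()          # CD stereo
mono = pcm_bit_rate_kbps(channels=1)  # one CD-format channel

# Coded rates quoted in the text; the 32 and 16 kbps cases are single-channel.
cases = [
    ("stereo, near-CD fidelity", stereo, 128),
    ("stereo, CD-like", stereo, 64),
    ("mono, FM-radio-like", mono, 32),
    ("mono, AM-radio-like", mono, 16),
]

print(f"CD stereo PCM: {stereo:.1f} kbps")
for label, source, coded in cases:
    ratio = compression_ratio(source, coded)
    print(f"{label:<26} {coded:>4} kbps  ratio {ratio:5.1f}:1")
```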

Applications of Digital Audio
The capabilities of audio compression have combined with increasingly affordable implementations on
platforms for digital signal processing (DSP), native signal processing (NSP) in a computer’s (native)
processor, and application-specific integrated circuits (ASICs) to create revolutionary applications of
digital audio. International and national standards have contributed immensely to this revolution. Some
of these standards specify only the bit-stream syntax and decoder, leaving room for future, sometimes
proprietary, enhancements of the encoding algorithm.
The domains of application include “transmission” (e.g., digital audio broadcasting), “storage” (e.g.,
the iPod and the digital versatile disk [DVD]), and “networking” (music preview, distribution, and
publishing). The networking applications, aided by significant advances in broadband access speeds, have
made digital audio communications as commonplace as digital telephony.

The Future of Digital Audio
Remarkable as the capabilities and applications mentioned above are, there are even greater challenges
and opportunities for the practitioners of digital audio technology. It is unlikely that we have reached or
even approached the fundamental limits of performance in terms of audio quality at a given bit rate.
Newer capabilities in this technology (in terms of audio fidelity, bandwidth, and spatial realism) will
continue to lead to newer classes of applications in audio communications. New technologies for
universal coding will create interesting new options for digital networking and seamless communication
of speech and music signals. Advances in multichannel audio capture and reproduction will lead to more
sophisticated and user-friendly technologies for telepresence. In the entertainment domain, the audio
dimension will continue to enhance the overall quality of visual formats such as multiplayer games and
3D television.
1
Auditory Psychophysics for Coding Applications
Joseph L. Hall, Lucent Technologies

1.1 Introduction
1.2 Definitions
Loudness · Pitch · Threshold of Hearing · Differential Threshold · Masked Threshold · Critical Bands and Peripheral Auditory Filters
1.3 Summary of Relevant Psychophysical Data
Loudness · Differential Thresholds · Masking
1.4 Conclusions
References

In this chapter, we review properties of auditory perception that are relevant to the design of coders for
acoustic signals. The chapter begins with a general definition of a perceptual coder, then considers what
the ‘‘ideal’’ psychophysical model would consist of, and what use a coder could be expected to make of
this model. We then present some basic definitions and concepts. The chapter continues with a review
of relevant psychophysical data, including results on threshold, just-noticeable differences (JNDs),
masking, and loudness. Finally, we attempt to summarize the present state of the art, the capabilities
and limitations of present-day perceptual coders for audio and speech, and what areas most need work.

1.1 Introduction
A coded signal differs in some respect from the original signal. One task in designing a coder is to
minimize some measure of this difference under the constraints imposed by bit rate, complexity, or cost.
What is the appropriate measure of difference? The most straightforward approach is to minimize some
physical measure of the difference between original and coded signal. The designer might attempt to
minimize the RMS difference between the original and coded waveforms, or perhaps the difference between
original and coded power spectra on a frame-by-frame basis. However, if the purpose of the coder is
to encode acoustic signals that are eventually to be listened to* by people, these physical measures do
not directly address the appropriate issue. For signals that are to be listened to by people, the ‘‘best’’ coder
is the one that sounds the best. There is a very clear distinction between ‘‘physical’’ and ‘‘perceptual’’
measures of a signal (frequency vs. pitch, intensity vs. loudness, for example). A perceptual coder can be
defined as a coder that minimizes some measure of the difference between original and coded signal so
as to minimize the perceptual impact of the coding noise. We can define the best coder given a particular
set of constraints as the one in which the coding noise is least objectionable.

* Perceptual coding is not limited to speech and audio. It can be applied also to image and video [16]. In this chapter we
consider only coders for acoustic signals.

1-2 Video, Speech, and Audio Signal Processing and Associated Standards

It follows that the designer of a perceptual coder needs some way to determine the perceptual quality
of a coded signal. ‘‘Perceptual quality’’ is a poorly defined concept, and it will be seen that in some
sense it cannot be uniquely defined. We can, however, attempt to provide a partial answer to the question
of how it can be determined. We can present something of what is known about human auditory
perception from psychophysical listening experiments and show how these phenomena relate to the
design of a coder.
One requirement for successful design of a perceptual coder is a satisfactory model for the signal-
dependent sensitivity of the auditory system. Present-day models are incomplete, but we can attempt to
specify what the properties of a complete model would be. One possible specification is that, for any
given waveform (the signal), it accurately predicts the loudness, as a function of pitch and of time, of
any added waveform (the noise). If we had such a complete model, then we would in principle be able to
build a transparent coder, defined as one in which the coded signal is indistinguishable from the original
signal, or at least we would be able to determine whether or not a given coder was transparent. It is
relatively simple to design a psychophysical listening experiment to determine whether the coding noise
is audible, or equivalently, whether the subject can distinguish between original and coded signal.
Any subject with normal hearing could be expected to give similar results to this experiment. While
present-day models are far from complete, we can at least describe the properties of a complete model.
There is a second requirement that is more difficult to satisfy. This is the need to be able to determine
which of two coded samples, each of which has audible coding noise, is preferable. While a satisfactory
model for the signal-dependent sensitivity of the auditory system is in principle sufficient for the design
of a transparent coder, the question of how to build the best nontransparent coder does not have a unique
answer. Often, design constraints preclude building a transparent coder. Even the best coder built under
these constraints will result in audible coding noise, and it is under some conditions impossible to specify
uniquely how best to distribute this noise. One listener may prefer the more intelligible version, while
another may prefer the more natural sounding version. The preferences of even a single listener might
very well depend on the application. In the absence of any better criterion, we can attempt to minimize
the loudness of the coding noise, but it must be understood that this is an incomplete solution.
Our purpose in this chapter is to present something of what is known about human auditory
perception in a form that may be useful to the designer of a perceptual coder. We do not attempt to
answer the question of how this knowledge is to be utilized, that is, how to build a coder. Present-day perceptual
coders for the most part utilize a ‘‘feedforward’’ paradigm: analysis of the signal to be coded produces
specifications for allowable coding noise. Perhaps a more general method is a ‘‘feedback’’ paradigm, in
which the perceptual model somehow makes possible a decision as to which of two coded signals is
‘‘better.’’ This decision process can then be iterated to arrive at some optimum solution. It will be seen
that for proper exploitation of some aspects of auditory perception the feedforward paradigm may be
inadequate and the potentially more time-consuming feedback paradigm may be required. How this is to
be done is part of the challenge facing the designer.

1.2 Definitions
In this section, we define some fundamental terms and concepts and clarify the distinction between
physical and perceptual measures.

1.2.1 Loudness
When we increase the intensity of a stimulus its loudness increases, but that does not mean that intensity
and loudness are the same thing. ‘‘Intensity’’ is a physical measure. We can measure the intensity of a
signal with an appropriate measuring instrument, and if the measuring instrument is standardized
and calibrated correctly, anyone else anywhere in the world can measure the same signal and get the
same result. ‘‘Loudness’’ is ‘‘perceptual magnitude.’’ It can be defined as ‘‘that attribute of auditory
sensation in terms of which sounds can be ordered on a scale extending from quiet to loud’’ [23, p. 47].
We cannot measure it directly. All we can do is ask questions of a subject and from the responses attempt
to infer something about loudness. Furthermore, we have no guarantee that a particular stimulus will be
as loud for one subject as for another. The best we can do is assume that, for a particular stimulus,
loudness judgments for one group of normal-hearing people will be similar to loudness judgments for
another group.
There are two commonly used measures of loudness. One is ‘‘loudness level’’ (unit phon) and the other
is ‘‘loudness’’ (unit sone). These two measures differ in what they describe and how they are obtained.
The phon is defined as the intensity, in dB sound pressure level (SPL), of an equally loud 1 kHz tone. The
sone is defined in terms of subjectively measured loudness ratios. A stimulus half as loud as a one-sone
stimulus has a loudness of 0.5 sones, a stimulus 10 times as loud has a loudness of 10 sones, etc. A 1 kHz
tone at 40 dB SPL is arbitrarily defined to have a loudness of one sone.
The argument can be made that loudness matching, the procedure used to obtain the phon scale, is a
less subjective procedure than loudness scaling, the procedure used to obtain the sone scale. This
argument would lead to the conclusion that the phon is the more objective of the two measures and
that the sone is more subject to individual variability. This argument breaks down on two counts: first, for
dissimilar stimuli even the supposedly straightforward loudness-matching task is subject to large and
poorly understood order and bias effects that can only be described as subjective. While loudness
matching of two equal-frequency tone bursts generally gives stable and repeatable results, the task
becomes more difficult when the frequencies of the two tone bursts differ. Loudness matching between
two dissimilar stimuli, as for example between a pure tone and a multicomponent complex signal, is even
more difficult and yields less stable results. Loudness-matching experiments have to be designed
carefully, and results from these experiments have to be interpreted with caution. Second, it is possible
to measure loudness in sones, at least approximately, by means of a loudness-matching procedure.
Fletcher [6] states that under some conditions loudness adds. Binaural presentation of a stimulus results
in loudness doubling; and two equally loud stimuli, far enough apart in frequency that they do not mask
each other, are twice as loud as one. If loudness additivity holds, then it follows that the sone scale can be
generated by matching loudness of a test stimulus to binaural stimuli or to pairs of tones. This approach
must be treated with caution. As Fletcher states, ‘‘However, this method [scaling] is related more directly
to the scale we are seeking (the sone scale) than the two preceding ones (binaural or monaural loudness
additivity)’’ [6, p. 278]. The loudness additivity approach relies on the assumption that loudness
summation is perfect, and there is some more recent evidence [28,33] that loudness summation, at
least for binaural vs. monaural presentation, is not perfect.

1.2.2 Pitch
The American Standards Association defines pitch as ‘‘that attribute of auditory sensation in which
sounds may be ordered on a musical scale.’’ Pitch bears much the same relationship to frequency
as loudness does to intensity: frequency is an objective physical measure, while pitch is a subjective
perceptual measure. Just as there is not a one-to-one relationship between intensity and loudness,
so also there is not a one-to-one relationship between frequency and pitch. Under some conditions, for
example, loudness can be shown to decrease with decreasing frequency with intensity held constant, and
pitch can be shown to decrease with increasing intensity with frequency held constant [40, p. 409].

1.2.3 Threshold of Hearing


Since the concept of threshold is basic to much of what follows, it is worthwhile at this point to discuss it
in some detail. It will be seen that thresholds are determined not only by the stimulus and the observer
but also by the method of measurement. While this discussion is phrased in terms of threshold of
hearing, much of what follows applies as well to differential thresholds (JNDs) discussed in Section 1.2.4.

By the simplest definition, the threshold of hearing (equivalently, auditory threshold) is the lowest
intensity that the listener can hear. This definition is inadequate because we cannot directly measure
the listener’s perception. A first-order correction, therefore, is that the threshold of hearing is the lowest
intensity that elicits from the listener the response that the sound is audible. Given this definition, we can
present a stimulus to the listener and ask whether he or she can hear it. If we do this, we soon find that
identical stimuli do not always elicit identical responses. In general, the probability of a positive response
increases with increasing stimulus intensity and can be described by a ‘‘psychometric function’’ such as
that shown for a hypothetical experiment in Figure 1.1. Here the stimulus intensity (in dB) appears on
the abscissa and the probability P(C) of a positive response appears on the ordinate. The yes–no
experiment could be described by a psychometric function that ranges from zero to one, and threshold
could be defined as the stimulus intensity that elicits a positive response in 50% of the trials.
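The yes–no psychometric function and its 50% threshold can be sketched as follows, together with the 2IFC variant that runs from 0.5 to 1. This is a minimal illustration under stated assumptions: the cumulative-normal shape and its spread parameter `spread_db` are assumptions (the chapter does not commit to a functional form), and scaling the 2IFC curve as chance plus (1 − chance) times the yes–no curve is likewise an idealization. The helper `threshold_from_curve` is a hypothetical name; it finds, by linear interpolation, the level at which sampled probabilities cross the midpoint between chance and 1.

```python
from statistics import NormalDist

def psychometric(x_db, threshold_db=0.0, spread_db=1.0, chance=0.0):
    """Idealized psychometric function: probability of a positive (yes-no)
    or correct (forced-choice) response vs. stimulus intensity in dB.
    The cumulative-normal shape and spread_db are assumptions;
    `chance` is the floor: 0.0 for yes-no, 0.5 for 2IFC."""
    f = NormalDist(mu=threshold_db, sigma=spread_db).cdf(x_db)
    return chance + (1.0 - chance) * f

def threshold_from_curve(levels_db, probs, chance=0.0):
    """Level at which the sampled curve crosses the midpoint between chance
    and 1 (0.5 for yes-no, 0.75 for 2IFC), by linear interpolation."""
    target = 0.5 * (chance + 1.0)
    points = list(zip(levels_db, probs))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= target <= p1:
            if p1 == p0:
                return x0
            return x0 + (target - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("target probability is not bracketed by the samples")

levels = [x * 0.5 for x in range(-8, 9)]                  # -4 dB to +4 dB, 0.5 dB steps
yes_no = [psychometric(x) for x in levels]                # ranges from 0 to 1
two_ifc = [psychometric(x, chance=0.5) for x in levels]   # ranges from 0.5 to 1
print(threshold_from_curve(levels, yes_no))               # level of the 50% point
print(threshold_from_curve(levels, two_ifc, chance=0.5))  # level of the 75% point
```

With both curves built around a threshold of 0 dB, the yes–no curve crosses 0.5 and the 2IFC curve crosses 0.75 at the same level, which is the correspondence the idealized curves of Figure 1.1 suggest.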
A difficulty with the simple yes–no experiment is that we have no control over the subject’s ‘‘criterion
level.’’ The subject may be using a strict criterion (‘‘yes’’ only if the signal is definitely present) or a lax
criterion (‘‘yes’’ if the signal might be present). The subject can respond correctly either by a positive
response in the presence of a stimulus (‘‘hit’’) or by a negative response in the absence of a stimulus
(‘‘correct rejection’’). Similarly, the subject can respond incorrectly either by a negative response in the
presence of a stimulus (‘‘miss’’) or by a positive response in the absence of a stimulus (‘‘false alarm’’).
Unless the experimenter is willing to use an elaborate and time-consuming procedure that involves
assigning rewards to correct responses and penalties to incorrect responses, the criterion level is
uncontrolled.
The field of psychophysics that deals with this complication is called ‘‘detection theory.’’ The field of
psychophysical detection theory is highly developed [12] and a complete description is far beyond the
scope of this chapter. Very briefly, the subject’s response is considered to be based on an internal
‘‘decision variable,’’ a random variable drawn from a distribution with mean and standard deviation
that depend on the stimulus. If we assume that the decision variable is normally distributed with a
fixed standard deviation $\sigma$ and a mean that depends only on stimulus intensity, then we can define
an ‘‘index of sensitivity’’ $d'$ for a given stimulus intensity as the difference between $\mu_s$ (the mean
in the presence of the stimulus) and $\mu_0$ (the mean in the absence of the stimulus), divided by $\sigma$:
$d' = (\mu_s - \mu_0)/\sigma$.

[Figure 1.1: probability of positive response (0.0 to 1.0) vs. stimulus intensity (dB re ‘‘threshold,’’ −4 to +4).]
FIGURE 1.1 Idealized psychometric functions for hypothetical yes–no experiment (0–1) and for hypothetical 2IFC
experiment (0.5–1).

An ‘‘ideal observer’’ (a hypothetical subject who does the best possible job for the task at hand) gives a
positive response if and only if the decision variable exceeds an internal criterion level. An increase in
criterion level decreases the probability of a false alarm and increases the probability of a miss.
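The equal-variance Gaussian model just described can be simulated directly. The sketch below uses hypothetical values of $\mu_0$, $\mu_s$, and $\sigma$ and computes hit and false-alarm rates for an ideal observer at three criterion levels. It also recovers $d'$ from those rates as $z(H) - z(F)$, the standard detection-theory estimate under this model (the estimator itself is not stated in this chapter); because the criterion cancels out of that difference, the estimate does not depend on where the observer sets it.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, sd 1, used for z-scores

def yes_no_rates(mu_0, mu_s, sigma, criterion):
    """Hit and false-alarm rates for an observer who responds 'yes' iff the
    decision variable exceeds `criterion` (equal-variance Gaussian model)."""
    hit = 1.0 - NormalDist(mu_s, sigma).cdf(criterion)          # 'yes' | stimulus
    false_alarm = 1.0 - NormalDist(mu_0, sigma).cdf(criterion)  # 'yes' | no stimulus
    return hit, false_alarm

def d_prime_from_rates(hit, false_alarm):
    """Estimate d' = (mu_s - mu_0) / sigma as z(hit) - z(false alarm)."""
    return STANDARD_NORMAL.inv_cdf(hit) - STANDARD_NORMAL.inv_cdf(false_alarm)

# Hypothetical parameters giving d' = 1; criteria run from lax to strict.
MU_0, MU_S, SIGMA = 0.0, 1.0, 1.0
results = []
for criterion in (0.0, 0.5, 1.0):
    hit, fa = yes_no_rates(MU_0, MU_S, SIGMA, criterion)
    results.append((criterion, hit, fa, d_prime_from_rates(hit, fa)))
    print(f"criterion={criterion:.1f}  hit={hit:.3f}  miss={1 - hit:.3f}  "
          f"false alarm={fa:.3f}  d'={results[-1][3]:.3f}")
```

Raising the criterion lowers both hits and false alarms (so misses rise), while the estimated $d'$ stays at 1.0.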
A simple and satisfactory way to deal with the problem of uncontrolled criterion level is to use a
‘‘criterion-free’’ experimental paradigm. The simplest is perhaps the two-interval forced choice (2IFC)
paradigm, in which the stimulus is presented at random in one of two observation intervals. The subject’s
task is to determine which of the two intervals contained the stimulus. The ideal observer selects the
interval that elicits the larger decision variable, and criterion level is no longer a factor. Now, the subject
has a 50% chance of choosing the correct interval even in the absence of any stimulus, so the
psychometric function goes from 0.5 to 1.0 as shown in Figure 1.1. A reasonable definition of threshold
is $P(C) = 0.75$, halfway between the chance level of 0.5 and 1. If the decision variable is normally
distributed with a fixed standard deviation, it can be shown that this definition of threshold corresponds
to a $d'$ of 0.95.
The number of intervals can be increased beyond two. In this case, the ideal observer responds
correctly if the decision variable for the interval containing the stimulus is larger than the largest of
the $N - 1$ decision variables for the intervals not containing the stimulus. For an N-interval forced-choice
(NIFC) paradigm, a common practice is to define threshold as the point halfway between the chance
level of $1/N$ and one. This is a perfectly acceptable practice so long as it is recognized that the measured
threshold is influenced by the number of alternatives. For a 3IFC paradigm this definition of threshold
corresponds to a $d'$ of 1.12, and for a 4IFC paradigm it corresponds to a $d'$ of 1.24.
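These threshold values of $d'$ can be checked numerically. Assuming the equal-variance Gaussian model with $\sigma = 1$, the stimulus interval's decision variable has mean $d'$ and each of the other $N - 1$ has mean 0, so $P(C) = \int \phi(t)\,\Phi(t + d')^{N-1}\,dt$; for $N = 2$ this reduces to $\Phi(d'/\sqrt{2})$, giving $d' = \sqrt{2}\,\Phi^{-1}(0.75) \approx 0.954$. The sketch integrates with Simpson's rule and solves for the threshold $d'$ by bisection; the integration limits, step count, and bisection bracket are arbitrary choices.

```python
import math

def _phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _Phi(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_correct(d_prime, n, steps=2000, half_width=8.0):
    """P(C) for an ideal observer in an n-interval forced-choice task:
    the stimulus interval's variable (mean d', sd 1) must exceed all n - 1
    others (mean 0, sd 1). Integral of phi(t) * Phi(t + d')**(n - 1) dt,
    evaluated with Simpson's rule (steps must be even)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        t = -half_width + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * _phi(t) * _Phi(t + d_prime) ** (n - 1)
    return total * h / 3.0

def threshold_d_prime(n, iterations=40):
    """d' at which P(C) is halfway between chance (1/n) and 1, by bisection."""
    target = 0.5 * (1.0 / n + 1.0)
    lo, hi = 0.0, 6.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if p_correct(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

thresholds = {n: threshold_d_prime(n) for n in (2, 3, 4)}
for n, d in thresholds.items():
    print(f"{n}IFC: threshold d' = {d:.3f}")
```

Under these assumptions the computed values agree with the 0.95, 1.12, and 1.24 given in the text to within a few hundredths, and they increase with the number of alternatives.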

1.2.4 Differential Threshold


The differential threshold is conceptually similar to the auditory threshold discussed above, and many of
the same comments apply. The differential threshold, or JND, is the amount by which some attribute of a
signal has to change in order for the observer to be able to detect the change. A tone burst, for example,
can be specified in terms of frequency, intensity, and duration, and a differential threshold for any of
these three attributes can be defined and measured.
The first attempt at a quantitative description of differential thresholds was made by the
German physiologist E. H. Weber in the first half of the nineteenth century. According to ‘‘Weber’s law,’’
the JND $\Delta I$ is proportional to the stimulus intensity $I$, or $\Delta I/I = K$, where the constant of proportionality
$K = \Delta I/I$ is known as the ‘‘Weber fraction.’’ This was supposed to be a general description of sensitivity to
changes of intensity for a variety of sensory modalities, not limited just to hearing, and it has since been
applied to perception of nonintensive variables such as frequency. It was recognized at an early stage that
this law breaks down at near-threshold intensities, and in the latter half of the nineteenth century the
German physicist G. T. Fechner suggested the modification that is now known as the ‘‘modified Weber
law,’’ $\Delta I/(I + I_0) = K$, where $I_0$ is a constant. While Weber’s law provides a reasonable first-order
description of intensity and frequency discrimination in hearing, in general it does not hold exactly, as
will be seen below.
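The difference between the two laws is easiest to see in the Weber fraction $\Delta I/I$: under Weber’s law it is the constant $K$ at every intensity, while under the modified law it equals $K(1 + I_0/I)$, which grows as $I$ falls toward $I_0$ and approaches $K$ well above it. A minimal sketch, with $K$ and $I_0$ set to arbitrary illustrative values in arbitrary intensity units:

```python
def jnd_weber(intensity, k):
    """Weber's law: the JND is proportional to intensity, dI = K * I."""
    return k * intensity

def jnd_modified_weber(intensity, k, i0):
    """Modified Weber law: dI = K * (I + I0), with I0 a constant."""
    return k * (intensity + i0)

K, I0 = 0.1, 1.0   # hypothetical values in arbitrary intensity units
for intensity in (0.1, 1.0, 10.0, 100.0, 1000.0):
    weber_fraction = jnd_weber(intensity, K) / intensity
    modified_fraction = jnd_modified_weber(intensity, K, I0) / intensity
    print(f"I = {intensity:7.1f}   Weber: dI/I = {weber_fraction:.3f}   "
          f"modified: dI/I = {modified_fraction:.3f}")
```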
As with the threshold of hearing, the differential threshold can be measured in different ways, and the
result depends to some extent on how it is measured. The simplest method is a same-different paradigm,
in which two stimuli are presented and the subject’s task is to judge whether or not they are the same.
This method suffers from the same drawback as the yes–no paradigm for auditory threshold: we do not
have control over the subject’s criterion level.
If the physical attribute being measured is simply related to some perceptual attribute, then the
differential threshold can be measured by requiring the subject to judge which of two stimuli has more
of that perceptual attribute. A JND for frequency, for example, could be measured by requiring
the subject to judge which of two stimuli is of higher pitch; or a JND for intensity could be measured
by requiring the subject to judge which of two stimuli is louder. As with the 2IFC paradigm discussed
above for auditory threshold, this method removes the problem of uncontrolled criterion level.

There are more general methods that do not assume a knowledge of the relationship between the
physical attribute being measured and a perceptual attribute. The most useful, perhaps, is the NIFC
method: N stimuli are presented, one of which differs from the other $N - 1$ along the dimension being
measured. The subject’s task is to specify which one of the N stimuli is different from the other $N - 1$.
Note that there is a close parallel between the differential threshold and the auditory threshold
described in Section 1.2.3. The auditory threshold can be regarded as a special case of the JND for
intensity, where the question is by how much the intensity has to differ from zero in order to be
detectable.

1.2.5 Masked Threshold


The ‘‘masked threshold’’ of a signal is defined as the threshold of that signal (the ‘‘probe’’) in the presence
of another signal (the ‘‘masker’’). A related term is ‘‘masking,’’ which is the elevation of threshold of the
probe by the masker: it is the difference between masked and absolute threshold. More generally, the
reduction of loudness of a suprathreshold signal is also referred to as masking. It will be seen that
masking can appear in many forms, depending on spectral and temporal relationships between probe
and masker.
Many of the comments that applied to measurement of absolute and differential thresholds also apply to
measurement of masked threshold. The simplest method is to present masker plus probe and ask the
subject whether or not the probe is present. Once again there is a problem with criterion level. Another
method is to present stimuli in two intervals and ask the subject which one contains the probe. This method
can give useful results but can, under some conditions, give misleading results. Suppose, for example, that
the probe and masker are both pure tones at 1 kHz, but that the two signals are 180° out of phase. As the
intensity of the probe is increased from zero, the intensity of the composite signal will first decrease, then
increase. The two signals, masker alone and masker plus probe, may be easily distinguishable, but in the
absence of additional information the subject has no way of telling which is which.
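The nonmonotonic behavior of the composite intensity can be verified numerically. This sketch assumes a masker and a probe of the same frequency with the probe inverted (180° phase shift) and reports the composite’s intensity relative to the masker alone for several probe amplitudes; the amplitudes and sampling parameters are arbitrary.

```python
import math

def tone(amplitude, phase_rad, n_samples=1000, cycles=10):
    """Sampled sinusoid spanning an integer number of cycles."""
    return [amplitude * math.sin(2.0 * math.pi * cycles * k / n_samples + phase_rad)
            for k in range(n_samples)]

def mean_square(signal):
    """Mean-square value, proportional to intensity."""
    return sum(s * s for s in signal) / len(signal)

masker = tone(1.0, 0.0)
ratios = {}
for probe_amp in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    probe = tone(probe_amp, math.pi)                  # 180 degrees out of phase
    composite = [m + p for m, p in zip(masker, probe)]
    ratios[probe_amp] = mean_square(composite) / mean_square(masker)
    print(f"probe amplitude {probe_amp:.1f} x masker: "
          f"composite intensity = {ratios[probe_amp]:.2f} x masker")
```

For probe amplitudes between 0 and twice the masker amplitude, the composite is weaker than the masker alone, so a subject who picks the louder interval as the one containing the probe would be wrong; at exactly twice the masker amplitude the two intervals have equal intensity.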
A more robust method for measuring masked threshold is the NIFC method described above, in which
the subject specifies which of the N stimuli differs from the other $N - 1$. Subjective percepts in masking
experiments can be quite complex and can differ from one observer to another. In the NIFC method, the
observer has the freedom to base judgments on whatever attribute is most easily detected, and it is not
necessary to instruct the observer what to listen for.
Note that the differential threshold for intensity can be regarded as a special case of the masked
threshold in which the probe is an intensity-scaled version of the masker.
A note on terminology: suppose two signals, $x_1(t)$ and $[x_1(t) + x_2(t)]$, are just distinguishable. If $x_2(t)$ is
a scaled version of $x_1(t)$, then we are dealing with intensity discrimination. If $x_1(t)$ and $x_2(t)$ are two
different signals, then we are dealing with masking, with $x_1(t)$ the masker and $x_2(t)$ the probe. In either
case, the difference can be described in several ways. These ways include (1) the intensity increment
between $x_1(t)$ and $[x_1(t) + x_2(t)]$, $\Delta I$; (2) the intensity increment relative to $x_1(t)$, $\Delta I/I$; (3) the intensity
ratio between $x_1(t)$ and $[x_1(t) + x_2(t)]$, $(I + \Delta I)/I$; (4) the intensity increment in dB, $10 \log_{10}(\Delta I/I)$; and
(5) the intensity ratio in dB, $10 \log_{10}[(I + \Delta I)/I]$. These ways are equivalent in that they show the same
information, although for a particular application one way may be preferable to another for presentation
purposes. Another measure that is often used, particularly in the design of perceptual coders, is the
intensity of the probe $x_2(t)$. This measure is subject to misinterpretation and must be used with caution.
Depending on the coherence between $x_1(t)$ and $x_2(t)$, a given probe intensity can result in a wide range of
intensity increments $\Delta I$. The resulting ambiguity has been responsible for some confusion.
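The coherence problem can be made concrete with two equal-frequency tones, for which the mean-square value of $A_1\sin\omega t + A_2\sin(\omega t + \theta)$ is $(A_1^2 + A_2^2 + 2A_1A_2\cos\theta)/2$. The sketch below uses hypothetical amplitudes (probe 20 dB below masker), holds the probe intensity fixed, and computes the five measures listed above for in-phase, quadrature, and antiphase probes. The quadrature case also stands in for uncorrelated signals, whose intensities add on average.

```python
import math

def increment_measures(i_masker, i_composite):
    """Five equivalent ways to describe the change from x1(t) to x1(t) + x2(t),
    given the intensity I of x1(t) and the intensity I + dI of the sum."""
    d_i = i_composite - i_masker
    return {
        "dI": d_i,                                   # (1)
        "dI/I": d_i / i_masker,                      # (2)
        "(I+dI)/I": i_composite / i_masker,          # (3)
        # (4) is undefined when the composite is weaker than the masker
        "10log10(dI/I)": 10 * math.log10(d_i / i_masker) if d_i > 0 else math.nan,
        "10log10((I+dI)/I)": 10 * math.log10(i_composite / i_masker),  # (5)
    }

def sum_of_tones_intensity(a1, a2, phase_rad):
    """Mean-square value of a1*sin(wt) + a2*sin(wt + phase), same frequency."""
    return (a1 ** 2 + a2 ** 2 + 2 * a1 * a2 * math.cos(phase_rad)) / 2

A_MASKER, A_PROBE = 1.0, 0.1          # hypothetical: probe 20 dB below masker
I_MASKER = A_MASKER ** 2 / 2
I_PROBE = A_PROBE ** 2 / 2            # identical in all three cases below
results = {}
for label, phase in (("in phase", 0.0), ("quadrature", math.pi / 2), ("antiphase", math.pi)):
    i_sum = sum_of_tones_intensity(A_MASKER, A_PROBE, phase)
    results[label] = increment_measures(I_MASKER, i_sum)
    print(f"{label:10s} probe I = {I_PROBE:.4f}  dI = {results[label]['dI']:+.4f}  "
          f"(I+dI)/I = {results[label]['10log10((I+dI)/I)']:+.2f} dB")
```

With the probe intensity unchanged, $\Delta I$ ranges from +0.105 to −0.095 in these units and the level change from about +0.83 dB to about −0.92 dB; measure (4) is not even defined in the antiphase case because $\Delta I$ is negative.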

1.2.6 Critical Bands and Peripheral Auditory Filters


The concepts of ‘‘critical bands’’ and ‘‘peripheral auditory filters’’ are central to much of the auditory
modeling work that is used in present-day perceptual coders. Scharf, in a classic review article [33],
defines the empirical critical bandwidth as ‘‘that bandwidth at which subjective responses rather
abruptly change.’’ Simply put, for some psychophysical tasks the auditory system behaves as if it consisted
of a bank of band-pass filters (the critical bands) followed by energy detectors. Examples of critical-band
behavior that are particularly relevant for the designer of a coder include the relationship between
bandwidth and loudness (Figure 1.5) and the relationship between bandwidth and masking (Figure 1.10).
Another example of critical-band behavior is phase sensitivity: in experiments measuring the detectability
of amplitude and of frequency modulation, the auditory system appears to be sensitive to the relative
phase of the components of a complex sound only so long as the components are within a critical band
[9,45].
The concept of the critical band was introduced more than a half-century ago by Fletcher [6], and
since that time it has been studied extensively. Fletcher’s pioneering contribution is ably documented by
Allen [1], and Scharf’s 1970 review article [33] gives references to some later work. More recently, Moore
and his co-workers have made extensive measurements of peripheral auditory filters [24].
The value of critical bandwidths has been the subject of some discussion, because of questions of
definition and method of measurement. Figure 1.2 [31, Fig. 1] shows critical bandwidth as a function
of frequency for Scharf’s empirical definition (the bandwidth at which subjective responses undergo
some sort of change). Results from several experiments are superimposed here, and they are in
substantial agreement with each other. Moore and Glasberg [26] argue that the bandwidths shown in
Figure 1.2 are determined not only by the bandwidth of peripheral auditory filters but also by changes
in processing efficiency. By their argument, the bandwidth of peripheral auditory filters is somewhat
smaller than the values shown in Figure 1.2 at frequencies above 1 kHz and substantially smaller, by as
much as an octave, at lower frequencies.

[Figure 1.2: critical bandwidth (Hz, 20 to 5,000) vs. frequency (Hz, 50 to 10,000); data from Zwicker et al., Greenwood, Scharf, and Hawkins and Stevens, derived from two-tone masking, phase sensitivity, loudness summation, threshold, masking, and 2.5 × critical ratio.]

FIGURE 1.2 Empirical critical bandwidth. (From Scharf, B., Critical bands, in Foundations of Modern Auditory
Theory, Vol. 1, Chap. 5, Tobias, J.V., Ed., Academic Press, New York, 1970. With permission.)

1.3 Summary of Relevant Psychophysical Data


In Section 1.2, we introduced some basic concepts and definitions. In this section, we review some
relevant psychophysical results. There are several excellent books and book chapters that have been
written on this subject, and we have neither the space nor the inclination to duplicate material found in
these other sources. Our attempt here is to make the reader aware of some relevant results and to refer
him or her to sources where more extensive treatments may be found.

1.3.1 Loudness
1.3.1.1 Loudness Level and Frequency
For pure tones, loudness depends on both intensity and frequency. Figure 1.3 (modified from [37, p. 124])
shows loudness level contours. The curves are labeled in phons and, in parentheses, sones. These curves
have been remeasured many times since, with some variation in the results, but the basic conclusions
remain unchanged. The most sensitive region is around 2–3 kHz. The low-frequency slope of the
loudness level contours is flatter at high loudness levels than at low. It follows that loudness level
grows more rapidly with intensity at low frequencies than at high. The 38- and 48-phon contours are
(by definition) separated by 10 dB at 1 kHz, but they are only about 5 dB apart at 100 Hz.
Figure 1.3 also shows contours that specify the dynamic range of hearing. Tones below the 8-phon
contour are inaudible, and tones above the dotted line are uncomfortable. The dynamic range of hearing,
the distance between these two contours, is greatest around 2–3 kHz and decreases at lower and
higher frequencies. In practice, the useful dynamic range is substantially less. We know today that
extended exposure to sounds at much lower levels than the dotted line in Figure 1.3 can result in
temporary or permanent damage to the ear. It has been suggested that extended exposure to sounds as
low as 70–75 dB(A) may produce permanent high-frequency threshold shifts in some individuals [39].

[Figure 1.3: intensity level (dB SPL) vs. frequency (Hz, 20 to 10,000); loudness level contours labeled in phons (sones) from 18 (0.017) to 118 (145), plus the 8-phon threshold curve and a dotted ‘‘threshold of feeling’’ line.]
FIGURE 1.3 Loudness level contours. Parameters: phons (sones). The bottom curve (8 phons) is at the threshold
of hearing. The dotted line shows Wegel’s 1932 results for ‘‘threshold of feeling.’’ This line is many dB above levels
that are known today to produce permanent damage to the auditory system. (Modified from Stevens, S.S. and Davis,
H.W., Hearing, John Wiley & Sons, New York, 1938.)

1.3.1.2 Loudness and Intensity


Figure 1.4 (modified from [32, Fig. 5]) shows ‘‘loudness growth functions,’’ the relationship between
stimulus intensity in dB SPL and loudness in sones, for tones of different frequencies. As can be seen in
Figure 1.4, the loudness growth function depends on frequency. Above about 40 dB SPL for a 1 kHz tone
the relationship is approximately described by the power law $L(I) = (I/I_0)^{1/3}$, so that if the intensity $I$ is
increased by 9 dB (a factor of $10^{0.9}$) the loudness $L$ is approximately doubled, since $(10^{0.9})^{1/3} = 10^{0.3} \approx 2$.*
intensity has been modeled extensively [1,6,46].
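As a worked example of the power law, the sketch below evaluates $L(I) = (I/I_0)^{1/3}$ for a 1 kHz tone, taking $I_0$ to be the intensity at 40 dB SPL so that the result is 1 sone at 40 dB SPL, in line with the definition of the sone in Section 1.2.1. That choice of $I_0$ is an assumption, and the law is only claimed to hold approximately above about 40 dB SPL.

```python
def loudness_sones_1khz(spl_db):
    """Power-law loudness of a 1 kHz tone, L = (I / I0) ** (1/3).
    Assumption: I0 is the intensity at 40 dB SPL, so 40 dB SPL -> 1 sone.
    Only meant as an approximation above about 40 dB SPL."""
    intensity_ratio = 10.0 ** ((spl_db - 40.0) / 10.0)   # I / I0
    return intensity_ratio ** (1.0 / 3.0)

for level_db in (40, 49, 58, 70, 100):
    print(f"{level_db:3d} dB SPL -> {loudness_sones_1khz(level_db):6.2f} sones")
```

A 9 dB step multiplies loudness by $10^{0.3} \approx 1.995$, and each 30 dB step multiplies it by 10.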

[Figure 1.4: loudness (sones, 0.02 to 200) vs. sound pressure level (dB SPL, 0 to 120) for tones of 100, 250, 500, 1000, 4000, and 8000 Hz.]

FIGURE 1.4 Loudness growth functions. (Modified from Scharf, B., Loudness, in Handbook of Perception, Vol. IV,
Hearing, Chap. 6, Carterette, E.C. and Friedman M.P., Eds., Academic Press, New York, 1978.)

* This power-law relationship between physical and perceptual measures of a stimulus was studied in great detail by
S.S. Stevens and is now commonly referred to as Stevens’s law. Stevens measured exponents for many sensory modalities,
ranging from a low of 0.33 for loudness and brightness to a high of 3.5 for electric shock produced by a 60 Hz electric
current delivered to the skin.

[Figure 1.5: sound pressure level of comparison tone (dB, 10 to 90) vs. overall spacing ΔF (Hz, 10 to 2000) for four-tone complexes centered at 1000 Hz, at several levels; T = tone adjusted, C = complex adjusted.]

FIGURE 1.5 Loudness vs. bandwidth of tone complex. (From Zwicker, E. et al., J. Acoust. Soc. Am., 29, 548, 1957.
With permission.)

1.3.1.3 Loudness and Bandwidth


The loudness of a complex sound of fixed intensity, whether a tone complex or a band of noise,
depends on its bandwidth, as is shown in Figure 1.5 [48, Fig. 3]. For sounds well above threshold,
the loudness remains more or less constant so long as the bandwidth is less than a critical band. If the
bandwidth is greater than a critical band, the loudness increases with increasing bandwidth. Near
threshold the trend is reversed, and the loudness decreases with increasing bandwidth.*
These phenomena have been modeled successfully by utilizing the loudness growth functions
shown in Figure 1.4 in a model that calculates total loudness by summing the specific loudness per
critical band [49]. The loudness growth function is very steep near threshold, so that dividing the total
energy of the signal into two or more critical bands results in a reduction of total loudness. The loudness
growth function well above threshold is less steep, so that dividing the total energy of the signal into two
or more critical bands results in an increase of total loudness.
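The mechanism can be illustrated with a deliberately simplified toy model. It is not the model of [49]: the per-band function below (a cube-root law offset to reach zero at a hypothetical band threshold) is assumed only because, on the log-loudness vs. dB axes of Figure 1.4, it falls steeply toward zero near threshold and is compressive well above it. The total energy is split equally across one, two, or four critical bands and the per-band loudness values are summed.

```python
def band_loudness(energy, threshold=1.0, exponent=1.0 / 3.0):
    """Hypothetical per-band loudness: zero at or below the band threshold,
    then a compressive power law offset so that it starts from zero."""
    if energy <= threshold:
        return 0.0
    return energy ** exponent - threshold ** exponent

def total_loudness(total_energy, n_bands, threshold=1.0):
    """Split total_energy equally over n_bands critical bands and sum."""
    per_band = total_energy / n_bands
    return n_bands * band_loudness(per_band, threshold)

for db_above_threshold in (3, 10, 30, 60):
    energy = 10.0 ** (db_above_threshold / 10.0)   # relative to the band threshold
    one, two, four = (total_loudness(energy, n) for n in (1, 2, 4))
    print(f"{db_above_threshold:2d} dB: 1 band {one:7.3f}  2 bands {two:7.3f}  "
          f"4 bands {four:7.3f}")
```

At 3 dB above threshold, splitting the energy pushes each band below threshold and the total drops to zero; at 30 and 60 dB above threshold, each additional band increases the total.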
* These data were obtained by comparing the loudness of a single 1 kHz tone with the loudness of a four-tone complex of the
specified bandwidth centered at 1 kHz. The systematic difference between results when the tone was adjusted (‘‘T’’ symbol)
and when the complex was adjusted (‘‘C’’ symbol) is an example of the bias effects mentioned in Section 1.2.

1.3.1.4 Loudness and Duration
Everything we have talked about so far applies to steady-state, long-duration stimuli. These results are
reasonably well understood and can be modeled adequately by present-day models. However,
there is a large body of psychophysical data on the temporal structure of the signal that
are less well understood and less well modeled. The subject of temporal dynamics of auditory perception
is an area where there is a great deal of room for improvement in models for perceptual auditory coders.
One example of this subject is the relationship between loudness and duration discussed here. Other
examples appear in a later section on temporal aspects of masking.
There is general agreement that, for fixed intensity, loudness increases with duration up to stimulus
durations of a few hundred milliseconds. (Other factors, usually discussed under the terms adaptation
or fatigue, come into play for longer durations of many seconds or minutes. We will not discuss
these factors here.) The duration below which loudness increases with increasing duration is sometimes
referred to as “the critical duration.” Scharf [32] provides an excellent summary of studies of
the relationship between loudness and duration. In his survey, he cites values of critical duration
ranging from 10 to over 500 ms. About half the studies in Scharf’s survey show that, for constant loudness,
the total energy (intensity × duration) stays constant below the critical duration, while the remaining
studies are about evenly split between total energy increasing and total energy decreasing with
increasing duration.
One possible explanation for this confused state of affairs is the inherent difficulty of making loudness
matches between dissimilar stimuli, discussed in Section 1.2.1. Two stimuli of different durations differ
by more than “loudness,” and, depending on a variety of poorly understood experimental or individual
factors, what appears to be the same experiment may yield different results in different laboratories or
with different subjects.
Some support for this explanation comes from the fact that studies of threshold intensity as a function
of duration are generally in better agreement with each other than studies of loudness as a function of
duration. As discussed in Section 1.2.3, measurements of auditory threshold depend to some extent on the
method of measurement, but it is still possible to establish an internally consistent criterion-free measure.
The exact results depend to some extent on signal frequency, but there is reasonable agreement among
various studies that total energy at threshold remains approximately constant between about 10 and
100 ms. (See [41] for a survey of studies of threshold intensity as a function of duration.)
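
Constant total energy at threshold implies that, over that range, threshold intensity is inversely proportional to duration (I_thr · t = constant), so every tenfold reduction in duration raises the threshold by 10 dB. The sketch below assumes ideal energy integration across the whole 10 to 100 ms range and expresses the threshold relative to that of a 100 ms burst. Real data depend on signal frequency, as noted above, and the function deliberately rejects durations outside the range the text covers.

```python
import math

# Range over which total energy at threshold is roughly constant (from the text).
MIN_MS, MAX_MS = 10.0, 100.0


def threshold_shift_db(duration_ms: float, reference_ms: float = MAX_MS) -> float:
    """Threshold elevation (dB) of a burst of `duration_ms` relative to one of `reference_ms`.

    Assumes ideal energy integration: I_thr * t = constant, so
    shift = 10 * log10(reference / duration). Valid only within 10-100 ms.
    """
    for t in (duration_ms, reference_ms):
        if not MIN_MS <= t <= MAX_MS:
            raise ValueError(f"{t} ms is outside the 10-100 ms integration range")
    return 10.0 * math.log10(reference_ms / duration_ms)


if __name__ == "__main__":
    for d in (10, 20, 50, 100):
        print(f"{d:>3} ms burst: threshold {threshold_shift_db(d):+.1f} dB re 100 ms")
```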

1.3.2 Differential Thresholds


1.3.2.1 Frequency
Figure 1.6 shows frequency JND as a function of frequency and intensity as measured in the most recent
comprehensive study [43]. The frequency JND generally increases with increasing frequency and
decreases with increasing intensity, ranging from about 1 Hz at low frequency and moderate intensity
to more than 100 Hz at high frequency and low intensity.
The results shown in Figure 1.6 are in basic agreement with results from most other studies of frequency
JNDs, with the exception of the earliest comprehensive study, by Shower and Biddulph [43, p. 180]. Shower
and Biddulph [35] found a more gradual increase of frequency JND with frequency. As we have noted
above, the results obtained in experiments of this nature are strongly influenced by details of the method of
measurement. Shower and Biddulph measured detectability of frequency modulation of a pure tone;
most other experimenters measured the ability of subjects to correctly identify whether one tone burst
was of higher or lower frequency than another. Why this difference in procedure should produce this
difference in results, or even whether this difference in procedure is solely responsible for the difference in
results, is unclear.
The Weber fraction Δf/f, where Δf is the frequency JND, is smallest at mid frequencies, in the region
from 500 Hz to 2 kHz. It increases somewhat at lower frequencies, and it increases very sharply at high
frequencies above about 4 kHz. Wier et al. [43] in their Figure 39.1, reproduced here as our Figure 1.6,
plotted log Δf against √f. They found that this choice of axes resulted in the closest fit to a straight line.
It is not clear that this choice of axes has any theoretical basis; it appears simply to be a choice that
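
The Weber fraction and the axis choice of Wier et al. are both easy to compute. In this minimal sketch, `weber_fraction` returns Δf/f; for instance, a hypothetical JND of 1 Hz at 500 Hz corresponds to a Weber fraction of 0.002. `fit_log_df_vs_sqrt_f` performs an ordinary least-squares fit of log10(Δf) against √f. Both function names are illustrative. The demonstration data are synthetic, generated from made-up coefficients only to show that the fit recovers them. They are not Wier et al.’s measurements or fitted values.

```python
import math
from typing import Sequence, Tuple


def weber_fraction(jnd_hz: float, freq_hz: float) -> float:
    """Weber fraction delta_f / f for a frequency JND `jnd_hz` at frequency `freq_hz`."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return jnd_hz / freq_hz


def fit_log_df_vs_sqrt_f(freqs_hz: Sequence[float],
                         jnds_hz: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of log10(delta_f) = slope * sqrt(f) + intercept.

    Returns (slope, intercept). Requires at least two distinct frequencies.
    """
    if len(freqs_hz) != len(jnds_hz) or len(freqs_hz) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    xs = [math.sqrt(f) for f in freqs_hz]
    ys = [math.log10(df) for df in jnds_hz]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("frequencies must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Synthetic JNDs from hypothetical coefficients (NOT measured data).
    true_slope, true_intercept = 0.03, -0.5
    freqs = [200.0, 400.0, 1000.0, 2000.0, 4000.0, 8000.0]
    jnds = [10 ** (true_slope * math.sqrt(f) + true_intercept) for f in freqs]
    slope, intercept = fit_log_df_vs_sqrt_f(freqs, jnds)
    print(f"recovered slope={slope:.4f}, intercept={intercept:.4f}")
    for f, df in zip(freqs, jnds):
        print(f"{f:>6.0f} Hz: JND {df:7.2f} Hz, Weber fraction {weber_fraction(df, f):.4f}")
```

Fitting in the transformed coordinates turns the question “which axes give a straight line?” into an ordinary linear regression, whose residuals can then be compared across candidate axis choices.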