Goldberg, R. G.
"Perceptual Speech Coding"
A Practical Handbook of Speech Coders
Ed. Randy Goldberg
Boca Raton: CRC Press LLC, 2000
© 2000 CRC Press LLC
Chapter 12
Perceptual Speech Coding
The goal of perceptual coding is to reduce the size of the signal represen-
tation while maintaining the perceived sound quality by exploiting the
limits of human auditory perception. The exploitable limits of auditory
perception stem from the frequency and temporal masking discussed in
Chapter 6.
This chapter presents an overview of perceptual speech coding. Fre-
quency and temporal masking are considered together to determine
which signal components are not perceptible. The impact of the sound
quality (tone or noise) of the maskee and masker is discussed. Because
of the limited time and frequency resolution of standard frequency-
domain transforms (discrete Fourier transform), the Multi-Band Exci-
tation (MBE) speech model is shown to have advantages for perceptual
coding. The last section lists a sampling of the current research in per-
ceptual speech coding.
While this chapter discusses perceptual speech coding schemes, these
approaches are not as dominant as perceptual approaches for general
wideband audio coding. To date, the highest quality, lowest rate speech
coders are of the type described in Chapter 11. However, the progress of
future research holds the promise of perceptual coding gains for speech.
12.1 Auditory Processing of Speech
Section 6.4 discussed monaural masking. One sound can mask an-
other simultaneous, lower amplitude sound when the two are close in
frequency. This is referred to as "simultaneous masking in frequency."
When two sounds occur at nearly the same time, the lower level signal
can be masked by the stronger signal, a phenomenon known as "temporal
masking." The challenge of perceptual coding is to determine which
sounds mask which other sounds in a complex, rapidly varying speech
signal.
12.1.1 General Perceptual Speech Coder
Most of the algorithm processing steps of a perceptual speech coder are
similar to those of conventional speech coders. The primary difference
is the determination and deletion of signal components that are not
perceptible.
FIGURE 12.1
General perceptual speech coder.
Figure 12.1 displays a block diagram of a general perceptual speech
coder. The input speech is analyzed, yielding a short-term spectral rep-
resentation of the vocal tract and excitation information. These
parameterizations are transformed by an auditory analysis into a perceptual
representation. In the perceptual representation, the frequency scale is
warped to a nonlinear scale based on critical bands (see Chapter 6).
Within the perceptual domain, masking and masked signal compo-
nents are determined. The masked components, which are not percep-
tible, are deleted from the representation or marked to be coded more
coarsely, that is, with fewer bits. The reduced perceptual representation
results in a lower bit rate due to the reduced number of parameters, but
is used to synthesize output speech of the same perceived quality as the
complete representation. Determining the particulars of the masking is
discussed in more detail in the following sections.
As with any speech encoding/decoding system, the decoder merely
reverses the operations to synthesize the output speech.
12.1.2 Frequency and Temporal Masking
It is well known that simultaneous masking in frequency is more
prominent when the masker is lower in frequency than the maskee. Re-
ferring back to Figure 6.4, the plotted threshold of detectability is much
lower for frequencies below the masker than for frequencies above it.
This observation suggests an efficient method to determine which com-
ponents are masked:
1. Transform each short time segment of speech into the frequency
domain.
2. Segment the frequency-domain representation into logarithmically
spaced frequency bands (constant number of barks per frequency
segment).
3. Calculate the total energy in the lowest band.
4. Determine the threshold of detectability within this critical band
and in the higher frequency critical bands.
5. Code only frequency information above the threshold level.
6. Continue threshold calculation/coding process for the next higher
critical band.
7. Repeat steps 3 through 6 until all critical bands are coded.
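The seven steps above can be sketched as follows. This is a simplified illustration, not the book's implementation: the logarithmic band edges, the use of each band's peak level as a proxy for its energy, and the flat 20 dB masking margin are all assumptions.

```python
import numpy as np

def band_edges_hz(fs, n_bands):
    """Logarithmically spaced band edges -- a rough stand-in for
    constant-Bark critical-band edges (step 2)."""
    return np.logspace(np.log10(100.0), np.log10(fs / 2.0), n_bands + 1)

def code_masked_bands(frame, fs, n_bands=16, margin_db=-20.0):
    """Steps 1-7: keep only spectral samples that rise above a per-band
    masking threshold derived from the band's peak level.
    margin_db is an assumed threshold offset, not a value from the text."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))     # step 1
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    mag_db = 20.0 * np.log10(np.abs(spectrum) + 1e-12)
    kept = np.zeros(mag_db.shape, dtype=bool)
    edges = band_edges_hz(fs, n_bands)
    for lo, hi in zip(edges[:-1], edges[1:]):                  # steps 3-7, low to high
        in_band = (freqs >= lo) & (freqs < hi)
        if not in_band.any():
            continue
        band_level_db = mag_db[in_band].max()                  # step 3 (peak as energy proxy)
        threshold_db = band_level_db + margin_db               # step 4 (flat threshold)
        kept[in_band] = mag_db[in_band] > threshold_db         # step 5: code only these
    return kept
```

A full implementation would also extend each band's threshold into the higher-frequency critical bands, as step 4 specifies; this sketch applies the threshold only within the band itself.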
Although more complex, this method could be extended to include
masking regions where the maskee is in a lower frequency critical band
than the masker.
FIGURE 12.2
Island of perceptually significant signal and resulting area of
masking.
The previously described method is highly efficient;
however, it does not take full advantage of the properties of simultaneous
masking. The method calculates a saturation threshold level for each
critical band, but it does not take into account spectral changes within
each critical band, nor does it consider the effect of temporal masking
across different frames.
Simultaneous frequency and temporal masking suggest that
substantial economies in coding can be gained by spectral analysis
to determine "islands" of perceptually significant signals in the
time/frequency/intensity dimensional representation. Figure 12.2 shows
an intense complex signal surrounded in the time/frequency domain by
a box which represents the signals that would be masked in the presence
of this complex signal. These complex signals appear as high intensity
"islands" in a typical spectrogram. The majority of the available coding
capacity can then be assigned to accurately represent these islands and
a minimum assigned to regions masked by these islands.
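The island idea can be sketched as a toy computation (not the book's algorithm): mark cells of a spectrogram near its peak as islands, then flag a rectangular time/frequency box around each island as masked. The box size and both dB margins are assumed values.

```python
import numpy as np

def masked_region_map(spec_db, island_db=-20.0, t_spread=2, f_spread=3,
                      mask_drop_db=25.0):
    """Sketch of Figure 12.2: cells within island_db of the frame peak are
    'islands'; a box of (2*t_spread+1) frames by (2*f_spread+1) bins around
    each island is marked masked wherever it lies more than mask_drop_db
    below the island level.  All parameters are illustrative assumptions."""
    n_t, n_f = spec_db.shape
    masked = np.zeros(spec_db.shape, dtype=bool)
    peak = spec_db.max()
    islands = np.argwhere(spec_db >= peak + island_db)
    for ti, fi in islands:
        level = spec_db[ti, fi]
        t0, t1 = max(0, ti - t_spread), min(n_t, ti + t_spread + 1)
        f0, f1 = max(0, fi - f_spread), min(n_f, fi + f_spread + 1)
        box = spec_db[t0:t1, f0:f1]
        masked[t0:t1, f0:f1] |= box < level - mask_drop_db
    # the islands themselves always remain coded
    masked[spec_db >= peak + island_db] = False
    return masked
```

Coding capacity would then be concentrated on the unmasked cells, with a minimum spent on the masked ones.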
12.1.3 Determining Masking Levels
In Figure 12.1, an auditory analysis of the speech parameters is per-
formed on each frame of speech. The auditory analysis transforms the
signal representation into the perceptual representation. The high in-
tensity regions in the time-frequency plane mask (either partially or
completely) some of the less intense regions as in Figure 12.2. This
masking causes the threshold of detectability of the maskee to be in-
creased. If the threshold of detectability of a region is greater than the
intensity of that region, then the portion of the signal denoted by that
region is not audible to human hearing. These values are calculated
by comparing all the regions to each other and determining how much
the threshold of detectability is raised for each time-frequency region.
Psycho-acoustic data such as those represented in Figures 6.4 and 6.5
are used in the calculations of these values.
Figure 12.3 is a 3-dimensional representation of the union of a partic-
ular set of simultaneous and temporal masking data. The time scale is
the time difference between the masker and maskee, and the frequency
scale is the frequency difference between the two. The peak of the sur-
face is the origin, where the relative time, relative frequency, and relative
amplitude are all zero.
This graphical representation can lend insight into the workings of
perceptual speech coding. A time/frequency/magnitude representation
of a speech utterance can be displayed as a 3-dimensional surface. This
is a 3-D representation of the spectrograms of Chapter 2, where the
amplitude, displayed as shades of gray in the spectrograms, is now the
vertical height of the surface.
Figure 12.4 displays this data representation for a half-second segment
of speech for the frequency range of 0 to 1000 Hz. This representation
can be visualized as a mountainous landscape. High elevation areas cor-
respond to high-amplitude signal regions located at particular time and
frequency coordinates. The ridges running across time, of nearly con-
stant frequency, are the pitch harmonics. (The same pitch harmonics
appear as dark bands in the spectrograms of Chapter 2.) If the moun-
tainous speech landscape is divided up into segments, the time divisions
correspond to different analysis frames, and the frequency divisions cor-
respond to dividing the spectrum into critical bands.
Visualize a copy of Figure 12.3 (appropriate for the frequency of the
masker) placed at the time/frequency coordinate of the masker under
consideration in the speech landscape of Figure 12.4. The surface of
Figure 12.3 will be below the surface of the speech landscape of Figure
12.4 at some places, and above at others.
FIGURE 12.3
Psycho-acoustic masking data, both temporal and frequency.
Figure 12.3 represents the
threshold of detectability. When the surface of the speech landscape is
below, those sounds cannot be heard. When the surface of the speech
landscape is above, those sounds can be heard, relative to the masker
under consideration.
This process is repeated for all time/frequency coordinates of the
speech landscape, with the appropriate masking surfaces, to determine
which sounds are masked by which others.
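This repeated placement of the masking surface can be sketched as follows, assuming the speech landscape and the relative masking surface are given as 2-D arrays in dB. The array names and the brute-force search are illustrative only.

```python
import numpy as np

def audible_map(landscape_db, masking_surface_db):
    """For each masker coordinate in the speech 'landscape', place the
    relative masking surface at that coordinate and raise the threshold of
    detectability of the surrounding cells.  masking_surface_db is a small
    2-D array of threshold offsets in dB relative to the masker, centred on
    its middle element -- a simplified stand-in for the data of Figure 12.3."""
    n_t, n_f = landscape_db.shape
    s_t, s_f = masking_surface_db.shape
    ct, cf = s_t // 2, s_f // 2
    threshold = np.full_like(landscape_db, -np.inf)
    for ti in range(n_t):
        for fi in range(n_f):
            level = landscape_db[ti, fi]          # treat this cell as a masker
            for dt in range(s_t):
                for df in range(s_f):
                    tt, ff = ti + dt - ct, fi + df - cf
                    if (tt, ff) == (ti, fi):
                        continue                  # a sound does not mask itself
                    if 0 <= tt < n_t and 0 <= ff < n_f:
                        thr = level + masking_surface_db[dt, df]
                        threshold[tt, ff] = max(threshold[tt, ff], thr)
    return landscape_db > threshold               # True where audible
```

Cells whose landscape height never rises above any raised threshold are the masked sounds and need not be coded.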
FIGURE 12.4
Time/frequency/magnitude representation of a segment of
speech.
12.2 Perceptual Coding Considerations
The discussion of the previous section describes in a conceptual man-
ner the application of simultaneous and temporal masking data to de-
termine which signal components are not perceptible. Two other spe-
cific considerations impact practical application of masking in percep-
tual coding. Standard frequency domain transformations include limits
on their time/frequency resolution (size of the x−y grid spacing on the
speech landscape). Additionally, masking data sets (the surface of Fig-
ure 12.3) differ, depending on whether the masker is tone-like or noise-
like, and whether the maskee is tone-like or noise-like.
12.2.1 Limits on Time/Frequency Resolution
Wideband perceptual coding (bandwidth of approximately 20 kHz) is
used in audio coding in standards such as MPEG-2 and MPEG-4. The
duration of the analysis window used in these wideband coding tech-
niques is around 10 ms [82, 12, 83]. The reciprocal relation between
time and frequency dictates that a frequency resolution of only 100 Hz
can be obtained using standard frequency-domain transforms (discrete
Fourier transform) for this time resolution (10 ms). Here, the frequency
resolution refers to the frequency spacing between samples of the DFT.
Given this frequency resolution, it is possible to locate regions of si-
multaneous masking in frequency at high frequencies (i.e., above 5 kHz)
using the type of data in Figure 6.4. Significant economies in coding
can be achieved with frequency domain analysis for wideband audio sig-
nals. However, for the lower frequency regions of signals, a much higher
frequency resolution is required to exploit the masking properties of the
human auditory system. This results from the much narrower frequency
spacing of the critical bands at low frequencies.
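The reciprocal relation can be checked numerically. The 16 kHz sample rate below is an assumed value; the resulting bin spacing depends only on the window length.

```python
import numpy as np

fs = 16000                         # sample rate in Hz (assumed for illustration)
window_s = 0.010                   # 10 ms analysis window, as in the text
n = int(fs * window_s)             # 160 samples per window
freqs = np.fft.rfftfreq(n, 1.0 / fs)
resolution = freqs[1] - freqs[0]   # DFT bin spacing = 1 / window length
print(resolution)                  # 100.0 Hz: too coarse for low critical bands
```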
For temporal masking analysis, a time length as short as 10 ms is
useful to take advantage of the qualities of both forward and backward
masking. This time resolution is crucial because of the rapid drop off
of the amount of masking with time (see Figure 6.5). Longer analysis
windows would blend together separate sounds. However, as described,
this frequency resolution (100 Hz) is not sufficient to separate the low
frequency critical bands for simultaneous masking. To fully exploit the
properties of both simultaneous masking and temporal masking, it is
necessary to bypass the constraints imposed by the reciprocal relation
between time and frequency. Section 12.2.3 suggests a method to circum-
vent these limitations by utilizing information about the human vocal
system.
12.2.2 Sound Quality of Signal Components
Psycho-acoustic experimentation on the auditory system [45, 141, 139,
176, 31] has revealed that a tone masked by a broad band of noise is
different from a broad band of noise masked by a broad band of noise, a
tone masked by a tone, or a broad band of noise masked by a tone. (It has
been shown that a narrow band of noise has masking properties similar to
those of a pure tone [59, 60, 61].) This is because the notches in the plot of
Figure 6.4 occur only during tone-on-tone and broadband-noise-on-tone
masking. These notches are not present for tone-on-broadband-noise or
broadband-noise-on-broadband-noise masking [81, 60]. "The notch has
been shown to be caused by the detection on the lower frequency side of
the masker, of the combination tones that are produced by the addition
of the masker and the signal" [60].
For perceptual coding, it is important to know the characteristics of
the signal. With simple signal processing techniques, the speech spec-
trum of the short-time analysis window is segmented into discrete fre-
quency bands. Each subband is classified as either noise-like or tone-like
so it can be determined which type of masking occurs on each subband
of the signal.
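One simple classifier of this kind uses the spectral flatness measure (the ratio of the geometric to the arithmetic mean of the band's power spectrum), which is near 1 for white noise and near 0 for a tone. SFM is a common choice, not necessarily the book's method, and the 0.3 decision threshold is an assumed value.

```python
import numpy as np

def classify_subbands(frame, fs, edges_hz, flatness_threshold=0.3):
    """Label each subband tone-like or noise-like via the spectral
    flatness measure.  edges_hz lists the subband boundaries in Hz;
    the 0.3 threshold is an assumed value."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    labels = []
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        band = power[(freqs >= lo) & (freqs < hi)] + 1e-20  # floor avoids log(0)
        sfm = np.exp(np.mean(np.log(band))) / np.mean(band)
        labels.append("tone" if sfm < flatness_threshold else "noise")
    return labels
```

The resulting label per subband then selects the appropriate masking data set (tone-on-tone, noise-on-tone, and so on).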
12.2.3 MBE Model for Perceptual Coding
The Multi-Band Excitation (MBE) speech model, discussed in Sec-
tion 11.1, provides an approach to handle the considerations of the two
previous sections: tone-like or noise-like signal component classification
and limits on time/frequency resolution.
The MBE model divides the frequency spectrum into frequency bins
centered at harmonics of the pitch frequency of the speech signal. The
analysis classifies each frequency bin as voiced or unvoiced. Voiced bins
are characterized by a pitch harmonic (tone) located at that frequency.
Unvoiced bins are characterized by a band of white noise across the
frequency bin. This provides an inherent tone-like versus noise-like clas-
sification of signal components in the MBE analysis. This classification
can be used to select the appropriate masking data for perceptual cod-
ing, based on the sound qualities of the masker and maskee.
By assuming that speech follows the basic properties of the MBE
speech model, the complex magnitudes of the speech spectra at harmon-
ics of the pitch frequency, and the associated voiced/unvoiced decisions,
determine the speech spectra completely. Based on the MBE speech
analysis, the temporal resolution of the signal corresponds to the frame
rate. Considering psycho-acoustic frequency and temporal masking data
and critical bands, a 10 ms temporal resolution and 25 Hz frequency res-
olution are required to sufficiently determine the masked regions of the
signal. The 10 ms temporal resolution is required in order to utilize the
strongest aspects of temporal masking, when the maskee and masker are
close in time (see Figure 6.5). Because the critical bands in the
frequency region below 800 Hz are narrower than 75 Hz, a 25 Hz frequency resolution
is needed to ensure at least two frequency bins in most critical bands.
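A sketch of how MBE parameters define the spectrum on an arbitrarily fine grid, independent of any DFT window length, follows. The function name, the bin handling, and the noise normalization are illustrative assumptions, not the book's formulation.

```python
import numpy as np

def mbe_spectrum(f0_hz, harmonic_mags, voiced, grid_hz=25.0, f_max=4000.0):
    """Reconstruct a magnitude spectrum on a fine frequency grid from
    MBE-style parameters: pitch, per-harmonic magnitudes, and
    voiced/unvoiced flags.  Because the model, not a DFT of the frame,
    defines the spectrum, the 25 Hz grid is not limited by the 10 ms
    frame length."""
    freqs = np.arange(0.0, f_max, grid_hz)
    spectrum = np.zeros_like(freqs)
    for k, (mag, is_voiced) in enumerate(zip(harmonic_mags, voiced), start=1):
        fk = k * f0_hz                      # centre of frequency bin k
        if is_voiced:
            # tone: place the harmonic at the nearest grid point
            spectrum[np.argmin(np.abs(freqs - fk))] = mag
        else:
            # noise: spread the magnitude evenly across the bin
            in_bin = (freqs >= fk - f0_hz / 2.0) & (freqs < fk + f0_hz / 2.0)
            if in_bin.any():
                spectrum[in_bin] = mag / np.sqrt(in_bin.sum())
    return freqs, spectrum
```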
MBE Analysis/Synthesis with Masking
The masking approach described in Section 12.1.3 was applied to
speech data analyzed and synthesized using the MBE model without
quantization [56, 57]. The voiced/unvoiced classification of frequency
bins was used to select appropriate masking data.
Degradation mean opinion score (DMOS) (see Section 8.2.2) listen-
ing tests were performed to rate the relative quality of two processing
schemes. The first scheme analyzed and synthesized the speech data us-
ing the MBE model without quantization or other altering of the model
parameters. This data was used as the reference. The second processing
algorithm eliminated specific spectral magnitude information as guided
by the masking thresholds.
The perceptual processing was designed to yield perceptual quality
measurements equal to those of the unaltered MBE parameters. This
indicates that the additional auditory processing is functioning trans-
parently, by not adding perceptible degradations.
Figure 12.5 displays the results of the listening tests. A DMOS score
of 4, indicating no perceived degradation, was obtained when the threshold
was held at 10 dB. Although the speech is no longer coded transparently
when the threshold is raised above this level, the information removed
first is of low perceptual significance, making this a useful technique
for lowering the bit rate of the coder.
12.3 Research in Perceptual Speech Coding
Researchers are actively investigating the field of perceptual speech
coding. Current efforts are being directed at two primary concepts:
transforming the speech signal into a perceptual representation and dis-
tributing quantization noise below masking thresholds. The two are
related, and both are necessary to improve coder quality through per-
ceptual considerations.
Johnston [82, 12] was the first to use masking criteria to distribute
quantization bits in wideband audio coding. He calculated the percep-
tual significance of each frequency band in an audio signal using simul-
taneous masking calculations and distributed quantization bits accord-
ingly. Huang [76] extended these techniques to include forward masking
criteria. Johnston's work is now being incorporated into speech coders.
FIGURE 12.5
MBE synthesized data compared to auditory processed data.
Quality degrades when masking threshold is raised above 10
dB. Below 10 dB there is no audible degradation.
Bourget et al. [11], Sen and Holmes [145], and Soheili et al. [149] use
perceptual measures in creating the excitation codebook in CELP-based
coders. Carnero and Drygajlo [32, 18] decompose the signal into crit-
ical bands and use masking thresholding to determine bit allocation.
Najafzadeh-Azghandi and Kabal [118, 119] as well as George and Smith
[49] use perceptual masking thresholds to train the vector quantization
in a sinusoidal based coder.
Both Virag [161] and Drygajlo and Carnero [32] are utilizing acoustic
masking techniques for speech enhancement. Kubin and Kleijn are work-
ing on computationally efficient perceptual speech coding algorithms
[99]. Tang et al. are using perceptual techniques within a subband
speech coder [157].
Much work has been directed at using perceptual criteria to distribute
coding error to minimize perceived degradation [82, 76, 102, 11, 18, 99,
149, 145, 118, 119]. In [145], Sen and Holmes attempt to shape the error
spectrum of a CELP coder such that it is below the calculated masking
threshold. This method quantizes the areas of the speech spectrum
which lie above the masking threshold with minimal distortion, while
quantizing those regions below the masking threshold with as much dis-
tortion as the masking threshold will allow. Reducing the coding bit rate
and, correspondingly, raising the quantization noise above the masking
threshold introduced only minor perceptual distortion. The reported
perceived effect of this coder is much smoother, more natural sounding
decoded speech than typical CELP encoded/decoded speech at the same
bit rate.
Drygajlo and Carnero approach coding and speech quality enhance-
ment in the same algorithm [32]. The method uses a wavelet decompo-
sition (a transformation similar to an FFT, but with unevenly spaced
frequency basis functions placed to resemble critical bands) to obtain
frequency responses of critical bands to help efficiently calculate mask-
ing thresholds. Coding bits are allocated such that the quantization
noise remains below the masked threshold of detectability.
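Such a bit-allocation rule can be sketched with the standard ~6.02 dB-per-bit SNR approximation. This is a simplification of the wavelet-based scheme described above; the per-band signal and threshold levels in dB are assumed to be precomputed.

```python
import numpy as np

def allocate_bits(signal_db, threshold_db, db_per_bit=6.02):
    """Per-band bit allocation keeping quantization noise just below the
    masked threshold: each bit buys roughly 6.02 dB of SNR, so a band
    needs about (signal - threshold) / 6.02 bits; bands entirely below
    threshold get zero bits."""
    headroom = np.asarray(signal_db) - np.asarray(threshold_db)
    return np.maximum(0, np.ceil(headroom / db_per_bit)).astype(int)
```

For example, a band 40 dB above its masked threshold receives 7 bits, one 10 dB above receives 2, and one below threshold receives none.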