Signal Processing, Perceptual Coding and Watermarking of Digital Audio: Advanced Technologies and Models by Xing He, ISBN 9781615209255, 9781615209262, 1615209255
Signal Processing, Perceptual Coding and Watermarking of Digital Audio:
Advanced Technologies and Models
Xing He
SRS Labs Inc., USA
Senior Editorial Director: Kristin Klinger
Director of Book Publications: Julia Mosemann
Editorial Director: Lindsay Johnston
Acquisitions Editor: Erika Carter
Production Editor: Sean Woznicki
Typesetters: Milan Vracarich, Jr.
Print Coordinator: Jamie Snavely
Cover Design: Nick Newcomer
Copyright © 2012 by IGI Global. All rights reserved. No part of this publication may be
reproduced, stored or distributed in any form or by any means, electronic or mechanical,
including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the
names of the products or companies does not indicate a claim of ownership by IGI Global of the
trademark or registered trademark.
He, Xing.
Signal processing, perceptual coding, and watermarking of digital audio: advanced technologies
and models / by Xing He.
p. cm.
Includes bibliographical references and index.
Summary: “This book focuses on watermarking, in which data is marked with hidden ownership
information, as a promising solution to copyright protection issues and deals with understanding
human perception processes and including them in effective psychoacoustic models”-- Provided by
publisher.
ISBN 978-1-61520-925-5 (hardcover) -- ISBN 978-1-61520-926-2 (ebook) -- ISBN 978-1-60960-790-6
(print & perpetual access) 1. Signal processing--Digital techniques. 2. Sound--Recording
and reproducing--Digital techniques. 3. Sound recordings--Security measures. 4. Digital
watermarking. I. Title.
TK5102.9.H42 2012
621.382’2--dc22
2011003217
All work contributed to this book is new, previously-unpublished material. The views expressed in
this book are those of the authors, but not necessarily of the publisher.
Dedication
Preface
Chapter 1
Introduction of Human Auditory System and Psychoacoustics
1.1 Simple Introduction of the Ear
1.2 Properties of the Human Auditory System
1.3 The Masking Phenomena
1.4 Temporal Masking
Chapter 2
Introduction of Digital Watermarking
2.1 Motivation and Goals
2.2 Watermark Applications
2.3 Elements of a Watermarking System
2.4 Organization of the Rest of the Book
Chapter 3
Novel Applications of Digital Watermarking
3.1 Error Detection, Concealment and Recovery
3.2 Quality of Service in Multimedia Communications
3.3 Subjective Signal Quality Measurement
3.4 Bandwidth Extension
3.5 Security / Air Traffic Control / Secret Communication
Chapter 4
Literature Review of Selected Watermarking Schemes
4.1 LSB Coding
4.2 Patch Work
4.3 Quantization Index Modulation
4.4 Echo Coding / Hiding
4.5 Phase Coding
4.6 Fragile Watermarking
4.7 Spread Spectrum Coding
Chapter 5
Principles of Spread Spectrum
5.1 Theory of Spread Spectrum Technology in Communication
5.2 Spread Spectrum for Audio Watermarking
5.3 Analysis of Traditional SS Watermarking Systems
5.4 Problems of Traditional SS Watermarking Systems
Chapter 6
Survey of Spread Spectrum Based Audio Watermarking Schemes
6.1 Basic Direct Sequence Spread Spectrum
6.2 Time Domain Spread Spectrum Watermarking Scheme
6.3 Spread Spectrum Watermarking with Psychoacoustic Model and 2-D Interleaving Array
6.4 An Improved Spread Spectrum Method
6.5 Novel Spread Spectrum Approach
6.6 Enhanced Spread Spectrum Watermarking for AAC Audio
6.7 Frequency Hopping Spread Spectrum
6.8 Limits of Traditional Spread Spectrum Method
Chapter 7
Techniques for Improved Spread Spectrum Detection
7.1 Matched Filter Approach
7.2 Savitzky-Golay Smoothing Filters
7.3 Cepstrum Filtering
7.4 Spectral Envelop Filtering
7.5 The Linear Prediction Method
7.6 De-Synchronization Attacks and Traditional Solutions
Chapter 8
A Psychoacoustic Model Based on the Discrete Wavelet Packet Transform
8.1 Introduction
8.2 Wavelet Transform Analysis
8.3 An Improved Wavelet-Based Psychoacoustic Model
8.4 Experimental Procedures and Results
8.5 Conclusion
Chapter 9
A High Quality Audio Coder Using Proposed Psychoacoustic Model
9.1 Structure of Proposed Perceptual Audio Coder
9.2 Quantization and Huffman Coding
9.3 Evaluation of Proposed Audio Coder
9.4 Conclusion
Chapter 10
A Novel Spread Spectrum Digital Audio Watermarking Scheme
10.1 Introduction
10.2 Watermark Design, Insertion and Detection
10.3 Experimental Procedures and Results
10.4 Conclusion
Chapter 11
Further Improvements of the Watermarking Scheme
11.1 Diagram of Proposed Enhanced Watermark System
11.2 Encoder of the Proposed Enhanced Watermark System
11.3 Decoder of Proposed Enhanced Watermark System
11.4 Discussion about the Savitzky-Golay Smoothing Filter
11.5 Evaluation of the Proposed Enhanced Watermark System
11.6 More Watermarking Systems Comparison
11.7 Conclusion
Chapter 12
A Fast and Precise Synchronization Method for Digital Audio Watermarking
12.1 Introduction
12.2 The Synchronization Problem in Watermarking and Traditional Solutions
12.3 A Fast and Efficient Synchronization Method
12.4 Experimental Results
12.5 Conclusion
Chapter 13
Conclusion and Future Trends
13.1 Conclusion of the Book
13.2 Future Trends
Index
Preface
The availability of increased computational power and the proliferation of the Internet
have facilitated the production and distribution of unauthorized copies of multimedia
information. As a result, the problem of copyright protection has attracted the interest
of the worldwide scientific and business communities. The most promising solution seems
to be the watermarking process, in which the original data is marked with ownership
information hidden imperceptibly in the original signal. Compared to embedding
watermarks into still images, hiding data in audio is much more challenging due to the
extreme sensitivity of the human auditory system to changes in the audio signal.
Understanding the human perception processes and incorporating them into effective
psychoacoustic models is the key to successful watermarking. Aside from psychoacoustic
modeling, synchronization is also an important component of a successful watermarking
system: to recover the embedded watermark from the watermarked signal, the detector
must first know where the embedded watermark begins.
In this book, we focus on those two issues. We propose a psychoacoustic model based on
the discrete wavelet packet transform (DWPT). This model takes advantage of the
flexibility of DWPT decomposition to closely approximate the critical bands and provides
precise masking thresholds, resulting in an increased extent of inaudible spectrum and a
reduction of the sum to signal masking ratio (SSMR) compared to existing competing
techniques. The proposed psychoacoustic model has direct applications to digital
perceptual audio coding as well as digital audio watermarking.
For digital perceptual audio coding, the greater extent of inaudible spectrum provided
by the psychoacoustic model results in more audio samples being quantized to zero,
leading to a decreased compression bit rate. The reduction of SSMR, on the other hand,
allows a coarser quantization step, which further cuts the bits needed to represent the
audio in the audible spectrum areas. In other words, audio compressed with the proposed
digital perceptual codec achieves better subjective quality than an existing coding
standard operating at the same information rate, as confirmed by the subjective
listening test.
Digital audio watermarking applications benefit from the proposed psychoacoustic model
in two ways: a) more watermarks can be embedded in the inaudible spectrum, which
increases the watermark payload, and b) higher-energy watermarks can be hidden in the
audible spectrum areas, which leads to improved robustness and greater resiliency to
attacks and signal transformations than existing techniques, as confirmed by the
experimental results.
We finally introduce a fast and robust synchronization algorithm for watermarking which
exploits the consistency of the signal energy distribution under varying transformation
conditions and uses a matched filter approach in a fast search for the precise watermark
location. The proposed synchronization method achieves error-free sample-to-sample
synchronization under different attacks and signal transformations and shows very high
robustness to severe malicious time scaling manipulation.
Chapter 1
Introduction of Human Auditory System and Psychoacoustics
DOI: 10.4018/978-1-61520-925-5.ch001
This chapter reviews the background of the human auditory system (HAS) and
psychoacoustics.
The ear consists of three separate parts: the outer, middle, and inner ear, as shown
in Figure 1 (Wikipedia, 2009).
The outer ear consists of the head, the pinna, and the external auditory canal.
The main function of the pinna is to locate the source of the sound, especially at
high frequencies. The auditory canal is the path through which sound travels to reach
the tympanic membrane. The outer ear offers frequency directivity by shadowing, shaping
and diffraction. Different people localize sound differently due to considerable
variations in the pinna. A generalized summary of this ability for an average listener
is modeled by the “Head Related Transfer Functions” (HRTFs) or “Head Related
Impulse Responses” (HRIRs).
The air-filled middle ear is composed of the eardrum (tympanic membrane),
the opening of the eustachian tube and the three small bones (ossicles): the
malleus (hammer), incus (anvil) and stapes (stirrup) (Bharitkar, et al. 2006).
Sound vibrations in the ear canal are transmitted to the tympanic membrane, which
moves the malleus, incus and stapes. The stapes footplate then pushes on the oval
window, causing the fluid within the cochlea in the inner ear to move. The ossicles
as a whole thus act as an amplifier, transmitting the sound vibrations and passing
them through to the fluid-filled inner ear.
The inner ear consists of the cochlea, which contains the organ of Corti, two
membranes (the basilar membrane and the tectorial membrane) and the associated fluids
and spaces (Bharitkar, et al. 2006). The cochlea is lined with tiny hair cells, which
create nerve signals when sound reaches the cochlea.
$$T(f) = 3.64\left(\frac{f}{1000}\right)^{-0.8} - 6.5\,e^{-0.6\left(\frac{f}{1000} - 3.3\right)^{2}} + 10^{-3}\left(\frac{f}{1000}\right)^{4} \quad \text{(dB SPL)} \tag{1.1}$$
with frequency greater than 10 kHz. The area below the absolute threshold of hearing
is called the quiet zone, and any audio signal that falls into the quiet zone is not
perceptible.
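
As a rough illustration, the absolute threshold of hearing in Equation 1.1 can be evaluated numerically. The sketch below is a minimal Python rendering of that formula; the function name `absolute_threshold_db_spl` and the probe frequencies are illustrative choices and do not come from the original text.

```python
import math


def absolute_threshold_db_spl(f_hz: float) -> float:
    """Approximate threshold in quiet (dB SPL) per Equation 1.1; f_hz must be > 0."""
    k = f_hz / 1000.0  # frequency expressed in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)


if __name__ == "__main__":
    # Hypothetical probe frequencies, chosen only for illustration.
    for f in (100.0, 1000.0, 3300.0, 10000.0, 15000.0):
        print(f"{f:8.0f} Hz -> {absolute_threshold_db_spl(f):7.2f} dB SPL")
```

Under this formula the threshold dips below 0 dB SPL near 3.3 kHz and climbs steeply above 10 kHz, so a component lying under the curve at its frequency would fall into the quiet zone.
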
In the presence of an acoustic stimulus the basilar membrane in the human inner
ear performs a short-time spatio-spectral analysis on the incoming sound. This
process is done in specific overlapping regions of the basilar membrane (Deller et
al., 1993). Experiments showed that human sensitivity to acoustic events is related
to an unevenly spaced frequency scale. The term “critical band” describes regions
of equivalent sensitivity in this frequency scale and is defined as the frequency band
within which the loudness of a band of continuously distributed sound of constant
SPL is independent of its bandwidth (Atal et al., 1984). The critical band is rated on
the so-called Bark scale. Because the critical bands are unevenly spaced, the “Bark”
scale is a nonlinear frequency scale (Deller et al., 1993). The cycles-per-second (Hz)
to Bark mapping is described by the following formula:
$$z = 13\arctan\left(0.76\,\frac{f}{1000}\right) + 3.5\arctan\left(\left(\frac{f}{7500}\right)^{2}\right) \tag{1.2}$$
where f is in Hz and z is in Bark. Bark values are rounded to the nearest integer to
provide the critical band index. Figure 3 illustrates this mapping.
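
The Hz-to-Bark mapping can be sketched in a few lines of Python. This is a minimal illustration that assumes the commonly cited Zwicker form of the mapping, with f/7500 inside the second arctangent; the function names `hz_to_bark` and `critical_band_index` are hypothetical.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Map a frequency in Hz to critical band rate in Bark (assumed Zwicker form)."""
    return (13.0 * math.atan(0.76 * f_hz / 1000.0)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


def critical_band_index(f_hz: float) -> int:
    """Round the Bark value to the nearest integer, as described in the text."""
    return int(round(hz_to_bark(f_hz)))


if __name__ == "__main__":
    # Illustrative frequencies only.
    for f in (100.0, 1000.0, 4000.0, 10000.0):
        print(f"{f:7.0f} Hz -> {hz_to_bark(f):5.2f} Bark, band {critical_band_index(f)}")
```

With these assumptions, 1 kHz maps to roughly 8.5 Bark and therefore rounds to critical band 9.
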
The critical bandwidth at each center frequency is closely approximated by
$$BW_c(f) = 25 + 75\left[1 + 1.4\left(\frac{f}{1000}\right)^{2}\right]^{0.69} \tag{1.3}$$
The critical bands and their bandwidths are listed in Table 1 (Zwicker et al., 1991)
and shown in Figure 4.
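
As a quick sanity check, the critical bandwidth approximation in Equation 1.3 can be compared with a few band widths read from Table 1. The sketch below is illustrative only; the function name `critical_bandwidth_hz` is hypothetical, and the three (center, width) pairs are taken from the table's band edges.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth in Hz at center frequency f_hz (Equation 1.3)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


if __name__ == "__main__":
    # (center frequency, upper edge minus lower edge) for bands 2, 9 and 18 of Table 1.
    table_samples = [(150.0, 100.0), (1000.0, 160.0), (4000.0, 700.0)]
    for center, table_width in table_samples:
        approx = critical_bandwidth_hz(center)
        print(f"{center:6.0f} Hz: formula {approx:6.1f} Hz, table {table_width:6.1f} Hz")
```

The formula gives about 100 Hz at low frequencies and widens with frequency, staying within a few percent of the tabulated widths for these three samples.
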
Although the critical band notion is widely used in psychoacoustic modeling and
perceptual audio coding, there is an alternative called the equivalent rectangular
bandwidth (ERB), which models human hearing as a bank of rectangular (brick-wall)
band-pass filters and provides an approximation to the bandwidths of those filters.
Table 1. Standard critical bands distribution
Critical Band Index   Lower Edge (Hz)   Center Frequency (Hz)   Upper Edge (Hz)
 1                        0                 50                    100
 2                      100                150                    200
 3                      200                250                    300
 4                      300                350                    400
 5                      400                450                    510
 6                      510                570                    630
 7                      630                700                    770
 8                      770                840                    920
 9                      920               1000                   1080
10                     1080               1170                   1270
11                     1270               1370                   1480
12                     1480               1600                   1720
13                     1720               1850                   2000
14                     2000               2150                   2320
15                     2320               2500                   2700
16                     2700               2900                   3150
17                     3150               3400                   3700
18                     3700               4000                   4400
19                     4400               4800                   5300
20                     5300               5800                   6400
21                     6400               7000                   7700
22                     7700               8500                   9500
23                     9500              10500                  12000
24                    12000              13500                  15500
25                    15500              19500
To convert a frequency in Hz to a frequency in units of ERB-bands, the following
formula should be used:
$$ERB_c(f) = 21.4\log_{10}\left(4.37\,\frac{f}{1000} + 1\right) \tag{1.4}$$
The bandwidth of the ERB filter centered at frequency f is
$$BW_{ERB}(f) = 24.7\left(4.37\,\frac{f}{1000} + 1\right) \tag{1.5}$$
It is important to note that the formula above converts a frequency (in Hz) to a
bandwidth (also in Hz), which is illustrated in Figure 6.
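
The two ERB formulas are easy to confuse, so a minimal Python sketch may help keep them apart: one returns a position on the ERB-rate scale, the other a bandwidth in Hz. The function names are hypothetical, and the logarithm in Equation 1.4 is assumed to be base 10.

```python
import math


def hz_to_erb_rate(f_hz: float) -> float:
    """Position on the ERB-rate scale (Equation 1.4), assuming a base-10 logarithm."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_bandwidth_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz at center frequency f_hz (Equation 1.5)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


if __name__ == "__main__":
    # Illustrative frequencies only.
    for f in (100.0, 1000.0, 4000.0):
        print(f"{f:6.0f} Hz -> ERB rate {hz_to_erb_rate(f):5.2f}, "
              f"bandwidth {erb_bandwidth_hz(f):6.1f} Hz")
```

At 1 kHz this gives an ERB bandwidth of about 133 Hz, somewhat narrower than the 160 Hz critical band listed for that center frequency in Table 1.
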
Auditory masking refers to the phenomenon where one sound becomes inaudible due to the
presence of another sound. The sound being masked is called the maskee and the sound
that masks it is called the masker. There are two types of auditory masking phenomena:
simultaneous masking and non-simultaneous masking, also referred to as frequency
masking and temporal masking, respectively.
Simultaneous masking happens when two or more sounds are present at the same time and
the weaker signal is rendered imperceptible by the presence of the stronger signal; in
other words, the weaker signal is masked by the stronger signal. Whenever there is a
stimulus, it creates a masking threshold and makes inaudible any signal that falls
below the masking curve. Figure 7 (Zwicker et al., 1990) shows the masking thresholds
of five pure tones at 0.07 kHz, 0.25 kHz, 1 kHz, 4 kHz and 8 kHz. The broken line is
the absolute threshold of hearing.
Among the many types of simultaneous masking, the three main simplified paradigms are
noise-masking-tone (NMT) (Scharf, 1970), tone-masking-noise (TMN) (Hellman, 1972),
and noise-masking-noise (NMN) (Akansu, et al. 1996).
Figure 7. Frequency masking thresholds of pure tones at 0.07 kHz, 0.25 kHz, 1 kHz,
4 kHz and 8 kHz
tone. Note, however, that the intensity of the noise is 24 dB less than that of the
tone (Spanias, et al. 2007).
In the noise masking noise scenario, a narrow band noise is masked by another
narrow band noise. One study showed that wide band noises can produce about 26
dB SMR in the noise masking noise case.
As can be seen from Figure 9 and Figure 10, noise is a much more effective masker
than a tone. A narrow band noise can render a tone in the same critical band inaudible
with an SMR of barely 4 dB, whereas a tone must be 21 to 28 dB more intense than a
narrow band noise in the same critical band in order to mask it.
An excitation signal not only creates a simultaneous masking effect that renders
weaker audio signals in the same critical band inaudible, but also creates a masking
effect across nearby critical bands. This phenomenon is called the spread of masking.
This effect is often approximately modeled by a triangular spreading function
with slopes of +25 and -10 dB per Bark. A general formula for such an approximation
can be expressed as (Painter et al., 2000)
$$SF_{dB}(x) = 15.81 + 7.5(x + 0.474) - 17.5\sqrt{1 + (x + 0.474)^{2}} \tag{1.6}$$
where x is in Barks and SF_dB(x) is in dB.
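
To see how Equation 1.6 relates to the triangular model with slopes of 25 and 10 dB per Bark, the spreading function can be evaluated at a few points. The following Python sketch is illustrative only; the function name `spreading_function_db` and the sample offsets are hypothetical, and x is taken to be the distance in Bark between maskee and masker.

```python
import math


def spreading_function_db(x_bark: float) -> float:
    """Spreading function of Equation 1.6 in dB; x_bark is the Bark distance."""
    shifted = x_bark + 0.474
    return 15.81 + 7.5 * shifted - 17.5 * math.sqrt(1.0 + shifted ** 2)


if __name__ == "__main__":
    # Sample Bark offsets around the masker, chosen only for illustration.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x = {x:+.1f} Bark -> {spreading_function_db(x):7.2f} dB")

    # Far from the peak, the slope per Bark approaches +25 dB below and -10 dB above.
    lower_slope = spreading_function_db(-10.0) - spreading_function_db(-11.0)
    upper_slope = spreading_function_db(11.0) - spreading_function_db(10.0)
    print(f"lower slope ~ {lower_slope:.1f} dB/Bark, upper slope ~ {upper_slope:.1f} dB/Bark")
```

The function is close to 0 dB at x = 0 and falls off more slowly toward higher Bark values than toward lower ones, which is the asymmetry the triangular approximation captures.
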
A strong stimulus not only generates simultaneous masking, but also creates masking
effects before its onset and after it has ended. This phenomenon is called
non-simultaneous masking or temporal masking. Masking that occurs before the onset of
a sound is called pre-masking and masking that occurs after the sound has ended is
called post-masking.
Figure 10 (Painter et al., 2000) illustrates the non-simultaneous masking property
of the HAS. Note that although pre-masking lasts only several milliseconds, post-
masking can last up to 200 milliseconds.