Audio - Introduction To Sound Recording - 2004 (Advanced) PDF
1 Introductory Materials
1.1 Geometry
1.1.1 Right Triangles
1.1.2 Slope
1.2 Exponents
1.3 Logarithms
1.3.1 Warning
1.4 Trigonometric Functions
1.4.1 Radians
1.4.2 Phase vs. Addition
1.5 Complex Numbers
1.5.1 Whole Numbers and Integers
1.5.2 Rational Numbers
1.5.3 Irrational Numbers
1.5.4 Real Numbers
2 Analog Electronics
2.1 Basic Electrical Concepts
2.1.1 Introduction
2.1.2 Current and EMF (Voltage)
2.1.3 Resistance and Ohm’s Law
2.1.4 Power and Watt’s Law
2.1.5 Alternating vs. Direct Current
2.1.6 RMS
2.1.7 Suggested Reading List
2.2 The Decibel
2.2.1 Gain
2.2.2 Power and Bels
2.2.3 dBspl
2.2.4 dBm
2.2.5 dBV
2.2.6 dBu
2.2.7 dB FS
2.2.8 Addendum: “Professional” vs. “Consumer” Levels
2.2.9 The Summary
2.3 Basic Circuits / Series vs. Parallel
2.3.1 Introduction
2.3.2 Series circuits – from the point of view of the current
2.3.3 Series circuits – from the point of view of the voltage
2.3.4 Parallel circuits – from the point of view of the voltage
2.3.5 Parallel circuits – from the point of view of the current
2.3.6 Suggested Reading List
2.4 Capacitors
2.4.1 Suggested Reading List
2.5 Passive RC Filters
2.5.1 Another way to consider this...
2.5.2 Suggested Reading List
2.6 Electromagnetism
2.6.1 Suggested Reading List
2.7 Inductors
2.7.1 Impedance
2.7.2 RL Filters
2.7.3 Inductors in Series and Parallel
2.7.4 Inductors vs. Capacitors
2.7.5 Suggested Reading List
2.8 Transformers
2.8.1 Suggested Reading List
2.9 Diodes and Semiconductors
2.9.1 The geeky stuff:
2.9.2 Zener Diodes
2.9.3 Suggested Reading List
2.10 Rectifiers and Power Supplies
2.11 Transistors
2.11.1 Suggested Reading List
2.12 Basic Transistor Circuits
2.12.1 Suggested Reading List
2.13 Operational Amplifiers
2.13.1 Introduction
2.13.2 Comparators
2.13.3 Inverting Amplifier
2.13.4 Non-Inverting Amplifier
3 Acoustics
3.1 Introduction
3.1.1 Pressure
3.1.2 Simple Harmonic Motion
3.1.3 Damping
3.1.4 Harmonics
3.1.5 Overtones
3.1.6 Longitudinal vs. Transverse Waves
3.1.7 Displacement vs. Velocity
3.1.8 Amplitude
3.1.9 Frequency and Period
3.1.10 Angular frequency
3.1.11 Negative Frequency
3.1.12 Speed of Sound
5 Electroacoustics
5.1 Filters and Equalizers
5.1.1 Introduction
5.1.2 Filters
5.1.3 Equalizers
5.1.4 Summary
5.1.5 Phase response
5.1.6 Applications
5.1.7 Spectral sculpting
5.1.8 Loudness
5.1.9 Noise Reduction
5.1.10 Dynamic Equalization
5.1.11 Further reading
5.2 Compressors, Limiters, Expanders and Gates
5.2.1 What a compressor does.
5.2.2 How compressors compress
5.2.3 Suggested Reading List
5.3 Analog Tape
5.3.1 Suggested Reading List
0.1 Preface
Once upon a time I went to McGill University to try to get into the Master’s
program in sound recording at the Faculty of Music. In order to accomplish
this task, students are required to do a “qualifying year” of introductory
courses (although it felt more like obstacle courses...) to see who really
wants to get into the program. Looking back, there is no question that I
learned more in that year than in any other single year of my life. In
particular, two memories stand out.
One was my professor – a guy named Peter Cook who now works at
the CBC in Toronto as a digital editor. Peter is one of those teachers who
doesn’t know everything, and doesn’t pretend to know everything – but if
you ask him a question about something he doesn’t understand, he’ll show
up at the next week’s class with a reading list where the answer to your
question can be found. That kind of enthusiasm in a teacher cannot be
replaced by any other quality. I definitely wouldn’t have gotten as much out
of that year without him. A good piece of advice that I recently read for
university students is that you don’t choose courses, you choose professors.
The second thing was a book by John Woram called the Sound Recording
Handbook (the 1989 edition). This book not only proved to be the most
amazing introduction to sound recording for a novice idiot like myself, but
it continued to be the most used book on my shelf for the following 10 years.
In fact, I was still using it as a primary reference when I was studying for
my doctoral comprehensives 10 years later. Regrettably, that book is no
longer in print – so if you can find a copy (remember – the 1989 edition...
no other...) buy (or steal) it and guard it with your life.
Since then, I have seen a lot of students go through various stages of becoming a recording engineer at McGill and in other places and I’ve lamented the lack of a decently-priced but academically valuable textbook for these people. There have been a couple of books that have hit the market, but they’re either too thin, too full of errors, too simplistic or too expensive. (I won’t name any names here, but if you ask me in person, I’ll tell you...)
This is why I’m writing this book. From the beginning, I intended it to be freely accessible to anyone who was interested enough to read it. I can’t guarantee that it’s completely free of errors – so if you find any, please let me know and I’ll make the appropriate corrections as soon as I can. The tone of this book is pretty colloquial – that’s intentional – I’m trying to make the concepts presented here as accessible as possible without reducing the level of the content, so it can make a good introduction that covers a lot of ground. I’ll admit that it doesn’t make a great reference because there are too many analogies and stories in here – essentially too low a signal-to-noise ratio to make a decent book for someone who already understands the concepts.
Note that the book isn’t done yet – in fact, in keeping with everything
else you’ll find on the web, it will probably never be finished. You’ll find
many places where I’ve made notes to myself on what will be added where.
Also, there are a couple of explanations in here that don’t make much sense –
even to me... so they’ll get fixed later. Finally, there are a lot of references
missing. These will be added in the next update – I promise...
If you think that I’ve left any important subjects out of the Table of
Contents, please let me know by email at [email protected].
0.2 Thanks
There are a lot of people to thank for helping me out with this project. I
have to start with a big thanks to Peter Cook for getting me started on the
right track in the first place. To Wieslaw Woszczyk for allowing me to teach
courses at McGill back when I was still just a novice idiot – the best way to
learn something is to have to teach it to someone else. To Alain Terriault,
Gilbert Soulodre and Michel Lavoie for answering my dumb questions about
electronics when I was just starting to get the hang of exactly what things
like a capacitor or an op amp do in a circuit. To Brian Sarvis and Kypros
Christodoulides who were patient enough to put up with my subsequent
dumb questions regarding what things like diodes and transistors do in a
circuit. To Jason Corey for putting up with me running down many a
wrong track looking for answers to some of the questions found in here.
To Mark Ballora for patiently answering questions about DSP and trying
(unsuccessfully) to get me to understand Shakespeare. Finally to Philippe
Depalle – once upon a time I was taking a course in DSP for dummies at
McGill and, about two-thirds of the way through the semester, Philippe
guest-taught for one class. In that class, I wound up taking more notes
than I had for all classes in the previous part of the semester combined.
Since then, Philippe has become a full-time faculty member at McGill and I was
lucky enough to have him as a thesis advisor. Basically, when I have any
questions about anything, Philippe is the person I ask.
I also have to thank a number of people who have proofread some of
the stuff you’ll find here and have offered assistance and corrections – either
with or without being asked. In alphabetical order, these folks are Bruce
Bartlett, Peter Cook, Goran Finnberg, John La Grou, George Massenburg,
Bert Noeth, Ray Rayburn, Eberhard Sengpiel and Greg Simmons.
Also on the list of thanks are people who have given permission to use
their materials. Thanks to Claudia Haase and Thomas Lischker at RTW
Radio-Technische (www.rtw.de) for their kind permission to use graphics
from their product line for the section on levels and meters. Also to George
Massenburg (www.massenburg.com) for permission to duplicate a chapter
from the GML manual on equalizers that I wrote for him.
There are also a large number of people who have emailed me, either to
ask questions about things that I didn’t explain well enough the first time,
or to make suggestions regarding additions to the book. I’ll list those people
in a later update of the text – but thanks to you if you’re in that group.
Finally, thanks to Jonathan Sheaffer and The Jordan Valley Academic
College (www.yarden.ac.il) for hosting the space to put this file for now.
0.3 Autobiography
Just in case you’re wondering “Who does this guy think he is?” I’ll tell
you... This is the bio I usually send out when someone asks me for one. It’s
reasonably up-to-date.
Originally from St. John’s, Newfoundland, Geoff Martin completed his
B.Mus. in pipe organ at Memorial University of Newfoundland in 1990.
He is a graduate of McGill’s Master’s program in Sound Recording and, in
2001, he completed his doctoral studies in which he developed a method
of simulating reflections from quadratic residue diffusers for multichannel
virtual acoustic environments.
Following completion of his doctorate, Geoff was a Faculty Lecturer for McGill’s Music Technology area, where he taught courses in new media, electronics, and electroacoustics. In addition, he was a member of the development team for McGill’s new Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT). He taught electroacoustic music composition and conducted the contemporary music ensemble at the University of Ottawa. He has also been a regular member of the visiting faculty in the Music and Sound Department at the Banff Centre for the Arts. He is presently a researcher in acoustics and perception at Bang and Olufsen a/s in Denmark, where he has worked since the Fall of 2002. He maintains an active musical career as an organist, choral conductor and composer.
Geoff has been a member of the Audio Engineering Society since 1990
and has served on the executive for the Montreal Student Chapter for two
years. He was the Papers Chair for the 24th International Conference of the
Audio Engineering Society titled “Multichannel Audio: The New Reality”
held at The Banff Centre in Alberta, Canada. He is presently the chair of
the AES Technical Committee on Microphones and Applications.
0.4.4 Psychoacoustics
Moore, B. C. J. (1997) An Introduction to the Psychology of Hearing, Academic Press, San Diego, 4th edition.
Blauert, J. (1997) Spatial Hearing: The Psychophysics of Human Sound Localization, MIT Press, Cambridge, revised edition.
Bregman, A. S. (1990) Auditory Scene Analysis: The Perceptual Organization of Sound, MIT Press, Cambridge.
Zwicker, E., & Fastl, H. (1999) Psychoacoustics: Facts and Models, Springer, Berlin.
0.4.5 Acoustics
Morfey, C. L. (2001) Dictionary of Acoustics, Academic Press, San Diego.
Kinsler, L. E., Frey, A. R., Coppens, A. B., & Sanders, J. V. (1982) Fundamentals of Acoustics, John Wiley & Sons, New York, 3rd edition.
Hall, D. E. (1980) Musical Acoustics: An Introduction, Wadsworth Publishing, Belmont.
Kuttruff, H. (1991) Room Acoustics, Elsevier Science Publishers, Essex.
2. By the time the thing actually got out on the market, I’d be long gone
and everything here would be obsolete. It takes a long long time to
get something published... and
Introductory Materials
1.1 Geometry
It may initially seem that a section explaining geometry is a very strange place to start a book on sound recording, but as we’ll see later, it’s actually the best place to start. In order to understand many of the concepts in the chapters on acoustics, electronics, digital signal processing and electroacoustics, you’ll need to have a very firm and intuitive grasp of a couple of simple geometrical concepts. In particular, these two concepts are the right triangle and slope.
that it’s 90°, as is shown in Figure 1.1. One other new word to learn: the
side opposite the right angle (in Figure 1.1, that would be side a) is called
the hypotenuse of the triangle.
Figure 1.1: A right triangle with sides of lengths a, b and c. Note that side a is called the hypotenuse
of the triangle.
One of the things Pythagoras discovered was that if you take a right
triangle and make a square from each of its sides as is shown in Figure 1.2,
then the sum of the areas of the two smaller squares is equal to the area of
the big square.
Figure 1.2: Three squares of areas A, B and C created by making squares out of the sides of a
right triangle of arbitrary dimensions. A = B + C
should know that the area of a square is equal to the square of the length of one of its sides. Looking at Figures 1.1 and 1.2, this means that $A = a^2$, $B = b^2$, and $C = c^2$.
Therefore, we can put this information together to arrive at a standard equation for right triangles known as the Pythagorean Theorem, shown in Equations 1.1 and 1.2.
$$a^2 = b^2 + c^2 \tag{1.1}$$
and therefore
$$a = \sqrt{b^2 + c^2} \tag{1.2}$$
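A quick numerical check of Equation 1.2 may help. What follows is a minimal Python sketch. The function name `hypotenuse` is our own illustration and is not taken from the text. It follows the naming of Figure 1.1, where a is the hypotenuse and b and c are the other two sides.

```python
import math

def hypotenuse(b: float, c: float) -> float:
    """Return the hypotenuse a of a right triangle whose other two sides are b and c."""
    # Equation 1.2: a = sqrt(b^2 + c^2)
    return math.sqrt(b ** 2 + c ** 2)

# A 3-4-5 triangle: 3^2 + 4^2 = 9 + 16 = 25 = 5^2
print(hypotenuse(3.0, 4.0))  # 5.0
print(math.hypot(3.0, 4.0))  # the standard library's built-in equivalent: 5.0
```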
1.1.2 Slope
Let’s go downhill skiing. One of the big questions when you’re a beginner
downhill skier is “how steep is the hill?” Well, there is a mathematical way
to calculate the answer to this question. Essentially, another way to ask the
same question is “how many metres do I drop for every metre that I ski
forward?” The more you drop over a given distance, the steeper the slope
of the hill.
So, what we’re talking about when we discuss the slope of the hill is how
much it rises (or drops) for a given run. Mathematically, the slope is written
as a ratio of these two values as is shown in Equation 1.3.
$$\text{slope} = \frac{\text{rise}}{\text{run}} \tag{1.3}$$
but if we wanted to be a little more technical about this, then we would
talk about the ratio of the difference in the y-value (the rise) for a given
difference in the x-value (the run), so we’d write it like this:
$$\text{slope} = \frac{\Delta y}{\Delta x} \tag{1.4}$$
Where ∆ is a symbol (it’s the Greek capital letter delta) commonly used
to indicate a difference or a change.
Let’s just think about this a little more for a couple of minutes and
consider some different slopes.
$$y = mx + k \tag{1.5}$$
where m is the slope and k is the value of y where the line crosses the y-axis (that is, where x = 0).
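To make the rise-over-run idea concrete, here is a small Python sketch. It assumes a straight line through two known points, with the run measured horizontally. The names `slope` and `intercept` are hypothetical helpers. k is taken to be the value of y where x = 0 in Equation 1.5.

```python
def slope(x1: float, y1: float, x2: float, y2: float) -> float:
    """Rise over run (Equation 1.4) for the line through (x1, y1) and (x2, y2)."""
    run = x2 - x1   # delta x
    rise = y2 - y1  # delta y
    if run == 0:
        raise ValueError("vertical line: the run is zero, so the slope is undefined")
    return rise / run

def intercept(m: float, x: float, y: float) -> float:
    """Solve y = m*x + k (Equation 1.5) for k, given the slope m and one point (x, y)."""
    return y - m * x

m = slope(1.0, 3.0, 3.0, 7.0)   # rises 4 over a run of 2
k = intercept(m, 1.0, 3.0)
print(m, k)                      # 2.0 1.0

# A hill that drops 30 m over 100 m of horizontal distance has a negative slope
print(slope(0.0, 0.0, 100.0, -30.0))  # -0.3
```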
1.2 Exponents
An exponent is just a lazy way to write down that you want to multiply a
number by itself.
If I say $10^2$, then this is just a short notation for “10 multiplied by 10” – therefore, it’s $10 \times 10 = 100$. Similarly, $3^4 = 3 \times 3 \times 3 \times 3 = 81$.
Sometimes you’ll see a negative number as the exponent. This simply means that you have to include a little division in your calculation. Whenever you see a negative exponent, you just have to divide 1 by the same thing without the negative sign. For example, $10^{-2} = \frac{1}{10^2} = \frac{1}{100} = 0.01$.
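If you'd like to see the "multiply it by itself" idea spelled out, here is a minimal Python sketch for whole-number exponents. `power` is a hypothetical helper written only for illustration. Python's own `**` operator does the same job, including negative exponents.

```python
def power(base: float, exponent: int) -> float:
    """Raise base to a whole-number exponent by repeated multiplication.

    A negative exponent means "divide 1 by the same thing without the negative sign".
    """
    result = 1.0
    for _ in range(abs(exponent)):
        result *= base  # multiply the base in once for each unit of the exponent
    if exponent < 0:
        return 1.0 / result
    return result

print(power(10, 2))   # 100.0
print(power(3, 4))    # 81.0
print(power(10, -2))  # 0.01
print(10 ** -2)       # Python's built-in operator agrees: 0.01
```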
1.3 Logarithms
Once upon a time you learned to do multiplication, after which someone
explained that you can use division to do the reverse. For example:
if
$$A = B \times C \tag{1.6}$$
then
$$\frac{A}{B} = C \tag{1.7}$$
and
$$\frac{A}{C} = B \tag{1.8}$$
Logarithms sort of work in the same way, except that they are the backwards version of an exponent (just as division is the backwards version of multiplication). Logarithms (or logs) work like this:
If $10^2 = 100$ then $\log_{10} 100 = 2$
More generally, it’s:
If $A^B = C$ then $\log_A C = B$
Now we have to go through some properties of logarithms.
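$\log_{10} 10 = 1$ or $\log_{10} 10^1 = 1$
$\log_{10} 100 = 2$ or $\log_{10} 10^2 = 2$
$\log_{10} 1000 = 3$ or $\log_{10} 10^3 = 3$
This should come as no great surprise – you can check them on your calculator if you don’t believe me. Now, let’s play with these three equations.
$\log_{10} 1000 = 3$
$\log_{10} 10^3 = 3$
$3 \times \log_{10} 10 = 3$
Since $\log_{10} 10^3$ and $3 \times \log_{10} 10$ are both equal to 3, they must be equal to each other. Therefore:
$$\log_C (A^B) = B \times \log_C A$$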
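The property above can be spot-checked numerically. This Python sketch uses the standard library's `math.log(x, base)`, which takes an optional second argument for the base. `power_rule_holds` is a hypothetical helper name, and the extra test values are arbitrary choices of ours. A check like this is a sanity test, not a proof.

```python
import math

def power_rule_holds(base: float, a: float, b: float) -> bool:
    """Check log_base(a**b) == b * log_base(a), allowing for floating-point rounding."""
    left = math.log(a ** b, base)   # math.log(x, base) takes an optional base argument
    right = b * math.log(a, base)
    return math.isclose(left, right)

# The case from the text (log10 of 10^3), plus two arbitrary extra cases
print(power_rule_holds(10, 10, 3))
print(power_rule_holds(10, 2, 5))
print(power_rule_holds(2, 8, 0.5))
```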
1.3.1 Warning
I once learned that you should never assume, because when you assume you
make an ass out of you and me... (get it? ass—u—me... okay... dumb joke).
One small problem with logarithms is the way they’re written. People usually don’t write the base of the log, so you’ll see things like log(3), which usually means $\log_{10} 3$ – if the base isn’t written, it’s assumed to be 10.
This also holds true on most calculators. Punch in 100 and hit LOG and see if you get 2 as an answer – you probably will. Unfortunately, this assumption is not true if you’re using a computer to calculate your logarithms. For example, if you’re using MATLAB and you type log(100) and hit the RETURN key, you’ll get the answer 4.6052. This is because MATLAB assumes that you mean base e (a number close to 2.71828) instead of base 10. So, if you’re using MATLAB, you’ll have to type in log10(100) to indicate that the logarithm is in base 10. If you’re in Mathematica, you’ll have to use Log[10, 100] to mean the same thing.
Note that many textbooks write log and mean $\log_{10}$, just like your calculator. When the books want you to use $\log_e$ like your computer, they’ll write “ln” (pronounced “lawn”), meaning the natural logarithm.
The moral of the story is: BEWARE! Verify that you know the base of
the logarithm before you get too many wrong answers and have to do it all
again.
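The same trap exists in Python, which behaves like MATLAB rather than like your calculator. This short sketch uses only the standard `math` module. With one argument, `math.log` is the natural logarithm. `math.log10`, or `math.log` with a second argument, gives base 10.

```python
import math

x = 100
print(math.log(x))      # natural log (base e), like MATLAB's log(): about 4.6052
print(math.log10(x))    # base 10, like the LOG key on a calculator: 2.0
print(math.log(x, 10))  # base given explicitly as a second argument: 2 (give or take rounding)
print(math.e)           # the base of the natural log: 2.718281828459045
```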
Figure 1.3: Wheel rotating counterclockwise when viewed from the side of the handle that’s sticking out on the right.
Figure 1.4: A record of the height of the handle over time producing a wave that appears on the right of the wheel (vertical displacement plotted against time).
Figure 1.5: Graphs showing the relationship between the angle of rotation of the wheel and the waveform’s X-axis.
motion.
There’s one important thing that the wave isn’t telling us: the direction of rotation of the wheel. If the wheel were turning clockwise instead of counterclockwise, the wave would look exactly the same, as is shown in Figure 1.6.
Figure 1.6: Two wheels rotating at the same speed in opposite directions resulting in the same waveform (vertical displacement plotted against time).
Figure 1.7: Graphs showing the relationship between the angle of rotation of the wheel and the
vertical and horizontal displacements of the handle.
Keep in mind as well that if we only knew the cosine, we still wouldn’t know the direction of rotation of the wheel – we need to know the simultaneous values of the sine and the cosine to know whether the wheel is going clockwise or counterclockwise.
Now then, let’s assume for a moment that the circle has a radius of 1. (1 centimeter, 1 foot... it doesn’t matter so long as we keep thinking in the same units for the rest of this little chat.) If that’s the case then the maximum value of the sine wave will be 1 and the minimum will be −1. The same holds true for the cosine wave. Also, looking back at Figure 1.5, we can see that the value of the sine is 1 when the angle of rotation (also known as the phase angle) is 90°. At the same time, the value of the cosine is 0 (because there’s 0 horizontal displacement at 90°). Using this, we can complete Table 1.1:
In fact, if you get out your calculator and start looking for the Sine (“sin” on a calculator) and the Cosine (“cos”) for every angle between 0° and 359° (no point in checking 360° because it’ll be the same as 0° – you’ve made a full rotation at that point...) and plot each value, you’ll get a graph that looks like Figure 1.8.
Figure 1.8: The relationship between Sine (blue) and Cosine (red) for angles from 0° to 359°.
As can be seen in Figure 1.8, the sine and cosine intersect at 45° (with a value of 0.707, or $\frac{1}{\sqrt{2}}$) and at 225° (with a value of −0.707, or $-\frac{1}{\sqrt{2}}$). Also, you can see from this graph that a cosine is essentially a sine wave, but 90° earlier. That is to say that the value of a cosine at any angle is the same as the value of the sine 90° later. These two things provide a small clue as to another way of looking at this relationship.
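If you don't have a calculator handy, the values behind Figure 1.8 can be generated with a few lines of Python. This is a sketch. `sin_deg` and `cos_deg` are hypothetical helper names, and they exist only because Python's `math.sin` and `math.cos` expect angles in radians rather than degrees. The values are rounded for display, so some zeros may print as `-0.0`.

```python
import math

def sin_deg(angle: float) -> float:
    """Sine of an angle given in degrees (math.sin itself expects radians)."""
    return math.sin(math.radians(angle))

def cos_deg(angle: float) -> float:
    """Cosine of an angle given in degrees."""
    return math.cos(math.radians(angle))

# Values at the quarter turns, rounded to hide tiny floating-point errors
for angle in (0, 90, 180, 270):
    print(angle, round(sin_deg(angle), 3), round(cos_deg(angle), 3))

# The two curves cross at 45 and 225 degrees, at +1/sqrt(2) and -1/sqrt(2)
for angle in (45, 225):
    print(angle, round(sin_deg(angle), 3), round(cos_deg(angle), 3))

# A cosine is a sine 90 degrees earlier: cos(x) equals sin(x + 90) at every angle
assert all(math.isclose(cos_deg(a), sin_deg(a + 90), abs_tol=1e-12) for a in range(360))
```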
Look at the first 90◦ of rotation of the handle. If we draw a line from the
centre of the wheel to the location of the handle at a given angle, and then
add lines showing the vertical and horizontal displacements as in Figure 1.7,
then we get a triangle like the one shown in Figure 1.9.
Figure 1.9: A right triangle within the rotating wheel. Notice that the value of the sine wave is the green vertical leg of the triangle, the value of the cosine is the red horizontal leg of the triangle and the radius of the wheel (and therefore the peak values of both the sine and cosine) is the hypotenuse.
Now, if the radius of the wheel (the hypotenuse of the triangle) is 1, then
the vertical line is the sine of the inside angle indicated with a red arrow.
Likewise, the horizontal leg of the triangle is the cosine of the angle.
Also, we know from Pythagoras that the square of the hypotenuse of a right triangle is equal to the sum of the squares of the other two sides (remember $a^2 + b^2 = c^2$ where c is the length of the hypotenuse). In other words, in the case of our triangle above where the hypotenuse is equal to 1, the sine of the angle squared plus the cosine of the angle squared equals 1 squared (which is just 1). This is a rule that is true for any angle φ:
$$\sin^2 \varphi + \cos^2 \varphi = 1$$
Since this is true, when the angle is 45° we know that the right triangle is isosceles – meaning that the two legs other than the hypotenuse are of equal length (take a look at the graph in Figure 1.8). Not only are they the same length, but their squares add up to 1. Remember that $a^2 + b^2 = c^2$ and that $c^2 = 1$. Since the two legs are equal, $a^2 + a^2 = 2a^2 = 1$, so $a^2 = \frac{1}{2}$. Therefore the value of the sine and the cosine when the angle is 45° is $\frac{1}{\sqrt{2}}$, because it’s the square root of $\frac{1}{2}$ and
$$\sqrt{\frac{1}{2}} = \frac{\sqrt{1}}{\sqrt{2}} = \frac{1}{\sqrt{2}}$$
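The identity and the 45° value can also be checked numerically. This Python sketch steps through whole-degree angles only, which is an arbitrary choice. It is a spot check, not a proof.

```python
import math

# Largest deviation of sin^2 + cos^2 from 1, checked in 1-degree steps
worst = max(
    abs(math.sin(math.radians(a)) ** 2 + math.cos(math.radians(a)) ** 2 - 1)
    for a in range(360)
)
print(worst)  # tiny: only floating-point rounding error

# At 45 degrees both legs are equal, so 2 * leg^2 = 1 and leg = sqrt(1/2) = 1/sqrt(2)
leg = math.sin(math.radians(45))
print(leg, math.sqrt(0.5), 1 / math.sqrt(2))
```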
1.4.1 Radians
Once upon a time, someone discovered that there is a relationship between
the radius of a circle and its circumference. It turned out that, no matter how
big or small the circle, the circumference was equal to the radius multiplied
by 2 and multiplied again by the number 3.141592645... That number was
given the name “pi” (written π) and people have been fascinated by it ever
since. In fact, the number does’t stop where I said it did – it keeps going
for at least a billion places without repeating itself... but 9 places after the
decimal is plenty for our purposes.
So, now we have a new little equation, where C is the circumference and r is the radius:
$$C = 2 \pi r$$
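The circumference rule (circumference = 2 × π × radius) can be written as a one-line function. The sketch below also converts between degrees and radians. The text here is cut off before radians are formally defined, so the sketch assumes the standard definition: a full turn of 360° is 2π radians. That matches the later use of "π/3 radians (60°)". `circumference` is a hypothetical helper name.

```python
import math

def circumference(radius: float) -> float:
    """C = 2 * pi * r."""
    return 2 * math.pi * radius

print(math.pi)              # 3.141592653589793
print(circumference(1.0))   # a circle of radius 1 has a circumference of 2*pi (about 6.2832)

# Assuming the standard definition: 360 degrees = 2*pi radians, so 1 degree = pi/180 radians
print(math.radians(60), math.pi / 3)   # two ways of writing 60 degrees in radians
print(math.degrees(math.pi))           # pi radians expressed in degrees (180, up to rounding)
```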
Figure 1.10: Adding two sinusoids with the same frequency and different phases. The sum of the top two waveforms is the bottom waveform.
So what? Well, most recording engineers talk about phase. They’ll say things like “a sine wave, 135° late”, which looks like the curve shown in Figure 1.11.
If we wanted to be a little geeky about this, we could use the equation
below to say the same thing:
[Figure: displacement vs. angle of rotation (deg.)]
What does this mean? Well, all it means is that we can now specify
values for a and b and, using this equation, wind up with a sinusoidal
waveform of any amplitude and phase that we want. Essentially, we just
have an alternate way of describing the waveform.
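The conversion in the worked example that follows can be sketched in Python. This assumes the relations a = A cos(φ) and b = A sin(φ) used in that example, with φ in radians. `to_components` and `from_components` are hypothetical names. The reverse direction uses `math.atan2`, a substitution we have chosen because it handles all four quadrants.

```python
import math

def to_components(amplitude: float, phase: float):
    """Return (a, b) = (A*cos(phi), A*sin(phi)) for peak amplitude A and phase phi in radians."""
    return amplitude * math.cos(phase), amplitude * math.sin(phase)

def from_components(a: float, b: float):
    """Go back from (a, b) to (amplitude, phase); atan2 picks the correct quadrant."""
    return math.hypot(a, b), math.atan2(b, a)

a, b = to_components(0.93, math.pi / 3)   # peak amplitude 0.93, pi/3 radians (60 degrees)
print(round(a, 4), round(b, 4))           # 0.465 0.8054
```

Running `from_components(a, b)` on the result gives back the amplitude 0.93 and the phase π/3, up to rounding.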
For example, where you used to say “A cosine wave with a peak amplitude of 0.93 and $\frac{\pi}{3}$ radians (60°) late” you can now say:
$A = 0.93$
$\varphi = \frac{\pi}{3}$
$a = 0.93 \times \cos(\frac{\pi}{3}) = 0.93 \times 0.5 = 0.4650$
$b = 0.93 \times \sin(\frac{\pi}{3}) = 0.93 \times 0.8660 = 0.8054$
Therefore
Numbers like this (π is another one...) whose digits never repeat after the decimal point are called irrational numbers.
Now, remember that $j \times j = -1$. This is useful for the square root of any negative number: you just calculate the square root of the number pretending that it was positive, and then stick a j in front of it. So, since the square root of 16, abbreviated $\sqrt{16}$, is 4, and $\sqrt{-1} = j$, then $\sqrt{-16} = j4$.
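Let’s do a couple:
$$\sqrt{-9} = j3 \tag{1.14}$$
$$\sqrt{-4} = j2 \tag{1.15}$$
Another way to think of this is $\sqrt{-a} = \sqrt{-1 \times a} = \sqrt{-1} \times \sqrt{a} = j\sqrt{a}$ (for positive a), so:
$$\sqrt{-9} = \sqrt{-1} \times \sqrt{9} = j \times \sqrt{9} = j3 \tag{1.16}$$
Of course, this also means that
$$j3 \times j3 = (3 \times 3) \times (j \times j) = -1 \times 9 = -9 \tag{1.17}$$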
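Python has complex numbers built in. Like this book, it uses j for the square root of −1, though it writes the j after the number (as in `4j`) rather than before it. Here is a minimal sketch using the standard `cmath` module:

```python
import cmath

print(cmath.sqrt(-16))   # 4j, i.e. j4 in this book's notation
print(cmath.sqrt(-9))    # 3j (Equation 1.14)
print(1j * 1j)           # (-1+0j): j times j is -1
print(3j * 3j)           # (-9+0j), matching Equation 1.17
```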
$$(5 + 2) + j(3 + 4) \tag{1.20}$$
$$7 + j7 \tag{1.21}$$
If you’d like the short-cut rule, it’s
$$-2 + j26 \tag{1.28}$$
The shortened rule is:
Commutative Laws
This law says that the order of the numbers doesn’t matter when you add
or multiply. For example, 3 + 5 is the same as 5 + 3, and 3 * 5 = 5 * 3. In
the case of complex math:
Associative Laws
This law says that, when you’re adding more than two numbers, it doesn’t
matter which two you do first. For example (2 + 3) + 5 = 2 + (3 + 5). The
same holds true for multiplication.
and
Distributive Laws
This law says that, when you’re multiplying a number by the sum of two
other numbers, it’s the same as adding the results of multiplying the numbers
one at a time. For example, 2 * (3 + 4) = (2 * 3) + (2 * 4). In the case of
complex math:
Identity Laws
These are laws that are pretty obvious, but sometimes they help out. The
corresponding laws in normal math are x + 0 = x and x * 1 = x.
and
Multiplicative Inverse
Similarly to additive inverses, every number has a matching number, which,
when the two are multiplied, equals 1. The only exception to this is the
number 0, so, if x does not equal 0, then the multiplicative inverse of x is $\frac{1}{x}$ because $x \times \frac{1}{x} = \frac{1}{x} \times x = 1$. Some books write this as $x \times x^{-1} = 1$ because $x^{-1} = \frac{1}{x}$. In the case of complex math, things are unfortunately a little different because 1 divided by a complex number is... well... complex.
If (a + jb) is not equal to 0 then:
$$\frac{1}{a + jb} = \frac{a - jb}{a^2 + b^2} \tag{1.38}$$
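We won’t worry too much about how that’s calculated, but we can establish that it’s true by doing the following:
$$(a + jb) \times \frac{a - jb}{a^2 + b^2} \tag{1.39}$$
$$= \frac{(a + jb)(a - jb)}{a^2 + b^2} \tag{1.40}$$
and, since $(a + jb)(a - jb) = a^2 - jab + jab - j^2 b^2 = a^2 + b^2$,
$$= \frac{a^2 + b^2}{a^2 + b^2} \tag{1.41}$$
$$= 1 \tag{1.42}$$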
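Equation 1.38 translates directly into code. In this Python sketch, `inverse` is a hypothetical helper that takes the real part a and the imaginary part b separately. The built-in `1 / z` is shown only for comparison.

```python
def inverse(a: float, b: float) -> complex:
    """Multiplicative inverse of (a + jb) using Equation 1.38: (a - jb) / (a^2 + b^2)."""
    denom = a * a + b * b
    if denom == 0:
        raise ZeroDivisionError("0 + j0 has no multiplicative inverse")
    return complex(a / denom, -b / denom)

z = complex(3, 4)
z_inv = inverse(3, 4)
print(z_inv)       # (0.12-0.16j)
print(z * z_inv)   # approximately (1+0j); floating-point rounding may leave a tiny error
print(1 / z)       # Python's built-in complex division, for comparison
```

As a further check, `inverse(0, 1)` returns `-1j`. This agrees with the 1/j = −j result worked out in the text.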
There’s one interesting thing that results from this rule. What if we looked for the multiplicative inverse of j? In other words, what is $\frac{1}{j}$? Well, let’s use the rule above and plug in (0 + j1).
$$\frac{1}{a + jb} = \frac{a - jb}{a^2 + b^2} \tag{1.43}$$
$$\frac{1}{0 + j1} = \frac{0 - j1}{0^2 + 1^2} \tag{1.44}$$
$$\frac{1}{j} = \frac{-j1}{1} \tag{1.45}$$
$$\frac{1}{j} = -j \tag{1.46}$$
Weird, but true.
$$\frac{a + jb}{c + jd} \tag{1.47}$$
$$(a + jb) \times \frac{1}{c + jd} \tag{1.48}$$
$$\frac{(a + jb)(c - jd)}{c^2 + d^2} \tag{1.49}$$
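The division steps above can be sketched the same way. This Python sketch assumes the approach of Equations 1.47 to 1.49: multiply the top and bottom by (c − jd), so that the bottom becomes the real number c² + d². `divide` is a hypothetical name.

```python
def divide(a: float, b: float, c: float, d: float) -> complex:
    """(a + jb) / (c + jd), computed as (a + jb)(c - jd) / (c^2 + d^2) (Equations 1.47 to 1.49)."""
    denom = c * c + d * d
    if denom == 0:
        raise ZeroDivisionError("cannot divide by 0 + j0")
    top = complex(a, b) * complex(c, -d)  # multiply top and bottom by (c - jd)
    return top / denom                    # the bottom is now a plain real number

q = divide(1, 2, 3, 4)   # (1 + j2) / (3 + j4) = (11 + j2) / 25
print(q)
print(complex(1, 2) / complex(3, 4))  # built-in complex division, for comparison
```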
Figure 1.12: The relationship between the real and imaginary components for the number (2 + 3j). Notice that the X and Y axes have been labeled the “real” and “imaginary” axes.
Notice that Figure 1.12 actually winds up showing three things. It shows
the real component along the x-axis, the imaginary component along the
y-axis, and the absolute value or modulus of the complex number as the
hypotenuse of the triangle.
This should make the calculation for determining the modulus of the
complex number almost obvious. Since it’s the length of the hypotenuse of
the right triangle formed by the real and imaginary components, and since
we already know the Pythagorean theorem then the modulus of the complex
number (a + jb) is
$$\text{modulus} = \sqrt{a^2 + b^2} \tag{1.51}$$
Given the values of the real and imaginary components, we can also
calculate the angle of the hypotenuse from horizontal using the equation
$$\varphi = \arctan\left(\frac{\text{imaginary}}{\text{real}}\right) \qquad (1.52)$$
$$\varphi = \arctan\left(\frac{b}{a}\right) \qquad (1.53)$$
This will come in handy later.
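For the number in Figure 1.12, Equations 1.51 and 1.53 can be evaluated directly, as in the Python sketch below. It also uses `math.atan2`, which the text does not mention. Plain arctan(b/a) only returns angles between −90° and +90°, so it picks the wrong quadrant when the real part a is negative. `atan2(b, a)` avoids this because it looks at the signs of both components. `cmath.polar` returns the modulus and angle in one call.

```python
import cmath
import math

a, b = 2.0, 3.0                              # the number 2 + 3j from Figure 1.12

modulus = math.sqrt(a ** 2 + b ** 2)         # Equation 1.51
phi = math.atan(b / a)                       # Equation 1.53 (fine here because a > 0)
phi_any_quadrant = math.atan2(b, a)          # same result here, but also safe when a < 0

print(f"modulus = {modulus:.4f}")            # sqrt(13)
print(f"phi     = {math.degrees(phi):.2f} degrees")
print(cmath.polar(complex(a, b)))            # (modulus, angle in radians) in one call

# For -2 - 3j, atan(b/a) gives the same angle as for 2 + 3j; atan2 does not.
print(math.degrees(math.atan(-3 / -2)), math.degrees(math.atan2(-3, -2)))
```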
At any given moment in time, if we froze the wheel, we’d have some contribution of these two components: a cosine component and a sine component for a given angle of rotation. Since these two components are effectively identical functions that are 90◦ apart (for example, a sine wave is the same as a cosine that’s been delayed by 90◦), and since we’re thinking of the real and imaginary components in a complex number as being 90◦ apart, we can use complex math to describe the contributions of the sine and cosine components to a signal.

[Figure 1.13: Displacement versus angle of rotation (deg.) over one full rotation.]
Huh? Let’s look at an example. If the signal we wanted to look at consisted only of a cosine wave, as is shown in Figure 1.13, then we’d know that the signal had 100% cosine and 0% sine. So, if we express the cosine component as the real component and the sine as the imaginary, then what we have is:
$$1 + 0j \qquad (1.54)$$
If the signal was an upside-down cosine, then the complex notation for
it would be (−1 + 0j) because it would essentially be a cosine * -1 and no
sine component. Similarly, if the signal was a sine wave, it would be notated
as (0 − 1j).
This last statement should raise at least one eyebrow... Why is the
complex notation for a positive sine wave (0 − 1j)? In other words, why is
there a negative sign there to represent a positive sine component? Well...
Actually there is no good explanation for this at this point in the book,
but it should become clear when we discuss a concept known as the Fourier
Transform in Section 8.2.
This is fine, but what if the signal looks like a sinusoidal wave that’s
been delayed a little like the one in Figure 1.14?
This signal was created by a specific combination of a sine and cosine wave. In fact, it’s 70.7% sine and 70.7% cosine. (If you don’t know how I arrived at those numbers, check out Equation 1.11.)

Figure 1.14: A signal consisting of a combination of attenuated cosine and sine waves with the same frequency (displacement versus angle of rotation, in degrees).

How would you express this using complex notation? Well, you just look at the relative contributions of the two components as before:
that a cosine wave with a peak amplitude of 0.93 and a delay of π/3 radians was equivalent to the combination of a cosine wave with a peak amplitude of 0.4650 and an upside-down sine wave with a peak amplitude of 0.8054. Since the cosine is the real component and the sine is the imaginary component, this can be expressed using the complex number as follows:
$$0.93 \cos\left(n + \frac{\pi}{3}\right) = 0.4650 \cos(n) - 0.8054 \sin(n) \qquad (1.57)$$
which is represented as
$$0.4650 + j\,0.8054$$
which is a much simpler way of doing things. (Notice that I flipped
the “-” sign to a “+.”) For more information on this, check out The
Scientist and Engineer’s Guide to Digital Signal Processing available at
www.dspguide.com
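Equation 1.57, and the complex number that represents it, can both be checked numerically. The Python sketch below uses the same sign convention as the text, so a −sin component becomes +j. It uses `cmath.rect`, which builds a complex number from a magnitude and an angle. The values 0.93 and π/3 come from the example above, and the test angles n are arbitrary.

```python
import cmath
import math

amplitude, phase = 0.93, math.pi / 3

# Both sides of Equation 1.57, compared at a few arbitrary angles n (radians).
for n in (0.0, 0.5, 1.0, 2.0, 4.0):
    left = amplitude * math.cos(n + phase)
    right = 0.4650 * math.cos(n) - 0.8054 * math.sin(n)
    print(f"n = {n:3.1f}: {left:+.4f} vs {right:+.4f}")

# Complex form: real part = cosine weight, imaginary part = minus the sine weight.
z = cmath.rect(amplitude, phase)                 # magnitude 0.93 at an angle of pi/3
print(z)                                         # approximately 0.465 + 0.8054j
print(abs(z), math.degrees(cmath.phase(z)))      # back to 0.93 and 60 degrees
```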
Figure 1.15: A quick guide to what things mean when you see a cosine wave expressed as this type of equation, $x \cos(\varphi + \theta)$. The figure labels the present phase (constantly changing over time).
$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \ldots \qquad (1.58)$$
(If you’re not familiar with the mathematical expression “!” you don’t
have to panic! It’s short for factorial and it means that you multiply all the whole numbers from 1 up to and including the number. For example, 5! = 1 ∗ 2 ∗ 3 ∗ 4 ∗ 5 = 120.)
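To see how quickly this series settles down, the Python sketch below adds up its first few terms and compares the total with the library constant `math.e`. The sum starts at n = 0, where 0! = 1 supplies the leading term 1. The function name `e_partial_sum` is hypothetical.

```python
import math


def e_partial_sum(terms: int) -> float:
    """Add the first `terms` terms of 1/0! + 1/1! + 1/2! + ... (0! = 1)."""
    return sum(1 / math.factorial(n) for n in range(terms))


print(math.factorial(5))               # 1 * 2 * 3 * 4 * 5 = 120
for terms in (2, 4, 6, 10, 15):
    print(terms, e_partial_sum(terms))
print("math.e =", math.e)
```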
How is this e useful to us? Well, there are a number of reasons, but one
in particular. It turns out that if we raise e to an exponent x, we get the
following.
$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \ldots \qquad (1.59)$$
Unfortunately, this isn’t really useful to us. However, if we raise e to an
exponent that is an imaginary number, something different happens.
$$e^{j\pi} = -1 + 0 \qquad (1.62)$$
therefore
$$e^{j\pi} + 1 = 0 \qquad (1.63)$$
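The series for $e^x$ still works when x is imaginary, which gives one way to see Equation 1.62 numerically. The Python sketch below sums the first 30 terms of the series for x = jπ and compares the result with `cmath.exp`. It also checks Euler’s formula, $e^{j\theta} = \cos\theta + j\sin\theta$, a standard result that this passage does not state explicitly. Any tiny leftover imaginary part in the printed results is floating-point rounding, not a flaw in the identity. The helper name `exp_series` is hypothetical.

```python
import cmath
import math


def exp_series(x: complex, terms: int = 30) -> complex:
    """Partial sum of the series for e^x: 1 + x/1! + x^2/2! + ..."""
    return sum(x ** n / math.factorial(n) for n in range(terms))


z = 1j * math.pi
print(exp_series(z))        # very close to -1 + 0j
print(cmath.exp(z))         # Python's own e^(j*pi), also very close to -1
print(cmath.exp(z) + 1)     # Equation 1.63: essentially 0

theta = 0.7                 # an arbitrary angle in radians
print(cmath.exp(1j * theta), complex(math.cos(theta), math.sin(theta)))
```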
Figure 1.16: A quick guide to what things mean when you see a cosine wave expressed in exponential notation, $x e^{j(\varphi + \theta)}$.
7 | 3 | 5 | 4
Thousand’s place | Hundred’s place | Ten’s place | One’s place
7 ∗ 1000 + 3 ∗ 100 + 5 ∗ 10 + 4 ∗ 1
7 ∗ 10^3 + 3 ∗ 10^2 + 5 ∗ 10^1 + 4 ∗ 10^0
= 7354
Table 1.2: An illustration of how the location of a digit within a number determines the power of ten by which it’s multiplied.
it’s reasonable to assume that we’d only have two digits – 0 and 1. This
means that we have to re-think the way we construct the recycling of digits
to make bigger numbers.
Let’s count using this new two-digit system...
000 (zero)
001 (one)
Now what? Well, it’s the same as in decimal: we increase our number to
two digits and keep going.
010 (two)
011 (three)
100 (four)
This time, because we only have two digits, we multiply the digits in
different locations by powers of two instead of powers of ten. So, for a big
number like 10011, we can follow the same procedure as we did for base 10.
1 | 0 | 0 | 1 | 1
Sixteen’s place | Eight’s place | Four’s place | Two’s place | One’s place
1 ∗ 2^4 + 0 ∗ 2^3 + 0 ∗ 2^2 + 1 ∗ 2^1 + 1 ∗ 2^0
1 ∗ 16 + 0 ∗ 8 + 0 ∗ 4 + 1 ∗ 2 + 1 ∗ 1
16 + 0 + 0 + 2 + 1
= 19
Table 1.3: A breakdown of an arbitrary binary number, converting it to a decimal number. Note that binary and decimal are used simultaneously in this table: to distinguish the two, all binary numbers are in italics.
So, the binary number 10011 represents the same quantity as the decimal
number 19. Remember, all we’ve done is to change the method by which
we’re writing down the same thing. “19” in base 10 and “10011” in base 2
both mean “nineteen.”
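The place-value procedure in Tables 1.2 and 1.3 works the same way in any base. Multiply each digit by the base raised to the power of its position, counting from 0 at the right, and add the results. The Python sketch below uses a hypothetical helper, `to_decimal`. Python’s built-in `int(text, base)` does the same job and appears here only as a cross-check.

```python
def to_decimal(digits: list[int], base: int) -> int:
    """Add up digit * base**position, where position 0 is the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += digit * base ** position
    return total


print(to_decimal([7, 3, 5, 4], 10))     # Table 1.2: 7354
print(to_decimal([1, 0, 0, 1, 1], 2))   # Table 1.3: 19
print(int("10011", 2), bin(19))         # built-in cross-check
```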
This would be a good time to point out that if we add an extra digit to
our binary number, we increase the number of quantities we can represent
by a factor of two. For example: if we have a three-digit binary number, we
can represent a total of eight different numbers (000 – 111 or zero to seven).
If we add an extra digit and make it a four-digit number we can represent
sixteen different quantities (0000 – 1111 or zero to fifteen).
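The doubling can be seen by listing every possible binary word of a given length. The Python sketch below is illustrative only: it uses `itertools.product` to generate the words in counting order, and the helper name `all_patterns` is hypothetical.

```python
from itertools import product


def all_patterns(n_bits: int) -> list[str]:
    """Every n-bit binary word, in counting order (000, 001, 010, ...)."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


three_bits = all_patterns(3)
print(three_bits)                 # 000 through 111
print(len(three_bits))            # 8 = 2**3 values: zero to seven
print(len(all_patterns(4)))       # 16 = 2**4 values: zero to fifteen
```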
There are a lot of reasons why this system is good. For example, let’s
say that you had to send a number to a friend using only a flashlight to
communicate. One smart way to do this would be to flash the light on and
off with a short “on” corresponding to a “0” and a long “on” corresponding
So, now we wind up with these strange numbers that include the letters A through F. So we’ll see something like 3D4A. What number is this exactly?

3 | D | 4 | A
4096’s place | 256’s place | 16’s place | 1’s place
3 ∗ 16^3 + D ∗ 16^2 + 4 ∗ 16^1 + A ∗ 16^0
3 ∗ 4096 + D ∗ 256 + 4 ∗ 16 + A ∗ 1
3 ∗ 4096 + 13 ∗ 256 + 4 ∗ 16 + 10 ∗ 1
12288 + 3328 + 64 + 10
= 15690
Table 1.5: A breakdown of an arbitrary hexadecimal number, converting it to a decimal number. Note that hexadecimal and decimal are used simultaneously in this table: to distinguish the two, all hexadecimal numbers are in italics.

If this seems a little confusing at this point, don’t panic. It does for everyone. I think that the confusion with hexadecimal arises from the fact that it’s so close to decimal: you can have the number 246 in decimal and the number 246 in hexadecimal, but these are not the same number, so you have to translate. (For example, the German word for “poison” is “Gift”, so if you’re reading in German, this is not a word that you should read as English. An English “gift” and a German “Gift” are different things... hopefully...)

Of course, this raises the question “Why would we use such a confusing system in the first place!?” The answer actually lies back in the binary system. All of our computers and DSP and digital audio everything use the binary system to fire numbers around. This is inescapable. The problem is that those binary words are just so long to write down that, if you had to write them in a book, you’d waste a lot of paper. You could translate the numbers into decimal, but there’s no simple digit-by-digit relationship between binary and decimal, so it’s difficult to translate. However, check back to Table 1.4. Notice
that going from the number fifteen to the number sixteen results in the
hexadecimal number going from a 1-digit number to a 2-digit number. Also
notice that, at the same time, the binary word goes from 4 bits to 5. This is
where the magic lies. A single hexadecimal digit (0 – F) corresponds directly
to a four-bit binary word (0000 – 1111). Not only this, but if you have a
longer binary word, you can slice it up into four-bit sections and represent
each section with its corresponding hexadecimal digit. For example, take
the number 38069:
1001010010110101
Slice this number into 4-bit sections (start slicing from the right)
1001 0100 1011 0101
Now, look up the corresponding hexadecimal equivalents for each 4-bit
section using Table 1.4:
94B5
So, as can be seen from the example above, there is a direct relationship
between each 4-bit “slice” of the binary word and a corresponding hexadecimal digit. If we were to try to convert the binary word into decimal, it
would be much more of a headache. Since this translation is so simple, and
because we use one quarter the number of digits, you’ll often see hexadeci-
mal used to denote numbers that are actually sent through the computer as
a binary word.
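The slicing procedure above is easy to mechanize. The Python sketch below pads the binary word on the left to a multiple of four bits, since slicing starts from the right. It then maps each 4-bit slice to one hexadecimal digit. It also repeats the place-value breakdown of Table 1.5. Both function names are hypothetical, and the built-in `int(text, base)` is used only to cross-check the results.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Convert a binary string to hex by slicing it into 4-bit groups."""
    width = (len(bits) + 3) // 4 * 4           # round length up to a multiple of 4
    bits = bits.zfill(width)                   # pad on the left: slicing starts at the right
    return "".join(HEX_DIGITS[int(bits[i:i + 4], 2)] for i in range(0, width, 4))


def hex_to_decimal(hex_str: str) -> int:
    """Place-value sum, as in Table 1.5: each digit times 16**position."""
    total = 0
    for position, ch in enumerate(reversed(hex_str.upper())):
        total += HEX_DIGITS.index(ch) * 16 ** position
    return total


print(binary_to_hex("1001010010110101"))       # 94B5
print(int("1001010010110101", 2))              # 38069
print(hex_to_decimal("3D4A"))                  # 15690
```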
1.8.1 Function
When you have an equation, you are saying that something is equal to
something else. For example, look at Equation 1.66.
y = 2x (1.66)
This says that the value of y is calculated by getting a value for x and
multiplying that by 2 (I really hope that this is not coming as a surprise...).
A function is basically the busy side of an equation. Frequently, you
will see books talking about a function f (x) which just means that you do
something to x to come up with something else. For example, in Equation
1.66, the function f (x) (when you read this out loud, you say “f of x”) is
2x. Of course, this can be much more complicated, but it can still just be
thought of as a function - do some math using x and you’ll get your answer.
1.8.2 Limit
Let’s pretend that we’re standing in a concert hall with a very long reverb
time. If I clap my hands and create a sound, it starts dying away as soon
as I’ve stopped making the noise. Each second that goes by, the sound in
the concert hall gets closer and closer to no sound.
One interesting thing about this model is that, if it were true, the sound
pressure level would be reduced by half each second and, since half of something
can never equal nothing, there will be sound in the room forever, always
getting quieter and quieter. The level of sound is always getting closer and
closer to 0, but it never actually gets there.
This is the idea behind a limit. In this particular example, the limit of
the sound pressure level in the decaying reverb is 0 – we never get to 0, but
we always get closer to it.
There are lots of things in nature that follow this idea of a limit. The
radioactive half-life of a material is one example. (The remaining principal
on my house loan feels like another example...)
We won’t do anything with limits directly, but we’ll use them below.
Just remember that a limit is a boundary that is never reached, but you can
always get closer to it.
Think about Equation 1.67.
$$y = \frac{1}{x} \qquad (1.67)$$
This is a pretty simple equation that says that y is inversely proportional
to x. Therefore, if x gets bigger, y gets smaller. For example, if we calculate
the value of y in this equation for a number of values of x, we’ll get a graph
that looks like Figure 1.17.
[Figure 1.17: y versus x for Equation 1.67.]
As x gets bigger and bigger, y will get closer and closer to 0, but it will
never reach it. If x = ∞ then y = 0, but you don’t have an ∞ button on
your calculator. If x is less than ∞ but greater than 0, then y has to be a
number that is greater than 0 because 1 divided by something can never be
nothing.
This is exactly the idea of a limit, the first concept to learn when delving into the world of calculus. It’s the number that a function (like $\frac{1}{x}$, for example) gets closer and closer to, but never reaches. In the case of the function $\frac{1}{x}$, its limit is 0 as x approaches ∞. For example, take a look at Equation 1.68.
$$z = \lim_{x \to \infty} \frac{1}{x} \qquad (1.68)$$
Equation 1.68 says “z is equal to the number that the function $\frac{1}{x}$ approaches as x gets closer to ∞.” This does not mean that x will ever get to ∞, but that it will forever get closer to it.
In the example above, x is getting closer and closer to ∞ but this isn’t
always the case in a limit. For example, Equation 1.69 shows that you can
have a limit where a number is getting closer and closer to something other
than ∞.
$$\lim_{x \to 0} \frac{\sin(x)}{x} = 1 \qquad (1.69)$$
If x = 0, then $f(x) = \frac{\sin(x)}{x}$ becomes $\frac{0}{0}$, which is undefined. But as x approaches 0, f(x) approaches 1, because sin(x) gets closer and closer to x as x gets closer and closer to 0 (with x measured in radians).
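Both limits can be watched numerically by plugging in values of x that get closer and closer to the target. The Python sketch below is illustrative only, and the specific x values are arbitrary. At exactly x = 0, the expression sin(x)/x is 0/0, and Python raises `ZeroDivisionError` rather than returning a number.

```python
import math

# Equation 1.68: 1/x shrinks toward 0 as x grows, but never reaches it.
for x in (10, 100, 1_000, 1_000_000):
    print(f"x = {x:>9}: 1/x = {1 / x}")

# Equation 1.69: sin(x)/x approaches 1 as x approaches 0 (x in radians).
for x in (1.0, 0.1, 0.01, 0.001):
    print(f"x = {x:>6}: sin(x)/x = {math.sin(x) / x:.10f}")

# At exactly x = 0 the expression is 0/0, which Python refuses to evaluate.
try:
    math.sin(0.0) / 0.0
except ZeroDivisionError as err:
    print("x = 0:", err)
```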
As I said in the introduction, calculus is just math that can cope with
infinity. In the case of limits, we’re talking about numbers that we get
infinitely close to, but never reach.
Figure 1.18: A graph of y = sin(x), with x in radians, showing the point x = 2 where we want to find the slope of the graph.

Figure 1.19: A graph of y = sin(x) showing the tangent to the curve at the point x = 2. The slope of the tangent is the slope of the curve at that point.

Figure 1.20: A graph of y = sin(x) showing an estimate of the tangent to the curve at the point x = 2, made by drawing a line through the curve at x1 = 2 and x2 = 3. The rise and run of that line are marked.
As we can see in this example, the run for the line we’ve created is
1 (because it’s 3 − 2) and the rise is -0.768 (because it’s sin(3) − sin(2)).
Therefore the slope is -0.768.
This method of approximating will give us a slope that is pretty close
to the slope at the point we’re interested in, but how do we make a better
approximation? One way to do it is to reduce the distance between the
point we’re looking for and the points where we’re drawing the line. For
example, looking at Figure 1.21, we’ve changed the nearby points to x1 = 2
and x2 = 2.5, which gives us a line with a slope of -0.622. This is a little
closer to the real answer, but still not perfect.
As we get the points closer and closer together, the slope of the line gets closer and closer to the right answer. When the two points are infinitely close to x = 2 (in other words, when the distance between them approaches 0), the slope of the line becomes the slope of the curve at that point.
What we’re doing here is using the idea of a limit – as the run of the
line (the horizontal distance between our two points) approaches the limit
of 0, then the slope of the line approaches the slope of the curve at the point
we’re looking at.
Figure 1.21: A graph of y = sin(x) showing a better estimate of the tangent to the curve at the point x = 2, made by drawing a line through the curve at x1 = 2 and x2 = 2.5.

This is the idea behind something called the derivative of a function. The curve that we’re looking at can be described by an equation where the value of y is determined by some math that we do using the value of x. For example, if the equation is
$$y = \sin(x) \qquad (1.70)$$
then we get the curve seen above in Figure 1.18. In this particular case,
given a value of x, we can figure out the value of y. As a result we say that
y is a function of x, or
$$y = f(x) \qquad (1.71)$$
So, the derivative of f(x) is just another equation that gives you the slope of the curve at any value of x. In mathematical language, the derivative of f(x) is written in one of two ways. The simplest is to write f′(x), which means “the derivative (or the slope) of the function f(x)” (remember: derivative is just a fancy word for slope).
If you’re dealing with an equation where y = f (x) as we’ve seen above in
this chapter, then you’re looking for the “derivative of y with respect to x.”
This is just a fancier way of saying “what’s the slope of f (x)?” We don’t
need to say “f of x” because it’s easier to say “y” but we need to say “with
respect to x” because the slope changes as x changes. Therefore there is a
relationship between the derivative of y and the value of x. If you want to
use mathematical language to write “derivative of y with respect to x,” you
write
$$\frac{dy}{dx} \qquad (1.72)$$
so that
$$\frac{dy}{dx} = f'(x) \qquad (1.73)$$
There is one important thing to remember when you see this symbol. $\frac{dy}{dx}$ is one thing that means “the derivative of y with respect to x.” It is not something divided by something else. This is a stand-alone symbol that doesn’t have anything to do with division.
Let’s look at a practical example. Any introductory calculus book will
tell you that the derivative of a sine function is a cosine. (We don’t need
to ask why at this point, but if you think back to the spinning wheel and
the horizontal and vertical components of its movement, then it might make
sense intuitively.) What does this mean? Well, that means that the slope
of a sine wave at some value of x is equal to the cosine of the same value of
x. This is written as is shown in Equation 1.74.
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \qquad (1.75)$$
Huh? What Equation 1.75 says is that we’re looking for the value that
the slope of the line drawn between two points separated in the x-axis by
the value h approaches as h approaches 0. For example, in Figure 1.20,
h = 1. In Figure 1.21, h = 0.5. Remember that h is just the “run” and
f (x + h) − f (x) is just the rise of the triangle shown in those plots. As
h approaches 0, the slope of the hypotenuse gets closer and closer to the
answer that we’re looking for.
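Equation 1.75 can be tried out directly by computing the rise over the run for smaller and smaller values of h. The Python sketch below does this for sin(x) at x = 2. With h = 1 and h = 0.5 it reproduces the slopes of about −0.768 and −0.622 from Figures 1.20 and 1.21. As h shrinks, the slope approaches cos(2) ≈ −0.416, consistent with the derivative of sine being cosine. The helper name `slope_estimate` is hypothetical.

```python
import math


def slope_estimate(f, x: float, h: float) -> float:
    """Rise over run of the line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


x = 2.0
for h in (1.0, 0.5, 0.1, 0.01, 0.001, 0.0001):
    print(f"h = {h:<7}: slope = {slope_estimate(math.sin, x, h):.5f}")

# The derivative of sin is cos, so the slopes should approach cos(2).
print(f"cos(2)     = {math.cos(x):.5f}")
```

In floating-point arithmetic, h cannot shrink forever. With extremely small values such as 1e-15, rounding error in f(x + h) − f(x) starts to swamp the answer, so this numerical version only imitates the limit.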
Just to make things really miserable, I’ll let you know that you can have
beasts like the vicious double derivative, written f″(x). This just means that
you’re looking for the slope of the slope of a function. So, we’ve already seen
1.8.4 Sigma - Σ
A Σ (the Greek capital letter Sigma) in an equation is just a lazy way of writing “the sum of” whatever follows it. The stuff under and over it gives you an indication of when to start and when to stop adding. Let’s look at a simple example shown in Equation 1.76...
$$y = \sum_{x=1}^{10} x \qquad (1.76)$$
Equation 1.76 says “y equals the sum of all of the values of x from x = 1 to x = 10.” The Σ sign says “add up everything from...”; the “x = 1” at the bottom says where to start adding, the “10” on top says when to stop, and the “x” after the Σ tells you what you’re adding. So:
$$\sum_{x=1}^{10} x = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55 \qquad (1.77)$$
$$\sum_{x=1}^{5} \sin(x) = \sin(1) + \sin(2) + \sin(3) + \sin(4) + \sin(5) \qquad (1.78)$$
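In code, a Σ usually becomes a loop or a `sum` over a range. The Python sketch below evaluates Equations 1.77 and 1.78. One detail to watch is that Python’s `range` stops before its second argument, so the Σ’s inclusive upper limit of 10 becomes `range(1, 11)`.

```python
import math

# Equation 1.77: add x for x = 1 to 10. range() stops *before* its second
# argument, so the Sigma's upper limit of 10 becomes range(1, 11).
y = sum(x for x in range(1, 11))
print(y)                                  # 55

# Equation 1.78: add sin(x) for x = 1 to 5 (x in radians).
total = sum(math.sin(x) for x in range(1, 6))
print(round(total, 4))
```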
1.8.5 Delta - ∆
A ∆ (a Greek capital Delta) means “the change in...” or “the difference in.”
So, ∆x means “the change in x.” This is useful if you’re looking at a value
that’s changing like the speed of a car when you’re accelerating.
[Figure 1.22: sin(x) versus x (radians), from 0 to π.]
We’ve seen in the previous section how to express the equation for finding
the slope of this function, but what if we wanted to find the area under the
graph? How do we find this? Well, the way we found the slope was initially
to break the curve into shorter and shorter straight line components. We’ll
do basically the same thing here, but we’ll make rectangles by creating
simplified slices of the area under the curve.
For example, compare Figure 1.22 to Figure 1.23. What we’ve done in the second one is to consider the sine wave as 10 discrete values, and draw a rectangle for each. The width of each rectangle is equal to $\frac{1}{n}$ of the total width of the region along the x-axis, where n is the number of rectangles. For example, in Figure 1.23, there are 10 rectangles (so n = 10) in a total width of π (the span of the curve on the x-axis), so each rectangle is $\frac{\pi}{10}$ wide. The height of each rectangle is the value of the function (in our case, sin(x)) for the series of values of x: $\frac{0\pi}{n}$, $\frac{1\pi}{n}$, $\frac{2\pi}{n}$, and so on up to $\frac{(n-1)\pi}{n}$.
[Figure 1.23: sin(x) versus x (radians), from 0 to π, divided into 10 rectangles.]

If we add the areas of these 10 rectangles together, we’ll get an approximation of the area under the curve. How do we express this mathematically? This is shown in Equation 1.79.
$$A = \sum_{i=0}^{9} \sin\left(\frac{i\pi}{10}\right) \frac{\pi}{10} \qquad (1.79)$$
What does this mess mean? Well, we already know that n is the number of rectangles (here, n = 10). The A on the left side of the equation means “the area contained in the 10 rectangles.” The right side of the equation says that we’re going to add up the product of $\sin\left(\frac{i\pi}{10}\right)$ and $\frac{\pi}{10}$ ten times, once for each value of i (because the number under the Σ is 0 and the number above is 9). Therefore, Equation 1.79 written out the long way is shown in Equation 1.80.
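$$A = \sin\left(\frac{0\pi}{10}\right)\frac{\pi}{10} + \sin\left(\frac{1\pi}{10}\right)\frac{\pi}{10} + \sin\left(\frac{2\pi}{10}\right)\frac{\pi}{10} + \ldots + \sin\left(\frac{9\pi}{10}\right)\frac{\pi}{10} \qquad (1.80)$$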
Each component in this sum of ten components is the product of the height, $\sin\left(\frac{i\pi}{10}\right)$, and the width, $\frac{\pi}{10}$, of one rectangle. (All of those 10’s in there are there because we have divided the shape into 10 rectangles.) Therefore, each component is the area of a rectangle, and when they are all added together, they give an approximation of the area under the curve.
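The rectangle approximation of Equation 1.79 takes only a few lines of Python, and it makes it easy to see what happens as the rectangles get thinner. The sketch below is illustrative. The helper name `riemann_sum` is hypothetical, and it uses each rectangle’s left edge for the height, matching Equation 1.79, where i runs from 0 to n − 1. For comparison, the exact area under sin(x) from 0 to π is 2.

```python
import math


def riemann_sum(f, a: float, b: float, n: int) -> float:
    """Approximate the area under f from a to b with n rectangles.

    Each rectangle is (b - a) / n wide and as tall as f at its left edge,
    matching Equation 1.79 (i runs from 0 to n - 1).
    """
    width = (b - a) / n
    return sum(f(a + i * width) * width for i in range(n))


print(riemann_sum(math.sin, 0.0, math.pi, 10))    # Equation 1.79: about 1.9835
for n in (100, 1_000, 10_000):
    print(n, riemann_sum(math.sin, 0.0, math.pi, n))   # creeping toward 2
```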
There is a general equation that describes the way we divided up the area
called the Riemann Sum. This is the general method of cutting a shape into
A is equal to the sum of all of the vertical “slices” of the space under the curve f(x) from a to b. Each of these vertical slices has a width of dx. Essentially, dx has an infinitely small width (in other words, its width approaches 0), but there are an infinite number of the slices, so the whole thing adds up to a number.
Figure 1.27: A graphic representation of what is meant by each of the components in Equation 1.82: the curve y = f(x), the limits a and b on the x-axis, and a slice of width dx.
Analog Electronics
Figure 2.1: The structure of a copper atom showing the arrangement of the electrons in valence shells orbiting the nucleus.
The drains of the two sinks feed into a single drain which has built up some rust inside over the years (I live in an old building).
When I fill up one sink with water and pull the plug, the water goes
down the drain, but can’t get down through the bottom drain as quickly as
it should, so the result is that my second sink fills up with water coming up
through its drain from the other sink.
Why does this happen? Well – the first answer is “gravity” – but there’s
more to it than that. Think of the two sinks as two tanks joined by a pipe
at their bottoms. We’ll put different amounts of water in each sink.
The water in the sink on the left weighs a lot – you can prove this by
trying to lift the tank. So, the water is pushing down on the tank – but
we also have to consider that the water at the top of the tank is pushing
down on the water at the bottom. Thus there is more water pressure at the
bottom than the top. Think of it as the molecules being squeezed together
at the bottom of the tank – a result of the weight of all the water above it.
Since there’s more water in the tank on the left, there is more pressure at
the bottom of the left tank than there is in the right tank.
Now consider the pipe. On the left end, we have the water pressure
trying to push the water through the pipe, on the right end, we also have
pressure pushing against the water, but less so than on the left. The result
is that the water flows through the pipe from left to right. This continues
until the pressure at both ends of the pipe is the same – or, we have the
same water level in each tank.
We also have to think about how much water flows through the pipe in
a given amount of time. If the difference in water pressure between the two
ends is quite high, then the water will flow quite quickly through the pipe. If the difference in pressure is small, then only a small amount of water will flow. Thus the flow of the water (the volume which passes a point in the pipe per amount of time) is proportional to the pressure difference. If the
pressure difference goes up, then the flow goes up.
The same can be said of electricity, or the flow of electrons through
a wire. If we connect two “tanks” of electrons, one at either end of the
wire, and one “tank” has more electrons (or more pressure) than the other,
then the electrons will flow through the wire, bumping from atom to atom,
until the two tanks reach the same level. Normally we call the two tanks a
battery. Batteries have two terminals – one is the opening to a tank full of
too many electrons (the negative terminal – because electrons are negative)
and the other the opening to a tank with too few electrons (the positive
terminal). If we connect a wire between the two terminals (don’t try this at
home!) then the surplus electrons at the negative terminal will flow through
to the positive terminal until the two terminals have the same number of
electrons in them. The number of surplus electrons in the tank determines
the “pressure” or voltage (abbreviated V and measured in volts) being put on
the terminal. (Note: once upon a time, people used to call this electromotive
force or EMF but as knowledge increases from generation to generation, so
does laziness, apparently... So, most people today call it voltage instead of
EMF.) The more electrons, the more voltage, or electrical pressure. The
flow of electrons in the wire is called current (abbreviated I and measured
in amperes or amps) and is actually a specific number of electrons passing
a point in the wire every second (approximately 6.24 × 10^18, or about 6,240,000,000,000,000,000).
(Note: some people call this “amperage” – but it’s not common enough to
be the standard... yet...) If we increase the voltage (pressure) difference
between the two ends of the wire, then the current (flow) will increase, just
as the water in our pipe between the two tanks.
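The electron count per ampere follows from the charge carried by a single electron. One ampere is one coulomb of charge per second. In the current SI (since 2019), the elementary charge is defined as exactly 1.602176634 × 10⁻¹⁹ coulombs. The back-of-the-envelope Python sketch below just divides one by the other.

```python
# 1 ampere means 1 coulomb of charge passing a point each second.
ELECTRON_CHARGE = 1.602176634e-19     # coulombs per electron (exact in the SI since 2019)

current_amps = 1.0
electrons_per_second = current_amps / ELECTRON_CHARGE
print(f"{electrons_per_second:.4e} electrons per second")   # about 6.24e+18
```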
There’s one important point to remember when you’re talking about
current. Due to a bad guess on the part of Benjamin Franklin, current flows
in the opposite direction to the electrons in the wire, so while the electrons
are flowing from the negative to the positive terminal, the current is flowing
from positive to negative. This system is called conventional current theory.
There are some books out there that follow the flow of electrons – and
therefore say that current flows from negative to positive. It really doesn’t
matter which system you’re using, so long as you know which is which.
Let’s now replace the two tanks by a pump with pipe connecting its
output to its input – that way, we won’t run out of water. When the pump
is turned on, it acts just like the fan in chapter 1 – it decreases the water
pressure at its input in order to increase the pressure at its output. The
water in the pipe doesn’t enjoy having different pressures at different points
in the same pipe so it tries to equalize by moving some water molecules out
of the high pressure area and into the low pressure area. This creates water
flow through the pipe, and the process continues until the pump is switched
off.
higher the flow; the greater the restriction, the smaller the flow.
We’ll also have a situation where the pressure at the input to the restric-
tion is different than that at the output. This is because the water molecules
are bunching up at the point where they are trying to get through the smaller
pipe. In fact the pressure at the output of the pump will be the same as
the input of the restriction while the pressure at the input of the pump will
match the output of the restriction. We could also say that there is a drop
in pressure across the smaller diameter pipe.
We can have almost exactly the same scenario with electricity instead of
water. The electrical equivalent to the restriction is called a resistor. It’s a
small component which resists the current, or flow of electrons. If we place a
resistor in the wire, like the restriction in the pipe, we’ll reduce the current
as is shown in Figure 2.2.
Figure 2.2: Equivalent situations showing the analogy between an electrical circuit with a battery and resistor and a plumbing network consisting of a pump and a constriction in the pipe. In both cases the flow (of electrical current or water, depending) runs clockwise around the loop.
In order to get the same current as with a single wire, we’ll have to
increase the voltage difference. Therefore, the higher the voltage difference,
the higher the current; the bigger the resistor, the smaller the current. Just as
in the case of the water, there is a drop in voltage across the resistor. The
voltage at the output of the resistor is lower than that at its input. Normally
this is expressed as an equation called Ohm’s Law which goes like this:
Voltage = Current * Resistance
or
$$V = IR \qquad (2.1)$$
$$P = VI \qquad (2.2)$$
where P is in watts, V is in volts and I is in amps.
Just as Ohm’s law defines the ohm, Watt’s law defines the watt to be
the amount of power consumed by a device which, when supplied with 1
volt of difference across its terminals, will use 1 amp of current.
We can create a variation on Watt’s law by combining it with Ohm’s law
as follows:
$$P = VI \quad \text{and} \quad V = IR$$
therefore
$$P = (IR)I \qquad (2.3)$$
$$P = I^2 R \qquad (2.4)$$
and
$$P = V\frac{V}{R} \qquad (2.5)$$
$$P = \frac{V^2}{R} \qquad (2.6)$$
Note that, as Equation 2.6 shows, the power is proportional to the square of the voltage. This gem of wisdom will come in handy later.
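The three forms of Watt’s law should always agree, and the square-law behaviour is easy to see with numbers. The Python sketch below uses hypothetical values of 12 V across a 4 Ω resistor. It is only an illustration of Equations 2.1 to 2.6, not a circuit model.

```python
# Hypothetical example values: a 12 V difference across a 4-ohm resistor.
V = 12.0            # volts
R = 4.0             # ohms
I = V / R           # Ohm's law (Equation 2.1) solved for current: 3 A

p_vi = V * I        # Equation 2.2: P = VI
p_i2r = I ** 2 * R  # Equation 2.4: P = I^2 R
p_v2r = V ** 2 / R  # Equation 2.6: P = V^2 / R

print(I, p_vi, p_i2r, p_v2r)     # all three powers agree: 36 W

# Power goes with the square of the voltage: doubling V quadruples P.
print((2 * V) ** 2 / R / p_v2r)
```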
are more electrons in the tested point in the circuit than there are in the
reference point, therefore more negative charge. If you think of this in terms
of the two tanks of water: if we’re sitting at the bottom of the empty tank and we measure the relative pressure of the full one, its pressure will be more, and therefore positive relative to our reference. If we’re at the bottom of the full tank and we measure the pressure at the bottom of the empty one, we’ll find that it’s less than our reference and therefore negative. (Two other analogies to completely confuse you... it’s like describing someone by their height. It doesn’t matter how tall or short someone is: if you say they’re tall, it probably means that they’re taller than you.)
Secondly, the idea that the voltage is fluctuating. When you plug your
coffee maker into the wall, you’ll notice that the plug has two terminals. One
is a reference voltage which stays constant (normally called a “cold” wire
in this case...) and one is the “hot” wire which changes in voltage relative to the cold wire. The device in the coffee maker which is doing the work is connected with each of these two wires. When the voltage in the hot wire is positive in comparison to the cold wire, the current flows from hot through the coffee maker to cold. One one-hundred-and-twentieth of a second later (half of one cycle on a 60 Hz supply), the hot wire is negative compared to the cold, and the current flows from cold to hot. This is commonly known as alternating current or AC.
So remember, alternating current means that both the voltage and the
current are changing in time.
2.1.6 RMS
Look at a light bulb. Not directly – you’ll hurt your eyes – actually let’s
just think of a lightbulb. I turn on the switch on my wall and that closes a
connection which sends electricity to the bulb. That electricity flows through
the bulb which is slightly resistive. The result of the resistance in the bulb
is that it has to burn off power, which it does by heating up – so much that
it starts to glow. But remember, the electricity which I’m sending to the
bulb is not constant – it’s fluctuating up and down between -170 and 170
volts. Since it takes a little while for the bulb to heat up and cool down,
it’s always lagging behind the voltage change – in fact, it’s so slow that it
stays virtually constant in temperature and therefore brightness.
This tendency is true for any resistor in an AC circuit. The resistor’s
temperature does not follow the instantaneous voltage values – instead, the
resistor burns off an average amount of power over time. We can describe that
average with an equivalent DC voltage: the DC voltage that would result in the
same power dissipation. The question is, how do we calculate it?
Figure 2.3: A full cycle of a sinusoidal AC waveform with a level of 170 Vp (voltage in V vs. time). Note that the total average for this waveform would be 0 V because the negative voltage is identically opposite to the positive voltage.
Figure 2.4: The positive half of one cycle of a sinusoidal AC waveform with a level of 170 Vp (voltage in V vs. time). Note that the total average for this waveform would be 63.6% of the peak voltage, as is shown by the red horizontal line at 108.1 V (63.6% of 170 Vp).
$$\frac{(170\ \mathrm{V})^2}{1\ \Omega} = 28900\ \mathrm{W} \tag{2.7}$$
But this is the power consumption for one point in time, when the voltage
level is actually at 170 V. The rest of the time, the voltage is either
swinging on its way up to 170 V or on its way down from 170 V. The power
consumption curve is therefore no longer a sine wave, but a sin² wave. Think
of it as taking all of those 180 voltage measurements (one for each degree of
the half cycle) and squaring each one (with a 1 Ω resistor, V²/R is just V²).
From this list of 180 numbers (the instantaneous power consumption for
each of the 180 degrees) we can find the average power consumed for half of a
waveform. This number turns out to be 0.5 of the peak power, or, in the
above case, 0.5 × 28900 W = 14450 W, as is shown in Figure 2.5.
This gives us the average power consumption of the resistor, but what
is the equivalent DC voltage which would result in this consumption? We
find this by using Watt’s law in reverse as follows:
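$$P = \frac{V^2}{R} \tag{2.8}$$
$$14450\ \mathrm{W} = \frac{V^2}{1\ \Omega} \tag{2.9}$$
$$\sqrt{14450} = V \tag{2.10}$$
$$V \approx 120\ \mathrm{V} \tag{2.11}$$
Figure 2.5: The power consumed during the positive half of one cycle of a sinusoidal AC waveform with a level of 170 Vp and a resistance of 1 Ω (power in W vs. time). Note that the total average for this waveform would be 50% of the peak power, as is shown by the red horizontal line at 14450 W (50% of 28900 W).
This equivalent DC voltage is the RMS (root-mean-square) value of the waveform: the square root of the mean of the squared voltage. The ratio of a waveform’s peak voltage to its RMS voltage is called its crest factor:
$$\text{Crest factor} = \frac{V_{peak}}{V_{RMS}} \tag{2.12}$$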
So, the crest factor of a sine wave is 1.41. The crest factor of a square
wave is 1.
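Here’s a sketch of the whole RMS recipe (square, take the mean, take the square root) along with the crest factor of Equation 2.12, applied to one cycle of a 170 Vp sine wave and a 170 Vp square wave, each sampled once per degree. The function names `rms` and `crest_factor` are our own, not from any library.

```python
import math

def rms(samples):
    """Root-mean-square: square every sample, average them, take the square root."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def crest_factor(samples):
    """Equation 2.12: the peak (largest absolute) value divided by the RMS value."""
    peak = max(abs(x) for x in samples)
    return peak / rms(samples)

# One full cycle of each waveform, one sample per degree, 170 V peak.
sine = [170.0 * math.sin(math.radians(deg)) for deg in range(360)]
square = [170.0 if deg < 180 else -170.0 for deg in range(360)]

for name, wave in (("sine", sine), ("square", square)):
    print(f"{name:6s} RMS = {rms(wave):6.1f} V, crest factor = {crest_factor(wave):.3f}")
```

The sine wave comes out at about 120 V RMS with a crest factor of about 1.414 (that’s √2), while the square wave’s RMS value equals its peak, so its crest factor is 1.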
This causes a small problem when you’re using a digital volt meter.
The readings on these devices ostensibly show you the RMS value of the
AC waveform you’re measuring, but they don’t really measure the RMS
value. They measure the peak value of the wave and then multiply that
value by 0.707 (which is 1/√2, the RMS-to-peak ratio of a sine wave) – in other
words, they assume that you’re measuring a sine wave. If the waveform is
anything other than a sine, then the measurement will be incorrect (unless
you’ve thrown out a ton of money on a True RMS multimeter...).
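To see how far off a peak-times-0.707 reading can be, here’s a sketch that compares that scheme with a true RMS calculation for a sine, a square and a triangle wave, each with a peak of 1 V. The function `peak_meter_reading` is a hypothetical model of the meter behaviour described above, not a model of any particular product.

```python
import math

N = 3600  # samples per cycle

def true_rms(samples):
    """What a True RMS meter reports: sqrt(mean(v^2))."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def peak_meter_reading(samples):
    """Hypothetical meter from the text: peak value multiplied by 0.707."""
    return 0.707 * max(abs(v) for v in samples)

waves = {
    "sine": [math.sin(2 * math.pi * n / N) for n in range(N)],
    "square": [1.0 if n < N // 2 else -1.0 for n in range(N)],
    # Triangle wave: falls from +1 to -1 over half a cycle, then rises back.
    "triangle": [4 * abs(n / N - 0.5) - 1 for n in range(N)],
}

for name, wave in waves.items():
    actual = true_rms(wave)
    shown = peak_meter_reading(wave)
    error = 100 * (shown - actual) / actual
    print(f"{name:8s} true RMS {actual:.3f} V, meter shows {shown:.3f} V ({error:+.1f}%)")
```

The sine wave reads correctly, the square wave reads about 29% low and the triangle wave reads about 22% high.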
Figure 2.6: Voltage (V) vs. time.
Take a look at the signal in Figure 2.6. This signal usually has a pretty
low level, but there’s a spike in the middle of it. The signal is made up of
a string of 1000 values, numbered from 1 to 1000. If we assume that this is a
voltage level, then it can be converted to a power value by squaring it (we’ll
keep assuming that the resistance is 1 Ω). That power curve is shown in
Figure 2.7.
Figure 2.7: The power dissipation resulting from the signal in Figure 2.6 being sent through a 1 Ω resistor (power in W vs. time).
Now, let’s make a running average of the values in this power curve. One way
to do this would be to take all 1000 values that are plotted in Figure 2.7 and
find the average. Instead, let’s use an average of 100 values (the length of
this window in time is our time constant). So, the first average will be of
values 1 to 100, the second average will be of values 2 to 101, and so on until
we get to the average of values 901 to 1000. If these averages are plotted, they’ll
look like the graph in Figure 2.8.
There are a couple of things to note about this signal. Firstly, notice
how the signal gradually ramps in at the beginning. This is because, as the
time window that we’re using for the average gradually “slides” over the
transition from no signal to a low-level signal, the total average gradually
increases. Also notice that what was a very short, very high level spike in
the signal in the instantaneous power curve becomes a very wide (in fact,
the width of the time constant), much lower-level signal (notice the scale
of the y-axis). This is because the short spike is just getting thrown into
an average with a lot of low-level signals, so the RMS value is much lower.
Finally, the end ramps out just as the beginning ramped in for the same
reasons.
So, we can now see that the RMS value is potentially much smaller than
the peak value, but that this relationship is highly dependent on the time
constant of the RMS detection. The shorter the time constant, the closer the
RMS value is to the instantaneous peak value (in fact, if the time constant
were infinitely short, then the RMS would equal the peak...).
The moral of the story is that it’s not enough to just know that you’re
being given the RMS value; you’ll also need to know what the time constant
of that RMS value is.
Figure 2.8: The average power dissipation of 100 consecutive values from the curve in Figure 2.7 (vs. time). For example, value 1 in this plot is the average of values 1 to 100 in Figure 2.7, value 2 is the average of values 2 to 101 in Figure 2.7, and so on.
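The sliding window described above is easy to sketch in code. This version assumes a 1 Ω load, averages the instantaneous power over a window of `window` samples, and then takes the square root so the result is an RMS voltage rather than an average power. The test signal is hypothetical (the data behind Figures 2.6 to 2.8 isn’t given): silence, then a low-level sine wave with a single 0.9 V spike in the middle.

```python
import math

def sliding_rms(volts, window):
    """RMS voltage over a sliding window of `window` samples, assuming a 1-ohm load.

    Output k averages the power of samples k .. k + window - 1, so a
    1000-sample signal with a 100-sample window gives 901 outputs.
    """
    power = [v * v for v in volts]  # P = V^2 / R with R = 1 ohm
    running = sum(power[:window])
    out = [math.sqrt(running / window)]
    for k in range(1, len(power) - window + 1):
        # Slide the window one sample: add the newest value, drop the oldest.
        running += power[k + window - 1] - power[k - 1]
        out.append(math.sqrt(max(running, 0.0) / window))
    return out

# Hypothetical test signal: 1000 samples, silent at both ends,
# a 0.1 V sine wave in the middle, and one 0.9 V spike at sample 500.
signal = [0.0] * 1000
for n in range(200, 800):
    signal[n] = 0.1 * math.sin(2 * math.pi * n / 20)
signal[500] = 0.9

for window in (1, 10, 100):
    levels = sliding_rms(signal, window)
    print(f"window {window:3d}: {len(levels)} values, highest RMS = {max(levels):.3f} V")
```

With a one-sample window, the highest RMS value is the spike’s full 0.9 V; with a 100-sample window, the spike is diluted to only a little over 0.1 V. Same signal, very different RMS numbers, which is exactly why the time constant matters.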
2.2 The Decibel
2.2.1 Gain
Lesson 1 for almost all recording engineers comes from the classic movie
“Spinal Tap” [] where we all learned that the only reason for buying any
piece of audio gear is to make things louder (“It goes all the way up to 11...”).
The amount by which a device makes a signal louder or quieter is called the
gain of the device. If the output of the device is two times the amplitude of
the input, then we say that the device has a gain of 2. This can be easily
calculated using Equation 2.13.
$$\text{gain} = \frac{\text{amplitude}_{out}}{\text{amplitude}_{in}} \tag{2.13}$$
Note that you can use gain for evil as well as good: you can have a gain
of less than 1 (but more than 0), which means that the output is quieter
than the input.
If the gain equals 1, then the output is identical to the input.
If the gain is 0, then this means that the output of the device is 0,
regardless of the input.
Finally, if the device has a negative gain, then the output will have the
opposite polarity compared to the input. (As you go through this section,
you should always keep in mind that a negative gain is different from a gain
with a negative value in dB... but we’ll straighten this out as we go along.)
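Equation 2.13 is simple enough, but a tiny sketch makes the special cases concrete. Here the input is a single positive amplitude and each hypothetical device just multiplies it by its gain, so a negative gain shows up as a flipped sign (a polarity inversion). The `gain` and `describe` functions are illustrations only.

```python
def gain(amplitude_out, amplitude_in):
    """Equation 2.13: output amplitude divided by input amplitude."""
    return amplitude_out / amplitude_in

def describe(g):
    """Plain-language meaning of a gain value, following the text."""
    if g < 0:
        return "opposite polarity"
    if g == 0:
        return "no output at all"
    if g < 1:
        return "quieter"
    if g == 1:
        return "unchanged"
    return "louder"

amplitude_in = 0.5  # a positive input value, in volts

for g in (2.0, 0.5, 1.0, 0.0, -1.0):
    amplitude_out = g * amplitude_in  # what the hypothetical device produces
    measured = gain(amplitude_out, amplitude_in)
    print(f"in {amplitude_in:+.2f} V, out {amplitude_out:+.2f} V, "
          f"gain {measured:+.1f} ({describe(measured)})")
```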
(Incidentally, Lesson 2, entitled “How to wrap a microphone cable with
one hand while holding a chili dog and a styrofoam cup of coffee in the other
hand and not spill anything on your Twisted Sister World Tour T-shirt” will
be addressed in a later chapter.)
ratio (the loudest sound is 10,000,000 times louder than the softest). This
range is simply too big. So a group of people at Bell Labs decided to
represent the same scale with smaller numbers. They arrived at a unit of
measurement called the Bel (named after Alexander Graham Bell – hence
the capital B.) The Bel is a measurement of power difference. It’s really just
the logarithm of the ratio of two powers (Power1:Power2, or Power1/Power2). So, to
find the difference between two power measurements in Bels (B),
we use the following equation:
$$\Delta\text{Power (Bels)} = \log\left(\frac{\text{Power}_1}{\text{Power}_2}\right) \tag{2.14}$$
Let’s leave the subject for a minute and talk about measurements. Our
basic unit of length is the metre (m). If I were to talk about the distance
between the wall and me, I would measure that distance in metres. If I
were to talk about the distance between Vancouver and me, I would not
use metres, I would use kilometres. Why? Because if I were to measure
the distance between Vancouver and me in metres the number would be
something like 5,000,000 m. This number is too big, so I say I’ll measure it
in kilometres. I know that 1 km = 1000 m, therefore the distance between
Vancouver and me is 5,000,000 m / 1000 m/km = 5000 km. The same
would apply if I were measuring the length of a pencil. I would not use
metres because the number would be something like 0.15 m. It’s easier to
think in centimetres or millimetres for small distances – all we’re really doing
is making the number look nicer.
The same applies to Bels. It turns out that if we use the above equation,
we’ll start getting small numbers. Too small for comfort; so instead of using
Bels, we use decibels or dB. Now all we have to do is convert.
There are 10 dB in a Bel, so if we know the number of Bels, the number
of decibels is just 10 times that. So:
$$1\ \mathrm{dB} = \frac{1\ \mathrm{Bel}}{10} \tag{2.15}$$
$$\Delta\text{Power (dB)} = 10\log\left(\frac{\text{Power}_1}{\text{Power}_2}\right) \tag{2.16}$$
So that’s how you calculate dB when you have two different amounts of
power and you want to find the difference between them. The point that I’m
trying to overemphasize thus far is that we are dealing with power measurements.
We know that power is measured in watts (joules per second), so we
use the above equation only when the ratio is comparing two measurements
in watts.
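Here’s a minimal sketch of Equations 2.14 and 2.16 for a few pairs of power measurements in watts. The function names are our own.

```python
import math

def delta_bels(power1_w, power2_w):
    """Equation 2.14: power difference in Bels."""
    return math.log10(power1_w / power2_w)

def delta_db(power1_w, power2_w):
    """Equation 2.16: power difference in decibels (10 dB per Bel)."""
    return 10 * math.log10(power1_w / power2_w)

for p1, p2 in ((2.0, 1.0), (100.0, 1.0), (1.0, 1000.0)):
    print(f"{p1} W vs {p2} W: {delta_bels(p1, p2):+.3f} B = {delta_db(p1, p2):+.2f} dB")
```

Doubling the power is about 0.3 B, or 3 dB; a power ratio of 100 is 2 B, or 20 dB; and a power that is 1000 times smaller is -3 B, or -30 dB.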
What if we wanted to calculate the difference between two voltages (or
electrical pressures)? Well, Watt’s Law says that:
$$\text{Power} = \frac{\text{Voltage}^2}{\text{Resistance}} \tag{2.17}$$
or
$$P = \frac{V^2}{R} \tag{2.18}$$
Therefore, if we know our two voltages (V1 and V2) and we know the
resistance stays the same:
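$$\Delta\text{Power (dB)} = 10\log\left(\frac{\text{Power}_1}{\text{Power}_2}\right) \tag{2.19}$$
$$= 10\log\left(\frac{V_1^2/R}{V_2^2/R}\right) \tag{2.20}$$
$$= 10\log\left(\frac{V_1^2}{R}\cdot\frac{R}{V_2^2}\right) \tag{2.21}$$
$$= 10\log\left(\frac{V_1^2}{V_2^2}\right) \tag{2.22}$$
$$= 10\log\left(\frac{V_1}{V_2}\right)^2 \tag{2.23}$$
$$= 2 \cdot 10\log\left(\frac{V_1}{V_2}\right) \tag{2.24}$$
$$\left(\text{because } \log A^B = B \cdot \log A\right) \tag{2.25}$$
$$= 20\log\left(\frac{V_1}{V_2}\right) \tag{2.26}$$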
That’s it! (Finally!) So, the moral of the story is, if you want to compare
two voltages and express the difference in dB, you have to use that
last equation.
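A quick numerical check of the derivation: convert two voltages to powers with Watt’s law, take 10 log of the power ratio, and compare the result with 20 log of the voltage ratio. As the algebra says, the resistance cancels out, so the sketch tries a few arbitrary values for it.

```python
import math

def db_from_voltages(v1, v2):
    """Equation 2.26: 20 log10(V1 / V2)."""
    return 20 * math.log10(v1 / v2)

def db_via_power(v1, v2, resistance):
    """Equations 2.19 and 2.20: turn each voltage into power (V^2 / R), then 10 log10."""
    p1 = v1 ** 2 / resistance
    p2 = v2 ** 2 / resistance
    return 10 * math.log10(p1 / p2)

for r in (1.0, 600.0, 10_000.0):
    print(f"R = {r:7.0f} ohms: via power {db_via_power(2.0, 1.0, r):.4f} dB, "
          f"via voltage {db_from_voltages(2.0, 1.0):.4f} dB")
```

Doubling the voltage gives about 6 dB either way, whatever the resistance.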
Remember, voltage is analogous to pressure. So if you want to compare
two pressures (like 20 × 10⁻⁶ Pa and 200,000,000 × 10⁻⁶ Pa, which is 200 Pa), you
have to use the same equation; just replace V1 and V2 with P1 and P2 like this:
$$\Delta\text{Power (dB)} = 2 \cdot 10\log\left(\frac{\text{Pressure}_1}{\text{Pressure}_2}\right) \tag{2.27}$$
This is all well and good if you have two measurements (of power, voltage
or pressure) to compare with each other, but what about all those books
that say something like “a jet at takeoff is 140 dB loud.” What does that
mean? Well, what it really means is “the sound a jet makes when it’s taking
off is 140 dB louder than...” Doesn’t make a great deal of sense... Louder
than what? The first measurement was the sound pressure of the jet taking
off, but what was the second measurement with which it’s compared?
This is where we get into variations on the dB. There are a number of
different types of dB which have references (second measurements) already
supplied for you. We’ll do them one by one.
2.2.3 dBspl
The dBspl is a measurement of sound pressure (spl stands for Sound Pressure
Level). What you do is take a measurement of, say, the sound pressure of
a jet at takeoff (measured in Pa). This provides Pressure1. Our reference,
Pressure2, is given as the sound pressure of the softest sound you can hear,
which we have already said is 20 × 10⁻⁶ Pa.
Let’s say we go to the end of an airport runway with a sound pressure
meter and measure a jet as it flies overhead. Let’s also say that, hypothetically,
the sound pressure turns out to be 200 Pa, and that we want to
convert this into dBspl. So, the sound of a jet at takeoff is:
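$$\text{Pressure (dBspl)} = 20\log\left(\frac{\text{Pressure}_1}{\text{Reference}}\right) \tag{2.28}$$
$$= 20\log\left(\frac{\text{Pressure}_1}{20\times10^{-6}\ \mathrm{Pa}}\right) \tag{2.29}$$
$$= 20\log\left(\frac{200\ \mathrm{Pa}}{20\times10^{-6}\ \mathrm{Pa}}\right) \tag{2.30}$$
$$= 20\log(10000000) \tag{2.31}$$
$$= 20\log\left(10^{7}\right) \tag{2.32}$$
$$= 20 \times 7 \tag{2.33}$$
$$= 140\ \mathrm{dBspl} \tag{2.34}$$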
So what we’re saying is that a jet taking off is 140 dBspl which means
“the sound pressure of a jet taking off is 140 dB louder than the softest
sound I can hear.”
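Here’s the same dBspl calculation as a short sketch, plus the reverse direction (from dBspl back to pascals), which is just Equation 2.28 solved for the pressure. The function names are our own.

```python
import math

P_REF = 20e-6  # reference pressure in pascals (the softest sound you can hear)

def dbspl(pressure_pa):
    """Equation 2.28: sound pressure level in dBspl."""
    return 20 * math.log10(pressure_pa / P_REF)

def pressure_from_dbspl(level_db):
    """Equation 2.28 solved for pressure: P = P_REF * 10^(dB / 20)."""
    return P_REF * 10 ** (level_db / 20)

print(f"200 Pa   -> {dbspl(200.0):.1f} dBspl")
print(f"20e-6 Pa -> {dbspl(20e-6):.1f} dBspl")
print(f"94 dBspl -> {pressure_from_dbspl(94.0):.3f} Pa")
```

The jet comes out at 140 dBspl, the threshold of hearing at 0 dBspl, and 94 dBspl works out to roughly 1 Pa.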
2.2.4 dBm
When you’re measuring sound pressure levels, you use a reference based on
the threshold of hearing (20 × 10⁻⁶ Pa), which is fine, but what if you want to
measure the electrical power output of a piece of audio equipment? What is
the reference that you use to compare your measurement? Well, in 1939, a
bunch of people sat down at a table and decided that when the needles on
their equipment read 0 VU, then the power output of the device in question
should be 0.001 W or 1 milliwatt (mW). Now, remember that the power in
watts is dependent on two things – the voltage and the resistance (Watt’s
law again). Back in 1939, the impedance of the input of every piece of audio
gear was 600Ω. If you were Sony in 1939 and you wanted to build a tape
deck or an amplifier or anything else with an input, the impedance across
the input wire and the ground in the connector would have to be 600Ω.
As a result, people today (including me until my error was spotted by
Ray Rayburn) believe that the dBm measurement uses two standard refer-
ences – 1 mW across a 600Ω impedance. This is only partially the case. We
use the 1 mW, but not the 600Ω. To quote John Woram, “...the dBm may be
correctly used with any convenient resistance or impedance.” [Woram, 1989]
By the way, the m stands for milliwatt.
Now this is important: since your reference is in mW we’re dealing with
power. Decibels are a measurement of a power difference, therefore you use
the following equation:
$$\text{Power (dBm)} = 10\log\left(\frac{\text{Power}_1}{1\ \mathrm{mW}_{RMS}}\right) \tag{2.35}$$
where Power1 is measured in mWRMS.
What’s so important? There’s a 10 in there and not a 20. It would be
20 if we were measuring pressure, either sound or electrical, but we’re not.
We’re measuring power.
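Here’s a minimal dBm sketch. It takes the power in watts (so the 1 mW reference becomes 0.001 W) and, as an extra helper, computes the power from a voltage and whatever resistance you choose, using Watt’s law. Both function names are our own.

```python
import math

def dbm(power_w):
    """Equation 2.35, with the power given in watts (1 mW = 0.001 W)."""
    return 10 * math.log10(power_w / 0.001)

def dbm_from_voltage(voltage_rms, resistance_ohms):
    """Watt's law first (P = V^2 / R), then dBm. Any resistance may be used."""
    return dbm(voltage_rms ** 2 / resistance_ohms)

print(f"1 mW   -> {dbm(0.001):+.2f} dBm")
print(f"1 W    -> {dbm(1.0):+.2f} dBm")
print(f"0.5 mW -> {dbm(0.0005):+.2f} dBm")
print(f"1 VRMS across 600 ohms   -> {dbm_from_voltage(1.0, 600.0):+.2f} dBm")
print(f"1 VRMS across 10000 ohms -> {dbm_from_voltage(1.0, 10_000.0):+.2f} dBm")
```

The last two lines show that the same voltage gives a different dBm value depending on the resistance it’s measured across, so a dBm figure only tells you about voltage if you also know that resistance.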
2.2.5 dBV
Nowadays, the 600Ω specification doesn’t apply anymore. The input impedance
of a tape deck you pick up off the shelf tomorrow could be anything – but
it’s likely to be pretty high, somewhere around 10 kΩ. When the impedance
is high, the dissipated power is low because, for a given voltage, power is
inversely proportional to the resistance. Therefore, there may be times when
your power measurement is quite low, even though your voltage is pretty high.
In this case, it makes more sense to measure the voltage rather than the power.
Now we need a new reference, one in volts rather than watts. Well, there are actually
two references... The first one is 1 VRMS. When you use this reference, your
measurement is in dBV.
So, you measure the voltage output of your piece of gear (let’s say a
mixer, for example) and compare that measurement with the 1 VRMS reference,
using the following equation.
$$\text{Voltage (dBV)} = 20\log\left(\frac{\text{Voltage}_1}{1\ \mathrm{V}_{RMS}}\right) \tag{2.36}$$
where Voltage1 is measured in VRMS.
Now this time it’s a 20 instead of a 10 because we’re measuring pressure
and not power. Also note that the dBV does not imply a measurement
across a specific impedance.
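A dBV sketch in both directions, from VRMS to dBV and back. The function names are our own.

```python
import math

def dbv(voltage_rms):
    """Equation 2.36: level in dBV, relative to 1 VRMS."""
    return 20 * math.log10(voltage_rms / 1.0)

def voltage_from_dbv(level_dbv):
    """Equation 2.36 solved for the voltage, in VRMS."""
    return 1.0 * 10 ** (level_dbv / 20)

for v in (1.0, 2.0, 0.1):
    print(f"{v} VRMS -> {dbv(v):+.2f} dBV")
print(f"-10 dBV -> {voltage_from_dbv(-10.0):.3f} VRMS")
```

Note that no resistance appears anywhere in these functions, which matches the point that the dBV doesn’t imply a specific impedance.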
2.2.6 dBu
Let’s think back to the 1mW into 600Ω situation. What will be the voltage
required to generate 1mW in a 600Ω resistor?
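$$P = \frac{V^2}{R} \tag{2.37}$$
$$\text{therefore } V^2 = P \cdot R \tag{2.38}$$
$$V = \sqrt{P \cdot R} \tag{2.39}$$
$$= \sqrt{1\ \mathrm{mW}_{RMS} \cdot 600\ \Omega} \tag{2.40}$$
$$= \sqrt{0.001\ \mathrm{W}_{RMS} \cdot 600\ \Omega} \tag{2.41}$$
$$= \sqrt{0.6} \tag{2.42}$$
$$= 0.774596669\ \mathrm{V}_{RMS} \tag{2.43}$$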
Therefore, the voltage required to generate the reference power was
about 0.775 VRMS. Nowadays, we don’t use the 600 Ω impedance anymore,
but the rounded-off value of 0.775 VRMS was kept as a standard reference.
So, if you use 0.775 VRMS as your reference voltage in the equation like this:
$$\text{Voltage (dBu)} = 20\log\left(\frac{\text{Voltage}_1}{0.775\ \mathrm{V}_{RMS}}\right) \tag{2.44}$$
where Voltage1 is measured in VRMS, your unit of measure is called dBu.
(It used to be called dBv, but people kept mixing up dBv with dBV and
that couldn’t continue, so they changed the dBv to dBu instead. You’ll still
see dBv occasionally – it is exactly the same as dBu... just different names
for the same thing.)
Remember – we’re still measuring pressure so it’s a 20 instead of a 10,
and, like the dBV measurement, there is no specified impedance.
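Here’s a sketch that reproduces the 0.775 V reference from Equations 2.39 to 2.43 and compares the dBu and dBV scales. Because both are 20 log of a voltage ratio, they differ by a constant offset of 20 log(1/0.775), a bit over 2.2 dB, at every voltage. The function and variable names are our own.

```python
import math

# Voltage that puts 1 mW into 600 ohms (Equations 2.39 to 2.43).
exact_reference = math.sqrt(0.001 * 600.0)
DBU_REFERENCE = 0.775  # the rounded value kept as the dBu reference, in VRMS

def dbu(voltage_rms):
    """Equation 2.44: level in dBu, relative to 0.775 VRMS."""
    return 20 * math.log10(voltage_rms / DBU_REFERENCE)

def dbv(voltage_rms):
    """Equation 2.36: level in dBV, relative to 1 VRMS."""
    return 20 * math.log10(voltage_rms / 1.0)

print(f"exact reference: {exact_reference:.9f} VRMS")
print(f"1 VRMS = {dbu(1.0):+.2f} dBu = {dbv(1.0):+.2f} dBV")
print(f"dBu - dBV offset: {dbu(1.0) - dbv(1.0):.2f} dB (the same for any voltage)")
```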
2.2.7 dB FS
The dB FS designation is used for digital signals, so we won’t talk about
them here. They’re discussed later in Chapter 9.1.
dBm
$$\text{Power (dBm)} = 10\log\left(\frac{\text{Power}_1}{1\ \mathrm{mW}_{RMS}}\right) \tag{2.46}$$
where Power1 is measured in mWRMS.
dBV
$$\text{Voltage (dBV)} = 20\log\left(\frac{\text{Voltage}_1}{1\ \mathrm{V}_{RMS}}\right) \tag{2.47}$$
where Voltage1 is measured in VRMS.
dBu
$$\text{Voltage (dBu)} = 20\log\left(\frac{\text{Voltage}_1}{0.775\ \mathrm{V}_{RMS}}\right) \tag{2.48}$$
where Voltage1 is measured in VRMS.
Figure: A 9 V battery with a 1.5 kΩ resistor and a 500 Ω resistor in series.
that the current must flow through in order to make its way through the
entire circuit.
So, how much current will flow through this system? That depends on
the two resistances. If we have a 9 V battery, a 1.5 kΩ resistor and a 500
Ω resistor, then the total resistance is 2 kΩ. From there we can just use
Ohm’s law to figure out the total current running through the system.
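$$V = IR \tag{2.49}$$
$$\text{therefore} \tag{2.50}$$
$$I = \frac{V}{R} \tag{2.51}$$
$$= \frac{9\ \mathrm{V}}{1500\ \Omega + 500\ \Omega} \tag{2.52}$$
$$= \frac{9\ \mathrm{V}}{2000\ \Omega} \tag{2.53}$$
$$= 0.0045\ \mathrm{A} \tag{2.54}$$
$$= 4.5\ \mathrm{mA} \tag{2.55}$$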
Remember that this is not only the current flowing through the entire
system, it’s also therefore the current running through each of the two re-
sistors. This piece of information allows us to go on to calculate the amount
of voltage drop across each resistor.
$$V_2 = I_2 R_2 \tag{2.59}$$
$$= 4.5\ \mathrm{mA} \cdot 500\ \Omega \tag{2.60}$$
$$= 0.0045\ \mathrm{A} \cdot 500\ \Omega \tag{2.61}$$
$$= 2.25\ \mathrm{V} \tag{2.62}$$
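Here’s the whole series calculation as a short sketch, including the drop across the 1.5 kΩ resistor, computed the same way as the drop across the 500 Ω one in Equations 2.59 to 2.62. Notice that the two drops add up to the battery voltage.

```python
V_BATTERY = 9.0  # volts
R1 = 1500.0      # ohms (the 1.5 kohm resistor)
R2 = 500.0       # ohms

r_total = R1 + R2              # resistors in series simply add
current = V_BATTERY / r_total  # Ohm's law, I = V / R (Equation 2.51)

# The same current flows through both resistors, so each drop is V = I * R.
v1 = current * R1
v2 = current * R2

print(f"current: {current * 1000:.2f} mA")
print(f"drop across R1: {v1:.2f} V, drop across R2: {v2:.2f} V")
print(f"sum of drops: {v1 + v2:.2f} V")
```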
Figure: A 9 V battery with a 500 Ω resistor and a 1.5 kΩ resistor in parallel.
If this is causing some difficulty, think back to the example at the top of
this page where we had a shower running while a toilet was flushing in the
same house. The water pressure supplied to the house didn’t change... It’s
the same thing with a battery and two parallel resistors.
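$$I_1 = \frac{V_1}{R_1} \tag{2.63}$$
$$= \frac{9\ \mathrm{V}}{1.5\ \mathrm{k}\Omega} \tag{2.64}$$
$$= \frac{9\ \mathrm{V}}{1500\ \Omega} \tag{2.65}$$
$$= 0.006\ \mathrm{A} \tag{2.66}$$
$$= 6\ \mathrm{mA} \tag{2.67}$$
$$I_2 = \frac{V_2}{R_2} \tag{2.68}$$
$$= \frac{9\ \mathrm{V}}{500\ \Omega} \tag{2.69}$$
$$= 0.018\ \mathrm{A} \tag{2.70}$$
$$= 18\ \mathrm{mA} \tag{2.71}$$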
One way to calculate the total current coming out of the battery here
is to calculate the two individual currents going through the resistors and
add them together. This will work, and from there, we can calculate
backwards to figure out what the equivalent resistance of the pair of resistors
would be. If we did that whole procedure, we would find that the reciprocal
of the total resistance is equal to the sum of the reciprocals of the individual
resistances. (Huh?) It’s like this...
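$$\frac{1}{R_{Total}} = \frac{1}{R_1} + \frac{1}{R_2} \tag{2.72}$$
$$\frac{1}{R_{Total}} = \frac{1}{R_1}\cdot\frac{R_2}{R_2} + \frac{1}{R_2}\cdot\frac{R_1}{R_1} \tag{2.73}$$
$$\frac{1}{R_{Total}} = \frac{R_2}{R_1 R_2} + \frac{R_1}{R_2 R_1} \tag{2.74}$$
$$\frac{1}{R_{Total}} = \frac{R_1 + R_2}{R_1 R_2} \tag{2.75}$$
$$R_{Total} = \frac{R_1 R_2}{R_1 + R_2} \tag{2.76}$$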
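As a check on the algebra, this sketch computes the parallel example both ways: by adding the two branch currents, and by finding the equivalent resistance with the reciprocal rule (and with Equation 2.76). The `parallel` helper is our own name.

```python
def parallel(*resistances):
    """Equivalent resistance of resistors in parallel: 1/R_total = sum of 1/R_n."""
    return 1.0 / sum(1.0 / r for r in resistances)

V_BATTERY = 9.0
R1, R2 = 1500.0, 500.0

# Each resistor sees the full battery voltage (Equations 2.63 to 2.71).
i1 = V_BATTERY / R1
i2 = V_BATTERY / R2

r_total = parallel(R1, R2)
product_over_sum = R1 * R2 / (R1 + R2)  # Equation 2.76

print(f"I1 = {i1 * 1000:.1f} mA, I2 = {i2 * 1000:.1f} mA, total = {(i1 + i2) * 1000:.1f} mA")
print(f"R_total = {r_total:.1f} ohms (product over sum: {product_over_sum:.1f} ohms)")
print(f"V / R_total = {V_BATTERY / r_total * 1000:.1f} mA")
```

Both routes give 24 mA, from an equivalent resistance of 375 Ω, which is smaller than either resistor on its own.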
2.4 Capacitors
Let’s go back a couple of chapters to the concept of a water pump sending
water out of its output through a pipe which has a constriction in it, back to
the input of the pump. We equated this system with a battery pushing
current through a wire and resistor. Now, we’re replacing the restriction in
the water pipe with a couple of waterbeds... Stay with me here... This will
make sense, I promise...
If the input of the water pump is connected to one of the waterbeds
and the output of the pump is connected to the other waterbed, and the
output waterbed is placed on top of the input waterbed, what will happen?
Well, if we assume that the two waterbeds have the same amount of water
in them before we turn on the pump (therefore the water pressure in the
two is the same... sort of...), then, after the pump is turned on, the water
is drained from the bottom waterbed and placed in the top waterbed. This
means that we have a change in the pressure difference between the two
beds (the upper waterbed having the higher pressure). This difference will
increase until we run out of water for the pump to move. The work the
pump is doing is assisted by the fact that, as the top waterbed gets heavier,
the water is pushed out of the bottom waterbed... Now, what does this have
to do with electricity?
We’re going to take the original circuit with the resistor and the battery
and we’re going to add a device called a capacitor in series with the resistor.
A capacitor is a device with two metal plates that are placed very close
together, but without touching (the waterbeds...). There’s a wire coming
off of each of the two plates (the pipes). Each of these plates, then, can act
as a reservoir for electrons – we can push extra ones into a plate, making it
negative (by connecting the negative terminal of a battery to it...), or we
can take electrons out, making the plate positive (by connecting the positive
terminal of the battery to it...). Remember though that electrons, and the
lack-of-electrons (holes) are mutually attracted to each other. As a result,
the extra electrons in the negative plate are attracted to the holes in the
positive plate. This means that the electrons and holes line up on the sides
of the plates closest to the opposite plate – trying desperately to get across
the gap... The narrower the gap, the more attraction, therefore the more
electrons and holes we can pack in the plates... Also, the bigger the plates,
the more electrons and holes we can get in there...
This device has the capacity to store quantities of electrons and holes –
that’s why we call them capacitors. The value of the capacitor, measured
in Farads (abbreviated F) is a measure of its capacity... (we’ll leave it at
Just before we close the switch, let’s assume that the two plates of the
capacitor have the same number of electrons and holes in them – therefore
they are at the same potential – so the voltage across the capacitor is 0 V.
When we close the switch, the electrons in the negative terminal want to flow
to the top plate of the cap to meet the holes flowing into the bottom plate.
Therefore, when we first close the switch, we get a surge of current through
the circuit which gradually decreases as the voltage across the capacitor is
increased. The more the capacitor fills with holes and electrons, the higher
the voltage across it, and therefore the smaller the voltage across the resistor
– this in turn means a smaller current.
If we were to graph this change in the flow of current over time, it would
look like Figure 2.12:
As you can see, the more time that has passed since the switch was closed, the
smaller the current. The graph of the change in voltage over time would be
exactly opposite to this, as is shown in Figure 2.13.
Figure 2.12: The change in current flowing through the resistor and into the top plate of the
capacitor after the switch is closed.
Figure 2.13: The change in voltage across the capacitor after the switch is closed.
You may notice that in most books, the time axis of the graph is not
marked in seconds but in something that looks like a T – it’s called Tau
(that’s a Greek letter and not a Chinese word, in case you’re thinking that
I’m going to make a joke about Winnie the Pooh... It’s also pronounced
differently – say “tao” not “dao”). This tau (τ) is the symbol for something
called a time constant, which is determined by the value of the capacitor
and the resistor, as in Equation 2.77:
$$\tau = RC \tag{2.77}$$
Figure 2.14: The change in voltage across a capacitor over time if the DC source in Figure 2.11
were changed to a square wave generator.
What’s going on? Well, the voltage is applied to the capacitor, and it
starts charging, initially demanding lots of current through the resistor, but
asking for less and less all the time... When the voltage drops to the lower
half of the square wave, the capacitor starts charging (or discharging) to the
new value, initially demanding lots of current in the opposite direction and
slowly reaching the voltage. Since I said that the period of the square wave
is 10 time constants, the voltage of the capacitor just reaches the voltage of
the function generator (after 5 time constants...) when the square wave goes to
the other value.
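The text describes the charging curve without giving its formula. Assuming the standard exponential charging law for a series RC circuit, V_C(t) = V × (1 - e^(-t/τ)), this sketch shows why 5 time constants counts as “just reaching” the applied voltage. The component values are hypothetical.

```python
import math

R = 10_000.0  # ohms (hypothetical)
C = 1e-6      # farads (hypothetical)
tau = R * C   # Equation 2.77: the time constant, in seconds

def charge_fraction(t, tau):
    """Fraction of the applied voltage across a charging capacitor after time t.

    Assumes the standard exponential law V_C(t) = V * (1 - exp(-t / tau)).
    """
    return 1.0 - math.exp(-t / tau)

print(f"tau = {tau * 1000:.1f} ms")
for n in range(1, 6):
    print(f"after {n} tau: {charge_fraction(n * tau, tau):.1%} of the final voltage")
```

The capacitor gets to about 63.2% after one time constant, 86.5% after two, 95.0% after three, 98.2% after four and 99.3% after five.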
Consider that, since the circuit is rounding off the square edges of the
initially applied square wave, it must be doing something to the frequency
response – but we’ll worry about that later.
Let’s now apply an AC sine wave to the input of the same circuit and
look at what’s going on at the output. The voltage of the function generator
is always changing, and therefore the capacitor is always being asked to
change the voltage across it. However, it is not changing nearly as quickly
as it was with the square wave. If the change in voltage over time is quite
slow (therefore, a low frequency sine wave) the current required to bring
the capacitor to its new (but always changing) voltage will be small. The
higher the frequency of the sine wave at the input, the more quickly the
capacitor must change to the new voltage, therefore the more current it
demands. Therefore, the current flowing through the circuit is dependent
on the frequency – the higher the frequency, the higher the current. If we
think of this another way, we could pretend that the capacitor is a resistor
which changes in value as the frequency changes – the lower the frequency,
the bigger the resistor, because the smaller the current. This isn’t really
what’s going on, but we’ll work that out in a minute.
The lower the frequency, the lower the current – and the smaller the capacitor,
the lower the current (because it needs less current to change to the new
voltage than a bigger capacitor). Therefore, we have a new equation which
describes this relationship:
$$X_C = \frac{1}{2\pi f C} = \frac{1}{\omega C} \tag{2.78}$$
where f is the frequency in Hz, C is the capacitance in farads, ω = 2πf is the
angular frequency in radians per second, and π is 3.14159265...
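A small sketch of Equation 2.78 for a hypothetical 1 µF capacitor across the audio band. The function name is our own.

```python
import math

def capacitive_reactance(freq_hz, capacitance_f):
    """Equation 2.78: X_C = 1 / (2 * pi * f * C), in ohms."""
    return 1.0 / (2 * math.pi * freq_hz * capacitance_f)

C = 1e-6  # 1 microfarad (hypothetical)
for f in (20.0, 200.0, 2000.0, 20000.0):
    print(f"{f:7.0f} Hz: X_C = {capacitive_reactance(f, C):9.2f} ohms")
```

Every tenfold increase in frequency divides the reactance by ten: the capacitor “resists” low frequencies and lets high frequencies through.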
What’s XC? It’s something called the capacitive reactance of the capacitor,
and it’s expressed in Ω. It’s not the same as resistance for two reasons.
Firstly, resistance burns power if it’s resisting the flow of current... when
current is impeded by capacitive reactance, there is no power lost. It’s also
different from a resistor because there is a different relationship between the
voltage and the current flowing through (or into) the device. For resistors,
Ohm’s Law tells us that V = IR; therefore, if the resistor stays the same and
the voltage goes up, the current goes up at the same time. Therefore, we
can say that, when an AC voltage is applied to a resistor, the flow of current
through the resistor is in phase with the voltage (when V is 0, I is 0, when
V is maximum, I is maximum, and so on...). In a capacitive circuit (one
where the reactance of the capacitor is much greater than the resistance of
the resistor and the two are in series...), the current precedes the voltage
(remember the time constant curves... voltage changes slowly, current changes
quickly...) by 90°. This also means that the voltage across the resistor is
90° ahead of the voltage across the capacitor (because the voltage across the
resistor is in phase with the current through it and into the capacitor).
As far as the function generator is concerned, it doesn’t know whether
the current it’s being asked to supply is determined by resistance or reac-
tance... all it sees is some THING out there, impeding the current flow
differently at different frequencies (the lower the frequency, the higher the
impedance...) This impedance is not simply the addition of the resistance
and the reactance, because the two are not in phase with each other... in
fact they’re 90◦ out of phase. The way we calculate the total impedance
of the circuit is by finding the square root of the sum of the squares of the
resistance and the reactance, or:
$$Z = \sqrt{R^2 + X_C^2} \tag{2.79}$$
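A sketch of Equation 2.79, comparing the real impedance with the (wrong) simple sum R + XC for a hypothetical 1 kΩ resistor and three values of capacitive reactance. Python’s `math.hypot` computes exactly √(a² + b²).

```python
import math

def impedance(resistance, reactance):
    """Equation 2.79: Z = sqrt(R^2 + X_C^2), because R and X_C are 90 degrees apart."""
    return math.hypot(resistance, reactance)

R = 1000.0  # ohms (hypothetical)
for xc in (10_000.0, 1000.0, 100.0):
    print(f"X_C = {xc:7.0f}: Z = {impedance(R, xc):8.1f} ohms "
          f"(simple sum would be {R + xc:.0f})")
```

When XC equals R, Z is about 1.414 × R, not 2 × R; when one of the two is much bigger than the other, Z is roughly equal to the bigger one.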
$$\frac{1}{C_{total}} = \frac{1}{C_1} + \frac{1}{C_2} + \dots + \frac{1}{C_n} \tag{2.81}$$
Note that both of these equations are very similar to the ones for resistors,
except that we use them “backwards.” That is to say that the
equation for series resistors is the same as the one for parallel capacitors, and the
one for parallel resistors is the same as the one for series capacitors.
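The “backwards” rule is easy to express in code. `capacitors_in_series` implements Equation 2.81; `capacitors_in_parallel` simply adds the values, following the text’s point that parallel capacitors combine like series resistors. Both function names are our own.

```python
def capacitors_in_series(*caps):
    """Equation 2.81: 1/C_total = 1/C1 + 1/C2 + ... + 1/Cn (like parallel resistors)."""
    return 1.0 / sum(1.0 / c for c in caps)

def capacitors_in_parallel(*caps):
    """Parallel capacitors simply add (like series resistors)."""
    return sum(caps)

print(f"two 2 uF in series:   {capacitors_in_series(2e-6, 2e-6) * 1e6:.2f} uF")
print(f"two 2 uF in parallel: {capacitors_in_parallel(2e-6, 2e-6) * 1e6:.2f} uF")
```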
Figure 2.15: A first-order RC circuit. This is called “first-order” because there is only 1 reactive
component in it (the capacitor).
In this circuit, the lower the frequency, the higher the impedance, and
therefore the lower the current flowing through the circuit. If we’re at 0 Hz,
there is no current flowing through the circuit. If we’re at infinity Hz (this
is very high...), then the capacitor has a capacitive reactance of 0 Ω, and the
impedance of the circuit is the resistance of the resistor. This can be seen in
Figure 2.16.
We also talked about how, at low frequencies, the circuit is considered
to be capacitive (because the capacitive reactance is MUCH greater than
the resistor value, and therefore the resistor is negligible in comparison).
When the circuit is capacitive, the current flowing through the resistor
into the capacitor is changing faster than the voltage across the capacitor.
We said earlier that, in this case, the current is 90° ahead of the
voltage. This also means that the voltage across the resistor (which is in
phase with the current) is 90° ahead of the voltage across the capacitor.
This is shown in Figure 2.17.
Let’s look at the voltage across the capacitor as we change the frequency.
At very low frequencies, the capacitor has a very high capacitive reactance,
therefore the resistance of the resistor is negligible in comparison. If we
Figure 2.16: The resistance, capacitive reactance and total impedance of the above circuit. Notice
that there is one frequency where XC is equal to R.
Figure 2.17: The relationship in the time domain between VIN (the voltage difference across the
function generator), VR (the voltage across the resistor), and VC (the voltage across the capacitor).
Time is passing from left to right, so VC is later than VIN, which is preceded by VR. Note as well
that VR is 90◦ ahead of VC .
Figure 2.18: The output level of the voltages across the resistor and capacitor relative to the voltage
of the sine wave generator in Figure 2.15 for various frequencies.
Note that we’re specifying the voltage as a level relative to the input of
the circuit, expressed in dB. The frequency at which the output (the voltage
drop across the capacitor) is 3 dB below the input (that is to say -3 dB) is
called the cutoff frequency (fc ) of the circuit. (We may as well start calling
it a filter, since it’s filtering different frequencies differently... since it allows
low frequencies to pass through unchanged, we’ll call it a low-pass filter.)
The fc of the low-pass filter can be calculated if you know the values of
the resistor and the capacitor. The equation is shown in Equation 2.82:
$$f_c = \frac{1}{2\pi RC} \tag{2.82}$$
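This sketch computes fc from Equation 2.82 and then evaluates both outputs of the circuit at fc/10, fc and 10 fc. One assumption to flag: it treats the series RC circuit as a voltage divider using complex numbers (Python’s built-in complex type and the standard `cmath` module), a technique the text hasn’t introduced; it’s used here only to get the level in dB and the phase in one step. The component values and function names are hypothetical.

```python
import cmath
import math

def cutoff_frequency(r_ohms, c_farads):
    """Equation 2.82: f_c = 1 / (2 * pi * R * C)."""
    return 1.0 / (2 * math.pi * r_ohms * c_farads)

def rc_divider(freq_hz, r_ohms, c_farads):
    """Complex ratios V_C / V_IN and V_R / V_IN for the series RC circuit."""
    x = 2 * math.pi * freq_hz * r_ohms * c_farads  # equals f / f_c
    v_c = 1 / (1 + 1j * x)
    v_r = 1j * x / (1 + 1j * x)
    return v_c, v_r

def level_db(ratio):
    return 20 * math.log10(abs(ratio))

def phase_deg(ratio):
    return math.degrees(cmath.phase(ratio))

R, C = 1000.0, 1e-6  # hypothetical: 1 kohm and 1 uF
fc = cutoff_frequency(R, C)
print(f"f_c = {fc:.1f} Hz")
for f in (fc / 10, fc, fc * 10):
    v_c, v_r = rc_divider(f, R, C)
    print(f"{f:7.1f} Hz: V_C {level_db(v_c):6.2f} dB at {phase_deg(v_c):6.1f} deg, "
          f"V_R {level_db(v_r):6.2f} dB at {phase_deg(v_r):6.1f} deg")
```

At fc both outputs sit about 3 dB below the input, with VC 45° behind and VR 45° ahead, so they stay 90° apart. Well below fc, VC is close to 0 dB, which is the low-pass behaviour described above.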
(Note that if we put the values of the resistor and the capacitor from
Figure 2.19: The phase relationship between the voltages across the resistor and the capacitor
relative to the voltage across the function generator with different frequencies.
frequency is low, then the current through the circuit is low (because XC is
high) and therefore Vr is low. If the frequency is high, the current is high
and Vr is high.
The result is Figure 4, showing the voltage across the resistor relative to
frequency. Again, we’re plotting the amplitude of the voltage as it relates
to the input voltage, in dB.
Now, of course, we’re looking at a high-pass filter. The fc is again the
frequency where we’re at -3 dB relative to the input, and the equation to
calculate it is the same as for the low-pass filter.
$$f_c = \frac{1}{2\pi RC} \tag{2.83}$$
The slope of the filter is now 6 dB per octave (20 dB per decade), because
the level increases by 6 dB each time we go up one octave. That slope holds
(approximately) for frequencies up to about 1 decade below fc. At frequencies
well above fc (roughly 10 fc and higher), we are at approximately 0 dB
relative to the input.
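To see the 6 dB per octave slope (and the flattening out above fc) numerically, here’s a small sketch that evaluates the standard first-order high-pass magnitude, assumed here to be (f/fc)/√(1 + (f/fc)²), at a few frequencies. The 1 kHz cutoff is an arbitrary assumed value and the function name is hypothetical.

```python
import math

def highpass_gain_db(f, fc):
    """Level of V_R relative to V_IN, in dB, for a first-order RC high-pass."""
    x = f / fc
    return 20.0 * math.log10(x / math.sqrt(1.0 + x * x))

fc = 1_000.0  # assumed cutoff frequency, for illustration only

# One octave (a doubling of frequency), well below fc/10:
octave_step = highpass_gain_db(20.0, fc) - highpass_gain_db(10.0, fc)
# One decade (ten times the frequency), well below fc/10:
decade_step = highpass_gain_db(10.0, fc) - highpass_gain_db(1.0, fc)

print(f"rise per octave: {octave_step:.2f} dB")
print(f"rise per decade: {decade_step:.2f} dB")
print(f"level at fc:     {highpass_gain_db(fc, fc):.2f} dB")
print(f"level at 10 fc:  {highpass_gain_db(10 * fc, fc):.2f} dB")
```

The octave and decade steps land very close to 6 dB and 20 dB, the level at fc is about -3 dB, and by 10 fc the output is within a few hundredths of a dB of the input.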
The phase response is also similar, but different. Now the sine wave that
we see across the resistor is ahead of the input. This is because, as we
said before, the current feeding the capacitor precedes its voltage by 90°.
At extremely low frequencies, we’ve established that the voltage across the
capacitor is in phase with the input – but the current precedes that by 90°.
Therefore the voltage across the resistor must precede the voltage across the
capacitor (and therefore the input voltage) by approximately 90° (up to about
fc/10).
Again, at fc, the voltage across the resistor is 45° away from the input,
but this time it is ahead, not behind.
Finally, at 10 fc and above, the voltage across the resistor is approximately
in phase with the input. This all results in the phase response graph shown
in Figure 2.19.
As you can see, the voltage across the resistor and the voltage across the
capacitor are always 90° out of phase with each other, but their relationships
with the input voltage change.
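Here’s a sketch of that phase behaviour, using complex numbers to stand in for the sine waves (an assumption on our part; the text reasons about it without them). Treating the circuit as a voltage divider, the resistor’s share of the input is jx/(1 + jx) and the capacitor’s share is 1/(1 + jx), where x = f/fc. The cutoff frequency and the function name are assumed for illustration.

```python
import cmath
import math

def rc_phases_deg(f, fc):
    """Phase (degrees) of V_R and V_C relative to V_IN in a series RC circuit."""
    jx = 1j * f / fc            # j*omega*R*C, written as j*f/fc
    v_r = jx / (1 + jx)         # resistor voltage / input voltage
    v_c = 1 / (1 + jx)          # capacitor voltage / input voltage
    return math.degrees(cmath.phase(v_r)), math.degrees(cmath.phase(v_c))

fc = 1_000.0  # assumed cutoff frequency
for f in (fc / 10, fc, 10 * fc):
    p_r, p_c = rc_phases_deg(f, fc)
    print(f"{f:8.1f} Hz: V_R {p_r:+6.1f} deg, V_C {p_c:+6.1f} deg, "
          f"difference {p_r - p_c:.1f} deg")
```

At fc the two voltages sit at +45° and -45°; at fc/10 and 10 fc one of them is within about 6° of the input, and the difference between them stays at 90° throughout.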
There’s only one thing left that we have to discuss: an apparent
conflict in what we have learned (though it isn’t really a conflict...). We
know that fc is the point where the voltage across the capacitor and the
voltage across the resistor are both -3 dB relative to the input. Therefore
the two voltages are equal – yet, when we add them together, we go up by
3 dB and not 6 dB as we would expect. This is because the two waves are
90° apart. If they were in phase, they would add to produce a gain of 6 dB;
since they are out of phase by 90°, their sum is only 3 dB higher than either
one, which brings us right back to the level of the input.
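The same complex-number trick shows why two equal voltages 90° apart only add up to 3 dB more than either one. This is a minimal sketch with the levels and phases at fc taken from the discussion above; the variable names are just for illustration.

```python
import cmath
import math

def db(ratio):
    """Convert a voltage ratio to dB."""
    return 20.0 * math.log10(ratio)

level = 1 / math.sqrt(2)                     # each voltage is -3 dB re the input
v_r = cmath.rect(level, math.radians(45))    # resistor voltage leads by 45 deg at fc
v_c = cmath.rect(level, math.radians(-45))   # capacitor voltage lags by 45 deg at fc

phasor_sum = abs(v_r + v_c)                  # the two really are 90 deg apart
in_phase_sum = abs(v_r) + abs(v_c)           # what we'd get if they were in phase

print(f"each voltage:          {db(level):.2f} dB re input")
print(f"rise, 90 deg apart:    {db(phasor_sum / level):.2f} dB")
print(f"rise, if in phase:     {db(in_phase_sum / level):.2f} dB")
```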
Figure 2.20: The triangle representing the relationship between the resistance, capacitive reactance
and the impedance of the circuit. Note that, as frequency changes, only R remains constant.
At this point, it should be easy to see why the impedance is the square
root of the sum of the squares of R and XC. In addition, it becomes
intuitive that, as the frequency goes towards infinity, XC goes to zero and the
hypotenuse of the triangle, Z, becomes the same as R. If the frequency goes
to 0 Hz (DC), XC goes to infinity, as does Z.
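Here’s a small sketch of the impedance triangle in Python. It assumes the usual capacitive reactance formula XC = 1/(2πfC) and uses `math.hypot` to take the square root of the sum of the squares; the component values and function names are assumed examples.

```python
import math

def capacitive_reactance(f, c):
    """X_C = 1 / (2*pi*f*C), in ohms."""
    return 1.0 / (2.0 * math.pi * f * c)

def series_rc_impedance(f, r, c):
    """Magnitude of Z for R and C in series: the hypotenuse of the triangle."""
    return math.hypot(r, capacitive_reactance(f, c))

R, C = 1_000.0, 1e-6   # assumed example values: 1 kΩ and 1 µF
for f in (1.0, 159.15, 1e6):
    print(f"{f:>10.2f} Hz: X_C = {capacitive_reactance(f, C):10.1f} ohms, "
          f"Z = {series_rc_impedance(f, R, C):10.1f} ohms")
```

At low frequencies Z is dominated by XC; at high frequencies it collapses to R, just as the triangle suggests.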
Go back to the concept of a voltage divider using two resistors. Remember
that the ratio of the two resistances is the same as the ratio of the
voltages across the two resistors.

$$\frac{R_1}{R_2} = \frac{V_1}{V_2} \qquad (2.84)$$
If we consider the RC circuit in Figure 2.15, we can treat the two
components in a similar manner; however, the phase difference must be taken
into consideration. Figure 2.21 shows a triangle with the same shape as the
one in Figure 2.20, now showing the relationship between the input voltage
and the voltages across the resistor and the capacitor.
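Extending the voltage divider idea to the RC circuit, a minimal sketch (assuming a 1 V input and example values of 1 kΩ and 1 µF, none of which come from the text) computes the magnitudes of VR and VC and checks that they form the sides of a right triangle whose hypotenuse is VIN, rather than simply adding up to VIN as they would for two resistors. The function name is hypothetical.

```python
import math

def rc_divider_voltages(v_in, f, r, c):
    """Magnitudes of V_R and V_C for a series RC circuit driven by v_in."""
    x_c = 1.0 / (2.0 * math.pi * f * c)
    z = math.hypot(r, x_c)
    i = v_in / z                # the same current flows through both parts
    return i * r, i * x_c       # V_R, V_C (their ratio is R : X_C, as in Eq. 2.84)

v_in = 1.0                      # assumed 1 V input
R, C = 1_000.0, 1e-6            # assumed example values
for f in (10.0, 159.15, 10_000.0):
    v_r, v_c = rc_divider_voltages(v_in, f, R, C)
    print(f"{f:>8.2f} Hz: V_R = {v_r:.3f} V, V_C = {v_c:.3f} V, "
          f"sqrt(V_R^2 + V_C^2) = {math.hypot(v_r, v_c):.3f} V")
```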
So, once again, we can see that, as the frequency goes up, the voltage
across the capacitor goes down until, at infinity Hz, the voltage across the
Figure 2.21: The triangle representing the relationship between the input voltage and the outputs
of the high-pass and low-pass filters. Note that, as frequency changes, only VIN remains constant.
2.6 Electromagnetism
Once upon a time, you did an experiment, probably around grade 3 or so,
where you put a piece of paper on top of a bar magnet and sprinkled iron
filings on the paper. The result was a pretty pattern that spread from pole
to pole of the magnet. The iron filings were aligning themselves along what
are called magnetic lines of force. These lines of force spread out around
a magnet and have some effect on the things around them (like iron filings
and compasses, for example). These lines of force have a direction – they
go from the north pole of the magnet to the south pole as shown in Figures
2.22 and 2.23.
Figure 2.24: Right hand being used to show the direction of rotation of the magnetic lines of force
when you know the direction of the current.
and the whole thing acts as one big magnetic field generator. When this
happens, as you can see below, the coil has a total magnetic field similar to
the bar magnet in the diagram above.
We can use our right hand again to figure out which end of the coil is
north and which is south. If you wrap your fingers around the coil in the
direction of the current, you will find that your thumb is pointing north, as
is shown in Figure 2.25. Remember, again, that if we increase the current
through the wire, the magnetic field gets stronger and its lines of force
extend farther away from the coil.
Figure 2.25: Right hand being used to find the polarity of the magnetic field around a coil of wire
(the thumb is pointing towards the North pole) when you know the direction of the current around
the coil (the fingers are wrapping around the coil in the same direction as the current).
Figure 2.26: Right hand being used to find the current through a wire when it is moving in a
magnetic field. The index finger points in the direction of the lines of magnetic force, the thumb
is pointing in the direction of movement of the wire and the middle finger indicates the direction
of current in the wire.
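The right-hand rule in Figure 2.26 is the same thing a mathematician would call a cross product: with the thumb along the direction of movement and the index finger along the lines of force, the middle finger points along (movement × field). Here’s a minimal sketch under that assumption, using conventional current and a hand-rolled cross product (a hypothetical helper) so that no libraries are needed.

```python
def cross(a, b):
    """Cross product of two 3-D vectors given as (x, y, z) tuples."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

motion = (1, 0, 0)    # thumb: the wire moves along +x
field = (0, 1, 0)     # index finger: lines of force point along +y
current = cross(motion, field)   # middle finger: direction of the induced current
print(current)

# Moving the wire the other way reverses the current.
print(cross((-1, 0, 0), field))
```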
2.7 Inductors
We saw in Section 2.6 that if you have a piece of wire moving through a
magnetic field, you will induce current in the wire. The direction of the
current depends on the direction of the magnetic lines of force and the
direction of movement of the wire. Figure 2.27 shows an example of this
effect.
Figure 2.27: A wire moving through a constant magnetic field causing a current to be induced in the
wire. The black arrows show the direction of the magnetic field, the red arrows show the direction
of the movement of the wire and the blue arrow shows the direction of the electrical current.
We also saw that the reverse is true. If you have a piece of wire with
current running through it, then you create a magnetic field around the wire
with the magnetic lines of force going in circles around it. The direction of
the magnetic lines of force depends on the direction of the current. The
strength of the magnetic field and, therefore, the distance it extends from
the wire depend on the amount of current. An example of this is shown
in Figure 2.28 where we see two different wires with two different magnetic
fields due to two different currents.
Figure 2.28: Two independent wires with current running through them. The wire on the right has
a higher current going through it than the one on the left. Consequently, the magnetic field around
it is stronger and therefore extends further from the wire.
Figure 2.29: The wire on the right has a current induced in it because the magnetic field around
the wire on the left is expanding. This is happening because the current through the wire on the
left is increasing. Notice that the induced current is in the opposite direction to the current in the
left wire. This diagram is essentially the same as the one shown in Figure 2.30.
Now let’s go a step further and put a current through the wire on the
Figure 2.30: The wire on the right has a current induced in it because the magnetic field around
the wire on the left is expanding. This is happening because the current through the wire on the
left is increasing. Notice that the induced current is in the opposite direction to the current in the
left wire. This diagram is essentially the same as the one shown in Figure 2.29.
left that is always changing – the most common form of this signal in
the electrical world is a sinusoidal waveform that alternates back and forth
between positive and negative current (meaning that it changes direction).
Figure 2.31 shows the result when we put an everyday AC signal into the
wire on the left in Figure 2.30.
Figure 2.31: The relationship between the current in the two wires. The top plot shows the current
in the wire on the left in Figure 2.30. The middle plot also shows the rate of change (the slope) of
the top plot, therefore it is a plot of the velocity (the speed and direction of travel) of the magnetic
field. The bottom plot shows the induced current in the wire on the right in the same figure. Note
that the plots are not drawn on any particular scale in the vertical axes – so you shouldn’t assume
that you’ll get the same current in both wires, but the phase relationship is correct.
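The relationship in Figure 2.31 can be sketched numerically. This assumes, as the figures describe, that the induced current is proportional to the rate of change of the drive current but in the opposite direction; the 1 Hz frequency and the scaling constant `k` are assumed values (the figure has no vertical scale), and the function names are hypothetical.

```python
import math

FREQ = 1.0                      # assumed drive frequency: 1 Hz
W = 2.0 * math.pi * FREQ        # angular frequency

def drive_current(t):
    """Current in the left-hand wire (top plot): a unit-amplitude sine wave."""
    return math.sin(W * t)

def rate_of_change(t, dt=1e-6):
    """Slope of the drive current (middle plot), by a central difference."""
    return (drive_current(t + dt) - drive_current(t - dt)) / (2.0 * dt)

def induced_current(t, k=1.0):
    """Induced current in the right-hand wire (bottom plot), assumed to be
    proportional to the slope but flowing in the opposite direction."""
    return -k * rate_of_change(t)

for t in (0.0, 0.25, 0.5, 0.75):
    print(f"t={t:.2f} s  drive={drive_current(t):+.3f}  "
          f"slope={rate_of_change(t):+.3f}  induced={induced_current(t):+.3f}")
```

Note that the induced current is largest when the drive current is crossing zero (changing fastest) and zero when the drive current is at its peak.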
Let’s take a piece of wire and wind it into a coil consisting of two turns
as is shown in Figure 2.32. One thing to beware of is that we aren’t just
wrapping naked wire in a coil – we have to make sure that adjacent sections
of the wire don’t touch each other, so we coat the wire with a thin layer of
insulation.
Now think about Figure 2.30 as being just the top two adjacent sections
of wire in the coil in Figure 2.32. This should raise a question or two. As
we saw in Figure 2.30, increasing the current in one of the wires results in
a current in the other wire in the opposite direction. If these two wires are
actually just two sections of the same coil of wire, then the current we’re
putting through the coil goes through the whole length of wire. However, if
we increase that current, then we induce a current in the opposite direction
on the adjacent wires in the coil, which, as we know, is the same wire.
Therefore, by increasing the current in the wire, we increase the induced
current pushing in the opposite direction, opposing the current that we’re
Figure 2.32: A piece of wire wound into a coil with only two turns.
the magnetic field. This may sound a little strange at first – so far we have
only talked about materials as being conductors or insulators of electrical
current. However, materials can also be classified by how well they conduct
magnetic fields – iron is a very good magnetic conductor.
Similar to a capacitor, the inductive reactance of an inductor depends on
the inductance of the device, in Henrys, and the frequency of the
sinusoidal signal being sent through it. This is shown in Equation 2.85.

$$X_L = 2\pi f L = \omega L \qquad (2.85)$$

where L is the inductance of the inductor, in Henrys. As can be seen, the
inductive reactance, XL, is proportional to both frequency and inductance
(unlike a capacitor, in which XC is inversely proportional to both frequency
and capacitance).
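A quick numeric look at Equation 2.85 alongside the capacitive reactance XC = 1/(2πfC) it’s being compared to. The 10 mH and 1 µF values are assumed examples and the function names are hypothetical.

```python
import math

def inductive_reactance(f, l_henrys):
    """X_L = 2*pi*f*L (Equation 2.85), in ohms."""
    return 2.0 * math.pi * f * l_henrys

def capacitive_reactance(f, c_farads):
    """X_C = 1 / (2*pi*f*C), in ohms, for comparison."""
    return 1.0 / (2.0 * math.pi * f * c_farads)

L, C = 10e-3, 1e-6   # assumed example values: 10 mH and 1 µF
for f in (100.0, 1_000.0, 10_000.0):
    print(f"{f:>8.0f} Hz: X_L = {inductive_reactance(f, L):8.2f} ohms, "
          f"X_C = {capacitive_reactance(f, C):8.2f} ohms")
```

Every factor of 10 in frequency multiplies XL by 10 and divides XC by 10.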
2.7.1 Impedance
Let’s put an inductor in series with a resistor as is shown in Figure 2.33.
Just like the case of a capacitor and a resistor in series (see Section 2.4),
the resulting load on the signal generator is an impedance, the result of
combining a resistance and an inductive reactance. Similar to what we saw
with capacitors, there will be a phase difference of 90° between the voltages
across the inductor and the resistor. However, unlike the capacitor, the
voltage across the inductor is 90° ahead of the voltage across the resistor.
Since the resistance and the inductive reactance are 90° apart, we can
calculate the total impedance – the load on the signal generator using the
2.7.2 RL Filters
We saw in Section 2.5 that we can build a filter using the relationship
between the resistance of a resistor and the capacitive reactance of a
capacitor. The same can be done using a resistor and an inductor, making an
RL filter instead of an RC filter.
Connect an inductor and a resistor in series as is shown in Figure 2.33
and look at the voltage difference across the inductor as you change the
frequency of the signal generator. If the frequency is very low, then the
reactance of the inductor is practically 0 Ω, so you get almost no voltage
difference across it – therefore no output from the circuit. The higher the
frequency, the higher the reactance. At some frequency, the reactance of the
inductor will be the same as the resistance of the resistor, and the voltages
across the two components are the same. However, since they are 90° apart,
the voltage across either one will be 0.707 of the input voltage (or -3 dB).
As we go higher in frequency, the reactance goes higher and higher and we
get a higher and higher voltage difference across the inductor.
This should all sound very familiar. What we have done is to create
a first-order high-pass filter using a resistor and an inductor, therefore it’s
called an RL filter. If we wanted a low-pass filter, we would use the voltage
across the resistor as the output.
The cutoff frequency of an RL filter is the frequency at which XL equals R,
and is calculated using Equation 2.87.

$$f_c = \frac{R}{2\pi L} \qquad (2.87)$$
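Since the cutoff is where XL equals R, solving 2πfcL = R gives the cutoff formula for the RL filter. Here’s a minimal sketch that computes fc for assumed example values (1 kΩ and 100 mH) and checks that the voltage across the inductor is about 3 dB down there. The function names are hypothetical.

```python
import math

def rl_cutoff(r_ohms, l_henrys):
    """Frequency at which X_L = 2*pi*f*L equals R, i.e. f_c = R / (2*pi*L)."""
    return r_ohms / (2.0 * math.pi * l_henrys)

def rl_highpass_gain_db(f, r_ohms, l_henrys):
    """Level of V_L relative to V_IN, in dB, for R and L in series."""
    x_l = 2.0 * math.pi * f * l_henrys
    return 20.0 * math.log10(x_l / math.hypot(r_ohms, x_l))

R, L = 1_000.0, 100e-3          # assumed example values: 1 kΩ and 100 mH
fc = rl_cutoff(R, L)
print(f"fc = {fc:.1f} Hz, level at fc = {rl_highpass_gain_db(fc, R, L):.2f} dB")
```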
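If the inductors are connected in series, then you use Equation 2.88.

$$L_{total} = L_1 + L_2 + L_3 + \cdots + L_n \qquad (2.88)$$

If the inductors are connected in parallel, then you use Equation 2.89.

$$L_{total} = \frac{1}{\frac{1}{L_1} + \frac{1}{L_2} + \frac{1}{L_3} + \cdots + \frac{1}{L_n}} \qquad (2.89)$$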
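Equations 2.88 and 2.89 as two small Python helpers (the names are hypothetical). Like the equations, they assume the inductors don’t share each other’s magnetic fields (no mutual inductance), which is worth keeping in mind when coils sit close together.

```python
def series_inductance(*inductors):
    """Equation 2.88: inductances in series simply add (values in henrys)."""
    return sum(inductors)

def parallel_inductance(*inductors):
    """Equation 2.89: the reciprocal of the sum of the reciprocals."""
    return 1.0 / sum(1.0 / l for l in inductors)

print(series_inductance(10e-3, 22e-3))     # 10 mH + 22 mH, in henrys
print(parallel_inductance(10e-3, 10e-3))   # two equal inductors in parallel
```

Two equal inductors in parallel give half the inductance of one, the same pattern as two equal resistors in parallel.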
2.8 Transformers
If we take our coil and wrap it around a bar of iron, the iron acts as a
conductor for the magnetic lines of force (not a conductor for the electricity...
our wire is insulated...), therefore the lines are concentrated within the bar.
(The second right-hand rule still applies for figuring out which way is north –
but remember that if we’re using an AC waveform, the magnetic field is changing
in strength and polarity according to the change in the current.) Better yet,
we can bend our bar around so it looks like a donut (mmmmmm donuts...)
– that way the lines of force are most concentrated all the time. If we then
wrap another coil around the bar (now donut-shaped, also known by
topologists as toroidal), then the magnetic lines of force expanding and
contracting around the bar will cut through the second coil. This will generate
an alternating current in the second coil, just because it’s sitting there in a
moving magnetic field. The relationship between these two coils is interesting...
It turns out (we’ll find out in a minute that that was just a pun...) that
the power that we send into the input coil (called the primary coil) of this
thing (called a transformer) is equal to the power that we get out of the
second coil (called the secondary coil). (This is not entirely true – if it were,
that would mean that the transformer is 100 percent efficient, which is not
the case... but we’ll pretend that it is...)
Also, the ratio of the primary voltage to the secondary voltage is equal
to the ratio of the number of turns of wire in the primary coil to the number
of turns of wire in the secondary coil. This can also be expressed as an
equation:

$$\frac{V_{primary}}{V_{secondary}} = \frac{Turns_{primary}}{Turns_{secondary}} \qquad (2.90)$$
Given these two things, we can figure out how much current is flowing
into the transformer based on how much current is demanded of the
secondary coil. Looking at the circuit in Figure 2.34:
We know that we have 120 Vrms applied to the primary coil. We therefore
know that the voltage across the secondary coil, and therefore across the
resistor, is 12 Vrms because

$$\frac{120\ V_{rms}}{12\ V_{rms}} = \frac{10\ \text{turns}}{1\ \text{turn}}$$
Figure 2.34: A 120 Vrms AC voltage source connected to the primary coil of a transformer with a
10:1 turns ratio. The secondary coil is connected to (or “loaded with”) a 15 kΩ resistor.
Note that, since the voltage went down by a factor of 10 (the turns ratio
of the transformer) as we went from input to output, the current went up
by the same factor of 10. This is the result of the power in being equal to
the power out.
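Here’s the example of Figure 2.34 worked through as a minimal sketch, assuming (as the text does) a 100 percent efficient transformer and a purely resistive load. The function name and its return values are made up for this illustration.

```python
def ideal_transformer(v_primary, turns_ratio, r_load):
    """Ideal (100 % efficient) transformer driving a resistive load.

    turns_ratio is primary turns / secondary turns (10 for a 10:1 transformer).
    Returns (secondary voltage, secondary current, primary current, power).
    """
    v_secondary = v_primary / turns_ratio    # Equation 2.90
    i_secondary = v_secondary / r_load       # Ohm's law at the load
    power = v_secondary * i_secondary        # Watt's law
    i_primary = power / v_primary            # power in = power out
    return v_secondary, i_secondary, i_primary, power

vs, i_s, i_p, p = ideal_transformer(120.0, 10.0, 15_000.0)   # values from Figure 2.34
print(f"Vs = {vs:.1f} Vrms, Is = {i_s * 1e3:.2f} mA, "
      f"Ip = {i_p * 1e3:.3f} mA, P = {p * 1e3:.1f} mW")
```

With these numbers, the 15 kΩ load draws 0.8 mA at 12 Vrms (9.6 mW), so the primary draws one tenth of that current, 0.08 mA, at 120 Vrms.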
You can have more than 1 secondary coil on a transformer. In fact, you
can have as many as you want – the power into the primary coil will still be
equal to the total power of all of the secondary coils. We can also take a tap
off the secondary coil at its half-way point. This is exactly the same as if we
had two secondary coils with exactly the same number of turns, connected in
series. In this case, the centre tap (the wire connected to the half-way
point on the coil) is always half-way in voltage between the two outside legs
of the coil. If, therefore, we use the centre tap as our reference, arbitrarily
called 0 V (or ground...), then the two ends of the coil are always an equal
voltage “away” from the ground, but in opposite directions – therefore the
two AC waves will be opposite in polarity.
Figure 2.35: Diagram of a copper atom. Notice that there is one lone electron out there in the
outer shell.
If the outer valence shell has too many electrons, they don’t like mov-
ing (too many things to pack... moving vans are too expensive, and all
their friends go to this school...) so they don’t – we call those substances
insulators.
There is a class of substances that have an in-between number of elec-
trons in their outer valence shell. These substances (like silicon, germanium
and carbon...) are neither conductors nor insulators – they lie somewhere in
between, so we call them semiconductors (not insuductors nor consulators...)
Now take a look at Figure 2.37. As you can see, when you put a bunch
of silicon atoms together, they start sharing outer electrons – that way each
atom “thinks” that it has 8 electrons in its outer shell, but each one needs
the 4 adjacent atoms to accomplish this.
Compare Figure 2.36, which shows the structure of silicon, to Figures
2.38 and 2.39, which show arsenic and gallium. The interesting thing about
arsenic is that it has 5 electrons in its outer shell (1 more than silicon’s 4).
Gallium has 3 electrons in its outer shell (1 fewer). We’ll see why this is
interesting in a second...
Recipe :
1 cup arsenic
[Diagram: Silicon (Si 14)]
Figure 2.36: Diagram of a silicon atom. Notice that there are 4 electrons in the outer shell.
Figure 2.37: Diagram of a collection of silicon atoms. Note the sharing of outer electrons.
[Diagram: Arsenic (As 33)]
Figure 2.38: Diagram of an arsenic atom. Notice that there are 5 electrons in the outer shell.
Figure 2.39: Diagram of a Gallium atom. Notice that there are 3 electrons in the outer shell.
Figure 2.40: Diagram of N-type material. Notice the extra “unattached” electron orbiting the
arsenic atom.
Figure 2.41: Diagram of P-type material. Notice the “missing” electron around the gallium
atom.
P-type material through a resistor, the electrons in the N-type material will
get attracted to the holes in the positive terminal and the electrons in the
negative terminal will move into the holes in the P-type material. Once this
happens, there are no spare electrons floating around, and no current can
pass through the system. This situation is called reverse-biasing the device
(called a diode). When the circuit is connected in this way, no current
flows through the diode.
Figure 2.43: A reverse-biased diode. Note that no current will flow through the circuit.
If we connect the battery the other way, with the negative terminal to
the N-type material and the positive terminal to the P-type material, then
a completely different situation occurs. The extra electrons in the battery
terminal push the electrons in the N-type material across the barrier, into
the P-type material where they are drawn into the positive terminal. At the
same time, of course, the holes (and therefore the current) are flowing in the
opposite direction. This situation is called forward biasing the diode, which
allows current to pass through it. There’s only one catch with this electrical
one-way gate. The diode needs some amount of voltage across it to open
up and stay open. If it’s a silicon diode (as most are...), then you’ll see a
drop of about 0.6 V or 0.7 V across it – roughly independent of the current or
voltage applied to the circuit (the remainder of the voltage drop will be across
the resistor, which also determines the current). If the diode is made of doped
germanium, then you’ll only need about 0.3 V to get things running.
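The simple “constant drop” model of a forward-biased diode in series with a resistor can be sketched as below. The 9 V supply, the 1 kΩ resistor and the exact 0.6 V and 0.3 V drops are assumed example values, and the function name is hypothetical. A reverse-biased supply (a negative voltage) also returns zero current, matching the reverse-biased case described above.

```python
def series_diode_current(v_supply, r_ohms, v_drop=0.6):
    """Current through a diode in series with a resistor, using the simple
    constant-drop model: no current below v_drop, and a fixed v_drop above it."""
    if v_supply <= v_drop:          # not enough voltage to open the diode
        return 0.0
    return (v_supply - v_drop) / r_ohms   # the resistor takes the rest of the voltage

print(series_diode_current(9.0, 1_000.0))         # silicon, 0.6 V assumed
print(series_diode_current(9.0, 1_000.0, 0.3))    # germanium, 0.3 V assumed
print(series_diode_current(0.5, 1_000.0))         # below turn-on: no current
print(series_diode_current(-9.0, 1_000.0))        # reverse-biased: no current
```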
Of course, we don’t draw all of those -’s and +’s on a schematic; we just
indicate a diode using a little triangle with a cap on it. The arrow points
in the direction of the current flow, so Figure 2.45 below is a schematic
representation of the forward-biased diode in the circuit shown in Figure
2.44.
Now, remember that the diode “costs” 0.6 V to stay open, therefore if
Figure 2.45: A forward-biased diode with current flowing through it just like in Figure 2.44. Note
that current flows in the direction of the arrow, and will not flow in the opposite direction.
Figure 2.46: A graph showing the characteristics of an ideal diode. Notice how, if the voltage is
positive (and therefore the diode is forward biased) you can have an infinite current going through
the diode. Also, it does not require any voltage to open the diode – if the positive voltage is
anything over 0V, the current will flow. If the voltage is negative (and therefore the diode is
reverse biased), no current flows through the system. Also note that the current goes infinite
because there are no resistors in the circuit. If a resistor were placed in series with the diode, it
would act as a “current limiter” resulting in a current that went to a finite amount determined by
the supply voltage and the resistance. (V=IR...)
We then learned that this didn’t really represent the real world – in fact,
the diode needs a little voltage in order to turn on – about 0.6 V or so.
Therefore, a better representation of this characteristic is shown in Figure
2.47.
In fact, this isn’t really a good representation of the real-world case ei-
ther. The problem with this graph is that it leaves out a couple of important
little details about the diode’s behaviour. Take a look at Figure 2.48 which
shows a good representation of the real-world diode.
There are a couple of things to notice here.
Figure 2.47: A graph showing the characteristics of a diode with slightly more realistic
characteristics. Notice that you need to apply a small voltage to the diode before you can get
current through it. The specific amount of voltage that you’ll need depends on the materials used
to make the diode as well as the particular characteristics of the one that you buy (in a package of
5 diodes, every one will be slightly different). If the voltage is negative (and therefore the diode is
reverse biased), no current flows through the system.
Notice that the small turn-on voltage is still there, but the
current doesn’t suddenly go from 0 amps to infinity amps when we pass
that voltage. In fact, as soon as we go past 0 V, a very small amount of
current will start to trickle through the diode. This amount is negligible
in day-to-day operation, but it does exist. As we get closer and closer to
the turn-on voltage, the amount of current trickling through gets bigger and
bigger until it goes very big very quickly. The result on the graph is that
the plot near the turn-on voltage is a curve, not a right angle as is shown in
the simplification in Figure 2.47.
Also notice that when the diode is reverse-biased (and therefore the
voltage is negative) there is also a very small amount of trickle current back
through the diode. We tend to think of a reverse-biased diode as being a
perfect switch that doesn’t permit any current back through it, but in the
real world, a little will get through.
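The curved, real-world characteristic is often modelled with the Shockley diode equation, I = Is(e^(V/(nVT)) - 1). That equation isn’t in the text; we’re swapping it in here as one common way to reproduce the small forward trickle, the rapid rise near the turn-on voltage and the tiny reverse leakage. The parameter values below (saturation current, ideality factor, and a thermal voltage of about 25 mV near room temperature) are assumed, the function name is hypothetical, and the model does not include reverse breakdown.

```python
import math

def shockley_current(v, i_s=1e-12, n=1.0, v_t=0.025):
    """Diode current from the Shockley equation I = Is * (exp(V / (n*Vt)) - 1).

    i_s (saturation current), n (ideality factor) and v_t (thermal voltage)
    are assumed example values. Reverse breakdown is NOT modelled.
    """
    return i_s * (math.exp(v / (n * v_t)) - 1.0)

for v in (-5.0, 0.1, 0.3, 0.5, 0.6, 0.7):
    print(f"{v:+.1f} V -> {shockley_current(v):.3e} A")
```

The current is tiny (a trickle) at low forward voltages, grows by orders of magnitude as the voltage approaches the 0.6 V region, and settles at a very small negative leakage when reverse-biased.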
The last thing to notice is the big swing to negative infinity amps when
the voltage gets very negative. The voltage at which this happens is called
the reverse breakdown voltage, and you do not want to reach it. This is the
point where the diode goes up in smoke and current runs through it whether
you want it to or not.
Figure 2.50: A circuit showing how a zener diode can be used to “regulate” one voltage to another.
In order for this circuit to work, the voltage supply on the left must be a higher voltage than the
rated breakdown voltage of the zener. The voltage across the resistor on the right will be the rated
breakdown voltage of the zener. The voltage across the resistor on the left will be equal to the level
of the voltage supply minus the rated breakdown voltage of the zener diode. Also, we’re assuming
that R1 is small compared to R2, so that the voltage across R2 without the zener in place would
normally be bigger than the breakdown voltage of the zener.
So, for example, if the rated breakdown voltage of the zener diode in
Figure 2.50 is 5.6 V, and the voltage supply is a 9 V battery, then the
voltage across R2 will be 5.6 V (because you can’t have a higher voltage
than that across the zener) and the voltage across R1 will be 9 V - 5.6 V
= 3.4 V. (Remember, we’re assuming that the voltage across R2 would be
bigger than 5.6 V if the zener wasn’t there...)
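The zener example above as a minimal sketch. It treats the zener as ideal (no trickle current) and assumes R1 = 100 Ω and R2 = 1 kΩ, which are not given in the text but satisfy its assumption that R2 would see more than 5.6 V without the zener. The function name is hypothetical.

```python
def zener_regulator(v_supply, r1, r2, v_zener):
    """Voltages in the circuit of Figure 2.50, with an ideal zener across R2.

    The zener only conducts if the plain R1/R2 divider would put more than
    v_zener across R2; otherwise it has no effect.
    """
    v_divider = v_supply * r2 / (r1 + r2)   # voltage across R2 with no zener
    v_r2 = min(v_divider, v_zener)          # the zener clamps it at its breakdown voltage
    v_r1 = v_supply - v_r2
    i_r1 = v_r1 / r1                        # all of the current flows through R1...
    i_r2 = v_r2 / r2
    i_zener = i_r1 - i_r2                   # ...and whatever R2 doesn't take goes through the zener
    return v_r1, v_r2, i_zener

v_r1, v_r2, i_z = zener_regulator(9.0, 100.0, 1_000.0, 5.6)   # R1 and R2 are assumed
print(f"V_R1 = {v_r1:.1f} V, V_R2 = {v_r2:.1f} V, I_zener = {i_z * 1e3:.1f} mA")
```

With these values the zener clamps R2 at 5.6 V, leaving 3.4 V across R1, and the zener carries the difference between the R1 and R2 currents (about 28.4 mA).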
Note as well that the graph in Figure 2.49 shows the characteristics of
an ideal zener diode. The real-world characteristics suffer from the same
trickle problems as normal diodes. For more info on this, a good book to
look at is the book by Madhu in the Suggested Reading List below.
Now, when the voltage output of the function generator is positive rela-
tive to ground, it is pushing current through the forward-biased diode and
we see current flowing through the resistor to ground. There’s just the small
issue of the 0.6 V drop across the diode, so until the voltage of the function
generator reaches 0.6 V, there is no current. After that, the voltage drop
across the resistor is 0.6 V less than the function generator’s voltage level,
until the generator’s voltage falls back below 0.6 V on the way down...
When the voltage of the function generator is on the negative half of the
wave, the diode is reverse-biased and no current flows, therefore there is no
voltage drop across the resistor.
This circuit is called a half-wave rectifier because it takes a wave that is
alternating between positive and negative voltages and turns it into a wave
that has only positive voltages – but it throws away half of the wave...
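The half-wave rectifier’s behaviour, using the same simple 0.6 V constant-drop diode model described above, can be sketched in a couple of lines. The 10 V peak input, the eight samples per cycle and the function name are assumed for illustration.

```python
import math

def half_wave(v_in, v_drop=0.6):
    """Voltage across the resistor of a half-wave rectifier (constant-drop diode model)."""
    return max(0.0, v_in - v_drop)

peak = 10.0   # assumed 10 V peak sine wave
samples = [peak * math.sin(2 * math.pi * k / 8) for k in range(8)]
print([round(v, 2) for v in samples])
print([round(half_wave(v), 2) for v in samples])
```

The positive half of the wave comes through 0.6 V lower, and the negative half is thrown away entirely.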
If we connect 4 diodes as shown in the diagram below, we can use our
AC signal more efficiently.
Now, when the output at the top of the function generator is positive,
the current is pushed through to the diodes and sees two ways to go – one
diode (the green one) will allow current through, while the other (red) one,
which is reverse biased, will not. The current flows through the green diode
to a junction where it chooses between a resistor and another reverse-biased
diode (the blue one) ... so it goes through the resistor (note the direction of
the current) and on to another junction between two diodes. Again, one of
these diodes is reverse-biased (red) so it goes through the other one (yellow)
back to the ground of the function generator.
Figure 2.52: Comparison of voltage of function generator in blue and voltage across the resistor in
red.
Figure 2.54: The portion of the circuit that has current flow for the positive portion of the waveform.
Note that the current is flowing downwards through the resistor, therefore VR will be positive.
Figure 2.55: The portion of the circuit that has current flow for the negative portion of the waveform.
Note that the current is flowing downwards through the resistor, therefore VR will still be positive.
The important thing to notice after all that tracing of signal is that
the voltage drop across the resistor was positive whether the output of the
function generator was positive or negative. Therefore, we are using this
circuit to fold the negative half of the original AC waveform up into the
positive side of the fence. This circuit is therefore called a full-wave rectifier
(actually, this particular arrangement of diodes has a specific name – a bridge
rectifier). Remember that at any given time, the current is flowing through
two diodes and the resistor, therefore the voltage drop across the resistor
will be 1.2 V less than the magnitude of the input voltage (0.6 V per diode –
we’re assuming silicon...).
Figure 2.56: The input voltage and the output voltage of the bridge rectifier. Note that the output
voltage is 0 V when the absolute value of VIN is less than 1.2 V, and is 1.2 V below the absolute
value of VIN when the absolute value of VIN is greater than 1.2 V.
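For the bridge rectifier, the constant-drop idea still applies, except that the output follows the absolute value of the input, minus two diode drops (0.6 V each for silicon, as assumed in the text). A minimal sketch, with a hypothetical function name and a few assumed input voltages:

```python
def bridge_rectifier(v_in, v_drop_per_diode=0.6):
    """Voltage across the load of a bridge rectifier: two diodes conduct at a time."""
    return max(0.0, abs(v_in) - 2.0 * v_drop_per_diode)

for v in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"V_IN = {v:+5.1f} V -> V_R = {bridge_rectifier(v):.1f} V")
```

Positive and negative inputs of the same size give the same output, which is the “folding” described above, and anything under 1.2 V in magnitude gives nothing.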
Now, we have this weird bumpy wave – what do we do with it? Easy... if
we run it through a type of low-pass filter to get rid of the spiky bits at the
bottom of the waveform, we can turn this thing into something smoother.
We won’t use a “normal” low-pass filter from Section 2.5, however... we’ll
just put a capacitor in parallel with the resistor. What will this do? Well,
when the voltage across the capacitor is less than the output of the
bridge rectifier, current will flow into the capacitor to charge it up to
the same voltage as the output of the rectifier. This charging current will be
quite high, but that’s okay for now... trust me... When the voltage of the
bridge rectifier drops down, the capacitor can’t discharge back into it,
because the diodes are now reverse-biased, so the capacitor discharges through
the resistor according to their time constant (remember?). Hopefully, before
it has time to discharge very far, the voltage of the bridge rectifier comes back
up and charges up the capacitor again, and the whole cycle repeats itself.
The end result is that the voltage across the resistor is now a slightly
weird AC with a DC offset, as is shown in Figure 2.57.
The size of the AC part of this wave (the ripple) is given as a peak-peak
measurement, expressed as a percentage of the DC content of the wave. The
smaller the percentage, the smoother, and therefore better, the waveform.
If we know the values of the capacitor and the resistor, we can calculate the ripple using Equation 2.91.
Figure 2.57: “DC” output of filtered bridge rectifier output showing ripple caused by the capacitor
slowly discharging between cycles.
$$\text{Ripple}_{peak-peak} = \frac{100\%}{4\sqrt{3}\, f R C} \qquad (2.91)$$
where f is the frequency of the original waveform in Hz, R is the value
of the resistor in Ω and C is the value of the capacitor in farads.
All we need to do, therefore, to make the ripple smaller, is to make the
capacitor bigger (the resistor is really not a resistor in a real power supply;
it’s actually something like a lightbulb or a portable CD player...)
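Here’s the ripple formula in Python, to show how the ripple shrinks as the capacitor grows. The 60 Hz mains frequency and the 1 kΩ load are assumed example values, and the function name is hypothetical.

```python
import math

def ripple_percent(f_hz, r_ohms, c_farads):
    """Ripple as a percentage of the DC level: 100 % / (4 * sqrt(3) * f * R * C)."""
    return 100.0 / (4.0 * math.sqrt(3.0) * f_hz * r_ohms * c_farads)

f, R = 60.0, 1_000.0            # assumed: 60 Hz mains, 1 kΩ load
for C in (100e-6, 1_000e-6, 10_000e-6):
    print(f"C = {C * 1e6:>7.0f} uF -> ripple = {ripple_percent(f, R, C):.3f} %")
```

Each tenfold increase in capacitance cuts the ripple percentage by a factor of ten.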
Generally, in a real power supply, we’d add one more thing called a
voltage regulator, as is shown in the power supply schematics at the end of this
section. This is a magic little device which, when fed a voltage above what you
want, will give you what you want, burning off the excess as heat. They come in
two flavours, negative and positive. The positive ones are designated 78XX, where
XX is the voltage (for example, a 7812 is a +12 V regulator); the negative ones are
designated 79XX (ditto... a 7918 is a -18 V regulator). These chips have 3 pins:
one input, one ground and one output. You feed too much voltage into the input
(e.g. 8.5 V into a 7805) and the chip looks at its ground, gives you exactly the
right voltage at the output and gets toasty... If you use these things (you
will), you’ll have to bolt them to a little radiator (a heat sink) or to the chassis
of whatever you’re building so that the heat will dissipate.
A couple of things about regulators: if you reverse-bias them (i.e. try
to send voltage into the output), you’ll break them... probably gonna see a bit
of smoke too... Also, they get cranky if you demand too much current from
their output. Be nice. (This is why you won’t see regulators in a power
supply for a power amp, which needs lots-o-current...)
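The snippet below is a rough sketch of the 78XX/79XX naming scheme and of the "burning off the excess as heat" idea. It estimates a linear regulator's dissipation as (Vin − Vout) × Iload and deliberately ignores the chip's own small ground-pin current. The function names and the 0.5 A load are illustrative assumptions, not values from any datasheet.

```python
def regulator_output_voltage(part: str) -> int:
    """Decode a bare 78XX/79XX part number into its nominal output voltage.

    78XX parts are positive regulators and 79XX parts are negative
    (e.g. '7812' -> +12, '7918' -> -18).
    """
    if len(part) != 4 or not part.isdigit() or part[:2] not in ("78", "79"):
        raise ValueError(f"not a bare 78XX/79XX part number: {part!r}")
    volts = int(part[2:])
    return volts if part.startswith("78") else -volts


def regulator_heat_watts(v_in: float, v_out: float, i_load: float) -> float:
    """Approximate power burned off as heat: |Vin - Vout| * Iload.

    The regulator's own quiescent current is ignored, so this slightly
    underestimates the real dissipation.
    """
    return abs(v_in - v_out) * i_load


# 8.5 V into a 7805 delivering an assumed 0.5 A load
v_out = regulator_output_voltage("7805")
print(v_out, regulator_heat_watts(8.5, v_out, 0.5))  # prints: 5 1.75
```

Almost 2 W from a small three-pin package explains why the text insists on bolting the regulator to something that can carry the heat away.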
So, now you know how to build a real, live AC-to-DC power supply just
like the pros. Just use an appropriate transformer instead of a function
generator, plug the thing into the wall (fuses are your friend...) and throw
away your batteries. The schematic below is a typical power supply of any
device built before switching power supplies were invented... (we’re not
going to even try to figure out how THEY work...)
[Schematic: fuse and switch, transformer, bridge rectifier, smoothing capacitor and 78XX voltage regulator, with +V and 0 V outputs]
Below is another variation of the same power supply, but this one uses
the centre-tap as the ground, so we get symmetrical negative and positive
DC voltages output from the regulators.
[Schematic: fuse and switch, centre-tapped transformer, bridge rectifier, smoothing capacitors, and 78XX and 79XX voltage regulators, with +V, 0 V (centre tap) and -V outputs]
2.11 Transistors
NOT YET WRITTEN
1. infinite gain
2. infinite input impedance
3. zero output impedance
At first glance, these don’t appear to be very useful; however, the device,
called an operational amplifier or op amp, is used in almost every audio
component built today. It has two inputs (one labeled “positive” or “non-
inverting” and the other “negative” or “inverting”) and one output. The op
amp measures the difference between the voltages applied to the two input
“legs” (the positive minus the negative), multiplies this difference by a gain
of infinity, and generates this voltage level at its output. Of course, this
would mean that, if there were any difference at all between the two input
legs, then the output would swing to either infinity volts (if the level of the
non-inverting input were greater than that of the inverting input) or negative
infinity volts (if the reverse were true). Since we obviously can’t produce a
level of either infinity or negative infinity volts, the op amp tries to do it,
but hits a maximum value determined by the power supply rails that feed
it. This could be either a battery or an AC to DC power supply such as the
ones we looked at earlier in this chapter.
We’re not going to delve into how an op amp works or why – for the
purposes of this course, our time is far better spent simply diving in and
looking at how it’s used. The simplest way to start looking at audio circuits
which employ op amps is to consider a couple of possible configurations,
each of which is, in fact, a small circuit in and of itself which can be
combined like Legos to create a larger circuit.
2.13.2 Comparators
The first configuration we’ll look at is a circuit called a comparator. You
won’t find this configuration in many audio circuits (blinking light circuits
excepted) but it’s a good way to start thinking about these devices.
Looking at the above schematic, you’ll see that the inverting input of the
op amp is connected directly to ground; therefore, it remains at a constant
0 V reference level. The audio signal is fed to the non-inverting input.
The result of this circuit can have three possible states.
1. If the audio signal is exactly 0 V, then the op amp will subtract 0 V
from 0 V, arriving at no difference at all, so the output will sit at 0 V.
2. If the audio signal is greater than 0 V, then the op amp will subtract
0 V from a positive number, arriving at a positive value, multiply that
result by infinity and have an output of positive infinity (actually, as
high as the op amp can go, which will really be the voltage of the
positive power supply rail).
3. If the audio signal is less than 0 V, then the op amp will subtract 0
V from a negative number, arriving at a negative value, multiply that
result by infinity and have an output of negative infinity (actually,
as low as the op amp can go, which will really be the voltage of the
negative power supply rail).
Figure 2.61: The output voltage vs. input voltage of the comparator circuit in Figure 1.
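Here is a minimal Python sketch of the three comparator states. It assumes ±15 V supply rails, a hypothetical value since real rails vary. The "infinite" gain is modelled as a rule: any positive difference sends the output to the positive rail and any negative difference sends it to the negative rail. The function names are illustrative only.

```python
def open_loop_output(v_plus, v_minus, v_rail_pos=15.0, v_rail_neg=-15.0):
    """Idealized op amp with 'infinite' gain: the output slams to a supply
    rail whenever the inputs differ, and sits at 0 V only if they are equal."""
    difference = v_plus - v_minus
    if difference > 0:
        return v_rail_pos
    if difference < 0:
        return v_rail_neg
    return 0.0

def comparator(v_audio, **rails):
    # Inverting input tied to ground; audio on the non-inverting input
    return open_loop_output(v_audio, 0.0, **rails)

# Sweep a few input levels to trace out the transfer curve of Figure 2.61
for v in (-0.5, -0.001, 0.0, 0.001, 0.5):
    print(v, comparator(v))
```

Even a 1 mV input is enough to drive the output all the way to a rail. That is why this configuration is useless for audio without feedback.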
advantage? Well, the problem is that the infinite gain has to be tamed –
and luckily this can be done with the help of just a few resistors.
that this means one thing... the dreaded monster known as feedback (which
explains the “f” in “Rf” – it’s sometimes known as the feedback resistor).
Well, it turns out that, in this particular case, feedback is your friend – this
is because it is a special brand of feedback known as negative feedback.
There are a number of ways to conceptualize what’s happening in this
circuit. Let’s apply a +1 V DC signal to the input R1, which we’ll assume
to be 1 kΩ. What will happen?
Let’s assume for a moment that the voltage at the inverting input of
the op amp is 0 V. Using Ohm’s Law, we know that there is 1 mA of
current flowing through R1. Since the input impedance of the op amp is
infinite, there will be no current flowing into the amplifier – therefore all
of the current must flow through Rf (which we’ll also make 1 kΩ) as well.
Again using Ohm’s Law, we know that there’s 1 mA flowing through a 1 kΩ
resistor, therefore there is a 1 V drop across it. This then means that the
voltage at the output of Rf (and therefore the op amp) is -1 V. Of course,
this magic wouldn’t happen without the op amp doing something...
Another way of looking at this is to say that the op amp “sees” 1 V
coming in its inverting input – therefore it swings its output to negative
infinity. That negative infinity volt output, however, comes back into the
inverting input through Rf which causes the output to swing to positive
infinity which comes back into the inverting input through Rf which causes
the output to swing to negative infinity and so on and so on... All of this
swinging back and forth between positive and negative infinity looks after
itself, causing the 1 V input to make it through, inverted to -1 V.
One little thing that’s useful to know – remember that assumption that
the level of the inverting input stays at 0 V? It’s actually a good assumption.
In fact, if the voltage level of the inverting input was anything other than 0
V, the output would swing to one of the voltage rails. We can consider the
inverting input to be at a virtual ground – “ground” because it stays at 0
V but “virtual” because it isn’t actually connected to ground.
What happens when we change the value of the two resistors? We change
the gain of the circuit. In the above example, both R1 and Rf were 1
kΩ and this resulted in a gain of -1. In order to achieve different gains, we
follow Equation 2.92:
$$\text{Gain} = -\frac{R_f}{R_1} \qquad (2.92)$$
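The Ohm's Law walk-through above can be written as a few lines of Python. The code treats the inverting input as a virtual ground and sends all of the R1 current through Rf. It then confirms that this reasoning gives the same answer as Equation 2.92. This is a sketch of the ideal op amp only, and the function names are hypothetical.

```python
def inverting_amp_output(v_in, r1, rf):
    """Ideal inverting amplifier, worked the same way as the text: the
    inverting input is a virtual ground (0 V), so the current through R1
    is v_in / r1, and none of it enters the op amp, so all of it flows
    through Rf."""
    i = v_in / r1            # Ohm's Law across R1 (one end at virtual ground)
    v_out = 0.0 - i * rf     # output sits one Rf voltage drop below 0 V
    return v_out

def inverting_gain(r1, rf):
    return -rf / r1          # Equation 2.92

print(inverting_amp_output(1.0, 1000, 1000))   # the +1 V example: -1 V out
print(inverting_gain(1000, 10000))             # a 10 kOhm Rf gives a gain of -10
```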
must be the same. If the value of the feedback resistor is 0Ω, in other words,
a piece of wire, then the output will equal the input voltage, therefore the
gain of the circuit will be 1. If the value of the feedback resistor is greater
than 0Ω, then the gain of the circuit will be greater than 1. Therefore the
minimum gain of this circuit is 1 – so we cannot attenuate the signal as we
can with the inverting amplifier configuration.
Following the above schematic, the equation for determining the gain of
the circuit is
$$\text{Gain} = 1 + \frac{R_f}{R_1} \qquad (2.93)$$
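A quick Python check of Equation 2.93 shows why the non-inverting stage can never attenuate. With Rf at 0 Ω (a piece of wire) the gain is exactly 1, and any larger Rf only pushes it higher. The resistor values are assumed examples.

```python
def non_inverting_gain(r1, rf):
    """Equation 2.93: Gain = 1 + Rf/R1. Never less than 1 for Rf >= 0."""
    return 1.0 + rf / r1

# Assumed R1 of 1 kOhm with a few feedback resistor values
for rf in (0, 1000, 9000):
    print(rf, non_inverting_gain(1000, rf))
```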
2.13.6 Leftovers
There is one characteristic of op amps, out of the three that we mentioned up
front, that we haven’t talked about since: the output impedance,
which was stated to be 0 Ω. Why is this important? The answer to this lies
in two places. The first is the simple voltage divider, the second is a block
diagram of an op amp. If you look at the diagram below, you’ll see that
the op amp contains what can be considered as a function generator which
outputs through an internal resistor (the output impedance) to the world.
If we add an external load to the output of the op amp then we create a
voltage divider. If the internal impedance of the op amp is anything other
than 0Ω, then the output voltage of the amplifier will drop whenever a load
is applied to it. For example, if the output impedance of the op amp was
100Ω, and we attached a 100Ω resistor to its output, then the voltage level
of the output would be cut in half.
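The voltage-divider argument can be sketched in a couple of lines of Python. The op amp's internal output impedance and the external load form the divider. The 100 Ω figures reproduce the text's example, and the function name is hypothetical.

```python
def loaded_output(v_internal, r_out, r_load):
    """Voltage divider formed by the op amp's output impedance and an external load."""
    return v_internal * r_load / (r_out + r_load)

print(loaded_output(1.0, 100, 100))   # 0.5: the level is cut in half
print(loaded_output(1.0, 0, 100))     # 1.0: an ideal 0-ohm output is unaffected
```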
$$V_{out} = -\left(V_1 \frac{R_f}{R_1} + V_2 \frac{R_f}{R_2} + \cdots + V_n \frac{R_f}{R_n}\right) \qquad (2.94)$$
As you can see, each input voltage is inverted in polarity (the negative
sign at the beginning of the right side of the equation looks after that) and
individually multiplied by its gain determined by the relationship between
its input resistor and the feedback resistor. This, of course, is a very sim-
ple mixing circuit (a Euphonix console it isn’t...) but it will work quite
effectively with a minimum of parts.
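Equation 2.94 amounts to a weighted, inverted sum, so it translates directly into Python. Each input is described by an assumed `(voltage, input_resistor)` pair. The channel values below are made up for illustration.

```python
def summing_amp_output(inputs, rf):
    """Equation 2.94: each input voltage is scaled by Rf/Rn, then the sum is inverted."""
    return -sum(v * rf / r for v, r in inputs)

# Hypothetical three-channel mix with a 10 kOhm feedback resistor:
# channel 1 at unity gain, channel 2 at a gain of 2, channel 3 at a gain of 0.5
channels = [(0.5, 10000), (0.2, 5000), (-0.1, 20000)]
print(summing_amp_output(channels, 10000))   # approximately -0.85
```

Changing one input resistor changes only that channel's gain, which is what makes this work as a (very) simple mixer.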
the output impedance of an op amp will vary from less than 100 Ω to about
10 kΩ. An op amp intended for audio purposes will usually have an output
impedance at the lower end of this scale, typically about 50 Ω to 100 Ω or so.
This measurement is taken without using a feedback loop on the op amp,
and with small signal levels above a few hundred Hz.
we’d like, anyway... The maximum rate at which the op amp is able to
change to a different voltage is called the “slew rate” because it’s the rate
at which the amplifier can slew to a different value. It’s usually expressed in
V/microsecond – the bigger the number, the faster the op amp. The faster
the op amp, the better it is able to accurately reflect transient changes in
the audio signal.
The slew rate of different op amps varies widely. Typically, you’ll want
to see about 5 V/µs or more.
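To connect slew rate with audio signals, this sketch uses a standard fact about sine waves. The steepest slope of Vp·sin(2πft) is 2πf·Vp, so that is the minimum slew rate needed to follow it. The 10 V peak at 20 kHz is an assumed worst case, not a figure from the text, and the function names are hypothetical.

```python
import math

def required_slew_rate_v_per_us(peak_volts, freq_hz):
    """Steepest slope of Vp*sin(2*pi*f*t) is 2*pi*f*Vp in V/s; convert to V/us."""
    return 2 * math.pi * freq_hz * peak_volts / 1e6

def max_full_power_frequency(slew_v_per_us, peak_volts):
    """Highest sine frequency that a given slew rate can follow at this peak level."""
    return slew_v_per_us * 1e6 / (2 * math.pi * peak_volts)

print(round(required_slew_rate_v_per_us(10, 20000), 3))   # 1.257 (V/us)
print(round(max_full_power_frequency(5, 10)))             # 79577 (Hz)
```

Under these assumptions, a 5 V/µs part has roughly a fourfold margin over what a full-level 20 kHz sine demands.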
Equations:
$$G_F = 1 + \frac{R_F}{R_1} \qquad (2.95)$$
$$f_c = \frac{1}{2\pi R C} \qquad (2.96)$$
Where G_F is the passband gain of the filter and f_c is the cutoff frequency of the filter.
$$\frac{V_{out}}{V_{in}} = \frac{G_F}{\sqrt{1 + (f/f_c)^2}} \qquad (2.97)$$
$$\phi = -\tan^{-1}\left(\frac{f}{f_c}\right) \qquad (2.98)$$
Where φ is the phase angle.
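To see Equations 2.95 to 2.98 working together, here is a short Python sketch that evaluates the first-order low-pass filter one decade below, at, and one decade above its cutoff. The 10 kΩ / 10 nF values and RF = R1 (so that G_F = 2) are assumptions for illustration.

```python
import math

def cutoff_frequency(r, c):
    return 1 / (2 * math.pi * r * c)                 # Equation 2.96

def first_order_lowpass(f, fc, gain_passband=1.0):
    """Magnitude (Equation 2.97) and phase in degrees (Equation 2.98)."""
    ratio = f / fc
    magnitude = gain_passband / math.sqrt(1 + ratio ** 2)
    phase_deg = -math.degrees(math.atan(ratio))
    return magnitude, phase_deg

fc = cutoff_frequency(10000, 10e-9)   # assumed 10 kOhm, 10 nF: about 1591.5 Hz
gf = 1 + 10000 / 10000                # Equation 2.95 with assumed RF = R1 = 10 kOhm
for f in (fc / 10, fc, fc * 10):
    mag, ph = first_order_lowpass(f, fc, gf)
    print(f"{f:8.1f} Hz  gain {mag:.3f}  phase {ph:6.1f} deg")
```

At f = f_c the gain is G_F/√2 (3 dB down) and the phase is −45°.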
Equations:
$$G_F = 1 + \frac{R_F}{R_1} \qquad (2.99)$$
$$f_c = \frac{1}{2\pi R C} \qquad (2.100)$$
Where G_F is the passband gain of the filter and f_c is the cutoff frequency of the filter.
$$\frac{V_{out}}{V_{in}} = \frac{G_F\,(f/f_c)}{\sqrt{1 + (f/f_c)^2}} \qquad (2.101)$$
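The high-pass counterpart, Equation 2.101, can be evaluated the same way. This sketch uses unity passband gain and the same assumed 10 kΩ / 10 nF pair, and prints the response in dB.

```python
import math

def first_order_highpass_gain(f, fc, gain_passband=1.0):
    """Equation 2.101: |Vout/Vin| = GF * (f/fc) / sqrt(1 + (f/fc)**2)."""
    ratio = f / fc
    return gain_passband * ratio / math.sqrt(1 + ratio ** 2)

fc = 1 / (2 * math.pi * 10000 * 10e-9)    # Equation 2.100 with assumed R and C
for f in (fc / 10, fc, fc * 10):
    gain = first_order_highpass_gain(f, fc)
    print(f"{f:8.1f} Hz  {20 * math.log10(gain):6.2f} dB")
```

The output should show roughly −20 dB a decade below cutoff, −3.01 dB at cutoff and almost 0 dB a decade above.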
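Equations:
$$G_F = 1 + \frac{R_F}{R_1} \qquad (2.102)$$
$$f_c = \frac{1}{2\pi\sqrt{R_2 R_3 C_2 C_3}} \qquad (2.103)$$
Where G_F is the passband gain of the filter and f_c is the cutoff frequency of the filter.
$$\frac{V_{out}}{V_{in}} = \frac{G_F}{\sqrt{1 + (f/f_c)^4}} \qquad (2.104)$$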
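For the second-order low-pass, the fourth power in Equation 2.104 is what makes the slope steeper. This sketch evaluates it with unity passband gain and equal, assumed component values (10 kΩ and 10 nF). The response comes out 3 dB down at f_c and about 40 dB down a decade above it, which is twice the first-order slope.

```python
import math

def second_order_cutoff(r2, r3, c2, c3):
    return 1 / (2 * math.pi * math.sqrt(r2 * r3 * c2 * c3))   # Equation 2.103

def second_order_lowpass_gain(f, fc, gain_passband=1.0):
    return gain_passband / math.sqrt(1 + (f / fc) ** 4)        # Equation 2.104

fc = second_order_cutoff(10000, 10000, 10e-9, 10e-9)   # assumed equal parts
for f in (fc, 2 * fc, 10 * fc):
    print(f"{f:8.1f} Hz  {20 * math.log10(second_order_lowpass_gain(f, fc)):7.2f} dB")
```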
Equations:
$$G_F = 1 + \frac{R_F}{R_1} \qquad (2.105)$$
Note: G_F must equal 1.586 in order to have a true Butterworth response.
$$f_c = \frac{1}{2\pi\sqrt{R_2 R_3 C_2 C_3}} \qquad (2.106)$$
Where G_F is the passband gain of the filter and f_c is the cutoff frequency of the filter.
$$\frac{V_{out}}{V_{in}} = \frac{G_F}{\sqrt{1 + (f_c/f)^4}} \qquad (2.107)$$
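Here is a sketch of the second-order high-pass magnitude. It assumes Equation 2.107 takes the usual high-pass form G_F/√(1 + (f_c/f)⁴). The default G_F of 1.586 is the Butterworth value from the note above, and the 1 kHz cutoff is an arbitrary example.

```python
import math

def second_order_highpass_gain(f, fc, gain_passband=1.586):
    """Second-order high-pass magnitude: GF / sqrt(1 + (fc/f)**4).
    The default GF is the value the text requires for a Butterworth response."""
    return gain_passband / math.sqrt(1 + (fc / f) ** 4)

fc = 1000.0                          # assumed cutoff, set via Equation 2.106
for f in (fc / 10, fc / 2, fc, 10 * fc):
    g = second_order_highpass_gain(f, fc)
    print(f"{f:7.1f} Hz  {20 * math.log10(g / 1.586):7.2f} dB re passband")
```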
Acoustics
3.1 Introduction
3.1.1 Pressure
If you listen to the radio in the mornings, they’ll give you the news, the
sports, the traffic and the weather. Part of the weather report is to tell
you that the barometric pressure is something around 100 kPa (kilopas-
cals). What does this mean? Well, the air particles around you are all
under pressure due to things like gravity and the weight of the air particles
above them and other meteorological things that are outside the scope of
this book. That pressure determines the amount of physical space between
molecules in the air. When there’s a higher barometric pressure, there’s less
space between the molecules than there is on a day with a lower barometric
pressure.
We call this the stasis pressure and abbreviate it ℘₀.
When all of the particles in a gaseous medium (like air) in a given volume
(like a room) are at normal pressure, then the gas is said to be at its volume
density (also known as the constant equilibrium density), abbreviated ρ₀,
and measured in kg/m³. Remember that this is actually kilograms of air
per cubic metre – if you were able to trap a cubic metre and weigh it, you’d
find out that it is about XXX kg.
These molecules like to stay at the same pressure all over, so if you bunch
them up in one place in a room somehow, they’ll move around to try and
equalize the difference. This is kind of like when you pour a glass of water
into a bucket, the water level of the entire bucket equalizes and therefore
rises, rather than the water from the glass all bunching up in a little mound
of water where you poured it in...
Let’s think of this as a practical example. We’ll hang the piece of paper
in front of a fan. If we turn on the fan, we’re essentially increasing the
pressure of the air particles in front of the blades. The fan does this by
removing air particles from the space behind it, thus reducing the pressure
of the particles behind the blades, and putting them in front. Since the
pressure in front of the fan is greater than any other place in the room, we
have a situation where there is a greater air pressure on one side of the piece
of paper than the other. The obvious result is that the paper moves away
from the fan.
This is a large-scale example of how you hear sound. Let’s say, hypothetically,
that you are sitting alone in a sealed room on a
day when the barometric pressure is 100 kPa. Let’s also say that you have a
clarinet with you and that you play a concert A. What physically happens
to convert air coming out of your mouth into a concert A arriving at your
ears?
To begin with, let’s pretend that a clarinet is just a tube with a hole in
each end. One of the holes has a springy piece of wood next to it which, if
you press on it, will close up the hole.
1. When you blow into the hole, you bunch up the air particles and create
a little area of high pressure inside the mouthpiece.
2. Blowing into the hole with the reed on it also has the effect of pushing
the reed against the hole and sealing it so that no more air can enter
the clarinet.
3. At that point the little high-pressure area moves down the clarinet and
leaves a low-pressure area behind it.
4. Remember that the reed is springy, and it doesn’t like being pushed
up against the hole in the mouthpiece, so it bounces back and lets
more air in.
5. Now the cycle repeats and goes back to step 1 all over again.
6. In the meantime, all of those high and low pressure areas move down
the clarinet and radiate out the bell into the room like ripples on a
lake when you throw in a rock.
7. From there, they get to your ear and push your eardrum in and out
(high pressure pushes in, low pressure pulls out).
Those little fluctuations in the air pressure are small variations in the
stasis pressure. They’re usually very small, never more than about ±1 Pa
(though we’ll elaborate on that later...). At any given moment at a specific
location, we can measure the instantaneous pressure, ℘, which will be
close to the stasis pressure, but slightly different because there’s a sound
source causing it to change.
Once we know the stasis pressure and the instantaneous pressure, we can
use these to figure out the instantaneous amplitude of the sound level, (also
called the acoustic pressure or the excess pressure) abbreviated p, using
Equation 3.1.
$$p = \wp - \wp_0 \qquad (3.1)$$
To see an animation of what this looks like, check out www.gmi.edu/ drus-
sell/Demos/waves/wavemotion.html.
A sinusoidal oscillation of this pressure reaches a maximum peak pressure
P which determines the sound pressure level or SPL. In air, this level is
typically expressed in decibels as a logarithmic ratio of the effective pressure
Pe referenced to the threshold of hearing, the commonly-accepted lowest
sound pressure level audible by humans at 1 kHz (20 µPa), using
Equation 3.2 [Woram, 1989]. The intricacies of this equation have already
been discussed in Section 2.2 on decibels.
$$\text{SPL} = 20 \log_{10}\left(\frac{P_e}{20 \times 10^{-6}\ \text{Pa}}\right) \qquad (3.2)$$
Note that, for sinusoidal waveforms, the effective pressure can be cal-
culated from the peak pressure using Equation 3.3. (If this doesn’t sound
familiar, it should – re-read Section 2.1.6 on RMS.)
$$P_e = \frac{P}{\sqrt{2}} \qquad (3.3)$$
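Equations 3.2 and 3.3 chain together naturally. This Python sketch converts a sinusoid's peak pressure to effective pressure and then to dB SPL. It only holds for sine waves, because of the √2 in Equation 3.3. The 1 Pa example is assumed.

```python
import math

P_REF = 20e-6   # threshold of hearing: 20 micropascals

def spl_from_peak(peak_pa):
    """Sine waves only: peak -> effective pressure (Equation 3.3) -> dB SPL (Equation 3.2)."""
    effective = peak_pa / math.sqrt(2)
    return 20 * math.log10(effective / P_REF)

print(round(spl_from_peak(1.0), 2))   # 90.97 dB SPL for a 1 Pa peak sine
```

Doubling the peak pressure adds 20·log₁₀(2) ≈ 6.02 dB, whatever the starting level.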
Pull down on the weight a little bit and let go. The Slinky will pull the
weight up to the stasis point and pass it.
By the time the whole thing slows down, the weight will be too high and
will want to come back down to the stasis point, which it will do, stopping
at the point where we let it go in the first place (or almost anyway...)
If we attached a pen to the weight and ran a piece of paper past it
as it sat there bobbing up and down, the line it drew would be a sinusoidal
waveform. The picture the weight would draw is a graph of the vertical
position of the weight (the y-axis) as it relates to time (the x-axis).
If the graph is a perfect sinusoidal shape, then we call the system (the
Slinky and the weight on the end) a simple harmonic oscillator.
3.1.3 Damping
Let’s look at the system I just described: a weight hung on a
spring, as is shown in Figure 3.1.
[Figure 3.1: a mass hanging on a spring]
If there were no such thing as air friction, and if the spring were perfect,
then, if you started the mass bobbing up and down, it would continue
doing that forever. And since, as we saw in the previous section, this
is a simple harmonic oscillator, if we graph its vertical displacement
over time, we get a perfect sinusoidal waveform as shown in Figure 3.2.
[Plot: vertical displacement vs. time]
Figure 3.2: The vertical displacement of the mass versus time if there is no loss of energy due to
friction. Notice that the frequency and amplitude of the oscillation never change. The mass will
bob up and down exactly the same, forever.
In real life, however, there is friction. The mass pushes through the air
and loses energy on each bob up and down. Eventually, it loses so much
energy that it stops moving. An example of this behaviour is shown in
Figure 3.3.
There is a technical term that describes the difference between these
two situations. The system with friction, shown in Figure 3.3, is called a
damped oscillator. Because the oscillator is damped, it loses energy over
time. The higher the damping, the faster it loses energy. For example, if the
same mass and spring were put in water, the system would be more highly
damped than if it were in air. If they’re put in oil, the system is more highly
damped than it is in water.
Since a system with friction is said to be damped, the system without
friction is called an undamped oscillator.
3.1.4 Harmonics
If we go back to the clarinet example, it’s pretty obvious that the pressure
wave that comes out the bell won’t be a sine wave. This is because the clar-
inet reed is doing more than simply opening and closing – it’s also wiggling
and flapping a bit – on top of all that, the body of the clarinet is resonating
various frequencies as well (more on this topic later), so what comes out is
a bunch of different frequencies simultaneously.
[Plot: vertical displacement vs. time]
Figure 3.3: The vertical displacement of the mass versus time if there is loss of energy due to
friction. Notice that the fundamental frequency of the oscillation never changes, but that the
amplitude decays over time. Eventually, the mass will bob up and down so little that it can be
considered to be stopped.
3.1.5 Overtones
Some people use the terms “harmonics” and “overtones” interchangeably, but you have
to be careful here. There is a common misconception that overtones are har-
monics and vice versa. In fact, in some books, you’ll see people saying that
the first overtone is the second harmonic, the second overtone is the third
harmonic and so on. This is not necessarily the case. A sound’s overtones
are the harmonics that it contains, which is not necessarily all harmonics.
As we’ll see later, not all instruments’ sounds contain all harmonics of the
fundamental. There are particular cases, for example, where an instrument’s
sound will only contain the odd harmonics of the fundamental. In this par-
ticular case, the first overtone is the third harmonic, the second overtone is
the fifth harmonic and so on.
In other words, harmonics are a mathematical idea – frequencies that
are related to a fundamental frequency whereas overtones are the frequencies
1. Transverse
2. Longitudinal
3. Torsional
Figure 3.4: A snapshot of a transverse wave on a string. Think of the wave as moving from left to
right – but remember that the string is really only moving up and down.
Longitudinal waves are a little tougher to see. They involve the compression
(bunching together) and rarefaction (pulling apart) of the particles
in the medium such that the motion of the particles is parallel with the
direction of propagation of the wave. The easiest way to see a longitudinal
wave is to stretch out a Slinky between two people, squeeze together a small
section of it and let go. The compressed part will appear to move back and
forth bouncing between the two ends of the spring. This is essentially the
way sound travels through air particles.
Torsional waves don’t apply to anything we’re doing in this book, but
they’re waves in which the particles rotate around the axis along which the
wave propagates (like a twisting rod). This type of wave can be seen on a
Shive wave machine at physics demonstrations and science and technology
museums.
[Plot: displacement, velocity and acceleration vs. time]
Figure 3.5: The relationship between the displacement (in blue), the velocity (in red) and the
acceleration (in black) of a particle or a pendulum. Note that none of these is on any particular
scale – the important things to notice are the relationships between the zero, maximum and
minimum points on the three curves as well as their relative instantaneous slopes.
One other important thing to note here is that the velocity is also related
to frequency (which is discussed below). If we maintain the same peak
displacement, the higher the frequency, the faster the particles have to move back
and forth, and therefore the higher the peak velocity. So, remember that particle
velocity is proportional both to displacement and to
frequency.
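For sinusoidal motion the link between velocity, displacement and frequency follows from the derivative. If x(t) = X·sin(2πft), then the velocity is 2πfX·cos(2πft), so the peak velocity is 2πfX. This Python sketch holds an assumed 1 µm peak displacement fixed and changes only the frequency.

```python
import math

def peak_velocity(peak_displacement_m, freq_hz):
    """For x(t) = X*sin(2*pi*f*t) the velocity peaks at 2*pi*f*X (m/s)."""
    return 2 * math.pi * freq_hz * peak_displacement_m

x_peak = 1e-6                              # assumed 1 micrometre peak displacement
for f in (100.0, 1000.0):
    print(f, peak_velocity(x_peak, f))     # ten times the frequency, ten times the velocity
```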
3.1.8 Amplitude
The amplitude of a wave is simply a measurement of the height of the
wave if it’s transverse, or the amount of compression and rarefaction if it’s
Figure 3.6: A sinusoidal pressure wave with a peak amplitude of 1, a peak-peak amplitude of 2 and
an effective pressure of 0.707.
second. This therefore means that there are 440 cycles of high and
low pressure coming out of the bell of the clarinet each second.
We normally use the term Hertz (indicated Hz) to indicate the number
of cycles per second in sound waves. Therefore 440 cycles per second is more
commonly known as a frequency of 440 Hz.
In order to find the frequency of a note one octave above this pitch,
multiply by 2 (1 octave = twice the frequency). One octave below is one-
half of the frequency.
In order to find the frequency of a note one decade above this pitch,
multiply by 10 (1 decade = ten times the frequency). One decade below is
one-tenth of the frequency.
Always remember that a complete cycle consists of a high and a low
pressure. One cycle is measured from a point on the wave to the next
identical point on the wave (e.g. the positive-going zero crossing to the next
positive-going zero crossing, or maximum to maximum...)
If we know the frequency of a sound wave (i.e. 440 Hz), then we can
calculate how long it takes a single cycle to exit the bell of the clarinet.
If there are 440 cycles each second, then it takes 1/440th of a second to
produce 1 cycle.
The usual equation for calculating this amount of time (known as the
period ) is:
$$T = \frac{1}{f} \qquad (3.4)$$
where T is the period in seconds and f is the frequency in Hz.
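The period, octave and decade arithmetic above fits in a few lines of Python. The helper names are hypothetical. An octave is a factor of 2 and a decade is a factor of 10, and negative counts move downward.

```python
def period_seconds(freq_hz):
    return 1.0 / freq_hz            # Equation 3.4

def shift_octaves(freq_hz, octaves):
    return freq_hz * 2 ** octaves   # each octave doubles (or halves) the frequency

def shift_decades(freq_hz, decades):
    return freq_hz * 10 ** decades  # each decade multiplies (or divides) by ten

print(period_seconds(440) * 1000)                          # about 2.27 ms per cycle of concert A
print(shift_octaves(440.0, 1), shift_octaves(440.0, -1))   # an octave up and an octave down
print(shift_decades(440.0, 1), shift_decades(440.0, -1))   # a decade up and a decade down
```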
the wave as a rotating wheel, then this means that the wheel makes a full
revolution the same number of times per second.
We also know that one full revolution of the wheel is 360° or 2π radians.
Consequently, if we multiply the frequency of the sound wave by 2π, we
get the number of radians the wheel turns each second. This value is called
the angular frequency or the radian frequency and is abbreviated ω.
$$\omega = 2\pi f \qquad (3.5)$$
The angular frequency can also be used to determine the phase of the
signal at any given moment in time. Let’s say for a moment that we have
a sine wave with a frequency of 1 Hz, therefore ω = 2π. If it’s really a
sine wave (meaning that it started out heading positive with a value of 0 at
time 0 or t = 0), then we know that the time in seconds, multiplied by the
angular frequency will give us the phase of the sine wave because we rotate
2π radians every second.
This is true for any frequency, so if we know the time t in seconds, then
we can find the instantaneous phase using Equation 3.6.
$$\phi = \omega t \qquad (3.6)$$
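Here is a minimal Python sketch of Equations 3.5 and 3.6 for a wave that starts at phase 0 at t = 0. Wrapping the result into the range 0 to 2π is a presentational choice of this sketch, not something the text requires.

```python
import math

def angular_frequency(freq_hz):
    return 2 * math.pi * freq_hz        # Equation 3.5, radians per second

def instantaneous_phase(freq_hz, t_seconds):
    """Equation 3.6, wrapped into [0, 2*pi) so that whole turns are discarded."""
    return (angular_frequency(freq_hz) * t_seconds) % (2 * math.pi)

phi = instantaneous_phase(1.0, 0.25)      # a 1 Hz sine, a quarter of a second in
print(math.degrees(phi), math.sin(phi))   # a quarter turn: 90 degrees, sine at its peak
```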
t is the temperature in °C
There is a small deviation of c with frequency, shown in Table 3.1, though
it is small enough that it is generally ignored.
Frequency   Deviation
100 Hz      -30 ppm
200 Hz      -10 ppm
400 Hz      -3 ppm
1.25 kHz    0 ppm
4 kHz       +5 ppm
10 kHz      +10 ppm
Table 3.1: Deviation in the speed of sound with frequency.
Humidity    Deviation
0%          0 ppm
20%         +415 ppm
40%         +1136 ppm
60%         +1860 ppm
80%         +2590 ppm
100%        +3320 ppm
Table 3.2: Deviation in the speed of sound with air humidity levels.
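A part per million (ppm) is like a percentage, except that it is out of 1000000 instead of out of
100, so it’s useful for really small numbers. Therefore, 1000 ppm is 1000/1000000 =
0.001 = 0.1%.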
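To see what the table entries mean in m/s, this sketch applies a ppm deviation to a nominal speed of sound. The nominal 344 m/s is borrowed from the wavelength example that follows, and the helper names are hypothetical.

```python
def ppm_to_fraction(ppm):
    return ppm / 1_000_000

def speed_with_deviation(c_nominal, deviation_ppm):
    """Apply a ppm deviation (e.g. from Table 3.1 or 3.2) to a nominal speed of sound."""
    return c_nominal * (1 + ppm_to_fraction(deviation_ppm))

print(ppm_to_fraction(1000) * 100)         # 1000 ppm expressed as a percentage
print(speed_with_deviation(344.0, 3320))   # 100% humidity row: roughly 345.14 m/s
```

Even the largest entry in either table shifts the speed by only about 1 m/s.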
3.1.13 Wavelength
Let’s say that you’re standing outside, whistling a perfect 1 kHz sine tone.
The moment you start whistling, the first wave – the wavefront – is moving
away from you at a speed of 344 m/s. This means that exactly one second
after you started whistling, the wavefront is 344 m away from you. At exactly
that same moment, you are starting to whistle your 1001st cycle (because
you’re whistling 1000 cycles per second). If we could stop time and look at
the sound wave in the air at that moment, we would see the 1000 cycles that
you just whistled sitting in the air taking up 344 m. Therefore you have
1000 cycles for every 344 m. Since we know this, we can calculate the length
of one wave by dividing the speed of sound by the frequency – in this case,
344/1000 = 34.4 cm per wave in the air. This is known as the wavelength.
The wavelength (abbreviated λ) is the distance from a point on a periodic
(a fancy word meaning ‘repeating’) waveform to the next identical point.
(e.g. crest to crest, or positive-going zero crossing to positive-going zero crossing)
Equation 3.8 is used to calculate the wavelength, measured in metres.
$$\lambda = \frac{c}{f} \qquad (3.8)$$
phase change of the waveform per metre. This value is called the acoustic
wavenumber of the sound wave and is abbreviated k0 or sometimes, just k.
It’s measured in radians per metre and is calculated using Equation 3.9.
$$k_0 = \frac{\omega}{c} \qquad (3.9)$$
Note that you will see this under a couple of different names – wave
number, wavenumber and acoustic wavenumber will show up in different
places to mean the same thing. The problem is that there are a couple
of different definitions of the term “wavenumber” so you’re best to use the
proper term “acoustic wavenumber.”
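This sketch runs the whistling example through Equations 3.8 and 3.9 using the text's 344 m/s. It also checks that k₀·λ equals 2π, meaning one wavelength is one full cycle of phase.

```python
import math

C_AIR = 344.0   # m/s, the speed of sound used in the whistling example

def wavelength(freq_hz, c=C_AIR):
    return c / freq_hz                        # Equation 3.8, metres

def acoustic_wavenumber(freq_hz, c=C_AIR):
    return 2 * math.pi * freq_hz / c          # Equation 3.9 with omega = 2*pi*f, rad/m

lam = wavelength(1000)                        # 0.344 m = 34.4 cm
k = acoustic_wavenumber(1000)                 # about 18.27 rad/m
print(lam, k, k * lam)                        # k * lambda is about 2*pi: one full cycle
```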
Constructive Interference
If you’re equidistant from the two speakers, then you’ll be receiving the same
part of the pressure wave at the same time. So, if you’re getting the high
point in the wave from one speaker, you’re getting a high pressure from the
second speaker as well.
Likewise, if you’re getting a low pressure from one speaker, you’re also
receiving a low pressure from the other.
The end result of this overlap is that you get twice the pressure difference
between the high and low points in your wave. This is because the two waves
are interfering with each other constructively. This happens because the two
have a phase relationship of 0◦ at your position.
Essentially, all we’re doing is adding the values of the top two graphs at each
instant in time to get the bottom graph.
Figure 3.7: The top two plots are the individual signals from two loudspeakers in time measured at
a position equidistant to both loudspeakers. The bottom plot is the resulting summed signal.
Destructive Interference
Figure 3.8: The top two plots are the individual signals from two loudspeakers in time measured
at a position where one loudspeaker is half a wavelength farther away than the other. The bottom
plot is the resulting summed signal.
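Here is a minimal Python sketch of both interference cases. It adds two equal-amplitude sine waves, delays one by the extra travel time, and reports the peak of the sum over one sampled period. The equal amplitudes (no extra distance loss) and the 344 m/s speed are simplifying assumptions, and `summed_peak` is a hypothetical name.

```python
import math

def summed_peak(path_difference_m, freq_hz, c=344.0, samples=1000):
    """Sum two equal-amplitude sines, one delayed by the extra travel time,
    and return the peak of the result over one sampled period."""
    delay = path_difference_m / c
    period = 1 / freq_hz
    peak = 0.0
    for n in range(samples):
        t = n * period / samples
        total = (math.sin(2 * math.pi * freq_hz * t)
                 + math.sin(2 * math.pi * freq_hz * (t - delay)))
        peak = max(peak, abs(total))
    return peak

lam = 344.0 / 1000                               # wavelength at 1 kHz
print(round(summed_peak(0.0, 1000), 3))          # 2.0: equidistant, constructive
print(round(summed_peak(lam / 2, 1000), 3))      # 0.0: half a wavelength farther, destructive
```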
Figure 3.9: The top two plots are sinusoidal waves with slightly different frequencies, f1 and f2 .
The bottom plot is the sum of the top two. Notice that the modulation in the amplitude of the
result is periodic with a “beat frequency” of f2 − f1.
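The beat frequency in Figure 3.9 comes from the sum-to-product identity sin a + sin b = 2·cos((a−b)/2)·sin((a+b)/2). This sketch checks that identity numerically for two assumed tones, 440 Hz and 444 Hz. The slow cosine term is the envelope, and its magnitude peaks |f2 − f1| times per second.

```python
import math

def beat_frequency(f1, f2):
    return abs(f2 - f1)

f1, f2 = 440.0, 444.0          # assumed example frequencies
worst = 0.0
for n in range(2000):
    t = n / 2000                # one second, sampled 2000 times
    direct = math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)
    envelope = 2 * math.cos(math.pi * (f1 - f2) * t)       # slow amplitude modulation
    product = envelope * math.sin(math.pi * (f1 + f2) * t)  # carrier at the mean frequency
    worst = max(worst, abs(direct - product))
print(beat_frequency(f1, f2), worst < 1e-9)   # 4.0 True
```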
[Plot: top panel, pressure vs. time; bottom panel, amplitude vs. frequency (Hz) on a logarithmic axis]
Figure 3.10: Two graphs showing exactly the same information. An infinitely short amplitude spike
in the time domain is equivalent to all frequencies being present at that moment in time.
have all frequencies with random relative amplitude and phase, the result is
noise in its various incarnations.
There is an official document defining four types of noise. The spec-
ifications for white, pink, blue and black noise are all found in The Fed-
eral Standard 1037C Telecommunications: Glossary of Telecommunication
Terms. (I got the definitions from Rane’s online dictionary of audio terms
at http://www.rane.com.)
White Noise
White noise is defined as a noise that has an equal amount of energy per hertz.
This means that if you could measure the amount of energy between
100 Hz and 200 Hz it would equal the amount of energy between 1000 Hz
and 1100 Hz. Because all frequencies have equal level, we call the noise
white – just like light that contains all frequencies (colours) equally is white
light.
This sounds “bright” to us because we hear pitch in octaves. 1 octave is
a doubling of frequency, therefore 100 Hz – 200 Hz is an octave, but 1000
Hz – 2000 Hz (not 1000 Hz – 1100 Hz) is also an octave. Since white noise
contains equal energy per Hz, there’s ten times as much energy in the 1 kHz
octave than in the 100 Hz octave.
Pink Noise
Pink noise is noise that has an equal amount of energy per octave. This
means that there is less energy per Hz as you go up in frequency. In fact,
the power per Hz drops by 50% (or 3.01 dB) each time you go up an
octave.
This is used because it sounds relatively “equal” in distribution across
frequency bands to us.
Another way of defining this noise is that the power of each frequency f
is proportional to 1/f.
Blue Noise
Blue noise is noise that is the opposite of pink noise in that its power per Hz
doubles each time you go up 1 octave. You’ll virtually never see it
(or hear it for that matter...).
Another way of defining this noise is that the power of each frequency f
is proportional to the frequency.
Purple Noise
Purple Noise is to blue noise as red noise is to pink. It increases in power
by 6.02 dB for every increase in frequency of 1 octave. (In other words, the
power is proportional to f².)
Black Noise
This is an odd case. It is essentially silence with the occasional randomly-
spaced spike.
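The noise colours differ only in how power per Hz scales with frequency, as f raised to some exponent. This sketch, with hypothetical helper names, works out two figures for each colour. The first is the per-Hz change per octave, matching the text's −3.01 dB (pink), +3.01 dB (blue) and +6.02 dB (purple). The second is the total power in each successive octave band, which is where pink noise comes out flat. Black noise is left out because it is defined by random spikes, not by a spectral slope.

```python
import math

def band_power(f_low, f_high, exponent):
    """Integrate a power density proportional to f**exponent from f_low to f_high.
    exponent: 0 = white, -1 = pink, +1 = blue, +2 = purple (arbitrary scale)."""
    if exponent == -1:
        return math.log(f_high / f_low)
    n = exponent + 1
    return (f_high ** n - f_low ** n) / n

def octave_ratio_db(exponent, f_start=100.0):
    """Change in total power, in dB, from one octave band to the next one up."""
    lower = band_power(f_start, 2 * f_start, exponent)
    upper = band_power(2 * f_start, 4 * f_start, exponent)
    return 10 * math.log10(upper / lower)

for name, exponent in (("white", 0), ("pink", -1), ("blue", 1), ("purple", 2)):
    density_db = 10 * math.log10(2.0 ** exponent)   # per-Hz change per octave
    print(f"{name:6s} per-Hz {density_db:+.2f} dB/oct, "
          f"octave band {octave_ratio_db(exponent):+.2f} dB/oct")

# White noise: ten times the energy in the 1 kHz octave as in the 100 Hz octave
print(band_power(1000, 2000, 0) / band_power(100, 200, 0))
```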
like the colour of your left shoe... We know from high school that the cir-
cumference is equal to the radius multiplied by about 6.28 (also known as
2π). The graph in Figure 3.11 shows the relationship between the radius
and the circumference. You can see that the latter grows much more quickly
than the former. What this means is that as the radius slowly expands out
from the point of impact, the energy is getting shared along a “length”
of the wave that is growing far faster (note that, if we double the radius, we
double the circumference).
[Plot: circumference (0 to 700) against radius (0 to 100)]
Figure 3.11: The relationship between the circumference of a circle and its radius.
The same holds true with pressure waves expanding from a loudspeaker
into a room. The only real difference is that the energy is expanding into 3
dimensions rather than 2, so the surface area of the spherical wavefront (the
3-D version of the circumference of the circular wave on the lake...) increases
much more rapidly than its 2-dimensional counterpart. The equation used
to find the surface area of a sphere is $4\pi R^2$, where R is the radius. As you can
see in Figure 3.12, the surface area of the sphere is already over 1200 square
units ($4\pi \times 10^2 \approx 1257$) when the radius has only expanded to 10 units. The result of this
in real life is that the energy appears to be dissipating at a rate of 6.02
dB per doubling of distance. (When we double the radius, we increase the
surface area of the sphere fourfold, and $10 \log_{10}(4) \approx 6.02$ dB.) Of course, all of this assumes that the
wavefront doesn’t hit anything like a wall or the floor or you...
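Here’s a quick Python check of those numbers, assuming a point source radiating equally in all directions with nothing in the way (the function names are illustrative). It computes the sphere’s surface area at R = 10 and the level drop each time the radius doubles.

```python
import math

def sphere_area(radius):
    """Surface area of a spherical wavefront: 4 * pi * R**2."""
    return 4 * math.pi * radius ** 2

def spreading_loss_db(r1, r2):
    """Level drop (dB) when the same total power spreads from radius r1 to r2.

    Intensity is power divided by area, so the intensity ratio is the inverse
    of the area ratio.
    """
    return 10 * math.log10(sphere_area(r2) / sphere_area(r1))

print(f"area at R = 10: {sphere_area(10):.1f} square units")
for r in (1, 2, 4, 8):
    print(f"{r} -> {2 * r}: drop of {spreading_loss_db(r, 2 * r):.2f} dB")
```

It prints an area of 1256.6 square units for R = 10 and a drop of 6.02 dB for every doubling, regardless of the starting distance.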
[Plot: surface area (0 to 14 × 10⁴) against radius (0 to 100)]
Figure 3.12: The relationship between the surface area of a sphere and its radius.
to move away from you forever, without ever encountering any surface. No
reflections or diffraction at all – forever. This space is a theoretical idea
known as a free field because the wavefront is free to expand.
If you put a microphone in this free field, the wavefront from a single
sound source would come from a single direction. This seems obvious, but
I only mention it to compare with the next section.
For a visual analogy of what we’re talking about, imagine that you’re
floating in space and the only thing you can see is a single star. There are
at least three things that you’d notice about this odd situation. Firstly, the
star doesn’t appear to be very bright, because most of its energy is going in a
different direction than towards you. Secondly, you’d notice that everything
but the star is very, very dark. Finally, you’d notice that shadows are very
distinct and also very, very dark.
if you wait long enough, you’ll get a wavefront from every possible direction
at some time.
If we consider this in terms of probability, then we can say that, in this
theoretical space, sound waves have an equal probability of coming from any
direction at any given moment. This is essentially the definition of a diffuse
field.
For a visual example of this, look out the window of a plane as you’re fly-
ing through a cloud on a really sunny day. The light from the sun bounces off
of all the little particles in the cloud, so, from your perspective, it essentially
comes from everywhere. This causes a couple of weird sensations. Firstly,
there are no shadows – this is because the light is coming from everywhere
so nothing can shadow anything else. Secondly, you have a very difficult
time determining distance. Unless you can see the wing of the plane, you
have no idea how far away you’re actually able to see. This is the same reason
why people have car accidents in blinding snowstorms. They drive as though
they can see much further ahead than they’re really able to.
to two different pendulums, one light one and one heavy one, then we’ll get
two different maximum displacements and velocities as a result. Essentially,
the heavier pendulum is harder to move, so we don’t move it as far.
The issue that we’re now discussing is how much the pendulum impedes
your attempts to move it. The same is true of molecules moved by a sound
wave. Air molecules are like a light pendulum – they’re relatively easy
to move. On the other hand, if we were to put a loudspeaker in poured
concrete and play a tune, it would be much harder for the speaker to move
the concrete molecules – therefore they wouldn’t move as far with the same
pressure applied by the loudspeaker. There would still be a sound wave
going through the concrete (just as the heavy pendulum would move – just
not very much) but it wouldn’t be very loud.
The measurement of how much velocity results from a given amount of
pressure is an indication of how hard it is to move the molecules – in other
words, how much the molecules impede the transfer of energy. The higher
the impedance, the lower the velocity for a given amount of pressure. This
can be seen in Equation 3.10 which is true only for the free field situation.
$z = \frac{p}{u}$ (3.10)
where z is the acoustic impedance in acoustic ohms (abbreviated Ω), p is the
pressure and u is the particle velocity.
As you can see in this equation, z is proportional to p and inversely pro-
portional to u. This means that if the impedance goes up and the pressure
stays the same, then the velocity will go down.
In the specific case of unbounded plane waves (waves with a flat wavefront
– not curved like the ones we’ve been discussing so far), this ratio is
also equal to the product of the volume density of the medium, $\rho_0$, and the
speed of wave propagation c, as is shown in Equation 3.11 [Olson, 1957].
This value $z_0$ is known as the specific acoustic impedance or characteristic
impedance of the medium.
$z_0 = \rho_0 c$ (3.11)
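To put some numbers to Equations 3.10 and 3.11, here is a minimal Python sketch for air. The density and speed of sound are assumed typical values for air at roughly 20 °C, not figures from this text, and the 1 Pa pressure is an arbitrary example.

```python
# Assumed values for air at roughly 20 degrees C (typical figures,
# not taken from this document):
RHO_0_AIR = 1.204   # density of air, kg/m^3
C_AIR = 343.0       # speed of sound in air, m/s

def characteristic_impedance(rho_0, c):
    """Equation 3.11: z0 = rho_0 * c (SI units: Pa*s/m)."""
    return rho_0 * c

def particle_velocity(p, z):
    """Equation 3.10 rearranged: u = p / z (plane wave, free field)."""
    return p / z

z0_air = characteristic_impedance(RHO_0_AIR, C_AIR)
u = particle_velocity(1.0, z0_air)   # for a 1 Pa pressure amplitude
print(f"z0 of air = {z0_air:.1f} Pa*s/m; u for 1 Pa = {u * 1000:.2f} mm/s")
```

Doubling z with the same pressure halves u, which is the “heavier pendulum” effect described earlier.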
to the situation where you have a sound wave going through one medium
(say, air for example...) into another medium (like concrete). As we’ve al-
ready seen, the difference in the acoustic impedances of the two media will
determine how much of the sound wave gets reflected back into the air and
how much will be transmitted into the concrete. What we haven’t looked at
yet is whether or not there will be a phase change in the reflected and the
transmitted pressure waves. It’s important here to point out that I’m not
talking about a polarity change – I’m talking about a phase change.
Let’s just consider the transmitted pressure wave for a while. If there
is no change in the phase of the wave as a result of it hitting the second
medium, then we can say that the second medium has only an acoustic re-
sistance. If the acoustic impedance has no reactance component (remember
it only has two components) then there will be no phase shift between the
incident pressure wave and the transmitted pressure wave.
On the other hand, let’s say that there is suddenly a 90° delay in the
transmitted wave when compared to the incident wave (yes, this can happen
– particularly in cases where the second medium is flexible and bends when
the sound wave hits it). In this particular case, the second medium has
only an acoustic reactance.
So, an acoustic resistance causes a 0° phase shift and an acoustic
reactance causes a 90° phase shift. By now, this should be reminding you of
Section 1.5.13. Remember from that section that a sinusoidal wave with
any magnitude and phase shift can be expressed using a real (0°) and an
imaginary (90°) component? The magnitude of the waveform depends on
the values of the real and the imaginary components, and the phase shift is
determined by the relative values of the two components.
This same system exists for impedance. In fact, you will often see books
saying that impedance is a complex value containing both a real and an
imaginary component. In other words, there will be a phase shift between
the incident wave and the transmitted and reflected waves. This phase shift
can be calculated if you know the relative balance of the real component (the
acoustic resistance) and the imaginary component (the acoustic reactance)
of the acoustic impedance.
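Since impedance behaves like a complex number, Python’s built-in complex type can illustrate the idea. This is a sketch only: it treats the resistance R as the real part and the reactance X as the imaginary part, and reports the magnitude and phase angle of z = R + jX using `cmath.phase`. It doesn’t model the waves themselves; how the angle of z shows up in the transmitted and reflected waves depends on the boundary equations later in this section.

```python
import cmath
import math

def impedance_phase_deg(resistance, reactance):
    """Phase angle, in degrees, of the complex impedance z = R + jX."""
    return math.degrees(cmath.phase(complex(resistance, reactance)))

# (R, X) pairs: pure resistance, pure reactance, and two mixtures.
for r, x in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (3.0, 4.0)]:
    z = complex(r, x)
    print(f"R = {r}, X = {x}: |z| = {abs(z):.2f}, "
          f"phase = {impedance_phase_deg(r, x):.1f} degrees")
```

A pure resistance gives 0°, a pure reactance 90°, and equal amounts of each land halfway, at 45°.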
If the concepts of acoustic impedance, resistance and reactance are a
little confusing, don’t worry for now. Go back and read Sections 2.1 and
2.4. If it still doesn’t make sense after that, then you can worry. To quote
Telly Monster from Sesame Street, “Don’t worry, be happy. Or worry, and
then be happy.”
3.1.22 Power
So far, we’ve looked at a number of different ways to measure the level of a
sound. We’ve seen the pressure, the particle displacement and velocity and
some associated measurements like the SPL. These are all good ways to get
an idea of how loud a sound is at a specific point in space, but they are all
limited to that one point. All of these measurements tell you how loud a
sound is at the point of the receiver, but they don’t tell you much about how
loud the sound source itself is. If this doesn’t sound logical, think about the
light radiated from a light bulb – if you measure that light from a distance,
you can only tell how bright the light is where you measure it; you can’t tell
the wattage of the bulb (a measure of how powerful it is).
We’ve already seen that the particle velocity is proportional to the pres-
sure applied to the particles by the sound source. The higher the pressure,
the greater the velocity. However, we’ve also seen that the greater the
acoustic impedance, the lower the particle velocity for the same amount of
pressure. This means that, if we have a medium with a higher acoustic
impedance, we’ll have to apply more pressure to get the same particle ve-
locity as we would with a lower impedance. Think about the pendulums
again – if we have a heavy and a light one and we want them to have the
same velocity, we’ll have to push harder on the heavy one to get it to move
as fast as the light one. In other words, we’ll have to do more work to get
the heavy one to move as fast.
Scientists typically don’t like work – otherwise they would have gotten
a job in construction instead... As a result, they even use a different word
when they talk about how much work can be done: power. Strictly speaking,
power is the rate at which work is done, so the more power you have in a
device, the more work it can do in a given time. This can be seen from day to
day in the way light bulbs are rated.
The amount of work that they do (how much light and heat they give off)
is expressed in how much power they use when they’re turned on. This
electrical power rating (expressed in watts) is discussed in Section
2.1.4.
In the case of acoustics, the amount of work that is done by the sound
source is proportional to the pressure and the particle velocity – the more
pressure and/or the more velocity, the more work you had to do to achieve
it. Therefore the acoustic power measured at a specific point in space can
be calculated using Equation ??. Remember that the change in pressure is
a result of the work that is done – you put power into the system and you
get a change in pressure as an output.
3.1.23 Intensity
In theory, we can think of the sound power at a single point in space as we
did in the previous section. In reality, we cannot measure this, because we
don’t have any microphones to measure a point that small. Microphones for
measuring acoustic fields are pretty small, with diameters on the order of
millimeters, but they’re not infinitely small. As a result, if we oversimplify a
little bit for now, the microphone is giving us an output which is essentially
the sum of all of the pressures applied to its diaphragm. If the face of the
diaphragm is perpendicular to the direction of travel of the wavefront, then
we can say that the microphone is giving us an indication of the intensity
of the sound wave. Huh?
Well, the intensity of a sound wave is the measure of all of the sound
power distributed over a given area normal (perpendicular) to the direction
of propagation. For example, let’s think about a sound wave as a sphere
expanding outwards from the sound source. When the sphere is at the sound
source, it has the same amount of power as was radiated by the source, all
packed into a small surface area. If we ignore any absorption in the air,
as the sphere expands (because the wavefront moves away from the sound
source in all directions) the same power is contained in the bigger surface
area. Although the sphere gets bigger over time, the total power contained
in it never changes.
If we did the same thought experiment, but only considered an angular
slice of the sphere, say 45° by 45°, then the same rule would hold true. As
the sphere expands, the amount of power contained in the 45° by 45° slice
would remain the same, even though its total surface area would increase.
Now, let’s think of it a different way. Instead of thinking of the whole
sphere, or an angular slice of it, let’s think about a fixed surface area such
as 1 cm² on the sphere. As the wavefront moves away from the sound
source and the sphere expands, the fixed surface area becomes a smaller
and smaller component of the total surface area of the sphere. Since the
total power distributed over the sphere doesn’t change, then the amount of
power contained in our little 1 cm² gets less and less, proportional to the
ratio of the area to the total surface area of the sphere.
If the surface area that we’re talking about is part of the sphere expanding
from the sound source (in other words, if it’s perpendicular to the
direction of propagation of the wavefront) then the power passing through
that area, divided by the size of the area, is what is called the sound intensity.
This is why sound appears to get quieter as we move further away from
the sound source. Since your eardrum doesn’t change in surface area, as
you get further from a sound source, the sound reaching it has less intensity –
there is less total sound power on the surface of your eardrum because your
eardrum is smaller compared to the size of the sphere radiating from the source.
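A minimal Python sketch of the same idea, assuming an idealised point source that radiates the same power in all directions, with no absorption or reflections. The 1 W source power and the 1 cm² patch (standing in for something like your eardrum) are hypothetical numbers chosen only for illustration.

```python
import math

def intensity(source_power_w, distance_m):
    """Intensity (W/m^2) of an idealised point source radiating equally in
    all directions, ignoring absorption and reflections: W / (4 * pi * r**2)."""
    return source_power_w / (4 * math.pi * distance_m ** 2)

def power_through_patch(source_power_w, distance_m, patch_area_m2):
    """Power (W) crossing a small patch held perpendicular to the wavefront."""
    return intensity(source_power_w, distance_m) * patch_area_m2

PATCH = 1e-4      # 1 cm^2 expressed in m^2
SOURCE = 1.0      # hypothetical 1 W acoustic source (illustrative only)

for r in (1, 2, 4):
    print(f"r = {r} m: I = {intensity(SOURCE, r):.4f} W/m^2, "
          f"power through 1 cm^2 = {power_through_patch(SOURCE, r, PATCH):.2e} W")
```

Each doubling of distance quarters both the intensity and the power landing on the fixed patch, while the total power on the whole sphere stays at 1 W.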
$p = P e^{i(\omega t + kx)}$ (3.12)
[Diagram: incident ($p_i$), reflected ($p_r$) and transmitted ($p_t$) pressure waves at a boundary]
Figure 3.13: The relationship between the incident, transmitted and reflected pressure waves,
assuming that all rays are perpendicular to the boundary.
$p_t = p_i + p_r$ (3.14)
where $p_i$ is the incident pressure in the first medium, $p_r$ is the reflected
pressure in the first medium and $p_t$ is the pressure transmitted into the
second medium, all measured at the boundary of the two media.
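This equation should look a little weird at first – intuition says that the
energy in the incident wave should be equal to the sum of the energies in the
reflected and transmitted waves, and you’d be right if you thought this. Notice,
however, that Equation 3.14 uses small p’s instead of capitals, and it isn’t
a statement about energy at all. It says that the pressure must be the same on
both sides of the boundary at every instant, so the total pressure in the first
medium ($p_i + p_r$) has to equal the pressure in the second medium ($p_t$).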
Similar to Equation 3.14, the difference between the incident and reflected
particle velocities equals the transmitted particle velocity, as is shown
in Equation 3.15 [Kinsler and Frey, 1982].
$u_t = u_i - u_r$ (3.15)
As a result, we can combine Equations 3.10, 3.14 and 3.15 to produce
Equation 3.16 [Kinsler and Frey, 1982].
$z_t = \frac{p_i + p_r}{u_i - u_r} = \frac{p_t}{u_t}$ (3.16)
where $z_t$ is the acoustic impedance of the second medium.
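$R = \frac{P_r}{P_i}$ (3.17)
$T = \frac{P_t}{P_i}$ (3.18)
where R and T are the pressure reflection and transmission coefficients, and
$P_i$, $P_r$ and $P_t$ are the pressure amplitudes of the incident, reflected and
transmitted waves.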
What use are these? Well, let’s say that you have a sound wave hitting
a wall with a reflection coefficient of R = 1. This means that $P_r = P_i$,
which is a mathematical way of saying that all of the sound will bounce back
off the wall. Because of Equation 3.14, the pressure at the boundary is then
$P_t = P_i + P_r = 2P_i$ (so T = 2). However, since $u = \frac{p}{z}$ in the air for
both the incident and the reflected waves, $u_r = u_i$, and Equation 3.15 tells
us that $u_t = 0$. The wall gets pushed, but it doesn’t move, so none of the
sound energy is transmitted into it and you don’t get angry neighbours. On
the other hand, if R = 0.5, then $P_r = \frac{P_i}{2}$, which in turn means that
$P_t = 1.5P_i$ (T = 1.5) and $u_t = \frac{u_i}{2}$. Now the wall is moving a little,
and you might be sending some sound next door... although we would have
to do a little more math to really decide whether that was indeed the case.
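Combining Equations 3.14 and 3.15 with $u = \frac{p}{z}$ for each wave (the incident and reflected waves travel in the first medium, impedance $z_1$; the transmitted wave in the second, $z_2$) gives the standard normal-incidence results $R = \frac{z_2 - z_1}{z_2 + z_1}$ and $T = 1 + R = \frac{2 z_2}{z_2 + z_1}$. The Python sketch below assumes plane waves hitting the boundary perpendicularly and real (purely resistive) impedances. The concrete impedance is an assumed, illustrative round number, not a measured value, and the function names are hypothetical.

```python
def pressure_coefficients(z1, z2):
    """Normal-incidence pressure reflection and transmission coefficients.

    Derived from p_t = p_i + p_r (3.14) and u_t = u_i - u_r (3.15), with
    u = p / z in each medium. Assumes real (purely resistive) impedances.
    """
    r = (z2 - z1) / (z2 + z1)
    t = 2 * z2 / (z2 + z1)          # equals 1 + r
    return r, t

def transmitted_energy_fraction(z1, z2):
    """Fraction of incident intensity carried into the second medium: 1 - R**2."""
    r, _ = pressure_coefficients(z1, z2)
    return 1 - r ** 2

Z_AIR = 413.0        # approx. rho_0 * c for air at room temperature
Z_CONCRETE = 8.0e6   # assumed, illustrative value for a dense solid

for label, z1, z2 in [("air -> concrete", Z_AIR, Z_CONCRETE),
                      ("concrete -> air", Z_CONCRETE, Z_AIR),
                      ("air -> air", Z_AIR, Z_AIR)]:
    r, t = pressure_coefficients(z1, z2)
    print(f"{label}: R = {r:+.4f}, T = {t:.4f}, "
          f"energy transmitted = {transmitted_energy_fraction(z1, z2):.6f}")
```

For air into concrete, R is very close to +1 and T very close to 2 (the pressure doubles at the wall), yet only a tiny fraction of the energy gets in. Going the other way, R is close to −1, so the reflection has the opposite polarity.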
Note that the pressure reflection coefficient can be either a positive number
or a negative number. If R is a positive number then the pressure of the
reflection will have the same polarity as the incident wave; however, if R
is negative, then the pressures of the incident and reflected waves will have
opposite polarities.
has absorbed some of the sound. The amount of absorption depends on the
characteristics of the material, but a good rule of thumb is that when the
material is made of a lot of changes in density, then you’re going to get more
absorption than in a material with a constant density. For example, air has
pretty much the same density all over a room, therefore you don’t get much
sound energy absorbed by air. Fibreglass insulation, on the other hand, is
made up of millions of bits of glass and pockets of air, resulting in a large
number of big changes in acoustic impedance through the material. The
result is that the insulation converts most of the sound energy into heat. A
good illustration of this is a rumour that I once heard about an experiment
that was done in Sweden some years ago. Apparently, someone tried to
measure the sound pressure level of a jet engine in a large anechoic chamber,
which is a room that is covered in absorbent material so that there are no
reflections off any of the walls. Of course, since the walls are absorbent,
this means that they convert sound into heat. The sound of the jet engine
was so loud that it caused the absorptive panels to melt! Remember that
this was not because the engine made heat, but because it made noise.
Usually, a material’s absorption coefficient, α, is found by measuring the
amount of energy reflected off it.
$\alpha = \frac{\text{absorbed energy}}{\text{incident energy}}$ (3.19)
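Equation 3.19 only needs two numbers. If, as described above, we measure the reflected energy, a simple approach is to treat everything that doesn’t come back as “absorbed”, so absorbed energy = incident energy − reflected energy. That lumping (which includes any sound transmitted through the material) is an assumption of this Python sketch, and the function names are hypothetical.

```python
import math

def absorption_coefficient(incident_energy, reflected_energy):
    """Equation 3.19, assuming everything that is not reflected counts as absorbed.

    This lumps in any energy transmitted through the material, since neither
    comes back into the room.
    """
    if incident_energy <= 0:
        raise ValueError("incident energy must be positive")
    return (incident_energy - reflected_energy) / incident_energy

def reflection_loss_db(alpha):
    """How many dB weaker the reflection is than the incident sound."""
    if alpha >= 1:
        return math.inf          # nothing comes back at all
    return -10 * math.log10(1 - alpha)

alpha = absorption_coefficient(incident_energy=1.0, reflected_energy=0.25)
print(f"alpha = {alpha:.2f}; the reflection is {reflection_loss_db(alpha):.2f} dB down")
```

An absorption coefficient of 0.75 means only a quarter of the energy comes back, so the reflection is about 6 dB quieter than the incident sound.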
Air Absorption
In practice, for lower frequencies, very little energy is lost in the propagation
through air. However, for shorter wavelengths, there is an increasing
attenuation due to viscothermal losses (meaning losses due to energy being
converted into heat) in the medium. These losses in air are on the order of
0.1 dB per metre at 10 kHz, as is shown in Figure 3.14.
Usually, we can ignore this effect, since we’re usually pretty close to
sound sources. The only times that you might want to consider this are
when the sound has travelled a very long distance which, in practice, means
either a sound source that’s really far away outdoors, or a reflection that
has bounced around a big room a lot before it finally gets to the listening
position.
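To see why air absorption only matters over long distances, this Python sketch compares it with the 6.02 dB-per-doubling spreading loss for a point source in a free field. It assumes a constant 0.1 dB per metre (the order of magnitude quoted above for 10 kHz); in reality the figure depends on frequency, humidity and temperature, as Figure 3.14 shows. The function name and the 1 m reference distance are illustrative choices.

```python
import math

def level_drop_db(distance_m, air_loss_db_per_m, ref_distance_m=1.0):
    """Level drop relative to ref_distance_m for a point source in a free field.

    Returns (spreading_db, air_db). Spreading follows the inverse square law
    (6.02 dB per doubling of distance); air absorption is modelled as a
    constant dB-per-metre loss, which is a simplification.
    """
    spreading = 20 * math.log10(distance_m / ref_distance_m)
    air = air_loss_db_per_m * (distance_m - ref_distance_m)
    return spreading, air

AIR_LOSS_10K = 0.1   # dB per metre, the order of magnitude quoted for 10 kHz

for d in (2, 10, 100):
    spreading, air = level_drop_db(d, AIR_LOSS_10K)
    print(f"{d:>4} m: spreading {spreading:5.1f} dB, air absorption {air:4.1f} dB")
```

At 2 m the air loss is only a tenth of a decibel, but at 100 m it is almost 10 dB at 10 kHz, which is why it only shows up for distant sources or reflections that have travelled a long way around a big room.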
Figure 3.14: Attenuation resulting from air absorption due to propagation distance vs. frequency
for various relative humidity levels (a) 20 %; (b) 40 %; (c) 80 % [Kutruff, 1991]
Figure 3.15: The top diagram is a simplified representation of a plane wave travelling from top to
bottom of the page. The bottom diagram shows a large number of closely-spaced point sources
emitting the same frequency. Notice that the interference pattern of the multiple sources looks
very similar to the plane wave. In fact, the more sources you have, and the closer together they
are, the more similar the two results will be.
$\vartheta_i = \vartheta_r$ (3.21)
This is exactly the same as the light that bounces off a mirror. The
light hits the mirror and then is reflected off at an angle that is equal to the
angle of incidence. As a result, the reflection looks like a light bulb that
appears to be behind the mirror. There is one interesting thing to note here
– the point on the mirror where the light is reflected is dependent on the
locations of the light, the mirror and the viewer. If the viewer moves, then
the location of the reflection does as well. If you don’t believe me, go get a
light and a mirror and see for yourself.
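The “light bulb behind the mirror” can be used directly to find where a reflection happens. The Python sketch below uses the image-source construction (a standard geometric technique, not something taken from this text) for a flat wall along the line y = 0 in two dimensions: mirror the source behind the wall, draw a straight line from that image to the receiver, and the reflection point is where the line crosses the wall. Function names and coordinates are hypothetical.

```python
import math

def specular_reflection_point(source, receiver):
    """Point on a flat wall (the line y = 0) where the specular reflection from
    source to receiver occurs. Both points must be in front of the wall (y > 0).

    Image-source idea: mirror the source to (xs, -ys); the reflection point is
    where the straight line from that image to the receiver crosses y = 0.
    """
    xs, ys = source
    xr, yr = receiver
    if ys <= 0 or yr <= 0:
        raise ValueError("source and receiver must both be in front of the wall")
    x = xs + (xr - xs) * ys / (ys + yr)
    return (x, 0.0)

def angle_from_normal_deg(point, on_wall):
    """Angle between the wall normal and the line from on_wall to point."""
    dx = point[0] - on_wall[0]
    dy = point[1] - on_wall[1]
    return math.degrees(math.atan2(abs(dx), dy))

source = (0.0, 2.0)
for receiver in [(4.0, 2.0), (6.0, 2.0), (4.0, 1.0)]:
    hit = specular_reflection_point(source, receiver)
    print(receiver, "->", hit,
          "incidence", round(angle_from_normal_deg(source, hit), 2),
          "reflection", round(angle_from_normal_deg(receiver, hit), 2))
```

Moving the receiver from (4, 2) to (6, 2) moves the reflection point from x = 2 to x = 3, and in every case the angles of incidence and reflection come out equal, as Equation 3.21 requires.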
Since this type of reflection is most commonly investigated as it applies
to visual media and thus reflected light, it is usually considered only in the
spatial domain as is shown in the above diagram. The study of specular
Figure 3.16: Relationship between the angles of incidence ($\vartheta_i$) and reflection ($\vartheta_r$) in the case of a specular
reflector.
Figure 3.17: Impulse response of direct sound and specular reflection. Note that the Time is
referenced to the moment when the impulse is emitted by the sound source, hence the delay in the
time of arrival of the initial direct sound.
Figure 3.18: Relationship between the angles of incidence and reflection in the case of a diffusive
reflector.
$I_r \propto I_i \cos(\vartheta_i)$ (3.22)
where $I_r$ and $I_i$ are the intensities of the reflected and incident sound
waves respectively.
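Equation 3.22 only gives a proportionality, so this Python sketch normalises it to the case where the wavefront hits the surface perpendicularly. It assumes an ideal, perfectly diffusing reflector; the function name is illustrative.

```python
import math

def relative_reflected_intensity(angle_deg):
    """Equation 3.22 normalised so that perpendicular incidence gives 1.0."""
    return math.cos(math.radians(angle_deg))

for angle in (0, 30, 60, 80):
    rel = relative_reflected_intensity(angle)
    print(f"{angle:>2} deg: {rel:.3f} of the perpendicular case "
          f"({10 * math.log10(rel):+.1f} dB)")
```

The reflection is strongest where the wavefront meets the surface head-on and is already 3 dB down at 60°, which is why the diffuse “bright point” sits where the wavefront is perpendicular to the surface.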
Note that, in this case of a perfectly diffusing reflector, the “bright point”
is the point on the surface of the reflector where the wavefront hits perpen-
dicular to the surface. This means that it is irrelevant where the viewer is
located. This can be seen in Figure 3.19 which shows a reflecting surface
that is partly diffuse and partly specular. In this case, if the viewer were to move,
the smaller specular reflection would move, but the diffuse reflection would
not.
Figure 3.19: Photograph of a wall surface which exhibits both specular and diffusive reflective
properties. Note that there are two bright spots. The small area on the right is the specular
reflection, the larger area in the centre is the diffuse component.
4. directivity smear
Figure 3.20: Diffused reflection showing spatial distribution of the reflected power for a single
receiver.
The first issue will be discussed below. The second, third and fourth
points are the product of the fact that the received reflection is distributed
over the surface of the reflector. This results in multiple propagation
distances for a single reflection as well as multiple angles and reflection
locations. Since the reflection is distributed over both space and time at the
listening position, there is an effect on the frequency content. In the case
of a perfect specular reflector, the frequency components of the resulting
reflection form an identical copy of the original sound source, whereas a
diffusive reflector will modify those frequency characteristics according to the
particular geometry of the surface. Finally, since the reflections are more
widely distributed over the surfaces of the enclosure, the reverberant field
approaches a perfectly diffuse field more rapidly.
Figure 3.21: Impulse response of direct sound as well as the specular and simplified diffused reflec-
tion components. Note that the Time is referenced to the moment when the impulse is emitted
by the sound source, hence the delay in the time of arrival of the initial direct sound.
Surface Types
The natural world contains very few specular reflectors for light
waves, and even fewer for acoustic signals. Until the development of artificial
structures, reflecting surfaces were, in almost all cases, irregularly-shaped
(with the possible exception of the surface of a very calm body of water).
As a result, natural acoustic reflections are almost always diffused to some
extent. Early structures were built using simple construction techniques and
resulted in flat surfaces and therefore specular reflections.
For approximately 3000 years, and up until the turn of the 20th century,
architectural trends tended to favour florid styles, including widespread use
of various structural and decorative elements such as fluted pillars, entab-
latures, mouldings, and carvings. These random and periodic surface irreg-
ularities resulted in more diffused reflections according to the size, shape
and absorptive characteristics of the various surfaces. The rise of the In-
ternational Style in the early 1900s [Nuttgens, 1997] saw the disappearance
of these largely irregular surfaces and the increasing use of expansive, flat
surfaces of concrete, glass and other acoustically reflective materials. This
stylistic move was later reinforced by the economic advantages of these de-
sign and construction techniques.
Maximum length sequence diffusers
The link between diffused reflections and better-sounding acoustics has
resulted in much research in the past 30 years on how to construct diffusive
surfaces with predictable results. This continues to be an extremely popular
topic at current conferences in audio and acoustics with a great deal of the
work building on the breakthroughs of Schroeder.
In his 1975 paper, Schroeder outlined a method of designing surface
irregularities based on maximum length sequences (MLS) [Golomb, 1967]
which result in the diffusion of a specific frequency band. This method relies
on the creation of a surface comprised of a series of reflection coefficients
alternating between +1 and -1 in a predetermined periodic pattern.
Consider a sound wave entering the mouth of a well cut into the wall
from the concert hall as shown in Figure 3.22.
Assuming that the bottom of the well has a reflection coefficient of 1,
the reflection returns to the entrance of the well having propagated a
distance equalling twice its depth $d_n$, and therefore undergoing a shift in phase
relative to the sound entering the well. The magnitude of this shift is depen-
dent on the relationship between the wavelength and the depth according
to Equation 3.23.
$\phi = 4\pi \frac{d_n}{\lambda}$ (3.23)
where $\phi$ is the phase shift in radians, $d_n$ is the depth of the well and λ
is the wavelength of the incident sound wave.
Therefore, if $\lambda = 4d_n$, then the reflection will exit the well having
undergone a phase shift of π radians. According to Schroeder, this implies
that the well can be considered to have a reflective coefficient of -1 for that
particular frequency, however this assumption will be expanded to include
other frequencies in the following section.
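Equation 3.23 is easy to check numerically. This Python sketch assumes, as the text does, that the bottom of the well reflects perfectly, and represents the reflection at the mouth of the well as a complex coefficient $e^{i\phi}$. The 10 cm depth is a hypothetical value.

```python
import cmath
import math

def well_phase_shift(depth_m, wavelength_m):
    """Equation 3.23: phase shift (radians) after travelling down and back up a well."""
    return 4 * math.pi * depth_m / wavelength_m

def effective_reflection_coefficient(depth_m, wavelength_m):
    """Reflection at the well mouth, assuming the bottom reflects perfectly (coefficient 1)."""
    return cmath.exp(1j * well_phase_shift(depth_m, wavelength_m))

d = 0.1  # hypothetical well depth in metres
for wavelength in (4 * d, 2 * d):
    coeff = effective_reflection_coefficient(d, wavelength)
    print(f"lambda = {wavelength:.2f} m: phase = {well_phase_shift(d, wavelength) / math.pi:.1f} pi, "
          f"coefficient = {coeff.real:+.1f}")
```

A wavelength of four times the depth gives a coefficient of −1, while a wavelength of twice the depth brings it back to +1, which is the problem case where the wells stop diffusing.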
Using an MLS, the particular required sequence of positive and negative
reflection coefficients can be calculated, resulting in a sequence such as the
following, for N=15:
+++---+--++-+-+
This is then implemented as a series of individually separated wells cut
into the reflecting surface as is shown in Figure 3.23. Although the depth
of the wells is dependent on a single so-called design wavelength denoted
$\lambda_0$, in practice it has been found that the bandwidth of diffused frequencies
ranges from one-half octave below to one-half octave above this frequency
[Schroeder, 1975]. For frequencies far below this bandwidth, the signal is
typically assumed to be unaffected. At higher frequencies, however, the surface
can do exactly the opposite of what was intended. For example, consider a case
where the depth of the wells is equal to one half the wavelength of the incident
sound wave. In this case, the wells now exhibit a reflective coefficient of +1,
exactly the opposite of their intended effect, rendering the surface a flat and
therefore specular reflector.
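The N = 15 sequence above can be generated with a simple binary shift register. This Python sketch is an illustration rather than Schroeder’s own procedure: the feedback taps and starting state are assumptions, picked because they reproduce the sequence shown. It also checks the property that makes an MLS attractive for diffusers: its periodic autocorrelation is N at zero lag and −1 everywhere else, which is equivalent to a power spectrum that is flat (16 in every bin here) except for a dip to 1 at DC [Schroeder, 1975].

```python
import cmath

def mls_15(seed=(1, 1, 1, 0)):
    """Length-15 maximum length sequence from a 4-stage binary shift register.

    Recurrence a[n] = a[n-3] XOR a[n-4] (feedback polynomial x^4 + x^3 + 1).
    The seed is an assumption, chosen so the output lines up with the
    N = 15 sequence printed in the text.
    """
    a = list(seed)
    while len(a) < 15:
        a.append(a[-3] ^ a[-4])
    return a

def to_signs(bits):
    """Map 1 -> +1 and 0 -> -1, i.e. the reflection coefficient of each well."""
    return [1 if b else -1 for b in bits]

def periodic_autocorrelation(x, lag):
    n = len(x)
    return sum(x[i] * x[(i + lag) % n] for i in range(n))

def power_spectrum(x):
    """|DFT|^2 of one period of x, computed directly (no FFT library needed)."""
    n = len(x)
    return [abs(sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i in range(n))) ** 2
            for k in range(n)]

signs = to_signs(mls_15())
pattern = "".join("+" if s > 0 else "-" for s in signs)
print(pattern)
print([periodic_autocorrelation(signs, k) for k in range(15)])
print([round(p, 6) for p in power_spectrum(signs)])
```

The first line printed is the same +++---+--++-+-+ pattern given in the text.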
The advantage of using a diffusive surface geometry based on maximum
length sequences lies in the fact that the power spectrum of the sequence is
flat except for a minor dip at DC [Schroeder, 1975]. This permits the acous-
tical designer to specify a surface that maintains the sound energy in the
room through reflection while maintaining a low interaural cross correlation
Figure 3.23: MLS Diffuser showing the relationship between the individual wells cut into the wall
and the MLS sequence.
$s_n = n^2 \bmod N$ (3.24)
where $s_n$ is the sequence of relative depths of the wells, n is a number
in the sequence of non-negative consecutive integers (0, 1, 2, 3 ...) denoting
the well number, and N is an odd prime number.
If you’re uncomfortable with the concept of the modulo function, just
think of it as the remainder. For example, 5 mod 3 = 2 because 5 ÷ 3 = 1
with a remainder of 2. It’s the remainder that we’re looking for.
For example, for modulo 17, the series is
0, 1, 4, 9, 16, 8, 2, 15, 13, 13, 15, 2, 8, 16, 9, 4, 1, 0, 1, 4, 9, 16, 8, 2, 15, ...
As may be evident from the representation of this series in Figure 3.24,
the pattern is repeating and symmetrical around n = 0 and $n = \frac{N}{2}$.
Figure 3.24: Schroeder diffuser for N = 17 following the relative depths listed above. Note that
this diagram shows 3 repetitions of the period of the sequence.
The actual depths of the wells are dependent on the design wavelength of
the diffuser. In order to calculate these depths, Schroeder suggests Equation
3.25.
$d_n = s_n \frac{\lambda_0}{2N}$ (3.25)
where $d_n$ is the depth of well n and $\lambda_0$ is the design wavelength [Schroeder, 1979].
The widths of these wells w should be constant (meaning that they
should all be the same) and small compared to the design wavelength (no
greater than $\frac{\lambda_0}{2}$; Schroeder suggests $0.137\lambda_0$). Note that the result of
Equation 3.25 is to make the median well depth approximately equal to one-quarter
of the design wavelength. Since this arrangement has wells of varying depths, the
resulting bandwidth of diffused sound is increased substantially over the MLS
diffuser, ranging approximately from one-half octave below the design
frequency up to a limit imposed by $\lambda > \frac{\lambda_0}{N}$ and, more significantly, $\lambda > 2w$
[Schroeder, 1979].
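Here is a minimal Python sketch that turns Equations 3.24 and 3.25 into well depths for the N = 17 example above. The 500 Hz design frequency and the 343 m/s speed of sound are assumptions for illustration; the 0.137λ0 width and the two upper limits are the ones quoted from Schroeder.

```python
import math

C = 343.0  # assumed speed of sound in air, m/s

def qrd_sequence(n_prime, count=None):
    """Equation 3.24: s_n = n^2 mod N for n = 0, 1, 2, ..."""
    count = n_prime if count is None else count
    return [(n * n) % n_prime for n in range(count)]

def qrd_depths(n_prime, design_freq_hz):
    """Equation 3.25: d_n = s_n * lambda_0 / (2N), in metres, for one period."""
    lambda_0 = C / design_freq_hz
    return [s * lambda_0 / (2 * n_prime) for s in qrd_sequence(n_prime)]

N = 17
F0 = 500.0                     # hypothetical design frequency, Hz
lambda_0 = C / F0
width = 0.137 * lambda_0       # Schroeder's suggested well width

print(qrd_sequence(N, 25))
print([round(d * 100, 1) for d in qrd_depths(N, F0)])   # depths in cm
print(f"well width: {width * 100:.1f} cm")
print(f"lower edge (half octave below f0): {F0 / math.sqrt(2):.0f} Hz")
print(f"upper limit from lambda > 2w: {C / (2 * width):.0f} Hz")
print(f"upper limit from lambda > lambda_0 / N: {N * F0:.0f} Hz")
```

For these assumed values the deepest well is about 32 cm, the wells are about 9.4 cm wide, and the λ > 2w limit (around 1.8 kHz) is reached long before the λ > λ0/N one, which is why the width is the more significant constraint.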
The result of this sequence of wells is an apparently flat reflecting surface
with a varying and periodic impedance corresponding to the impedance at
the mouth of each well. This surface has the interesting property that, for
the frequency band mentioned above, the reflections will be scattered to
propagate along predictable angles with very small differences in relative
amplitude.
Figure 3.25: An overly-simplified diagram showing how an obstruction (the red block) will shadow
a plane wave on one side. Also shown is an overly-simplified example of diffraction, shown as the
waves centered at the corners of the obstruction. Omitted are the reflections off the obstruction,
and the diffraction off the top two corners.
Figure 3.26: You, a rope and a fence post before anything happens.
Then, with a flick of your wrist, you quickly move your end of the rope
up and back down to where you started. If you did this properly, then the
rope will have a bump in it as is shown in Figure 3.27. This bump will move
quickly down the rope towards the other end.
Figure 3.27: You, a rope and a fence post just after you have flicked your wrist to put a bump in
the rope.
When the bump hits the fence post, it can’t move it because fence posts
are harder to move than rope molecules. Since the fence post is at the end
of the rope, we say that it terminates the rope. The word termination is
one that we use a lot in this book, in both acoustics and
electronics. All it means is the end of the system – the rope, the air, the
wire... whatever the “system” is. So, an acoustician would say that the rope
is terminated with a high impedance at the fence post end.
Remember back to the analogy in Section 3.2.2 where the person in the
front was pushing on a concrete wall. You pushed the person ahead of you
and you wound up getting pushed in the opposite direction that you pushed
in the first place. The same is true here. When the wave on the rope reflects
off a higher impedance, then the reflected wave does the same thing. You
pull the rope up, and the reflection pulls you down. This end result is shown
in Figure 3.28.
Figure 3.28: You, a rope and a fence post just after the wave has reflected off the fence post (the
high impedance termination).
Figure 3.29: A frame-by-frame diagram showing what happens to a guitar string when you pluck
it.
If we were to pluck the string and wait a bit, then it would settle down
into a pretty predictable and regular motion, swinging back and forth looking
a bit like a skipping rope being turned by a couple of kids. If we were to
make a movie of this movement, and look at a bunch of frames of the film
all at the same time, they might look something like Figure 3.30.
In Figure 3.30, we can see that the string swings back and forth, with
the point of largest displacement being in the centre of the string, halfway
between the two anchored points at either end. Depending on the length,
the tension and the mass of the string, it will swing back and forth at
some speed (we’ll look at how to calculate this a little later...) which will
determine the number of times per second it oscillates. That frequency is
called the fundamental resonant frequency of the string. If it’s in the right
range (between 20 Hz and 20,000 Hz) then you’ll hear this frequency as a
musical pitch.
In reality, this is a bit of an oversimplification. The string actually
resonates at other frequencies. For example, if you look at Figure 3.31,
you’ll see a different mode of oscillation. Notice that the string still can’t
move at the two anchored end points, but it now also does not move in the
centre. In fact, if you get a skipping rope or a telephone cord and wiggle it
back and forth regularly at the right speed, you can get it to do exactly this.
Figure 3.30: The first mode of vibration of a string anchored on both ends. This will vibrate at a
frequency f .
Figure 3.31: The second mode of vibration of a string anchored on both ends. This will vibrate at
a frequency 2f .
One of the interesting things about the mode shown in Figure 3.31 is
that its wavelength on the string is exactly half the wavelength of the mode
shown in Figure 3.30. As a result, it vibrates back and forth twice as fast and
therefore has twice the frequency. Consequently, the pitch of this vibration
is exactly one octave higher than the first.
Figure 3.32: The third mode of vibration of a string anchored on both ends. This will vibrate at a
frequency 3f .
Since the string is actually vibrating with all of these modes at the
same time, with some relative balance between them, we wind up hearing a
fundamental frequency with a number of harmonics. The combined timbre
(or sound colour) of the sound of the string is determined by the relative
levels of each of these harmonics as they evolve over time. For example,
if you listen to the sound of a guitar string, you might notice that it has a
very bright sound immediately after it has been plucked, and that the sound
gets darker over time. This is because at the start of the sound, there is a
relatively high amount of energy in the upper harmonics, but these decay
more quickly than the lower ones and the fundamental. Therefore, at the
end of the sound, you get only the lowest harmonics and the fundamental of the
string, which gives a darker sound quality.
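To make the "bright, then darker" idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that harmonic n starts with amplitude 1/n and decays exponentially with a time constant of tau/n, so higher harmonics die away faster. These numbers are hypothetical, not measurements of a real guitar string. The spectral centroid (the amplitude-weighted average frequency) stands in here as a rough measure of "brightness".

```python
import numpy as np

def harmonic_amplitudes(t, f1=110.0, n_harmonics=20, tau=1.0):
    """Hypothetical amplitudes of harmonics 1..n_harmonics at time t (seconds).

    Assumption: harmonic n starts at 1/n and decays with time constant tau/n,
    so higher harmonics decay faster than lower ones.
    """
    n = np.arange(1, n_harmonics + 1)
    freqs = n * f1
    amps = (1.0 / n) * np.exp(-t * n / tau)
    return freqs, amps

def spectral_centroid(freqs, amps):
    """Amplitude-weighted mean frequency, used as a rough 'brightness' measure."""
    return float(np.sum(freqs * amps) / np.sum(amps))

for t in (0.0, 0.5, 1.0, 2.0):
    freqs, amps = harmonic_amplitudes(t)
    print(f"t = {t:.1f} s  centroid = {spectral_centroid(freqs, amps):7.1f} Hz")
```

Under these assumptions the centroid falls steadily toward the fundamental as time passes, which matches the darkening described above.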
It might be a little difficult to imagine that the string is moving at a
maximum in the middle for some modes of vibration, in exactly the same
place where it's not moving at all for other modes. If this is confusing, don't
worry; you're completely normal. Take a look at Figure 3.33, which might
help to alleviate the confusion. Each mode is an independent component
that can be considered on its own, but the total movement of the string is
the result of the sum of all of them.
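The summing in Figure 3.33 can be sketched numerically. This assumes an ideal string of length 1 fixed at both ends, where mode n has the standard shape sin(nπx/L). That shape is a textbook result and is not derived here. The amplitudes 1, 0.5 and 0.25 are the ones given in the figure caption, and the sketch shows a snapshot at the instant when every mode is at its peak.

```python
import numpy as np

def mode_shape(n, x, length=1.0):
    """Shape of the nth mode of an ideal string fixed at x = 0 and x = length."""
    return np.sin(n * np.pi * x / length)

def string_shape(x, amplitudes, length=1.0):
    """Sum of mode shapes; amplitudes[0] belongs to mode 1, amplitudes[1] to mode 2, ..."""
    return sum(a * mode_shape(n, x, length) for n, a in enumerate(amplitudes, start=1))

amplitudes = [1.0, 0.5, 0.25]          # relative amplitudes used in Figure 3.33
x = np.linspace(0.0, 1.0, 11)          # 11 points along a string of length 1
for xi, yi in zip(x, string_shape(x, amplitudes)):
    print(f"x = {xi:.1f}  displacement = {yi:+.3f}")
```

At the midpoint, mode 2 contributes nothing (it has a node there), mode 1 contributes +1 and mode 3 contributes -0.25. The sum is therefore 0.75, even though each mode moves independently.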
Figure 3.33: The sum of the first three modes of vibration of a string. The top three plots show
the first, second and third modes of vibration with relative amplitudes 1, 0.5 and 0.25 respectively.
The bottom plot is the sum of the top three.
One of the neat tricks that you can do on a stringed instrument such as
a violin or guitar is to play these modes selectively. For example, the normal
way to play a violin is to clamp the string down to the fingerboard of the
instrument with your finger to effectively shorten it. This will produce a
higher note if the string is plucked or bowed. However, you could gently
touch the string at exactly the halfway point and play. In this case, the
string still has the same length as when you’re not touching it. However,
your finger is preventing the string from moving at one particular point.
That point (if your finger is halfway up the string) is supposed to be the
point of maximum movement for all of the odd harmonics (which include the
fundamental – the 1st harmonic). Since your finger is there, these harmonics
can’t vibrate at all, so the only modes of vibration that work are the even-
numbered harmonics. This means that the second harmonic of the string
is the lowest one that’s vibrating, and therefore you hear a note an octave
higher than normal. If you know any string players, get them to show you
this effect. It's particularly good on a cello or double bass because you can
actually see the harmonics in the shape of the string.
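A light touch stops every mode that would otherwise move at that point. Only the modes that already have a node there keep vibrating. Here is a minimal sketch, assuming an ideal string whose mode n has the shape sin(nπx/L). The function name `surviving_modes` is hypothetical.

```python
import math

def surviving_modes(touch_fraction, max_mode=10, tol=1e-9):
    """Modes of an ideal string that are NOT stopped by a light touch.

    touch_fraction: touch point as a fraction of the string length (0.5 = halfway).
    A mode survives only if it already has a node (zero movement) at that point,
    i.e. if sin(n * pi * touch_fraction) is (numerically) zero.
    """
    return [n for n in range(1, max_mode + 1)
            if abs(math.sin(n * math.pi * touch_fraction)) < tol]

print(surviving_modes(0.5))      # [2, 4, 6, 8, 10]: lowest is 2, an octave up
print(surviving_modes(1 / 3))    # [3, 6, 9]: lowest is the 3rd harmonic
```

Touching halfway leaves only the even-numbered harmonics, as described above. The same reasoning predicts that touching a third of the way along leaves the 3rd, 6th, 9th... harmonics.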
watch what happens. What you’ll (hopefully) be able to see is that the
bump you created by tapping travels in two opposite directions to the two
ends of the rope. These two bumps reflect and return, meeting each other at
some point, crossing each other and so on. This process is shown in Figures
3.34 through 3.37.
[Diagram labels: striking point, probe]
Figure 3.34: A rope tied to two fence posts just after you have tapped it.
[Diagram labels: striking point, probe]
Figure 3.35: A rope tied to two fence posts a little later. Note that the bump you created is
travelling in two directions, and each of the two resulting bumps is smaller than the original. The
bump on the left has already reflected off the left end of the rope.
[Diagram labels: striking point, probe]
Figure 3.36: The two bumps after they have reflected off the high impedance terminations (the
fence posts). When the two bumps meet each other, they appear for an instant to be one bump.
Let's assume that you are able to make the bump in the rope infinitely
narrow, so that it appears as a spike or an impulse. Let's also put a theoretical
probe on the rope that measures its vertical movement at a single point
over time. For the sake of simplicity, we'll also put the probe the same
distance from one of the fence posts as you are from the other post. This
ensures that the probe is at the point where the two little spikes meet
each other to make one big spike. If we graphed the output of the probe
over time, it would look like Figure 3.38.
[Diagram labels: striking point, probe]
Figure 3.37: After the two reflections have met each other, they continue on in opposite directions.
(The right bump has reflected off the right end of the rope.) The process then repeats itself
indefinitely, assuming that there are no losses of energy in the system due to things like friction.
[Plot: Vertical displacement versus Time]
Figure 3.38: The output of the probe showing the vertical movement of the rope over time. This
response corresponds directly to Figures 3.34 to 3.37.
This graph shows how the rope responds over time when an impulse (an
instantaneous change in displacement or pressure) is applied to it.
Consequently, we call it the impulse response of the system.
The graph in Figure 3.38 corresponds directly to Figures 3.35 to
3.37, so you can see the relationship between the displacement at the
point where the probe is located on the string and passing time. Note that
only the first three spikes correspond to the pictures – after those three have
gone by, the whole thing repeats over and over.
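The spike pattern in Figure 3.38 can be reproduced with a tiny simulation. The sketch below uses a digital waveguide, a standard physical-modelling technique that this text does not describe. The rope is modelled as two delay lines, one for the wave travelling right and one for the wave travelling left. Each end reflects with a coefficient of -1, which gives the inverting reflection off a fence post. The 10-point rope and the strike position are hypothetical choices, and time is counted in samples rather than seconds.

```python
import numpy as np

def probe_response(n_points=10, strike=2, probe=None, n_samples=40, reflection=-1.0):
    """Digital-waveguide sketch of an ideal rope tied at both ends.

    Two delay lines carry the right-going and left-going waves; every time
    step each moves one point. At each end the wave is reflected with the
    given reflection coefficient (-1.0 models the inverting reflection off
    a fence post).
    """
    if probe is None:
        probe = n_points - 1 - strike          # same distance from the other post
    right = np.zeros(n_points)
    left = np.zeros(n_points)
    right[strike] = left[strike] = 0.5         # the impulse splits into two half-size bumps
    output = np.zeros(n_samples)
    for t in range(n_samples):
        output[t] = right[probe] + left[probe] # what the probe measures
        right_end, left_end = right[-1], left[0]
        right = np.roll(right, 1)              # right-going wave moves one point right
        right[0] = reflection * left_end       # left-going wave reflects off the left post
        left = np.roll(left, -1)               # left-going wave moves one point left
        left[-1] = reflection * right_end      # right-going wave reflects off the right post
    return output

y = probe_response()
for t in np.nonzero(y)[0]:
    print(t, y[t])
```

With the default settings, the probe sees +0.5 at sample 5 (the bump heading straight for it) and -1 at sample 10 (both inverted reflections arriving together). It then sees +0.5 at sample 15. The whole pattern repeats every 20 samples, which is one round trip along the rope.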
As we’ll see later in Section 8.2, we are able to do some math on this
impulse response to find out what the frequency content of the signal is –
in other words, the harmonic content of the signal. The result of this is
shown in Figure 3.39.
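As a rough illustration of that math, the sketch below runs NumPy's discrete Fourier transform (`numpy.fft.rfft`) on a hypothetical probe signal. The signal has one 20-sample period with the spike pattern of Figure 3.38 (+0.5, then -1, then +0.5), repeated 50 times. Frequencies are in FFT bins rather than Hz. The actual Hz values depend on the sampling rate and on the string, so they are arbitrary here.

```python
import numpy as np

# Hypothetical probe signal: one 20-sample period with the spike pattern of
# Figure 3.38 (+0.5, then -1, then +0.5), repeated 50 times (1000 samples).
period = np.zeros(20)
period[5], period[10], period[15] = 0.5, -1.0, 0.5
signal = np.tile(period, 50)

spectrum = np.abs(np.fft.rfft(signal))            # magnitude of each frequency bin
fundamental_bin = len(signal) // len(period)      # 1000 / 20 = bin 50

for k in range(1, 7):
    level = spectrum[k * fundamental_bin]
    print(f"harmonic {k}: magnitude {level:6.1f}")

# Energy appears only at multiples of the fundamental (every 50th bin).
off_harmonic = np.delete(spectrum, np.arange(0, len(spectrum), fundamental_bin))
print("largest off-harmonic magnitude:", off_harmonic.max())
```

The only non-zero bins are multiples of the fundamental: a harmonic series, as in Figure 3.39. Some harmonics (here the 4th, 8th, ...) are missing entirely because of where the strike point and probe sit on the rope.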
[Plot: Amplitude (dB) versus Frequency (Hz), logarithmic frequency axis]
Figure 3.39: The frequency content of the string’s vibrations at the location of the probe.
This graph shows us that we have the fundamental frequency and all
its harmonics at various levels up to ∞ Hz. The differences in the levels of
the harmonics are due to the relative locations of the striking point and the
probe on the string. If we were to move either or both of these locations, then
the relative times of arrival of the impulses would change and the balance
of the harmonics would change as well. Note that the actual frequencies
shown in the graph are completely arbitrary. These will change with the
characteristics of the string as we’ll see below in Section 3.5.4.
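A common way to estimate how the striking point and probe shape the harmonic balance is to use the mode shapes of an ideal string. Mode n is excited in proportion to sin(nπ·strike/L) and picked up in proportion to sin(nπ·probe/L). This is a minimal sketch under that assumption. It ignores any overall frequency-dependent factor, so it only captures the positional part of the story.

```python
import math

def harmonic_weights(strike, probe, n_harmonics=8):
    """Relative level of each harmonic seen by a probe on an ideal string.

    strike, probe: positions as fractions of the string length (0 to 1).
    Assumption: mode n is excited in proportion to sin(n*pi*strike) and
    picked up in proportion to sin(n*pi*probe); any overall
    frequency-dependent factor is ignored.
    """
    return [abs(math.sin(n * math.pi * strike) * math.sin(n * math.pi * probe))
            for n in range(1, n_harmonics + 1)]

# Strike a quarter of the way along; probe a quarter of the way from the other end.
for n, w in enumerate(harmonic_weights(0.25, 0.75), start=1):
    print(f"harmonic {n}: {w:.3f}")
```

Moving either position changes the pattern. For example, striking at one fifth of the length removes every fifth harmonic instead of every fourth.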
“So what?” I hear you cry. Well, this tells us the resonant frequencies of
the string. Basically, Figure 3.39 (which is a frequency content plot based on
the impulse response in time) is the same as the description of the standing
wave in Section 3.5.2. Each spike in the graph corresponds to a frequency
in the standing wave series.
If we weren’t able to change the pitch of a string, then many of our musi-
cal instruments would sound pretty horrid and we would have very boring
music... Luckily, we can change the frequency of the modes of vibration of
a string using any combination of three variables.
String Mass
The reason a string vibrates in the first place is because, when you pull
it back and let go, it doesn’t just swing back to the equilibrium position
and stop dead in its tracks. It has momentum and therefore passes where
it wants to be, then turns around and swings back again. The heavier the
string is, the more momentum it has, so the harder it is to stop. Therefore,
it will move slower. (I could make an analogy here about people, but I’ll
behave...) If it moves slower, then it doesn’t vibrate as many times per
second, therefore heavier strings vibrate at lower frequencies.
This is why the lower strings on a guitar or a piano have a bigger diameter.
This makes them heavier per metre, so they vibrate at a lower
frequency than a lighter string of the same length. As a result, your piano
doesn't have to be hundreds of metres long...
String Length
The fundamental mode of vibration of a string results in the string looking
like a half-wavelength of a sound wave. In fact, this half-wavelength is
exactly that, but the medium is the string itself, not the air. If we make the
wavelength longer by lengthening the string, then the frequency is lowered,
just as a longer wavelength in air corresponds to a lower frequency.
String Tension
As we saw earlier, the reason the string vibrates is because it’s trying to get
back to the equilibrium position. The tension on the string (how much it's
being stretched) is the force that's trying to pull it back into position. The
more you pull (and therefore the higher the tension), the faster the string will
move, and therefore the higher the pitch.
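For an ideal, perfectly flexible string, there is a standard result (often called Mersenne's law) that ties the three variables together: f1 = (1 / 2L) · √(T / μ). Here L is the length, T the tension and μ the mass per metre. This formula is not derived in this text and is used here as an assumption. The string dimensions, tension and steel density below are hypothetical example values.

```python
import math

def string_fundamental(length_m, tension_n, mass_per_metre_kg):
    """Fundamental frequency (Hz) of an ideal flexible string fixed at both ends.

    Uses the standard ideal-string result f1 = (1 / (2 L)) * sqrt(T / mu),
    where L is the length, T the tension and mu the mass per unit length.
    """
    return math.sqrt(tension_n / mass_per_metre_kg) / (2.0 * length_m)

def steel_mass_per_metre(diameter_m, density=7850.0):
    """Mass per metre of a solid round wire (steel density assumed, kg/m^3)."""
    return density * math.pi * (diameter_m / 2.0) ** 2

# Hypothetical string: 0.65 m long, 0.3 mm steel wire, 70 N of tension.
mu = steel_mass_per_metre(0.0003)
f1 = string_fundamental(0.65, 70.0, mu)
print(f"f1 = {f1:.1f} Hz")
print("heavier (4x mu):", string_fundamental(0.65, 70.0, 4 * mu))   # an octave lower
print("longer (2x L):  ", string_fundamental(1.30, 70.0, mu))       # an octave lower
print("tighter (4x T): ", string_fundamental(0.65, 280.0, mu))      # an octave higher
```

The square root explains why it takes four times the mass or four times the tension to move the pitch by one octave, while doubling the length is enough on its own.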
the point where the string is terminated (or anchored) on one end, we can
see a different story. Take a look at Figure 3.40. The top plot shows the
way we’ve been thinking up to now. The perfect string is securely anchored
by a bridge or clamp, and inside the clamp it is free to bend as if it were
hinged by a single molecule. Of course, this is not the case, particularly with
metal strings as we find on most instruments.
The bottom diagram tells a more realistic story. Notice that the string
continues on a straight line out of the clamp and then gradually bends into
the desired position – there is no fixed point where the bend occurs at an
extreme angle.
Figure 3.40: Theory vs. reality on a string termination. The top diagram shows the theoretical bend
of a string at the anchor point. The bottom diagram shows a more realistic behaviour, particularly
for stiff strings as in a piano.
Now think back to the slopes of the string at the anchor point for the
various modes of vibration. The higher the frequency of the mode, the
shorter the wavelength on the string, and the steeper the slope at the string
termination. This means that higher harmonics are trying to bend the string
more at the ends, yet there is a small conflict here: there is typically less
energy in the higher harmonics, so it's more difficult for them to move the
string, let alone bend it more than the fundamental does. As a result, we get
the strange behaviour shown in Figure 3.41.
As you can see in this diagram, the lower harmonic is able to bend the
string closer to the termination than the higher harmonic can. This means
that the effective length of the string is shorter for higher harmonics than
for lower ones. As a result, the higher the harmonic, the further it deviates
from its theoretical (integer-multiple) frequency, and the sharper it is,
musically speaking.
Figure 3.41: High and low harmonics on a stiff string. Note that the lower mode (represented as
a blue line) has a longer effective length than the higher mode (the red line).
This is why good piano tuners tune by ear rather than using a machine.
Let’s say you want a Middle C on a piano to sound in tune with the C one
octave above it. The fundamental of the higher C is theoretically the same
frequency as the second harmonic of the lower one. But we now know that
the second harmonic of the lower one is sharper than it should be, mathematically
speaking. Therefore, in order to sound in tune, the higher C has to be a
little higher than its theoretically correct tuning. If you tune the strings
using a tuner, they will have the correct frequencies for the theoretical
world, but not for your piano. You will have to tune the higher pitches
a little too high in order for them to sound correct.
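A common model of this stiffness effect, not given in this text, is the inharmonicity formula f_n = n · f0 · √(1 + B·n²). Here f0 is the fundamental the string would have with no stiffness, and B is an inharmonicity coefficient that depends on the string. The B value below is a hypothetical example. The sketch reports how far the lower note's second partial lands above an exact octave, in cents (1200 × log2 of the ratio; see Section 3.15.6).

```python
import math

def partial_frequency(n, f0, b):
    """Frequency of partial n of a stiff string: n * f0 * sqrt(1 + B * n^2).

    f0 is the fundamental the string would have with no stiffness; b is the
    inharmonicity coefficient (0 for an ideal, perfectly flexible string).
    """
    return n * f0 * math.sqrt(1.0 + b * n * n)

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

f0 = 261.6          # roughly Middle C, in Hz
b = 0.0004          # hypothetical inharmonicity coefficient
first = partial_frequency(1, f0, b)
second = partial_frequency(2, f0, b)
stretch = cents(second / (2 * first))
print(f"2nd partial is {stretch:.2f} cents sharp of an exact octave above the fundamental")
```

With B = 0 the stretch is exactly zero. Any stiffness makes it positive, which is why the upper C has to be tuned slightly sharp to match.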
3.6.1 Waveguides
In theory, if there were no friction against the pipe walls and no
absorption in the air, then even if the pipe were hundreds of kilometres long, the
sound of your handclap would be as loud at the other end as it was 1 m
down the pipe... (In fact, back in the early days of audio effects units, people
used long pieces of hose as delay lines.)
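The delay a hose provides is simply its length divided by the speed of sound. Here is a minimal sketch, assuming a speed of sound of 343 m/s (roughly air at 20 °C). The hose lengths are arbitrary examples.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed (roughly air at 20 degrees C)

def pipe_delay_ms(length_m, c=SPEED_OF_SOUND):
    """Time (ms) for a sound to travel down a pipe or hose of the given length."""
    return 1000.0 * length_m / c

for length in (1.0, 10.0, 100.0):
    print(f"{length:6.1f} m of hose -> {pipe_delay_ms(length):7.2f} ms of delay")
```

About 29 ms of delay needs roughly 10 m of hose, which is why these delay lines were bulky.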
One other thing – as you get further away from the sound source in the
pipe, the wavefront becomes flatter and flatter. In fact, in not very much
distance at all, it becomes a plane wave. This means that if you look at the
pressure wave across the pipe, you would see that the wavefront is perpendicular to
the walls of the pipe. When this happens, the pipe is guiding the wave down
its length, so we call the pipe a waveguide.
[Diagram labels: loudspeaker, microphone; pressure versus location in pipe]
Figure 3.42: A pipe that is closed with a high-impedance cap at both ends. There is a loudspeaker
and a microphone inside the pipe at the shown locations. The black level of the diagram shows the
pressure inside the pipe – the darker the higher the pressure.
[Diagram labels: loudspeaker, microphone; pressure versus location in pipe]
Figure 3.43: The same pipe a little later. Note that the high pressure wave is travelling in two
directions, and each of the two resulting waves is smaller than the original. The wave on the left
has already reflected off the left end of the pipe.
[Diagram labels: loudspeaker, microphone; pressure versus location in pipe]
Figure 3.44: The two waves after they have reflected off the high impedance terminations (the
pipe caps). When the two waves meet each other, they appear for an instant to be one wave.
[Diagram labels: loudspeaker, microphone; pressure versus location in pipe]
Figure 3.45: After the two reflections have met each other, they continue on in opposite directions.
(The right wave has reflected off the right end of the pipe.) The process then repeats itself
indefinitely, assuming that there are no losses of energy in the system due to things like friction on
the pipe walls or losses in the caps.
response of the closed pipe that looks like Figure 3.46. Note that this impulse
response corresponds directly to Figures 3.42 through 3.45. Also note that,
unlike for the struck rope, the pressure is always positive if we create a
positive pressure to begin with. This is, in part, because we are now looking
at air pressure whereas, in the case of the string, we were monitoring vertical
displacement. In fact, if we were graphing the displacement of the air molecules in
the pipe, we would see the values alternating between positive and negative,
just as in the impulse response for the string.
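The "always positive" behaviour can be sketched with a digital waveguide, a standard physical-modelling technique that this text does not describe. Two delay lines carry the right-going and left-going pressure waves. The assumption that matters is that a rigid cap reflects a pressure wave without inverting it (reflection coefficient +1). The 10-point pipe and the source and microphone positions are hypothetical, and time is in samples.

```python
import numpy as np

def pipe_pressure_response(n_points=10, source=2, mic=None, n_samples=40):
    """Digital-waveguide sketch of the pressure in a pipe capped at both ends.

    Two delay lines carry the right-going and left-going pressure waves;
    each moves one point per time step. A rigid cap reflects a pressure
    wave WITHOUT inverting it (reflection coefficient +1).
    """
    if mic is None:
        mic = n_points - 1 - source          # mirror position of the source
    right = np.zeros(n_points)
    left = np.zeros(n_points)
    right[source] = left[source] = 0.5       # positive pressure pulse splits in two
    output = np.zeros(n_samples)
    for t in range(n_samples):
        output[t] = right[mic] + left[mic]   # pressure at the microphone
        right_end, left_end = right[-1], left[0]
        right = np.roll(right, 1)
        right[0] = left_end                  # reflection off the left cap: no sign change
        left = np.roll(left, -1)
        left[-1] = right_end                 # reflection off the right cap: no sign change
    return output

p = pipe_pressure_response()
print("minimum pressure:", p.min())
print("spikes at samples", np.nonzero(p)[0])
```

The spikes arrive at the same times as in the rope case, but the one where the two reflections meet is +1 instead of -1, so the pressure never goes negative.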
[Plot: Pressure versus Time]
Figure 3.46: The impulse response of a pipe that has a cap with an infinite acoustic impedance at
both ends. The sound source that produced the impulse is assumed to be at one of the ends of
the pipe, so we only see a single reflection. Note that the impulse does not decay over time since
all of the acoustic power is trapped inside the pipe. The first three spikes in this impulse response
correspond directly to Figures 3.43 through 3.45.
Just as with the example of the vibrating string in the previous section,
we can convert the impulse response in Figure 3.46 into a plot of the
frequency content of the signal, as is shown in Figure 3.47. Note that this
response is only true at the location of the microphone. If its position or the
loudspeaker's position were changed, so would the relative balance of the
harmonics. However, the frequencies of the harmonics would not change; these
are determined by the length of the pipe and the speed of sound, as we'll see
below.
[Plot: Level versus Frequency (Hz), logarithmic frequency axis]
Figure 3.47: The frequency content of the pressure waves in the air inside the pipe at the location
of the microphone.
Figure 3.47 tells us that the signal inside the pipe can be decomposed
into a series of harmonics, just like we saw on the string in the previous
section. The only problem here is that they’re a little more difficult to
imagine because we’re talking about longitudinal waves instead of transverse
waves.
It is important to note that, in almost every way, this system is identical
to a guitar string. The only big difference is that the wave on the guitar
string is a transverse wave, whereas the wave in the pipe is longitudinal, but
their basic behaviours are the same.
Of course, it’s not very useful to have a musical instrument that is a
completely closed pipe – since none of the sound gets out, you probably
won’t get very large audiences. Then again, you might wind up with the
perfect instrument for playing John Cage’s 4’33”...
So, let’s let a little sound out of the pipe. We won’t change the caps
on the two ends, we’ll just cut a small rectangular hole in the side of the
pipe at one end. To get an idea of what this would look like, take a look at
Figure 3.48.
[Panels: Cross Section, Perspective]
Figure 3.48: A cross section and perspective drawing of a closed organ pipe.
Let’s think about what’s happening here in slow motion. Air comes into
the pipe from the bottom through the small section on the left. It’s at a
higher pressure than the air inside the pipe, so when it reaches the bottom
of the pipe, it sends a high pressure wavefront up to the other end of the
pipe. While that’s happening, there’s still air coming into the pipe. The
high pressure wavefront bounces back down the pipe and gets back to where
it started, meeting the new air that’s coming into the bottom. These two
collide and the result is that air gets pushed out the hole on the side of the
pipe. This, however, causes a negative pressure wavefront which travels up
the pipe, bounces back down and arrives back where it started, where it sucks air
into the hole in the side of the pipe. This causes a high pressure wavefront,
and so on. This whole process repeats itself so many times a second that we
can measure it as an oscillation using a microphone (if the pipe is the right
[Plot: Amplitude versus Position in pipe]
Figure 3.49: The relationship between the position of the probe in a closed pipe (the horizontal
axis) and the particle velocity maximum and minimum (in black) as well as the pressure maximum
and minimum (in red). This diagram shows the fundamental resonance of the pipe.
[Plot: Amplitude versus Position in pipe]
Figure 3.50: The relationship between the position of the probe in a closed pipe (the horizontal
axis) and the particle velocity maximum and minimum (in black) as well as the pressure maximum
and minimum (in red). This diagram shows the first overtone in the pipe – half the wavelength
and therefore twice the frequency of the fundamental.
[Plot: Amplitude versus Position in pipe]
Figure 3.51: The relationship between the position of the probe in a closed pipe (the horizontal axis)
and the particle velocity maximum and minimum (in black) as well as the pressure maximum and
minimum (in red). This diagram shows the second overtone in the pipe – one third the wavelength
and therefore three times the frequency of the fundamental.
$$ \lambda_n = \frac{2L}{n} \tag{3.26} $$
where $\lambda_n$ is the wavelength of the nth resonant frequency of the closed
pipe of length L.
Since the wavelength of the fundamental resonant frequency of a closed
pipe is twice the length of the pipe, you'll often hear closed pipes called
half-wavelength resonators. This is just a geeky way of saying that it's closed at
both ends.
If you'd like to calculate the actual frequency at which the pipe resonates,
you can calculate it from the wavelength and the speed of sound
using Equation 3.8. What you would wind up with would look like Equation
3.27.
$$ f_n = n \frac{c}{2L} \tag{3.27} $$
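Equations 3.26 and 3.27 translate directly into a few lines of Python. The speed of sound of 343 m/s and the 1 m pipe are assumed example values.

```python
def closed_pipe_modes(length_m, n_modes=5, c=343.0):
    """Wavelengths (m) and frequencies (Hz) of a pipe closed at both ends.

    Implements Equations 3.26 and 3.27: wavelength_n = 2L / n and
    f_n = n * c / (2L). c = 343 m/s is an assumed speed of sound.
    """
    modes = []
    for n in range(1, n_modes + 1):
        wavelength = 2.0 * length_m / n
        modes.append((n, wavelength, c / wavelength))   # f = c / wavelength (Equation 3.8)
    return modes

for n, wavelength, freq in closed_pipe_modes(1.0):
    print(f"n = {n}: wavelength = {wavelength:.3f} m, f = {freq:.1f} Hz")
```

Every resonance is an integer multiple of the fundamental, so a closed pipe produces a complete harmonic series, just like the string.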
[Plot: Amplitude versus Position in pipe]
Figure 3.52: The relationship between the position of the probe in an open pipe (the horizontal axis)
and the particle velocity maximum and minimum (in black) as well as the pressure maximum and
minimum (in red). This diagram shows the fundamental resonance of the pipe.
[Plot: Amplitude versus Position in pipe]
Figure 3.53: The relationship between the position of the probe in an open pipe (the horizontal axis)
and the particle velocity maximum and minimum (in black) as well as the pressure maximum and
minimum (in red). This diagram shows the first overtone in the pipe – one third the wavelength
and therefore three times the frequency of the fundamental.
[Plot: Amplitude versus Position in pipe]
Figure 3.54: The relationship between the position of the probe in an open pipe (the horizontal axis)
and the particle velocity maximum and minimum (in black) as well as the pressure maximum and
minimum (in red). This diagram shows the second overtone in the pipe – one fifth the wavelength
and therefore five times the frequency of the fundamental.
[Diagram labels: spring, mass, air friction]
Figure 3.55: A diagram showing two equivalent systems. On the left is a mass supported by a
spring, whose oscillation is damped by resistance to its movement caused by air friction. On the
right is a mass of air in a tube (the neck of the bottle) pushing and pulling against a spring (the volume of
air in the bottle) with the system oscillation damped by the air friction in the bottle neck.
$$ L' = L + 1.5a \tag{3.29} $$
where L is the actual length of the pipe, and a is the inside radius of the
neck [Kinsler and Frey, 1982].
If the pipe is flanged (meaning that it flares out like a horn), then the
effective length is calculated using Equation 3.30 [Kinsler and Frey, 1982]:
$$ L' = L + 1.7a \tag{3.30} $$
Please note that these equations won’t give you exactly the right answer,
but they’ll put you in the ballpark. Things like the actual shape of the
bottle and the neck, how the flange is shaped, how the neck meets the
bottle... many things will contribute to the actual frequency of the
resonator.
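To get a ballpark number, the effective length from Equation 3.29 can be plugged into the standard Helmholtz resonator formula f = (c / 2π) · √(S / (V · L')). Here S is the neck's cross-sectional area and V is the volume of the bottle. That formula is used here as an assumption in its common textbook form. The bottle dimensions and the speed of sound (343 m/s) are hypothetical example values.

```python
import math

def helmholtz_frequency(volume_m3, neck_length_m, neck_radius_m,
                        end_correction=1.5, c=343.0):
    """Approximate resonant frequency (Hz) of a bottle-like Helmholtz resonator.

    Uses f = (c / (2*pi)) * sqrt(S / (V * L_eff)), with S the neck's
    cross-sectional area and L_eff = L + end_correction * a
    (1.5a as in Equation 3.29, or 1.7a as in Equation 3.30).
    c = 343 m/s is an assumed speed of sound.
    """
    area = math.pi * neck_radius_m ** 2
    effective_length = neck_length_m + end_correction * neck_radius_m
    return (c / (2.0 * math.pi)) * math.sqrt(area / (volume_m3 * effective_length))

# Hypothetical beer bottle: 350 ml body, 7 cm neck, 9.5 mm inside radius.
f_bottle = helmholtz_frequency(350e-6, 0.07, 0.0095)
f_short_neck = helmholtz_frequency(350e-6, 0.04, 0.0095)
print(f"bottle: about {f_bottle:.0f} Hz; with a shorter neck: about {f_short_neck:.0f} Hz")
```

A shorter effective neck means less air mass to push around, so the pitch goes up, in line with the reasoning above.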
Is this useful information? Well, consider that you now know that the
oscillation frequency is dependent on the mass of the air inside the neck
of the bottle. If you make the mass smaller, then the frequency of the
oscillation will go up. Therefore, if you stick your finger into the top of a
beer bottle and blow across it, you’ll get a higher pitch than if your finger
wasn’t there. The further you stick your finger in, the higher the pitch.
I once saw a woman in a bar in Montreal win money in a talent contest
by playing “Girl From Ipanema” on a single beer bottle in this manner.
Therefore, yes. It is useful information.
Figure 3.56: The first mode of vibration along the length of a rectangular plate. Note that there
is no vibration across its width.
Figure 3.57: The first mode of vibration across the width of a rectangular plate. Note that there
is no vibration along its length.
Figure 3.58: The first modes of vibration along the length and width of a rectangular plate.
Figure 3.59: The second mode of vibration along the length and the first mode of vibration across
the width of a rectangular plate.
is moving) is lower than the static friction coefficient (the amount of friction
there is when the object is stopped).
So, going back to the statement that rosin has a very high static friction
coefficient, but a very low dynamic friction coefficient, think about what
happens when you start bowing a string.
1. You put the bow on the string and nothing is moving yet.
2. You push the bow, but it has rosin on it which has a very high static
friction coefficient – it sticks to the string, so it pulls the string along
with it.
5. So, the string slides back to where it came from in the opposite di-
rection to that in which the bow is moving. It passes the equilibrium
position and moves back too far and therefore starts to slow down.
6. Once it gets back as far as it’s going to go, it turns around and heads
back towards the equilibrium position, in the same direction of travel
as the bow...
7. At some point a moment later, the string and the bow are moving
at the same speed in the same direction, therefore they’re stopped
relative to each other. Remember that the rosin has a high static
friction coefficient, so the string sticks to the bow and the whole process
repeats itself.
3.15.1 Introduction
Go back, way back, to a time just before harmony. When we had only a
melody line, people would sing a tune made up of notes that were probably
taught to them by someone else singing. Then people thought that it would
be a really neat idea to add more notes at the same time – harmony! The
problem was deciding which notes sounded good together (consonant) and
which sounded bad (dissonant). Some people sat down and decided mathe-
matically that some sounds ought to be consonant or dissonant while other
people decided that they ought to use their ears.
For the most part, they both won. It turns out that, if you play two
frequencies at the same time, you'll like the combinations that have mathematically
simple relationships, and there's a reason for this which we talked about
in a previous chapter: beating.
If I play an “A” below Middle C on a piano, the sound includes the
fundamental 220 Hz, as well as all of the upper harmonics at various ampli-
tudes. So we’re hearing 220 Hz, 440 Hz, 660 Hz, 880 Hz, 1100 Hz, 1320 Hz,
and so on up to infinity. (Actually, the exact frequencies are a little different
as we saw in Section 3.5.5.)
If I play another note which has a frequency that is relatively close to
the 220 Hz fundamental, I hear beating between the two fundamentals
at a rate of |f1 − f2|. The same thing will happen if I play a note with a
fundamental which is close to one of the harmonics of the 220 Hz tone (hence
beating at rates of |2f1 − f2|, |3f1 − f2| and so on...).
For example, if I play a 220 Hz tone and a 445 Hz tone, I’ll hear beating,
because the 445 Hz is only 5 Hz away from one of the harmonics of the
220 Hz tone. This will sound “dissonant” because, basically, we don’t like
intervals that “beat.”
If I play a 440 Hz tone with the 220 Hz tone, there is no beating, because
440 Hz happens to be one of the harmonics of the 220 Hz tone. If there’s
no beating then we think that it’s consonant.
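The beating check described above can be sketched in a few lines. Like the text, this only compares the second tone's fundamental against the harmonics of the first tone and ignores the second tone's own harmonics. The function name `nearest_harmonic_beat` is hypothetical.

```python
def nearest_harmonic_beat(f1, f2, max_harmonic=16):
    """Beat rate (Hz) between tone f2 and the nearest harmonic of tone f1.

    Returns (harmonic number, beat rate). A beat rate of 0 means f2 lands
    exactly on a harmonic of f1, so there is no beating.
    """
    best = min(range(1, max_harmonic + 1), key=lambda n: abs(n * f1 - f2))
    return best, abs(best * f1 - f2)

print(nearest_harmonic_beat(220.0, 445.0))   # (2, 5.0): beats 5 times per second
print(nearest_harmonic_beat(220.0, 440.0))   # (2, 0.0): no beating
```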
Therefore, if I wanted to create a system of tuning the notes in a scale,
I would do it using formulae for calculating the frequencies of the various
notes that ensured that, at least for my most-used chords, the intervals in the
chords would not beat. That is to say that when I play notes simultaneously,
the various fundamentals have simple mathematical relationships.
(being, in chronological order, the Tonic, Fifth, Second, Sixth and Third in
a “Major” scale). The Fourth of the scale is achieved simply by calculating
the 3:4 ratio with the Tonic.
It may be interesting to note that the first 5 notes we tune are the
“common” pentatonic scale.
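The fifth-stacking procedure can be sketched with exact fractions using Python's standard `fractions` module. Each new note is the previous one times 3/2, moved down by octaves until it lies within one octave of the tonic. The helper names are illustrative only.

```python
from fractions import Fraction

def fold_into_octave(ratio):
    """Move a frequency ratio up or down by octaves until 1 <= ratio < 2."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

def stack_fifths(count):
    """Ratios (relative to the tonic) reached by stacking `count` perfect fifths (3:2)."""
    ratios = [Fraction(1)]
    for _ in range(count):
        ratios.append(fold_into_octave(ratios[-1] * Fraction(3, 2)))
    return ratios

pentatonic = stack_fifths(4)                    # Tonic, Fifth, Second, Sixth, Third
print([str(r) for r in pentatonic])             # ['1', '3/2', '9/8', '27/16', '81/64']
print("Fourth:", fold_into_octave(Fraction(4, 3)))
```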
We could use the same system to get all of the chromatic notes by merely
repeating the procedure for a total of 6 ascending fifths and 5 descending
fourths (or 11 ascending fifths). This will present us with a problem,
however.
If we do complete the system, starting at the tonic and calculating the
other 11 notes in a chromatic scale, we’ll end up with what is known as one
wolf fifth.
What’s a wolf fifth? Well, if we kept going around through the system
until the 12th note, we ought to end up an octave from where we started –
the problem is that we wind up a little too sharp. So, we tune the octave
above the tonic to the tonic and put up with it being “out of tune” with the
note a fifth below.
In fact, it's wiser to put this wolf fifth somewhere else in the scale,
such as in an interval less likely to be played than the one between the tonic
and the fourth of the scale.
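Let's do a little math for a moment. If we go up a fifth and down a
fourth and so on and so on through to the 12th note, we are actually doing
the following calculation:
$$ f \times \frac{3}{2} \times \frac{3}{4} \times \frac{3}{2} \times \frac{3}{4} \times \frac{3}{2} \times \frac{3}{4} \times \frac{3}{2} \times \frac{3}{4} \times \frac{3}{2} \times \frac{3}{4} \times \frac{3}{2} \times \frac{3}{4} \tag{3.31} $$
If we calculated this, we'd see that we get
$$ f \times \frac{531441}{262144} \tag{3.32} $$
instead of the $f \times \frac{2}{1}$ that we'd expect for an octave.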
Question:
how far away from a “real” octave is the note you wind up with?
Answer:
well, if we transpose the $f \times \frac{531441}{262144}$ note down an octave, we get
$\frac{1}{2} \times f \times \frac{531441}{262144}$, or $f \times \frac{531441}{524288}$.
In other words, the ratio between the tonic and the note which is an
octave below the 12th note in the Pythagorean tuning system is
531441:524288.
This amount of error is called the Pythagorean Comma.
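The arithmetic of Equations 3.31 and 3.32 can be checked exactly with Python's `fractions` module. The size of the comma is also expressed in cents (1200 × log2 of the ratio; see Section 3.15.6).

```python
from fractions import Fraction
import math

# Up a fifth (3/2), down a fourth (3/4), six times each: Equation 3.31.
twelve_steps = (Fraction(3, 2) * Fraction(3, 4)) ** 6
print("after 12 steps:", twelve_steps)                  # 531441/262144, Equation 3.32

comma = twelve_steps / 2                               # transpose down one octave
print("Pythagorean comma:", comma)                     # 531441/524288
print(f"in cents: {1200 * math.log2(float(comma)):.2f}")  # about 23.46 cents
```

The comma is a little under a quarter of a semitone, which is easily large enough to hear as an out-of-tune fifth.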
Just for your information, according to the Grove dictionary, “Medieval
theorists who discussed intervallic ratios nearly always did so in terms of
Pythagorean intonation.” (look up “Pythagorean intonation”)
For an investigation of the relative sizes of intervals within the scale,
look at Table 4.1 on page 86 of “The Science of Musical Sound” by Johan
Sundberg.
three notes in each of the chords have the simple relationship 4:5:6 (for
example, 100 Hz, 125 Hz and 150 Hz).
The system isn’t perfect, however, because intervals that should be the
same actually have different sizes. For example,
• The seconds between the tonic and the 2nd, the 4th and 5th, and the
6th and 7th have a ratio of 9:8
• The seconds between the 2nd and 3rd, and the 5th and 6th have a ratio
of 10:9
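The uneven seconds fall straight out of the 4:5:6 construction. This sketch assumes, as is usual for Just Intonation, that the three chords are the major triads built on the tonic, the fourth (4:3) and the fifth (3:2). It builds the scale from those triads, folds every note into one octave, and lists the step sizes.

```python
from fractions import Fraction

def triad(root):
    """Root, third and fifth of a just major triad in the ratio 4:5:6."""
    return [root, root * Fraction(5, 4), root * Fraction(3, 2)]

def fold(ratio):
    """Bring a ratio into the octave 1 <= ratio < 2."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

tonic, fourth, fifth = Fraction(1), Fraction(4, 3), Fraction(3, 2)
notes = sorted({fold(r) for root in (tonic, fourth, fifth) for r in triad(root)})
print([str(n) for n in notes])
# ['1', '9/8', '5/4', '4/3', '3/2', '5/3', '15/8']

steps = [str(b / a) for a, b in zip(notes, notes[1:] + [Fraction(2)])]
print(steps)
# ['9/8', '10/9', '16/15', '9/8', '10/9', '9/8', '16/15']
```

The "whole tone" steps come out as a mixture of 9:8 and 10:9, matching the bullet points above.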
The result is a set of intonation problems when you stray too
far from your tonic key. Thus a keyboard instrument tuned using Just
Intonation must be tuned for a specific key and you’re not permitted to
transpose or modulate without losing your audience or your sanity or both.
Of course, the problem only occurs in instruments with fixed tunings for
notes (such as keyboard and fretted string instruments). Everyone else can
compensate on the fly.
The major third and the tonic in Just Intonation have a frequency ratio
of 5:4. The major third and the tonic in Pythagorean Intonation have a
ratio of 81:64. The difference between these two intervals is
$$ \frac{81}{64} \div \frac{5}{4} = \frac{81}{80} $$
This is the amount of error in the major third in Pythagorean Tuning
and is called the syntonic comma.
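The same exact-fraction approach confirms the syntonic comma. The Pythagorean third is four fifths up and two octaves down, and the "difference" between two intervals is the ratio of their ratios, not a subtraction. Its size in cents is included for comparison with the Pythagorean comma.

```python
from fractions import Fraction
import math

just_third = Fraction(5, 4)
pythagorean_third = Fraction(3, 2) ** 4 / 4      # four fifths up, two octaves down
syntonic_comma = pythagorean_third / just_third
print(pythagorean_third, syntonic_comma)          # 81/64 81/80
print(f"{1200 * math.log2(float(syntonic_comma)):.2f} cents")   # about 21.51 cents
```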
or
$$ f \times \left(\sqrt[12]{2}\right)^{2} \tag{3.36} $$
So, in order to go up any number of semitones, we simply use the following
equation:
$$ f \times \left(\sqrt[12]{2}\right)^{x} \tag{3.37} $$
where x is the number of semitones.
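Equation 3.37 in code is a one-liner. The example root of 1 kHz is chosen to match Figures 3.60 and 3.61, so the printed equal-tempered fifth and third can be compared with the frequencies quoted in those captions.

```python
def equal_tempered(f, semitones):
    """Frequency `semitones` equal-tempered semitones above f (Equation 3.37)."""
    return f * 2 ** (semitones / 12)

root = 1000.0
print(f"ET fifth: {equal_tempered(root, 7):.11f} Hz (just: 1500 Hz)")
print(f"ET major third: {equal_tempered(root, 4):.11f} Hz (just: 1250 Hz)")
print(f"12 semitones: {equal_tempered(root, 12)} Hz")   # exactly one octave
```

The fifth comes out slightly flat of 3:2 and the major third noticeably sharp of 5:4. Only the octave is exact.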
The advantage of this is that you can play in any key on one instrument.
The disadvantage is that every key is “out of tune.” But, they’re all equally
out of tune, so we have gotten used to it. Sort of like we have gotten used to
eating fast food, despite the fact that it tastes pretty wretched, sometimes
even bordering on rancid...
To get an intuitive idea of the fact that equal temperament intervals are
out of tune, even if they don’t sound like it most of the time, take a look at
Figures 3.60 and 3.61.
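One way to see numerically what those plots show is to check whether the two-tone signal repeats. A just fifth (1000 Hz + 1500 Hz) has a common repetition rate of 500 Hz, so it repeats exactly every 2 ms. The equal-tempered fifth does not quite line up after the same 2 ms, and that slow drift is the modulation visible in the bottom plots. This sketch uses the 44.1 kHz sampling rate and 3000-sample span from the figures. The helper names are illustrative.

```python
import numpy as np

FS = 44100.0                       # sampling rate used in Figures 3.60 and 3.61
t = np.arange(3000) / FS           # the 3000 samples shown in the figures

def two_tone(t, f_low, f_high):
    """Sum of two equal-level sine waves (an interval played with sine waves)."""
    return np.sin(2 * np.pi * f_low * t) + np.sin(2 * np.pi * f_high * t)

def repeat_error(f_low, f_high, period):
    """Largest difference between the signal and itself shifted by `period` seconds."""
    return np.max(np.abs(two_tone(t + period, f_low, f_high) - two_tone(t, f_low, f_high)))

# A just fifth (1000 Hz + 1500 Hz) repeats exactly every 2 ms (1/500 Hz).
just_error = repeat_error(1000.0, 1500.0, 0.002)
# The equal-tempered fifth does not quite line up after the same 2 ms.
et_error = repeat_error(1000.0, 1000.0 * 2 ** (7 / 12), 0.002)
print(f"just fifth: {just_error:.2e}   equal-tempered fifth: {et_error:.2e}")
```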
Figure 3.60: The time response of a perfect fifth played with sine waves. In both cases, the root is
1 kHz. The X axis shows time in samples at a 44.1 kHz sampling rate. The top plot shows a just
temperament perfect fifth, with the frequencies 1 kHz and 1.5 kHz. The bottom plot shows an
equal temperament fifth, with the frequencies 1 kHz and 1.49830707687668 kHz. Notice that the
bottom plot modulates in time. If I had plotted more time, it would be evident that the modulation
is periodic.
Figure 3.61: The time response of a major third played with sine waves. In both cases, the root is
1 kHz. The X axis shows time in samples at a 44.1 kHz sampling rate. The top plot shows a just
temperament major third, with the frequencies 1 kHz and 1.25 kHz. The bottom plot shows an
equal temperament third, with the frequencies 1 kHz and 1.25992104989487 kHz. Notice that the
bottom plot modulates in time. If I had plotted more time, it would be evident that the modulation
is periodic.
3.15.6 Cents
There are some people who want a better way of dividing up the octave.
Basically, some people just aren’t satisfied with 12 equal divisions, so they
divided up the semitone into 100 equal parts and called them cents. Since a
cent is 1/100 of a semitone, it’s an interval which, when multiplied by itself
1200 times (12 semitones × 100 cents), makes an octave. Therefore the interval
is the 1200th root of 2 (or 1:1.00058).
Therefore, 1 cent above 440 Hz is
$440 \times \sqrt[1200]{2} = 440.254$ Hz. (3.38)
We can use cents to compare tuning systems. Remember that 100 cents
is 1 semitone, 200 cents is 2 semitones, and so on.
There is a good comparison in cents of various tuning systems in “Musical
Acoustics” by Donald Hall (p. 453).
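Since there are 1200 cents in an octave, the size of any frequency ratio in cents is 1200 log2(ratio). The sketch below (helper names are hypothetical) uses that relationship to compare a few just, Pythagorean and equal-tempered intervals, and to reproduce Equation 3.38:

```python
import math

def ratio_to_cents(ratio):
    # 1200 cents per octave, so cents = 1200 * log2(ratio)
    return 1200 * math.log2(ratio)

def cents_above(f, cents):
    return f * 2 ** (cents / 1200)

intervals = {
    "just major third (5:4)": 5 / 4,
    "Pythagorean major third (81:64)": 81 / 64,
    "equal-tempered major third": 2 ** (4 / 12),
    "just perfect fifth (3:2)": 3 / 2,
    "equal-tempered perfect fifth": 2 ** (7 / 12),
}
for name, ratio in intervals.items():
    print(f"{name}: {ratio_to_cents(ratio):.2f} cents")

print(f"1 cent above 440 Hz: {cents_above(440, 1):.3f} Hz")  # 440.254 Hz
```

The just major third comes out about 14 cents below the equal-tempered one, and the Pythagorean third about 8 cents above it. The gap between the two non-tempered thirds, about 21.5 cents, is the syntonic comma.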
Figure 3.62: A sound source (black dot) and a listener (white dot) in a room (Black rectangle)
level because the sound had to travel some distance. This impulse response
is shown in Figure 3.64.
Figure 3.63: The direct sound (red line) travelling from the source to the receiver.
Of course, the sound is really travelling out in all directions, which means
that a lot of it is heading towards the walls of the room instead of heading
towards you. As a result, there is a ray of sound that travels from the sound
source, bounces off of one wall (remember Snell’s Law) and comes straight
to you. Of course, this will happen with all four walls – a single reflection
from each wall reaches you a little while after the direct sound and probably
at different times according to the distance travelled. These are called first-
order reflections because they contain only a single bounce off a surface.
They’re shown as the blue lines in Figure 3.65 with the impulse response
shown in Figure 3.66.
Figure 3.65: The first-order reflections (blue lines) travelling from the source to the receiver.
Figure 3.66: The impulse response of the first-order reflections (blue lines).
We also get situations where the sound wave bounces off two different
walls before the sound reaches you, resulting in second-order reflections. In
our perfectly rectangular room, there will be two types of these second-order
reflections. In the first, the two walls that are reflecting are parallel
and opposite to each other. In the second, the two walls are adjacent and
perpendicular. These are shown as the green lines in Figure 3.67 and the
impulse response in Figure 3.68. Note in the impulse response that it’s
possible for a second-order reflection to arrive earlier than a first-order re-
flection, particularly if you are in a long rectangular room. For example, if
you’re sitting in the front row of a big concert hall, it’s possible that you
get a second-order reflection off the stage and side walls before you get a
first-order reflection off the wall in the back behind the audience. The moral
of the story here is that the order of reflection is only a general indicator of
its order of arrival.
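The front-row example can be checked with a simple image-source model, in which each reflection is replaced by a mirrored copy of the source behind the walls. The following is a minimal sketch under loudly stated assumptions: a two-dimensional rectangular room with perfectly reflecting walls at 0 and at the room's width and length, a speed of sound of 344 m/s, and made-up hall dimensions and positions. All names are hypothetical:

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, an assumed room-temperature value

def image_coordinate(index, source, length):
    """Position along one axis of the image source with the given index.

    Index 0 is the real source; each step of +/-1 mirrors across one more
    wall (walls sit at 0 and at `length`)."""
    if index % 2 == 0:
        return index * length + source
    return index * length + (length - source)

def reflections(room, source, listener, max_order):
    """Return (order, arrival time in ms) for every path up to max_order.

    room = (width, length) of a 2-D rectangular room; walls assumed rigid."""
    paths = []
    for i in range(-max_order, max_order + 1):
        for j in range(-max_order, max_order + 1):
            order = abs(i) + abs(j)  # number of wall bounces
            if order > max_order:
                continue
            x = image_coordinate(i, source[0], room[0])
            y = image_coordinate(j, source[1], room[1])
            distance = math.hypot(x - listener[0], y - listener[1])
            paths.append((order, 1000 * distance / SPEED_OF_SOUND))
    return sorted(paths, key=lambda p: p[1])

# A long, narrow "hall": the source is on stage, the listener in the front row
for order, ms in reflections(room=(10, 40), source=(5, 3), listener=(5, 8), max_order=2):
    print(f"order {order}: {ms:6.1f} ms")
```

With these numbers, the second-order reflections off the stage wall and a side wall arrive long before the first-order reflection off the back wall. Counting the paths by order also gives the 4, 8, 12, ... pattern for a two-dimensional rectangular room, since there are 4n image sources of order n.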
Figure 3.67: The second-order reflections (green lines) travelling from the source to the receiver.
Figure 3.68: The impulse response of the second-order reflections (green lines).
3.16.2 Reverberation
If the walls were perfect reflectors and there were no such thing as sound
absorption in air, this series of more and more reflections would continue
forever. However, a little of the sound wave’s energy is lost in the air
and in the walls, so the reflections get quieter and quieter as they
reach higher and higher orders until, eventually, there is nothing.
Let’s say that your sound source is a person clapping their hands once – a
sound with a very fast attack and decay. The first thing you hear is the direct
sound, then the early reflections. These are probably separated enough in
time and space that your brain can interpret them as separate events. Be
careful about what I mean by this previous sentence. I do not necessarily
mean that you will hear the direct and earlier reflections as separate hand
claps (although if the room is big enough you might...) Instead, I mean
that your brain uses these discrete components in the sound that arrives at
the listening position to determine a bunch of information about the sound
source and the room. We’ll talk about that more later.
If we consider higher and higher orders of reflections, then we get more
and more reflections per second as time goes by. For example, in our rect-
angular, two-dimensional room, there are 4 first-order reflections, 8 second-
order reflections, 12 third-order reflections and so on and so on. These will
pile up on each other very quickly and just become a complete mess of sound
that apparently comes from almost everywhere all at the same time (actu-
ally, you will start to approach a diffuse field situation). When the reflections
get very dense, we typically call the collection of all of them reverberation or
reverb. Essentially, reverberation is what you have when there are too many
reflections to think about. So, instead of trying to calculate every single
reflection coming in from every direction at every time, we just give up and
start talking about the statistical properties of the room’s acoustics. So, you
won’t hear about a 57th-order reflection coming in at a predicted time. Instead,
you’ll hear about the probability of a reflection coming from a certain
direction at a given time. (This is sort of the same as trying to predict the
weather. Nobody will tell you that it will definitely rain tomorrow starting
at 2:34 in the afternoon. Instead, they’ll say that there is a 70% chance of
rain. Hiding behind statistics helps you to avoid being incorrect...)
One immediately obvious thing about reverberation in a real room is
that it takes a little while for it to die away or decay. So then the question
is, how do we measure the reverberation time? Well, typically we have to
oversimplify everything we do in audio, so one way to oversimplify this
measurement is to just worry about one frequency. What we’ll do is to get
a loudspeaker that’s emitting a sine tone with a constant level – therefore,
just a single frequency. Also, we’ll put a microphone somewhere in the room
and look at its output level on a decibel scale. If we leave the speaker on for
a while, the sound pressure level at the microphone will stabilize and stay
constant. Then we turn off the sine wave, and the reverberation will decay.
Figure 3.69: Sound pressure level vs. time for a single sine wave in a room. The sine wave was
turned on once upon a time – long ago enough that the SPL in the room has stabilized at the
microphone’s position. Note that, when the tone is turned off, the decay in the room is linear (a
straight line) on a decibel scale.
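Because the decay is a straight line on a decibel scale, the reverberation time can be read off its slope. RT60 is conventionally the time taken for the level to fall by 60 dB. The following is a minimal sketch, assuming a list of (time, level in dB) readings taken after the tone is switched off. The least-squares line fit and the synthetic data are illustrative choices, not a measurement standard:

```python
def rt60_from_decay(times, levels_db):
    """Fit a straight line to a decay curve (level in dB vs. time in s)
    and return the time it implies for a 60 dB drop."""
    n = len(times)
    mean_t = sum(times) / n
    mean_lv = sum(levels_db) / n
    # Ordinary least-squares slope, in dB per second (negative for a decay)
    num = sum((t - mean_t) * (lv - mean_lv) for t, lv in zip(times, levels_db))
    den = sum((t - mean_t) ** 2 for t in times)
    slope = num / den
    return -60.0 / slope

# Synthetic decay: starts at 80 dB and falls at 100 dB/s, sampled every 10 ms
times = [k * 0.01 for k in range(51)]
levels = [80.0 - 100.0 * t for t in times]
print(f"RT60 = {rt60_from_decay(times, levels):.2f} s")  # RT60 = 0.60 s
```

In a real room, the decay rarely spans a clean 60 dB before it reaches the noise floor. This is one reason to fit a line to the straight part of the curve and extrapolate.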
Sabine Equation
Once upon a time (actually, around the year 1900), a guy named Wallace
Clement Sabine did some experiments and some math and figured out that
we can arrive at an equation to predict the reverberation time of a room if
we know a couple of things about it.
Let’s consider that the more absorptive the surfaces in the room, the
more energy we lose in the walls, so the faster the reverberation will decay.
Also, the more surface area there is (i.e. the bigger the walls, floor and
ceiling) the more area there is to absorb sound, therefore the reverb will
decay faster. So, the average absorption coefficient (see Section ??) and the
surface area will be inversely proportional to the reverb time.
Also consider, however, that the bigger the room, the longer the sound
will travel before it hits anything to reflect (or absorb) it. Therefore the
bigger the room volume, the longer the reverb time.
Thinking about these three issues, and after doing some experiments
with a stopwatch, Sabine settled on Equation 3.39:
$RT_{60} = \frac{55.26 V}{A c}$ (3.39)
Where c is the speed of sound in the room, V is the volume of the room and A is the total sound
absorption by the room, which can be calculated using Equation 3.40.
$A = S \bar{\alpha}$ (3.40)
Where S is the total surface area of the room and ᾱ is the “average value
of the statistical absorption coefficient.” [Morfey, 2001]
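Equations 3.39 and 3.40 can be combined into a quick estimator. This is a minimal sketch. The room dimensions, the average absorption coefficient of 0.2 and the 344 m/s speed of sound are all made-up example values, and `sabine_rt60` is a hypothetical name:

```python
def sabine_rt60(volume, surface_area, mean_absorption, c=344.0):
    """Equation 3.39, RT60 = 55.26 V / (A c), with A = S * mean alpha (Eq. 3.40)."""
    total_absorption = surface_area * mean_absorption  # A, in m^2
    return 55.26 * volume / (total_absorption * c)

# Hypothetical 10 m x 8 m x 3 m room with an average absorption coefficient of 0.2
length, width, height = 10.0, 8.0, 3.0
V = length * width * height                                  # 240 m^3
S = 2 * (length * width + length * height + width * height)  # 268 m^2
print(f"RT60 = {sabine_rt60(V, S, 0.2):.2f} s")              # RT60 = 0.72 s
```

Doubling the average absorption coefficient halves the predicted RT60. Doubling every dimension of the room (8 times the volume, 4 times the surface area) doubles it, which matches the reasoning above.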
Eyring Equation
Coupling
sound source couples to the mode
FINISH THIS OFF
We also have to consider how well the mode couples to the receiver.
FINISH THIS OFF
$f_{min} \approx c \left( \sqrt{\frac{4}{\pi A}} - \frac{S}{16V} \right)$ (3.46)
where fmin is the Schroeder frequency, A is the room absorption calculated
using Equation 3.40, S is the surface area of the boundaries and V is
the room’s volume.
High-frequency Transmission
NOT WRITTEN YET
Low-frequency Transmission
NOT WRITTEN YET
Chapter 4
Psychoacoustics and perception
inside your inner ear are moving back and forth with a total peak-to-peak
displacement that is less than the diameter of a hydrogen atom [].
Note that the reference for calculating sound pressure level in dBspl is
20 µPa; therefore, a 1 kHz sine tone at the threshold of hearing has a level
of 0 dBspl.
One important thing to remember is that the threshold of hearing is not
the same sound pressure level at all frequencies, but we’ll talk about this
later.
The threshold of pain is a sound pressure level that is so loud that it
causes you to be in pain. This level is somewhere around 200 Pa, depending
on which book you read and how masochistic you are. This means that the
threshold of pain is around 140 dBspl. This is very loud.
So, based on these two numbers, we can calculate that the human hearing
system has a total dynamic range of about 140 dB.
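The 0 dBspl and 140 dBspl figures follow directly from the 20 µPa reference. Here is a minimal sketch of the conversion (the function name is hypothetical):

```python
import math

P_REF = 20e-6  # Pa, the 20 uPa reference for dBspl

def dbspl(pressure_pa):
    # Sound pressure level in dB re 20 uPa
    return 20 * math.log10(pressure_pa / P_REF)

print(dbspl(20e-6))                # threshold of hearing at 1 kHz: 0.0
print(dbspl(200))                  # threshold of pain: about 140
print(dbspl(200) - dbspl(20e-6))   # dynamic range: about 140 dB
```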
4.4 Loudness
4.4.1 Equal loudness contours
Back in 1933, a couple of researchers by the name of Fletcher and Munson
decided to gather some information about how we perceive different fre-
quencies at different amplitudes. What they came up with was a bunch of
lines we now call “Equal Loudness Contours” or the “Fletcher and Munson
Curves”.
These curves indicate two important pieces of information.
than the higher pitch. The same effect would happen if we tried it with a
1 kHz tone and a 10 kHz tone, except that the 10 kHz would now be the
louder of the two (even though they sound the same to you). Again, there
are two interesting things to note about this effect.
• The curve of equal loudness has virtually the same shape as the absolute
threshold curve, even at other amplitudes.
• The curve tends to flatten out when the volume goes up.
What does this mean? Firstly, when you turn down the stereo, you are less sensitive
to low and high frequencies (compared to the mid-range frequencies) than
when the stereo was turned up. Therefore the balance changes (particularly
in the low end). If the level is low, then you’ll think that you hear less bass.
This is why there’s a LOUDNESS switch on your stereo. It boosts the bass
to compensate for your low-level Fletcher and Munson curves... Secondly,
things sound better when they’re louder. This is because there’s a “better
balance” in your hearing perception than when they’re at a lower level. This
is why the salesperson at the stereo store will crank up the volume when
you’re buying speakers... they sound good that way... everything does.
3. Plot the intersection of the two values on the chart of the Fletcher and
Munson curves
4. Find the nearest curve contour and check what the value of that curve
is at 1 kHz
The idea is that all sounds along a single Fletcher and Munson contour
have the same apparent loudness level, and therefore are given the same
value in phons.
Note: I got an email from Bert Noeth, a professor teaching sound and
acoustics in Belgium, who tells me that, in Europe, the lines of equal loudness
are called “isophons.”
4.4.3 Sones
There is another frequency-dependent amplitude measurement called the
Sone – but you’ll never see it except in definitions and psychoacoustics tests,
so I’ll just say “they exist” and if you want more information, check out the
textbook. We won’t speak of them again...
4.6 Masking
Let’s go back to the bottom curve on the Fletcher and Munson graphs. This
contour tells us the absolute minimum of our abilities to perceive sound. If
we measure a sound to have a pressure that, when plotted on the same
graph is below the line, then we can’t hear it. Sometime after Fletcher and
Munson, however, there was a discovery that this curve was not an absolute.
In fact, it moves around quite a bit depending on what you’re hearing at
the time. It tends to move up and surround sounds that you are hearing.
I’ll explain.
If you play a relatively loud 1 kHz sine wave, the spectrum of the tone
will be plotted on a Frequency vs. Amplitude graph as a vertical line.
If you then play another sine tone (in addition to the first) with a close
frequency to 1 kHz but a comparatively low amplitude, you won’t hear it.
If you then raise the amplitude of the second tone until you can hear
it, plot that point on a graph, and repeat the process with other nearby
frequencies, you’ll wind up with a graph that looks like this:
This is telling us that, if I play a sine tone at 1 kHz with an amplitude
of about 60 dBspl, I will not be able to hear any simultaneously sounding
tone which, when plotted on the same graph, is below the dotted line.
This is the simultaneous masking curve. Others can be plotted for forward
and backward masking, which occur when the two tones happen at
different times.
4.7 Localization
Lord Rayleigh to Wightman and Kistler
How do you localize sound? For example, if you close your eyes and you
hear a sound and you point and say “the sound came from over there,” how
did you know? And, how good are you at it?
Well, you have two general things to sort out
The first thing you rely on is the interaural (a fancy word meaning
“between the two ears” give or take) time of arrival of the sound. If the
right ear hears the sound first, then the sound is on your right; if the left ear
hears the sound first, then the sound is on your left. If the two ears get the
sound simultaneously, then the sound is directly ahead, or above, or below
or behind you.
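To put rough numbers on the interaural time difference, here is a minimal sketch that treats the ears as two points in free space and ignores the head itself. The 0.18 m ear spacing and the 344 m/s speed of sound are assumptions, and `itd_seconds` is a hypothetical name. The sign tells you which ear hears the sound first. A source straight ahead and one straight behind both give zero difference, which is why the time of arrival alone cannot separate front from back:

```python
import math

EAR_SPACING = 0.18      # m, assumed distance between the ears
SPEED_OF_SOUND = 344.0  # m/s

def itd_seconds(azimuth_deg):
    """Interaural time difference for a distant source (head ignored).

    Azimuth 0 = straight ahead, +90 = hard right, 180 = straight behind.
    Positive result: the right ear hears the sound first."""
    return EAR_SPACING * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND

for az in (0, 30, 90, 150, 180):
    print(f"{az:3d} deg: {itd_seconds(az) * 1e6:7.1f} us")
```

Note that 30 degrees and 150 degrees (front right and rear right) produce the same time difference in this simple model.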
The next thing you rely on is the interaural amplitude difference. If the
sound is louder in the right ear, then the sound is probably on your right.
Interestingly, if the sound is louder in your right ear, but arrives in your left
ear first, then your brain decides that the interaural time of arrival is the
more important cue and basically ignores the amplitude information.
You also have these things sticking out of your head which most people
call their ears but are really called your pinnae (1 pinna, 2 pinnae). These
things bounce sound around inside them differently depending on which di-
rection the sound is coming from. They tend to block really high frequencies
coming from the rear (because they stick out a bit...) so rear sources sound
“darker” than front sources. These are of a little more help when you turn
your head back and forth a bit (which you do involuntarily anyway...)
Once upon a time, a guy named Lord Rayleigh wrote a book called “The
Theory of Sound.” He said that the brain uses the phase difference of low
frequencies to sort out where things are, whereas for high frequencies, the
amplitude differences between the two ears are used. This is a pretty good
estimation, although there are a couple of people by the names of Wightman
and Kistler in the States working for NASA doing a lot of research in the
matter.
Research in this field is a big thing these days because of all the compa-
nies trying to make virtual reality machines.
• reflection patterns
• direct-to-reverberant ratio
• high-frequency content
reverb relative to the dry sound. This is essentially creating the same effect
we have in real life. We’ve seen in Section ?? that, as you get farther and
farther away from a sound source in a room, the direct sound gets quieter
and quieter, but the energy from the room’s reflections – the reverberation
– stays the same. Therefore the direct-to-reverberant level ratio gets smaller
and smaller.
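The direct-to-reverberant argument can be sketched numerically. This sketch assumes that the direct sound follows the inverse square law (falling about 6.02 dB per doubling of distance) while the reverberant level is the same everywhere in the room. The 80 dB level at 1 m and the 65 dB reverberant level are made-up values for illustration, and the function names are hypothetical:

```python
import math

def direct_level_db(distance_m, level_at_1m=80.0):
    # Inverse square law: pressure falls as 1/r, so level drops by 20*log10(r)
    return level_at_1m - 20 * math.log10(distance_m)

def direct_to_reverberant_db(distance_m, reverberant_level=65.0):
    # Reverberant level assumed constant throughout the room
    return direct_level_db(distance_m) - reverberant_level

for r in (1, 2, 4, 8, 16):
    print(f"{r:2d} m: D/R = {direct_to_reverberant_db(r):+.1f} dB")
```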
[Figure 4.1 diagram: noise descriptors (level; narrow or wide bandwidth; high vs. low pitch; electrical vs. program-independent) mapped to measurements (SNR of the recording and playback systems, typically hum caused by grounding issues; absolute SNR; SNR relative to the signal; room noise).]
Figure 4.1: My own personal, unproven and un-researched list of descriptions of sound qualities,
possibly even bordering on perceptual attributes and how I think they correlate with physical mea-
surements. Please be wary of quoting this list – it’s just a map of what is in my head.
Chapter 5
Electroacoustics
5.1.1 Introduction
Once upon a time, in the days before audio was digital, when you made
a long-distance phone call, there was an actual physical connection made
between the wire running out of your phone and the phone at the other end.
This caused a big problem in signal quality because a lot of high-frequency
components of the signal would get attenuated along the way. Consequently,
booster circuits were made to help make the relative levels of the various
frequencies equal. As a result, these circuits became known as equalizers.
Nowadays, of course, we don’t need to use equalizers to fix the quality of
long-distance phone calls, but we do use them to customize the relative
balance of various frequencies in an audio signal.
In order to look at equalizers and their smaller cousins, filters, we’re
going to have to look at their frequency response curves. This is a description
of how the output level of the circuit compares to the input for various
frequencies. We assume that the input level is our reference, sitting at 0 dB
and the output is compared to this, so if the signal is louder at the output,
we get values greater than 0 dB. If it’s quieter at the output, then we get
negative values.
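Here is a minimal sketch of that comparison, assuming the input and output are given as signal amplitudes (for example, voltages), so that the level difference is 20 log10 of their ratio. The function name is hypothetical:

```python
import math

def gain_db(input_amplitude, output_amplitude):
    """Output level relative to the input (the 0 dB reference)."""
    return 20 * math.log10(output_amplitude / input_amplitude)

print(gain_db(1.0, 1.0))   # 0.0: unchanged
print(gain_db(1.0, 2.0))   # about +6.02: louder at the output
print(gain_db(1.0, 0.5))   # about -6.02: quieter at the output
```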
5.1.2 Filters
Before diving straight in and talking about how equalizers behave, we’ll start
with the basics and look at four different types of filters. Just like a coffee
filter keeps coffee grounds trapped while allowing coffee to flow through, an
audio filter lets some frequencies pass through unaffected while reducing the
level of others.
Low-pass Filter
Figure 5.1: The frequency response of a first-order low pass filter with a cutoff frequency of 1 kHz.
Note that the cutoff frequency is where the response has dropped in level by 3 dB. The slope can
be calculated by dividing the drop in level by the change in frequency that corresponds to that
particular drop.
slope of the frequency response is really -6.02 dB/oct for frequencies more
than one decade above the cutoff frequency.
If we have a higher-order filter, the cutoff frequency is still the one where
the output drops by 3 dB, however the slope changes to a value of −6.02n
dB/oct, where n is the order of the filter. For example, if you have a 3rd-order
filter, then the slope is −6.02 × 3 = −18.06 dB/oct.
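The cutoff and slope rules can be checked against an idealised response. This sketch assumes a Butterworth magnitude response, |H(f)|² = 1/(1 + (f/fc)^(2n)). Butterworth is one common filter family that behaves this way; the text does not say which family its plots use. The function name is hypothetical:

```python
import math

def lowpass_gain_db(f, fc, order):
    """Magnitude response (dB) of an order-n Butterworth low-pass filter."""
    return -10 * math.log10(1 + (f / fc) ** (2 * order))

fc = 1000.0
for n in (1, 2, 3):
    at_cutoff = lowpass_gain_db(fc, fc, n)
    # Level change over one octave, well over a decade above the cutoff
    slope = lowpass_gain_db(200 * fc, fc, n) - lowpass_gain_db(100 * fc, fc, n)
    print(f"order {n}: {at_cutoff:.2f} dB at fc, slope {slope:.2f} dB/oct")
```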
High-pass Filter
A high-pass filter is essentially exactly the same as a low-pass filter, however,
it permits high frequencies to pass through while attenuating low frequencies
as can be seen in Figure 5.2. Just like in the previous section, the cutoff
frequency is where the output has a level of -3.01 dB but now the slope
below the cutoff frequency is positive because we get louder as we increase
in frequency. Just like the low-pass filter, the slope of the high-pass filter is
dependent on the order of the filter and can be calculated using the equation
6.02n dB/oct, where n is the order of the filter.
Figure 5.2: The frequency response of a first-order high pass filter with a cutoff frequency of 1 kHz.
Remember as well that the slope only applies to frequencies that are at
least one decade away from the cutoff frequency.
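Under the same assumption of a Butterworth-style response, the high-pass case is a mirror image of the low-pass case. Gain is still -3.01 dB at the cutoff, but the slope measured more than a decade below the cutoff is positive. This is a minimal sketch; the function name is hypothetical:

```python
import math

def highpass_gain_db(f, fc, order):
    """Magnitude response (dB) of an order-n Butterworth high-pass filter."""
    return -10 * math.log10(1 + (fc / f) ** (2 * order))

fc = 1000.0
for n in (1, 2):
    # One octave (5 Hz to 10 Hz), more than a decade below the cutoff
    slope = highpass_gain_db(10.0, fc, n) - highpass_gain_db(5.0, fc, n)
    print(f"order {n}: {highpass_gain_db(fc, fc, n):.2f} dB at fc, slope +{slope:.2f} dB/oct")
```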
Band-pass Filter
Let’s take a signal and send it through a high-pass filter and a low-pass filter
in series, so the output of one feeds into the input of the other. Let’s also
assume for a moment that the two cutoff frequencies are more than a decade
apart.
The result of this probably won’t hold any surprises. The high-pass filter
will attenuate the low frequencies, allowing the higher frequencies to pass
through. The low-pass filter will attenuate the high frequencies, allowing
the lower frequencies to pass through. The result is that the high and low
frequencies are attenuated, with a middle band (called the passband ) that’s
allowed to pass relatively unaffected.
Bandwidth
This resulting system is called a bandpass filter and it has a couple of specifi-
cations that we should have a look at. The first is the width of the passband.
This bandwidth is calculated using the difference between the two cutoff frequencies, which
we’ll label fc1 for the lower one and fc2 for the higher one. Consequently,
the bandwidth is calculated using the equation:
$BW = f_{c2} - f_{c1}$
So, using the example of the filter frequency response shown in Figure
4, the bandwidth is 10,000 Hz – 20 Hz = 9980 Hz.
Centre Frequency
We can also calculate the middle of the passband using these two frequencies.
It’s not quite so simple as we’d like, however. Unfortunately, it’s not just
the frequency that’s half-way between the low and high frequency cutoffs.
This is because frequency specifications don’t really correspond to the way
we hear things. Humans don’t usually talk about frequency – they talk
about pitches and notes. They say things like “Middle C” instead of “262
Hz.” They also say things like “one octave” or “one semitone” instead of
things like “a bandwidth of 262 Hz.”
Consider that, if we play the A below Middle C on a well-tuned piano,
we’ll hear a note with a fundamental of 220 Hz. The octave above that is
440 Hz and the octave above that is 880 Hz. This means that the bandwidth
of the first of these two octaves is 220 Hz (it’s 440 Hz – 220 Hz), but the
bandwidth of the second octave is 440 Hz (880 Hz – 440 Hz). Despite the
fact that they have different bandwidths, we hear them each as one octave,
and we hear the 440 Hz note as being half-way between the other two notes.
So, how do we calculate this? We have to find what’s known as the geometric
mean of the two frequencies. This can be found using the equation
$f_{centre} = \sqrt{f_{c1} f_{c2}}$ (5.5)
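Here is a minimal sketch of the bandwidth equation and Equation 5.5 (function names are hypothetical). It uses the octave example above to show that the geometric mean lands on the note heard as half-way, while the arithmetic mean does not:

```python
import math

def bandwidth(fc1, fc2):
    return fc2 - fc1

def centre_frequency(fc1, fc2):
    # Geometric (not arithmetic) mean of the two cutoffs, Equation 5.5
    return math.sqrt(fc1 * fc2)

print(centre_frequency(220.0, 880.0))  # 440.0, the note heard half-way between them
print((220.0 + 880.0) / 2)             # 550.0, the arithmetic mean, for comparison
print(bandwidth(20.0, 10000.0))        # 9980.0
```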
Q
Let’s say that you want to build a bandpass filter with a bandwidth of one
octave. This isn’t difficult if you know the centre frequency and if it’s never
going to change. For example, if the centre frequency was 440 Hz, and
the bandwidth was one octave wide, then the cutoff frequencies would be
311 Hz and 622 Hz (we won’t worry too much about how I arrived at these
numbers). What happens if we leave the bandwidth the same at 311 Hz, but
change the centre frequency to 880 Hz? The result is that the bandwidth is
now no longer an octave wide – it’s one half of an octave. So, we have to
link the bandwidth with the centre frequency so that we can describe it in
terms of a fixed musical interval. This is done using what is known as the
quality or Q of the filter, calculated using the equation:
$Q = \frac{f_{centre}}{BW}$ (5.6)
Now, instead of talking about the bandwidth of the filter, we can use
the Q which gives us an idea of the width of the filter in musical terms.
This is because, as we increase the centre frequency, we have to increase the
bandwidth proportionately to maintain the same Q. Notice however, that if
we maintain a centre frequency, the smaller the bandwidth gets, the bigger
the Q becomes, so if you’re used to talking in terms of musical intervals, you
have to think backwards. A big Q is a smaller interval as can be seen in the
plot of a number of different Q’s in Figure 5.8.
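The 311 Hz and 622 Hz cutoffs, and what happens when the same bandwidth is moved to 880 Hz, can be reproduced in a few lines. In this sketch, a one-octave band centred on fc is assumed to run from fc/√2 to fc·√2. The helper `cutoffs_for` (a hypothetical name) solves fc2 − fc1 = BW together with fc1·fc2 = fcentre²:

```python
import math

def q_factor(f_centre, bw):
    return f_centre / bw  # Equation 5.6

def cutoffs_for(f_centre, bw):
    """Lower and upper cutoffs with the given bandwidth and geometric centre."""
    fc1 = (-bw + math.sqrt(bw ** 2 + 4 * f_centre ** 2)) / 2
    return fc1, fc1 + bw

def width_in_octaves(fc1, fc2):
    return math.log2(fc2 / fc1)

# One-octave filter centred on 440 Hz: cutoffs at 440/sqrt(2) and 440*sqrt(2)
lo, hi = 440 / math.sqrt(2), 440 * math.sqrt(2)
bw = hi - lo
print(f"{lo:.0f} Hz to {hi:.0f} Hz, BW = {bw:.0f} Hz, Q = {q_factor(440, bw):.3f}")

# Same bandwidth moved up to 880 Hz: now only about half an octave wide
lo2, hi2 = cutoffs_for(880, bw)
print(f"{width_in_octaves(lo2, hi2):.2f} octaves, Q = {q_factor(880, bw):.3f}")
```

A one-octave filter therefore has a Q of √2 (about 1.414), whatever its centre frequency.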
Figure 5.3: The frequency responses of various bandpass filters with different Q’s and a matched
centre frequency of 1 kHz.
Notice in Figure 5.8 that you can have a very high Q, and therefore a
very narrow bandwidth for a bandpass filter. All of the definitions still hold,
however. The cutoff frequencies are still the points where we’re 3 dB lower
than the maximum value and the bandwidth is still the distance in Hertz
between these two points and so on...
Band-reject Filter
Although bandpass filters are very useful at accentuating a small band of
frequencies while attenuating others, sometimes we want to do the opposite.
We want to attenuate a small band of frequencies while leaving the rest alone.
This can be accomplished using a band-reject filter (also known as a bandstop
filter ) which, as its name implies, rejects (or usually just attenuates) a band
of frequencies without affecting the surrounding material. As can be seen
in Figure 5.4, this winds up looking very similar to a bandpass filter drawn
upside down.
Figure 5.4: The frequency response of a band-reject filter with a centre frequency of 1 kHz.
Notch Filter
Figure 5.5: The frequency response of a notch filter with a centre frequency of 1 kHz.
5.1.3 Equalizers
Unlike its counterpart from the days of long-distance phone calls, a modern
equalizer is a device that is capable of attenuating and boosting frequencies
according to the desire and expertise of the user. There are four basic types
of equalizers, but we’ll have to talk about a couple of issues before getting
into the nitty-gritty.
An equalizer typically consists of a collection of filters, each of which
permits you to control one or more of three things: the gain, centre frequency
and Q of the filter. There are some minor differences in these filters from the
ones we discussed above, but we’ll sort that out before moving on. Also, the
filters in the equalizer may be connected in parallel or in series, depending
on the type of equalizer and the manufacturer.
To begin with, as we’ll see, a filter in an equalizer comes in three basic
models. The first two, the bandpass and the band reject, are typically chosen by
the user by manipulating the gain of the filter. On a decibel scale, positive
gain results in a bandpass, whereas negative gain produces a band reject.
In addition, there is the shelving filter, which is a variation on the high-pass
and low-pass filters.
The principal difference between filters in an equalizer and the filters
defined in Section 5.1.2 is that, in a typical equalizer, instead of attenuating
all frequencies outside the passband, the filter typically leaves them at a
gain of 0 dB. An example of this can be seen in the plot of an equalizer’s
bandpass filter in Figure 5.6. Notice now that, rather than attenuating all
Figure 5.6: The frequency response of a bandpass filter with a centre frequency of 1 kHz, a Q of
4, and a gain of 12 dB in a typical equalizer.
Filter symmetry
Constant Q Filter
Figure 5.7: The frequency responses of bandpass filters with various centre frequencies, a Q of
4, and a gain of 12 dB in a typical equalizer. Blue fc = 250 Hz. Red fc = 500 Hz. Green fc =
1000 Hz. Black fc = 2000 Hz.
Figure 5.8: The frequency responses of bandpass filters with a centre frequency of 1 kHz, various
Q’s, and a gain of 12 dB in a typical equalizer. Black Q = 1. Green Q = 2. Blue Q = 4. Red Q
= 8.
Figure 5.9: The frequency responses of bandpass filters with a centre frequency of 1 kHz, a Q of
4, and various gains from 0 dB to 12 dB in a typical equalizer. Yellow gain = 0 dB. Red gain = 3
dB. Green gain = 6 dB. Blue gain = 9 dB. Black gain = 12 dB.
Figure 5.10: The frequency responses of bandpass filters with a centre frequency of 1 kHz, a Q of
4, and various gains from -12 dB to 0 dB in a typical equalizer. Yellow gain = 0 dB. Red gain =
-3 dB. Green gain = -6 dB. Blue gain = -9 dB. Black gain = -12 dB.
Figure 5.11: The frequency responses of two filters, each with a centre frequency of 1 kHz, and a
Q of 4. The blue curve shows a gain of 12 dB; the black curve, a gain of -12 dB.
There are advantages and disadvantages to this type of filter. The pri-
mary advantage is that you can have a very selective cut if you’re trying
to eliminate a single frequency, simply by increasing the Q. The primary
disadvantage is that you cannot undo what you have done. This statement
is explained in the following section.
Instead of building a filter where the cut and boost always maintain a constant
Q, let’s set about building a filter that is symmetrical – that is to say
that a matching boost and cut at the same centre frequency would result
in the same shape. The nice thing about this design is that, if you take
two such filters and connect them in series and set their parameters to be
the same but opposite gains (for example, both with a centre frequency of
1 kHz and a Q of 2, but one has a boost of 6 dB and the other has a cut
of 6 dB) then they’ll cancel each other out and your output will be iden-
tical to your input. This also applies if you’ve equalized something while
recording – assuming that you live in a perfect world, if you remember your
original settings on the recorded EQ curve, you can undo what you’ve done
by duplicating the settings and inverting the gain.
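One well-known digital filter with this symmetrical (reciprocal) behaviour is the peaking EQ from Robert Bristow-Johnson’s “Audio EQ Cookbook”. It is used here only to illustrate the idea, not as the circuit the text has in mind. In that design, flipping the sign of the gain swaps the numerator and denominator of the filter. A boost followed in series by the matching cut therefore multiplies to 1 (0 dB) at every frequency. The 48 kHz sample rate and the function names are assumptions:

```python
import cmath
import math

def peaking_coefficients(f0, q, gain_db, fs=48000.0):
    """Biquad coefficients (b, a) for the Audio EQ Cookbook peaking filter."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = (1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin)
    a = (1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin)
    return b, a

def response(b, a, f, fs=48000.0):
    """Complex frequency response of a biquad at frequency f (Hz)."""
    z_inv = cmath.exp(-1j * 2 * math.pi * f / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return num / den

boost = peaking_coefficients(1000, 2, +6)
cut = peaking_coefficients(1000, 2, -6)
for f in (250, 1000, 4000):
    # Filters in series: complex responses multiply, so dB gains add
    total = response(*boost, f) * response(*cut, f)
    print(f"{f} Hz: boost {20 * math.log10(abs(response(*boost, f))):+.2f} dB, "
          f"series total {20 * math.log10(abs(total)):+.2f} dB")
```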
Figure 5.12: The frequency responses of various constant Q filters, all with a centre frequency of
1 kHz, gains of either 12 dB or -12 dB (depending on whether it’s a boost or a cut) and various
Q’s. Black Q = 1. Green Q = 2. Blue Q = 4. Red Q = 8.
Figure 5.13: The frequency responses of various reciprocal peak/dip filters, all with a centre fre-
quency of 1 kHz, gains of either 12 dB or -12 dB (depending on whether it’s a boost or a cut) and
various boost Q’s. Black Q = 1. Green Q = 2. Blue Q = 4. Red Q = 8.
Let’s take two reciprocal peak/dip filters, each set with a Q of 2 and a gain
of 6 dB. The only difference between them is that one has a centre frequency
of 1 kHz and the other has a centre frequency of 1.2 kHz. If we use both
of these filters on the same signal simultaneously, we can achieve two very
different resulting frequency responses, depending on how they’re connected.
If the two filters are connected in series (it doesn’t matter what order
we connect them in), then the frequency band that overlaps in the boosted
portion of the two filters’ responses will be boosted twice. In other words,
the signal goes through the first filter and is amplified, after which it goes
through the second filter and the amplified signal is boosted further. This
arrangement is also known as a circuit made of combining filters.
Shelving Filter
The nice thing about high pass and low pass filters is that you can reduce
(or eliminate) things you don’t want, like low-frequency noise from air
conditioners, for example. But what if you want to boost all your low frequencies
instead of cutting all your highs? This is when a shelving filter comes in
handy. The response curve of a shelving filter most closely resembles its
high- or low-pass filter counterpart, with a minor difference. As the
name suggests, the curve of a shelving filter levels out at a specified frequency
called the stop frequency. In addition, there is a second defining frequency
called the turnover frequency which is the frequency at which the response
is 3 dB above or below 0 dB. This is illustrated in Figure 20.
The transition ratio is sort of analogous to the order of the filter and is
calculated using the turnover and stop frequencies as shown below.
$$R_T = \frac{f_{stop}}{f_{turnover}} \qquad (5.7)$$
where $R_T$ is the transition ratio.
The closer the transition ratio is to 1, the greater the slope of the tran-
sition in gain from the unaffected to the affected frequency ranges.
These filters are available as high- and low-frequency shelving units,
boosting high and low frequencies respectively. In addition, they typically
have a symmetrical response. If the transition ratio is less than 1, then the
filter is a low shelving filter. If the transition ratio is greater than 1, then
the filter is a high shelving filter.
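Equation 5.7 and the classification rule translate directly into a few lines of code. This is a minimal sketch, and the function names are illustrative. The example frequencies are also made up for illustration and do not come from any particular unit: a hypothetical low shelf with a 50 Hz stop and 100 Hz turnover, and a hypothetical high shelf with a 10 kHz stop and 5 kHz turnover.

```python
def transition_ratio(f_stop, f_turnover):
    """R_T = f_stop / f_turnover (Equation 5.7)."""
    return f_stop / f_turnover

def shelf_type(f_stop, f_turnover):
    """Classify a shelving filter by its transition ratio."""
    rt = transition_ratio(f_stop, f_turnover)
    if rt < 1:
        return "low shelf"
    if rt > 1:
        return "high shelf"
    raise ValueError("stop and turnover frequencies must differ")

for f_stop, f_turn in ((50.0, 100.0), (10000.0, 5000.0)):
    print(f"R_T = {transition_ratio(f_stop, f_turn):.2f}: {shelf_type(f_stop, f_turn)}")
```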
The disadvantage of these components lies in their potential to boost
frequencies above and below the audible range, causing at the least
wasted amplifier power and interference from local AM radio signals, and at
the worst, loudspeaker damage. For example, if you use a high shelf filter
with a stop frequency of 10 kHz to increase the level of the high end by
12 dB to brighten things up a bit, you will probably also wind up boosting
signals above your hearing range. In a typical case, this may cause some
unpredictable signals from your tweeter due to increased intermodulation
distortion of signals you can’t even hear. To reduce these unwanted effects,
supersonic and subsonic signals can be attenuated using a low-pass or high-pass
filter respectively, with its cutoff just outside the audio band. Using a peaking
filter at the appropriate frequency instead of a filter with a shelving response
can avoid the problem altogether.
The most common application of this type of equalizer is in the tone controls on
home sound systems. These bass and treble controls generally have a maximum
slope of 6 dB per octave and reciprocal characteristics. They are also
frequently seen on equalizer modules on small mixing consoles.
Graphic Equalizer
Graphic equalizers are seen just about everywhere these days, primarily be-
cause they’re intuitive to use. In fact, they are probably the most-used piece
of signal processing equipment in recording. The name “graphic equalizer”
comes from the fact that the device is made up of a number of filters with
centre frequencies that are regularly spaced, each with a slider used for gain
control. The result is that the arrangement of the sliders gives a graphic
representation of the frequency response of the equalizer. The most
common frequency resolutions available are one-octave, two-thirds-octave and
one-third-octave, although resolutions as fine as one-twelfth-octave exist.
The sliders on most graphic equalizers use ISO standardized band center fre-
quencies. They virtually always employ reciprocal peak/dip filters wired in
parallel. As a result, when two adjacent bands are boosted, there remains
a comparatively large dip between the two peaks. This proves to be a great
disadvantage when attempting to boost a frequency between two center fre-
quencies. Drastically excessive amounts of boost may be required at the
band centers in order to properly adjust the desired frequency. This
problem is eliminated in graphic EQs using the much-less-common combining
filters. In this system, the filter banks are wired in series, thus adjacent
bands have a cumulative effect. Consequently, in order to boost a frequency
between two center frequencies, the given filters need only be boosted a
minimal amount to result in a higher-boosted mid-frequency.
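The regular spacing of the bands is easy to compute. The sketch below generates exact base-2 centre frequencies for 1/N-octave bands, using a 1 kHz reference that is assumed here for illustration. The ISO nominal frequencies printed on most graphic equalizers are rounded versions of these values. For example, 1259.9 Hz appears on the front panel as 1.25 kHz, and 793.7 Hz appears as 800 Hz.

```python
import math

def band_centres(fraction, f_low=20.0, f_high=20000.0, f_ref=1000.0):
    """Exact base-2 centre frequencies of 1/fraction-octave bands.

    Adjacent bands are spaced by a factor of 2**(1/fraction); the end bands
    are the ones whose centres lie nearest to f_low and f_high.
    """
    n_min = round(fraction * math.log2(f_low / f_ref))
    n_max = round(fraction * math.log2(f_high / f_ref))
    return [f_ref * 2 ** (n / fraction) for n in range(n_min, n_max + 1)]

thirds = band_centres(3)
print(len(thirds), [round(f) for f in thirds[15:20]])  # 31 [630, 794, 1000, 1260, 1587]
```

For one-third-octave bands, this gives 31 sliders running from about 20 Hz to about 20 kHz. For one-octave bands, it gives 11 sliders running from about 16 Hz to 16 kHz.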
Virtually all graphic equalizers have fixed frequencies and a fixed Q. This
makes them simple to use and quick to adjust; however, they are generally
a compromise. Although quite suitable for general purposes, in situations
where a specific frequency or bandwidth adjustment is required, they will
prove to be inaccurate.
Paragraphic Equalizer
One attempt to overcome the limitations of the graphic equalizer is the
paragraphic equalizer. This is a graphic equalizer with fine frequency adjustment
on each slider. This gives the user the ability to sweep the center frequency
of each filter somewhat, thus giving greater control over the frequency re-
sponse of the system.
Sweep Filters
These equalizers are most commonly found on the input stages of mixing
consoles. They are generally used where more control is required over the
signal than is available with graphic equalizers, yet space limitations restrict
the sheer number of potentiometers available. Typically, the equalizer
section on a console input strip will have one or two sweep filters in addition
to low and high shelving filters with fixed turnover frequencies. The
mid-range filters are usually reciprocal peak/dip filters with
adjustable (or sweepable) center frequencies and fixed Q’s.
The advantage of this configuration is a relatively versatile equalizer
with a minimum of knobs, precisely what is needed on an overcrowded mixer
panel. The obvious disadvantage is its lack of adjustment on the bandwidth,
a problem that is solved with a parametric equalizer.
Parametric Equalizer
A parametric equalizer is one that allows the user to control the gain, centre
frequency and Q of each filter. In addition, these three parameters are
independent – that is to say that adjusting one of the parameters will have
no effect on the other two. They are typically made up of combining filters
and will have either reciprocal peak/dip or constant-Q filters. (Check your
manual to see which you have – it makes a huge difference!) In order to give
the user a wider amount of control over the signal, the frequency ranges of
the filters in a parametric equalizer typically overlap, making it possible to
apply gain or attenuation to the same centre frequency using at least two
filters.
The obvious advantage of using a parametric equalizer lies in the detail
and versatility of control afforded to the user. This comes at a price, however
– it unfortunately takes much time and practice to master the use of a
parametric equalizer.
Semi-parametric equalizer
A less expensive variation on the true parametric equalizer is the
semi-parametric or quasi-parametric equalizer. From the front panel, this device
appears to be identical to its bigger cousin; however, there is a significant
difference between the two. Whereas in a true parametric equalizer the
three parameters are independent, in a semi-parametric equalizer they are
not. As a result, changing the value of one parameter will cause at least one,
if not both, of the other two parameters to change unexpectedly. Consequently,
although these devices are less expensive than a true parametric equalizer, they
are less trustworthy and therefore less useful in real working situations.
5.1.4 Summary
5.1.5 Phase response
So far, we’ve only been looking at the frequency response of a filter or
equalizer. In other words, we’ve been looking at what the magnitude of
the output of the filter would be if we sent sine tones through it. If the
filter has a gain of 6 dB at a certain frequency and we feed it a sine
tone at that frequency, then the amplitude of the output will be 2 times the
amplitude of the input (because a gain of 2 is the same as an increase of about 6
dB). What we haven’t looked at so far is any shift in phase (also known as
phase distortion) that might be incurred by the filtering process. Any time
there is a change in the frequency response of the signal, there is an
associated change in phase response that you may or may not want to worry
about. That phase response is typically expressed as a shift (in degrees) for
a given frequency. Positive phase shifts mean that the signal is delayed in
phase whereas negative phase shifts indicate that the output is ahead of the
input.
“The output is ahead of the input!?” I hear you cry. “How can the
output be ahead of the input? Unless you’ve got one of those new digital
filters that can see into the near future...” Well, it’s actually not as strange
as it sounds. The thing to remember here is that we’re talking about a sine
wave – so don’t think about using an equalizer to help your drummer get
ahead of the beat... It doesn’t mean that the whole signal comes out earlier
than it went in. This is because we’re not talking about negative delay –
it’s negative phase.
Let’s look at a typical example. Figure 24 below shows the phase re-
sponse for a typical filter found in an average equalizer with a frequency
response shown in Figure 10. Note that some frequencies have a negative
phase shift while others have a positive phase shift. If you’re looking really
carefully, you may notice a relationship between the slope of the frequency
response and the polarity of the phase response – but if you didn’t notice
this, don’t worry...
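Here is a sketch that reproduces the general shape of such curves. It uses the same assumptions as the earlier examples (Audio EQ Cookbook peak/dip biquads at 48 kHz), so these are not necessarily the filters used to draw the figures. Note that numpy.angle uses the usual mathematical sign convention, in which a positive angle means the output leads the input, so flip the sign to match the convention described in this section. For this particular filter, the largest phase shift depends only on the gain: it is just under 37° for 12 dB. A higher Q squeezes the same phase swing into a narrower band around the centre frequency, which makes the phase curve steeper there.

```python
import numpy as np

FS = 48000.0  # assumed sampling rate (Hz)

def peaking_biquad(fc, gain_db, q, fs=FS):
    """Peak/dip biquad coefficients (b, a) from the Audio EQ Cookbook."""
    A = 10 ** (gain_db / 40.0)
    w0 = 2 * np.pi * fc / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

def freq_response(b, a, f, fs=FS):
    """Complex response H(z) = B(z)/A(z) evaluated at frequencies f (Hz)."""
    zinv = np.exp(-2j * np.pi * np.asarray(f, dtype=float) / fs)
    num = b[0] + b[1] * zinv + b[2] * zinv ** 2
    den = a[0] + a[1] * zinv + a[2] * zinv ** 2
    return num / den

f = np.geomspace(100, 10000, 20001)   # 100 Hz to 10 kHz, log-spaced
results = {}
for q in (1, 2, 4, 8):
    b, a = peaking_biquad(1000, 12, q)
    phase = np.degrees(np.angle(freq_response(b, a, f)))   # + means lead here
    near_fc = np.degrees(np.angle(freq_response(b, a, [1050.0])))[0]
    results[q] = (np.max(np.abs(phase)), near_fc)
    print(f"Q = {q}: max |phase| = {results[q][0]:.1f} deg, "
          f"phase at 1.05 kHz = {near_fc:.1f} deg")
```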
Minimum phase
While it’s true that a change in frequency response of a signal necessarily
implies that there is a change in its phase, you don’t have to have the
same phase shift for the same frequency response change. In fact, different
manufacturers can build two filters with centre frequencies of 1 kHz, gains
of 12 dB and Q’s of 4. Although the frequency responses of the two filters
will be identical, their phase responses can be very different.
[Plot: phase (degrees, -40 to 40) vs. frequency (Hz, log scale, 100 Hz to 10 kHz)]
Figure 5.14: The phase responses of bandpass filters with a centre frequency of 1 kHz, various Q’s,
and a gain of 12 dB in a typical equalizer. Black Q = 1. Green Q = 2. Blue Q = 4. Red Q = 8.
(Compare these curves to the plot in Figure 10)
[Plot: phase (degrees, -40 to 40) vs. frequency (Hz, log scale, 100 Hz to 10 kHz)]
Figure 5.15: The phase responses of bandpass filters with a centre frequency of 1 kHz, various Q’s,
and a gain of -12 dB in a typical equalizer. Black Q = 1. Green Q = 2. Blue Q = 4. Red Q = 8.
(Compare these curves to the plot in Figure 5.14)
You may occasionally hear the term minimum phase to describe a filter.
This is a filter that has the frequency response that you want, and incurs
the smallest (hence “minimum”) shift in phase to achieve that frequency
response.
Two things to remember about minimum phase filters: 1) Just because
they have the minimum possible phase shift doesn’t necessarily imply that
they sound the best. 2) A minimum phase filter can be “undone” – that is to
say that if you put your signal through a minimum phase filter, it is possible
to find a second minimum phase filter that will reverse all the effects of the
first, giving you exactly the signal you started with.
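Both points can be illustrated with a pair of tiny FIR filters, offered here as a sketch. The zero radius of 0.9 and the zero angle of 1 kHz at 48 kHz are arbitrary choices. Reversing the coefficients of a filter whose zeros lie inside the unit circle moves those zeros outside it. The magnitude response stays the same, but the phase response changes. Only the version with its zeros inside the circle (the minimum phase one) can be undone by a stable inverse filter, whose poles sit exactly where the original zeros were.

```python
import numpy as np

FS = 48000.0                            # assumed sampling rate (Hz)
r, theta = 0.9, 2 * np.pi * 1000 / FS   # zero radius and angle (assumed)

# Two-zero FIR with zeros at r*exp(+/-j*theta): minimum phase because r < 1.
b_min = np.array([1.0, -2 * r * np.cos(theta), r ** 2])
b_max = b_min[::-1]   # time-reversed copy: zeros move out to radius 1/r

def fir_response(b, w):
    """H(e^jw) = sum_k b[k] e^(-jwk) for an FIR filter."""
    k = np.arange(len(b))
    return np.exp(-1j * np.outer(w, k)) @ b

w = np.linspace(0, np.pi, 1024)
H_min, H_max = fir_response(b_min, w), fir_response(b_max, w)
print("same magnitude:", np.allclose(np.abs(H_min), np.abs(H_max)))
print("zero radii:", np.abs(np.roots(b_min)), np.abs(np.roots(b_max)))

def inverse_filter(y, b):
    """Run y through 1/B(z); stable only if B's zeros are inside the unit circle."""
    x = np.zeros_like(y)
    for n in range(len(y)):
        acc = y[n]
        for k in range(1, len(b)):
            if n - k >= 0:
                acc -= b[k] * x[n - k]
        x[n] = acc / b[0]
    return x

signal = np.random.default_rng(0).standard_normal(2000)
filtered = np.convolve(signal, b_min)[: len(signal)]
restored = inverse_filter(filtered, b_min)
print("undone exactly:", np.allclose(restored, signal))
```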
Linear Phase
If you plot the phase response of a filter for all frequencies, chances are
you’ll get a smooth, fancy-looking curve like the ones in Figure 24. Some
filters, on the other hand, have a phase response plot that’s a straight line
if you graph the response on a linear frequency scale (instead of a log scale
like we normally do...). This line usually slopes upwards so the higher the
frequency, the bigger the phase change. In fact, this would be exactly the
phase response of a straight delay line – the higher the frequency, the more
of a phase shift that’s incurred by a fixed delay time. If the delay time is 0,
then the straight line is a horizontal one at 0◦ for all frequencies.
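The straight-line relationship for a delay is simple arithmetic. A delay of τ seconds shifts a sine wave of frequency f by 360·f·τ degrees. The sketch below uses a 1 ms delay, which is an arbitrary choice. It computes the shift directly and checks it against the unwrapped phase of the delay's complex response, e^(-j2πfτ). The sign is flipped so that a delay shows up as a positive phase shift, matching the convention used in this section.

```python
import numpy as np

def delay_phase_deg(f, delay_s):
    """Phase shift of a pure delay, in degrees (positive = delayed)."""
    return 360.0 * np.asarray(f, dtype=float) * delay_s

tau = 0.001                              # 1 ms delay (assumed)
f = np.linspace(0, 20000, 2001)          # linear frequency axis, 10 Hz steps
h = np.exp(-2j * np.pi * f * tau)        # complex response of the delay
measured = -np.degrees(np.unwrap(np.angle(h)))

print(delay_phase_deg([250, 500, 1000], tau))
print("straight line:", np.allclose(measured, delay_phase_deg(f, tau)))
```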
Any filter whose phase response is a straight line is called a linear phase
filter. Be careful not to jump to the conclusion that, because it’s a linear
phase filter, it’s better than anything else. While there are situations where
such a filter is useful, it won’t work well in all situations or correct all
problems. Different intentions require different filter characteristics.
Ringing
The phase response of a filter is typically strongly related to its Q. The higher
the Q (and therefore the smaller the bandwidth) the greater the change in
phase around the centre frequency. This can be seen in Figure 24 above.
Notice that the higher the Q, the higher the slope of the phase response
at the centre frequency of the filter. When the slope of the phase response
of a filter gets very steep (in other words, when the Q of the filter is very
Figure 5.16: Ringing caused by minimum phase bandpass filters with centre frequencies of 1 kHz
and various Q’s. The input signal is white noise, abruptly cut to digital zero as is shown in the top
plot. There are at least three things to note: 1) The higher the Q, the longer the filter will ring at
the centre frequency after the input signal has stopped. 2) The higher the Q, the more the output
signal approaches a sine wave at the centre frequency. 3) Even a filter with a Q as low as 1 rings
– although this will likely not be audible due to psychoacoustic masking effects.
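You can estimate how Q relates to ringing time from the filter's poles. This sketch assumes Audio EQ Cookbook band-pass biquads at 48 kHz, and the filters in the figure may be different. The ringing envelope shrinks by a factor equal to the pole radius r on every sample, so the time it takes to decay by 60 dB follows directly from r. The result comes out close to proportional to Q, at roughly 2.2 ms per unit of Q for a 1 kHz centre frequency.

```python
import numpy as np

def ring_time_ms(fc, q, fs=48000.0, decay_db=60.0):
    """Time (ms) for a cookbook band-pass biquad's ringing to decay by decay_db.

    The cookbook band-pass denominator is a0 = 1 + alpha, a1 = -2 cos(w0),
    a2 = 1 - alpha. For Q > 0.5 the poles form a complex-conjugate pair,
    and their radius r satisfies r**2 = a2 / a0.
    """
    w0 = 2 * np.pi * fc / fs
    alpha = np.sin(w0) / (2 * q)
    r = np.sqrt((1 - alpha) / (1 + alpha))
    # Envelope after n samples is r**n; solve r**n = 10**(-decay_db / 20).
    n = (decay_db / 20) * np.log(10) / -np.log(r)
    return 1000 * n / fs

times = {q: ring_time_ms(1000, q) for q in (1, 2, 4, 8)}
for q, t in times.items():
    print(f"Q = {q}: rings for about {t:.1f} ms (60 dB decay)")
```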
5.1.6 Applications
All this information is great – but why and how would you use an equalizer?
Well, there are lots of different reasons.
It’s virtually impossible to give a list of “tips and tricks” in this category,
because every instrument and every microphone in every recording situation
will be different. There are times when you’ll want to use an equalizer to
compensate for deficiencies in the signal because you couldn’t afford a better
mic for that particular gig. On the other hand, there may be occasions
where you have the most expensive microphone in the world on a particular
instrument and it still needs a little tweaking to fix it up. There are, however,
a couple of good rules to follow when you’re in this game.
First of all – don’t forget that you can use an equalizer to cut as easily
as you can boost. Consider a situation where you have a signal that has too
much bass – there are two possible ways to correct the problem. You could
increase the mids and highs to balance, or you could turn down the bass.
There are as many situations where one of these is the correct answer as
there are situations where the other answer is more appropriate. Try both
unless you’re in a really big hurry.
Second of all – don’t touch the equalizer before you’ve heard what you’re
tweaking. I often notice when I go to a restaurant that there are a huge
number of people who put salt and pepper on their meal before they’ve
even tasted a single morsel. Doesn’t make much sense... Hand them a plate
full of salt and they’ll still shake salt on it before raising a hand to touch
their fork. The same goes for equalization. Equalize to fix a problem that
you can hear – not because you found a great EQ curve that worked well
on a kick drum at the last session.
Thirdly – don’t overdo it. Or at least, overdo it to see how it sounds
when it’s overdone, then bring it back. Again, back to a restaurant analogy
– you know that you’re in a restaurant that knows how to cook steak when
there’s a disclaimer on the menu that says something to the effect of “We
are not responsible for steaks ordered well done.” Everything in moderation
– unless, of course, you’re intending to plow straight through the fields of
moderation and into the barn of excess.
Fourthly, there are a number of general descriptions that indicate problems
that can be fixed, or at least tamed with equalization. For example, when
someone says that the sound is “muddy,” you could probably clean this up
by reducing the area around 125 – 250 Hz with a low-Q filter. The table
below gives a number of basic examples, but there are plenty more – ask
around...
One last trick here applies when you hear a resonant frequency sticking
out, and you want to get rid of it, but you just don’t know what the exact
frequency is. You know that you need to use a filter to reduce a frequency –
but finding it is going to be the problem. The trick is to search and destroy
5.1.8 Loudness
Although we rarely like to admit it, we humans aren’t perfect. This is true
in many respects, but for the purposes of this discussion, we’ll concentrate
specifically on our abilities to hear things. Unfortunately, our ears don’t have
the same frequency response at all listening levels. At very high listening
levels, we have a relatively flat frequency response, but as the level drops,
so does our sensitivity to high and low frequencies. As a result, if you mix a
tune at a very high listening level and then reduce the level, it will appear
to lack low end and high end. Similarly, if you mix at a low level and turn
it up, you’ll tend to get more low end and high end.
One possible use for an equalizer is to compensate for the perceived lack
of information in extreme frequency ranges at low listening levels. Essen-
tially, when you turn down the monitor levels, you can use an equalizer to
increase the levels of the low and high frequency content to compensate for
deficiencies in the human hearing mechanism. This filtering is essentially
what is engaged when you press the “loudness” button on most home
stereo systems. Of course, the danger with such equalization is that you
don’t know what frequency ranges to alter, and how much to alter them –
so it is not advisable to do such compensation when you’re mixing,
only when you’re at home listening to something that’s already been mixed.
Let’s say that you’ve got a recording of an electric bass on a really noisy
analog tape deck. Since most of the perceivable noise is going to be high-
frequency stuff and since most of the signal that you’re interested in is
going to be low-frequency stuff, all you need to do is to roll off the high
end to reduce the noise. Of course, this is the best of all possible worlds.
It’s more likely that you’re going to be coping with a signal that has some
high-frequency content (like your lead vocals, for example...) so if you start
rolling off the high end too much, you start losing a lot of brightness and
sparkle from your signal, possibly making the end result worse than what you
started with. If you’re using equalization to reduce noise levels, don’t forget
to hit the “bypass” switch of the equalizer once in a while to
hear the original. You may find when you refresh your memory that you’ve
gone a little too far in your attempts to make things better.
Almost every console in the world has a little button on every input strip
that has a symbol that looks like a little ramp with the slope on the left.
This is a high-pass filter that is typically a second-order filter with a cutoff
frequency around 100 Hz or so, depending on the manufacturer and the year
it was built. The reason that filter is there is to help the recording or sound
reinforcement engineer get rid of low-frequency noise like “stage rumble”
or microphone handling noise. In actual fact, this filter won’t eliminate all
of your problems, but it will certainly reduce them. Remember that most
signals don’t go below 100 Hz (this is about an octave and a half below
middle C on a piano) so you probably don’t need everything that comes
from the microphone in this frequency range – in fact, chances are, unless
you’re recording pipe organ, electric bass or space shuttle launches, you
won’t need nearly as much as you think below 100 Hz.
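To get a feel for how much this kind of filter actually removes, here is a sketch that assumes a second-order Butterworth high-pass response with a 100 Hz cutoff. Consoles differ, so treat the numbers as illustrative. Below the cutoff, the attenuation grows at 12 dB per octave. That is why the filter reduces rumble considerably without eliminating it.

```python
import math

def butterworth_highpass_db(f, fc=100.0, order=2):
    """Magnitude (dB) of an nth-order Butterworth high-pass filter at f (Hz)."""
    return -10 * math.log10(1 + (fc / f) ** (2 * order))

for f in (200, 100, 50, 25):
    print(f"{f:4d} Hz: {butterworth_highpass_db(f):6.1f} dB")
```

This gives roughly -0.3 dB at 200 Hz, -3 dB at 100 Hz, -12 dB at 50 Hz and -24 dB at 25 Hz.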
Hummmmmmm...
There are many reasons, forgivable and unforgivable, why you may wind
up with an unwanted hum in your recording. Perhaps you work with a
poorly-installed system. Perhaps your recording took place under a buzzing
streetlamp. Whatever the reason, you get a single frequency (and perhaps
a number of its harmonics) singing all the way through your recording. The
nice thing about this situation is that, most of the time, the hum is at a
predictable frequency (depending on where you live, it’s likely a multiple
of either 50 Hz or 60 Hz) and that frequency never changes. Therefore, in
order to reduce, or even eliminate this hum, you need a very narrow band-
reject filter with a lot of attenuation. Just the sort of job for a notch filter.
The drawback is that you also attenuate any of the music that happens to
be at or very near the notch centre frequency, so you may have to reach a
compromise between eliminating the hum and having too detrimental an
effect on your signal.
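A hum remover built from notch filters could be sketched as follows. The sketch assumes Audio EQ Cookbook notch biquads at 48 kHz with a Q of 30, and it places notches at the first three harmonics of 60 Hz (use 50 Hz where that is the mains frequency). The response is essentially zero at each notch frequency, while frequencies between the notches and away from them are barely touched. Lowering the Q widens each notch, which eats into more of the music.

```python
import numpy as np

FS = 48000.0  # assumed sampling rate (Hz)

def notch_biquad(f0, q, fs=FS):
    """Notch biquad coefficients (b, a) from the Audio EQ Cookbook."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1.0, -2 * np.cos(w0), 1.0])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b / a[0], a / a[0]

def freq_response(b, a, f, fs=FS):
    """Complex response H(z) = B(z)/A(z) evaluated at frequencies f (Hz)."""
    zinv = np.exp(-2j * np.pi * np.asarray(f, dtype=float) / fs)
    num = b[0] + b[1] * zinv + b[2] * zinv ** 2
    den = a[0] + a[1] * zinv + a[2] * zinv ** 2
    return num / den

MAINS = 60.0                               # mains frequency (assumed)
f = np.array([60.0, 120.0, 180.0, 90.0, 1000.0])
h = np.ones(len(f), dtype=complex)
for harmonic in (1, 2, 3):                 # notch the fundamental and two harmonics
    h *= freq_response(*notch_biquad(harmonic * MAINS, q=30), f)

for freq, mag in zip(f, np.abs(h)):
    print(f"{freq:6.0f} Hz: |H| = {mag:.4f}")
```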
Dynamic enhancement
Take your signal and, using filters, divide it into two bands with a crossover
frequency at around 5 kHz. Compress the higher band using a fast attack
and release times, and adjust the output level of the compressor so that when
the signal is at a peak level, the output of the compressor summed with the
lower frequency band results in a flat frequency response. When the signal
level drops, the low frequency band will be reduced more than the high
frequency band and a form of high-frequency enhancement will result.
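A little arithmetic gives an estimate of the size of the effect. Assume an idealized compressor with a ratio of R:1, and assume the high band stays above threshold. If the input level drops by Δ dB, the uncompressed low band falls by the full Δ dB. The compressed high band, however, falls by only Δ/R dB. The high band therefore ends up raised relative to the low band by

$$\Delta_{HF} = \Delta - \frac{\Delta}{R} = \Delta\left(1 - \frac{1}{R}\right)$$

For example, with a hypothetical 4:1 ratio, a 12 dB drop in level leaves the high band 12(1 - 1/4) = 9 dB louder, relative to the low band, than it was at the peak.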
Dynamic Presence
In order to add a sensation of “presence” to the signal, use the technique
described in Section 3.4.1 but compress the frequency band in the 2 kHz to
De-Essing
There are many instances where a close-mic technique is used to record a
narrator and the result is a signal that emphasizes the sibilant material in
the signal – in particular the “s” sound. Since the problem is due to an excess
of high frequency, one option to fix the issue could be to simply roll off high
frequency content using a low-pass filter or a high-frequency shelf. However,
this will have the effect of dulling all other material in the speech, removing
not only the “s’s” but all brightness in the signal. The goal, therefore, is
to reduce the gain of the signal when the letter “s” is spoken. This can
be accomplished using an equalizer and a compressor with a side chain. In
this case, the input signal is routed to the inputs of the equalizer and the
compressor in parallel. The equalizer is set to boost high frequencies (thus
making the “s’s” even louder...) and its output is fed to the side chain input
of the compressor. The compression parameters are then set so that the
signal is not normally compressed, however, when the “s” is spoken, the
higher output level from the equalizer in the side chain triggers compression
on the signal. The output of the compressor has therefore been “de-essed”
or reduced in sibilance.
Although it seems counterintuitive, don’t forget that, in order to reduce
the level of the high frequencies in the output of the compressor, you have
to increase the level of the high frequencies at the output of the equalizer in
this case.
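The signal flow described above can be sketched in a few lines of code. This is a minimal illustration, not a production de-esser, and several parts of it are assumptions. The side-chain "equalizer" is a simple first-order high-frequency emphasis. The detector is a basic attack/release envelope follower. The threshold, ratio and time constants are arbitrary choices, and so are the test signals: a 200 Hz tone stands in for a vowel, followed by a 6 kHz burst that stands in for an "s". The key point is that the gain reduction is computed from the emphasized copy but applied to the original signal.

```python
import numpy as np

def de_ess(x, fs, threshold_db=-20.0, ratio=4.0,
           attack_ms=1.0, release_ms=50.0, emphasis=0.9):
    """Side-chain de-esser sketch. Returns (output, gain_db)."""
    # Side-chain "equalizer": first-order high-frequency emphasis (assumed).
    side = np.empty_like(x)
    side[0] = x[0]
    side[1:] = x[1:] - emphasis * x[:-1]

    # Envelope follower on the side chain, with separate attack and release.
    a_att = np.exp(-1.0 / (attack_ms * 1e-3 * fs))
    a_rel = np.exp(-1.0 / (release_ms * 1e-3 * fs))
    env = np.empty_like(x)
    level = 0.0
    for n, v in enumerate(np.abs(side)):
        coeff = a_att if v > level else a_rel
        level = coeff * level + (1.0 - coeff) * v
        env[n] = level

    # Static gain computer: no change below threshold, ratio:1 above it.
    env_db = 20 * np.log10(np.maximum(env, 1e-9))
    gain_db = np.minimum(threshold_db - env_db, 0.0) * (1.0 - 1.0 / ratio)

    # The gain is applied to the ORIGINAL signal, not to the side chain.
    return x * 10 ** (gain_db / 20), gain_db

fs = 48000
t = np.arange(int(0.6 * fs)) / fs
vowel = 0.5 * np.sin(2 * np.pi * 200 * t) * (t < 0.5)   # stand-in for a vowel
ess = 0.5 * np.sin(2 * np.pi * 6000 * t) * (t >= 0.5)   # stand-in for an "s"
y, gain_db = de_ess(vowel + ess, fs)
print(f"gain during vowel: {gain_db[int(0.4 * fs)]:.1f} dB")
print(f"gain during 's':   {gain_db[int(0.58 * fs)]:.1f} dB")
```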
Pop-reduction
A similar problem to de-essing is the “pop” that occurs when a singer’s plo-
sive sounds (p’s and b’s) cause a thump at the diaphragm of the microphone.
There is a resulting overload in the low frequency component of the signal
that can be eliminated using the same technique described in Section 3.4.3
where the low frequencies (250 Hz and below) are boosted in the equalizer
instead of the high frequency components.
Figure 5.17: The gain response or transfer function of a device with a gain of 1 for all input levels.
Essentially, output = input.
Figure 5.18: The gain response (or transfer function) of a device with a different gain for different
input levels. Note that a 2 dB rise in level at the input results in a 1 dB rise in level at the output.
level in decibels to change in output level in decibels. So, if the output goes
up 1 dB for every 2 dB increase in level at the input, then we have a 2:1
compression ratio. The higher the compression ratio, the greater the effect
on the dynamic range.
Notice in Figure 2 that there is one input level (in this case, 0 dBV)
that results in a gain of 1 – that is to say that the output is equal to the
input. That input level is known as the rotation point of the compressor.
The reason for this name isn’t immediately obvious in Figure 2, but if we
take a look at a number of different compression ratios plotted on the same
graph as in Figure 3, then the reason becomes clear.
Figure 5.19: The gain response of various compression ratios with the same rotation point (at 0
dBV). Blue = 2:1 compression ratio, red = 3:1, green = 5:1, black = 10:1.
Figure 5.20: A device which exhibits unity gain for input signals with a level of less than 0 dBV and
a compression of 2:1 for input signals with a level of greater than 0 dBV.
gain to all signal levels. Above the threshold, the device changes its gain
according to the input level. This sudden bend in the transfer function at
the threshold is called the knee in the response.
In the case of the plot shown in Figure 4, the rotation point of the
compressor is the same as the threshold. This is not necessarily the case,
however, as the curve in Figure 5 shows.
This device applies a gain of 5 dB to all signals below the threshold, so
an input level of -20 dBV results in an output of -15 dBV and an input at
-10 dBV results in an output of -5 dBV. Notice that the threshold is still
at 0 dBV (because it is the input level over which the device changes its
behaviour). However, now the rotation point is at 10 dBV.
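The relationship between the gain below threshold, the threshold, the ratio and the rotation point can be written as a small function. This is a sketch, and the compression ratio is not stated for this example. A 2:1 ratio is assumed here because that is the value that puts the rotation point at 10 dBV. Solving output = input on the compressing part of the curve gives a rotation point of T + G·R/(R - 1), where T is the threshold, G is the gain below threshold and R is the ratio.

```python
def static_output_db(level_in, threshold, ratio, gain_below=0.0):
    """Output level (dB) on a hard-knee compressor's static curve.

    Below threshold: a fixed gain. Above threshold: the output rises
    1/ratio dB per dB of input, continuing from the below-threshold line.
    """
    if level_in <= threshold:
        return level_in + gain_below
    return threshold + gain_below + (level_in - threshold) / ratio

def rotation_point_db(threshold, ratio, gain_below=0.0):
    """Input level where output == input on the compressing part (ratio > 1)."""
    return threshold + gain_below * ratio / (ratio - 1.0)

for level in (-20.0, -10.0, 10.0):
    print(f"{level:+.0f} dBV in -> {static_output_db(level, 0.0, 2.0, 5.0):+.0f} dBV out")
print("rotation point:", rotation_point_db(0.0, 2.0, 5.0), "dBV")
```

With no gain below threshold, the formula collapses to the threshold itself, which is the case shown in Figure 4.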
Let’s look at an example of a compressor with a gain of 1 below threshold,
a threshold at 0 dBV and different compression ratios. The various curves
for such a device are shown in Figure 6 below. Notice that, below the
threshold, there is no difference in any of the curves. Above the threshold,
however, the various compression ratios result in very different behaviours.
There are two basic “styles” in compressor design when it comes to the
threshold. Some manufacturers like to give the user control over the threshold
level itself, allowing them to change the level at which the compressor
“kicks in.” This type of compressor typically has a unity gain below threshold,
although this isn’t always the case. Take a look at Figure 7. This shows
a number of curves for a device with a compression ratio of 2:1, unity gain
below threshold and an adjustable threshold level.
Figure 5.21: An example of a device where the threshold is not the rotation point. The threshold
is 0 dBV and the rotation point is 10 dBV.
Figure 5.22: A plot showing a number of curves representing various settings of the compression
ratio with a unity gain below threshold and a threshold of 0 dBV. red = 1.25:1, blue = 2:1, green
= 4:1, black = 10:1.
Figure 5.23: A plot showing a number of curves representing various settings of the threshold with a
unity gain below threshold and a compression ratio of 2:1. red threshold = -10 dBV, blue threshold
= -5 dBV, green threshold = 0 dBV, black threshold = 5 dBV.
The advantage of this design is that the bulk of the signal, which is typ-
ically below the threshold, remains unchanged – by changing the threshold
level, we’re simply changing the level at which we start compressing. This
makes the device fairly intuitive to use, but not necessarily a good design
for the final sound quality.
Let’s think about the response of this device (with a 2:1 compression
ratio). If the threshold is turned up to 12 dBV, then any signal coming in
that’s less than 12 dBV will go out unchanged. If the input signal has a
level of 20 dBV, then the output will be 16 dBV, because the input went 8
dB above threshold and the compression ratio is 2:1, so the output goes up
4 dB.
If the threshold is turned down to -12 dBV, then any signal coming in
that’s less than -12 dBV will go out unchanged. If the input signal has a
level of 20 dBV, then the output will be 4 dBV, because the input went 32
dB above threshold and the compression ratio is 2:1, so the output goes up
16 dB.
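The two examples above can be checked with a few lines of code. This is a sketch of an idealized hard-knee compressor with unity gain below threshold. It also confirms a handy rule of thumb that follows from the same arithmetic. For a signal that stays above threshold, moving the threshold by some number of dB moves the output by that amount times (1 - 1/ratio).

```python
def compress_db(level_in, threshold, ratio):
    """Unity gain below threshold, ratio:1 above (hard knee)."""
    if level_in <= threshold:
        return level_in
    return threshold + (level_in - threshold) / ratio

outputs = {}
for threshold in (12.0, -12.0):
    outputs[threshold] = compress_db(20.0, threshold, 2.0)
    print(f"threshold {threshold:+.0f} dBV: 20 dBV in -> "
          f"{outputs[threshold]:.0f} dBV out")

# 24 dB of threshold change at 2:1 moves the output by 24 * (1 - 1/2) = 12 dB.
print("output change:", outputs[12.0] - outputs[-12.0], "dB")
```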
So what? Well, as you can see from Figure 7, changing the threshold
will affect the output level of the loud stuff by an amount that’s
determined by the relationship between the threshold and the compression
ratio.
Consider for a moment how a compressor will be used in a recording
situation: we use the compressor to reduce the dynamic range of the louder
parts of the signal. As a result, we can increase the overall level of the output
of the compressor before going to tape. This is because the spikes in the
signal are less scary and we can therefore get closer to the maximum input
level of the recording device. As a result, when we compress, we typically
have a tendency to increase the input level of the device that follows the
compressor. Don’t forget, however, that the compressor itself is adding noise
to the signal, so when we boost the input of the next device in the audio
chain, we’re increasing not only the noise of the signal itself, but the noise
of the compressor as well. How can we reduce or eliminate this problem?
Use compressor design philosophy number 2...
Instead of giving the user control over the threshold, some compressor
designers opt to have a fixed threshold and a variable gain before compres-
sion. This has a slightly different effect on the signal.
Figure 5.24: A plot showing a number of curves representing various settings of the gain before
compression with a fixed threshold. The compression ratio in this example is 2:1. The threshold
is fixed at 0 dBV, however, this value does not directly correspond to the input signal level as in
Figure 7. The red curve has a gain of 10 dB, blue = 5 dB, green = 0 dB, black = -5 dB.
dB, then the output has to be turned down by 5 dB to make this happen.
The output attenuation in dB is equal to the gain before compression (in
dB) divided by the compression ratio.
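The compensation rule can be sketched as follows. The sketch assumes an idealized hard-knee compressor with a fixed 0 dBV threshold and a 2:1 ratio, as in the example above, and the function name and the 20 dBV test level are illustrative. No matter how the gain before compression is set, a signal that ends up above the threshold always comes out at the same level.

```python
def fixed_threshold_compressor(level_in, pre_gain, threshold=0.0, ratio=2.0):
    """Fixed threshold, adjustable gain before compression, and an output
    attenuation of pre_gain / ratio dB that compensates for it."""
    x = level_in + pre_gain                          # level the compressor sees
    y = x if x <= threshold else threshold + (x - threshold) / ratio
    return y - pre_gain / ratio                      # automatic output compensation

for pre_gain in (-5.0, 0.0, 5.0, 10.0):
    print(f"gain before compression {pre_gain:+.0f} dB: "
          f"20 dBV in -> {fixed_threshold_compressor(20.0, pre_gain):.1f} dBV out")
```

Below the threshold, the compensation does change the output, by pre_gain × (1 - 1/ratio) dB. This is the intended effect: turning up the gain before compression brings the quiet material up toward the loud material.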
What would this response look like? It’s shown in Figure 9. As you can
see, changes in the gain before compression are compensated so that the
output for a signal above the threshold is always the same, so we don’t have
to fiddle with the input level of the next device in the chain.
If we were to do the same thing using a compressor with a variable
threshold, then we’d have to boost the signal at the output, thus increasing
the apparent noise floor of the compressor and making it sound as bad as it
is...
Figure 5.25: The gain response curves for various settings on a compressor with a magic output
gain stage that compensates for changes in either the threshold or the gain before compression
stage so that you don’t have to.
As you can see from Figure 9, the advantage of this system is that
adjustments in the gain before compression (or the threshold) don’t have
any effect on how the loud stuff behaves – if you’re past the threshold, you
get the same output for the same input.
Figure 5.26: The transfer function of a compressor with a gain before compression of 0 dB, a
threshold at -20 dBV and a compression ratio of 8:1.
Figure 5.27: The gain vs. input response of a compressor with a gain before compression of 0 dB,
a threshold at -20 dBV and a compression ratio of 8:1.
Figure 5.28: The gain vs. input response of a compressor with a gain before compression of 0 dB, a
threshold at -20 dBV and a compression ratio of 8:1. Notice that the gain is not plotted in decibels
in this case. In effect, Figures 10, 11 and 12 show the same information.
sion ratios with the same thresholds and gains before compression to give
you an idea of the change in the gain of the compressor for various ratios.
Figure 5.29: The transfer function of a compressor with a gain before compression of 0 dB, a
threshold at -20 dBV. Four different compression ratios are shown: red = 1.25:1, blue = 2:1, green
= 4:1, black = 10:1.
Figure 5.30: The gain vs. input response of a compressor with a gain before compression of 0 dB,
a threshold at -20 dBV. Four different compression ratios are shown: red = 1.25:1, blue = 2:1,
green = 4:1, black = 10:1.
Figure 5.31: The gain vs. input response of a compressor with a gain before compression of 0 dB,
a threshold at -20 dBV. Four different compression ratios are shown: red = 1.25:1, blue = 2:1,
green = 4:1, black = 10:1.
Figure 5.32: The gain vs. input level plot for a soft knee compressor with a gain before compression
of 0 dB, a threshold at -20 dBV and a compression ratio of 8:1. Compare this plot to the one in
Figure 5.26.
sors these days give you the option to switch between a peak and an RMS
detection circuit.
On high-end units, you can have your detection circuit respond to some
mix of the simultaneous peak and RMS values of the input level. Remember
from Section 2.1.6 that the ratio of the peak to the RMS is called the crest
factor. This ratio can be written either as a plain number or converted into
a dB scale. Note that, for any real signal, the peak can never be smaller than
the RMS value, so a signal’s crest factor is always at least 1 (0 dB). On a
compressor with a crest factor control, a setting near 0 (or -infinity dB) means
that the compressor is responding to the RMS of the signal level, while a big
number (a large value in dB) means that it is responding to the peak value of
the input level.
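To put numbers on the definition, here is a short Python sketch that computes the crest factor (peak divided by RMS) of two test signals. It only illustrates the definition; it is not a model of any particular compressor’s detector.

```python
import math


def crest_factor(samples):
    """Peak value divided by RMS value of a list of samples."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak / rms


def ratio_to_db(ratio):
    return 20 * math.log10(ratio)


n = 1000
sine = [math.sin(2 * math.pi * k / n) for k in range(n)]    # exactly one cycle
square = [1.0 if k < n // 2 else -1.0 for k in range(n)]

for name, wave in (("sine", sine), ("square", square)):
    cf = crest_factor(wave)
    print(f"{name}: crest factor {cf:.3f} ({ratio_to_db(cf):.2f} dB)")
```

A sine wave comes out at about 1.414 (3.01 dB) and a square wave at exactly 1 (0 dB), the smallest value any signal can have.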
Figure 5.33: A sine wave that is suddenly increased in level from a peak value of 0.33 to a peak
value of 1.
compressors can do this by delaying the signal and turning the present into
the past and the future into the present, but we’ll pretend that this isn’t
happening for now...).
Let’s say that we have a compressor with a gain before compression of
0 dB and a threshold that’s set to a level that’s higher than the lower-level
signal in Figure 5.33, but lower than the higher-level signal. So, the first part
of the signal, the quiet part, won’t be compressed and the later, louder part
will. Therefore the compressor will have to have a gain of 1 (or 0 dB) for
the quiet signal and then a reduced gain for the louder signal.
Since the compressor can’t see into the future, it will respond somewhat
slowly to the sudden change in level. In fact, most compressors allow you
to control the speed with which the gain change happens. This is called the
attack time of the compressor. Looking at Figure 5.34, we can see that the
compressor has a sudden awareness of the new level (at Time = 500), but
it then settles gradually to the new gain for the higher signal level. This
raises a question: the gain starts changing at a known time but, as you can
see in Figure 5.34, it approaches the final gain forever without really reaching
it. So what exactly does the attack time measure? In other words, if I say
that the compressor has an attack time of 200 ms, what is the relationship
between that amount of time and the gain applied by the compressor? The
answer to this question is found in the chapter on capacitors. Remember that,
in a simple RC circuit, the capacitor charges to a new voltage level at a rate
determined by the time constant, which is the product of the resistance and
the capacitance. After 1 time constant, the capacitor has charged to 63 % of
the voltage being applied to the circuit. After 5 time constants, the capacitor
has charged to over 99 % of the voltage, and we consider it to have reached
its destination. The same numbers apply to compressors. In the case of an
attack time of 200 ms, after 200 ms has passed the gain of the compressor will
have moved 63 % of the way to its final gain level. After 5 times the attack
time (in this case, 1 second) we can consider the device to have reached its
final gain level. (In fact, it never reaches it, it just gets closer and closer and
closer forever...)
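The RC analogy can be checked numerically. Below is a minimal Python sketch assuming an ideal one-pole (exponential) gain response. The jump from 0 dB to -10 dB of gain is an arbitrary example, not a value from the text.

```python
import math


def gain_at(t, tau, g_start_db, g_final_db):
    """Gain (dB) at time t after a level change, for a one-pole (RC-style) response."""
    return g_final_db + (g_start_db - g_final_db) * math.exp(-t / tau)


def fraction_of_change(t, tau):
    """How far (0 to 1) the gain has moved from its start value toward its final value."""
    return 1 - math.exp(-t / tau)


attack = 0.200   # 200 ms attack time, as in the example in the text
for multiple in (1, 2, 3, 4, 5):
    t = multiple * attack
    print(f"after {t:.1f} s: {100 * fraction_of_change(t, attack):.1f} % of the way, "
          f"gain = {gain_at(t, attack, 0.0, -10.0):.2f} dB")
```

After one attack time the gain is about 63 % of the way there; after five it is more than 99 % of the way.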
What is the result of the attack time on the output of the compressor?
This actually is pretty interesting. Take a look at Figure 5.35 showing the
output of a compressor that has the signal in Figure 5.33 sent into it and
responds with the gain in Figure 5.34. Notice that the lower-level signal
goes out exactly as it went in. We would expect this because the gain of
the compressor for that portion of the signal is 1. Then the signal suddenly
increases to a new level. Since the compressor’s detection circuit takes a little
while to figure out that the signal has gotten louder, the initial new loud
signal gets through almost unchanged. As we get further and further into
the new level in time, however, the gain settles to the new value and the
signal is compressed as we would expect. The interesting thing to note here
is that a portion of the high-level signal gets through the compressor. The
result is that we’ve created a signal that sounds more like a transient than
the input did. This is somewhat contrary to the way most people tend to think
that a compressor behaves. The common belief is that a compressor will
control all of your high-level signals, thus reducing your dynamic range –
but this is not exactly the case, as we can see in this example. In fact, it may
be possible that the perceived dynamic range is greater than the original
because of the accents on the transient material in the signal.
Figure 5.34: The change in gain over time for a sudden increase in signal level going from a signal
that’s lower than the threshold to one that’s higher. This is called the attack time of the compressor.
(Notice that this looks just like the response of a capacitor being charged to a new voltage level.
This is not a coincidence.)
Figure 5.35: The output of a compressor which is fed the signal shown in Figure 5.33 and responds
with the gain shown in Figure 5.34.
Similarly, what happens when the signal decreases in level from one that
is being compressed to one that is lower than the threshold? Again, it takes
some time for the compressor’s detection circuit to realize that the level has
changed, so it responds slowly to fast changes. This response time
is called the release time of the compressor. (Note that the release time is
measured in the same way as the attack time – it’s the amount of time it
takes the compressor to get 63% of the way to its intended gain.)
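In a digital compressor, one common (though not the only) way to get this attack and release behaviour is a one-pole smoother on the gain signal, with one coefficient for attack and another for release. The Python sketch below assumes that approach; the function name, the 1 kHz control rate and the 10 ms / 100 ms times are hypothetical.

```python
import math


def smooth_gain(target_db, fs, attack_s, release_s):
    """Smooth a per-sample target gain (dB) with separate attack and release time constants.

    When the target gain is lower than the current gain (more gain reduction is
    needed) the attack coefficient is used; otherwise the release coefficient is used.
    """
    a_att = math.exp(-1.0 / (attack_s * fs))    # per-sample decay for the attack time constant
    a_rel = math.exp(-1.0 / (release_s * fs))   # per-sample decay for the release time constant
    g = target_db[0]
    out = []
    for t in target_db:
        a = a_att if t < g else a_rel
        g = a * g + (1 - a) * t                 # move part of the way toward the target
        out.append(g)
    return out


fs = 1000                                        # 1 kHz control rate keeps the numbers simple
target = [0.0] * 100 + [-10.0] * 500 + [0.0] * 500
gain = smooth_gain(target, fs, attack_s=0.010, release_s=0.100)
print(round(gain[109], 2))   # 10 ms into the loud part: -6.32 (63 % of the way to -10 dB)
print(round(gain[699], 2))   # 100 ms into the quiet part: -3.68 (63 % of the way back to 0 dB)
```

Both time constants follow the same 63 % rule; the only difference is which one is in charge at a given moment.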
For example, we’ll assume that the signal in Figure 5.36 is being fed into
a compressor. We’ll also assume that the higher-level signal is above the
compression threshold and the lower-level signal is lower than the threshold.
Figure 5.36: A sine wave that is suddenly decreased in level from a peak value of 1 to a peak value
of 0.33.
This signal will result in a gain reduction for the first part of the signal
and no gain reduction for the latter part. However, the release time of the
compressor results in a gradual transition between these two states, as is
shown in Figure 5.37.
Figure 5.37: The change in gain over time for a sudden decrease in signal level going from a
signal that’s higher than the threshold to one that’s lower. This is called the release time of the
compressor. (Notice that this looks just like the response of a capacitor being charged to a new
voltage level. This is not a coincidence.)
Figure 5.38: The output of a compressor which is fed the signal shown in Figure 5.36 and responds
with the gain shown in Figure 5.37.
Figure 5.39: Four different attack times for compressors with the same thresholds and compression
ratios.
Figure 5.40: Four different release times for compressors with the same thresholds and compression
ratios.
gets analysed and converted into a different signal which is used to control
the gain of the audio path.
As a result, we can think of a basic compressor as is shown in the block
diagram below.
Notice that the input gets split in two directions right away, going to the
two different paths.
At the heart of the audio path is a device we haven’t seen before – it’s
drawn in block diagrams (and sometimes in schematics) as a triangle (so
we know right away it’s an amplifier of some kind) attached to a box with
an “X” through it on the left. This device is called a voltage controlled
amplifier or VCA. It has one audio input on the left, one audio output on
the right and a control voltage (or CV ) input on the top. The amplifier has
a gain which is determined by the level of the control voltage. This gain is
typically applied to the current through the VCA, not the voltage – this is
a new concept as well... but we’ll get to that later.
If you go to the VCA store and buy a VCA, you’ll find out that it has
an interesting characteristic. Usually, it will have a logarithmic change in
gain for a linear change in voltage at the control voltage input. For example,
one particular VCA from THAT corporation has a gain of 0 dB (so input =
The only problem with the schematic so far is that the VCA is a current
amplifier, not a voltage amplifier. Since we prefer to think in terms of voltage
most of the time, we’ll need to convert the voltage signal that we’re feeding
into the compressor into a current signal of the same shape. This is done
by sticking the VCA in the middle of an inverting amplifier circuit as shown
below:
Here’s an explanation of why we have to build the circuit like this. Remember
back to the first stuff on op amps – one of the results of the feedback
loop is that the voltage level at the negative input leg MUST be the same
as the positive input leg. If this weren’t the case, then the huge gain of the
op amp would result in a clipped output. So, we call the voltage level at
the negative input “virtual ground” because it has the same voltage level as
ground, but there’s really no direct connection to ground. If we assume that
the VCA has an impedance through it of 0Ω (a pretty safe assumption), then
the voltage level at the signal input of the VCA is also the same as ground.
Therefore the current through the resistor on the left in the above schematic
RMS Detector
This is an easy one. You buy a chip called an RMS detector. There’s more
stuff to do once you buy that chip to make it happy, but you can just follow
the manufacturer’s specifications on that one. This chip will give you a DC
voltage output which is determined by the logarithmic level of an AC input.
For example, using a THAT Corp. chip again... The audio input of the chip
is measured relative to 0.316 V rms (which happens to be -10 dBV). If the
RMS level of the audio input of the chip is 0.316 V rms, then the output of
the chip is 0 V DC. If you increase the level of the input signal by 1 dB, the
output level goes up by 6 mV. Conversely, if the input level goes down by 1
dB, then the output goes down by 6 mV. An important thing to notice here
is that a logarithmic change in the input level results in a linear change in
the output voltage. So, we build a table for future reference again:
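The detector described above can be modelled in a few lines. This Python sketch uses only the two numbers quoted in the text (0 V out at 0.316 V rms, and 6 mV per dB); it ignores the detector’s own averaging time and any limits on its output range.

```python
import math

REF_VRMS = 0.316    # input level that gives 0 V out (about -10 dBV)
MV_PER_DB = 6.0     # output change per dB of input change


def rms_detector_mv(v_rms):
    """DC output (mV) for a given RMS input voltage: linear in dB, so logarithmic in volts."""
    level_db = 20 * math.log10(v_rms / REF_VRMS)
    return MV_PER_DB * level_db


for v in (0.1, 0.2, 0.316, 0.5, 1.0):
    print(f"{v:5.3f} V rms ({20 * math.log10(v):6.2f} dBV) -> {rms_detector_mv(v):6.2f} mV")
```

The printed lines serve as a small lookup table of input level versus detector output.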
Now, what would happen if we took the output from this RMS detector
and connected it directly to the control voltage input of the VCA like in the
diagram below?
Well, if the input level to the whole circuit was -10 dBV, then the RMS
detector would output a 0 V control voltage to the CV input of the VCA.
This would cause it to have a gain of 0 dB and its output would be -10 dBV.
BUT, if the input was -11 dBV, then the RMS detector output would be
-6 mV making the VCA gain go up by 1 dB, raising the output level to -10
dBV. Hmmmmm... If the input level was -9 dBV, then the RMS detector’s
output goes to 6 mV and the VCA gain goes to -1 dB, so the output is
-10 dBV. Essentially, no matter what the input level was, the output level
would always be the same. That’s a compression ratio of infinity : 1.
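The infinity:1 behaviour is easy to confirm with a small Python model. It assumes, as the numbers in the text imply, that the VCA’s gain falls by 1 dB for every +6 mV at its CV input; the function names are made up for this sketch.

```python
MV_PER_DB = 6.0


def detector_mv(level_dbv):
    """RMS detector output: 0 mV at -10 dBV, 6 mV per dB."""
    return MV_PER_DB * (level_dbv + 10.0)


def vca_gain_db(cv_mv):
    """VCA gain: +6 mV gives -1 dB, -6 mV gives +1 dB."""
    return -cv_mv / MV_PER_DB


def output_dbv(level_dbv):
    return level_dbv + vca_gain_db(detector_mv(level_dbv))


for level in (-30.0, -11.0, -10.0, -9.0, 0.0):
    print(level, "->", output_dbv(level))   # always -10.0 dBV
```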
Although the circuit above would indeed compress with a ratio of infinity
: 1, that’s not terribly useful to us for a number of reasons. Let’s talk about
how to reduce the compression ratio. Take a look at this circuit.
If the potentiometer has a linear scale, and the wiper is half-way up the
pot, then the voltage at the wiper is one half the voltage applied to the top
of the pot. This means, in turn, that the voltage applied to the CV input
of the VCA is one half the voltage output from the RMS detector. How
does this affect us? Well, if the input level of the circuit is -10 dBV, then
the RMS detector outputs 0 V, the wiper on the pot is at 0 V and the gain
of the VCA is 0 dB, therefore the output level is -10 dBV. If, however, the
input level goes up by 1 dB (to -9 dBV), then the RMS detector output goes
up by 6 mV, the wiper on the pot goes up by 3 mV, therefore the gain of
the VCA goes down by 0.5 dB and the output level is -9.5 dBV. So, for a 2
dB change in level at the input, we get a 1 dB change in level at the output
– in other words, a 2:1 compression ratio.
If we move the wiper to another position, we change the ratio of the voltage
at the top of the pot (which is dependent on the input level to the RMS
detector) to the gain (which is controlled by the wiper voltage). So, we
have a variable compression ratio from 1:1 (no compression) to infinity:1
(complete limiting) and a rotation point at -10 dBV. This is moderately
useful, but real compressors have a threshold. So – how do we make this
happen?
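Adding the pot to the same kind of model shows where the compression ratio comes from. In this Python sketch, k is a hypothetical name for the fraction of the detector voltage that reaches the wiper (0 at the bottom of a linear pot, 1 at the top); the 6 mV per dB figure is the one from the text, assumed to apply to the VCA as well.

```python
import math

MV_PER_DB = 6.0


def output_dbv(level_dbv, k):
    """Output level when a fraction k of the detector voltage reaches the VCA's CV input."""
    detector_mv = MV_PER_DB * (level_dbv + 10.0)   # 0 mV at -10 dBV
    cv_mv = k * detector_mv                         # linear pot: wiper voltage = k * top voltage
    return level_dbv - cv_mv / MV_PER_DB            # +6 mV at the CV input = -1 dB of gain


def compression_ratio(k):
    """Each input dB changes the output by (1 - k) dB, so the ratio is 1 / (1 - k)."""
    return math.inf if k == 1 else 1 / (1 - k)


for k in (0.0, 0.5, 0.75, 1.0):
    print(f"k = {k}: ratio {compression_ratio(k)}:1, -9 dBV in -> {output_dbv(-9.0, k)} dBV out")
```

Every setting leaves -10 dBV unchanged, which is the rotation point mentioned in the text.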
Threshold
The first thing we’ll need to make a threshold detection circuit is a way
of looking at the signal and dividing it into a low voltage area (in which
nothing gets out of the circuit) and a high area (in which the output = the
input to the circuit). We already looked at how to do this in a rather crude
fashion – it’s called a half-wave rectifier. Since the voltage of the wiper on
the pot is going positive and negative as the input signal goes up and down
respectively, all we need to do is rectify the signal after the wiper so that
none of the negative voltage gets through to the CV input of the VCA. That
way, when the signal is low, the gain of the VCA will be 0 dB, leaving the
signal unaffected. When the signal goes positive, the rectifier lets the signal
through, the gain of the VCA goes down and the compressor compresses.
One way to do this would simply be to put a diode in the circuit pointing
away from the wiper. This wouldn’t work very well because the diode would
need about 0.7 V across it to turn on in the first place, while the control
voltages we’re dealing with here change by only a few mV per dB. Also, the
turn-on voltage of the diode is a little sloppy, so we wouldn’t know exactly
what the threshold was (but we’ll come back to this later). What we need,
then, is something called a precision rectifier – a circuit that looks like a
perfect diode. This is pretty easy to build with a couple of diodes and an
op amp as is shown in the circuit below.
Notice that the circuit has two effects – the first is that it is a half-wave
rectifier, so only the positive half of the input gets through. The second is
that it is an inverting amplifier, so the output is opposite in polarity to the
input – therefore, in order to get things back in the right polarity, we’ll have
to flip the polarity once again with a second inverting amplifier with unity
gain.
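A quick numerical comparison shows why the plain diode is no good here. The Python sketch below assumes a crude diode model (nothing passes until 0.7 V, then a fixed 0.7 V drop) and treats the precision rectifier and the unity-gain inverter as ideal; the function names are hypothetical.

```python
DIODE_DROP_V = 0.7


def plain_diode(v):
    """Crude diode model: nothing passes until the input exceeds the 0.7 V drop."""
    return max(0.0, v - DIODE_DROP_V)


def precision_rectifier(v):
    """Ideal inverting precision half-wave rectifier: positive half only, polarity flipped."""
    return -max(0.0, v)


def unity_inverter(v):
    """Second inverting amplifier with a gain of -1."""
    return -v


def threshold_stage(v):
    return unity_inverter(precision_rectifier(v))


for cv in (-0.012, -0.006, 0.0, 0.006, 0.012):   # a few mV, like the detector's output
    print(f"{cv * 1000:+5.1f} mV -> plain diode {plain_diode(cv) * 1000:4.1f} mV, "
          f"precision {threshold_stage(cv) * 1000:4.1f} mV")
```

With millivolt-level control voltages the plain diode never conducts, while the precision rectifier and inverter pass the positive half unchanged and block the negative half.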
If we add this circuit between the wiper and the VCA CV input like the
diagram shown below, what will happen?
Now, if the input level is -10 dBV or lower, the output of the RMS
detector is 0 V or lower. This will result in the output of the half-wave
rectifier being 0 V. This will be multiplied by -1 in the polarity inversion,
thing to note is that when you turn UP this level to the top of the pot,
you are actually getting a lower voltage (notice that the top of the pot is
connected to a negative voltage supply). Why is this? Well, if the output
of the threshold level adjustment wiper is 0 V, this gets added to the RMS
detector output and the threshold stays at -10 dBV. If the output of the
threshold level adjustment wiper goes positive, then the output of the RMS
detector is increased and the rectifier opens up at a lower level, so by turning
UP the voltage level of the threshold adjustment pot, you turn DOWN the
threshold. Of course, the size of the change we’re talking about on the
threshold level adjustment is on the order of mV to match the level coming
out of the RMS detector, so you might want to be sure to make the maximum
and minimum values possible from the pot pretty small. See the THAT Corp.
.pdf file mentioned below for more details on how to do this.
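Here is a rough Python model of the threshold adjustment. It assumes (the text does not say exactly where) that the offset from the threshold pot is summed with the RMS detector output before the ratio pot, that the precision rectifier follows the wiper, and it works directly with the net offset rather than tracking each polarity inversion. The names are hypothetical.

```python
MV_PER_DB = 6.0


def control_voltage_mv(level_dbv, k, offset_mv):
    """CV reaching the VCA: detector output plus threshold offset, scaled by the ratio pot, then rectified."""
    summed_mv = MV_PER_DB * (level_dbv + 10.0) + offset_mv
    return max(0.0, k * summed_mv)          # the precision rectifier blocks anything below 0 V


def threshold_dbv(offset_mv):
    """Input level at which the summed voltage crosses 0 V and the rectifier opens."""
    return -10.0 - offset_mv / MV_PER_DB


for offset in (0.0, 30.0, 60.0):
    print(f"offset {offset:+.0f} mV -> threshold {threshold_dbv(offset):.1f} dBV")
```

In this model a positive net offset lowers the threshold (60 mV moves it from -10 dBV to -20 dBV), which matches the turn-UP, threshold-DOWN behaviour described in the text.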
So, now we have a compressor with a controllable compression ratio and
a threshold with a controllable level. All we need to do is to add an output
gain knob. This is pretty easy since all we’re going to do is add a static gain
value for the VCA. This can be done in a number of ways, but we’ll just add
another DC voltage to the control voltage after the threshold. That way, no
matter what comes out of the threshold, we can alter the level.
The diagram above shows the whole circuit. Note that the output gain
control has the + DC voltage at the top of the pot. This is because it will
become negative after going through the polarity inversion stage, making
the VCA go up in gain. Since this DC voltage level is added to the control
voltage signal after the threshold detection circuit, it’s always on – therefore
it’s basically the same as an output level knob. In fact, it IS an output level
knob.
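Putting the pieces together, the static (steady-state) behaviour of the whole circuit can be sketched in Python. This simplified model assumes 6 mV per dB at both the RMS detector and the VCA, an ideal precision rectifier, a threshold offset summed with the detector output before the ratio pot, and an output gain voltage that (after the polarity inversion) raises the VCA gain by 1 dB per 6 mV. Attack and release are ignored, and all names are hypothetical.

```python
MV_PER_DB = 6.0


def compressor_out_dbv(level_dbv, k, offset_mv, makeup_mv):
    """Static output level of an idealised RMS-detector / VCA compressor."""
    detector_mv = MV_PER_DB * (level_dbv + 10.0)        # 0 mV at -10 dBV
    rectified_mv = max(0.0, k * (detector_mv + offset_mv))
    cv_mv = rectified_mv - makeup_mv                    # output gain DC is added after the threshold
    gain_db = -cv_mv / MV_PER_DB                        # +6 mV at the VCA = -1 dB
    return level_dbv + gain_db


# 2:1 ratio (k = 0.5), threshold at -20 dBV (60 mV offset), +2 dB of output gain (12 mV)
for level in (-40.0, -30.0, -20.0, -14.0, -8.0):
    print(level, "->", compressor_out_dbv(level, 0.5, 60.0, 12.0))
```

Below -20 dBV everything is simply raised by 2 dB; above it, each 2 dB of input produces 1 dB of output, plus the same 2 dB of output gain.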
Everything I’ve said here is basically a lead-up to the pdf file below
from THAT Corp. It’s a good introduction to how a simple RMS-based
compressor works. It includes all the response graphs that I left out here,
and goes a little further to explain how to include a soft knee for your circuit.
Definitely recommended reading if you’re planning on learning more about
these things...
isn’t, then the mic pre will output the audio signal plus the noise coupled
from the common impedance.
This is determined by the capacitance between the source and the receiver
or their transmission media.
Once upon a time, we looked at the construction of a capacitor as being
two metal plates side by side but not touching each other. If we push
electrons into one of the plates, we’ll repel electrons out of the other plate
and we appear to have “current” flowing through the capacitor. The higher
the rate of change of the current flowing in and out of the plate, the easier
it is to move current in and out of the other plate.
Consider that if we take any two pieces of metal and place them side by
side without touching, we’re going to create a capacitor. This is true of two
wires side by side inside a mic cable, or two wires resting next to each other
on the floor or so on. If we send a high-frequency signal through one of the wires,
and we have some small capacitance between that wire and the “receiver”
wire, we’ll get some signal appearing on the latter.
The level of this noise is proportional to:
1. The area that the source and receiver share (how big the plates are,
or in this case, how long the wires are side by side)
2. The frequency of the noise
3. The amplitude of the noise voltage (note that this is “voltage”)
4. The permittivity of the medium (dielectric) between the two
The level of the noise is inversely proportional to:
1. the square of the distance between the sender and the receiver (or in
some cases their connected wires)
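To get a feel for these proportionalities, the short Python sketch below expresses changes in coupled noise in dB. It simply multiplies out the relationships in the two lists above, treating the noise level as a voltage (hence 20 log10); it is not an absolute noise calculation and inherits the lists’ assumptions.

```python
import math


def relative_noise(area=1.0, freq=1.0, amplitude=1.0, permittivity=1.0, distance=1.0):
    """Relative (unitless) capacitively coupled noise level from the listed proportionalities."""
    return area * freq * amplitude * permittivity / distance ** 2


def change_db(before, after):
    return 20 * math.log10(after / before)


base = relative_noise()
print(f"double the frequency:   {change_db(base, relative_noise(freq=2.0)):+.1f} dB")
print(f"double the distance:    {change_db(base, relative_noise(distance=2.0)):+.1f} dB")
print(f"half the shared length: {change_db(base, relative_noise(area=0.5)):+.1f} dB")
```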
This is determined by the mutual inductance between the source and re-
ceiver.
Remember back to the chapter where we talked about the right hand rule
and how, when we send AC through a wire, we generate a pulsing magnetic
field around it whose direction is dependent on the direction of the current
and whose amplitude (or distance from the wire) is proportional to the level
of the current. If we place another wire in this moving magnetic field, we
will induce a current in the second wire – which is how a transformer works.
Electromagnetic radiation
This occurs when the source and receiver are at least 1/6th of a wavelength
apart (therefore the receiver is in the far field, where the wavefront is a
plane and the ratio of the electric to the magnetic field strength is constant).
An example of noise caused by electromagnetic radiation is RFI (Radio
Frequency Interference) caused by radio transmitters, CB radios and so on.
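The 1/6-wavelength boundary is easy to compute. This Python sketch uses the free-space speed of light, so it gives distances through air, not along a cable; the example frequencies are arbitrary.

```python
C = 299_792_458.0   # speed of light in a vacuum, m/s


def far_field_distance_m(freq_hz):
    """Approximate distance beyond which a receiver is in the far field (1/6 of a wavelength)."""
    wavelength_m = C / freq_hz
    return wavelength_m / 6


for f in (60.0, 1e6, 27e6, 100e6):
    print(f"{f:>12,.0f} Hz -> {far_field_distance_m(f):12.2f} m")
```

At mains frequencies the boundary is hundreds of kilometres away, so by this definition nearby mains wiring causes coupling problems rather than radiation problems.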
5.5.1 Shielding
This is the first line of defense against outside noise caused by high-frequency
electrical field and magnetic field coupling as well as electromagnetic radia-
tion. The theory is that the shielding wire, foil or conduit will prevent the
bulk of the noise coming in from the outside.
It works by relying on two properties:
1. Reflection back to the outside world where it can’t do any harm...
(and, to a small extent, re-reflection within the shield, but this is a VERY
small extent)
2. Absorption – where the energy is absorbed by the shield and sent to
ground.
The effectiveness of the shield is dependent on its:
1. Thickness – the thinner the shield, the less effective it is. This is
particularly true of low-frequency noise... Aluminum foil shielding works well,
rejecting up to 90 dB at frequencies above 30 MHz, but it’s inadequate at
fending off low-frequency magnetic fields (in fact it’s practically transparent
below 1 kHz). We rely on balancing and differential amplifiers to get rid of
these.
2. Conductivity – the shield must be able to sink all stray currents to
the ground plane more easily than anything else.
3. Continuity – we cannot break the shield. It must be continuous
around the signal paths, otherwise the noise will leak in like water into a
hole in a boat. Don’t forget that the holes in your equipment for cooling,
potentiometers and so on are breaks in the continuity. General guideline:
keep the diameter of your holes at less than 1/20 of the wavelength of the
highest frequency you’re worried about to ensure at least 20 dB of attenu-
ation. Most high-frequency noise problems are caused by openings in the
shield material.
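The 1/20-wavelength guideline for holes can be turned into a quick calculation. This Python sketch uses the free-space wavelength; the frequencies are arbitrary examples.

```python
C = 299_792_458.0   # speed of light in a vacuum, m/s


def max_hole_diameter_mm(highest_freq_hz):
    """Largest hole diameter (mm) under the 1/20-wavelength guideline."""
    wavelength_m = C / highest_freq_hz
    return 1000 * wavelength_m / 20


for f in (30e6, 100e6, 1e9, 2.4e9):
    print(f"{f / 1e6:7.0f} MHz -> holes smaller than {max_hole_diameter_mm(f):6.1f} mm")
```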
This is not the case. A differential (or symmetrical) signal is one where
one channel of audio is sent as two voltages on two wires, which are usually
twisted together. These two signals are identical in every respect with the
exception that they are opposite in polarity. These signals are known by
such names as “inverting and non-inverting” or “Live and Return”. (The
name XLR, however, comes from Cannon’s X-series connector with an added
Latch and Rubber insulation, not from these signal names.) They are
received by a differential amplifier which subtracts the return from the live
and produces a single signal with a gain of 6 dB (since a signal minus its
negative self is the same as 2 times the signal and therefore 6 dB louder).
The theoretical benefit of using this system is that any noise that is received
on the transmission cables between the source and the receiver is (theoretically)
identical on both wires. When these two versions of the noise arrive
at the receiver’s differential amplifier, they are theoretically eliminated since
we are subtracting the noise from a copy of itself. This is what is known
as the Common Mode Rejection done by the differential input. The ability
of the amplifier to reject the common signals (or mode) is expressed as the
ratio of its differential gain to its common-mode gain (usually in dB) and
is therefore called the Common Mode Rejection Ratio (CMRR).
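The subtraction can be demonstrated with a handful of samples in Python. The signal and noise values are arbitrary (chosen as exact binary fractions so the arithmetic is exact), and the CMRR helper assumes the usual definition of differential gain divided by common-mode gain; the example gains are hypothetical.

```python
import math

signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
noise = [0.125, -0.25, 0.0625, 0.375, -0.125, 0.25, 0.0, -0.375]   # identical on both wires

live = [s + n for s, n in zip(signal, noise)]     # non-inverting leg
ret = [-s + n for s, n in zip(signal, noise)]     # inverting leg: same noise, opposite signal
out = [l - r for l, r in zip(live, ret)]          # differential amplifier: live minus return

print(out)                                        # twice the signal, no noise
print(f"gain: {20 * math.log10(2):.2f} dB")


def cmrr_db(differential_gain, common_mode_gain):
    return 20 * math.log10(differential_gain / common_mode_gain)


print(f"CMRR: {cmrr_db(2.0, 0.002):.1f} dB")      # hypothetical gains
```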
Having said all that, I want to come back to the fact that I used the
word “theoretical” a little too often in the last paragraph. The amount and
quality of the noise on those two transmission lines (the live and the return)
in the so-called “balanced” wire is dependent on a number of things.
1. The proximity to the noise source. This is what is causing the noise to
wind up on the two wires in the first place. If the source of the noise is quite
near to the receiving wire (for example, in the case of a high-voltage/current
AC cable sitting next to a microphone cable) then the closer wire within our
“balanced” pair will receive a higher level of noise than the more distant wire.
Remember that this is inversely proportional to the square of the distance,
so it can cause a major problem if the AC and mic cables are sitting side
by side. The simplest way to avoid this difference in the noise on the two
wires is to twist them together. This ensures that, over the length of the
cable, the two internal wires average out to being equally close to the AC
cable and therefore pick up the same amount of noise, so the differential
amplifier will cancel it.
2. The termination impedance of the two wires. In fact, technically
speaking, a balanced transmission line is one where the impedance between
each of the two wires and ground is identical for each end of the transmis-
sion. Therefore the impedance between live and ground is identical to the
impedance between return and ground at the output of the sending device
and at the input of the receiving device. This is not to say that the in-
put and output impedances are matched. They are not. If the termination
impedances are mismatched, then the noise on each of the wires will be
different and the differential amplifier will not be subtracting a signal from
a copy of itself – therefore the noise will get through. Some manufacturers
are aware of this and save themselves some money while still providing you
with a balanced output. Mackie consoles, for example, drive the signal on
the tip of their 1/4” balanced outputs, but only put a resistor between the
ring and ground (the sleeve) on the same output connector. This is still a
balanced output despite the fact that there is no signal on the ring because
the impedance between the tip and ground matches the impedance between
the ring and ground (they’re careful about what resistor they put in there...)
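A very simplified Python model shows why mismatched termination impedances let noise through. It assumes (hypothetically) that the same noise voltage couples onto each wire through the same coupling impedance, forming a voltage divider with that wire’s termination impedance to ground. Everything is treated as purely resistive, and the values are arbitrary.

```python
def leg_noise(v_noise, z_coupling, z_termination):
    """Noise appearing on one wire: a divider between the coupling and termination impedances."""
    return v_noise * z_termination / (z_coupling + z_termination)


def residual_noise(v_noise, z_coupling, z_live, z_return):
    """What survives the differential amplifier's subtraction (live minus return)."""
    return leg_noise(v_noise, z_coupling, z_live) - leg_noise(v_noise, z_coupling, z_return)


V_NOISE = 1.0         # volts, arbitrary
Z_COUPLING = 10e3     # ohms, arbitrary

print(residual_noise(V_NOISE, Z_COUPLING, 100.0, 100.0))   # matched terminations: 0.0
print(residual_noise(V_NOISE, Z_COUPLING, 100.0, 50.0))    # mismatched: some noise gets through
```

In the Mackie-style output described in the text, the ring carries no signal, but its resistor to ground matches the tip’s impedance, which in this model is exactly what keeps the residual at zero.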
5.5.3 Grounding
The grounding of audio equipment is there for one primary purpose: to keep
you alive. If something goes horribly wrong inside one of those devices and
winds up connecting the 120 V AC from the wall to the box (chassis) itself,
and you come along and touch the front panel while standing in a pool of
water, YOU are the path to ground. This is bad. So, the manufacturers
put a third pin on their AC cables which is connected to the chassis at the
equipment end and to the third pin in the wall socket at the other end.
Let’s look at how the wiring inside a wall socket is connected to begin
with. Take a look at Figures 5.49 to 5.53.
Figure 5.49: A typical North American electrical outlet showing the locations of the two spade
connections and the third, round ground pin. Note that the orange cable contains three independent
conductors, each with a different coloured insulator.
Figure 5.50: The same outlet as is shown in Figure 5.49 with the safety faceplate removed.
Figure 5.52: The beginnings of the inside of the outlet showing the connection of the white and
green wires to the socket. The white wire is at 0 V and is connected in parallel through the brass
plate on the side of the socket to the two larger spades. The green wire is also at 0 V and is
connected in parallel to the round safety ground pin as well as the box that houses the socket. The
third, black wire, which is at 120 V RMS, is connected to the socket on the opposite side and cannot
be seen in this photograph.
The third pin in the wall socket is called the ground bus and is connected
to the electrical breaker box somewhere in the facility. All of the ground
busses connect to a primary ground point somewhere in the building. This
is the point at which the building makes contact with the earth through a
spike or piling called the grounding electrode. The wires which connect these
grounds together MUST be heavy-gauge (and therefore very low impedance)
in order to ensure that they have a MUCH lower impedance than you when
you and they form a parallel connection to ground. The lower this impedance,
the less current will flow through you if something goes wrong.
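The parallel-path argument is just a current divider. This Python sketch models a fault as a fixed current shared between the ground wire and a person touching the chassis. That is a big simplification, and the 10 A fault current and 1 kΩ body resistance are hypothetical round numbers, not safety data.

```python
def current_through_person(fault_current_a, r_ground_ohm, r_person_ohm):
    """Share of a fault current taken by the person in a two-branch parallel circuit."""
    return fault_current_a * r_ground_ohm / (r_ground_ohm + r_person_ohm)


FAULT_A = 10.0       # hypothetical fault current
R_PERSON = 1000.0    # hypothetical body resistance

for r_ground in (0.01, 0.1, 1.0, 10.0):
    i_ma = 1000 * current_through_person(FAULT_A, r_ground, R_PERSON)
    print(f"ground path {r_ground:5.2f} ohm -> {i_ma:8.3f} mA through the person")
```

The lower the ground wire’s resistance, the smaller your share of the current, which is the point of using heavy-gauge ground conductors.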
MUCH MORE TO COME!
Conductor out from          Low DR      Med DR      Med DR      High DR     High DR
                            (< 60 dB)   (60-80 dB)  (60-80 dB)  (> 80 dB)   (> 80 dB)
                                        Low EMI     High EMI    Low EMI     High EMI
Ground electrode            6           2           00          00          0000
Master bus                  10          8           6           4           0
Local bus                   14          12          12*         12*         10*
Max. resist. (any cable)    0.5 Ω       0.1 Ω       0.01 Ω      0.001 Ω     0.0001 Ω
(DR = dynamic range)
Table 5.5: Suggested technical ground conductor sizes. This table is from “Audio Systems: Design and Installation” by Philip Giddings (Focal Press,
1990, ISBN 0-240-80286-1). If you’re installing or maintaining a studio, you should own this book. (*Do not share ground conductors – run individual
branch grounds. In all cases the ground conductor must not be smaller than the neutral conductor of the panel it services.)
Figure 5.54: The construction of a simple ribbon microphone. The diaphragm is the corrugated
(folded) foil placed between the two poles of the magnet.
There are a couple of small problems with this design. Firstly, the current
that’s generated by one little strip of aluminium that’s getting pushed back
and forth by a sound wave will be very small, so small that a typical
microphone preamplifier won’t have enough gain to bring the signal up to a
usable level. Secondly, consider that the impedance of a strip of aluminium
a couple of centimeters long will be very small, which is fine, except that the
input of the microphone preamp is expecting to “see” an impedance of at least
200Ω or so. Luckily, we can fix both of these problems in one step by adding
a transformer to the microphone.
The output wires from the diaphragm are connected to the primary coil
of a transformer that steps up the voltage to the secondary coil. The result
of this is that the output of the microphone is increased proportionally to the
turns ratio of the transformer, and the apparent impedance of the diaphragm
is increased proportionally to the square of the turns ratio. (See Section 2.7
of the electronics section if this doesn’t make sense.) So, by adding a small
transformer inside the body of the microphone, we kill both birds with one
stone. In fact, there is a third dead bird lying around here as well – we can
also use the transformer to balance the output signal by including a centre
tap on the secondary coil and making it the ground connection for the mic’s
output. (See Chapter 5.5 for a discussion on balancing if you’re not sure
about this.)
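The two transformer rules (voltage scales with the turns ratio, impedance with its square) can be worked through in Python. The 0.2 Ω ribbon impedance is a hypothetical round number chosen only to show the arithmetic; the 200 Ω target is the figure from the text.

```python
import math


def turns_ratio_for(z_source_ohm, z_target_ohm):
    """Turns ratio needed to make z_source look like z_target (impedance scales with ratio squared)."""
    return math.sqrt(z_target_ohm / z_source_ohm)


def voltage_gain_db(turns_ratio):
    """Voltage step-up in dB (voltage scales with the turns ratio)."""
    return 20 * math.log10(turns_ratio)


n = turns_ratio_for(0.2, 200.0)    # hypothetical 0.2 ohm ribbon, 200 ohm target
print(f"turns ratio 1:{n:.1f}, voltage gain {voltage_gain_db(n):.1f} dB")
```

Raising the impedance by a factor of 1000 needs a turns ratio of about 31.6, which also gives 30 dB of voltage gain, both birds with one stone.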
That’s pretty much it for the basic design of a ribbon microphone –
different manufacturers will use different designs for their magnet
and ribbon assembly. There is an advantage and several disadvantages in
this design that we should discuss at this point. Firstly, the advantage: since
the diaphragm in a ribbon microphone is a very small piece of aluminium,
it is very light, and therefore very easy to move quickly. As a result, ribbon
microphones have a good high-frequency response characteristic (and
therefore a good transient response). On the other hand, there are a number of
disadvantages to using ribbon microphones. Firstly, you have to remember
that the diaphragm is a very thin and relatively fragile strip of aluminium.
You cannot throw a ribbon microphone around in a road case and expect
it to work the next day – they’re just too easily broken. Since the output
of the diaphragm is proportional to its velocity, and since that velocity
is proportional to frequency, the ribbon has a very poor low-frequency
response. There’s also the issue of noise: since the ribbon itself doesn’t have
a large output, it must be boosted in level a great deal, therefore increasing
the noise floor as well. The cost of ribbon microphones is moderately high
(although not insane) because of the rather delicate construction. Finally,
as we’ll see a little later, ribbon microphones are particularly susceptible to
low-frequency noises caused by handling and breath noise.
Figure 5.55: An exploded view of a coil of wire with a diameter carefully chosen to fit in the circular
gap of a permanent magnet.
Now, when the coil is moved in and out of the magnet, a current is
generated that is proportional to the velocity of the movement. How do we
create this movement? We glue the front of the coil onto a diaphragm made
of plastic, as is shown in the cross section in Figure 5.56.
Pressure changes caused by sound waves hitting the front of the diaphragm
push and pull it, moving the coil in and out of the gap. This
causes the wire in the coil to cut perpendicularly through the magnetic lines
of force, thus generating a current that is substantially greater than that
produced by the ribbon in a ribbon microphone.
This signal still needs to be boosted, and the impedance of the coil isn’t
high enough for us to simply take the wire connected to the coil and connect
it to the microphone’s output. Therefore, we use a step-up transformer
again, just as we did in the case of the ribbon mic, to increase the signal
strength, increase the output impedance to around 200Ω, and provide a
balanced output.
Figure 5.56: A cross section of the same device when assembled. Note that the front of the coil
of wire is attached to the inside of the diaphragm.
Figure 5.57: A moving coil dynamic microphone with the protection grid removed. The “front” of
the microphone shows a second protective layer made of mesh and hard plastic. The diaphragm
and assembly are below this.
Figure 5.58: The underside of the diaphragm showing the copper coil glued to the back of the
diaphragm. This coil fits inside the circular gap in the magnet. See Figure 5.59 for part labels.
Figure 5.59: The same photograph as Figure 5.58 with the various parts labeled.
2. In the case of the sealed can, it doesn’t matter which direction the
pressure outside the can is coming from. On a low-pressure day, the
balloon pushes out of the can, and this is true whether the can is
right side up, on its side, or even upside down.
How will this system behave? Remember that, just like the coffee can
barometer, the back of the diaphragm is effectively sealed, so if the pressure
outside the can is high, the diaphragm will get pushed into the can. If the
pressure outside the can is low, then the diaphragm will get pulled out of the
can. This will be true no matter where the change in pressure originated.
(The capillary tube is of a small enough diameter that fast pressure changes
in the audio range don’t make it through into the can, so we can consider
Figure 5.61: A microphone showing various angles of incidence (full markings every 30°, small
markings every 10°). Note that we can consider the rotational angle in any plane, whereas this
photo only indicates the horizontal plane.
so if we consider that the gain for a sound source that’s on-axis, or at an
angle of incidence of 0°, is normalized to a value of 1, then all other angles of
incidence will be 1 as well. This can be plotted on a Cartesian X-Y graph, as
is shown in Figure 5.62. The equation below can be used to calculate the
sensitivity for a pressure transducer. (Okay, okay, it’s not much of an equation
– for any angle, the sensitivity is 1...)
$$S_P = 1 \qquad (5.8)$$
where $S_P$ is the sensitivity of a pressure transducer.
For any angle, you can just multiply the pressure by the sensitivity for
that angle of incidence to find the voltage output.
Figure 5.62: A Cartesian plot of the sensitivity of a perfect pressure transducer normalized to the
on-axis response.
Most people like to see this in a slightly more intuitive graph called a polar
plot, shown in Figure 5.63. In this kind of graph, the sensitivity is graphed as
a radius from the centre of the graph at a given angle of rotation.
One thing to note here: most books plot their polar plots with 0° pointing
directly upwards (towards 12 o’clock). Technically speaking, this is incorrect
– a proper polar plot starts with 0° on the right side (towards 3 o’clock).
This is the system that I’ll be using for all polar plots in this book.
Just for the sake of having a version of the plots that looks nice and clean,
Figures 5.64 and 5.66 are duplicates of Figures 5.62 and 5.63.
Most people don’t call these microphones “pressure transducers” – because
the microphone is equally sensitive to all sound sources regardless of
direction, they’re normally called omnidirectional microphones. Some people
shorten this even further and call them omnis.
Figure 5.63: A polar plot of the sensitivity of a perfect pressure transducer normalized to the on-axis
response. Note that this plot shows the same information as the plot in Figure 5.62.
Figure 5.64: Cartesian plot of the sensitivity of a perfect pressure transducer normalized to the
on-axis response.
Figure 5.65: Cartesian plot of the sensitivity (in dB referenced to the on-axis sensitivity) of a perfect
pressure transducer normalized to the on-axis response.
Figure 5.66: Polar plot of the sensitivity of a perfect pressure transducer normalized to the on-axis
response. Note that this plot shows the same information as the plot in Figure 5.64.
Let’s assume for the purposes of this discussion that a movement of the
diaphragm to the left of the resting position somehow magically results in
a positive voltage at the output of this microphone (for more info on how
this miracle actually occurs, read Section 5.6.) Therefore if the diaphragm
moves in the opposite direction, the voltage at the output will be negative.
Let’s also assume that the side of the diaphragm facing the right is called
the “front” of the microphone.
If there’s a sound source producing a high pressure at the front of the
diaphragm, then the diaphragm is pushed backwards and the voltage output
is positive. Positive pressure causes positive voltage. If the sound source
stays at the front and produces a low pressure, then the diaphragm is pulled
forwards and the resulting voltage output is negative. Negative pressure
causes negative voltage. Therefore there is a positive relationship between
the pressure at the front of the diaphragm and the voltage output, meaning
that the polarity of the voltage at the output is the same as the pressure
at the front of the microphone.
What happens if the sound source is at the rear of the microphone at an
angle of incidence of 180°? Now, a positive pressure pushes on the diaphragm
from the rear and causes it to move towards the front of the microphone.
Remember from two paragraphs back that this causes a negative voltage
output. Positive pressure causes negative voltage. If the source is in the
rear and the pressure is negative, then the diaphragm is pulled towards the
Figure 5.68: A positive pressure at the front of the microphone moves the diaphragm towards the
back and causes a positive voltage at the output.
Figure 5.69: A positive pressure at the back of the microphone moves the diaphragm towards the
front and causes a negative voltage at the output.
Figure 5.70: A positive pressure at the side of the microphone causes no movement in the diaphragm
and causes 0 volts at the output.
$$S_G = \cos(\alpha) \qquad (5.9)$$
where $S_G$ is the sensitivity of a pressure gradient transducer and $\alpha$ is the
angle of incidence.
Again, most people prefer to see this in a polar plot, as is shown in Figure
5.73. However, what most people don’t know is that they’re not really looking
at an accurate polar plot of the sensitivity. In this case, we’re looking at a
polar plot of the absolute value of the cosine of the angle of incidence – if
this doesn’t make sense, don’t worry too much about it.
Figure 5.71: Cartesian plot of the sensitivity of a pressure gradient transducer. Note that the
negative polarity lobe has been highlighted in red.
Figure 5.72: Cartesian plot of the sensitivity (in dB referenced to the on-axis sensitivity) of a
pressure gradient transducer. Note that the negative polarity lobe has been highlighted in red.
Figure 5.73: Polar plot of the absolute value of the sensitivity of a pressure gradient transducer.
Blue indicates positive polarity, red indicates negative polarity. Note that this plot shows the same
information as the plot in Figure 14.
Notice that in the graph in Figure 5.73, the front half of the plot is in blue
while the rear half is red. This is to indicate the polarity of the sensitivity,
so at an angle of 180°, the radius of the plot is 1, but because the plot at
that angle is red, it’s -1. At 60°, the radius is 0.5 and, because it’s blue,
it’s positive.
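To check the numbers in the paragraph above, here is a minimal Python sketch of Equation 5.9. The sign of cos(α) gives the polarity (the blue or red lobe) and its absolute value gives the radius on the polar plot. The function names `gradient_sensitivity` and `describe` are illustrative only.

```python
import math

def gradient_sensitivity(angle_deg: float) -> float:
    """Ideal pressure gradient (figure-eight) sensitivity, S_G = cos(alpha), Equation 5.9."""
    return math.cos(math.radians(angle_deg))

def describe(angle_deg: float) -> str:
    """Radius on the polar plot plus the polarity that the colour of the lobe encodes."""
    s = gradient_sensitivity(angle_deg)
    if abs(s) < 1e-9:
        polarity = "null"        # e.g. 90 degrees: no output at all
    elif s > 0:
        polarity = "positive"    # front (blue) lobe
    else:
        polarity = "negative"    # rear (red) lobe
    return f"{angle_deg:5.1f} deg: radius {abs(s):.3f}, {polarity}"

for angle in (0, 60, 90, 120, 180):
    print(describe(angle))
```

Note that 120° gives the same radius as 60° (0.5) but with negative polarity, which is exactly what the red rear lobe represents.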
Just like pressure transducers are normally called omnidirectional micro-
phones, pressure gradient transducers are usually called either bidirectional
microphones (because they’re sensitive in two directions – the front
and back) or figure eight microphones (because the polar pattern looks like
the number 8).
There’s one thing that we should get out of the way right now. Many
people see the figure 8 pattern of a bidirectional microphone and jump to
the assumption that the mic has two outputs – one for the front and one
for the back. This is not the case. The microphone has one output and one
output only. The sound picked up by the front and rear lobes is essentially
mixed acoustically and output as a single signal. You cannot separate the
two lobes to give you independent outputs.
Figure 5.74: A microphone that is one half pressure transducer and one half pressure gradient
design. Note that if you build this microphone it probably will not work properly – this is an
approximate drawing for conceptual purposes. The significant things to note here are the vents
that allow some of the pressure changes from the outside world into the back of the diaphragm.
Note, however, that the back of the diaphragm is not completely exposed to the outside world.
In this case, the path to the back of the diaphragm from the outside
world is the same length as the path to the front of the diaphragm when the
sound source is at 180° – not 90° as in a pure pressure gradient transducer.
This means that there will be no output when the sound source is at
the rear of the microphone. In this case, the sensitivity pattern is produced by
a mixture of 50 percent Pressure and 50 percent Pressure Gradient.
Therefore, we’re multiplying the two pure sensitivity patterns by 0.5 and
adding them together. This results in the pattern shown in Figure 5.75 –
notice the similarity between this pattern and the perfect pressure gradient
sensitivity pattern. It’s just a cosine wave that has been scaled by 0.5 and
offset by 0.5, which is just enough to eliminate the negative components.
If we plot this sensitivity pattern on a polar plot, we get the graph shown
in Figure 5.77. Notice that this pattern looks somewhat like a heart,
so it’s normally called a cardioid pattern (“cardio” meaning “heart” as in
“cardio-vascular” or “cardio-pulmonary”).
Figure 5.75: Cartesian plot of the sensitivity pattern of a microphone that is one half Pressure and
one half Pressure Gradient transducer.
Figure 5.76: Cartesian plot of the sensitivity (in dB referenced to the on-axis sensitivity) of a
microphone that is one half Pressure and one half Pressure Gradient transducer.
Figure 5.77: Polar plot of the sensitivity pattern of a cardioid microphone (one half Pressure and
one half Pressure Gradient transducer). Note that this plot shows the same information as the plot
in Figure 5.75. A good rule of thumb to remember about this polar pattern is that the sensitivity is
0.5 (or -6 dB) at 90°.
$$S = P + G \cos(\alpha) \qquad (5.10)$$
where $S$ is the sensitivity of the microphone, $P$ is the Pressure component,
$G$ is the Pressure Gradient component, $\alpha$ is the angle of incidence and where
$P + G = 1$.
For example, for a microphone that is 50 percent Pressure and 50 percent
Pressure Gradient, the sensitivity equation would be:
$$S = 0.5 + 0.5 \cos(\alpha) \qquad (5.11)$$
need to do is to decide how much of each we want to add in the equation. For
a perfect omnidirectional microphone, we make P = 1 and G = 0. Therefore
the microphone is a 100 percent pressure transducer and 0 percent pressure
gradient transducer. There are five “standard” polar patterns, although one
of these is actually two different standards, depending on the manufacturer
(which is why both a supercardioid and a hypercardioid appear in the table).
The most commonly-seen polar patterns are:
Polar Pattern      P       G
Omnidirectional    1       0
Subcardioid        0.75    0.25
Cardioid           0.5     0.5
Supercardioid      0.333   0.666
Hypercardioid      0.25    0.75
Bidirectional      0       1
Table 5.6: Pressure (P) and Pressure Gradient (G) components of the standard polar patterns.
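Table 5.6 can be turned directly into sensitivity values with Equation 5.10. The following is a minimal Python sketch for illustration. The dictionary and the function names are our own, and the dB values are referenced to each pattern’s on-axis sensitivity, as in the dB plots in this section.

```python
import math

# (P, G) pairs from Table 5.6; each pair satisfies P + G = 1
# (the supercardioid only approximately, because of rounding).
PATTERNS = {
    "omnidirectional": (1.0, 0.0),
    "subcardioid": (0.75, 0.25),
    "cardioid": (0.5, 0.5),
    "supercardioid": (0.333, 0.666),
    "hypercardioid": (0.25, 0.75),
    "bidirectional": (0.0, 1.0),
}

def sensitivity(p: float, g: float, angle_deg: float) -> float:
    """First-order sensitivity S = P + G*cos(alpha) (Equation 5.10)."""
    return p + g * math.cos(math.radians(angle_deg))

def sensitivity_db(p: float, g: float, angle_deg: float) -> float:
    """Magnitude in dB relative to on-axis; -inf at a null."""
    s = abs(sensitivity(p, g, angle_deg))
    on_axis = abs(sensitivity(p, g, 0.0))
    return 20.0 * math.log10(s / on_axis) if s > 1e-12 else float("-inf")

for name, (p, g) in PATTERNS.items():
    row = " ".join(f"{sensitivity(p, g, a):+.3f}" for a in (0, 90, 180))
    print(f"{name:16s} S at 0/90/180 deg: {row}")
```

For example, the cardioid gives 0.5 (about -6.02 dB) at 90°, matching the rule of thumb in the caption of Figure 5.77. The subcardioid gives the same 0.5 at 180°.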
What do these polar patterns look like? We’ve already seen the omnidirectional,
cardioid and bidirectional patterns. The others are as follows.
Figure 5.79: Cartesian plot of the sensitivity (in dB referenced to the on-axis sensitivity) of a
subcardioid microphone.
Figure 5.80: Polar plot of a subcardioid microphone. Notice that the maximum attenuation of 0.5
(or -6.02 dB) is at the rear of the microphone at 180◦ .
Figure 5.81: Cartesian plot of a hypercardioid microphone using the values P=0.25 and G=0.75.
Figure 5.82: Cartesian plot of the sensitivity (in dB referenced to the on-axis sensitivity) of a
hypercardioid microphone using the values P=0.25 and G=0.75. Note that the negative polarity
lobe has been highlighted in red.
Figure 5.83: Polar plot of a hypercardioid microphone using the values P=0.25 and G=0.75.
Notice that the maximum attenuation (a sensitivity of 0, or -infinity dB) is at about 109°.
Figure 5.84: Cartesian plot of a supercardioid microphone using the values P=0.333 and G=0.666.
Figure 5.85: Cartesian plot of the sensitivity (in dB referenced to the on-axis sensitivity) of a
supercardioid microphone using the values P=0.333 and G=0.666. Note that the negative polarity
lobe has been highlighted in red.
Figure 5.86: Polar plot of a supercardioid microphone using the values P=0.333 and G=0.666.
Notice that the maximum attenuation (a sensitivity of 0, or -infinity dB) is at 120°.
Figure 5.87: Most of the standard polar patterns on one Cartesian plot. From top to bottom, these
are omnidirectional, subcardioid, cardioid, hypercardioid, and bidirectional. Note that red sections
of the plot point out the fact that the sensitivity is negative polarity.
One of the interesting things that becomes obvious in this plot is the
relationship between the angle of incidence where the sensitivity is 0
(sometimes called the null because there is no output) and the mixture of the
Pressure and Pressure Gradient components. None of the mixtures between
omnidirectional and cardioid (not including the cardioid itself) has a null,
because no angle of incidence results in zero output. The cardioid microphone
has a single null at 180°, or, directly to the rear of the microphone. As we
increase the mixture to have more and more Pressure Gradient component, the
null splits into two symmetrical points on the polar plot that move around from
the rear of the microphone to the sides until, when the transducer is a perfect
bidirectional, the nulls are at 90° and 270°.
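Setting Equation 5.10 to zero gives cos(α) = -P/G. This equation only has a solution when G > 0 and P ≤ G, which is why the null only appears once the mixture reaches the cardioid. The following minimal Python sketch (the function name is illustrative) reproduces the 109° and 120° nulls quoted in the captions of Figures 5.83 and 5.86.

```python
import math

def null_angles(p: float, g: float) -> list:
    """Angles in degrees where S = P + G*cos(alpha) = 0, or [] if there is no null.

    cos(alpha) = -P/G only has a solution when G > 0 and P/G <= 1,
    i.e. from the cardioid towards the bidirectional pattern.
    """
    if g <= 0 or p / g > 1:
        return []
    alpha = math.degrees(math.acos(-p / g))
    # The pattern is symmetrical, so the second null is the mirror image of the first.
    return sorted({round(alpha, 2), round(360.0 - alpha, 2)})

for name, p, g in (("omnidirectional", 1.0, 0.0), ("subcardioid", 0.75, 0.25),
                   ("cardioid", 0.5, 0.5), ("supercardioid", 0.333, 0.666),
                   ("hypercardioid", 0.25, 0.75), ("bidirectional", 0.0, 1.0)):
    print(f"{name:16s} nulls: {null_angles(p, g)}")
```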
make any polar pattern you wanted just by modifying the relative levels of
the two signals.
Mathematically speaking, the output of the omnidirectional microphone
is the Pressure component and the output of the bidirectional microphone
is the Pressure Gradient component. The two are just added in the mixer
so you’re fulfilling the standard sensitivity equation:
$$S = P + G \cos(\alpha) \qquad (5.13)$$
where $P$ is the gain applied to the omnidirectional microphone and $G$ is
the gain applied to the bidirectional microphone.
Also, let’s say that you have two cardioid microphones, but that you
put them in a back-to-back configuration where the two are pointing 180°
away from each other. Let’s look at this pair mathematically. We’ll call
microphone 1 the one pointing “forwards” and microphone 2 the second
microphone pointing 180° away. Note that we’re also assuming for a moment
that the gain applied to both microphones is the same.
$$S_{TOTAL} = 1 \qquad (5.21)$$
Therefore, the result is an omnidirectional microphone. This result is
possibly easier to understand intuitively if we look at graphs of the sensitivity
patterns, as shown in Figures 5.88 and 5.89.
Figure 5.88: Cartesian plot of the sensitivity patterns of two cardioid microphones aimed 180°
apart. The blue plot is the forward-facing cardioid, the green is the rear-facing cardioid. Note that,
if summed, the resulting output would be 1 for any angle.
Figure 5.89: Polar plot of the sensitivity patterns of two cardioid microphones aimed 180° apart.
Note that, if summed, the resulting output would be 1 for any angle.
$$S_{TOTAL} = \cos(\alpha) \qquad (5.26)$$
So, as you can see, not only is it possible to create any microphone polar
pattern using the summed outputs of a bidirectional and an omnidirectional
microphone, it can be accomplished using two back-to-back cardioids as well.
Of course, we’re still assuming at this point that we’re living in a perfect
world where all transducers are matched – but we’ll stay in that world for
now...
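The derivations behind Equations 5.21 and 5.26 aren’t reproduced here, but a minimal Python sketch can confirm both results numerically. It assumes ideal, matched cardioids. It also assumes that the cosine result comes from inverting the polarity of the rear-facing microphone before summing, which is our reading of the missing steps. `cardioid` and `back_to_back` are hypothetical helper names.

```python
import math

def cardioid(angle_deg: float) -> float:
    """Ideal cardioid sensitivity, 0.5 + 0.5*cos(alpha)."""
    return 0.5 + 0.5 * math.cos(math.radians(angle_deg))

def back_to_back(angle_deg: float, rear_gain: float) -> float:
    """Mix a forward-facing cardioid with a rear-facing one (rotated by 180 degrees).

    rear_gain = +1 sums the two (assumed to correspond to Equation 5.21);
    rear_gain = -1 inverts the rear mic's polarity (assumed for Equation 5.26).
    """
    front = cardioid(angle_deg)
    rear = cardioid(angle_deg - 180.0)  # equals 0.5 - 0.5*cos(alpha)
    return front + rear_gain * rear

for angle in (0, 45, 90, 135, 180):
    total = back_to_back(angle, 1.0)
    difference = back_to_back(angle, -1.0)
    print(f"{angle:3d} deg: sum {total:.3f}, difference {difference:+.3f}")
```

The sum is 1 at every angle (omnidirectional). The difference is cos(α) (bidirectional). Mixing the two microphones with other relative gains gives the intermediate first-order patterns.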
Figure 5.90: The frequency response of a perfect Pressure transducer. Note that all frequencies
have equal output assuming that the peak value of the pressure wave is the same at all frequencies.
Figure 5.91: A diagram of a Pressure Gradient transducer showing the two paths to the front and
rear of the diaphragm from a source on axis.
Since the sensitivity at the rear of the diaphragm has a negative polarity
and the front has a positive polarity, the result is that the pressure at
Figure 5.92: A linear plot of a comb filter caused by the interference of the pressures at the front
and rear of a Pressure Gradient transducer. The harmonic relationship between the peaks and dips
in the frequency response is evident in this plot.
Figure 5.93: A semi-logarithmic plot of a comb filter caused by the interference of the pressures at
the front and rear of a Pressure Gradient transducer. The 6 dB/octave rise in the response up to
the lowest-frequency peak is evident in this plot.
Figure 5.94: The output of a Pressure Gradient transducer whose design ensures that the entire
audio range lies below the lowest-frequency peak in the frequency response.
will ring at a very low note. The higher the frequency, the further you get
from the peak of the resonance. This resonance acts like a filter that has a
gain that increases by 6 dB for every halving of frequency. Therefore, the
lower the frequency, the higher the gain. This counteracts the rising natural
slope of the diaphragm’s output and produces a theoretically flat frequency
response. The only problem with this is that, at very low frequencies, there
is almost no output to speak of, so we have to have enormous gain and the
resulting output is basically nothing but noise.
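The slope that this resonance has to undo can be seen in a simple two-path model of the comb filter plotted in Figures 5.92 and 5.93. The following is a minimal Python sketch under stated assumptions. The rear pressure is assumed to arrive after an extra path length d with inverted polarity, and the speed of sound is taken as 343 m/s. The 1 cm path difference is a hypothetical value, not a figure from the text.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (roughly room temperature)

def gradient_response(freq_hz: float, path_diff_m: float) -> float:
    """Natural magnitude response of an idealized pressure gradient transducer.

    Model: front pressure minus the rear pressure delayed by path_diff_m,
    |1 - exp(-j*2*pi*f*d/c)| = 2*|sin(pi*f*d/c)|.
    """
    return 2.0 * abs(math.sin(math.pi * freq_hz * path_diff_m / SPEED_OF_SOUND))

def first_peak_hz(path_diff_m: float) -> float:
    """Lowest-frequency peak: the extra path equals half a wavelength."""
    return SPEED_OF_SOUND / (2.0 * path_diff_m)

d = 0.01  # hypothetical 1 cm front-to-back path difference
for f in (50, 100, 200, 400, 800):
    level_db = 20.0 * math.log10(gradient_response(f, d) / 2.0)  # 0 dB at the peak
    # A flattening filter needs the opposite gain at each frequency.
    print(f"{f:4d} Hz: natural {level_db:6.1f} dB, correction needed {-level_db:5.1f} dB")
print(f"first peak at {first_peak_hz(d):.0f} Hz")
```

Each halving of frequency costs about 6 dB of output, so the correction gain grows by about 6 dB per halving. This is why the required gain becomes enormous as the frequency approaches 0 Hz.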
The moral of this story is that Pressure Gradient microphones have no
very low frequency output. Also, keep in mind that any microphone with
a Pressure Gradient component will have a similar response. Therefore, if
you want to record program material with low frequency content, you have
to stick with omnidirectional microphones.
Figure 5.95: The blue plot shows the gain response of a theoretical filter required to “fix” the
frequency response of the transducer shown in Figure 5.94. Note the extremely high gain required in
the low frequency range. The red plot shows the gain achieved by making the diaphragm naturally
resonant at a low frequency. Note that there is a bottom limit to the benefits of the resonance.
Figure 5.96: The blue plot shows the result of the frequency response of the output of the transducer
shown in Figure 5.94 filtered using the theoretical (blue) gain response plotted in Figure 5.95. Note
that this is a theoretical result that does not take real life into account... The red plot shows a
more likely scenario where the extra gain provided by the resonance doesn’t extend all the way
down to 0 Hz.
outside world. Also, consider a little rule of thumb that says that, in a free,
unbounded space, the pressure of a sound wave is reduced by half for every
doubling of distance. The implication of this rule is that, if you’re very close
to a sound source, a small change in distance will result in a large change in
sound level. At a greater distance from the sound source, the same change
in distance will result in a smaller change in level. For example, if you’re 1
cm from the sound source, moving away by 1 cm will cut the level by half,
a drop of 6 dB. If you’re 1 m from the sound source, moving away by 1 cm
will have a negligible effect on the sound level. So what?
Imagine that you have a pressure gradient microphone that is placed
very close to a sound source. Consider that the distance from the sound
source (say, a singer’s mouth...) to the front of the diaphragm will be on
the order of millimeters. At the same time, the distance to the rear of the
diaphragm will be comparatively very far – possibly 4 to 8 times the distance
to the singer’s lips. Therefore there is a very large drop in pressure for the
sound wave arriving at the rear of the diaphragm. The result is that the
rear of the diaphragm is effectively sealed from the outside world by virtue
of the fact that the sound pressure level at that side of the diaphragm is
much lower than that at the front. Consequently, the natural frequency
response becomes more like a pressure transducer than a pressure gradient
transducer.
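The inverse-distance rule, and the front/rear imbalance it causes when a pressure gradient microphone is close to a source, can both be checked with a few lines of Python. This minimal sketch assumes free-field conditions, with pressure proportional to 1/distance. The 2 cm extra path at 1 m is a hypothetical value; the other distances come from the text.

```python
import math

def level_change_db(r_near: float, r_far: float) -> float:
    """Level at r_far relative to r_near, assuming pressure falls as 1/distance (free field)."""
    return 20.0 * math.log10(r_near / r_far)

# Moving 1 cm further away, starting at 1 cm and at 1 m (examples from the text).
print(f"1 cm -> 2 cm:   {level_change_db(0.01, 0.02):.2f} dB")  # -6.02 dB
print(f"1 m -> 1.01 m:  {level_change_db(1.00, 1.01):.3f} dB")  # -0.086 dB

# Close-miked pressure gradient mic: rear path 4 to 8 times the front path (from the text).
for ratio in (4, 8):
    print(f"rear path {ratio}x front: rear is {-level_change_db(1.0, ratio):.1f} dB quieter")

# The same mic 1 m away, with a hypothetical 2 cm extra path to the rear.
print(f"at 1 m: rear is {-level_change_db(1.0, 1.02):.2f} dB quieter")
```

Up close, the rear of the diaphragm sees a pressure 12 to 18 dB lower than the front, so it behaves almost like a sealed (pressure) transducer. At 1 m, the imbalance is a fraction of a decibel.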
What’s the problem? Well, remember that the microphone has a built-in filter
that boosts the low end to correct for problems in the natural
frequency response – problems that don’t exist when the microphone is close
to the sound source. As a result, when the microphone is very close to the
source, there is a boost in the low frequencies because the correction filter
is applied to a now naturally flat frequency response. This boost in the low
end is called proximity effect because it is caused by the microphone being
in close proximity to the sound source.
There are a number of microphones that rely on the proximity effect
to boost the low frequency components of the signal. These are typically
sold as vocal mics, such as the Shure SM58. If you measure the frequency
response of such a microphone from 1 m away, then you’ll notice that there
is almost no low-end output. However, in typical usage, there is plenty of
low end. Why? Because, in typical usage, the microphone is stuffed in the
singer’s mouth – therefore there’s lots of low end because of proximity effect.
Remember, when the microphone has a pressure gradient component, the
frequency response is partially dependent on the distance from the sound
source to the diaphragm. Also remember that some microphones have to be placed
close to the source to get a reasonable low-frequency response, whereas other
microphones in the same location will have a boosted low-frequency response.
1. An omni has the greatest sum of sensitivities to sounds from all direc-
tions, therefore it has the highest RER of all polar patterns.
Table 5.8: Random Energy Responses for various microphone polar patterns.
Figure 5.97: Random Energy Response vs. the Pressure component, P, in the microphone.
$$REE = \frac{RER}{RER_{omni}} \qquad (5.28)$$
So, as we can see, all we’re doing is calculating the ratio of the micro-
phone’s RER to that of an omni. As a result, the lower the RER, the lower
the REE.
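The text defining RER isn’t included in this excerpt. The following Python sketch therefore assumes that RER is the squared sensitivity integrated over all directions on a sphere. That assumption gives 4π (about 12.57) for an omni, which is consistent with the scale of Figure 5.97. The sketch also expresses REE in decibels as 10·log10(REE), treating it as an energy ratio. Equation 5.29 itself isn’t shown here, so that is an assumption too. The function names are illustrative.

```python
import math

def rer(p: float, g: float, steps: int = 2000) -> float:
    """Random Energy Response, ASSUMED to be the squared sensitivity
    S = P + G*cos(theta) integrated over a sphere (midpoint rule in theta).

    The pattern is symmetrical around the mic's axis, so integrating
    over the azimuth just contributes a factor of 2*pi.
    """
    d_theta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        s = p + g * math.cos(theta)
        total += s * s * math.sin(theta) * d_theta
    return 2.0 * math.pi * total

def ree(p: float, g: float) -> float:
    """Random Energy Efficiency, Equation 5.28: RER relative to an omni's RER."""
    return rer(p, g) / rer(1.0, 0.0)

for name, p, g in (("omni", 1.0, 0.0), ("cardioid", 0.5, 0.5),
                   ("hypercardioid", 0.25, 0.75), ("bidirectional", 0.0, 1.0)):
    value = ree(p, g)
    print(f"{name:14s} RER {rer(p, g):6.3f}  REE {value:.3f} ({10 * math.log10(value):+.2f} dB)")
```

Under these assumptions, the cardioid and bidirectional patterns both pick up one third of an omni’s random energy (about -4.8 dB). The hypercardioid picks up one quarter (about -6.0 dB), the lowest of the first-order patterns.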
This value can be expressed either as a linear value, or it can be calcu-
lated in decibels using Equation 5.29.
Figure 5.98: Random Energy Efficiency vs. the Pressure component, P, in the microphone.
Figure 5.99: Random Energy Efficiency on a decibel scale vs. the Pressure component, P, in the
microphone.
Figure 5.100: Directivity Factor vs. the Pressure component, P, in the microphone.
1. If you move away from a sound source in a real room, the direct sound
will drop by 6 dB per doubling of distance.
2. If you move away from a sound source in a real room, the reverberant
sound will not change. This is true even inside the room radius. The
only reason the level drops inside this area is because the direct sound
is much louder than the reverberant sound.
3. The relative balance of the direct sound and the reverberant sound is
dependent on the DRF of the microphone’s polar pattern.
$$DSF = \sqrt{DRF} \qquad (5.31)$$
Figure 5.101: Distance Factor vs. the Pressure component, P, in the microphone.
So, what does this mean? Have a look at Figure 5.102. All of the
microphones in this diagram will have the same direct-to-reverberant ratio
at their outputs. The relative distances to the sound source have been taken
directly from Table 5.10, as you can see...
Figure 5.102: Diagram showing the distance factor in practice (microphones labelled subcard, card
and super, at distances of r = 1, 1.31, 1.73, 1.92 and 2 from the sound source). In theory, the
outputs of these microphones at these specific distances from the sound source will all have the
same direct-to-reverberant ratio.
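The following Python sketch makes the same assumption about RER as before: squared sensitivity integrated over a sphere, which for S = P + G cos(α) works out to 4π(P² + G²/3). It also assumes that the Directivity Factor is the reciprocal of REE. Under those assumptions, Equation 5.31 gives the distance factors below. The omni, subcardioid, cardioid and hypercardioid results (1, about 1.31, about 1.73 and 2) agree with distances labelled in Figure 5.102. The supercardioid result depends on which P and G values are used for it.

```python
import math

def ree(p: float, g: float) -> float:
    """Random Energy Efficiency of S = P + G*cos(alpha), in closed form.

    ASSUMES RER is the squared sensitivity integrated over a sphere,
    i.e. 4*pi*(P**2 + G**2/3); an omni gives 4*pi, so REE = P**2 + G**2/3.
    """
    return p * p + g * g / 3.0

def drf(p: float, g: float) -> float:
    """Directivity Factor, ASSUMED here to be the reciprocal of REE."""
    return 1.0 / ree(p, g)

def dsf(p: float, g: float) -> float:
    """Distance Factor, Equation 5.31: DSF = sqrt(DRF)."""
    return math.sqrt(drf(p, g))

for name, p, g in (("omnidirectional", 1.0, 0.0), ("subcardioid", 0.75, 0.25),
                   ("cardioid", 0.5, 0.5), ("supercardioid", 0.333, 0.666),
                   ("hypercardioid", 0.25, 0.75), ("bidirectional", 0.0, 1.0)):
    print(f"{name:16s} DRF {drf(p, g):5.2f}  DSF {dsf(p, g):4.2f}")
```

For example, a hypercardioid at 2 m should give the same direct-to-reverberant ratio as an omni at 1 m.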
5.8.1 Introduction
Loudspeakers basically consist of two things:
1. one or more drivers to push and pull the air in the room
2. an enclosure (fancy word for “box”) to make the driver sound and
look better. This topic is discussed in Section 5.9.
• Ribbon
• Moving Coil
2. Electrostatic
Figure 5.103: When you put current through the wire and the ribbon, a magnetic field is created
around them, which interacts with the field of the permanent magnet. Therefore the ribbon moves.
Advantages
• The ribbon has a low mass, therefore it’s good for high frequencies.
Disadvantages
• You can’t make it very large, or have a very large excursion (it’ll break
apart) so it’s not good for low frequencies or high sound pressure levels.
• The impedance of the driver is very low (because it’s just a piece of
aluminium) – so it may be a nasty load for your amplifier, unless you
look after this using a transformer.
Figure 5.104:
In order for the system to work well, you need reasonably strong magnetic
fields. The easiest way to do this is to use a really strong permanent magnet.
You could also improve the packing density of the voice coil. This essentially
means putting more metal in the same space by changing the cross-section
of the wire. The close-ups shown below illustrate how this can be done. The
third (and least elegant) method is to add more wire to the coil. We’ll talk
about why this is a bad idea later.
Figure 5.105: (labelled parts: surround, diaphragm, dome (dust cap), basket, spider, voice coil,
magnet, former)
Figure 5.107: A diaphragm is glued to the front (this side) of the coil (called the voice coil). It has
two basic purposes: 1) to push the air and 2) to suspend the coil in the magnetic field.
Figure 5.108: Voice coil using wire with a round cross section. This is cheap and easy to make, but
less efficient.
Figure 5.109: Voice coil using wire with a flat cross section. This has greater packing density,
producing a stronger magnetic field and is therefore more efficient.
Advantages
• They are pretty rugged (that’s to say that they can take a lot of
punishment – not that they look nice if they’re carpeted...)
Disadvantages
• There’s a big hunk of metal (the voice coil) that you’re trying to move
back and forth. In this case, inertia is not your friend. The more en-
ergy you want to emit (you’ll need lots in low frequencies) the bigger
the coil – the bigger the heavier – the heavier, the harder to move
quickly – the harder to move quickly, the more difficult it is to pro-
duce high frequencies. The moral: a driver (single coil and diaphragm
without an enclosure) can’t effectively produce all frequencies for you.
It has to be optimized for a specific frequency range.
• They have funny-looking impedance curves – but we’ll talk about that
later.
Figure 5.110: Put three conductive plates side by side (just like in a capacitor, but with an extra
plate). Have the two outside plates fixed, but the middle one suspended so that it can move (shown
by the little arrow at the bottom of the diagram).
from each other, the inside plate will move towards the plate of more opposite
polarity.
Figure 5.111: If the middle plate is very positive and the outside plates are slightly positive and
slightly negative relative to each other, the middle plate will be attracted to (and be pulled towards)
the more negative plate while it is repelled by (and therefore pushed away from) the more positive
plate. This system is known as push-pull for obvious reasons.
If we then perforate the plates (or better yet, make them out of a metal
mesh) the movement of the inside plate will cause air pressure changes that
radiate through the holes into the room.
There are (as expected) advantages and disadvantages to this system.
Disadvantages
- In order for this system to work, you need enough charge to create an
adequate attraction and repulsion. This requires one of two things:
- a very big polarizing voltage on the middle plate (on the order of 5000
V). Most electrostatics use a huge polarizing voltage (hence the necessity in
most models to plug them into your wall as well as your amplifier).
- a very small gap between the plates. With a small gap, you can’t
have a very big excursion of the diaphragm, therefore you can’t produce
enough low frequency energy without a big diaphragm (it’s not unusual to
see electrostatics that are 2 m² in area).
- Starting prices for these things are in the thousands of dollars.
Advantages
- The diaphragm can be very thin, making it practically massless – or at least
approaching the mass of the air it’s moving. This, in turn, gives electrostatic
loudspeakers a very good transient response due to an extremely high high-
frequency cutoff.
- You can see through them (this impresses visitors)
- They have symmetrical distortion – we’ll talk a little more about this
in the following section on enclosures.
Electrical Impedance
Hmmmmmm... loudspeaker impedance... We’re going to only look at mov-
ing coil loudspeakers. (If you want to know more about this – or anything
about the other kinds of drivers, get a copy of the Borwick book I mentioned
earlier.)
We’ll begin by looking at the impedance of a resistor:
Figure 5.113: The higher the frequency, the lower the impedance.
Figure 5.114: The higher the frequency, the higher the impedance.
5.9.1 Enclosures
- Enclosure Design (i.e. “what does the box look like?”), of which there are
a number of possibilities, including the following:
  - Dipole Radiator (no enclosure)
  - Infinite Baffle
  - Finite Baffle
  - Folded Baffle
  - Sealed Cabinet (aka Acoustic Suspension)
  - Ported Cabinet (aka Bass Reflex)
- Horns (aka Compression Driver)
Dipole Radiator
(no enclosure)
aka a Doublet Radiator
Since both sides of a dipole radiator’s diaphragm are exposed to the out-
side world (better known as your listening room) there are opposite polarity
pressures being generated in the same space (albeit at different locations)
simultaneously. As the diaphragm moves “outwards” (i.e. towards the lis-
tener), the resulting air pressure at the “front” of the diaphragm is positive
while the pressure at the back of the diaphragm is of equal magnitude but
opposite polarity.
Infinite Baffle
The biggest problem with a dipole radiator is caused by the fact that the
energy from the rear of the diaphragm reaches the listener at the front of
the diaphragm. The simplest solution to this issue is to seal the back of the
diaphragm as shown.
Finite Baffle
Instead of trying to build a baffle with infinite dimensions, what would
happen if we mounted the diaphragm on a circular baffle of finite dimensions as
shown below?
Now, the circular baffle causes the energy from the rear of the driver to
reach the listener with a given delay time determined by the dimensions of
the baffle. This causes a comb-filtering effect with the first notch happening
where the path length from the rear of the driver to the front of the baffle
equals one wavelength as shown below:
One solution to this problem is to create multiple path lengths by using an irregularly-shaped baffle. This causes multiple delay times for the pressure from the rear of the diaphragm to reach the front of the baffle. The technique will eliminate the comb filtering effect (no notches above the transition frequency) but we still have a 6 dB/octave slope in the low end (below the transition frequency). If the baffle is very big (approaching infinite relative to the wavelength of the lowest frequencies) then the slope in the low end approaches 12 dB/octave – essentially, we have the same effect as if it were an infinite baffle.
If the resonant frequency of the driver, below which the roll-off occurs at a slope of 6 dB/octave, is the same as (or close to) the transition frequency of the baffle, the slope becomes 12 or 18 dB/octave (dependent on the size of the baffle – the slopes add). The resonant frequency of the driver is the frequency at which the driver would oscillate if you thumped it with your finger – though we’ll talk about that a little more in the section on impedance.
What do you do if you don’t have enough room in your home for big
baffles? You fold them up!
Figure 5.120: The natural frequency response of a radiator mounted in the centre of a circular finite
baffle. The transition frequency is where the extra path length from the rear of the diaphragm to the
listener is one half of a wavelength. Note that the plotted transition frequency is unnaturally high
for a real loudspeaker. Also, compare this frequency response to the natural frequency response of
a bidirectional microphone shown in Figure XXX in Section 5.7.
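To make the comb-filter behaviour concrete, here is a minimal sketch of an idealized circular baffle. It assumes (this is not from the text) that the rear wave reaches the listener with exactly the same magnitude as the front wave, opposite polarity and a single extra path length; the 344 m/s speed of sound and the function names are illustrative assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 344.0  # m/s; assumed value for room-temperature air


def baffle_gain_db(freq_hz, extra_path_m, c=SPEED_OF_SOUND):
    """Level (dB relative to the front wave alone) at the listener for an
    idealized finite baffle: the front wave plus an equal-magnitude,
    opposite-polarity rear wave delayed by extra_path_m / c seconds."""
    delay_s = extra_path_m / c
    total = 1.0 - cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20 * math.log10(max(abs(total), 1e-12))  # floor avoids log(0)


def transition_frequency(extra_path_m, c=SPEED_OF_SOUND):
    """Extra path = half a wavelength: the inverted rear wave arrives in phase."""
    return c / (2 * extra_path_m)


def first_notch(extra_path_m, c=SPEED_OF_SOUND):
    """Extra path = one wavelength: the inverted rear wave cancels the front."""
    return c / extra_path_m


if __name__ == "__main__":
    d = 0.5  # metres of extra path around the baffle (hypothetical)
    print(transition_frequency(d), first_notch(d))  # 344.0 688.0
    # Level change per octave well below the transition frequency:
    print(round(baffle_gain_db(20, d) - baffle_gain_db(10, d), 1))  # 6.0
```

In this idealized model the level falls at about 6 dB/octave below the transition frequency, and the notches repeat at every integer multiple of the first-notch frequency.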
Folded Baffle
(aka Open-back Cabinet)
A folded baffle is essentially a large flat baffle that has been “folded”
into a tube which is open on one end (the back) and sealed by the driver at
the other end as is shown below.
The half-open tube shown in the picture causes a nasty resonance (see the section on resonance in the Acoustics section of this textbook for more info on why this happens) at a frequency determined by the length of the tube.
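As a rough sketch of how the tube length sets that resonance, the code below treats the open-back cabinet as an ideal tube closed at the driver end and open at the back, which resonates at odd multiples of c/(4L). Ignoring the end correction and the driver's own compliance, and using 344 m/s for the speed of sound, are simplifying assumptions not taken from the text.

```python
SPEED_OF_SOUND = 344.0  # m/s; assumed


def open_back_resonances(tube_length_m, count=3, c=SPEED_OF_SOUND):
    """First `count` resonances (Hz) of an ideal tube closed at one end and
    open at the other: odd multiples of the quarter-wave frequency c / (4 L).
    No end correction is applied, so a real cabinet resonates somewhat lower."""
    quarter_wave = c / (4.0 * tube_length_m)
    return [(2 * n - 1) * quarter_wave for n in range(1, count + 1)]


if __name__ == "__main__":
    # A hypothetical cabinet 43 cm deep:
    print([round(f) for f in open_back_resonances(0.43)])  # [200, 600, 1000]
```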
Sealed Cabinet
(aka Acoustic Suspension)
We can eliminate the resonance of an open-back cabinet by sealing it up,
thus turning the tube into a box.
Now the air sealed inside the enclosure acts as a spring which pushes
back against the rear of the diaphragm. This has a number of subsequent
effects:
- increased resonant frequency of the driver
- reduced efficiency of the driver (because it has to push and pull the
“spring” in addition to the air in the room)
- non-symmetrical distortion (because pushing in on the “spring” has a
different response than pulling out on it)
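The first effect in the list can be estimated with the closed-box relation from Thiele/Small loudspeaker theory, which this text does not cover: the driver's free-air resonance f_s rises to f_s √(1 + V_as/V_b), where V_b is the box volume and V_as is the volume of air with the same compliance as the driver's suspension. This is a minimal sketch under that model; the example driver is hypothetical.

```python
import math


def sealed_box_resonance(fs_hz, vas_litres, box_litres):
    """Resonant frequency of a driver in a sealed box (Thiele/Small closed-box
    model). The air spring in the box stiffens the suspension, so the result
    is always above the driver's free-air resonance fs_hz."""
    compliance_ratio = vas_litres / box_litres  # often written as alpha
    return fs_hz * math.sqrt(1.0 + compliance_ratio)


if __name__ == "__main__":
    # Hypothetical woofer: fs = 40 Hz, Vas = 60 L, mounted in a 20 L box.
    print(sealed_box_resonance(40.0, 60.0, 20.0))  # 80.0
```

The trend is the useful part: the smaller the box, the stiffer the air spring and the higher the resonance.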
We’re throwing away a lot of energy, as in the case of the infinite baffle, because the back of the driver is sealed off from the world.
Usually the enclosure has an acoustic resonant frequency (sort of like
little room modes) which is lower than the (now raised...) resonant frequency
of the driver. This effectively lowers the low cutoff frequency of the entire
system, below which the slope is typically 12 dB/octave.
Ported Cabinet
(aka Bass Reflex)
You can achieve a lower low cutoff frequency in a sealed cabinet if you cut a hole in it – the location is up to you (although, obviously, different locations will have different effects...).
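The text doesn't say how the hole changes the response. One common way to describe it (an assumption here, not the author's derivation) is that the box plus port behaves as a Helmholtz resonator tuned to f_b = (c / 2π) √(S / (V L_eff)), where S is the port's cross-sectional area, V the box volume and L_eff the port length plus an end correction. This sketch leaves the end correction to the caller; the port dimensions are hypothetical.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s; assumed


def port_tuning_hz(port_area_m2, box_volume_m3, effective_length_m,
                   c=SPEED_OF_SOUND):
    """Helmholtz resonance of a box with a port.

    effective_length_m should already include an end correction (roughly
    0.6 to 0.85 times the port radius per end, depending on flanging);
    this function does not add one itself."""
    return (c / (2.0 * math.pi)) * math.sqrt(
        port_area_m2 / (box_volume_m3 * effective_length_m))


if __name__ == "__main__":
    radius = 0.025                  # 5 cm diameter port (hypothetical)
    area = math.pi * radius ** 2
    length = 0.10 + 1.7 * radius    # 10 cm tube, flanged at both ends
    print(round(port_tuning_hz(area, 0.030, length), 1), "Hz")
```

A bigger box or a longer port lowers the tuning; a wider port raises it.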
Horns
(aka Compression Drivers)
The efficiency of a system in transferring power is determined by how
the power delivered to the system relates to the power received in the room.
For example, if I were to send a signal to a resistor from a function generator with an output impedance of 50Ω, I would get the most power dissipation in the resistor if it had a value of 50Ω. Any other value would mean that less power would be dissipated by the device. This is because its impedance
plug with holes drilled in it (the white lines). Each hole is the same length, ensuring that there is no interference, either constructive or destructive, caused by multiple path lengths from the diaphragm to the horn.
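The 50 Ω example from the Horns discussion can be checked numerically. This sketch models the function generator as an ideal voltage source in series with its 50 Ω output impedance and sweeps the load resistor. The power in the load, I²R, peaks when the load matches the source. The function name and the 1 V source level are illustrative.

```python
def load_power_w(source_volts_rms, source_ohms, load_ohms):
    """Power dissipated in a resistive load driven through a source resistance."""
    current = source_volts_rms / (source_ohms + load_ohms)
    return current ** 2 * load_ohms


if __name__ == "__main__":
    loads = range(1, 501)  # 1 to 500 ohms in 1-ohm steps
    best = max(loads, key=lambda r: load_power_w(1.0, 50.0, r))
    print(best)  # 50
    for r in (10, 50, 250):
        print(r, "ohms:", round(load_power_w(1.0, 50.0, r) * 1000, 2), "mW")
```

Loads of 10 Ω and 250 Ω both receive about 2.78 mW, compared with 5 mW for the matched 50 Ω load.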
Multiple drivers
We said much earlier that drivers are usually optimized for specific frequency
bands. Therefore, an easy way to get a wide-band loudspeaker is to use
multiple drivers to cover the entire audio range. Most commonly seen these
days are 2-way loudspeakers (meaning 2 drivers, usually in 1 enclosure),
although there are many variations on this.
If you’re going to use multiple drivers, then you have some issues to
contend with:
1 – Crossovers
You don’t want to send high frequencies to a low-frequency driver (aka woofer) or, more destructively, low frequencies to a high-frequency driver (aka tweeter). In order to avoid this, you filter the signals getting sent to each driver – the woofer’s signal is routed through a low-pass filter, while the tweeter’s is filtered using a high-pass. If you are using mid-range drivers as well, then you use a band-pass filter for their signal. This combination of filters is known as a crossover.
Most filters in crossovers have slopes of 12 or 18 dB/octave (although it’s not uncommon to see steeper slopes) and have very specific designs to minimize phase distortion around the crossover frequencies (the frequencies where the two filters overlap and therefore the two drivers are producing roughly equal energy). This is particularly important because crossover frequencies are frequently (sorry... I couldn’t resist at least one pun) around the 1 – 3 kHz range – the most sensitive band of our hearing.
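The crossover behaviour described above can be explored with a small sketch. The text doesn't specify a filter type, so this assumes textbook second-order (12 dB/octave) Butterworth low-pass and high-pass sections sharing one crossover frequency. That is only one of many possible designs. Each output is 3 dB down at the crossover, and the two outputs are 180° apart there, so how the filters and drivers are combined really matters around that frequency.

```python
import cmath
import math


def butterworth2(freq_hz, crossover_hz):
    """Complex responses (low-pass, high-pass) of second-order Butterworth
    sections at freq_hz, with s normalized to the crossover frequency."""
    s = 1j * freq_hz / crossover_hz
    denominator = s * s + math.sqrt(2.0) * s + 1.0
    return 1.0 / denominator, (s * s) / denominator


def db(h):
    """Magnitude in dB, floored to avoid log(0)."""
    return 20.0 * math.log10(max(abs(h), 1e-15))


if __name__ == "__main__":
    fc = 2000.0  # a crossover in the 1 - 3 kHz region mentioned above
    lp, hp = butterworth2(fc, fc)
    print(round(db(lp), 2), round(db(hp), 2))  # -3.01 -3.01
    print(round(abs(math.degrees(cmath.phase(hp / lp)))))  # 180
    print(abs(lp + hp) < 1e-12)  # True: the in-phase sum cancels at fc
    # Slope of the low-pass output, well above the crossover:
    lp8, _ = butterworth2(8 * fc, fc)
    lp16, _ = butterworth2(16 * fc, fc)
    print(round(db(lp16) - db(lp8), 1))  # -12.0
```

The complete cancellation of the in-phase sum at the crossover frequency is one concrete example of the phase problems these designs have to deal with.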
There are two basic types of crossovers, active and passive.
Passive crossovers
These are probably what you have in your home. Your power amplifier
sends a full-bandwidth signal to the crossover in the loudspeaker enclosure
which, in turn, sends filtered signals to the various drivers. This system
has a drawback in that it wastes power from your amplifier – anything that
is filtered out is lost power. “Audiophiles” (translation: “people with too
much disposable income who spend more time reading reviews of their audio
equipment than they do actually listening to it”) also complain about issues
like back EMF and cabinet vibrations which may or may not affect the
crossover. (I’ll believe this when I see some data on it...). The advantage
of passive crossovers is that they’re idiot-proof. You plug in one amplifier
to one speaker input and the system works and works well. Also – they’re
inexpensive.
Active crossovers
These are filters that precede the power amplifier in the audio chain (the
amplifier is then directly connected to the individual driver). They are more
efficient, since you only amplify the required frequency band for each driver
– then again, they’re more expensive because you have to buy the crossover
plus extra amplifiers.
2 – Crossover Distortion
There is a band of frequencies around the cutoffs of the crossover filters where the drivers overlap. At this point you have roughly equal energy being emitted by at least two diaphragms. There is an acoustic interaction between these two which must be considered, particularly because this band usually lies in the middle of the audio range.
In order to minimize problems in this band, the drivers must have matched distances to the listener; otherwise, you’ll get comb filtering due to propagation delay differences. This so-called time aligning can be done in one of two ways. You can either build the enclosure such that the tweeter is set back into the face of the cabinet, so its voice coil is vertically aligned with the voice coil of the deeper woofer. Or, alternately, you can use an electronic delay to retard the arrival of the signal at the closer driver by an appropriate amount.
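The amount of electronic delay is just the path-length difference divided by the speed of sound. This sketch assumes 344 m/s and a 48 kHz sample rate. It converts an offset between two voice coils into a delay in microseconds and in samples. It also estimates the first comb-filter notch the offset would cause if left uncorrected: for equal-level, same-polarity arrivals, that notch falls where the offset equals half a wavelength.

```python
SPEED_OF_SOUND = 344.0  # m/s; assumed


def alignment_delay_s(offset_m, c=SPEED_OF_SOUND):
    """Delay to apply to the closer driver so both arrivals coincide."""
    return offset_m / c


def delay_in_samples(offset_m, sample_rate_hz, c=SPEED_OF_SOUND):
    """The same delay expressed in (possibly fractional) samples."""
    return alignment_delay_s(offset_m, c) * sample_rate_hz


def first_comb_notch_hz(offset_m, c=SPEED_OF_SOUND):
    """Uncorrected, equal-level, same-polarity arrivals cancel where the
    offset is half a wavelength."""
    return c / (2.0 * offset_m)


if __name__ == "__main__":
    offset = 0.0344  # 3.44 cm between voice coils (hypothetical)
    print(round(alignment_delay_s(offset) * 1e6, 1), "us")       # 100.0 us
    print(round(delay_in_samples(offset, 48000), 2), "samples")  # 4.8 samples
    print(round(first_comb_notch_hz(offset)), "Hz")              # 5000 Hz
```

A fractional delay such as 4.8 samples needs interpolation in a digital delay line. Rounding it to 5 samples leaves a small residual error.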
3 – Interaction between drivers
Remember that the air inside the enclosure acts like a spring which pushes and pulls the drivers contrary to the direction you want them to move in. Imagine a woofer moving into a cabinet. This increases the air pressure inside the enclosure, which pushes out against both the woofer AND the tweeter. This is a bad thing. Some manufacturers get around this problem by putting divided sections inside the cabinet – others simply build separate cabinets, one for each driver (as in the B and W 801).
Electrical Impedance
Figure 5.127: The higher the frequency, the lower the impedance.
Figure 5.128: The higher the frequency, the higher the impedance.
5.10.1 Introduction
Chapter 6
Electroacoustic Measurements
6. Electroacoustic Measurements 412
Figure 6.1: A cheap digital multimeter. This thing is about the size of a credit card, not much thicker, and cost around $20 Cdn. (The drawing shows a selector dial with mV DC, mV AC, mA DC, mA AC, V, kΩ, Ω, continuity, A and Off positions, and input jacks labelled VΩ, A and Com.)
Usually, the last meter you’re guaranteed to get is a great little tool called a continuity tester. This is essentially a simplified version of a resistance meter where the only output of the meter is a “beep” when there is a low-resistance connection between the two leads.
Some fancier (and therefore more expensive) units will also include a
diode tester and a meter that can measure capacitance.
There are a couple of things to be concerned about when using one of
these devices.
First, we’ll just look at the basic operation. Let’s say that you wanted
to measure the DC voltage between two points on your circuit. You set the
DMM to a DC Voltage measurement and touch the two leads on the two
points. Be careful not to touch more than one point on the circuit with each
lead of the DMM. This could do some damage to your circuit... Note that
one of the leads is labelled “ground” which does not necessarily mean that
it’s actually connected to ground in your circuit. All it really means is that,
for the DMM, that particular lead is the reference. The voltage displayed is
the voltage of the other lead relative to the “ground” lead.
When you’re measuring voltage, the two leads of the DMM theoretically have an infinite impedance between them. This is necessary because, if they didn’t, you’d change the voltage that you were measuring by measuring it. For example, let’s pretend that you have two identical resistors in series across a voltage source. If your DMM had a small impedance across the two leads, and you used it to measure the voltage across one of the resistors, then the small impedance of the DMM would be in parallel with the resistor being measured. Consequently, the voltage division would change and you wouldn’t get an accurate measurement. Since the impedance between the leads is infinite (or at least, pretty darned close...), when it’s in parallel with a resistor, it’s as if it’s not there at all, and it consequently has no effect on the circuit being measured.
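The two-resistor example can be worked through numerically. In this sketch (function and parameter names are illustrative), the meter's input impedance is placed in parallel with the lower resistor of a divider. Passing None stands for an ideal, infinite-impedance meter. The 10 MΩ value is used here only as an assumed, typical DMM input impedance.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def measured_voltage(source_v, r_top, r_bottom, meter_ohms=None):
    """Voltage a meter reads across r_bottom in a two-resistor divider.

    meter_ohms=None models an ideal meter with infinite input impedance."""
    loaded = r_bottom if meter_ohms is None else parallel(r_bottom, meter_ohms)
    return source_v * loaded / (r_top + loaded)


if __name__ == "__main__":
    # Two 10 kohm resistors across 10 V, measured with three different meters.
    for meter in (None, 10e6, 10e3):
        print(meter, round(measured_voltage(10.0, 10e3, 10e3, meter), 4))
```

An ideal meter reads 5 V and a 10 MΩ meter reads about 4.9975 V. A meter with only 10 kΩ across its leads would read about 3.33 V.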
If you’re measuring current, then you have to use the DMM a little
differently. Whereas a voltage measurement is performed by putting the
DMM in parallel with the resistor being measured, a current measurement
is performed by putting the DMM in series with the circuit. This ensures
that the current goes through the DMM on its way through the circuit.
Also, as a result, it’s good to remember that the impedance across the leads
for a current measurement is 0Ω – otherwise, it would change the current in
the circuit and you’d get an inaccurate measurement.
Now, there’s an important thing to remember here – when the DMM is in voltage measuring mode, the impedance is infinite. When it’s in current measuring mode, it’s 0Ω. Therefore, if you are measuring current, and you accidentally forget to set the meter back to voltage measuring mode to measure the voltage across a power supply or a battery, you’ll probably melt something. Remember that power supplies and batteries don’t like being shorted out with a 0Ω impedance – it means that you’re asking for infinite current...
Another thing: a DMM measures resistance (and continuity) by putting
a small voltage difference across its leads and measuring the resulting current
through it. The smaller the current, the bigger the resistance. This also means, however, that your DMM, in this case, is its own little power supply – so you don’t want to go poking at a circuit to measure resistors without lifting the resistor out of the circuit (lifting one leg of the resistor will do).
One last thing to remember: the AC measurements in a typical DMM
are supposed to be RMS measurements. In fact, they’re pseudo-RMS, so
you can’t usually trust them. Remember from way back that the RMS of
a sine wave is about 0.707 of its peak value? Well, a DMM assumes that
this is always the case – so it measures the peak value, multiplies by 0.707
and displays that number. This is fine if all you ever do is measure sine
waves (like most electricians do...) but we audio types deal with waveforms
other than sinusoids, so we need to use something a little more accurate.
Just remember, if you’re using a DMM and it’s not a sine wave, then you’re
looking at the wrong answer.
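The size of the error is easy to see numerically. This sketch compares a true RMS calculation with a meter that, as described above, takes the peak value and multiplies it by 0.707 (1/√2). Many inexpensive meters are actually average-responding rather than peak-responding. Either way they are calibrated for sine waves, and this sketch models only the peak-times-0.707 behaviour.

```python
import math

N = 1000  # samples per cycle


def true_rms(samples):
    """Root of the mean of the squares."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def sine_calibrated_reading(samples):
    """Peak value times 0.707 (1/sqrt(2)): correct only for sine waves."""
    return max(abs(x) for x in samples) / math.sqrt(2.0)


sine = [math.sin(2 * math.pi * n / N) for n in range(N)]
square = [1.0 if n < N // 2 else -1.0 for n in range(N)]
triangle = [4.0 * abs(n / N - 0.5) - 1.0 for n in range(N)]

if __name__ == "__main__":
    for name, wave in (("sine", sine), ("square", square), ("triangle", triangle)):
        print(f"{name:8s} true RMS {true_rms(wave):.3f}   "
              f"meter {sine_calibrated_reading(wave):.3f}")
```

For these unit-peak waves, the square wave reads about 29% low and the triangle about 22% high, while the sine is correct.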
A small thing that you can usually ignore: a DMM is only accurate (such
as it is...) within a limited frequency range. Don’t expect a $10 DMM to
be able to give you a reliable reading on a 30 kHz sine tone.
This may seem a little silly at first, but it’s actually really useful for doing
noise measurements. For example, if you want to measure the noise output
of a microphone preamp, you’ll have to do a true RMS measurement, but
you’re not really worried about noise above, say, 100 kHz (then again, maybe
you are... I don’t know...). If you just did the measurement without a low
pass filter, you’d get a much bigger value than you’d like, and one that
isn’t representative of the audible noise level. So, in this case, the low pass
filter is your friend. If your true RMS meter doesn’t have one, you might
want to build one out of a simple RC circuit. Just remember, though, that if
you’re stating a noise measurement that you did, include the LPF magnitude
characteristic with your values.
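For a simple RC low-pass filter, the -3 dB cutoff is f_c = 1/(2πRC), and the single pole rolls off at 6 dB/octave. This sketch picks a capacitor for a chosen cutoff and resistor. It then reports the magnitude characteristic you would quote alongside the noise figure. It assumes an ideal first-order RC driving a high-impedance load (the meter). The 1 kΩ resistor and the 100 kHz cutoff are example values only.

```python
import math


def rc_cutoff_hz(r_ohms, c_farads):
    """-3 dB frequency of a first-order RC low-pass."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def capacitor_for_cutoff(cutoff_hz, r_ohms):
    """Capacitance that puts the cutoff at cutoff_hz for a given resistor."""
    return 1.0 / (2.0 * math.pi * r_ohms * cutoff_hz)


def rc_lowpass_db(freq_hz, cutoff_hz):
    """Magnitude of an ideal first-order RC low-pass (unloaded output)."""
    ratio = freq_hz / cutoff_hz
    return -10.0 * math.log10(1.0 + ratio * ratio)


if __name__ == "__main__":
    r = 1000.0
    c = capacitor_for_cutoff(100e3, r)
    print(f"C = {c * 1e9:.2f} nF")  # C = 1.59 nF
    for f in (10e3, 100e3, 1e6):
        print(f"{f:>9.0f} Hz: {rc_lowpass_db(f, 100e3):6.2f} dB")
```

A single RC pole still passes a fair amount of noise above the cutoff, which is one more reason to state the filter characteristic with the result.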
6.1.3 Oscilloscope
Digital multimeters and true RMS meters are great for doing a quick mea-
surement of a signal and getting a single number to describe it – the problem
is that you really don’t know all that much about the signal’s shape. For
example, if I handed you two wires and asked you to give me a measurement
of their voltage difference with a true RMS meter, you’d tell me one, maybe
two numbers (the AC and DC components can be measured separately, re-
member...). But you don’t know if it’s a sine wave or a square wave or a
Britney Spears tune.
In order to see the shape of the voltage waveform, we use an oscilloscope.
This is a device which can be seen in the background of every B science fiction
film ever made. All evil scientists and aliens seem to have oscilloscopes in
their labs measuring sine waves all the time.
An oscilloscope displays a graph showing the voltage being measured as
it changes in time. It shows this by sweeping a bright little dot across the
screen from left to right and moving it up when the voltage is positive and
down when it’s negative.
For the remainder of this discussion, keep referring back to the drawing
of an oscilloscope in Figure 6.2. The first thing to notice is that the display
screen is divided into squares – 10 across and 8 down. These squares are
called divisions and each is further divided into five sections using little tick
marks on the centre lines – the X- and Y-axes. As we’ll see, these divisions
are our only reference to the particular characteristics of the waveform.
You basically have control over two parameters on a ‘scope – the speed
of the dot as it goes across the screen and the vertical magnification of the
voltage. There are a couple of extra controls that we’ll look at one by one.
We’ll do this section by section on the panel in Figure 6.2.
Power
This section is simple – it contains the Power switch – the on/off button.
Screen
Typically, there is a section on the oscilloscope that lets you control some
basic aspects of the display like the Intensity of the beam – a measure of
its brightness, the focus and the screen Illumination. In addition, there is
also probably a small hole showing the top of a screwdriver potentiometer.
If this is labeled, it is called the trace rotation. This controls the angle of the horizontal trace on the screen and shouldn’t need to be adjusted very often
– once a month maybe.
Time / div.
The horizontal speed of the dot is controlled by the knob marked Time/div
which stands for “Time per Division.” For example, when this is set to 1
second, it means that the dot will move from left to right, taking 10 seconds
to move across the screen (because there are 10 divisions and the dot is
traveling at a rate of 1 division per second). As the knob is turned to
smaller and smaller divisions of a second, the dot moves faster and faster
until it is moving so fast that it appears that there is a line on the screen
rather than a single dot.
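Because the screen is 10 divisions wide, the Time/div setting fixes both the sweep time and how many cycles of a periodic signal fit on the screen. A tiny sketch (the names are illustrative):

```python
HORIZONTAL_DIVISIONS = 10  # divisions across the screen


def sweep_time_s(time_per_div_s):
    """Time for the dot to cross the whole screen."""
    return HORIZONTAL_DIVISIONS * time_per_div_s


def cycles_on_screen(freq_hz, time_per_div_s):
    """Number of periods of a periodic signal visible in one sweep."""
    return freq_hz * sweep_time_s(time_per_div_s)


if __name__ == "__main__":
    print(sweep_time_s(1.0))  # 10.0 seconds, as in the text
    print(round(cycles_on_screen(1000.0, 0.5e-3), 3))  # 5.0 cycles of a 1 kHz tone
```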
You’ll notice on the drawing that there are actually two knobs – one gray and one red. The gray one is the usual one. The red one is rarely used so it’s normally in the off position. If it’s turned on, it uncalibrates the larger knob, meaning that the time-per-division number can no longer be trusted. This is why, when the red knob is turned on, a red light comes on on the front panel next to the word “Uncalibrate,” warning the user that any time readings will be incorrect.
Channel 1
Most oscilloscopes that you’ll see have two input channels which can be
used to make completely independent measurements of two different signals
simultaneously. We’ll just look at one channel’s controls, since the second one has identical parameters.
To begin with, the input is a BNC connector, shown on the lower left
of the panel. This is normally connected to either of two things. The first
is a probe, which has an alligator clip for the ground connection (almost
always black) and a pointy part for the probe that you’ll use to make the
measurement. The second is somewhat simpler – it’s just two alligator clips,
one black one for the ground and one red one for the probe.
The large knob on this panel is marked “V/div” which stands for “Volts
per Division.” This is essentially a vertical magnification control, telling you
how many divisions the green dot on the screen will move vertically for a
given voltage. For example, when this knob is set to 100 mV per division
and the dot moves upwards by 2 divisions, then the incoming voltage is 200
mV. A downwards movement of the dot indicates a negative voltage.
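Reading a trace is then a matter of counting divisions and multiplying by the knob settings. This sketch is a hypothetical helper, not a feature of any particular 'scope. It converts a peak-to-peak height and a period, both counted in divisions, into volts and hertz. The RMS figure assumes the trace is a sine wave.

```python
import math


def read_trace(pp_divisions, volts_per_div, period_divisions, time_per_div_s):
    """Convert divisions counted on the screen into physical values."""
    v_pp = pp_divisions * volts_per_div
    period_s = period_divisions * time_per_div_s
    return {
        "peak_to_peak_V": v_pp,
        "peak_V": v_pp / 2.0,
        "rms_V_if_sine": v_pp / (2.0 * math.sqrt(2.0)),  # valid for sine waves only
        "frequency_Hz": 1.0 / period_s,
    }


if __name__ == "__main__":
    # A sine 4 divisions tall at 0.5 V/div, period 5 divisions at 0.2 ms/div.
    print(read_trace(4, 0.5, 5, 0.2e-3))
```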
This knob can also be uncalibrated using the smaller red knob within it
which is normally turned off. Again, there is a warning light to tell you not
to trust the display when this knob is turned on.
Moving to the lower right side of the panel, you’ll see a toggle switch
that can be used to select three different possibilities – AC, DC and Ground.
These don’t necessarily mean what your intuition would suggest, so we’ll go through them.
When the toggle is set to Ground, the dot displays the vertical location of
0 V. Since this vertical position is adjustable using the knob just above the
toggle switch, it can be set to any location on the screen. Normally, we
would set the toggle to Ground which results in a flat horizontal line on
the display, then we turn the Vertical position so that the line is perfectly
aligned with the centre line on the screen. This means that you have the
same positive and negative voltage ranges. If you’re looking at a DC voltage,
or an AC voltage with a DC component, then you may want to move the
Ground position to a more appropriate vertical location.
When the toggle is set to DC, the dot displays the actual instantaneous
voltage. This is true whether the signal is a DC signal or an AC signal. Note
that this does not mean that you’re looking at just the DC component of an
AC signal. If you have a 1 VRMS sine wave with a -1 V DC offset coming
into the oscilloscope and if it’s set to DC mode, then what you see is what
you get – both the AC and the DC components of the signal.
When the toggle is set to AC, the dot displays the AC component of the
waveform, omitting the DC offset component. So, if you try to measure the
voltage of a battery using an oscilloscope in AC mode, then you’ll see a value
of 0 V. However, if you’re measuring the ripple of a power supply rail without
seeing the DC component, this mode is very useful. For example, let’s say
that you’ve got an 18 V rail with 10 mV of ripple. If the oscilloscope is in
DC mode and you set the V/div knob to have a high magnification so that
you can accurately measure the 10 mV AC, then you can’t see the signal
because it’s somewhere up in the ceiling. If you decrease the magnification
so that you can see the signal, it looks like a perfectly flat line because 10
mV AC is so much smaller than 18 V DC. So, you set the mode to AC –
this removes the 18 V DC component and lets you look at only the 10 mV
AC ripple component at a high magnification.
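The effect of AC mode can be mimicked by removing the signal's average (DC) value. This sketch is an idealization: a real 'scope does it with a series capacitor (a high-pass filter with a very low cutoff), so slow changes are affected too. It assumes the 10 mV in the example is the ripple's peak amplitude and picks a 120 Hz ripple frequency for convenience. Neither detail comes from the text.

```python
import math


def ac_couple(samples):
    """Idealized AC coupling: subtract the mean (DC) value of the record."""
    dc = sum(samples) / len(samples)
    return [x - dc for x in samples]


N = 4800     # samples in the record (assumed: 0.1 s at 48 kHz)
CYCLES = 12  # whole ripple cycles in the record, i.e. 120 Hz over 0.1 s
rail = [18.0 + 0.010 * math.sin(2 * math.pi * CYCLES * n / N) for n in range(N)]

if __name__ == "__main__":
    ripple = ac_couple(rail)
    # DC mode at 5 V/div: 10 mV is only 0.002 div, invisible.
    # AC mode at 5 mV/div: the ripple swings about 2 div each way.
    print(round(max(rail), 3), "V peak in DC mode")
    print(round(max(ripple) * 1000, 2), "mV peak in AC mode")
```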
Trigger
Let’s say that you’ve got a 1 kHz sine wave that you’re measuring with
the ‘scope. The dot on the screen just travels from left to right at a speed
determined by the user. When it gets to the right side of the screen, it
just starts on the left side again. Chances are that this rotation doesn’t
correspond in any way to the period of the signal. Consequently, the sine
wave appears to move on the screen because every time the trace starts, it
starts at a new point in the wave. In order to prevent ourselves from getting
seasick or at least a headache, we have to tell the oscilloscope to start the
trace at the same point in the waveform every time. This is controlled by
the trigger. If we set the trigger to a level of 0 V, then the trace will wait
until the signal crosses the level of 0 V before moving across the screen.
This works great, but we need one more control – the slope. If we just set
the trigger level to 0 V and hoped for the best, then sometimes the sine
wave would start off on the left side at 0 V heading in the positive direction,
sometimes at 0 V in the negative direction. This would result in a shape
made of two sine waves, each a vertical mirror image of the other. Therefore,
we set the trigger to a particular slope – for example, 0 V heading positive.
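The level-and-slope rule can be expressed as a search for the first sample that crosses the trigger level in the chosen direction. This is a minimal sketch of the idea; the function name is hypothetical. Real 'scopes do this in hardware and add refinements such as hysteresis and holdoff.

```python
import math


def trigger_index(samples, level=0.0, rising=True):
    """Index of the first sample at which the signal crosses `level`
    in the requested direction, or None if it never does."""
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if rising and prev < level <= cur:
            return i
        if not rising and prev > level >= cur:
            return i
    return None


if __name__ == "__main__":
    # A 1 kHz sine sampled at 48 kHz, starting at an arbitrary phase.
    fs, f = 48000, 1000.0
    wave = [math.sin(2 * math.pi * f * n / fs + 2.0) for n in range(480)]
    start = trigger_index(wave, level=0.0, rising=True)
    # Every sweep that starts at this kind of point begins at 0 V heading
    # positive, so the displayed trace stays put instead of drifting.
    print(start, round(wave[start], 3))
```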
We can also select which signal is sent to the triggering circuit. This is
done using the toggle switch at the bottom of the panel. The user can select
either the signal coming into Channel 1, Channel 2 or a separate External
input connected to the BNC connector right on the trigger panel.
View
Finally, there is a panel that contains the controls to determine what you’re
actually looking at on the screen. There is a knob that permits you to select
between looking at either Channel 1, Channel 2 or both simultaneously.
There are two options for seeing both signals – Chop and Alternate. There
is a slight difference between these two.
In actual fact, there is only one dot tracing across the screen on the
‘scope, so it cannot display both signals simultaneously. It can, however,
do a pretty good job of fooling your eyes so that it looks like you’re seeing
both signals at once. When the ‘scope is in Alternate mode, the dot traces
across the screen showing Channel 1, then it goes back and traces Channel
2, alternating back and forth. If the horizontal speed of the trace is fast
enough, then two lines appear on the screen. This works really well for fast
traces used for high frequency signals, but if the trace is moving very slowly,
then it takes too long to wait for the other channel to re-appear.
Consequently, when you’re using slower trace speeds for lower frequen-
cies, you put the scope in Chop mode. In this case, the dot moves across
the screen, hopping back and forth between the Channel 1 and 2 inputs, essentially showing half of each. This doesn’t work well for high trace speeds because the display gets too dim, since you’re only showing half of the trace at any given time.
In addition, the View panel also has a couple of other controls. One is
a knob which allows you to change the Horizontal position of the traces. In
fact, if you turn this knob enough, you should be able to move the trace
right off the screen.
The other control is a magnification switch which is rarely used. This
is usually a x10 Magnification, which multiplies the speed of the trace by a
factor of 10. Typically, this magnified trace will be shown in addition to the
regular trace.
this mode is selected, the vertical movement of the dot is determined by the
Channel 1 input in a normal fashion. The difference is that the horizontal
movement of the dot is determined by the Channel 2 input, positive voltage
moving the dot to the right, and negative to the left. This permits you
to measure the phase relationship of two signals using what is known as a
Lissajous pattern. This is discussed elsewhere in this book.
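The text defers the details, but one common way to read phase from a Lissajous pattern (an assumption here) works like this. Put x = sin(ωt) on Channel 2 and y = sin(ωt + φ) on Channel 1. Then y = sin φ at the instant x crosses zero. So sin φ is the height where the figure crosses the vertical centre line divided by its maximum height. This sketch estimates φ from sampled signals. Like the visual method, it cannot tell φ from 180° - φ, or the sign of φ, without extra information.

```python
import math


def lissajous_phase_deg(x, y):
    """Estimate the phase between y (Channel 1, vertical) and x (Channel 2,
    horizontal) from where the Lissajous figure crosses the vertical centre
    line. Returns a value from 0 to 90 degrees."""
    y_max = max(abs(v) for v in y)
    for i in range(1, len(x)):
        if x[i - 1] < 0.0 <= x[i]:
            # Linearly interpolate y to the instant x crosses zero.
            frac = -x[i - 1] / (x[i] - x[i - 1])
            y0 = y[i - 1] + frac * (y[i] - y[i - 1])
            return math.degrees(math.asin(min(1.0, abs(y0) / y_max)))
    return None


if __name__ == "__main__":
    N = 1000  # samples per cycle
    phi = math.radians(30.0)
    x = [math.sin(2 * math.pi * n / N - 1.0) for n in range(N)]        # Channel 2
    y = [math.sin(2 * math.pi * n / N - 1.0 + phi) for n in range(N)]  # Channel 1
    print(round(lissajous_phase_deg(x, y), 1))  # approximately 30.0
```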
The last thing to talk about with oscilloscopes is a minor issue related to
grounding. Although the signal input to the ‘scope has an infinite impedance
to ground, its ground input is directly connected to the third pin in the AC
connector for the device. This means that, if you’re using it to measure
typical pro audio gear, the ground on the ‘scope is likely already connected
to the ground on the gear through your AC power connections (unless your
gear has a floating ground). Consequently, you can’t just put the ground
connection anywhere in your circuit as you can with a free-floating DMM.
Function
On the far right of the panel is a section marked “Function” which contains
the controls for various components of the waveform. In the top middle are
three buttons for selecting between a square, triangle or sine wave. To the
left of this is a button which allows the user to invert the polarity of the
signal. On the far right is a button which provides a -20 dB attenuation.
This button is used in conjunction with the Amplitude control knob just
below it. On the bottom left is the knob which controls the DC Offset.
Typically, this is turned off by pushing the knob in and turned on by pulling it away from the panel.
The final control here is for the Duty Cycle of the waveform. This is a control which determines the relative time spent in the positive and negative
portions of the waveform. For example, a square wave has a duty cycle of
50% because it is high for 50% of the cycle and low for the remaining 50%.
If this relationship is changed, the wave approaches a pulse wave. Similarly,
a triangle wave has a 50% duty cycle. If this is reduced to 0% then you have
a sawtooth wave.
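This sketch generates one period of each case. The function names are hypothetical; a real generator does this in its own hardware. For the triangle, "duty cycle" is taken here to mean the fraction of the period spent rising, which is one reasonable reading of the description above.

```python
def pulse_wave(n_samples, duty):
    """One period of a +/-1 rectangular wave that is high for `duty` of it."""
    high = round(n_samples * duty)
    return [1.0 if n < high else -1.0 for n in range(n_samples)]


def skewed_triangle(n_samples, duty):
    """One period of a triangle that rises for `duty` of the period and
    falls for the rest. duty=0.5 is a symmetrical triangle; duty=0 gives a
    falling sawtooth."""
    out = []
    for n in range(n_samples):
        t = n / n_samples
        if t < duty:
            out.append(-1.0 + 2.0 * t / duty)
        else:
            out.append(1.0 - 2.0 * (t - duty) / (1.0 - duty))
    return out


def fraction_high(wave):
    """Measured duty cycle: the fraction of samples above zero."""
    return sum(1 for v in wave if v > 0) / len(wave)


if __name__ == "__main__":
    print(fraction_high(pulse_wave(100, 0.5)))  # 0.5: a square wave
    print(fraction_high(pulse_wave(100, 0.1)))  # 0.1: approaching a pulse
    print(skewed_triangle(8, 0.0))              # a falling sawtooth
```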
Output
The average function generator will have two outputs. One, which is usually
marked “Main” is the normal output of the device. The second is a pulse
wave with the same frequency as the main output. This can be used for
Input
Finally, some function generators have a VCF input which permits the user
to control the frequency of the generator with an external voltage source.
This may be useful in doing swept sine tone measurements, where the frequency of the sine wave is swept from a very low to a very high frequency to see the response of the DUT across its entire range.
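The text doesn't say whether the external control voltage produces a linear or a logarithmic sweep; that depends on the generator. As one illustration, this sketch synthesizes a logarithmic (exponential) sweep digitally. It integrates the instantaneous frequency f(t) = f1 (f2/f1)^(t/T) to get a continuous phase, so the tone glides smoothly rather than jumping.

```python
import math


def sweep_frequency_hz(t, f_start, f_end, duration_s):
    """Instantaneous frequency of an exponential sweep at time t."""
    return f_start * (f_end / f_start) ** (t / duration_s)


def log_sweep(f_start, f_end, duration_s, sample_rate):
    """Samples of a sine whose frequency rises exponentially from f_start
    to f_end over duration_s. The phase is the time integral of
    2*pi*sweep_frequency_hz, which keeps the waveform continuous."""
    k = math.log(f_end / f_start)
    n_samples = int(duration_s * sample_rate)
    out = []
    for i in range(n_samples):
        t = i / sample_rate
        phase = 2 * math.pi * f_start * duration_s / k * (math.exp(k * t / duration_s) - 1.0)
        out.append(math.sin(phase))
    return out


if __name__ == "__main__":
    x = log_sweep(20.0, 20000.0, 2.0, 48000)
    print(len(x))  # 96000
    print(round(sweep_frequency_hz(1.0, 20.0, 20000.0, 2.0), 1))  # 632.5 (halfway)
```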
Figure 6.5: The cross correlation function of the MLS sequence shown in Figure 6.4 with itself (plotted as correlation coefficient against time offset in samples).
other than 0 would become very small. For example, Figure 6.6 shows the
cross-correlation function of an 8191-point MLS.
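The property shown in these figures can be reproduced with a short sketch. An MLS of length 2^m - 1 can be generated with a linear feedback shift register whose feedback polynomial is primitive. This sketch uses x^4 + x + 1 (15 points) and x^10 + x^3 + 1 (1023 points). These are standard choices, but they are assumptions of this sketch; the text does not say how its sequences were made. The sketch also uses circular (periodic) correlation. Under that correlation an MLS has a value of exactly 1 at zero offset and -1/N everywhere else. MATLAB's xcorr computes a linear correlation, so its off-peak values differ somewhat.

```python
def mls(order, taps):
    """Maximum length sequence of 2**order - 1 values, mapped to +/-1.

    Uses the recurrence a[n + order] = XOR of a[n + k] for k in taps, which
    is maximal-length when the matching polynomial is primitive
    (e.g. order 4, taps (1, 0) for x^4 + x + 1)."""
    length = 2 ** order - 1
    bits = [1] + [0] * (order - 1)  # any non-zero starting state works
    while len(bits) < length:
        n = len(bits) - order
        bits.append(sum(bits[n + k] for k in taps) % 2)
    return [1.0 if b else -1.0 for b in bits]


def circular_xcorr(a, b, lag):
    """Normalized circular cross-correlation of equal-length sequences."""
    n = len(a)
    return sum(a[i] * b[(i + lag) % n] for i in range(n)) / n


if __name__ == "__main__":
    seq = mls(4, (1, 0))
    print([round(circular_xcorr(seq, seq, lag), 3) for lag in range(len(seq))])
    # 1.0 at lag 0, then -0.067 (= -1/15) at every other lag
```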
Figure 6.6: The cross correlation function of an 8191-point MLS sequence with itself (correlation coefficient against time offset in samples; the offset axis spans -1 × 10^4 to 1 × 10^4 samples).
a room. You’d have to make an impulse, record it and look at the recording.
So, you snap your fingers and magically, you can see direct sound, reflections
and reverberation in your recording. Unfortunately, this won’t work. You
see, in the real world, there’s noise, and if you snap your fingers, you won’t be
able to distinguish the finger snap from the noise because your impulse isn’t
strong enough – your signal-to-noise ratio is far too small. “No problem!” I
hear you cry. “We make a bigger impulse. I’ll get my gun.”
In fact, this would work pretty well. For many years, acousticians carried
around pistols and buckets of sand to fire bullets into in order to make im-
pulse response measurements. There are some drawbacks to this procedure,
however. You have to own a gun, you need a microphone that can handle a
VERY big signal, you need to buy ear protection... All sorts of reasons that
work against this option.
There is a better way. Play an MLS out of a loudspeaker. (It will sound
a bit like white noise – in fact the signal is sometimes called a pseudo-random
signal because it sounds random but it isn’t.) Record the signal in the room
with a microphone. Then go home.
When you get home, do a cross-correlation function on the MLS signal
that you sent to the loudspeaker with the signal that came into the micro-
phone. When you do, the resulting output will be an impulse response of
the room with a very good signal-to-noise ratio.
Let’s look at a brief example. I made a 1023-point MLS. Then I multiplied it by 0.5, delayed it by 500 samples and added the result back to the original. So, we have two overlapping MLS’s with a time offset of 500 samples, and the second one is lowered in level by 6 dB. I then did a cross correlation function of the two signals using the xcorr function in MATLAB. The result is shown in Figure 6.7. As you can see here, the result is a noisy representation of the impulse response of the resulting signal. If I wanted to improve my signal-to-noise ratio, all I would have to do is increase the length of the MLS.
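The experiment just described can be approximated in Python. The sketch is self-contained, so it regenerates a 1023-point MLS with an assumed shift-register recurrence (polynomial x^10 + x^3 + 1). It uses circular correlation instead of MATLAB's xcorr and a circular 500-sample delay; both are simplifying assumptions. The correlation recovers a peak of about 1 at lag 0 (the "direct sound") and about 0.5 at lag 500 (the 6 dB-down copy).

```python
def mls(order, taps):
    """+/-1 maximum length sequence via a[n + order] = XOR of a[n + k]."""
    length = 2 ** order - 1
    bits = [1] + [0] * (order - 1)
    while len(bits) < length:
        n = len(bits) - order
        bits.append(sum(bits[n + k] for k in taps) % 2)
    return [1.0 if b else -1.0 for b in bits]


def circular_xcorr(a, b, lag):
    """Normalized circular cross-correlation at one lag."""
    n = len(a)
    return sum(a[i] * b[(i + lag) % n] for i in range(n)) / n


def impulse_response(excitation, recording):
    """Estimate of the system's impulse response: correlate the recording
    against the MLS that was played, at every lag."""
    return [circular_xcorr(excitation, recording, lag)
            for lag in range(len(excitation))]


if __name__ == "__main__":
    s = mls(10, (3, 0))  # 1023 points
    N, delay = len(s), 500
    # "Room": the direct signal plus a copy at half amplitude, 500 samples later.
    recording = [s[i] + 0.5 * s[(i - delay) % N] for i in range(N)]
    h = impulse_response(s, recording)
    print(round(h[0], 3), round(h[delay], 3))  # 1.0 0.499
```

Every other lag comes out at -1.5/1023. With a longer MLS these residual values shrink, consistent with the signal-to-noise improvement described above.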
All of this was a lot of work that you don’t have to do because you can
buy measurement systems that do all the math for you. The most popular
one is the MLSSA system (pronounced “Melissa”) from DRA Laboratories
(www.mlssa.com), but other companies make them. For example, the Au-
dioPrecision system has it as an option on their devices (www.ap.com). Of
course, if you have MATLAB and the option to playback and record digital
audio from your computer, you could just do it yourself.
Figure 6.7: The cross correlation function of a 1023-point MLS sequence with a modified version of itself (see text for details), plotted as correlation coefficient against time in samples.
$$Z_{OUT} = 600 \left( \frac{V_1}{V_2} - 1 \right) \tag{6.1}$$
where $Z_{OUT}$ is the output impedance of the DUT (in Ω),
$V_1$ is the level of the unloaded output signal of the DUT (in volts), and
$V_2$ is the level of the output signal of the DUT loaded with a 600 Ω resistor (in volts).
$$Z_{IN} = \frac{600}{\dfrac{V_1}{V_2} - 1} \tag{6.2}$$
where $Z_{IN}$ is the input impedance of the DUT (in Ω),
$V_1$ is the level of the unloaded output signal of the function generator (with a 600 Ω output impedance) (in volts), and
$V_2$ is the level of the output signal of the same function generator loaded with the input of the DUT (in volts).
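Equations 6.1 and 6.2 translate directly into code. This sketch follows the same 600 Ω convention. The second function assumes the generator's output impedance really is 600 Ω. The voltage readings in the example are made up for illustration.

```python
LOAD_OHMS = 600.0  # reference resistor / generator output impedance


def output_impedance(v_unloaded, v_loaded, load_ohms=LOAD_OHMS):
    """Equation 6.1: Z_OUT = 600 * (V1 / V2 - 1)."""
    return load_ohms * (v_unloaded / v_loaded - 1.0)


def input_impedance(v_unloaded, v_loaded, source_ohms=LOAD_OHMS):
    """Equation 6.2: Z_IN = 600 / (V1 / V2 - 1)."""
    return source_ohms / (v_unloaded / v_loaded - 1.0)


if __name__ == "__main__":
    # Hypothetical: the DUT output drops from 1.00 V to 0.80 V under a 600 ohm load.
    print(round(output_impedance(1.00, 0.80), 1))  # 150.0
    # Hypothetical: the generator drops from 1.00 V to 0.80 V into the DUT input.
    print(round(input_impedance(1.00, 0.80), 1))   # 2400.0
```

A quick sanity check: if loading halves the voltage, both formulas give 600 Ω, because the divider is then split equally.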
6.2.3 Gain
The gain of a DUT is measured by comparing its input to its output. This
gain can be expressed either as a multiplier or in decibels.
For example, if the input of a DUT is a 1 kHz sine tone with an amplitude
of 100 mV RMS and its output is a 1 kHz sine tone with an amplitude
of 1 V RMS, then the gain is
$$\frac{1\ \mathrm{V_{RMS}}}{100\ \mathrm{mV_{RMS}}} \qquad (6.3)$$
$$= \frac{1}{0.1} \qquad (6.4)$$
$$\mathrm{Gain} = 10 \qquad (6.5)$$
This could also be expressed in decibels as follows:
$$20 \log_{10} \left( \frac{1\ \mathrm{V_{RMS}}}{100\ \mathrm{mV_{RMS}}} \right) \qquad (6.6)$$
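The same arithmetic as Equations (6.3) to (6.6), as a short Python sketch (the function names are illustrative only):

```python
import math


def gain_ratio(v_in_rms, v_out_rms):
    """Gain as a multiplier."""
    return v_out_rms / v_in_rms


def gain_db(v_in_rms, v_out_rms):
    """Gain in decibels, from RMS voltages."""
    return 20 * math.log10(v_out_rms / v_in_rms)


# the example from the text: 100 mV RMS in, 1 V RMS out
print(gain_ratio(0.100, 1.0))   # a multiplier of 10 (to floating-point precision)
print(gain_db(0.100, 1.0))      # 20 dB
```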
input amplitude on the x-axis and the deviation from the nominal gain on
the y-axis.
This measurement is useful for showing the distortion of low-level signals
caused by digital converters in a DUT, for example.
6.2.6 Bandwidth
All DUTs will exhibit a roll-off in their frequency response at low and high
frequencies. The bandwidth of the DUT is the difference in Hz between the
low and high frequencies at which the gain first falls to -3 dB relative to the
nominal gain measured at a mid frequency (usually 1 kHz).
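Given a measured frequency response, the -3 dB points can be found by walking outwards from 1 kHz until the gain drops 3 dB below its value there. The sketch below is one way to do it, assuming the response is sampled on a fine frequency grid; it interpolates linearly on a log-frequency axis. The "DUT" in the usage lines is a made-up model (a first-order high-pass at 20 Hz cascaded with a first-order low-pass at 20 kHz), not a real measurement.

```python
import numpy as np


def bandwidth_3db(freqs_hz, gain_db, ref_hz=1000.0):
    """Find where the gain first falls 3 dB below its value at ref_hz,
    searching down and up from ref_hz. Returns (f_low, f_high, bandwidth)."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    gain_db = np.asarray(gain_db, dtype=float)
    i_ref = int(np.argmin(np.abs(freqs_hz - ref_hz)))
    target = gain_db[i_ref] - 3.0

    def crossing(i_inside, i_outside):
        # linear interpolation on a log-frequency axis between two points
        g0, g1 = gain_db[i_inside], gain_db[i_outside]
        x0, x1 = np.log10(freqs_hz[i_inside]), np.log10(freqs_hz[i_outside])
        return 10 ** (x0 + (g0 - target) / (g0 - g1) * (x1 - x0))

    f_low = next((crossing(i, i - 1) for i in range(i_ref, 0, -1)
                  if gain_db[i - 1] <= target), None)
    f_high = next((crossing(i, i + 1) for i in range(i_ref, len(freqs_hz) - 1)
                   if gain_db[i + 1] <= target), None)
    if f_low is None or f_high is None:
        raise ValueError("gain never drops 3 dB inside the measured range")
    return f_low, f_high, f_high - f_low


# hypothetical DUT model: 20 Hz high-pass and 20 kHz low-pass, both first order
f = np.logspace(0, 5, 2000)                        # 1 Hz to 100 kHz
h = (1j * f / 20) / (1 + 1j * f / 20) / (1 + 1j * f / 20000)
lo, hi, bw = bandwidth_3db(f, 20 * np.log10(np.abs(h)))
print(f"-3 dB points: {lo:.1f} Hz and {hi:.0f} Hz, bandwidth {bw:.0f} Hz")
```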
accurately (if the input noise levels aren’t too low to measure accurately,
then you need a new mic pre...). Luckily, however, we have lots of gain
available to us in the preamp itself which we can use to boost the noise in
order to measure it. The general idea is that we set the preamp to some
known amount of gain, measure the output noise, and then subtract the
gain from our noise measurement to calculate what the noise level at the
input must be. This is the reason for the “equivalent” in the name: the
input noise isn’t actually measured, it’s calculated.
There are a couple of things to worry about when making an EIN measurement.
The first thing is the gain of the microphone preamplifier. Using
a function generator with a very small output level, set the gain of the
preamp to +60 dB. This is done by simultaneously measuring the input
and the output on an oscilloscope and changing the gain until the output is
1000 times greater in amplitude than the input. Then you disconnect the
function generator and the oscilloscope (but don’t change the gain of the
preamp!).
Since it’s a noise measurement, we also have to consider the impedance connected to the input.
In this case, it’s common practice to terminate the input of the microphone
preamp with a 150 Ω resistor to match the output impedance of a typical
microphone. The easiest way to do this is to solder a 150 Ω resistor between
pins 2 and 3 of a spare male XLR connector you have lying around the
house. Then you can just plug that directly into the mic preamp input with
a minimum of hassle.
With the 150 Ω termination connected, and the gain set to +60 dB,
measure the level of the noise at the output of the mic preamp using a
true RMS meter with a band-limited input. Take the measurement of the
output noise (expressed in dBu) and subtract 60 dB (for the gain of the
preamp). That’s the EIN, usually somewhere around -120 dBu.
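The calculation at the end of the procedure is just a subtraction once the output noise is expressed in dBu. A minimal Python sketch, assuming the meter gives an RMS voltage and using the standard 0 dBu reference of about 0.775 V RMS (1 mW into 600 Ω); the reading in the example is hypothetical.

```python
import math

DBU_REF_V = math.sqrt(0.6)   # 0 dBu: about 0.7746 V RMS (1 mW into 600 ohms)


def volts_to_dbu(v_rms):
    """Convert an RMS voltage to dBu."""
    return 20 * math.log10(v_rms / DBU_REF_V)


def equivalent_input_noise_dbu(output_noise_v_rms, preamp_gain_db=60.0):
    """EIN: band-limited output noise in dBu minus the preamp gain in dB."""
    return volts_to_dbu(output_noise_v_rms) - preamp_gain_db


# hypothetical reading: 0.775 mV RMS of noise at the output, gain set to +60 dB
print(round(equivalent_input_noise_dbu(0.775e-3), 1))   # about -120 dBu
```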
works well for audio signals because we usually feed a balanced input with a
balanced signal, which means that we’re sending the audio signal on two wires,
with the voltage on each of those wires being the negative of the other, so
when one wire is at +1 V, the other is at -1 V. The +1 V is sent to the
non-inverting input and the -1 V is sent to the inverting input, and the
differential input does the math (+1) - (-1) = 2, giving us the signal with a
gain of 6.02 dB.
The neat thing is that, since any noise picked up on the balanced trans-
mission cable before the balanced input will theoretically be the same on
both wires, the differential input subtracts the noise from itself and therefore
eliminates it. For more info on this (and the correct definition of “balanced”),
see the chapter on grounding and shielding in the electronics section.
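A quick numerical check of the idea, sketched in Python. The input is assumed to be ideal and perfectly matched, and the 60 Hz "hum" is a hypothetical example of noise picked up equally on both wires.

```python
import numpy as np

fs = 48000
t = np.arange(480) / fs
signal = np.sin(2 * np.pi * 1000 * t)        # the audio signal, 1 V peak
hum = 0.3 * np.sin(2 * np.pi * 60 * t)       # hypothetical pickup, identical on both legs

hot = signal + hum       # pin 2, to the non-inverting input
cold = -signal + hum     # pin 3, to the inverting input
out = hot - cold         # an ideal differential input

print(np.allclose(out, 2 * signal))   # the hum is gone and the signal is doubled
print(20 * np.log10(2))               # that doubling is a gain of about 6.02 dB
```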
Since the noise on the cable is common to both inputs (the inverting
and the non-inverting), we call it a common-mode signal, and since it
cancels itself, we say that it has been rejected. The ability of a differential
input to eliminate a signal sent to both input legs is therefore quantified
by the common mode rejection ratio, or CMRR. It’s expressed in dB, and
here’s how you measure it.
Take a function generator emitting a sine wave. Send the output of the
generator to both input legs of the balanced input of the DUT (pins 2 and
3 on an XLR). Measure the input level and the output level of the DUT.
Hopefully, the level of the output will be considerably lower (at least 60 dB
lower) than the level of the input. The difference between them, in dB, is
the CMRR.
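The arithmetic for this measurement, as a Python sketch. The differential_gain_db parameter is an added assumption, not part of the procedure above: the procedure matches a DUT with unity (0 dB) gain for balanced signals, and for a DUT with gain, that gain is added to the input-to-output difference.

```python
import math


def cmrr_db(common_mode_in_v, output_v, differential_gain_db=0.0):
    """CMRR from a common-mode test (same signal on pins 2 and 3).

    With differential_gain_db = 0 this is simply the input level minus the
    output level, in dB, as described in the text."""
    common_mode_gain_db = 20 * math.log10(output_v / common_mode_in_v)
    return differential_gain_db - common_mode_gain_db


print(cmrr_db(1.0, 0.001))                             # 1 V in, 1 mV out
print(cmrr_db(1.0, 0.01, differential_gain_db=20.0))   # a DUT with 20 dB of gain
```

Both hypothetical readings correspond to a CMRR of 60 dB.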
This method is the common way to find the CMRR but there is a slight
problem with it. Since a balanced input is designed to work best when
connected to a balanced output (meaning that the impedances of the two legs
to ground are matched – not necessarily that the voltages are symmetrical)
the CMRR of our DUT’s input may change if we connect it to a source with
unmatched impedances. Two people I have talked to regarding this subject have
both commented on the need for a method of testing the CMRR of a device
under unmatched impedances. This may become a recommended
standard...
are quarks irreducible?) of the audio world. What this means is that a sine
wave has no energy in any of its harmonics above the fundamental. It is the
fundamental and that’s that.
If you then take your perfect sine wave, send it to the input of a
DUT and look at the output, you’ll find that the shape of your wave has
been modified slightly, just enough to put some energy in the harmonics
above the fundamental. Therefore, if we measure the total amount
of energy in the harmonics relative to the level of the fundamental, we have a
quantifiable method of determining how much the signal has been distorted.
The greater the amplitude of the harmonics, the greater the distortion.
This measurement is not a common one, but its counterpart, explained
below, is seen everywhere. It’s made by sending a perfect sine wave into
the input of the DUT and looking at an FFT of the output (which shows
the amplitudes of individual frequency “bins”). You add the squares of
the amplitudes of the harmonics other than the fundamental, take the
square root of this sum, and divide it by the amplitude of the fundamental at
the output. If you multiply this value by 100, you get the THD in percent.
If you take the log of the number and multiply by 20, you’ll get it in dB.
A couple of things about THD measurements – usually the person doing
the measurement will only measure the amplitudes of the harmonics up to
a given harmonic, so you’ll see specifications like “THD through to the 7th
harmonic”, meaning that there may be more energy above that, but it’s probably
too low, or too close to the noise floor, to measure.
Secondly, unless you have a smart computer that is able to do this
measurement, you’ll probably find that it’s relatively time-consuming and
not really worth the effort. This is especially true when you compare the
procedure to getting a THD+N measurement, as explained below. That’s
why you rarely see it.
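The THD recipe above translates directly into a few lines of Python. This is a minimal sketch, assuming the test tone completes a whole number of cycles in the FFT block so that the fundamental and each harmonic land exactly on a bin (a rectangular window is then adequate). The "DUT output" is synthetic, with a hypothetical 1% second harmonic and 0.5% third harmonic, and harmonics are counted through the 7th.

```python
import numpy as np


def thd(output, fundamental_bin, highest_harmonic=7):
    """THD ratio: sqrt(sum of squared harmonic amplitudes) / fundamental amplitude,
    using harmonics 2 .. highest_harmonic from an FFT of the DUT output."""
    spectrum = np.abs(np.fft.rfft(output))
    fundamental = spectrum[fundamental_bin]
    harmonic_bins = [k * fundamental_bin for k in range(2, highest_harmonic + 1)
                     if k * fundamental_bin < len(spectrum)]
    return np.sqrt(np.sum(spectrum[harmonic_bins] ** 2)) / fundamental


# synthetic DUT output: exactly 16 cycles of the fundamental in a 4096-point block
N = 4096
n = np.arange(N)
x = (np.sin(2 * np.pi * 16 * n / N)
     + 0.01 * np.sin(2 * np.pi * 32 * n / N)     # 2nd harmonic, 1%
     + 0.005 * np.sin(2 * np.pi * 48 * n / N))   # 3rd harmonic, 0.5%
ratio = thd(x, fundamental_bin=16)
print(100 * ratio, 20 * np.log10(ratio))         # about 1.12 % and about -39 dB
```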
Since we’re looking for all of the energy other than the original, we’ll
send a perfect sine wave into the input of the DUT. We then pass the output
through a notch filter tuned to the frequency of the input sine, which
theoretically eliminates the original input signal, letting everything else (the
distortion products and the noise produced by the DUT itself) through.
This filtered output is measured using a true RMS meter (we usually band-
limit the signal as well, because it’s a noise measurement; see above). This
measurement is divided by the RMS level of the fundamental, which gives a
ratio that can be converted to either a percentage or a dB equivalent, as
explained above.
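A THD+N measurement can be imitated in Python by standing in an idealized "notch" for the analog notch filter: the FFT bin of the fundamental is set to zero and the RMS of whatever is left is compared to the RMS of the fundamental. This is a sketch under stated assumptions (the tone lands exactly on a bin, the notch is perfect, and the band-limiting step is omitted); the small tone at bin 1000 is a hypothetical stand-in for noise.

```python
import numpy as np


def thd_plus_n(output, fundamental_bin, notch_halfwidth=0):
    """Idealized THD+N: RMS of everything except the fundamental, divided by
    the RMS of the fundamental."""
    n_samples = len(output)
    spectrum = np.fft.rfft(output)
    fundamental_rms = np.sqrt(2) * np.abs(spectrum[fundamental_bin]) / n_samples
    notched = spectrum.copy()
    lo = max(fundamental_bin - notch_halfwidth, 0)
    notched[lo:fundamental_bin + notch_halfwidth + 1] = 0   # the "notch filter"
    residual = np.fft.irfft(notched, n_samples)             # distortion + noise
    return np.sqrt(np.mean(residual ** 2)) / fundamental_rms


N = 4096
n = np.arange(N)
y = (np.sin(2 * np.pi * 16 * n / N)                # fundamental
     + 0.01 * np.sin(2 * np.pi * 32 * n / N)       # 2nd harmonic
     + 0.003 * np.sin(2 * np.pi * 1000 * n / N))   # stand-in for noise
print(100 * thd_plus_n(y, 16))                     # about 1.04 %
```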
6.2.18 Crosstalk
6.2.19 Intermodulation Distortion (IMD)
NOT YET WRITTEN
6.2.20 DC Offset
NOT YET WRITTEN
6.2.21 Filter Q
See section 5.1.2.
6.2.23 Cross-correlation
NOT YET WRITTEN
The first of these is the microphone that you’re using to make the mea-
surement. The output of the loudspeaker is converted to a measurable elec-
trical signal by a microphone which cannot be a perfect transducer. So,
for example, when you make a frequency response measurement, you are
not only seeing the frequency response of the loudspeaker, but the response
of the microphone as well. The answer to this problem is to spend a lot
of money on a microphone that includes extensive documentation of its
particular characteristics. In many cases today, this documentation is in-
cluded in the form of measurement data that can be incorporated into your
measurement system and subtracted from your measurements to make the
microphone effectively “invisible.”
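Subtracting the microphone’s documented response can be sketched as follows, assuming the calibration data is a list of frequencies with a level in dB at each one. The calibration values and the "measured" response are hypothetical, and interpolating on a log-frequency axis is a design choice of this sketch, not a property of any particular measurement system.

```python
import numpy as np


def remove_mic_response(freqs_hz, measured_db, cal_freqs_hz, cal_db):
    """Subtract a microphone calibration curve (in dB) from a measured
    frequency response, interpolating the calibration on log frequency."""
    mic_db = np.interp(np.log10(freqs_hz), np.log10(cal_freqs_hz), cal_db)
    return np.asarray(measured_db) - mic_db


# hypothetical calibration data for the measurement microphone
cal_f = np.array([20.0, 1000.0, 10000.0, 20000.0])
cal_db = np.array([-1.0, 0.0, 0.5, -2.0])

# hypothetical measurement: loudspeaker response plus microphone response
f = np.array([20.0, 1000.0, 3162.28, 20000.0])
measured = np.array([-4.0, 0.0, 0.25, -3.0])
corrected = remove_mic_response(f, measured, cal_f, cal_db)
print(corrected)   # the loudspeaker alone
```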
The second problem is that of the room that you’re using for the mea-
surements. Always remember that a loudspeaker is, more often than not,
in a room. That room has two effects. Firstly, different rooms make dif-
ferent loudspeakers behave differently. (Take the extreme case of a large
loudspeaker in a small room such as a closet or a car. In this case, the
“room” is more like an extra loudspeaker enclosure than a room. This will
cause the driver units to behave differently than in a large room because
the loudspeaker “sees” the load imposed on the driver by the space itself.)
Secondly, the room itself imposes resonance, reflections and reverberation
that will be recorded at the microphone position. These effects may be
indistinguishable from the behaviour of the loudspeaker itself (for example,
a ringing room mode compared to a ringing loudspeaker driver, or a ringing
EQ stage in front of the power amplifier).
Figure 6.8: A simple block diagram showing the basic setup used for typical loudspeaker measure-
ments. The measurement device may be anything from a signal generator and oscilloscope to a
sophisticated self-contained automated computer system.
Digital Audio
Figure 7.1: The analog waveform that we’re going to convert to digital representation.
7. Digital Audio 442
appears that you are watching a moving picture. In fact, you are watching
24 still pictures every second – but your eyes are too slow in responding to
the multiple photos and therefore you get fooled into thinking that smooth
motion is happening. In technical jargon, we are changing an event that
happens in continuous time into one that is chopped into slices of discrete
time.
Unlike a film, where we just take successive photographs of the event
to be replayed in succession later, audio uses a slightly different procedure.
Here, we use a device to sample the voltage of the signal at regular intervals
in time as is shown below in Figure 7.2.
Figure 7.2: The audio waveform being sliced into moments in time. A sample of the signal is taken
at each vertical dotted line.
Each sample is then temporarily stored and all the information regarding
what happened to the signal between samples is thrown away. The system
that performs this task is what is known as a sample and hold circuit because
it samples the original waveform at a given moment, and holds that level
until the next time the signal is sampled as can be seen in Figure 7.3.
Our eventual goal is to represent the original signal with a string of num-
bers representing measurements of each sample. Consequently, the next step
in the process is to actually do the measurement of each sample. Unfortu-
nately, the “ruler” that’s used to make this measurement isn’t infinitely
precise – it’s like a measuring tape marked in millimeters. Although you
can make a pretty good measurement with that tape, you can’t make an
accurate measurement of something that’s 4.23839 mm long. The same
problem exists with our measuring system. As can be seen in Figure 7.4, it
is rare for the level of a sample from the sample and hold circuit to land
exactly on one of the levels in the measuring system.
If we go back to the example of the ruler marked in millimeters being
Figure 7.3: The output of the sample and hold circuit. Notice that, although we still have the basic
shape of the original waveform, the smooth details have been lost.
Figure 7.4: The output of the sample and hold circuit shown against the allowable levels plotted
as horizontal dotted lines.
Figure 7.5: The output of the quantizing circuit. Notice that almost all the samples have been
rounded off to the nearest dotted line.
At this point, we finally have our digital signal. Looking back at Figure
7.5 as an example, we can see that the values are
0 2 3 4 4 4 4 3 2 -1 -2 -4 -4 -4 -4 -4 -2 -1 1 2 3
These values are then stored in (or transmitted by) the system as a
digital representation of the original analog signal.
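The two steps described so far (sample at regular intervals, then round each sample to the nearest level of the "ruler") can be sketched in Python. The input waveform, sampling rate and step size below are hypothetical, chosen only to give a short list of numbers like the one above.

```python
import numpy as np


def sample_and_quantize(analog, fs, duration, step=1.0):
    """Sample a continuous-time function at fs, then round each held value
    to the nearest multiple of `step` (the marks on the ruler)."""
    n = np.arange(int(round(fs * duration)))
    held = analog(n / fs)                              # sample and hold
    codes = np.floor(held / step + 0.5).astype(int)    # round to nearest level
    return held, codes


# hypothetical input: a 1 Hz sine with a peak of 4.5 levels, 20 samples per second
held, codes = sample_and_quantize(lambda t: 4.5 * np.sin(2 * np.pi * t), fs=20, duration=1)
print(codes)   # the string of numbers that represents the signal
```

Each number in codes differs from the held sample it came from by at most half a step, which is the measurement error discussed in the text.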
Figure 7.6: The results of the reconstruction filter showing the original staircase waveform from
which it was derived as a dotted line.
7.1.3 Aliasing
I remember when I was a kid, I’d watch the television show M*A*S*H every
week, and every week, during the opening credits, they’d show a shot of the
jeep accelerating out of the camp. Oddly, as the jeep got going faster
and faster forwards, the wheels would appear to speed up, then stop, then
start going backwards... What didn’t make sense was that the jeep was still
going forwards. What causes this phenomenon, and why don’t you see it in
real day-to-day life?
Let’s look at this by considering a wheel with only one spoke, as is shown
in the top left of Figure 7.8. Each column of Figure 7.8 represents a different
rotational speed for the wheel, and each numbered row represents a frame of the
movie. In the leftmost column, the wheel makes one sixth of a rotation per
frame. As can be seen in the animation in Figure 9, this results in a wheel
that appears to be rotating clockwise as expected. In the second column,
the wheel is making one third of a rotation per frame and the resulting
animation is a faster turning wheel, but still in the clockwise rotation. In
the third column, the wheel is turning slightly faster, making one half of
a rotation per frame. As can be seen in the corresponding animation, this
results in the appearance of a 2-spoked wheel that is stopped.
If the wheel is turning faster than one rotation every two frames, an odd
thing happens. The wheel, making more than one half of a rotation per
frame, results in the appearance of the wheel turning backwards and more
slowly than the actual rotation... This is a problem caused by the fact that
we are slicing continuous time into discrete time, thus distorting the actual
event. This result, which appears to be something other than what actually
happened, is known as an alias.
The same problem exists in digital audio. If you take a look at the
waveform in Figure 7.9, you can see that we have fewer than two samples per
period of the wave. Therefore the frequency of the wave is greater than one
half the sampling rate.
Figure 7.10 demonstrates that there is a second waveform with the same
amplitude as the one in Figure 7.9 which could be represented by the same
samples. As can be seen, this frequency is lower than the one that was
recorded.
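Where the alias lands can be computed by folding the input frequency into the range from minus to plus half the sampling rate. The Python sketch below assumes an ideal sampler; the 30 kHz input at 44.1 kHz is a hypothetical example. The same function covers the movie wheel if frequency is measured in rotations per second and the sampling rate in frames per second: a negative result means the wheel appears to turn backwards (for a sine wave, an alias with inverted polarity).

```python
import numpy as np


def apparent_frequency(f, fs):
    """Frequency the samples actually describe, folded into -fs/2 .. +fs/2."""
    return f - fs * round(f / fs)


fs = 44100.0
f_in = 30000.0                          # hypothetical input above the Nyquist frequency
f_alias = apparent_frequency(f_in, fs)
print(f_alias)                          # -14100.0, heard as 14.1 kHz

# the original and the alias produce identical samples (shown with cosines)
n = np.arange(32)
same = np.allclose(np.cos(2 * np.pi * f_in * n / fs),
                   np.cos(2 * np.pi * abs(f_alias) * n / fs))
print(same)                             # True

# the wheel: 2/3 of a rotation per frame looks like 1/3 of a rotation backwards
print(apparent_frequency(2 / 3, 1))
```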
The problem of aliasing has two consequences. Firstly, we have to
make sure that no frequencies above half of the sampling rate (typically
called the Nyquist frequency) get into the sample and hold circuit. Secondly,
we have to set the sampling rate high enough to be able to capture all the
frequencies we want to hear. The second of these issues is a pretty easy one
to solve: textbooks say that we can only hear frequencies up to about 20
kHz, therefore all we need to do is to make sure that our sampling rate is
at least twice this value, that is, at least 40,000 samples per second.
Figure 7.8: Frame-by-frame shots of a 1-spoked wheel turning at different speeds and captured by
the same frame rate.
Figure 7.9: Waveform with a frequency that is greater than one-half the sampling rate.
Figure 7.10: The resulting alias frequency caused by sampling the waveform as shown in Figure 7.9.
The only problem left is to ensure that no frequencies above the Nyquist
frequency get into the sample and hold circuit to begin with. This is a fairly
easy task. Just before the sample and hold circuit, a low-pass filter is used
to eliminate high frequency components in the audio signal. This low-pass
filter, usually called an anti-aliasing filter because it prevents aliasing,
ideally cuts out all energy above the Nyquist frequency, thus solving the
problem. Of course, some people think that this creates a huge problem
because it leaves out a lot of information that no one can really prove isn’t
important.
There is a more detailed discussion of the issue of aliasing and antialiasing
filters in Section 7.3.
could use positive and negative binary numbers to represent this but we
don’t. We typically use a system called “two’s complement.” There are
really two issues here. One is that, if there’s no signal, we’d probably like
the digital representation of it to go to 0 – therefore zero level in the analog
signal corresponds to zeros in the digital signal. The second is, how do we
represent negative numbers? One way to consider this is to use a circular
plotting of the binary numbers. If we count from 0 to 7 using a 3-bit “word”
we have the following:
000
001
010
011
100
101
110
111
Now if we write these down in a circle starting at 12 o’clock and going
clockwise, as is shown in Figure 7.11, we’ll see that the value 111 winds
up being adjacent to the value 000. Then, we kind of ignore what the
actual numbers are and, starting at 000, turn clockwise for positive values and
counterclockwise for negative values. Now we have a way of representing
positive and negative values for the signal, where one step above 000 is 001
and one step below 000 is 111. This seems a little odd because the numbers
don’t really line up the way we’d like them to, as can be seen in Figure 7.12,
but it does have some advantages. In particular, digital zero corresponds to
analog zero, and if there’s a 1 at the beginning of the binary word, then
the signal is negative.
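Walking around the "circle" can be sketched in Python. The two helper functions are illustrative names, not a standard API; masking a negative integer with all ones for the word length produces its two’s complement bit pattern.

```python
def to_twos_complement(value, bits):
    """Return the two's complement binary word for an integer value."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(word):
    """Interpret a binary string as a two's complement integer."""
    raw = int(word, 2)
    return raw - (1 << len(word)) if word[0] == "1" else raw


# the 3-bit circle: one step below 000 is 111, and a leading 1 means negative
for level in range(3, -5, -1):
    print(f"{level:+d} -> {to_twos_complement(level, 3)}")
```

Note that the loop runs from +3 down to -4: there is one more negative level than positive, which is the asymmetry discussed next.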
One issue that you may want to concern yourself with is the fact that
there is one more quantization level in the negative area than there is in the
positive. This is because there is an even number of quantization levels
(because that number is a power of two) but one of them is dedicated to the
zero level. Therefore, the system is slightly asymmetrical, so it is, in fact,
possible to distort the signal in the positive before you start distorting in
the negative. But keep in mind that, in a typical 16-bit system, we’re talking
about a VERY small difference: 32,767 levels above zero versus 32,768 below it.
Figure 7.12: Binary words corresponding to quantization levels in a two’s complement system.
The fundamental difference between digital audio and analog audio is one of
resolution. Analog representations of analog signals have a theoretically
infinite resolution in both level and time. Digital representations of an
analog sound wave are discretized into quantifiable levels in slices of time.
We’ve already talked about discrete time and sampling rates a little in the
previous section and we’ll elaborate more on it later, but for now, let’s
concentrate on quantization of the signal level.
As we’ve already seen, a PCM-based digital audio system has a finite
number of levels that can be used to specify the signal level for a particular
sample on a given channel. For example, a compact disc uses a 16-bit binary
word for each sample, therefore there are a total of 65,536 ($2^{16}$) quantization
levels available. However, we always have to keep in mind that we only use
all of these levels if the signal has an amplitude equal to the maximum
possible level in the system. If we reduce the level by a factor of 2 (in other
words, a gain of -6.02 dB), we are using one bit’s worth fewer quantization
levels to measure the signal. The lower the amplitude of the signal, the
fewer quantization levels we can use until, if we keep attenuating the
signal, we arrive at a situation where the amplitude of the signal is the level
of 1 Least Significant Bit (or LSB).
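The bookkeeping in this paragraph (about 6.02 dB per bit) can be sketched in Python. The function names are illustrative, and "bits in use" is an approximation that only counts how much of the word the signal’s peak spans.

```python
import math

DB_PER_BIT = 20 * math.log10(2)      # about 6.02 dB per halving of the level


def quantization_levels(bits):
    """Number of quantization levels in a word of the given length."""
    return 2 ** bits


def bits_in_use(total_bits, peak_level_db):
    """Approximate number of bits spanned by a signal whose peak is
    peak_level_db relative to full scale (0 dB uses every bit)."""
    return total_bits + peak_level_db / DB_PER_BIT


print(quantization_levels(16))       # 65536
print(bits_in_use(16, -6.02))        # roughly 15
print(bits_in_use(16, -90.3))        # roughly 1, i.e. a signal about 1 LSB in size
```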
Let’s look at an example. Figure 7.13 shows a single cycle of a sine wave
plotted with a pretty high degree of resolution (well... high enough for the
purposes of this discussion).
Figure 7.13: A single cycle of a sine wave. We’ll consider this to be the analog input signal to our
digital converter.
Let’s say that this signal is converted into a PCM digital representation
using a converter that has 3 bits of resolution – therefore there are a total
of 8 different levels that can be used to describe the level of the signal. In
a two’s complement system, this means we have the zero line with 3 levels
above it and 4 below. If the signal in Figure 7.13 is aligned in level so that its
positive peak is the same as the maximum possible level in the PCM digital
representation, then the resulting digital signal will look like the one shown
in Figure 7.14.
Figure 7.14: A single cycle of a sine wave after conversion to digital using 3-bit, PCM, two’s
complement where the signal level is rounded to the nearest quantization level at each sample.
The blue plot is the original waveform, the red is the digital representation.
Not surprisingly, the digital representation isn’t exactly the same as the
original sine wave. As we’ve already seen in the previous section, the cost
of quantization is the introduction of errors in the measurement. However,
let’s look at exactly how much error is introduced and what it looks like.
This error is the difference between what we put into the system and
what comes out of it, so we can see this difference by subtracting the red
waveform in Figure 7.14 from the blue waveform.
There are a couple of characteristics of this error that we should discuss.
Firstly, because the sine wave repeats itself, the error signal will be periodic.
Also, the period of this complex error waveform will be identical to that of
the original sine wave; therefore it will be composed of harmonics of
the original signal. Secondly, notice that the maximum quantization error
that we introduce is one half of 1 LSB. The significant thing to note about
this is its relationship to the signal amplitude. The quantization error will
never be greater than one half of an LSB, so the more quantization levels
we have, the louder we can make the signal we want to hear relative to the
error that we don’t want to hear. See Figures 7.16 through 7.18 for a graphic
illustration of this concept.
Figure 7.15: A plot of the quantization error generated by the conversion shown in Figure 7.14.
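The comparison in Figures 7.16 to 7.18 can be reproduced numerically in Python. This sketch aligns the sine’s positive peak with the top code of a two’s complement word, as in the text (so 3 bits gives 3 levels above zero), rounds each sample, and reports the largest error and the signal-to-error ratio. The 64-samples-per-cycle choice is an assumption made for illustration.

```python
import numpy as np


def quantize_sine(bits, n_samples=64):
    """One cycle of a sine whose positive peak sits on the top code of a
    two's complement word, rounded to the nearest level. Units are LSBs."""
    top = 2 ** (bits - 1) - 1                    # e.g. 3 levels above zero for 3 bits
    x = top * np.sin(2 * np.pi * np.arange(n_samples) / n_samples)
    q = np.round(x)
    return x, q, x - q                           # input, output, error


def signal_to_error_db(bits):
    """RMS level of the sine relative to the RMS level of the quantization error."""
    x, _, err = quantize_sine(bits)
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) / np.sqrt(np.mean(err ** 2)))


for bits in (3, 5, 9):
    _, _, err = quantize_sine(bits)
    print(bits, "bits: max error", round(float(np.max(np.abs(err))), 3),
          "LSB, signal-to-error", round(float(signal_to_error_db(bits)), 1), "dB")
```

The largest error stays at or below half an LSB for every word length, while the signal-to-error ratio grows with the number of bits.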
Figure 7.16: A combined plot of the original signal, the quantized signal and the resulting quanti-
zation error in a 3-bit system.
As is evident from Figures 7.16, 7.17 and 7.18, the greater the number
of bits that we have available to describe the instantaneous signal level, the
lower the apparent level of the quantization error. I use the word “apparent”
here in a strange way: no matter how many bits you have, the quantization error
will be a signal that has a peak amplitude of one half of an LSB in the worst
case. So, if we’re thinking in terms of LSBs, then the amplitude of the
Figure 7.17: A combined plot of the original signal, the quantized signal and the resulting quanti-
zation error in a 5-bit system.
Figure 7.18: A combined plot of the original signal, the quantized signal and the resulting quanti-
zation error in a 9-bit system.
1. We always have to remember that the only time all of the bits in a
digital system are being used is when the signal is at its maximum
possible level. If you go lower than this (and we usually do), then
you’re using a subset of the available quantization levels. Since the
quantization error stays constant at +/- 0.5 LSB and the signal level
is lower, the level of the quantization error relative to the signal is
higher. The lower the signal, the more audible the error.
This is particularly true at the end of the decay of a note on an
instrument or the reverberation in a large room. As the sound decays
from maximum to nothing, it uses fewer and fewer quantization levels,
and the perceived quality drops as the error becomes more and more
evident, because it is less and less masked.
7.2.1 Dither
Luckily, however, we can make a cute little trade. It turns out that we can
effectively eliminate quantization error simply by adding noise called dither
to our signal. It seems counterproductive to fix a problem by adding noise,
but we have to consider that what we’re essentially doing is making a trade:
distortion for noise. By adding dither to the audio signal with a level that is
approximately one half the level of the LSB, we generate an audible, but very
low-level constant noise that effectively eliminates the program-dependent
noise (distortion) that results from low-level signals.
Notice that I used the word “effectively” at the beginning of the last para-
graph. In fact, we are not eliminating the quantization error. By adding
dither to the signal before quantizing it, we are randomizing the error, there-
fore changing it from a program-dependent distortion into a constant noise
floor. The advantage of doing this is that, although we have added noise
to our final signal, it is constant, and therefore not trackable by our brains.
Therefore, we ignore it.
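The randomizing effect is easy to see numerically. In this Python sketch (a hypothetical example, not one of the figures in this chapter) a sine only 0.3 LSB in amplitude vanishes completely when it is rounded without dither. With uniformly distributed dither spanning ±0.5 LSB (the "rectangular" kind described below) added before each quantization, any single result is noisy, but the average of many dithered quantizations follows the original sine.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the run is repeatable

n = np.arange(64)
x = 0.3 * np.sin(2 * np.pi * n / 64)  # one cycle of a sine only 0.3 LSB in amplitude

undithered = np.round(x)              # every sample rounds to zero: the sine is gone
print(np.abs(undithered).max())       # 0.0

# fresh +/-0.5 LSB uniform dither before each of 5000 quantizations
trials = np.round(x + rng.random((5000, x.size)) - 0.5)
average = trials.mean(axis=0)         # the sine reappears in the average
print(np.abs(average - x).max())      # small, and smaller still with more trials
```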
So far, all I have said is that we add “noise” to the signal, but I have not
said what kind of noise - is it white, pink, or some other colour? People who
deal with dither typically don’t use these types of terms to describe the noise;
they talk about probability density functions, or PDFs, instead. When we
add dither to a signal before quantizing it, we are adding a random number
that has a value that will be within a predictable range. The range has to
be controlled; otherwise the noise would either be unnecessarily high in level
(and therefore too audible) or too low (and therefore ineffective).
Flip a coin. You’ll get a heads or a tails. Flip it again, and again and
again, each time writing down the result. If you flipped the coin 1000 times,
chances are that you’ll see that you got a heads about 500 times and a tails
about 500 times. This is because each side of the coin has an equal chance of
landing up, therefore there is a 50% probability of getting a heads, and a 50%
probability of getting a tails. If we were to draw a graph of this relationship,
with “heads” and “tails” being on the x-axis and the probability on the y-
axis, we would have two points, both at 50%.
Let’s do basically the same thing by rolling a die. If we roll it 600 times,
we’ll probably see around 100 1’s, 100 2’s, 100 3’s, 100 4’s, 100 5’s and 100
6’s. Like the coin, this is because each number has an equal probability of
being rolled. I tried this, and kept track of each number that was rolled an
Result                    1     2     3     4     5     6
Number of times rolled  111    98   101    94    92   104
Table 7.1: The results of rolling a die 600 times.
[Plot: bar graph of the results in Table 7.1, number of times rolled vs. result (1 to 6).]
Let’s say that we didn’t know that there was an equal probability of
rolling each number on the die. How could we find this out experimentally?
All we have to do is to take the numbers in Table 7.1 and divide by the
number of times we rolled the die. This then tells us the probability (or the
chances) of rolling each number. If the probability of rolling a number is 1,
then it will be rolled every time. If the probability is 0, then it will never
be rolled. If it is 0.5, then the number will be rolled half of the time.
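Here is that division applied to Table 7.1, followed by the same calculation on a larger number of rolls, sketched in Python. The 60,000 rolls below are simulated with a seeded random number generator, not the author’s experiment, so the exact counts will differ from those behind Figure 7.21.

```python
import numpy as np

# counts from Table 7.1 (600 rolls), for the results 1 to 6
counts = np.array([111, 98, 101, 94, 92, 104])
probabilities = counts / counts.sum()
print(probabilities)        # e.g. 111/600 = 0.185 for a "1"

# the same calculation on 60,000 simulated rolls of a fair die
rng = np.random.default_rng(1)
rolls = rng.integers(1, 7, size=60_000)            # values 1 to 6
estimated = np.bincount(rolls, minlength=7)[1:] / rolls.size
print(estimated)            # each value lands close to 1/6
```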
Notice that the numbers didn’t work out perfectly in this example, but
they did come close. I was expecting to get each number 100 times, but
there was a small deviation from this. The more times I roll the die, the
more closely reality will approach the theoretical expectation. To check this
out, I did a second experiment where I rolled the die 60,000 times.
This graph tells us a number of things. Firstly, we can see that there
is a 0 probability of rolling a 7 (this is obvious because there is no “7” on
a die, so we can never roll and get that result). Secondly, we can see that
[Plot: probability of being rolled vs. result (1 to 6).]
Figure 7.20: The calculated probability of rolling each number on the die using the results shown
in Table 7.1.
[Plot: probability of being rolled vs. result (-2 to 10).]
Figure 7.21: The calculated probability of rolling each number on the die using the results after
60,000 rolls. Notice that the graph has a rectangular shape.
[Plot: probability vs. age (years, 0 to 20).]
Figure 7.22: The calculated probability of the age of students in Grade 5 in Canada.
This is obviously not an RPDF because the result doesn’t look like a
rectangle. In fact, it is what statisticians call a normal distribution, better
known as a bell curve. What this tells us is that the probability of a Canadian
Grade 5 student being either 10 or 11 years old is higher than the probability
of being any other age. It is possible, but less likely, that the student will be 8, 9, 12
or 13 years old. It is extremely unlikely, but also possible for the student to
be 7 or 14 years old.
Dither examples
There are three typical dither PDFs used in PCM digital audio: RPDF,
TPDF (triangular PDF) and Gaussian PDF. We’ll look at the first two.
For this section, I used MATLAB to make a sine wave with a sampling
rate of 32.768 kHz. I realize that this is a strange sampling rate, but it
made the graphs cleaner for the FFT analyses. The total length of the sine
wave was 32768 samples (therefore 1 second of audio). MATLAB typically
[Plot: histogram, number of times used vs. amplitude (LSBs, -1.5 to 1.5).]
Figure 7.23: Histogram of RPDF dither for 32k block. MATLAB equation is RAND(1, 32768) -
0.5
[Plot: probability vs. amplitude (LSBs, -1.5 to 1.5).]
Figure 7.24: Probability distribution function of RPDF dither for 32k block.
[Plot: histogram, number of times used vs. amplitude (LSBs, -1.5 to 1.5).]
Figure 7.25: Histogram of TPDF dither for 32k block. MATLAB equation is RAND(1, 32768) -
RAND(1, 32768)
[Plot: probability of being used (%) vs. amplitude (LSBs, -1.5 to 1.5).]
Figure 7.26: Probability distribution function of TPDF dither for 32k block.
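The two MATLAB expressions in the captions above have direct NumPy equivalents. This Python sketch generates a 32k block of each kind of dither and prints its range and variance in LSBs; the seed is arbitrary, so the exact values will differ from the data behind the figures. For reference, RPDF dither spans ±0.5 LSB with a variance of 1/12 LSB², and TPDF dither spans ±1 LSB with a variance of 1/6 LSB².

```python
import numpy as np

rng = np.random.default_rng(2)     # seeded so the run is repeatable
N = 32768                          # the "32k block" used in the figures

rpdf = rng.random(N) - 0.5                 # like RAND(1, 32768) - 0.5
tpdf = rng.random(N) - rng.random(N)       # like RAND(1, 32768) - RAND(1, 32768)

# range and variance, in LSBs
for name, d in (("RPDF", rpdf), ("TPDF", tpdf)):
    print(name, round(d.min(), 3), round(d.max(), 3), round(d.var(), 4))

# histogram over the same amplitude axis as Figures 7.23 and 7.25
tpdf_counts, edges = np.histogram(tpdf, bins=30, range=(-1.5, 1.5))
```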
[Plot: four panels, level (dB fs, -100 to 0) vs. frequency (Hz, logarithmic axis from 10 to 10^4).]
Figure 7.27: From top to bottom, a 64-bit 1 kHz sine wave in MATLAB, 8-bit no dither, 8-bit
RPDF dither, 8-bit TPDF dither. Fs = 32768 Hz, FFT window is rectangular, FFT length is 32768
points.
One of the important things to notice here is that, although the dither
raised the overall noise floor of the signal, the resulting artifacts are
wideband noise, rather than spikes showing up at harmonic intervals as can be
seen in the no-dither plot. If we were to look at the artifacts without the
original 1 kHz sine wave, we would get a plot as shown in Figure 7.28.
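The difference between "spikes at harmonics" and "wideband noise" can be checked numerically. This Python sketch uses the same sampling rate and block length as the figures, but, as an assumption to make the effect large, a very low-level 1 kHz sine (1.4 LSB in amplitude) instead of a full-scale 8-bit one. It compares the level at 3 kHz (the third harmonic) with the level at 1 kHz, with and without TPDF dither.

```python
import numpy as np

rng = np.random.default_rng(3)       # seeded so the run is repeatable
fs = N = 32768                       # 1-second block: each FFT bin is 1 Hz wide
n = np.arange(N)
x = 1.4 * np.sin(2 * np.pi * 1000 * n / fs)   # a 1 kHz sine only 1.4 LSB in amplitude


def level_at(y, freq_hz):
    """FFT magnitude (rectangular window) at a whole-number frequency."""
    return np.abs(np.fft.rfft(y))[int(freq_hz)]


plain = np.round(x)                                        # quantized, no dither
dithered = np.round(x + rng.random(N) - rng.random(N))     # TPDF dither, then quantized

for name, y in (("no dither", plain), ("TPDF dither", dithered)):
    ratio = level_at(y, 3000) / level_at(y, 1000)          # 3rd harmonic vs fundamental
    print(f"{name}: 3 kHz is {20 * np.log10(ratio):.1f} dB relative to 1 kHz")
```

Without dither, the rounded waveform is a stepped version of the sine and carries a clear third harmonic; with dither, the 3 kHz bin contains only noise.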
[Plot: three panels, level (dB fs, -100 to 0) vs. frequency (Hz, logarithmic axis from 10 to 10^4).]
Figure 7.28: Artifacts omitting the 1 kHz sine wave. From top to bottom, 8-bit no dither, 8-bit
RPDF dither, 8-bit TPDF dither. Fs = 32768 Hz, FFT window is rectangular, FFT length is 32768
points.
mixer, it does math with an accuracy of 32 bits, and then you have to get
out to a 16-bit output. The process of converting the 32-bit signal into a
16-bit signal must include dither.
Remember, if you are quantizing, you dither.
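As an example of "if you are quantizing, you dither", here is a minimal Python sketch of reducing 32-bit integer samples to 16 bits with TPDF dither at the 16-bit LSB. The function name and the half-scale test signal are hypothetical, and a real mixer may also use noise shaping or other refinements that this sketch leaves out.

```python
import numpy as np

rng = np.random.default_rng(4)       # seeded so the run is repeatable


def requantize_32_to_16(x32):
    """Reduce 32-bit integer samples to 16 bits, adding TPDF dither
    (+/-1 LSB of the 16-bit word) before rounding."""
    lsb_units = x32.astype(np.float64) / 2 ** 16               # value in 16-bit LSBs
    dither = rng.random(x32.shape) - rng.random(x32.shape)      # TPDF, +/-1 LSB
    y = np.round(lsb_units + dither)
    return np.clip(y, -32768, 32767).astype(np.int16)


# hypothetical mixer output: a half-scale 1 kHz sine as 32-bit integers
n = np.arange(48000)
x32 = np.round(0.5 * (2 ** 31 - 1) * np.sin(2 * np.pi * 1000 * n / 48000)).astype(np.int32)
y16 = requantize_32_to_16(x32)
print(y16.dtype, y16.min(), y16.max())
```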
7.3 Aliasing
Back in Section 7.1.3, we looked very briefly at the problem of aliasing, but
now we have to dig a little deeper to see more specifically what it does and
how to avoid it.
As we have already seen, aliasing is an artifact of chopping continuous
events into discrete time. It can be seen in film when cars go forwards but
their wheels go backwards. It happens under fluorescent or strobe lights
that prevent us from seeing intermediate motion. Also, if we’re not careful,
it happens in digital audio.
Let’s begin by looking at a simple example: we’ll look at a single sam-
pling rate and what it does to various sine waves from analog input to analog
output from the digital system.
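Before looking at the plots, it may help to have the folding rule written down. The sketch below assumes an ideal sampler with no antialiasing filter: any sine above the Nyquist frequency reappears mirrored below it. `alias_frequency` is an illustrative helper, not part of any library.

```python
def alias_frequency(f, fs):
    """Frequency (Hz) at which a sine of frequency f comes out after ideal sampling at fs.

    Sampling cannot tell f apart from f plus or minus any multiple of fs, and a
    component above fs/2 mirrors back below it, so the result always lies
    between 0 and the Nyquist frequency fs/2.
    """
    f = f % fs
    return min(f, fs - f)

for f in (1000, 20000, 25000, 44100, 45100):
    print(f, "->", alias_frequency(f, 44100))
```

For example, with fs = 44100 Hz, a 25 kHz input would come out at 44100 - 25000 = 19100 Hz.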
[Figure: a series of plots of sine waves sampled at a single sampling rate, amplitude (-1 to 1) vs. time (0 to 10)]
[Block diagram labels: in, out, Σ (with + and − inputs), D, Q, clock]
Figure 7.37: A simple 1-bit ∆-Σ (Delta-Sigma) analog-to-digital converter [Watkinson, 1988].
The result was the AES/EBU protocol (also known as IEC-958 Type 1). It’s a bi-phase mark coding protocol which fulfills all of the above requirements. “What’s a bi-phase mark coding protocol?” I hear you cry... Well, what that means is that, rather than using two discrete voltages to denote 1 and 0, the distinction is made by voltage transitions.
In order to transmit a single bit down a wire, the AES/EBU system carves it into two “cells.” If the cells are the same voltage, then the bit is a 0; if the cells are different voltages, then the bit is a 1. In other words, if there is a transition between the cells, the bit is a 1. If there is no transition, the bit is a 0.
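The cell rule can be sketched in a few lines. This assumes the usual bi-phase mark convention that the level also flips at the start of every bit (so a run of 0s never sits at one voltage), and an arbitrary starting level; the function names are hypothetical.

```python
def biphase_mark_encode(bits, level=0):
    """Encode bits as pairs of cells using bi-phase mark coding.

    The level flips at the start of every bit; a 1 gets an extra flip
    between its two cells, a 0 does not. `level` is the line level
    before the first bit.
    """
    cells = []
    for bit in bits:
        level ^= 1              # transition at every bit boundary
        cells.append(level)
        if bit:
            level ^= 1          # mid-bit transition marks a 1
        cells.append(level)
    return cells

def biphase_mark_decode(cells):
    """Recover bits: two equal cells mean 0, two different cells mean 1."""
    return [int(cells[i] != cells[i + 1]) for i in range(0, len(cells), 2)]

bits = [1, 0, 1, 0]
cells = biphase_mark_encode(bits)
print(cells)                         # [1, 0, 1, 1, 0, 1, 0, 0]
print(biphase_mark_decode(cells))    # [1, 0, 1, 0]
```

Starting from the opposite level produces the inverted cell pattern but decodes to the same bits, which is the point made in the Figure 7.38 caption: only the transitions matter, not whether the signal is high or low.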
[Figure: four bits carrying the values 1, 0, 1, 0, each divided into two cells]
Figure 7.38: The relationship between cell transitions and the value of the bit in a bi-phase mark. Note that if both cells in one bit are the same, the represented value is 0. If the two cells have a different value, the bit value is 1. This is independent of the actual high or low value in the signal.
[Figure: a Block is made up of Frames; each Frame contains Sub-frame A (usually Left) and Sub-frame B (usually Right). Each Sub-frame is laid out as follows:]
Bits    Field          Length
0-3     Preamble       4 bits
4-7     Aux data       4 bits
8-27    Audio Sample   20 bits
28      Validity       1 bit
29      User           1 bit
30      Status         1 bit
31      Parity         1 bit
Figure 7.39: The relationship of the structures of a Block, Frame and Sub-Frame.
information.
Channel Code
This is information regarding the transmission itself: data that keeps the machines talking to each other. It consists of 5 bits: the 4-bit Preamble (or Sync Code) and the 1-bit Parity Bit.
Preamble (also known as the Sync Code)
These are 4 bits which tell the receiving device that the transmission is at the beginning of a block or a subframe (and which subframe it is...). Different specific codes tell the receiver what’s going on as follows...
Note that these codes violate the bi-phase mark protocol (because there is no transition at the beginning of the second bit), but they do not violate the no-DC rule.
[Figure panels: Block, Sub-Frame A, Sub-Frame B]
Figure 7.40: The structure of the preamble at the start of each Sub-Frame. Note that each of these breaks the bi-phase mark rule that there must be a transition on every bit, since all preambles start with 3 consecutive cells of the same value.
Note as well that these are sometimes called the X, Y, and Z preambles.
An X Preamble indicates that the Sub-Frame is an audio sample for the Left.
A Y Preamble indicates that the Sub-Frame is an audio sample for the Right.
A Z Preamble indicates the start of a Block.
Parity Bit
This is a single bit which ensures that all of the preambles are in phase.
It doesn’t matter to the receiving device whether the preambles start by
going up in voltage or down (I drew the above examples as if they are all
going up...) but all of the preambles must go the same way. The parity bit
is chosen to be a 1 or 0 to ensure that the next preamble will be headed in
the right direction.
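For a concrete rule: in AES3 the parity bit provides even parity over time slots 4 to 31, so those 28 slots always contain an even number of 1s. Every 1 adds one extra mid-bit transition, so an even count of them brings the line back to the same polarity by the next preamble. A minimal sketch, with a hypothetical function name and the bits given as a list of 0s and 1s for slots 4 to 30:

```python
def aes3_parity_bit(slots_4_to_30):
    """Return the parity bit (time slot 31) for one AES3 sub-frame.

    `slots_4_to_30` holds the 27 bits of aux data, audio sample, validity,
    user and status. The parity bit is set so that slots 4 to 31 together
    contain an even number of 1s (even parity).
    """
    if len(slots_4_to_30) != 27:
        raise ValueError("expected the 27 bits in time slots 4 to 30")
    return sum(slots_4_to_30) % 2

payload = [0] * 27
payload[4] = payload[10] = payload[26] = 1      # three 1s, so the parity bit must be 1
p = aes3_parity_bit(payload)
print(p, (sum(payload) + p) % 2 == 0)           # 1 True
```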
Source Code
This is the information that we’re trying to transmit. It uses the other 27
bits of the sub-frame comprising the Audio Sample (20 bits), the Auxiliary
Data (4 bits), the Validity Bit (1 bit), the User Bit (1 bit) and the Status
Bit (1 bit).
Audio sample
This is the sample itself. It has a maximum of 20 bits, with the Least
Significant Bit sent first.
Auxiliary Data
This is 4 bits which can be used for anything. These days it’s usually
used for 4 extra bits to be attached to the audio sample – bringing the
resolution up to 24 bits.
Validity Bit
This is simply a flag which tells the receiving device whether the data is
valid or not. If the bit is a 1, then the data is non-valid. A 0 indicates that
the data is valid.
User Bit
This is a single bit which can be used for anything the user or manufac-
turer wants (such as time code, for example).
Typically, a number of user bits from successive sub-frames are strung together to make a single word. Usually this is done by collecting all 192 user bits (one from each of that channel’s sub-frames) for each channel in a block. If you then put these together, you get 24 bytes (192 ÷ 8) of information in each channel.
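A sketch of collecting one channel's 192 user bits into 24 bytes. The bit order inside each byte is an assumption here (the first bit received becomes the least significant bit of byte 0); check the relevant standard before relying on it. `pack_block_bits` is a hypothetical name.

```python
def pack_block_bits(bits):
    """Pack the 192 bits collected from one channel of a block into 24 bytes.

    Assumed ordering: the first bit received becomes bit 0 (the least
    significant bit) of byte 0, the ninth becomes bit 0 of byte 1, and so on.
    """
    if len(bits) != 192:
        raise ValueError("a block carries 192 bits per channel")
    out = bytearray(24)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

bits = [0] * 192
bits[0] = 1      # first bit received
bits[15] = 1     # last bit of the second byte
packed = pack_block_bits(bits)
print(len(packed), packed[0], packed[1])   # 24 1 128
```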
Typically, the end user in a recording studio doesn’t have direct access
to how these bits should be used. However, if you have a DAT machine, for
example, that is able to send time code information on its digital output,
then you’re using your user bits.
Status Bit
This is a single-bit flag which can be used for a number of things such as:
• Emphasis on / off
• Sampling rate
[Figure: channel status bytes listed by byte number (3 to 21), several marked “reserved” and one marked “reference”]
Figure 7.41: The structure of the bytes made out of the status bits (the channel status information) in a single Block. This is sometimes called the Channel Status Block Structure [Sanchez and Taylor, 1998].
The maximum cable run is about 300 m balanced using XLR connectors.
If the signal is unbalanced (using a transformer, for example) and sent using
a coaxial cable, the maximum cable run becomes about 1 km.
Each frame is 64 bits long (two 32-bit sub-frames), so:
If the Sampling Rate is 44.1 kHz, 1 frame takes 22.7 µs to transmit (the same as the time between samples).
If the Sampling Rate is 48 kHz, 1 frame takes 20.8 µs to transmit.
At 44.1 kHz, the bit rate is 2.822 Mbit/s.
At 48 kHz, the bit rate is 3.072 Mbit/s.
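The figures above follow from the frame structure: one frame per sample period, two 32-bit sub-frames per frame. A quick check of the arithmetic (the helper name is illustrative):

```python
BITS_PER_SUBFRAME = 32
SUBFRAMES_PER_FRAME = 2   # sub-frame A and sub-frame B

def aes_ebu_timing(fs):
    """Return (frame duration in microseconds, bit rate in bits/s) for sampling rate fs."""
    frame_us = 1e6 / fs                                   # one frame per sample period
    bit_rate = fs * BITS_PER_SUBFRAME * SUBFRAMES_PER_FRAME
    return frame_us, bit_rate

for fs in (44100, 48000):
    frame_us, bit_rate = aes_ebu_timing(fs)
    print(f"{fs} Hz: frame = {frame_us:.1f} us, bit rate = {bit_rate / 1e6:.4f} Mbit/s")
# 44100 Hz: frame = 22.7 us, bit rate = 2.8224 Mbit/s
# 48000 Hz: frame = 20.8 us, bit rate = 3.0720 Mbit/s
```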
Just for reference (or possibly just for interest), this means that 1/4
wavelength of the cell in AES/EBU is about 19 m on a wire.
7.5.4 S/PDIF
S/PDIF was developed by Sony and Philips (hence the S/P) before AES/EBU.
It uses a single unbalanced coaxial wire to transmit 2 channels of digital au-
dio and is specified in IEC 958 Type 2. The Source Code is identical to
AES/EBU with the exception of the channel status bit which is used as a
copy prohibit flag.
Some points:
• The connectors used are RCA with a coaxial cable.
• The voltage alternates between 0 V and 1 V ± 20% (note that this is not independent of the ground as in AES/EBU).
• The source impedance is 75 Ω.
• S/PDIF has better RF noise immunity than AES/EBU because of the coax cable (please don’t ask me to explain why... the answer will be “dunno... too dumb...”).
• It can be sent as a “video” signal through existing video equipment.
• Signal loss will be about -0.75 dB / 35 m in video cable.
Two synchronous devices have a single clock source and there is no delay
between them. For example, the left windshield wiper on your car is syn-
chronous with the right windshield wiper.
Asynchronous
Two asynchronous devices have absolutely no relation to each other. They
are free-running with separate clocks. For example, your windshield wipers
are asynchronous with the snare drum playing on your car radio.
Isochronous
Two isochronous devices have the same clock but are separated by a fixed
propagation delay. They have a phase difference but that difference remains
constant.
7.5.6 Jitter
Jitter is a modulation in the frequency of the digital signal being transmitted.
As the bit rate changes (and assuming that the receiving PLL can’t correct
variations in the frequency), the frequency of the output will modulate and
therefore cause distortion or noise.
Jitter can be caused by a number of things, depending on where it occurs:
http://www.crystal.com)
7.6 Jitter
Go and make a movie using a movie camera that runs at 24 frames per
second. Then, play back the movie at 30 fps. Things in the movie will move
faster than they did in real life because the frame rate has sped up. This might be a neat effect, but it doesn’t reflect reality. The point so far is that, in order to get out what you put in, a film must be played back at the same frame rate at which it was recorded.
Similarly, when an audio signal is recorded on a digital recording system, it must be played back at the same sampling rate in order to avoid a frequency shift. For example, if you increase the sampling rate by 6% on playback, you will produce a shift in pitch of a semitone.
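The 6% figure comes from equal temperament, where one semitone is a frequency ratio of 2^(1/12), or about 1.0595. A small helper (hypothetical name) converts a playback-to-recording rate ratio into semitones:

```python
import math

def pitch_shift_semitones(playback_rate, recording_rate):
    """Pitch shift in equal-tempered semitones caused by replaying at a different rate."""
    return 12 * math.log2(playback_rate / recording_rate)

print(round(pitch_shift_semitones(1.06, 1.0), 3))   # 1.009 (6% faster)
print(round(2 ** (1 / 12), 4))                       # 1.0595, an exact semitone
```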
There is another assumption that is made in digital audio (and in film,
but it’s less critical). This is that the sampling rate does not change over
time – neither when you’re recording nor on playback.
Let’s think of the simple case of a sine tone. If we record a sine wave
with a perfectly stable sampling rate, and play it back with a perfectly stable
sampling rate with the same frequency as the recording sampling rate, then
we get out what we put in (ignoring any quantization or aliasing effects...).
We know that if we change the sampling rate of the playback, we’ll shift the
frequency of the sine tone. Therefore, if we modulate the sampling rate with
a regular signal, shifting it up and down over time, then we are subjecting
our sine tone to frequency modulation or FM.
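A rough way to see the FM effect, under the simplifying assumption that the playback clock's rate wobbles sinusoidally (the depth and wobble rate below are made-up illustration values, and the function names are hypothetical): the replayed tone's instantaneous frequency scales with the playback rate, so it swings up and down along with the clock.

```python
import math

def replayed_frequency(f_tone, fs_record, fs_play):
    """Frequency heard when a tone recorded at fs_record is replayed at fs_play."""
    return f_tone * fs_play / fs_record

def wobbling_rate(fs, depth, f_wobble, t):
    """Hypothetical playback clock: fs varied by +/- depth (a fraction) at f_wobble Hz."""
    return fs * (1 + depth * math.sin(2 * math.pi * f_wobble * t))

fs = 48000.0
# 1 kHz tone; clock wobbles by +/-0.1% at 5 Hz; one 0.2 s wobble cycle in 1 ms steps
freqs = [replayed_frequency(1000.0, fs, wobbling_rate(fs, 0.001, 5.0, n / 1000))
         for n in range(200)]
print(round(min(freqs), 3), round(max(freqs), 3))   # 999.0 1001.0
```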
marketing. I’ll try to write this section in an unbiased fashion, but you’d
better keep my personal cynicism in mind as you read...
Passband ripple
Once upon a time, there was a voice of sanity in the midst of the noise.
Listservs and newsgroups on Usenet would be filled with people spouting
opinions on why bigger was better when it came to numbers describing
digital audio. All sorts of strange ideas were (are) put forward by people
who don’t know the quote by George Eliot – “Blessed is the man who,
having nothing to say, abstains from giving in words evidence to that fact”
or a similar piece of advice from Abraham Lincoln – “Better to be thought
a fool than to open your mouth and remove all doubt.” That lonely voice
belonged to a man named Julian Dunn. Mr. Dunn wrote a paper that
suggested that there was a very good reason why higher sampling rates may
result in better sounding audio even if you can’t hear above 20 kHz. He
showed that the antialiasing filters used within ADCs do not have a flat
frequency response in their passband. And, not only is their frequency
response not flat, but they typically have a periodic ripple in the frequency
domain. Of course, there’s a catch – the ripple that we’re talking about is on
the order of 0.1 dB peak-to-peak, so we’re not talking about a big problem
here...
The interesting thing is that this frequency response irregularity can
be reduced by increasing your sampling rate and reducing the slope of the
antialiasing filters. Therefore, it’s possible that higher sampling rates sound
better because of reduced artifacts caused by the filters.
Dunn also noted that, if you’re smart, you can design your reconstruction filter in your DAC to have the same ripple with the opposite phase (in the frequency domain), thus canceling the effects of both filters and producing a perfectly flat response of the total system. Of course, this would mean that all manufacturers of ADCs and DACs would have to use the same filters and that would, in turn, mean that no converter would sound better than another, which would screw up the pricing structure of that market... So most people that make converters (especially expensive ones) probably think that this is a bad idea.
You can download a copy of this paper from the web at www.nanophon.com.
7.10.1 History
NOT YET WRITTEN
7.10.2 ATRAC
NOT YET WRITTEN
7.10.3 PASC
NOT YET WRITTEN
7.10.4 Dolby
NOT YET WRITTEN
7.10.5 MPEG
NOT YET WRITTEN
Layer II
NOT YET WRITTEN
AAC
NOT YET WRITTEN
7.10.6 MLP
NOT YET WRITTEN
derived from, among other things, the frame rates of NTSC and PAL video.
To begin with, it was decided that the minimum sampling rate was 40
kHz to allow for a 20 kHz minimum Nyquist frequency. Remember that the
audio samples were stored as black and white stripes in the video signal, so
a number above 40 kHz had to be found that fit both formats nicely. NTSC
video has 525 lines per frame (of which 490 are usable lines for recording
signals) at a frame rate of 29.97 Hz. This can be further divided into 245 usable lines per field (there are 2 fields per frame) at a field rate of 59.94 Hz. If we put 3 audio samples on each line of video, then we arrive at the following equation [Watkinson, 1988]:
59.94 Hz × 245 lines per field × 3 samples per line = 44,055.9 Hz
PAL is slightly different. Each frame has 625 lines (with 588 usable lines)
at 25 Hz. This corresponds to 294 usable lines per field at a field rate of
50 Hz. Again, with 3 audio samples per line of video, we have the equation
[Watkinson, 1988]:
50.00 Hz × 294 lines per field × 3 samples per line = 44,100.0 Hz
These two resulting sampling rates were deemed to be close enough (only a 0.1% difference in sampling rate) to be compatible (this difference in sampling rate corresponds to a pitch shift of about 0.017 of a semitone).
This is perfect, but we’re forgetting one small thing... most people record
in stereo. Therefore, the EIAJ format was developed from these equations,
resulting in 6 samples per video line (3 for each channel).
There is one odd addition to the story. Technically speaking, the com-
pact disc format really had no ties with video (back in 1983, you couldn’t
play video off a CD yet) but the equipment that was used for recording and
mastering was video-based. Interestingly, NTSC professional video gear (the
U-Matic format) can run at a frame rate of 30 fps, and is not locked to the
29.97 of your television at home. Consequently, if you re-do the math with
this frame rate, you’ll find that the resulting sampling rate is exactly 44.1
kHz. Therefore, to ensure maximum compatibility and still keep a techni-
cally achievable sampling rate, 44.1 kHz was chosen to be the standard.
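The sampling-rate arithmetic above can be checked directly. The helper name is illustrative; the line counts and field rates are the ones given in the text, with 30 fps gear treated as 60 fields per second.

```python
import math

def video_sampling_rate(field_rate_hz, usable_lines_per_field, samples_per_line=3):
    """Samples per second when each usable video line carries `samples_per_line` samples."""
    return field_rate_hz * usable_lines_per_field * samples_per_line

ntsc = video_sampling_rate(59.94, 245)     # colour NTSC: 59.94 fields per second
pal = video_sampling_rate(50.0, 294)       # PAL: 50 fields per second
ntsc_30 = video_sampling_rate(60.0, 245)   # NTSC gear running at 30 fps (60 fields/s)

print(round(ntsc, 1), round(pal, 1), round(ntsc_30, 1))   # 44055.9 44100.0 44100.0
print(round(12 * math.log2(pal / ntsc), 4))                # NTSC/PAL pitch difference, semitones
```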
[Figure labels: Laser beam; Path tracked by laser; Laser spot]
Figure 7.43: A single 14-bit channel word represented as “bumps” on the CD. Notice that the spot
formed by the laser beam is more than twice as wide as the width of the bump. This is intentional.
The laser spot diameter is approximately 1.2 µm. The bump width is 0.5 µm, and the bump height
is 0.13 µm. The track pitch (the distance between this row of bumps and an adjacent row) is 1.6
µm [Watkinson, 1988]. Also remember that, relative to most CD players, this drawing is upside
down – typically the laser hits the CD from below.
The wavelength λ of the laser light is 0.5 µm. The bump height is 0.13
µm, corresponding to approximately λ/4 for the laser. As a result, when the
laser spot is hitting a bump on the disc, the reflections from both the bump
and the adjacent lands (remember that the laser spot is wider than the
[Figure labels: Laser; One-way mirror; Sensor]
Figure 7.44: Simplified diagram showing how the laser is reflected off the CD. The laser shines
through a semi-reflective mirror, bounces off the CD, reflects off the mirror and arrives at the
sensor [Watkinson, 1988].
[Figure labels: Focusing lens; Laser beam (incoming and reflected); 27°; 0.7 mm]
Figure 7.45: Cross section of a CD showing the polycarbonate base, the reflective aluminum coating
as well as the protective lacquer coating. The laser has a diameter of 0.7 mm when it hits the surface
of the disc, therefore giving it a reasonable immunity to dirt and scratches [Watkinson, 1988].
disc. We go to the table, look up that number and we get the corresponding
number 10000001000010.
Data value (decimal)   Data bits (binary)   Channel bits (binary)
101                    01100101             00000000100010
102                    01100110             01000000100100
103                    01100111             00100100100010
104                    01101000             01001001000010
105                    01101001             10000001000010
106                    01101010             10010001000010
107                    01101011             10001001000010
108                    01101100             01000001000010
109                    01101101             00000001000010
110                    01101110             00010001000010
Table 7.4: A small portion of the table of equivalents in EFM. The value that we are trying to put
on the disc is the 8-bit word in the middle column. The actual word printed on the disc is the
14-bit word in the right column [Watkinson, 1988].
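The lookup step is just a table. This sketch holds only the ten entries from Table 7.4 (the full EFM table has an entry for each of the 256 possible 8-bit values) and ignores everything else that happens on a real disc; the names are hypothetical.

```python
# Only the ten entries shown in Table 7.4; a real EFM table covers all 256 byte values.
EFM_EXCERPT = {
    101: "00000000100010",
    102: "01000000100100",
    103: "00100100100010",
    104: "01001001000010",
    105: "10000001000010",
    106: "10010001000010",
    107: "10001001000010",
    108: "01000001000010",
    109: "00000001000010",
    110: "00010001000010",
}
EFM_DECODE = {word: value for value, word in EFM_EXCERPT.items()}

def efm_encode(values):
    """Replace each 8-bit data value with its 14-bit channel word."""
    return [EFM_EXCERPT[v] for v in values]

def efm_decode(words):
    """Look each 14-bit channel word back up to recover the 8-bit data value."""
    return [EFM_DECODE[w] for w in words]

print(format(105, "08b"), efm_encode([105])[0])   # 01101001 10000001000010
```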
[Figure: the channel words for data values 101, 102 and 103 drawn as pits and lands against a 14-cell time grid (T = 1 to 14)]
Figure 7.46: Three examples of data words from Table 7.4 represented as pits and lands using EFM.
if the 14-bit word has a string of 1’s in it? Aren’t we still stuck with the original problem, only worse? Well, yes. But, the clever people that came up with this idea were very careful about choosing their 14-bit representative words. They made sure that there are no 14-bit values with 1’s separated by less than two 0’s. Huh? For example, none of the 14-bit words in the lookup table contain the codes 11 or 101 anywhere. Take a look at the small example in Table 7.4. You won’t find any 1’s that close together: there is a minimum separation of two 0’s at any time. In real textbooks they talk
about a minimum period between transitions of 3T, where T is the period of 1 bit in the 14-bit word. (This period T is 231.4 ns, corresponding to a data rate of 4.3218 MHz [Watkinson, 1988], but remember, that’s the data rate of the 14-bit word, not the signal stamped on the disc.) This guarantees that the transition rate on the disc cannot exceed 720 kHz, which is acceptably low.
So, that looks after the highest frequency, but what about the lowest possible frequency of bump transitions? This is looked after by setting a maximum period between transitions of 11T; therefore there are no 14-bit words with more than ten 0’s between 1’s. This sets our minimum transition frequency to 196 kHz, which is acceptably high.
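Both limits can be expressed as a check on the gaps between 1s in a channel word. The frequency lines assume that the 720 kHz and 196 kHz figures are the frequencies of a repeating pattern of equal pits and lands of 3T and 11T respectively (one cycle is one pit plus one land); that interpretation reproduces the quoted numbers. The function name is hypothetical.

```python
def satisfies_efm_run_length(word, min_zeros=2, max_zeros=10):
    """Check a 14-bit channel word against the EFM spacing rules.

    Between any two 1s there must be at least `min_zeros` and at most
    `max_zeros` 0s. Zeros before the first 1 or after the last 1 are not
    checked here, since those runs continue into the neighbouring words.
    """
    ones = [i for i, c in enumerate(word) if c == "1"]
    gaps = [b - a - 1 for a, b in zip(ones, ones[1:])]
    return all(min_zeros <= g <= max_zeros for g in gaps)

T = 231.4e-9               # period of one channel bit, in seconds
f_max = 1 / (2 * 3 * T)    # shortest pit (3T) followed by shortest land (3T)
f_min = 1 / (2 * 11 * T)   # longest pit (11T) followed by longest land (11T)
print(round(f_max / 1e3), round(f_min / 1e3))   # 720 196 (kHz)
```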
Let’s talk a little more about why we have this low-frequency limitation on the data rate. Remember that when we talk about a “period between transitions of 11T” we’re directly talking about the length of the bump (or not-bump) on the disc surface. We’ve already seen that the rotational speed of the disc is constantly changing as the laser gets further and further away from the centre. This speed change is done in order to keep the data rate constant: the physical length of a bump of 9T at the beginning of the disc is the same as that of a bump of 9T at the end of the disc. The problem is, if you’re the sensor responsible for converting bump length into a number, you really need to know how to measure the bump length. The longer the bump, the more difficult it is to determine the length, because it’s a longer time since the last transition.
[Figure: two pit/land patterns against a 14-cell time grid (T = 1 to 14), one marked “T < 3” and one marked “T > 11”]
Figure 7.47: Two examples of invalid codes. The errors are circled. The top code cannot be used since one of the pits is shorter than 3T. The bottom code is invalid because the land is longer than 11T.
To get an idea of what this would be like, stand next to a train track and
watch a slowing train as it goes by. Count the cars, and get used to the speed
at which they’re going by, and how much they’re slowing down. Then close
your eyes and keep counting the cars. If you had to count 3 cars, you’d
probably be pretty close to being right in synch with the train. If you had
to count 9 cars, you’d probably be wrong, or at least losing synchronization
with the train. This is exactly the same problem that the laser sensor has
in estimating pit lengths. The longer the pit, the more likely the error, so
we keep a maximum of 11T to minimize the likelihood of errors.
7.13 DVD-Audio
NOT YET WRITTEN
7.16.1 AIFF
NOT YET WRITTEN
7.16.2 WAV
NOT YET WRITTEN
7.16.3 SDII
NOT YET WRITTEN
7.16.4 mu-law
NOT YET WRITTEN
[Block diagram: signal input → math → signal output]
Figure 8.1: DSP in a very small nutshell. The “math” that is done is the processing done to the
digital signal. This could be a delay, an EQ or a reverb unit – either way, it’s just math.
Essentially, that’s just about it. The idea is that you have some signal
that you have converted from analog to a discrete representation using the
procedure we saw in Section 7.1. You want to change this signal somehow
– this could mean just about anything... you might want to delay it, filter
it, mix it with another signal, compress it, make reverberation out of it –
anything... Everything that you want to do to that signal means that you
are going to take a long string of numbers and turn them into a long string of
different numbers. This is processing of the digital signal or, Digital Signal
Processing – it’s the math that is applied to the signal to turn it into the
8. Digital Signal Processing 510
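[Block diagram: x[t] feeds a summing node (+) whose output is y[t]; x[t] also passes through a delay Z^-k and a gain a into the same summing node]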
Figure 8.2: Basic block diagram of a comb filter implemented in the digital domain.
Let’s look at all of the components of Figure 8.2 to see what they mean.
Looking from left to right you can see the following:
through that little triangle (which means, in this case, everything that
comes out of the delay box) gets multiplied by a.
• The circle with the + sign in it indicates that the two signals coming
in from the left and below are added together and sent out the right.
Sometimes you will also see this as a circle with a Σ in it instead. This
is just another way of saying the same thing.
• $y_t$: As you can probably guess, this is the value of the sample at time t at the output (hence the y) of the system.
There is another way of expressing this block diagram using math. Equation 8.1 shows exactly the same information without doing any drawing.
$$y_t = x_t + a\,x_{t-k} \qquad (8.1)$$
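Equation 8.1 translates almost directly into code. A minimal sketch, assuming samples before the start of the signal are 0; `comb_filter` is an illustrative name.

```python
def comb_filter(x, a, k):
    """Feed-forward comb filter from Equation 8.1: y[t] = x[t] + a * x[t - k].

    Samples before the start of the signal are taken to be 0.
    """
    return [x[t] + (a * x[t - k] if t >= k else 0.0) for t in range(len(x))]

impulse = [1.0] + [0.0] * 7
print(comb_filter(impulse, 0.5, 3))   # [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]
```

Feeding in an impulse shows the structure plainly: the direct signal, then a copy scaled by a arriving k samples later.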
So, according to the DSP processor, where the sampling rate has a fre-
quency of “1”, the signal will have a frequency that can range from 0 to 0.5.
This is called the normalized frequency of the signal.
There’s an important thing to remember here. Usually people use the
word “normalize” to mean that something is changed (you normalize a mix-
ing console by returning all the knobs to a default setting, for example).
With normalized frequency, nothing is changed – it’s just a way of describ-
ing the frequency of the signal.
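Normalized frequency is simply f/fs. As an illustration (standard DSP reasoning, not taken from the text), feeding a complex sinusoid at normalized frequency f through Equation 8.1 multiplies it by 1 + a·e^(−j2πfk), so the comb filter's gain can be evaluated straight from the normalized frequency. The function names are hypothetical.

```python
import cmath
import math

def normalized_frequency(f_hz, fs_hz):
    """Frequency as a fraction of the sampling rate (0 to 0.5 for valid signals)."""
    return f_hz / fs_hz

def comb_gain(f_norm, a, k):
    """Gain of y[t] = x[t] + a*x[t-k] at normalized frequency f_norm."""
    return abs(1 + a * cmath.exp(-2j * math.pi * f_norm * k))

fs = 44100.0
for f in (0.0, 5512.5, 11025.0):
    fn = normalized_frequency(f, fs)
    print(f, fn, round(comb_gain(fn, 1.0, 2), 6))
```

With a = 1 and k = 2, the gain is 2 at 0 Hz and falls to 0 at a normalized frequency of 0.25 (11025 Hz at this sampling rate): the first notch of the comb.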
PUT A SHORT DISCUSSION HERE REGARDING THE USE OF ωt
Note:
If you’re unhappy with the concepts of real and imaginary components in
a signal, and how they’re represented using complex numbers, you’d better
go back and read Section 1.5.
Figure 8.3: The bottom plot is the sum of the top two plots.
If we continue with this series, adding a sine wave at 5 times the fre-
quency and 1/5th the amplitude, 7 times the frequency and 1/7th the am-
Figure 8.4: The sum of odd harmonics of a sine wave where the amplitude of each is 1/n where
‘n’ is the harmonic number up to the 31st harmonic.
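The series described above can be summed numerically. With amplitudes of 1/n on the odd harmonics, the sum approaches a square wave whose flat top sits at π/4; this sketch (hypothetical function name) evaluates the partial sum up to the 31st harmonic, as in Figure 8.4.

```python
import math

def odd_harmonic_sum(phase, max_harmonic=31):
    """Sum of sin(n * phase) / n over the odd harmonics n = 1, 3, 5, ... max_harmonic."""
    return sum(math.sin(n * phase) / n for n in range(1, max_harmonic + 1, 2))

# Sample one cycle of the waveform in Figure 8.4 at 16 points.
cycle = [odd_harmonic_sum(2 * math.pi * i / 16) for i in range(16)]
print([round(v, 3) for v in cycle])
print(math.pi / 4)   # the level the flat parts approach as more harmonics are added
```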
Figure 8.5: The bottom plot is the sum of the top two plots. Note that the frequencies and
amplitudes of the two components are identical to those shown in Figure 8.3; however, the result of adding the two produces a very different waveform.
Figure 8.6: The sum of odd harmonics of a sine wave where the amplitude of each is 1/n where
‘n’ is the harmonic number up to the 31st harmonic.
not the case for digital audio signals, where time and amplitude are divided into discrete divisions. Consequently, in digital audio we use a variation on the Fourier transform called a Discrete Fourier Transform or DFT. This is what we’ll look at. One thing to note is that most people in the digital world use the term FFT (Fast Fourier Transform) when they really mean DFT; in fact, you’ll rarely hear someone talk about DFTs, even though that’s what they’re doing. The FFT is simply a fast way of calculating a DFT, so just remember: if you’re doing what you think is an FFT to a digital signal, you’re really doing a DFT.
Figure 8.8: The top plot is the original signal. The middle plot is one period of a cosine wave
(minus the last sample). The bottom plot is the result when we multiply the top two, sample by
sample.
Now, take the list of numbers that you’ve just created and add them
all together (for this particular example, the result happens to be 6.8949).
This is the “real” component at a frequency whose period is the length of
the audio window. (In our case, the window length is 1024 samples, so the frequency of this component is $f_s/1024$, where $f_s$ is the sampling rate.)
Repeat the process, but use an inverted sine wave instead of a cosine and you get the imaginary component for the same frequency, shown in Figure 8.9. Take that list of numbers and add them and the result is the “imaginary” component at a frequency whose period is the length of the audio window (for this particular example, the result happens to be 0.9981).
Figure 8.9: The top plot is the original signal. The middle plot is one period of an inverted sine
wave (minus the last sample). The bottom plot is the result when we multiply the top two, sample
by sample.
Let’s assume that the sampling rate is 44.1 kHz. This means that our bin representing the frequency of 43.0664 Hz (remember, 44100/1024) contains the complex value 6.8949 + 0.9981i. We’ll see what we can do with this in a moment.
Now, repeat the same procedure using the next harmonic of the cosine
wave, shown in Figure 8.10.
Take that list of numbers and add them and the result is the “real” component at a frequency whose period is one half the length of the audio window (for this particular example, the result happens to be -4.1572).
And again, we repeat the procedure with the next harmonic of the inverted sine wave, shown in Figure 8.11.
Take that list of numbers and add them and the result is the “imaginary” component at a frequency whose period is one half the length of the audio window (for this particular example, the result happens to be -1.0118).
If you want to calculate the frequency of this bin, it’s 2 times the frequency of the last bin (because the frequencies of the cosine and sine waves are two times the fundamental). Therefore it’s $2 f_s/1024$, or, in this example, 2 × (44100/1024) = 86.1328 Hz.
Now we have the 86.1328 Hz bin containing the complex number −4.1572 − 1.0118i.
This procedure is repeated, using each harmonic of the cosine and sine until you get up to a frequency where you have 1024 periods of the cosine and sine in the window. (Actually, you just go up to the frequency where the number of periods in the cosine or the sine is equal to the length of the window in samples.)
Figure 8.10: The top plot is the original signal. The middle plot is two periods of a cosine wave (minus the last sample). The bottom plot is the result when we multiply the top two, sample by sample.
Figure 8.11: The top plot is the original signal. The middle plot is two periods of an inverted sine wave (minus the last sample). The bottom plot is the result when we multiply the top two, sample by sample.
Using these numbers, we can create Table 8.1.
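The multiply-and-add procedure translates directly into code. This is a deliberately slow sketch for illustration (a real program would use an FFT routine); the inverted sine matches the one shown in Figures 8.9 and 8.11, and the function names are hypothetical.

```python
import math

def dft_bin(signal, k):
    """Return (real, imaginary) for bin k of the window `signal`.

    Real part: multiply sample by sample with k periods of a cosine and add.
    Imaginary part: the same, using k periods of an inverted sine.
    """
    n = len(signal)
    real = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
    imag = sum(x * -math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
    return real, imag

def bin_frequency(k, n, fs):
    """Frequency (Hz) of bin k for an n-sample window at sampling rate fs."""
    return k * fs / n

print(bin_frequency(1, 1024, 44100), bin_frequency(2, 1024, 44100))   # 43.06640625 86.1328125

# A window holding exactly 3 cycles of a cosine puts everything into bin 3.
window = [math.cos(2 * math.pi * 3 * i / 16) for i in range(16)]
print([round(v, 6) for v in dft_bin(window, 3)])
```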
How can we use this information? Well, remember from the chapter on complex numbers that the magnitude of a signal (essentially, the amplitude of the signal that we see) is calculated from the real and imaginary components using the Pythagorean theorem. Therefore, in the example above, the magnitude response can be calculated by taking the square root of the sum of the squares of the real and imaginary results of the DFT. Huh? Check out Table 8.2.
Bin Number   Frequency (Hz)   √(real² + imag²)             Magnitude
0            0 Hz             √(4.4308²)                   4.4308
1            43.0664 Hz       √(6.8949² + 0.9981²)         6.9668
2            86.1328 Hz       √((−4.1572)² + (−1.0118)²)   4.2786
Table 8.2: The magnitude of each bin, calculated using the data in Table 8.1
If we keep filling out this table up to the 1024th bin, and graph the results of Magnitude vs. Bin Frequency, we’d have what everyone calls the Frequency Response of the signal. This would tell us, frequency by frequency, the amplitude relationship of the various harmonics in the signal. The one thing that it wouldn’t tell us is what the phase relationship of the various harmonics is. How can we calculate that? Well, remember from the chapter on trigonometry that the relative levels of the real and imaginary components can be calculated using the phase and amplitude of a signal. Also, remember from the chapter on complex numbers that the phase of the signal can be calculated using the relative levels of the real and imaginary components using the equation:
$$\varphi = \arctan\left(\frac{\text{imaginary}}{\text{real}}\right) \qquad (8.2)$$
So, now we can create a table of phase relationships, bin by bin as shown
in Table 8.3:
So, for example, the signal shown in Figure 3 has a component at 86.1328
Hz with a magnitude of 4.2786 and a phase offset of 13.6561°.
Note that you’ll sometimes hear people saying something along the lines
of the real component being the signal and the imaginary component con-
taining the phase information. If you hear this, ignore it – it’s wrong. You
need both the real and the imaginary components to determine both the
magnitude and the phase content of your signal. If you have one or the
other, you’ll get an idea of what’s going on, but not a very good one.
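Magnitude and phase for a bin can be computed as below. One deliberate swap: `math.atan2(imag, real)` is used instead of a plain arctan(imag/real) as in Equation 8.2. The two agree when the real part is positive; when the real part is negative (as in the 86.1328 Hz bin), atan2 returns an angle 180° away, because it keeps track of which quadrant the complex number is in. The function name is hypothetical.

```python
import math

def magnitude_and_phase(real, imag):
    """Return (magnitude, phase in degrees) for one DFT bin."""
    magnitude = math.hypot(real, imag)                 # sqrt(real**2 + imag**2)
    phase_deg = math.degrees(math.atan2(imag, real))   # quadrant-aware arctan(imag / real)
    return magnitude, phase_deg

for re, im in ((6.8949, 0.9981), (-4.1572, -1.0118)):
    mag, ph = magnitude_and_phase(re, im)
    print(f"{re:+.4f} {im:+.4f}i -> magnitude {mag:.4f}, phase {ph:.2f} deg")
```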
129 of those are usable (256/2 + 1). In real books on DSP, they’ll tell you that, for an N-point DFT, you get N/2 + 1 bins. These bins go from 0 Hz up to the Nyquist frequency, or $f_s/2$.
Also, you’ll notice that the first and last bins (at 0 Hz and the Nyquist frequency) only contain real values, with no imaginary components. This is because, in both cases, we can’t calculate the phase. There is no phase information at 0 Hz, and since, at the Nyquist frequency, the samples are always hitting the same point on the sinusoid, we don’t see its phase.
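A quick check of both points for a real-valued signal, assuming N is even: the usable bins are k·fs/N for k = 0 to N/2, and the inverted-sine sum at those two end bins comes out as zero (apart from rounding), because sin(0) and sin(π·i) are both 0 for every sample index i. The test signal and helper names are illustrative.

```python
import math

def usable_bin_frequencies(n, fs):
    """Frequencies (Hz) of the N/2 + 1 usable bins of an N-point DFT, N even."""
    return [k * fs / n for k in range(n // 2 + 1)]

def imaginary_part(signal, k):
    """Imaginary result for bin k: multiply by k periods of an inverted sine and add."""
    n = len(signal)
    return sum(-x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))

freqs = usable_bin_frequencies(256, 44100)
print(len(freqs), freqs[0], freqs[-1])           # 129 0.0 22050.0

signal = [math.sin(0.3 * i) + 0.5 for i in range(256)]         # arbitrary real test signal
print(imaginary_part(signal, 0), imaginary_part(signal, 128))  # both 0 apart from rounding
```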
[Plot: Level (-1 to 1) vs. Time (0 to 1000)]
Figure 8.12:
[Plot: Level (dB, -350 to 50) vs. Frequency (Hz, log scale, 1 Hz to 1 kHz)]
Figure 8.13:
You’ll notice that there’s a big spike at one frequency and a little noise
(very little... -350 dB is a VERY small number) in all of the other bins.
In case you’re wondering, the noise is caused by the limited resolution of
MATLAB which I used for creating these graphs. MATLAB calculates
numbers with a resolution of 64 bits. That gives us a dynamic range of
about 385 dB or so to work with – more than enough for this textbook...
More than enough for most things, actually...
Now, what would happen if we had a different frequency? For example,
Figure 8.14 shows a sine wave with a frequency of 0.875 times the one in
Figure 8.12.
[Plot: Level (-1 to 1) vs. Time (0 to 1000)]
Figure 8.14:
[Plot: Level (dB, -350 to 100) vs. Frequency (Hz, log scale, 1 Hz to 1 kHz)]
Figure 8.15:
[Plot: Level (-1 to 1) vs. Time (0 to 2000)]
Figure 8.16:
If we take the second sine wave and repeat it, we get Figure 8.17, and things aren’t so pretty. The length of the signal is not an integer number of cycles of the sine wave, so repeating it produces a nasty-looking change in the sine wave. In fact, if you look at Figure 8.17, you can see that it can’t be called a sine wave any more. It has some parts that look like a sine wave, but there’s a spike in the middle. If we keep repeating the signal over and over, we’ll get a spike for every repetition.
That spike (also called a discontinuity in the time signal) contains energy in frequency bins other than the one where the sine wave is. In fact, this energy can be seen in the DFT that we did in Figure 8.15.
Figure 8.17: [Plot: Level vs. Time, 0 to 2000.]
The moral of the story so far is that you’ll get a strange result if the window you’re doing the DFT on is not an integer number of periods of your signal. This also means that the last sample in the window should fall one sample short of completing the final cycle. Here is why. Suppose the last sample in the window were exactly back at the start of the cycle. When you repeated the window, you’d get that value twice in a row, because the first sample of the next repetition is the same as the last sample of the previous one. (In fact, if you look carefully at the end of the signal in Figure 8.12, you’ll see that it doesn’t quite get back to 0 for exactly this reason.)
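The repetition argument can be checked numerically. The sketch below is a plain-Python illustration, not the MATLAB code used for the figures. It uses a naive DFT, which is slow but easy to read, and a toy 64-sample window. It compares one sine that fits exactly 4 cycles into the window with one at 0.875 times that frequency, which gives 3.5 cycles. The names `dft_magnitudes`, `whole` and `partial` are our own.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive O(N^2) DFT of a real signal; returns |X[k]| for k = 0 .. N/2."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

N = 64  # window length in samples (a toy size)

# Exactly 4 cycles: the period is 16 samples, and the last sample stops one
# sample short of completing the final cycle, so repetitions join seamlessly.
whole = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]

# 0.875 times that frequency: 3.5 cycles, so the repeated window has a discontinuity.
partial = [math.sin(2 * math.pi * 3.5 * t / N) for t in range(N)]

for name, signal in (("whole", whole), ("partial", partial)):
    mags = dft_magnitudes(signal)
    peak = max(range(len(mags)), key=lambda k: mags[k])
    leakage = max(m for k, m in enumerate(mags) if abs(k - peak) > 2)
    print(f"{name}: peak in bin {peak}, largest bin more than 2 away = {leakage:.2e}")
```

The first signal puts all of its energy in bin 4, and everything else is rounding noise. The second spreads energy into every bin. This is the same kind of junk that shows up in Figure 8.15.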
How do we solve this problem? Well, we have to do something to the
signal to make sure that the nasty spike goes away. The concept is basically
the same as doing a crossfade between two sounds – we’re just going to make
sure that the signal in the window starts at 0, fades in, and then fades away
to 0 before we do the DFT. We do this by introducing something called a
windowing function. This is a list of gains that are multiplied by our signal
as is shown in Figure 8.18.
Let’s take the signal in Figure 8.14, the one that caused us all the problems. If we multiply each of the samples in that signal by its corresponding gain shown in Figure 8.18, then we get a signal that looks like the one shown in Figure 8.19.
You’ll notice that the signal still has the sine wave from Figure 8.14, but we’ve changed its level over time so that it starts and ends with a 0 value. This way, when we repeat it, the ends join together seamlessly. Now, if we do a DFT on this windowed signal, we get the result shown in Figure 8.20.
Figure 8.18: An example of a windowing function. The X-value of the signal corresponds to a position within the window. The Y-value is a gain multiplier used to change the level of the signal we’re measuring. [Plot: Gain vs. Position in Window, 0 to 1000.]
Figure 8.19: The result of the signal in Figure 8.14 multiplied by the gain function shown in Figure 8.18. [Plot: Level vs. Time, 0 to 1000.]
Figure 8.20: [Plot: Level (dB) vs. Frequency (Hz), logarithmic frequency axis from 1 Hz to 1000 Hz.]
Okay, so the end result isn’t perfect, but we’ve attenuated the junk information by as much as 100 dB, which, if you ask me, is pretty darned good.
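The same toy experiment can be repeated with a windowing function applied. This sketch assumes the sampled Hanning window w[t] = ½(1 − cos(2πt/N)), which starts at 0, peaks at 1 in the middle and heads back towards 0. It is the window described in Section 8.3.2, sampled over N points. The naive DFT and the helper `leakage_db` are our own illustrative functions. `leakage_db` reports the loudest bin from bin 10 upwards, relative to the peak bin.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive O(N^2) DFT of a real signal; returns |X[k]| for k = 0 .. N/2."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def hanning(n):
    """Sampled Hanning window: 0 at the first sample, 1 in the middle."""
    return [0.5 * (1 - math.cos(2 * math.pi * t / n)) for t in range(n)]

def leakage_db(mags, first_far_bin):
    """Level of the loudest bin at or above first_far_bin, in dB re the peak bin."""
    return 20 * math.log10(max(mags[first_far_bin:]) / max(mags))

N = 64
partial = [math.sin(2 * math.pi * 3.5 * t / N) for t in range(N)]  # 3.5 cycles
windowed = [s * g for s, g in zip(partial, hanning(N))]

print("no window:", round(leakage_db(dft_magnitudes(partial), 10), 1), "dB")
print("Hanning  :", round(leakage_db(dft_magnitudes(windowed), 10), 1), "dB")
```

Without the window, the far bins in this toy case sit around 28 dB below the peak. With the window, they drop to roughly 58 dB below it. The exact figures depend on the window length and the frequency, so treat them as illustrative.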
Of course, like everything in life, this comes at a cost. What happens if we apply the same windowing function to the well-behaved signal in Figure 8.12 and do a DFT? The windowed signal will look like Figure 8.21, and its DFT will look like Figure 8.22.
Figure 8.21: [Plot: Level vs. Time, 0 to 1000.]
So, you can see that applying the windowing function made bad things better but good things worse. The moral here is that using a windowing function will have an effect on the output of your DFT calculation. Sometimes you should use it, sometimes you shouldn’t. If you don’t know whether you should or not, try it with and without, and decide which works best.
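The cost shows up just as clearly in a toy example. Below, a sine that fits exactly 4 cycles into a 64-sample window is analysed with and without the sampled Hanning window 0.5(1 − cos(2πt/N)). The naive DFT and window helpers are our own illustrative functions.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive O(N^2) DFT of a real signal; returns |X[k]| for k = 0 .. N/2."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def hanning(n):
    return [0.5 * (1 - math.cos(2 * math.pi * t / n)) for t in range(n)]

N = 64
whole = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]  # exactly 4 cycles
windowed = [s * g for s, g in zip(whole, hanning(N))]

for label, sig in (("no window", whole), ("Hanning", windowed)):
    mags = dft_magnitudes(sig)
    used = [k for k, m in enumerate(mags) if m > 1e-9]  # bins with real energy
    print(label, "-> bins", used, "magnitudes", [round(mags[k], 3) for k in used])
```

Without the window, only bin 4 has any energy (magnitude 32). With the window, that energy is spread over bins 3, 4 and 5 (magnitudes 8, 16 and 8). That spreading is exactly what a window does to a signal that was fine to begin with.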
Figure 8.22: [Plot: Level (dB) vs. Frequency (Hz), logarithmic frequency axis from 1 Hz to 1000 Hz.]
So, we’ve seen that applying a windowing function will change the result-
ing frequency response. The good thing is that this change is predictable,
and different for different functions, so you can not only choose whether or
not to window your signal, but you can also choose what kind of window to
use according to your requirements.
There are essentially an infinite number of different windowing functions
available for you, but there are a number of standard ones that everyone uses
for different reasons. We’ll look at only three – the rectangular, Hanning
and Hamming functions.
8.3.1 Rectangular
The rectangular window is the simplest of all the windowing functions be-
cause you don’t have to do anything. If you multiply each value in your time
signal by 1 (or do nothing to your signal) then your gain function will look
like Figure 8.23. This graph looks like a rectangle, so we call doing nothing
a rectangular window.
What does this do to our frequency response? Take a look at Figure 8.24, which is an 8192-point DFT of the window in Figure 8.23. This shows the general frequency response curve that is applied to the signal being analyzed.
Figure 8.23: Time vs. gain response of rectangular windowing function 8192 samples long. [Plot: Gain, 0 to 1.]
Figure 8.24: Frequency response of rectangular windowing function. The holes in the response are where the magnitude drops to −∞. [Plot: Attenuation (dB) vs. ∆ Frequency (Hz), −500 Hz to 500 Hz.]
Figure 8.25: [Plot: Attenuation (dB), 0 to −100 dB, vs. ∆ Frequency (Hz), −60 Hz to 60 Hz.]
8.3.2 Hanning
You have already seen one standard windowing function in Figure 8.18.
This is known as a Hanning window and is defined using the equation below
[Morfey, 2001].
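$$w(t) = \begin{cases} \dfrac{1}{2}\left(1 + \cos\dfrac{2\pi t}{T}\right) & \text{for } -\tfrac{1}{2}T \le t \le \tfrac{1}{2}T \\ 0 & \text{otherwise} \end{cases}$$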
This looks a little complicated, but if you spend some time with it, you’ll
see that it is exactly the same math as a cardioid microphone. It says that,
within the window, you have the same response as a cardioid microphone,
and outside the window, the gain is 0. This is shown in Figure 8.26.
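To use the equation in code, you sample it. The sketch below maps sample index i to t = i − N/2 so that the window runs from −T/2 to T/2. It gives the same values as the 0.5(1 − cos(2πi/N)) form that is common in code. Both function names are hypothetical. Note that ½(1 + cos θ), with θ = 2πt/T, is the same expression as a cardioid microphone’s sensitivity.

```python
import math

def hanning_continuous(t, T):
    """w(t) from the equation above: 1/2 (1 + cos(2*pi*t/T)) inside the window, 0 outside."""
    if -T / 2 <= t <= T / 2:
        return 0.5 * (1 + math.cos(2 * math.pi * t / T))
    return 0.0

def hanning_samples(n):
    """Sample w(t) at n points, mapping sample index i to t = i - n/2."""
    return [hanning_continuous(i - n / 2, n) for i in range(n)]

w = hanning_samples(8)
print([round(g, 3) for g in w])  # rises from 0 to 1 in the middle, then falls again
```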
Figure 8.26: Time vs. gain response of Hanning windowing function 8192 samples long. [Plot: Gain, 0 to 1.]
Figure 8.27: [Plot: Attenuation (dB) vs. ∆ Frequency (Hz), −500 Hz to 500 Hz.]
Figure 8.28: [Plot: Attenuation (dB), 10 to −100 dB, vs. ∆ Frequency (Hz), −60 Hz to 60 Hz.]
8.3.3 Hamming
Our third windowing function is known as the Hamming window. The
equation for this is
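$$w(t) = \begin{cases} 0.54 + 0.46\cos\dfrac{2\pi t}{T} & \text{for } -\tfrac{1}{2}T \le t \le \tfrac{1}{2}T \\ 0 & \text{otherwise} \end{cases}$$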
This gain response can be seen in Figure 8.29. Notice that this one is
slightly weird in that it never actually reaches 0 at the ends of the window,
so you don’t get a completely smooth transition.
The frequency response of the Hamming window is shown in Figure
8.30. Notice that the rejection of frequencies far away from the centre is
better than with the rectangular window, but worse than with the Hanning
function.
So, why do we use the Hamming window instead of the Hanning if its
rejection is worse away from the 0 Hz line? The centre lobe is still quite
wide, so that doesn’t give us an advantage. However, take a look at the lobes
adjacent to the centre in Figure 8.31. Notice that these are quite low, and
very narrow, particularly when they’re compared to the other two functions.
We’ll look at them side-by-side in one graph a little later, so no need to flip
pages back and forth at this point.
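Here is a sketch of a sampled Hamming window. It assumes the common textbook definition w(t) = 0.54 + 0.46 cos(2πt/T) inside the window and 0 outside. As before, sample index i is mapped to t = i − N/2, which gives the same values as the 0.54 − 0.46 cos(2πi/N) form that is common in code. The function name is hypothetical.

```python
import math

def hamming_samples(n):
    """Hamming window of n samples, assuming w = 0.54 + 0.46*cos(2*pi*t/T) with t = i - n/2."""
    return [0.54 + 0.46 * math.cos(2 * math.pi * (i - n / 2) / n) for i in range(n)]

w = hamming_samples(8)
print([round(g, 3) for g in w])  # starts at 0.08, not 0: the "slightly weird" ends
```

With these coefficients the ends of the window sit at 0.54 − 0.46 = 0.08, so the window never actually reaches 0.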
Figure 8.29: Time vs. gain response of Hamming windowing function 8192 samples long. [Plot: Gain, 0 to 1.]
Figure 8.30: [Plot: Attenuation (dB) vs. ∆ Frequency (Hz), −500 Hz to 500 Hz.]
Figure 8.31: [Plot: Attenuation (dB), 10 to −100 dB, vs. ∆ Frequency (Hz), −60 Hz to 60 Hz.]
8.3.4 Comparisons
Figure 8.32 shows the three standard windows compared on one graph. As
you can see, the rectangular window in blue has the narrowest centre lobe
of the three, but the least attenuation of its other lobes. The Hanning
window in black has a wider centre lobe but good rejection of its other lobes,
getting better and better as we get further away in frequency. Finally, the
Hamming window in red has a wide centre lobe, but much better rejection
in its adjacent lobes.
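The trade-offs in Figure 8.32 can be measured directly. This sketch samples each window’s frequency response finely, from 0 to 10 bins away from the centre in steps of 0.01 bin. It then reports where the centre lobe first reaches a null and how loud the highest sidelobe beyond it is. The window length of 64 is a toy assumption (the figures use 8192). The Hamming coefficients 0.54/0.46 are also assumed. All function names are our own.

```python
import cmath
import math

N = 64  # window length: a toy size

def rectangular(n):
    return [1.0] * n

def hanning(n):
    return [0.5 * (1 - math.cos(2 * math.pi * i / n)) for i in range(n)]

def hamming(n):
    # Assumes the common 0.54 / 0.46 coefficients.
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / n) for i in range(n)]

def response_db(w, offsets):
    """Window's frequency response in dB (re its value at 0), at offsets in DFT bins."""
    n, dc = len(w), sum(w)
    out = []
    for d in offsets:
        h = sum(g * cmath.exp(-2j * math.pi * d * i / n) for i, g in enumerate(w))
        out.append(20 * math.log10(max(abs(h) / dc, 1e-15)))  # floor avoids log(0)
    return out

def centre_lobe_and_sidelobe(w):
    """Return (first null in bins, highest sidelobe in dB) within 10 bins of the centre."""
    offsets = [i / 100 for i in range(1001)]
    db = response_db(w, offsets)
    # The centre lobe ends at the first local minimum...
    edge = next(i for i in range(1, len(db) - 1)
                if db[i] <= db[i - 1] and db[i] <= db[i + 1])
    # ...and the loudest point beyond it is the worst sidelobe.
    return offsets[edge], max(db[edge:])

for name, make in (("rectangular", rectangular), ("Hanning", hanning), ("Hamming", hamming)):
    null, lobe = centre_lobe_and_sidelobe(make(N))
    print(f"{name:12s} first null at {null:.2f} bins, highest sidelobe {lobe:.1f} dB")
```

You should see roughly the following results:

- Rectangular: first null at 1 bin, highest sidelobe near −13 dB.
- Hanning: first null at 2 bins, highest sidelobe near −31 dB.
- Hamming: first null at 2 bins, highest sidelobe near −43 dB.

This only reports the highest sidelobe. The Hanning window’s sidelobes keep falling quickly as you move further away, which is why it does best far from the centre.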
Figure 8.32: Frequency response of three common windowing functions. Rectangular (blue), Hamming (red) and Hanning (black). [Plot: Attenuation (dB), 10 to −100 dB, vs. ∆ Frequency (Hz), −100 Hz to 100 Hz.]
Figure 8.33: Basic block diagram of a comb filter implemented in the digital domain. [Diagram: x[t] reaches an adder through gain a0 and, via a delay Z^-k, through gain a1; the adder’s output is y[t].]
$$y_t = a_0 x_t + a_1 x_{t-k} \tag{8.3}$$
This implementation is called a Finite Impulse Response comb filter, or FIR comb filter. As we’ll see in the coming sections, its impulse response is finite (meaning it ends at some predictable time), and its frequency response looks a little like a hair comb.
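Equation 8.3 translates almost directly into code. This is a minimal sketch that assumes the input is silent before t = 0. The function name is hypothetical. Feeding it a single impulse shows the finite impulse response: the direct sound, one "reflection" k samples later, and then nothing.

```python
def fir_comb(x, a0, a1, k):
    """y[t] = a0*x[t] + a1*x[t-k]  (Equation 8.3), treating samples before t = 0 as 0."""
    return [a0 * x[t] + (a1 * x[t - k] if t >= k else 0.0) for t in range(len(x))]

impulse = [1.0] + [0.0] * 9
print(fir_comb(impulse, a0=1.0, a1=0.5, k=3))
# [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```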
As we can see in the diagram, the output is the sum of two signals: the original input and a gain-modified delayed signal. (The delay is the block with the $Z^{-k}$ in it. We’ll talk later about why that notation is used, but for now, you’ll need to know that the delay time is k samples.) Let’s assume for the remainder of this section that the gain value a1 is between −1 and 1, and is not 0. (If it were 0, then we wouldn’t hear the output of the delay and we wouldn’t have a comb filter. We’d just have a "through-put", where the output is identical to the input.)
If we’re thinking in terms of acoustics, the direct sound is simulated by
the non-delayed signal (the through-put) and the reflection is simulated by
the output of the delay.
Figure 8.34: Frequency response of an FIR comb filter with a delay of 1 sample, a0 = 1, and a1 = 1. [Plot: Gain vs. Normalized Frequency, 0 to 0.5.]
Figure 8.35: Frequency response of an FIR comb filter with a delay of 2 samples, a0 = 1, and a1 = 1. [Plot: Gain vs. Normalized Frequency, 0 to 0.5.]
As we increase the delay time in the FIR comb filter, the first notch in
the frequency response drops lower and lower in frequency as can be seen in
Figures 8.36 and 8.37.
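You can predict where the notches go. With a0 = a1 = 1, the delayed copy cancels the direct signal whenever the delay equals half a period, one and a half periods, and so on. That happens at normalized frequencies f = (2m + 1)/(2k). The sketch below lists those frequencies for the delays in Figures 8.34 to 8.37 and checks that the gain really is zero there. The gain function is the magnitude of a0 + a1·e^(−j2πfk), which follows from Equation 8.3. Both function names are hypothetical.

```python
import cmath
import math

def fir_comb_gain(f, a0, a1, k):
    """|H| of y[t] = a0*x[t] + a1*x[t-k] at normalized frequency f (0 to 0.5)."""
    return abs(a0 + a1 * cmath.exp(-2j * math.pi * f * k))

def notch_frequencies(k):
    """Notches for a0 = a1 = 1: f*k = 1/2, 3/2, 5/2, ... kept within 0 to 0.5."""
    return [(2 * m + 1) / (2 * k) for m in range(k) if (2 * m + 1) / (2 * k) <= 0.5]

for k in (1, 2, 3, 4):
    notches = notch_frequencies(k)
    depths = [fir_comb_gain(f, 1, 1, k) for f in notches]
    print(k, "samples:", [round(f, 4) for f in notches], "all zero:", max(depths) < 1e-12)
```

The first notch is at 1/(2k), so it moves down in frequency as the delay gets longer.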
Figure 8.36: Frequency response of an FIR comb filter with a delay of 3 samples, a0 = 1, and a1 = 1. [Plot: Gain vs. Normalized Frequency, 0 to 0.5.]
Figure 8.37: Frequency response of an FIR comb filter with a delay of 4 samples, a0 = 1, and a1 = 1. [Plot: Gain vs. Normalized Frequency, 0 to 0.5.]
Figure 8.38: Frequency response of an FIR comb filter with a delay of 3 samples, a0 = 1, and a1 = 1. Note that this is the same as the graph in Figure 8.36. [Plot: Gain vs. Normalized Frequency, logarithmic axis from 10^-3 to 10^-1.]
So far, we have kept the gain on the output of the delay at 1 to make things simple. What happens if this is set to a smaller (but still positive) number? The bumps and notches in the frequency response will still be in the same places (in other words, they won’t change in frequency), but they won’t be as drastic. The bumps won’t be as big and the notches won’t be as deep.
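For a0 = 1 and a positive a1, the bumps reach 1 + a1 and the notches only fall to 1 − a1. This sketch uses a hypothetical gain function, |a0 + a1·e^(−j2πfk)|, to check those values for the three gains plotted in Figure 8.39.

```python
import cmath
import math

def fir_comb_gain(f, a0, a1, k):
    """|H| of y[t] = a0*x[t] + a1*x[t-k] at normalized frequency f (0 to 0.5)."""
    return abs(a0 + a1 * cmath.exp(-2j * math.pi * f * k))

k = 3
bump_f, notch_f = 0.0, 1 / (2 * k)  # a bump at DC and the first notch
for a1 in (1.0, 0.5, 0.25):
    print(f"a1 = {a1}: bump {fir_comb_gain(bump_f, 1, a1, k):.3f}, "
          f"notch {fir_comb_gain(notch_f, 1, a1, k):.3f}")
```

For a1 = 1, 0.5 and 0.25, this gives bumps of 2, 1.5 and 1.25 and notches of 0, 0.5 and 0.75. The frequencies stay fixed and only the depth changes.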
In the previous section we limited the value of the gain applied to the delay
component to positive values only. However, we also have to consider what
happens when this gain is set to a negative value. In essence, the behaviour
is the same, but we have a reversal between the constructive and destructive
interferences. In other words, what were bumps before become notches, and
the notches become bumps.
Figure 8.39: Frequency response of an FIR comb filter with a delay of 3 samples, a0 = 1. Black a1 = 1, blue a1 = 0.5, red a1 = 0.25. [Plot: Gain vs. Normalized Frequency, 0 to 0.5.]
Figure 8.40: Frequency responses of an FIR comb filter with a delay of 3 samples, a0 = 1. Black a1 = 1, blue a1 = 0.5, red a1 = 0.25. [Plot: Gain vs. Normalized Frequency, logarithmic axis from 10^-3 to 10^-1.]
Figure 8.41: Frequency responses of an FIR comb filter with a delay of 3 samples, a0 = 1. Black a1 = 1, blue a1 = 0.5, red a1 = 0.25. [Plot: Gain (dB) vs. Normalized Frequency, logarithmic axis from 10^-3 to 10^-1.]
For example, let’s use an FIR comb filter with a delay of 1 sample, where a0 = 1 and a1 = −1. At DC, the output of the delay component will be identical to the non-delayed component, but opposite in polarity. This means that they will cancel each other and we get no output from the filter. At a normalized frequency of 0.5 (the Nyquist frequency), the two components will be 180° out of phase, but since we’re multiplying one by −1, they add to make twice the input value.
The end result is the frequency response shown in Figure 8.42.
If we have a longer delay time, then we get a similar behaviour, as is shown in Figure 8.43.
If the gain applied to the output of the delay is set to a value greater
than -1 but less than 0, we see a similar reduction in the deviation from a
gain of 1 as we saw in the examples with FIR comb filters with a positive
gain delay component.
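The polarity flip is easy to confirm with the same kind of gain calculation, |a0 + a1·e^(−j2πfk)|, using a hypothetical helper. With a negative a1, the notch lands on DC and the bump on the Nyquist frequency. With a 4-sample delay, as in Figure 8.43, there are notches at 0, 0.25 and 0.5, with bumps in between.

```python
import cmath
import math

def fir_comb_gain(f, a0, a1, k):
    """|H| of y[t] = a0*x[t] + a1*x[t-k] at normalized frequency f (0 to 0.5)."""
    return abs(a0 + a1 * cmath.exp(-2j * math.pi * f * k))

# 1-sample delay with negative gain: cancellation at DC, doubling at Nyquist.
for a1 in (-1.0, -0.5):
    print(a1, "DC:", round(fir_comb_gain(0.0, 1, a1, 1), 3),
          "Nyquist:", round(fir_comb_gain(0.5, 1, a1, 1), 3))

# 4-sample delay with a1 = -1.
print([round(fir_comb_gain(f, 1, -1.0, 4), 3) for f in (0.0, 0.125, 0.25, 0.375, 0.5)])
```

With a1 = −0.5, the response swings only between 0.5 and 1.5, which is the smaller deviation from a gain of 1 described above.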
Figure 8.42: Frequency response of an FIR comb filter with a delay of 1 sample, a0 = 1, and a1 = −1. [Plot: Gain vs. Normalized Frequency, 0 to 0.5.]
Figure 8.43: Frequency response of an FIR comb filter with a delay of 4 samples, a0 = 1, and a1 = −1. [Plot: Gain vs. Normalized Frequency, 0 to 0.5.]
Figure 8.44 shows three examples of different FIR comb filter impulse responses and their corresponding frequency responses. Note that the delay values for all three filters are the same, so the notches and peaks in the frequency responses all line up. Only the value of a1 was changed, which changes only the amount of modulation in the frequency responses.
Figure 8.44: Impulse and corresponding frequency responses of FIR comb filters with a delay of 3 samples. Black a1 = 1, blue a1 = 0.5, red a1 = 0.25. [Plot panels: Impulse responses (0 to 10 samples) on the left; Frequency responses (gain in dB, −10 to 10 dB, logarithmic frequency axis) on the right.]
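Figure 8.45: [Diagram: x[t] reaches an adder through gain a0 and through delays Z^-k1, Z^-k2, Z^-k3, ..., Z^-kn, each followed by its own gain a1, a2, a3, ..., an; the adder’s output is y[t].]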
that there can be any number of delays, each with its own delay time and
gain.
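Extending the single-delay comb to any number of delays takes only one more loop. In this sketch, `taps` is a hypothetical list of (delay in samples, gain) pairs, and the input is assumed to be silent before t = 0. The impulse response comes out as just the tap gains placed at their delay times.

```python
def multitap_fir(x, a0, taps):
    """y[t] = a0*x[t] + sum of a_i * x[t - k_i] over the (k_i, a_i) pairs in taps."""
    y = []
    for t in range(len(x)):
        out = a0 * x[t]
        for k, a in taps:
            if t >= k:                 # samples before t = 0 are treated as 0
                out += a * x[t - k]
        y.append(out)
    return y

impulse = [1.0] + [0.0] * 9
print(multitap_fir(impulse, 1.0, [(2, 0.5), (5, -0.25), (7, 0.125)]))
# [1.0, 0.0, 0.5, 0.0, 0.0, -0.25, 0.0, 0.125, 0.0, 0.0]
```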
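Figure 8.46: Basic block diagram of an IIR comb filter implemented in the digital domain. [Diagram: x[t] reaches an adder through gain a0; the adder’s output y[t] is fed back through a delay Z^-k and gain a1 into the adder.]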
samples ago indicated by the $y_{t-k}$. Of course, both of these components are multiplied by their respective gain factors.
$$y_t = a_0 x_t + a_1 y_{t-k} \tag{8.7}$$
Positive Feedback
Let’s use the block diagram shown in Figure 8.46 and make an IIR comb. We’ll make a0 = 1, a1 = 0.5 and k = 3. A Dirac impulse comes into the filter and immediately appears at the output (because a0 = 1). Three samples later, it comes out of the filter again at one-half the level (because a1 = 0.5 and k = 3). Since that output is fed back into the delay, it comes out again three samples after that at one-quarter the level, and so on.
The resulting impulse response will look like Figure 8.47.
Figure 8.47: The impulse response of the IIR comb filter shown in Figure 8.46 and Equation 8.8 with a0 = 1, a1 = 0.5 and k = 3. [Plot: Value vs. Time (samples), −5 to 20.]
Note that, in Figure 8.47, I only plotted the first 20 samples of the impulse response. However, it actually extends to infinity.
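The feedback in Equation 8.7 only needs the filter to remember its own past outputs. Here is a minimal sketch that assumes the output is 0 before t = 0. The function name is hypothetical. It reproduces the start of the endless impulse response: 1 at t = 0, then 0.5, 0.25, 0.125 and so on, every 3 samples.

```python
def iir_comb(x, a0, a1, k):
    """y[t] = a0*x[t] + a1*y[t-k]  (Equation 8.7), with y = 0 before t = 0."""
    y = []
    for t in range(len(x)):
        feedback = a1 * y[t - k] if t >= k else 0.0
        y.append(a0 * x[t] + feedback)
    return y

impulse = [1.0] + [0.0] * 20
h = iir_comb(impulse, a0=1.0, a1=0.5, k=3)
print([round(v, 4) for v in h[:13]])
# [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25, 0.0, 0.0, 0.125, 0.0, 0.0, 0.0625]
```

However long you make `impulse`, the last echo is never exactly zero. It just keeps halving.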
What will the frequency response of this filter look like? This is shown
in Figure 8.48.
Compare this with the FIR comb filter shown in Figure 8.39. There are
a couple of things to notice about the similarity and difference between the
two graphs.
The similarity is that the peaks and dips in the two graphs are at the same frequencies. They aren’t the same basic shape, but they appear in the same places. This is due to the matching 3-sample delays and the fact that the gains applied to the delays are both positive.
Figure 8.48: The frequency response of the IIR comb filter shown in Figure 8.46 and Equation 8.8 with a0 = 1, a1 = 0.5 and k = 3. [Plot: Gain vs. Normalized Frequency, 0 to 0.5.]
The difference between the graphs is obviously the shape of the curve itself. Where the FIR filter had broad peaks and narrow dips, the IIR filter has narrow peaks and broad dips. This is going to cause the two filters to sound very different. Generally speaking, it is much easier for us to hear a boost than a cut. The narrow peaks in the frequency response of an IIR filter are immediately obvious as boosts in the signal. This is caused not only by the boost in those narrow frequency bands, but also by a smearing of energy in time at those frequencies, known as ringing. In fact, if you take an IIR filter with a fairly high value of a1 (say between 0.5 and 0.999 of a0) and put in an impulse, you’ll hear a tone ringing in the filter. The higher the gain of a1, the longer the ringing and the more obvious the tone. This frequency response change can be seen in Figure 8.49.
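Taking the z-transform of Equation 8.7 gives a frequency response of H = a0 / (1 − a1·e^(−j2πfk)). This sketch evaluates it at a peak (DC) and at the dip halfway between peaks (f = 1/(2k)). The peak grows as 1/(1 − a1), which is why a1 = 0.999 rings so strongly. The helper name is hypothetical.

```python
import cmath
import math

def iir_comb_gain(f, a0, a1, k):
    """|H| of y[t] = a0*x[t] + a1*y[t-k]:  H = a0 / (1 - a1 * e^(-j*2*pi*f*k))."""
    return abs(a0 / (1 - a1 * cmath.exp(-2j * math.pi * f * k)))

k = 3
for a1 in (0.25, 0.5, 0.999):
    peak = iir_comb_gain(0.0, 1, a1, k)        # peaks at f = 0, 1/3, ...
    dip = iir_comb_gain(1 / (2 * k), 1, a1, k)  # dips halfway between peaks
    print(f"a1 = {a1}: peak {peak:.3f} ({20 * math.log10(peak):.1f} dB), dip {dip:.3f}")
```

For a1 = 0.5, the peaks are at 2 and the dips at 2/3. For a1 = 0.999, the peaks reach 1000 (60 dB) while the dips barely change, and that narrow, tall boost is the ringing tone.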
Negative Feedback
Just like the FIR counterpart, an IIR comb filter can have a negative gain
at the delay output. As can be seen in Figure 8.48, positive feedback causes
a large boost in the low frequencies with a peak at DC. This can be avoided
by using a negative feedback value.
The interesting thing here is that negative feedback through a delay causes the impulse response to flip back and forth in polarity, as can be seen in Figure 8.50.
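You can watch the polarity flip with the same minimal IIR comb sketch (hypothetical name, output assumed 0 before t = 0), this time with a1 = −0.5. Every trip around the loop multiplies the echo by −0.5, so its sign alternates.

```python
def iir_comb(x, a0, a1, k):
    """y[t] = a0*x[t] + a1*y[t-k]  (Equation 8.7), with y = 0 before t = 0."""
    y = []
    for t in range(len(x)):
        feedback = a1 * y[t - k] if t >= k else 0.0
        y.append(a0 * x[t] + feedback)
    return y

impulse = [1.0] + [0.0] * 20
h = iir_comb(impulse, a0=1.0, a1=-0.5, k=3)
print([h[t] for t in range(0, 13, 3)])  # 1.0, -0.5, 0.25, -0.125, 0.0625
```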
The resulting frequency response for this filter is shown in Figure 8.51.
IIR comb filters with negative feedback suffer from the same ringing as those with positive feedback.
Figure 8.49: The frequency response of the IIR comb filter shown in Figure 8.46 and Equation 8.8 with a0 = 1 and k = 3. Black a1 = 0.999, red a1 = 0.5 and blue a1 = 0.25. [Plot: Gain, 0 to 10, vs. Normalized Frequency, 0 to 0.5.]
Figure 8.50: The impulse response of the IIR comb filter shown in Figure 8.46 and Equation 8.8 with a0 = 1, a1 = −0.5 and k = 3. [Plot: Value vs. Time (samples), −5 to 20.]
Figure 8.51: The frequency response of the IIR comb filter shown in Figure 8.46 and Equation 8.8 with a0 = 1, a1 = −0.5 and k = 3. [Plot: Gain vs. Normalized Frequency, 0 to 0.5.]
8.5.2 Danger!
There is one important thing to beware of when using IIR filters. Always remember that feedback is an angry animal that can lash out and attack you if you’re not careful. If the value of the feedback gain goes higher than 1 (or lower than −1), then things get ugly very quickly. The signal comes out of the delay louder than it went in and circulates back to the input of the delay, where it comes out even louder, and so on and so on. Depending on the delay time, it will take a small fraction of a second for the filter to overload. And, since it has an infinite impulse response, even if you pull the feedback gain back to less than 1, the distortion that you caused will keep circulating in the filter for a long time. The only way to get rid of it quickly is to drop a1 to 0 until the delay clears out, and then start again. (Some IIR filters also let you send a clear command, telling them to forget everything that’s happened before now and to continue on as if nothing had happened. Take a look at some of the filters in Max/MSP, for example.)
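Here is a small demonstration of runaway feedback, as a sample-by-sample IIR comb class. The class and its `clear()` method are hypothetical. `clear()` is only a stand-in for the kind of clear command mentioned above, not any particular product’s API. In floating point nothing clips, so the "overload" shows up as unbounded growth rather than distortion.

```python
class IIRComb:
    """y[t] = a0*x[t] + a1*y[t-k], processed one sample at a time."""

    def __init__(self, a0, a1, k):
        self.a0, self.a1, self.k = a0, a1, k
        self.clear()

    def clear(self):
        """Empty the delay line, forgetting everything that happened before now."""
        self.delay = [0.0] * self.k  # the last k outputs, oldest first

    def process(self, x):
        y = self.a0 * x + self.a1 * self.delay[0]  # delay[0] holds y[t-k]
        self.delay = self.delay[1:] + [y]
        return y

comb = IIRComb(a0=1.0, a1=1.1, k=3)
out = [comb.process(1.0 if t == 0 else 0.0) for t in range(300)]
print(round(max(out)))           # the impulse has grown by a factor of 1.1 ** 99
comb.a1 = 0.5                    # pulling the feedback gain back below 1...
print(round(comb.process(0.0)))  # ...still leaves a huge signal circulating
comb.clear()                     # clearing empties the delay line
print(comb.process(0.0))         # prints 0.0
```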
Figure 8.52: The frequency response of the IIR comb filter shown in Figure 8.46 and Equation 8.8 with a0 = 1 and k = 3. Black a1 = 0.999, red a1 = 0.5 and blue a1 = 0.25. [Plot: Gain, 0 to 10, vs. Normalized Frequency, 0 to 1.]
8.5.3 Biquadratic
It’s fun to make comb filters to get started. (Every budding guitarist has a flanger or a phaser in their kit of pedal effects; the rich kids even have both and claim that they know the difference!) Beyond that, though, IIR filters can be a little more useful. Now, don’t get me wrong, FIR filters are also useful, but as you’ll see if you ever have to start working with them, they’re pretty expensive in terms of processing power. Basically, if you want to do something interesting, you have to do it for a long time (sort of Zen, no?). If you have an FIR filter with a long impulse response, then that means a lot of delays (or long ones) and lots of math to do. We’ll see a worst-case scenario in Section 8.6.
A simple IIR filter has the advantage of having only three operations (two multiplies and one add) and one delay, and still having an infinite impulse response, so you can do interesting things with a minimum of processing.
One of the most common building blocks in any DSP algorithm is a little package called a biquadratic filter, or biquad for short. This is a sort of mini-algorithm that contains just a small number of delays, gains and additions, as is shown in Figure 8.53, but it turns out to be a pretty powerful little tool: sort of the op amp of the DSP world in terms of usefulness. CHECK THIS DIAGRAM. I DON’T THINK THAT IT’S CORRECT
Okay, let’s look at what a biquad does, going through it, gain by gain.
• Now let’s look at a1 . The signal xt comes in, gets delayed by 1 sample,
multiplied by a1 and added to the output. By itself, this is just the
Figure 8.53: Block diagram of one implementation of a biquad filter. There are other ways to achieve the same algorithm, but we won’t discuss that in this book. Check this reference for more information []. [Diagram: x[t] and y[t] connected through two adders, with gains a0, a1, a2, b1 and b2 and two one-sample delays (Z^-1).]
$$\omega = 2\pi f_c \tag{8.10}$$
where fc is the cutoff frequency (or centre frequency in the case of peaking
filters or the shelf midpoint frequency for shelving filters). Note that the
frequency fc here is given as a normalized frequency between 0 and 0.5. If
you prefer to think in terms of the more usual way of describing frequency,
in Hertz, then you’ll need to do an extra little bit of math as is shown in
Equation 8.11.
$$\omega = \frac{2\pi f_c}{f_s} \tag{8.11}$$
where fc is the cutoff frequency (or centre frequency in the case of peaking filters or the shelf midpoint frequency for shelving filters) in Hz and fs is the sampling rate in Hz. Important note: You should use either Equation 8.10 or Equation 8.11, depending on which way you prefer to state the frequency. However, I would recommend that you get used to thinking in terms of normalized frequency, so you should be happy with using Equation 8.10.
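This short sketch shows that Equations 8.10 and 8.11 agree, and then computes cs, sn and Q from Equations 8.12 to 8.14. The example values (a 1 kHz centre frequency at a 48 kHz sampling rate) are our own. We also assume that bw is a bandwidth in octaves, since its definition isn’t given in this section. The function names are hypothetical.

```python
import math

def omega_from_normalized(fc_norm):
    """Equation 8.10: fc as a normalized frequency (0 to 0.5)."""
    return 2 * math.pi * fc_norm

def omega_from_hz(fc_hz, fs_hz):
    """Equation 8.11: fc and the sampling rate fs both in Hz."""
    return 2 * math.pi * fc_hz / fs_hz

fs, fc = 48000.0, 1000.0
w = omega_from_hz(fc, fs)
print(math.isclose(w, omega_from_normalized(fc / fs)))  # both routes agree

cs, sn = math.cos(w), math.sin(w)  # Equations 8.12 and 8.13
bw = 1.0                           # assumed: bandwidth in octaves
q = sn / (math.log(2) * bw * w)    # Equation 8.14 as printed
print(round(w, 5), round(cs, 5), round(sn, 5), round(q, 4))
```

For these values, Q comes out at about 1.439.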
$$\mathit{cs} = \cos\omega \tag{8.12}$$
$$\mathit{sn} = \sin\omega \tag{8.13}$$
$$Q = \frac{\mathit{sn}}{\ln(2)\,\mathit{bw}\,\omega} \tag{8.14}$$
Low-pass filter
We already know from Section ?? what a low-pass filter is. One way to im-
plement this in a DSP chain is to insert a biquad and calculate its coefficients
using the equations below.
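$$b_0 = \frac{1 - \mathit{cs}}{2} \tag{8.17}$$
$$b_1 = 1 - \mathit{cs} \tag{8.18}$$
$$b_2 = \frac{1 - \mathit{cs}}{2} \tag{8.19}$$
$$a_0 = 1 + \alpha \tag{8.20}$$
$$a_1 = -2\,\mathit{cs} \tag{8.21}$$
$$a_2 = 1 - \alpha \tag{8.22}$$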
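Here is a sketch that turns Equations 8.17 to 8.22 into a working filter. Two assumptions are flagged. First, α is not defined in the surviving text, so we assume the Audio EQ Cookbook form α = sin(ω)/(2Q). Second, we assume the cookbook convention that the b coefficients multiply the input and its delayed copies (feedforward), while a1 and a2 multiply past outputs (feedback). This may not match the labels in Figure 8.53, which the text itself flags for checking. The function names are hypothetical.

```python
import cmath
import math

def lowpass_coefficients(fc, q):
    """Low-pass coefficients from Equations 8.17 to 8.22; fc is normalized (0 to 0.5).
    ASSUMPTION: alpha = sin(w) / (2*q), as in the Audio EQ Cookbook."""
    w = 2 * math.pi * fc               # Equation 8.10
    cs, sn = math.cos(w), math.sin(w)  # Equations 8.12 and 8.13
    alpha = sn / (2 * q)
    b = [(1 - cs) / 2, 1 - cs, (1 - cs) / 2]
    a = [1 + alpha, -2 * cs, 1 - alpha]
    return b, a

def biquad_gain(b, a, f):
    """|H| at normalized frequency f, H(z) = (b0 + b1 z^-1 + b2 z^-2) / (a0 + a1 z^-1 + a2 z^-2)."""
    zi = cmath.exp(-2j * math.pi * f)  # z^-1 on the unit circle
    return abs((b[0] + b[1] * zi + b[2] * zi ** 2) / (a[0] + a[1] * zi + a[2] * zi ** 2))

def biquad(b, a, x):
    """Direct form I: a0*y[t] = b0*x[t] + b1*x[t-1] + b2*x[t-2] - a1*y[t-1] - a2*y[t-2]."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xt in x:
        yt = (b[0] * xt + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
        x2, x1 = x1, xt
        y2, y1 = y1, yt
        y.append(yt)
    return y

b, a = lowpass_coefficients(fc=0.05, q=1 / math.sqrt(2))
for f in (0.0, 0.05, 0.5):
    print(f, round(biquad_gain(b, a, f), 4))
print(round(biquad(b, a, [1.0] * 500)[-1], 4))  # the step response settles at the DC gain
```

With Q = 1/√2, the gain is 1 at DC, about 0.707 (−3 dB) at fc, and 0 at the Nyquist frequency.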
Low-shelving filter
Likewise, we could, instead, create a low-shelving filter using coefficients
calculated in the equations below.
Peaking filter
The following equations will result in a reciprocal peak-dip filter configura-
tion.
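$$b_0 = 1 + \alpha A \tag{8.29}$$
$$b_1 = -2\,\mathit{cs} \tag{8.30}$$
$$b_2 = 1 - \alpha A \tag{8.31}$$
$$a_0 = 1 + \frac{\alpha}{A} \tag{8.32}$$
$$a_1 = -2\,\mathit{cs} \tag{8.33}$$
$$a_2 = 1 - \frac{\alpha}{A} \tag{8.34}$$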
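The "reciprocal" in "reciprocal peak-dip" can be checked numerically. A boost and a cut of the same size cancel exactly, because swapping A for 1/A swaps the numerator and denominator coefficients. A and α are not defined in the surviving text, so this sketch assumes the Audio EQ Cookbook forms A = 10^(gain/40) and α = sin(ω)/(2Q). The function names are hypothetical.

```python
import cmath
import math

def peaking_coefficients(fc, q, gain_db):
    """Peak/dip coefficients from Equations 8.29 to 8.34.
    ASSUMPTION: A = 10**(gain_db/40) and alpha = sin(w)/(2*q), as in the Audio EQ Cookbook."""
    w = 2 * math.pi * fc
    cs, sn = math.cos(w), math.sin(w)
    big_a = 10 ** (gain_db / 40)
    alpha = sn / (2 * q)
    b = [1 + alpha * big_a, -2 * cs, 1 - alpha * big_a]
    a = [1 + alpha / big_a, -2 * cs, 1 - alpha / big_a]
    return b, a

def biquad_gain(b, a, f):
    zi = cmath.exp(-2j * math.pi * f)  # z^-1 on the unit circle
    return abs((b[0] + b[1] * zi + b[2] * zi ** 2) / (a[0] + a[1] * zi + a[2] * zi ** 2))

boost = peaking_coefficients(0.1, q=2.0, gain_db=6.0)
cut = peaking_coefficients(0.1, q=2.0, gain_db=-6.0)
for f in (0.0, 0.05, 0.1, 0.3):
    gb, gc = biquad_gain(*boost, f), biquad_gain(*cut, f)
    print(f, round(20 * math.log10(gb), 2), round(20 * math.log10(gc), 2), round(gb * gc, 6))
```

At fc the boost is exactly +6 dB, far away from fc both filters have a gain of 1, and the product of the two gains is 1 at every frequency.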
High-shelving filter
The following equations will produce a high-shelving filter.
High-pass filter
Finally, the following equations will produce a high-pass filter.
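$$b_0 = \frac{1 + \mathit{cs}}{2} \tag{8.41}$$
$$b_1 = -1 - \mathit{cs} \tag{8.42}$$
$$b_2 = \frac{1 + \mathit{cs}}{2} \tag{8.43}$$
$$a_0 = 1 + \alpha \tag{8.44}$$
$$a_1 = -2\,\mathit{cs} \tag{8.45}$$
$$a_2 = 1 - \alpha \tag{8.46}$$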
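The high-pass coefficients can be checked the same way. This sketch makes the same flagged assumptions as for the other biquads: α = sin(ω)/(2Q) from the Audio EQ Cookbook, and b as the feedforward coefficients. The function names are hypothetical. A high-pass filter should block DC completely, pass the Nyquist frequency unchanged, and (with this α) have a gain of Q at fc.

```python
import cmath
import math

def highpass_coefficients(fc, q):
    """High-pass coefficients from Equations 8.41 to 8.46; fc is normalized (0 to 0.5).
    ASSUMPTION: alpha = sin(w) / (2*q), as in the Audio EQ Cookbook."""
    w = 2 * math.pi * fc
    cs, sn = math.cos(w), math.sin(w)
    alpha = sn / (2 * q)
    b = [(1 + cs) / 2, -1 - cs, (1 + cs) / 2]
    a = [1 + alpha, -2 * cs, 1 - alpha]
    return b, a

def biquad_gain(b, a, f):
    zi = cmath.exp(-2j * math.pi * f)  # z^-1 on the unit circle
    return abs((b[0] + b[1] * zi + b[2] * zi ** 2) / (a[0] + a[1] * zi + a[2] * zi ** 2))

b, a = highpass_coefficients(fc=0.05, q=1 / math.sqrt(2))
for f in (0.0, 0.05, 0.5):
    print(f, round(biquad_gain(b, a, f), 4))  # about 0, 0.7071 and 1
```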
Decorrelation
FINISH THIS OFF
Fractional delays
FINISH THIS OFF
Figure 8.54: [Diagram: input xt and output yt connected through an adder, with gains a0 and −a0 and two one-sample delays (Z^-1).]
Reverberation simulation
FINISH THIS OFF
8.6 Convolution
Let’s think of the most inefficient way to build an FIR filter. Back in Section
?? we saw a general diagram for an FIR that showed a stack of delays, each
with its own independent gain. We’ll build a similar device, but we’ll have an
independent delay and gain for every possible integer delay time (meaning
a delay value of an integer number of samples like 4 or 13, but not 2.4 or
15.2). When we need a particular delay time, we’ll turn on the corresponding
delay’s gain, and all the rest we’ll leave at 0 all the time.
For example, a smart way to do an FIR comb filter with a delay of 6
samples is shown in Figure 8.55 and Equation 8.47.
Figure 8.55: A smart way to implement an FIR comb filter with a delay of 6 samples and where a0 = 1 and a1 = 0.75. [Diagram: x[t] reaches an adder directly with gain 1 and through a delay Z^-6 with gain 0.75; the adder’s output is y[t].]
There are stupid ways to do this as well. For example, take a look at Figure 8.56 and Equation 8.48. In this case, we have a lot of delays that are implemented (therefore taking up lots of memory), but their output gains are set to 0, so they’re not being used.
Figure 8.56: A stupid way to implement the same FIR comb filter shown in Figure 8.55. [Diagram: the direct path with gain 1, plus eight parallel delays Z^-1 to Z^-8, each with its own gain; all of these gains are 0 except 0.75 on Z^-6.]
Figure 8.57: A slightly less stupid way to implement the FIR comb filter. [Diagram: the direct path with gain 1, plus a chain of eight one-sample delays (Z^-1) tapped after each delay; all tap gains are 0 except 0.75 after the sixth delay.]
Figure 8.58: The impulse response of the FIR comb filter shown in Figure 8.55. [Plot: Value vs. Time (samples), −1 to 10.]
At time 0, the first signal sample gets multiplied by the first value in the impulse response. That’s the value of the output.
At time 1 (1 sample later) the first signal sample gets “moved” to the
second value in the impulse response and is multiplied by it. At the same
time, the second signal sample is multiplied by the first value in the impulse
response. The results of the two multiplications are added together and
that’s the output value.
At time 2 (1 sample later again...) the first signal sample gets “moved”
to the third value in the impulse response and is multiplied by it. The second
signal sample gets “moved” to the second value in the impulse response and
is multiplied by it. The third signal sample is multiplied by the first value
in the impulse response. The results of the three multiplications are added
together and that’s the output value.
As time goes on, this process is repeated over and over. For each sample
period, each value in the impulse response is multiplied by its corresponding
sample in the signal. The results of all of these multiplications are added
together and that is the output of the procedure for that sample period.
In essence, we’re using the values of the samples in the impulse response
as individual gain values in a multi-tap delay. Each sample is its own tap,
with an integer delay value corresponding to the index number of the sample
in the impulse response.
This whole process is called convolution. What we’re doing is convolving
the incoming signal with an impulse response.
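The procedure above can be written as a double loop: one pass per output sample, and one multiply-and-add per impulse-response value. This is a minimal sketch of direct convolution with a hypothetical function name. Run on the 6-sample comb impulse response from Figure 8.58, it gives the same output as the comb filter itself.

```python
def convolve(signal, impulse_response):
    """Direct convolution: each output sample is the sum of every impulse-response
    value multiplied by the input sample that is that many samples old."""
    n_out = len(signal) + len(impulse_response) - 1
    out = []
    for t in range(n_out):
        acc = 0.0
        for i, h in enumerate(impulse_response):  # i = the tap's delay in samples
            if 0 <= t - i < len(signal):
                acc += h * signal[t - i]
        out.append(acc)
    return out

comb_ir = [1.0, 0, 0, 0, 0, 0, 0.75]  # a0 = 1, a1 = 0.75, delay of 6 samples
print(convolve([1.0, 2.0, 3.0], comb_ir))
# [1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.75, 1.5, 2.25]
```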
Of course, if your impulse response is as simple as the one shown above, then it’s still really stupid to do your filtering this way, because we’re essentially doing the same thing as what’s shown in Figure 8.57. However, if you have a really complicated impulse response, then this is the way to go (although we’ll be looking at a smart way to do the math later...).
One reason convolution is attractive is that it gives you the same result as using the original filter that’s described by the impulse response (assuming that your impulse response was measured correctly). So, you can go down to your local FIR filter rental store, rent a good FIR comb filter for the weekend, measure its impulse response and return the filter to the store on Monday. After that, if you convolve your signals with the measured impulse response, it’s the same as using the filter. Cool, huh?
The reason we like to avoid doing convolution for filtering is that it’s just so expensive in terms of computational power. For every sample that comes out of the convolver, its brain has to do as many multiplications as there are samples in the impulse response, and one fewer addition than that. For example, if your impulse response was 8 samples long, then the convolver does 8 multiplications (one for every sample in the impulse response) and 7 additions (adding the results of the 8 multiplications) for every sample that comes out. That’s not so bad if your impulse response is only 8 samples long, but what if it’s something like 100,000 samples long? That’s a lot of math to do on every sample period!
So, now you’re probably sitting there thinking, “Why would I have an
impulse response of a filter that’s 100,000 samples long?” Well, think back
to Section 3.16 and you’ll remember that we can make an impulse response
measurement of a room. If you do this, and store the impulse response,
you can convolve a signal with the impulse response and you get your sig-
nal in that room. Well, technically, you get your signal played out of the
loudspeaker you used to do the IR measurement at that particular location
in the room, picked up by the measurement microphone at its particular
placement... If you do this in a big concert hall or a church, you could easily get up to a 4 or 5 second-long impulse response. At 44.1 kHz, 5 seconds corresponds to a 220,500-sample-long FIR filter. This then means 440,999 mathematical operations (220,500 multiplications and 220,499 additions) for every output sample, which in turn means 19,448,055,900 operations per second per channel of audio... That’s a lot: far more than a normal computer can perform these days.
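The arithmetic behind those numbers is short enough to check directly. The variable names below are our own.

```python
fs = 44_100                          # sampling rate (Hz)
ir_seconds = 5                       # length of the measured room impulse response
taps = fs * ir_seconds               # 220,500 samples
ops_per_sample = taps + (taps - 1)   # one multiply per tap, one fewer additions
ops_per_second = ops_per_sample * fs
print(f"{taps:,} taps, {ops_per_sample:,} ops per sample, {ops_per_second:,} ops per second")
```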
So, here’s the dilemma, we want to use convolution, but we don’t have
the computational power to do it the way I just described it. That method,
with all the multiplications and additions of every single sample is called
real convolution.
So, let’s think of a better way to do this. We have a signal that has
a particular frequency content (or response), and we’re sending it through
a filter that has a particular frequency response. The resulting output has
a frequency content equivalent to the multiplication of these two frequency
responses, as is shown in Figure 8.59.
So, we now know two interesting things:
Luckily, some smart people have figured out some clever ways to do a
DFT that don’t take much computational power. (If you want to learn
about this, go get a good DSP textbook and look up the word butterfly.)
So, what we can do is the following:
Figure 8.59: The top graph is the frequency content of an arbitrary signal (actually it’s a recording of Handel). The middle plot is the frequency response of an FIR comb filter with a 3-sample delay. The bottom graph is the frequency content of the signal filtered through the comb filter. Notice that the result is the same as if we had multiplied the top plot by the middle plot, bin by bin. [Plot panels, each against Frequency (Hz) on a logarithmic axis from 100 Hz to 10 kHz: Input signal, Filter, Output signal.]
1. take a slice of our signal and do a DFT on it. This gives us the frequency content of the signal.
2. take the impulse response of the filter and do a DFT on it.
3. multiply the results of the two DFTs, bin by bin. Note that these are complex multiplications. For each bin, the real part of the result is the product of the two real components minus the product of the two imaginary components. The imaginary part of the result is the sum of the two cross-products (each real component times the other DFT’s imaginary component).
4. take the resulting real and imaginary components and do an IDFT (inverse discrete Fourier transform), converting from the frequency domain to the time domain.
5. send the resulting time-domain signal out.
This procedure, called fast convolution, will give you the same results as real convolution (apart from tiny rounding errors), but it uses a lot less computational power.
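Here is a sketch of the five steps in code. To keep every line visible, it uses a naive DFT, so it is not actually fast. A real fast convolver would use an FFT (the "butterfly" algorithms mentioned above) in place of `dft` and `idft`, which are our own helper names. One assumption deserves a flag. Because the DFT treats its input as if it repeats, both signals are zero-padded to the full output length before the DFT. Without that padding, the tail of the output would wrap around onto its start.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]

def fft_style_convolve(signal, impulse_response):
    n = len(signal) + len(impulse_response) - 1            # full output length
    X = dft(list(signal) + [0.0] * (n - len(signal)))       # step 1 (zero-padded)
    H = dft(list(impulse_response) + [0.0] * (n - len(impulse_response)))  # step 2
    Y = [xk * hk for xk, hk in zip(X, H)]                  # step 3: complex multiply per bin
    return [v.real for v in idft(Y)]                       # steps 4-5; imaginary parts are rounding noise

out = fft_style_convolve([1.0, 2.0, 3.0], [1.0, 0, 0, 0, 0, 0, 0.75])
print([round(v, 6) for v in out])  # matches direct convolution with the comb impulse response
```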
There are a couple of things to worry about when you’re doing fast
convolution.
8.6.1 Correlation
NOT YET WRITTEN
8.6.2 Autocorrelation
NOT YET WRITTEN
$$\omega = 2\pi f \tag{8.49}$$
$$\frac{1}{a} = a^{-1} \tag{8.50}$$
The next equation, which we saw in Section 1.6, is called Euler’s Identity. This states that
$$e^{j\omega t} = \cos(\omega t) + j\sin(\omega t) \tag{8.51}$$
You might notice that I made a slight change in Equation 8.51 compared
with Equation 1.60 in that I replaced the θ with an ωt to make things a little
easier to deal with in the time domain. Remember that ω is just another
way of thinking of frequency (see section ??) and t is the time in samples.
We have to make a variation on this equation by adding an extra delay
of k samples:
component is now imaginary, and can’t be mixed with the cos²(θ). We can still use this equation to draw a perfect circle, but now we’re using a real and an imaginary axis as we saw way back in Figure ??.
Okay... that’s about it. Now we’re ready.
$$y_t = x_t + a_1 x_{t-k} \qquad (8.56)$$
Let’s put the output of a simple phasor (see Section ??) into the input of this filter. Remember that a phasor is just a sinusoidal wave generator.
[Diagram: x[t] → (+) → y[t], with a parallel path through a k-sample delay (z^{-k}) and a gain a1 into the adder.]
Figure 8.60: A simple filter using one delay of k samples multiplied by a gain and added to the
input.
$$x_t = e^{j\omega t} \qquad (8.57)$$
Therefore, we can rewrite Equation 8.56 for the filter as follows:
$$y_t = \underbrace{e^{j\omega t}}_{\text{input}}\;\underbrace{\left(1 + a_1 e^{-j\omega k}\right)}_{\text{what the input is multiplied by to get the output}} \qquad (8.60)$$
Notice that the end of Equation 8.60 has a lot of ω’s in it. This means
that it is frequency-dependent. We already know that if you multiply a
signal by something, that “something” is gain. We also know that if that
$$\theta(\omega) = \arctan\left(\frac{\Im H(\omega)}{\Re H(\omega)}\right) \qquad (8.63)$$
Where the symbol ℑ means “the imaginary component of...” and ℜ means “the real component of...”
Don’t panic – it’s not so difficult to see what the real and imaginary
components of the H(ω) are because the imaginary components have a j
in them. For example, if we look at the filter that we started with at the
beginning of this chapter, its frequency response was:
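$$\theta(\omega) = \arctan\left(\frac{\Im H(\omega)}{\Re H(\omega)}\right) \qquad (8.69)$$
$$= \arctan\left(\frac{-a_1 \sin(\omega k)}{1 + a_1 \cos(\omega k)}\right) \qquad (8.70)$$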
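A quick numerical check of Equation 8.70, assuming Python with NumPy and arbitrary example values a1 = 0.5 and k = 3. np.angle computes the quadrant-aware arctangent of the imaginary part over the real part; because |a1| < 1 here, the real part 1 + a1 cos(ωk) never goes negative, so it agrees with the plain arctan written in Equation 8.70.

```python
import numpy as np

a1, k = 0.5, 3                               # example gain and delay
omega = np.linspace(0, np.pi, 9)             # 0 to Nyquist, radians per sample
H = 1 + a1 * np.exp(-1j * omega * k)         # H(w) = 1 + a1 e^{-jwk}

magnitude = np.abs(H)
phase = np.angle(H)                          # arctan of Im/Re, quadrant-aware
# Equation 8.70 taken literally (plain arctan); safe here because 1 + a1 cos(wk) > 0
phase_870 = np.arctan(-a1 * np.sin(omega * k) / (1 + a1 * np.cos(omega * k)))
print(np.allclose(phase, phase_870))         # True
```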
[Diagram: a k-sample delay (z^{-k}) and a gain a1 in series between x[t] and y[t].]
$$y_t = a_1 x_{t-k} \qquad (8.71)$$
$$= a_1 x_t e^{-j\omega k} \qquad (8.72)$$
$$z = e^{j\omega} \qquad (8.73)$$
What if you want to write a delay, as in $e^{-j\omega k}$? Well, we’ll just write (you guessed it...) $z^{-k}$. Therefore $z^{-2}$ is a 2-sample delay, because $z^{-2} = e^{-j\omega 2}$, which is $e^{-j\omega k}$ with $k = 2$.
$$z = e^{j\omega} \qquad (8.74)$$
$$= \cos(\omega) + j\sin(\omega) \qquad (8.75)$$
$$w[t] = a_0 x_t \qquad (8.76)$$
[Diagrams: two gains, a0 and a1, in series from x[t] to y[t]; the same diagram with the signal between the two gains labelled w[t].]
and
$$y_t = a_1 w[t] \qquad (8.77)$$
Therefore
$$y_t = a_1 w[t] \qquad (8.78)$$
$$= a_1 (a_0 x_t) \qquad (8.79)$$
$$= a_0 a_1 x_t \qquad (8.80)$$
What I took a long while to say here is that, if you have two gain stages
(or, more generally, two filters) connected in series, you can multiply the
effects that they have on the input to find out the output.
So far, we have been thinking in terms of the instantaneous value of
each sample, sample by sample. That’s why we have been using terms like
xt and yt – we’ve been quite specific about which sample we’re talking about.
However, since we’re dealing with LTI systems, we can say that the same
thing happens to every sample in the signal. Consequently, we don’t have to
think in terms of individual samples going through the filter, we can think
of the whole signal (say, a Johnny Cash tune, for example). When we speak
of the whole signal, we use things like X and Y instead of xt and yt . That
whole signal is what is called an operator .
In the case of the last filter we looked at (the one with the two gains in
series shown in Figure 8.63), we can write the equation in terms of operators.
$$Y = [a_0 a_1]\, X \qquad (8.81)$$
In this case, we can see that the operator X is multiplied by a0 a1 in order
to become Y. This is a simple way to think of DSP if you’re not working in
real time. You take the Johnny Cash tune on your hard disc. Multiply its
contents by a0 a1 and you get an output, which is a Johnny Cash tune at a
different level (assuming that $a_0 a_1 \neq 1$).
That thing that the input operator is multiplied by is called the transfer
function of the filter and is represented by the symbol H(z). So, the transfer
function of our previous filter is a0 a1 .
Notice that the transfer function in the equation is held in square brack-
ets.
[Diagram: X → H → Y, a filter with the transfer function H(z).]
Figure 8.65: Transfer function of a filter with operators as inputs and outputs.
Let’s look at another simple example using two 1-sample delays connected in series as is shown in Figure 8.66.
[Diagram: x[t] → z^{-1} → z^{-1} → y[t].]
$$y_t = x_t z^{-1} z^{-1} \qquad (8.82)$$
$$= x_t z^{-2} \qquad (8.83)$$
and therefore
$$Y = X z^{-2} \qquad (8.84)$$
Notice that I was able to simply multiply the two delays z −1 and z −1
together to get z −2 . This is one of the slick things about using z −k to
indicate a delay. If you multiply them together, you just add the exponents,
therefore adding the delay times.
Let’s do one more filter before moving on...
Figure 8.67 shows a fairly simple filter which adds a gain-modified version
of the input with a delayed, gain-modified version of the input.
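[Diagram: x[t] passes through gain a0 into an adder, and through a z^{-k} delay and gain a1 into the same adder, whose output is y[t].]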
Figure 8.67: A filter which adds a gain-modified version of the input with a delayed, gain-modified
version of the input.
$$y_t = a_0 x_t + a_1 x_{t-k} \qquad (8.85)$$
$$= a_0 x_t + a_1 z^{-k} x_t \qquad (8.86)$$
$$= \left(a_0 + a_1 z^{-k}\right) x_t \qquad (8.87)$$
Therefore
$$Y = \left[a_0 + a_1 z^{-k}\right] X \qquad (8.88)$$
$$Y = H(z)\,W \qquad (8.89)$$
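[Diagram: two filters in series. Filter G (gains a0 and a1, one z^{-1} delay) takes X to W; filter H (gains b0 and b1, one z^{-1} delay) takes W to Y.]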
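$$W = G(z)\,X \qquad (8.90)$$
Therefore
$$Y = G(z)H(z)\,X \qquad (8.91)$$
$$= \left[a_0 + a_1 z^{-1}\right]\left[b_0 + b_1 z^{-1}\right] X \qquad (8.92)$$
$$= \left[a_0 b_0 + a_0 b_1 z^{-1} + b_0 a_1 z^{-1} + a_1 z^{-1} b_1 z^{-1}\right] X \qquad (8.93)$$
$$= \left[a_0 b_0 + (a_0 b_1 + b_0 a_1)\, z^{-1} + a_1 b_1 z^{-2}\right] X \qquad (8.94)$$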
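Because each z^{-1} just adds one to an exponent, multiplying two transfer functions written in powers of z^{-1} is the same as convolving their lists of coefficients. A NumPy sketch with arbitrary example coefficients (assumptions, not values from the text) checks Equation 8.94, then confirms that running a signal through G and then H gives the same output as a single filter with the combined coefficients.

```python
import numpy as np

# Coefficients in ascending powers of z^-1: [c0, c1, c2] means c0 + c1 z^-1 + c2 z^-2
a0, a1 = 0.5, 0.25     # filter G (example values)
b0, b1 = 2.0, -1.0     # filter H (example values)
G = np.array([a0, a1])
H = np.array([b0, b1])

# Multiplying two polynomials in z^-1 is a convolution of their coefficient lists
GH = np.convolve(G, H)
expected = np.array([a0 * b0, a0 * b1 + b0 * a1, a1 * b1])   # Equation 8.94
print(np.allclose(GH, expected))    # True

# Running x through G, then H, matches one filter with coefficients GH
x = np.random.default_rng(1).standard_normal(64)
w = np.convolve(x, G)[:len(x)]      # W = G(z) X
y = np.convolve(w, H)[:len(x)]      # Y = H(z) W
y_direct = np.convolve(x, GH)[:len(x)]
print(np.allclose(y, y_direct))     # True
```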
Therefore
8.7.6 Zeros
Let’s go back to a simpler filter as is shown in Figure 8.70.
This gives us the equation
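[Diagram: X reaches Y through three parallel paths summed together: gain a0b0 with no delay; gains a0b1 and a1b0 after a z^{-1} delay; gain a1b1 after a z^{-2} delay.]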
Figure 8.69: An equivalent result to the cascaded filters shown in Figure 8.68
[Diagram: X → (+) → Y, with a parallel path through a z^{-1} delay and a gain a1 into the adder.]
$$y_t = x_t + a_1 x_{t-1} \qquad (8.96)$$
If we write this out the “old way” – that is, before we knew about the
letter z, the equivalent would look like this:
$$1 + a_1 z^{-1} = 1 + a_1 e^{-j\omega} \qquad (8.99)$$
then we can say that
$$H(z) = 1 + a_1 z^{-1} \qquad (8.101)$$
$$= \frac{z}{z} + \frac{a_1}{z} \qquad (8.102)$$
$$= \frac{z + a_1}{z} \qquad (8.103)$$
Okay, now I have a question. What values in the above equation make
the numerator (the top part of the fraction) equal to 0? The answer, in this
particular case is z = −a1 . If the filter had been different, we would have
gotten a different answer. (Maybe we would have even gotten more than
one answer... but that comes later.) Notice that if the numerator is 0, then
H(z) = 0. This will be important later.
Another question: What values will make the denominator (the bottom
part of the equation) equal to 0? The answer, in this particular case is z = 0.
Again, it might have been possible to get more than one answer. Notice in
this case, that if the denominator is 0, then H(z) = ∞.
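Finding the values of z that make the numerator and the denominator of Equation 8.103 equal to 0 is a polynomial root-finding problem. A small NumPy sketch, with a1 = 0.5 as an assumed example value; np.roots takes its coefficients from the highest power of z down.

```python
import numpy as np

a1 = 0.5                          # example feedforward gain (an assumption)
# H(z) = 1 + a1 z^-1 = (z + a1) / z   (Equation 8.103)
numerator = [1.0, a1]             # z + a1, coefficients from the highest power of z
denominator = [1.0, 0.0]          # z
zeros = np.roots(numerator)       # z values that make H(z) = 0 (expect -a1)
poles = np.roots(denominator)     # z values that make H(z) infinite (expect 0)
print("zeros:", zeros, "poles:", poles)
```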
Let’s graph z on a Cartesian plot. We’ll make the x-axis the real axis
where we plot the real component of z, and the y-axis is the imaginary axis.
We know that
$$z = e^{j\omega} \qquad (8.104)$$
$$= \cos(\omega) + j\sin(\omega) \qquad (8.105)$$
[Diagram: the z-plane, with the frequency axis running around the unit circle from ω = 0 to ω = π = Nyquist.]
Figure 8.71: The frequency axis in the z-plane. The top half of the circle corresponds to positive
frequencies, the bottom half corresponds to negative frequencies [?].
$$|H(z)| = \frac{|z + a_1|}{|z|} \qquad (8.108)$$
An important thing to remember here is that, when you’re dealing with
complex numbers, as we are at the moment, the bars on the sides of the
values do not mean “the absolute value of...” as they normally do. They
mean “the magnitude of...” instead. Also remember that you calculate
the magnitude by finding the hypotenuse of the triangle where the real and
imaginary components are the other two sides (see Figure ??).
Since
$$|z| = 1 \qquad (8.112)$$
$$|H(z)| = \frac{|z + a_1|}{|z|} \qquad (8.113)$$
$$= \frac{|z + a_1|}{1} \qquad (8.114)$$
$$= |z + a_1| \qquad (8.115)$$
We found out from our questions and answers at the end of Section 8.7.6
that (for our filter that we’re still working on) H(z) = 0 when z = −a1. Let’s
then mark that point with a “0” (a zero) on the graph of the z-plane. Note
that a1 has only a real component, therefore −a1 has only a real component
as well. Consequently, the “0” that we put on the graph sits right on the
real axis.
[Diagrams: two z-plane plots, each showing the frequency axis around the unit circle and a point on it at frequency ω.]
Figure 8.73: Finding the magnitude response of the filter using the zero plotted on the z-plane.
[Plot: magnitude response against frequency.]
Figure 8.74: An equivalent plot to 8.73, if we “unwrapped” the frequency axis from a circle to a
straight line. Note that this plot is only roughly to scale – it is not terribly accurate.
frequency ω. Then you multiply all your distances together, and the result
is the magnitude response of the filter at that frequency.
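The geometric method (measure from each zero to the point e^{jω} on the unit circle and multiply the distances) can be checked numerically against direct evaluation of H. The sketch below also divides by the distances to any poles, which the next section introduces; for this filter the only pole is at z = 0, whose distance to the unit circle is always 1, matching the |z| = 1 step above. It assumes the overall gain constant of the filter is 1, which holds for H(z) = (z + a1)/z; the function name and the value a1 = 0.5 are hypothetical.

```python
import numpy as np

def magnitude_from_zplane(zeros, poles, omega):
    """|H| at frequency omega from distances on the z-plane (overall gain assumed to be 1)."""
    point = np.exp(1j * omega)                          # point on the unit circle
    num = np.prod([abs(point - z0) for z0 in zeros])    # product of distances to zeros
    den = np.prod([abs(point - p0) for p0 in poles])    # product of distances to poles
    return num / den

a1 = 0.5                                                # example value
omega = np.linspace(0, np.pi, 7)                        # 0 to Nyquist
from_plane = np.array([magnitude_from_zplane([-a1], [0.0], w) for w in omega])
direct = np.abs(1 + a1 * np.exp(-1j * omega))           # |1 + a1 z^-1| at z = e^{jw}
print(np.allclose(from_plane, direct))                  # True
```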
There is another, possibly more intuitive way of thinking about this. Cut
a circle out of a sheet of heavy rubber, and magically suspend it in space.
The circle directly corresponds to the z-plane. If you have a zero on the
z-plane, you push the rubber down with a pointy stick as far as possible,
preferably infinitely. This will put a dent in the rubber that looks a bit like
a funnel. This will pull down the edge of the circle. The closer the zero is
to the edge, the more the edge will get pulled down.
If you were able to unwrap the edge of the circle, keeping its vertical
shape, you would have a picture of the frequency response.
MAKE A 3-D PLOT OF THIS IN MATHEMATICA TO ILLUSTRATE
THIS POINT.
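[Diagram: a triangle linking three domains. The sampled time domain and the discrete Fourier (frequency) domain are connected by the Discrete Fourier Transform and the Inverse Discrete Fourier Transform; the sampled time domain and the z domain by the z-transform and the inverse z-transform; the z domain and the frequency domain by z = exp(jθ).]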
Figure 8.75: The relationship between the sampled time, discrete frequency and z domains
[Watkinson, 1988].
8.7.7 Poles
So far we have looked at how to calculate the frequency response (there-
fore the magnitude and phase responses) of a digital filter using zeros in
the z-plane. However, you may have noticed that we have only discussed
FIR filters in this section. We have not looked at what happens when you
introduce feedback, therefore creating an IIR filter.
So, let’s make a simple IIR filter with a single delayed feedback with
gain as is shown in Figure 8.76.
[Diagram: x_t → (+) → y_t, with the output fed back through a z^{-k} delay and a gain a1 into the adder.]
Figure 8.76: A simple IIR filter with one delay and one gain.
$$y_t = x_t + a_1 y_{t-k} \qquad (8.116)$$
Notice now that the output is the sum of the input (the xt ) and a gain-
modified version (using the gain a1 ) of the output yt with a delay of k
samples.
We already know from the previous section how to write this equation
symbolically using operators (think back to the Johnny Cash example) as is
shown in the equation below.
$$Y = X + a_1 z^{-k} Y \qquad (8.117)$$
$$Y = X + a_1 z^{-k} Y \qquad (8.118)$$
$$Y - a_1 z^{-k} Y = X \qquad (8.119)$$
$$X = Y - a_1 z^{-k} Y \qquad (8.120)$$
$$X = \left[1 - a_1 z^{-k}\right] Y \qquad (8.121)$$
So, in a weird way, we can think of this IIR filter that relies on feedback
as an FIR filter that only contains feedforward, except that we have to think
of the filter backwards. The input, X, is simply the output of the filter,
Y , minus a delayed, gain-varied version of the output. This can be seen
intuitively if you compare Figure 8.60 with Figure 8.76. You’ll notice that,
if you look at one of the diagrams backwards (following the signal path from
right to left), they’re identical.
Let’s continue with the algebra, picking up where we left off...
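$$X = \left[1 - a_1 z^{-k}\right] Y \qquad (8.122)$$
$$\frac{X}{1 - a_1 z^{-k}} = Y \qquad (8.123)$$
$$Y = \frac{X}{1 - a_1 z^{-k}} \qquad (8.124)$$
$$Y = \left[\frac{1}{1 - a_1 z^{-k}}\right] X \qquad (8.125)$$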
Therefore, we can see right away that the transfer function of this IIR
filter is
$$H(z) = \frac{1}{1 - a_1 z^{-k}} \qquad (8.126)$$
because that’s what gets multiplied by the input to get the output.
This, in turn, means that the magnitude response of the filter is
$$|H(z)| = \frac{1}{\left|1 - a_1 z^{-k}\right|} \qquad (8.127)$$
$$= \frac{1}{\left|1 - a_1 e^{-j\omega k}\right|} \qquad (8.128)$$
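One way to check Equation 8.126 numerically is to feed a phasor into the difference equation of Equation 8.116 and compare the output with the input once the start-up transient has died away; the ratio should settle to H evaluated at z = e^{jω}. A Python/NumPy sketch with arbitrary example values for a1, k and ω, chosen with |a1| < 1 so that the filter settles.

```python
import numpy as np

def iir_feedback(x, a1, k):
    """y_t = x_t + a1 * y_(t-k), starting from silence (Equation 8.116)."""
    y = np.zeros(len(x), dtype=complex)
    for t in range(len(x)):
        feedback = y[t - k] if t >= k else 0.0
        y[t] = x[t] + a1 * feedback
    return y

a1, k, omega = 0.6, 2, 0.3                    # example values, |a1| < 1
t = np.arange(400)
x = np.exp(1j * omega * t)                    # phasor input, Equation 8.57
y = iir_feedback(x, a1, k)
H = 1 / (1 - a1 * np.exp(-1j * omega * k))    # Equation 8.126 with z = e^{jw}
# Once the start-up transient has died away, y_t / x_t settles to H
print(np.allclose(y[-10:] / x[-10:], H))      # True
```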
This raises an interesting problem. What happens if we make the de-
nominator equal to 0? First of all, let’s find out what values make this
happen.
$$0 = 1 - a_1 z^{-k} \qquad (8.129)$$
$$1 = a_1 z^{-k} \qquad (8.130)$$
$$\frac{1}{a_1} = z^{-k} \qquad (8.131)$$
$$\frac{1}{a_1} = e^{-j\omega k} \qquad (8.132)$$
Remember Euler’s identity, which states that
$$e^{-j\omega k} = \cos(\omega k) - j\sin(\omega k) \qquad (8.133)$$
We saw that the denominator of the equation for the magnitude response
of our filter will be 0 when
$$\frac{1}{a_1} = e^{-j\omega k} \qquad (8.134)$$
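Since the magnitude of $e^{-j\omega k}$ is always $\sqrt{\cos^2(\omega k) + \sin^2(\omega k)} = 1$, the denominator (not the magnitude response itself) can only reach 0, at some values of ω and k, when |a1| = 1. If |a1| > 1, the denominator never quite reaches 0 for frequencies on the unit circle, but the feedback makes the filter unstable.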
So what? Well, if the denominator in the equation describing the mag-
nitude response of the filter goes to 0, then the magnitude response goes to
∞ at that frequency. Therefore, any input containing energy at that frequency will produce an output that heads towards ∞. This is bad because an infinite output signal
is very, very loud... An intuitive way to think of this is that the filter goes
nuts if the feedback causes it to get louder and louder over time. In the case
of a single feedback delay, this is easy to think of, since it’s just a question
of whether the gain applied to that delay outputs a bigger result than the
input, which is fed into the delay again and made bigger and bigger and
so on. In the case of a complicated filter with lots of feedback and feedforward
paths (and therefore lots of poles and zeros) you’ll have to do some math to
figure out what will make the filter get out of control.
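A short simulation of Equation 8.116 makes the “louder and louder” argument concrete: send a single impulse into the filter and watch each trip around the feedback loop. The gains 0.9 and 1.1 and the delay k = 4 are arbitrary example values; with |a1| < 1 every echo is quieter than the last, and with |a1| > 1 every echo is louder, which is the runaway behaviour described above.

```python
import numpy as np

def impulse_response(a1, k, n):
    """First n samples of the impulse response of y_t = x_t + a1 * y_(t-k)."""
    y = np.zeros(n)
    for t in range(n):
        x_t = 1.0 if t == 0 else 0.0                 # unit impulse at t = 0
        feedback = a1 * y[t - k] if t >= k else 0.0
        y[t] = x_t + feedback
    return y

stable = impulse_response(0.9, 4, 41)      # |a1| < 1: every echo is quieter
unstable = impulse_response(1.1, 4, 41)    # |a1| > 1: every echo is louder
# The impulse comes back every k = 4 samples, scaled by a1 each time
print(stable[::4])     # 0.9**m: dies away towards 0
print(unstable[::4])   # 1.1**m: grows without bound
```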
We saw that, if the numerator of the magnitude response equation of
the filter goes to 0, then we put a zero in the z-plane plot of the response.
Similarly, if the denominator of the equation goes to 0, then we put a pole
on the z-plane plot of the response. Whereas a zero can be thought of as a
deep “dent” in the 3D plot of the z-plane, a pole is a very high pointy shape
coming up off the z-plane. This is shown in Figures 8.77 and ??.
[Diagrams: two z-plane plots showing the frequency axis around the unit circle, from ω = 0 to ω = π = Nyquist, with the pole marked.]
Figure 8.78: Finding the magnitude response of the filter using the pole plotted on the z-plane.
[Plot: 1/|H(z)| against frequency, from ω = 0 to ω = π = Nyquist.]
Figure 8.79: An equivalent plot to 8.78, if we “unwrapped” the frequency axis from a circle to a straight line. Notice that the plot shows 1/|H(z)|. Note that this plot is only roughly to scale – it is not terribly accurate.
[Plot: magnitude response against frequency, from ω = 0 to ω = π = Nyquist.]
Figure 8.80: The magnitude response calculated from the plot in Figure 8.79. Note that this plot
is only roughly to scale – it is not terribly accurate.
Audio Recording
9.1.1 Introduction
When you sit down to do a recording (any recording), you have two basic
objectives:
1) make the recording sound nice aesthetically
2) make sure that the technical quality of the recording is high.
Different people and record labels will place their priorities differently
(I’m not going to mention any names here, but you know who you are...).
One of the easiest ways to guarantee a high technical quality is to pay
particular attention to your gain and levels at various points in the record-
ing chain. This sentence is true not only for the signal as it passes out
and into various pieces of equipment (e.g. from a mixer output to a tape
recorder input), but also as it passes through various stages within one piece
of equipment (in particular, the signal level as it passes through a mixer).
The question is: “what’s the best level for the signal at this point in the
recording chain?”
There are two beasts hidden in your equipment that you are constantly
trying to avoid and conceal as you do your recording. On a very general
level, these are noise and distortion.
9. Audio Recording 592
Noise
Noise can be generally defined as any audio in the signal that you don’t want
there. If we restrict ourselves to electrical noise in recording equipment, then
we’re talking about hiss and hum. The reasons for this noise and how to
reduce it are discussed in a different chapter; however, the one inescapable
fact is that noise cannot be avoided. It can be reduced, but never eliminated.
If you turn on any piece of audio equipment, or any component within any
piece of equipment, you get noise. Normally, because the noise stays at
a relatively constant level over a long period of time and because we don’t
bother recording signals lower in level than the noise, we call it a noise floor.
How do we deal with this problem? The answer is actually quite simple:
we turn up the level of the signal so that it’s much louder than the noise. We
then rely on psychoacoustic masking (and, if we’re really lucky, the threshold
of hearing) to cover up the fact that the noise is there. We don’t eliminate
the noise, we just hide it – and the louder we can make the signal, the better
it’s hidden. This works great, except that we can’t keep increasing the level
of the signal because at some point, we start to distort it.
Distortion
If the recording system were absolutely perfect, then the signal at its output
would be identical to the signal at the input of the microphone. Of course,
this isn’t possible. Even if we ignore the noise floor, the signals at the two
ends of the system are not identical – the system itself modifies or distorts
the signal a little bit. The less the modification, the lower the distortion of
the signal and the better it sounds.
Keep in mind that the term “distortion” is extremely general – differ-
ent pieces of equipment and different systems will have different detrimental
effects on different signals. There are different ways of measuring this –
these are discussed in the section on electroacoustic measurements – but we
typically look at the amount of distortion in percent. This is a measure-
ment of how much extra power is included in the signal that shouldn’t be
there. The higher the percentage, the more distortion and the worse the
signal. (See the chapter on distortion measurements in the Electroacoustic
Measurements section.)
There are two basic causes of distortion in any given piece of equipment.
The first is the normal day–to–day error of the equipment in transmitting or
recording the signal. No piece of gear is perfect, and the error that’s added
to the signal at the output is basically always there. The second, however,
is a distortion of the signal caused by the fact that the level of the signal is
too high. The output of every piece of equipment has a maximum voltage
level that cannot be exceeded. If the level of the signal is set so high that it
should be greater than the maximum output, then the signal is clipped at
the maximum voltage as is shown in Figure 9.2.
Figure 9.1: A 1 kHz sine wave without distortion worth talking about.
For our purposes at this point in the discussion, I’m going to over–
simplify the situation a bit and jump to a hasty conclusion. Distortion can
be classified as a process that generates unwanted signals that are added to
our program material. In fact, this is exactly what happens – but the un-
wanted signals are almost always harmonically related to the signal whereas
your run–of–the–mill noise floor is completely unrelated harmonically to the
signal. Therefore, we can group distortion with noise under the heading
“stuff we don’t want to hear” and look at the level of that material as com-
pared to the level of the program material we’re recording – in other words
the “stuff we do want to hear.” This is a small part of the reason that
you’ll usually see a measurement called “THD+N” which stands for “Total
Harmonic Distortion plus Noise” – the stuff we don’t want to hear.
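To make the “stuff we don’t want to hear” idea concrete, here is a rough Python/NumPy sketch (an assumption; the text gives no measurement procedure) that estimates THD+N for a sine wave clipped at ±15 V rails, as in Figure 9.2. It takes the energy in every FFT bin except DC and the fundamental, relative to the whole signal, and reports the ratio of RMS values (the square root of the power ratio) in percent and in dB. The 20 V peak input is a hypothetical level chosen so that the rails clip it; real analyzers also apply notch filters and bandwidth limits that this sketch ignores.

```python
import numpy as np

def thd_n_percent(y, fs, f0):
    """Rough THD+N: RMS of everything except DC and the fundamental, relative to the total RMS.

    Assumes y holds a whole number of cycles of f0, so the fundamental lands
    exactly on one FFT bin and no window is needed.
    """
    power = np.abs(np.fft.rfft(y)) ** 2
    bin0 = int(round(f0 * len(y) / fs))      # FFT bin of the fundamental
    mask = np.ones(len(power), dtype=bool)
    mask[[0, bin0]] = False                  # exclude DC and the fundamental
    residual = power[mask].sum()             # harmonics + noise
    total = residual + power[bin0]
    return 100 * np.sqrt(residual / total)

fs, f0 = 48000, 1000
t = np.arange(fs) / fs                       # 1 s = 1000 whole cycles
clean = 20 * np.sin(2 * np.pi * f0 * t)      # hypothetical 20 V peak input
clipped = np.clip(clean, -15, 15)            # +/-15 V rails, as in Figure 9.2
pct = thd_n_percent(clipped, fs, f0)
print(f"THD+N = {pct:.1f} % = {20 * np.log10(pct / 100):.1f} dB")
```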
Figure 9.2: The same 1 kHz sine wave in a piece of equipment that has a maximum voltage of 15
V (and a minimum voltage of –15 V). Note that the top and bottom of the sine wave are clipped
at the voltage rails of the equipment. This clipping causes a high distortion level because the signal
is significantly changed or distorted. The green waveform is the original undistorted sine wave and
the blue is the clipped output.
order to answer this question, we have to know the exact behaviour of the
particular piece of gear that we’re using – but we can make some general
rules that apply for groups of gear. These three groups are 1) digital gear,
2) analog electronics and 3) analog tape.
the better the technical quality of the recording. (Do not confuse the signal
to noise ratio with the dynamic range of the system. The former is the ratio
between the signal and the noise floor. The latter is the ratio between the
maximum possible signal and the noise floor – as we’ll see, this raises the
question of how to define the maximum possible level...)
We also know from previous chapters that digital systems have a very
unforgiving maximum level. If you have a 16 bit system, then the peak
level of the signal can only go to the maximum level of the system defined
by those 16 bits. There is some debate regarding what you can get away
with when you hit that wall – some people say that 2 consecutive samples
at the maximum level constitute a clipped signal. Others are more lenient
and accept one or two more consecutively clipped samples. Ignoring this
debate, we can all agree that, once the peak of a sine wave has reached the
maximum allowable level in a digital system, any increase in level results
in a very rapid increase in distortion. If the system is perfectly aligned,
then the sine wave starts to approach a square wave very quickly (ignoring
a very small asymmetry caused by the fact that there is one extra LSB
for the negative–going portion of the wave than there is for the positive
side in a PCM system). See Figure 9.2 to see a sample input and output
waveform. The “consecutively clipped samples” that we’re talking about is
a measurement of how long the flattened part of the waveform stays flat.
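The “consecutively clipped samples” test can be sketched as follows, assuming 16-bit two’s-complement samples held as plain Python integers (so the rails are 32767 and −32768, with the extra LSB on the negative side). The function names and the default threshold of 2 samples (one of the positions in the debate above) are hypothetical, not a standard.

```python
FULL_SCALE = 32767       # largest positive 16-bit sample value
MOST_NEGATIVE = -32768   # one extra LSB on the negative side

def longest_clipped_run(samples):
    """Longest run of consecutive samples stuck at the same rail (top or bottom)."""
    longest = current = 0
    previous = 0
    for s in samples:
        # +1 at the top rail, -1 at the bottom rail, 0 anywhere in between
        state = 1 if s >= FULL_SCALE else (-1 if s <= MOST_NEGATIVE else 0)
        if state != 0 and state == previous:
            current += 1
        else:
            current = 1 if state != 0 else 0
        longest = max(longest, current)
        previous = state
    return longest

def is_clipped(samples, max_run=2):
    """Hypothetical rule: flag clipping once max_run samples in a row sit on a rail."""
    return longest_clipped_run(samples) >= max_run

print(longest_clipped_run([0, 32767, 32767, 32767, 1200]))  # 3
```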
If we were to draw a graph of this behaviour, we would get the plot
shown in Figure 9.3. Notice that we’re looking at the Signal to THD+N
ratio vs. the level of the signal.
The interesting thing about this graph is that it’s essentially a graph of
the peak signal level vs. audio quality (at least technically speaking... we’re
not talking about the quality of your mix or the ability of your performers...).
We can consider that the X–axis is the peak signal level in dB FS and the
Y–axis is a measurement of the quality of the signal. Consequently, we can
see that the closer we can get the peak of the signal to 0 dB FS the better
the quality, but if we try to increase the level beyond that, we get very bad
very quickly.
Therefore, the general moral of the story here is that you should set
your levels so that the highest peak in the signal for the recording will hit
as close to 0 dB FS as you can get without going over it. In fact, there are
some problems with this – you may actually wind up with a signal that’s
greater than 0 dB FS by recording a signal that’s less than 0 dB FS in some
situations... but we’ll look at that later... this is still the introduction.
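The peak level in dB FS is 20 log10 of the largest sample magnitude divided by full scale. A minimal sketch, assuming 16-bit samples with full scale taken as 32767; under this convention a sample of −32768 reads a hair above 0 dB FS, because of the extra negative LSB. The function name is hypothetical.

```python
import math

def peak_dbfs(samples, full_scale=32767):
    """Peak level in dB FS: 0 dB FS means the largest sample just reaches full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")                 # digital silence
    return 20 * math.log10(peak / full_scale)

print(round(peak_dbfs([0, 16384, -8000]), 2))   # -6.02 (about half of full scale)
```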
Figure 9.3: A plot of a measurement of the signal to THD+N (caused by noise and distortion
byproducts) ratio vs. the signal level in a typical digital converter with a dither level of one half
an LSB measured with a 997 Hz sine tone. The curves are 8–bit (yellow), 12–bit (green), 16–bit
(blue) and 24–bit (red). The resolution on the input level is 1 dB. The positive slope on the left is
the result of the increase in the signal level over the static noise floor. The nasty drop on the right
is caused by the sudden increase in distortion when you try to make the sine tone go beyond 0 dB
FS.
your signal.
Figure 9.4: A plot of the signal to THD+N ratio vs. the signal level for a simple analog equalizer
set to bypass mode and measured with a 1 kHz sine tone. The resolution on the input level is 1
dB. Note the similarity to the curve for PCM digital systems shown in Figure 9.3.
Figure 9.5: A measurement of a 1 kHz sine tone that is “clipped” by analog tape. Notice that,
although the peaks and troughs are distorted and limited to the boundaries, the clipping process
is much more gradual than was seen in Figure 9.3 with the digital gear and op amps. The blue
waveform is the original undistorted sine wave and the red is the output from the analog tape.
The result of this softer, more gradual clipping of the waveform is twofold.
Firstly, as was mentioned above, the increase in distortion is more gradual
as the level is increased. In addition, because the change in the slope of the
waveform is less abrupt, there are fewer very high frequency components
resulting from the distortion. Consequently, there are a large number of
people who actually use this distortion as an integral part of their processing. This tape compression, as it is commonly known, is most frequently
used for tracking drums.
Assuming that we are trying to maintain the highest possible technical
quality and assuming that this does not include tape compression, then we
are trying to keep the signal level at the high point on the graph in Figure
5. This level of 0 dB VU is a so–called nominal level at which it has been
decided (by the tape recorder manufacturer, the analog tape supplier and
the technician that works in your studio) that the signal quality is best. Your
Figure 9.6: A plot of the signal to noise (caused by noise and distortion byproducts) ratio vs. the
signal level in a typical analog tape recording. The blue signal is the response of the electronics
in the tape recorder (measured using the “Input” monitor). The red signal is the response of the
tape. (This is an old Revox A77 that needs a little maintenance, recording on some spare Ampex
456 tape that I had lying around, in case you’re wondering...)
goal in this case is to keep the average level of the signal for the recording
hovering around the 0 dB VU mark. You may go above or below this on
peaks and dips – but most of the time, the signal will be at an optimal level.
Notice that there are two fundamentally different ways of thinking pre-
sented above. In the case of digital gear or analog electronics, you’re deter-
mining your recording level based on the absolute maximum peak for the
entire recording. So, if you’re recording an entire symphony, you find out
what the loudest part will be and make that point in the recording as close
to maximum as possible. Look after the peak and the rest will look after
itself. In contrast, in the case of analog tape, we’re not thinking of the peak
of the signal, we’re concentrating on the average level of the signal – the
peaks will look after themselves.
9.1.5 Meters
So, now that we’ve got a very basic idea of the objective, how do we make
sure that the levels in our recording system are optimized? We use the
meters on the gear to give us a visual indication of the levels. The only
problem with this statement is that it assumes that the meter is either telling
you what you want to know, or that you know how to read the meter. This
isn’t necessarily as dumb as it sounds.
A discussion of meters can be divided into two subtopics. The first is
the issue of scale – what actual signal level corresponds to what indication
on the meter. The second is the issue of ballistics – how the meter responds
in time to changes in level.
Before we begin, we’ll take a quick review of the difference between the
peak and the RMS value of a signal. Figure 9.7 shows a portion of a recorded
sound wave. In fact, it’s an excerpt of a recording of male speech.
Figure 9.8: The absolute value of the signal shown in Figure 9.7
A second, more complex method is to use the running RMS of the signal.
Figure 9.9: Two running measurements of the RMS value of the displayed signal. The blue signal
is an RMS using a time constant of 2.27 ms. The red signal uses a time constant of 5.67 ms.
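The text doesn’t say how the running RMS curves in Figure 9.9 were computed. One common approach, assumed here, is a one-pole exponential average of the squared signal with time constant τ, followed by a square root. The sketch also computes the crest factor (the peak-to-RMS ratio) of a whole signal; the 1 kHz test sine and the function names are hypothetical.

```python
import numpy as np

def running_rms(x, fs, tau):
    """Exponentially weighted running RMS with time constant tau (in seconds)."""
    alpha = np.exp(-1.0 / (tau * fs))     # per-sample decay factor
    mean_square = 0.0
    out = np.empty(len(x))
    for n, sample in enumerate(x):
        mean_square = alpha * mean_square + (1 - alpha) * sample ** 2
        out[n] = np.sqrt(mean_square)
    return out

def crest_factor_db(x):
    """Peak-to-RMS ratio of a whole signal, in dB."""
    x = np.asarray(x, dtype=float)
    return 20 * np.log10(np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2)))

fs = 44100
t = np.arange(fs) / fs
sine = np.sin(2 * np.pi * 1000 * t)
print(crest_factor_db(sine))              # about 3.01 dB for a sine wave
rms = running_rms(sine, fs, tau=5.67e-3)
print(rms[-1])                            # settles near 1/sqrt(2), about 0.707
```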
A level meter tells you the level of the signal – either the peak or the
RMS value of the level depending on the meter – on a relative scale. We’ll
look at these one at a time, and deal with the respective scale and ballistics
for each.
Figure 9.10: The same plot as Figure 8 with the Y–axis changed to a decibel scale. There are a
couple of things to note here. Firstly, the decibel scale is relative to 1 V, similar to a dBV scale –
the difference is that this plot uses an instantaneous measurement of the voltage compared to 1 V
rather than an RMS value relative to 1 V RMS as in the dBV scale. Secondly, note that the decay
curves of both RMS measurements (the red and blue plots) are more linear on this dB scale when
compared to a linear absolute voltage scale. Also, note that the red plot (with a time constant of
5.67 ms) reads a signal level that gets closer to the maxima in level whereas the blue plot (with a
time constant of 2.27 ms) gives a result that is closer to the minima. In both cases, however, there
is approximately a 10 dB error in the RMS values relative to the instantaneous peak voltages.
Figure 9.11: Two typical peak indicators. On the left is an Overload indicator light on a GML
microphone preamplifier. On the right is a Peak light on a Sony/MCI recording console input strip.
The red LED is the peak indicator – the green LED is a signal indicator which lights at a much
lower level.
For example, take a look at Figure 9.12. Let’s assume that the signal is
passing through a piece of equipment that clips at a maximum voltage of
10 V. The peak indicator will more than likely light up when the signal is 3
dB below this level. Therefore any signal greater than 7.07 V or less than
–7.07 V will cause the LED to light up.
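The arithmetic behind the 7.07 V figure: a level x dB below a voltage V is V × 10^(−x/20). Taken literally, 3 dB below 10 V is about 7.08 V; the 7.07 V in the text is 10/√2, which is 3.01 dB down. The difference is irrelevant in practice, but the sketch (with a hypothetical function name) shows both.

```python
def indicator_threshold(clip_volts, db_below_clip=3.0):
    """Voltage at which a peak light set db_below_clip under the clip level turns on."""
    return clip_volts * 10 ** (-db_below_clip / 20)

print(round(indicator_threshold(10.0), 2))           # 7.08 (exactly 3 dB down)
print(round(indicator_threshold(10.0, 3.0103), 2))   # 7.07 (10 / sqrt(2))
```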
Figure 9.12: The same male speech shown earlier passing through a hypothetical device that clips
at ± 10 V and has a peak level indicator that is calibrated to light at 3 dB below clipping (at ±
7.07 V). All of the signal components drawn in red are the signals that will cause the indicator to
light.
Ballistics
Note that a peak indicator is an instantaneous measurement. If all is working
properly, then any signal of any duration (no matter how short) will cause
the indicator to light if the signal strength is high enough.
Also note that the peak indicator lights when the signal level is slightly
lower than the level where clipping starts, so just because the light lights
doesn’t mean that you’ve clipped your signal... but you’re really close.
Ballistics
Since VU Meters are essentially RMS meters, we have to remember that they
do not respond to instantaneous changes in the signal level. The ballistics
for VU Meters have a carefully defined rise and decay time – meaning that
we know how fast they respond to a sudden attack or a sudden decay in
the sound – slowly. These ballistics are defined using a sine tone that is
suddenly switched on and off. If there is no signal in the system and a sine
tone is suddenly applied to the VU Meter, then the indicator (either a needle
or a light) will reach 99% of the actual RMS level of the signal in 300 ms.
In technical terms, the indicator will reach 99% of its steady-state deflection in
300 ms. Similarly, when the sine tone is turned off and the signal drops to
0 V instantaneously, the VU meter should take 300 ms to drop back 99%
of the way (because the meter only sees the lack of signal as a new signal
level, therefore it gets 99% of the way there in 300 ms – no matter where
it’s going).
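The simplified VU behaviour described above (99% of the way to the new level in 300 ms, in either direction) can be modelled as a one-pole smoother with time constant τ = 0.3 / ln(100) ≈ 65 ms, because an exponential covers 99% of the distance after ln(100) time constants. This is a sketch of the simplified model of Figure 9.14 only, not of a real meter movement; the 1 ms time step and the function name are assumptions.

```python
import numpy as np

RISE_TIME = 0.300                    # seconds to get 99% of the way there
TAU = RISE_TIME / np.log(100)        # one-pole time constant, about 65 ms

def vu_reading(target_rms, fs):
    """Simplified VU indicator: a one-pole smoother chasing the signal's RMS level."""
    alpha = np.exp(-1.0 / (TAU * fs))
    needle = 0.0
    out = np.empty(len(target_rms))
    for n, level in enumerate(target_rms):
        needle = alpha * needle + (1 - alpha) * level
        out[n] = needle
    return out

fs = 1000                                        # 1 ms resolution is plenty
level = np.r_[np.full(fs, 1.228), np.zeros(fs)]  # tone on for 1 s, then off
needle = vu_reading(level, fs)
print(needle[299] / 1.228)          # about 0.99 after 300 ms
print(needle[fs + 299] / 1.228)     # about 0.01, 300 ms after switch-off
```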
Figure 9.14: A simplified example of the ballistics of a VU meter. Notice that the signal (plotted in
green) changes from 0 V RMS to a 1.228 V RMS sine wave instantaneously (I know, I know... you
can’t have an instantaneous change to an RMS value – but I warned you that it was a simplified
description!) The level displayed by the VU Meter takes 300 ms to get to 99% of the signal level.
Similarly, when the signal is turned off instantaneously, it takes 300 ms for the VU Meter to drop
to 0. Notice that the attack and decay curves are mirror images of each other.
Figure 9.15: The same graph as is shown in Figure 9.14 plotted in a decibel scale. Note that the
exponential decay of the VU Meter appears as a linear drop in the decibel scale, whereas the attack
curve is not linear.
Scale
The good thing about VU Meters is that they show you the average level of
the signal – so they’re great for recording to analog tape or for mastering
purposes where you want to know the overall general level of the signal.
However, they’re very bad at telling you the peak level of the signal – in
fact, the higher the crest factor, the worse they are at telling you what’s
going on. As we’ve already seen, there are many applications where we
need to know exactly what the peak level of the signal is. Once upon a
time, the only place where this was necessary was in broadcasting – because
if you overload a transmitter, bad things happen. So, the people in the
broadcasting world didn’t have much use for the VU Meter – they needed
to see the peak of the program material, so the Peak Program Meter or
PPM was developed in Europe around the same time as the VU Meter was
in development in the US.
A PPM is substantially different from a VU Meter in many respects.
These days it has many different incarnations – particularly in its scale, but
the traditional one that most people think of is the UK PPM (also known
as the BBC PPM). We’ll start there.
The UK PPM looks very different from a VU Meter – it has no decibel
markings; instead, its scale is simply numbered from 1 to 7 (see Table 9.1).
9. Audio Recording 607
Figure 9.16: A photograph of a typical UK (or BBC) PPM on the output module of a Sony/MCI
mixing console.
There are a number of other PPM Scales available to the buying public.
In addition to the UK PPM, there’s the EBU PPM, the DIN PPM and the
Nordic PPM. Each of these has a different scale as is shown in Table 9.1
and the corresponding Figure 9.18.
Meter        | Standard               | Minimum Scale and Corresponding Level | Maximum Scale and Corresponding Level | Nominal Scale and Corresponding Level
VU           | IEC 60268–17           | –20 dB = –16 dBu                      | +3 dB = 7 dBu                         | 0 dB = 4 dBu
UK (BBC) PPM | IEC 268–10 IIA         | Mark 1 = –14 dBu                      | Mark 7 = 12 dBu                       | Mark 4 = 0 dBu
EBU PPM      | IEC 268–10 IIB         | –12 dB = –12 dBu                      | +12 dB = 12 dBu                       | Test = 0 dBu
DIN PPM      | IEC 268–10 / DIN 45406 | –50 dB = –44 dBu                      | +5 dB = 11 dBu                        | 0 dB = 6 dBu
Nordic PPM   | IEC 268–10 I           | –42 dB = –42 dBu                      | +12 dB = 12 dBu                       | Test (or 0 dB) = 0 dBu
Table 9.1: Various scales of analog level meters for professional recording equipment. Add 4 dB to the corresponding signal levels for professional broadcasting equipment.
Figure 9.18: Various scales of analog level meters for professional recording equipment. Add 4 dB
to the corresponding signal levels for professional broadcasting equipment.
Ballistics
Let’s be complete control freaks and build the perfect PPM. It would show
the exact absolute value of the voltage level of the signal all the time. The
needle would dance up and down constantly and after about 3 seconds you’d
have a terrible headache watching it. So, this is not the way to build a PPM.
Instead, the ballistics are modified so that the meter
responds very quickly to a sudden increase in level, but very
slowly to a sudden drop in level – its decay is even slower than that of
a VU Meter. You may notice that the PPMs listed in Table 9.1 and Figure
9.18 are grouped into two types: Type I and Type II. These types indicate
the characteristics of the ballistics of the particular meter.
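A PPM's asymmetric behaviour can be sketched as a peak follower whose reading rises quickly when the rectified signal exceeds it and otherwise falls at a slow, constant rate in dB. The attack time constant and the 10 dB/s release rate below are placeholder values chosen for illustration, not the Type I or Type II figures from the relevant standard, and the function is hypothetical.

```python
import math

def peak_follower(samples, fs, attack_time_s=0.005, release_db_per_s=10.0):
    """Fast-attack, slow-release peak reading (placeholder ballistics).

    When the rectified input exceeds the reading, the reading moves toward it with
    time constant attack_time_s; otherwise it falls at release_db_per_s, which is a
    straight line on a dB scale (an exponential decay in linear terms).
    """
    attack_coeff = 1.0 - math.exp(-1.0 / (attack_time_s * fs))
    release_gain = 10.0 ** (-release_db_per_s / (20.0 * fs))  # per-sample multiplier
    reading = 0.0
    readings = []
    for x in samples:
        rectified = abs(x)
        if rectified > reading:
            reading += attack_coeff * (rectified - reading)  # rise quickly
        else:
            reading *= release_gain                          # fall slowly
        readings.append(reading)
    return readings

fs = 48000
burst = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 10)]  # 100 ms at 1 kHz
silence = [0.0] * fs                                                       # then 1 s of nothing
readings = peak_follower(burst + silence, fs)
end_of_burst = len(burst) - 1
print(round(readings[end_of_burst], 3), round(readings[-1] / readings[end_of_burst], 3))
```

Because the release is a fixed number of dB per second, one second of silence lowers the reading by exactly 10 dB here, whatever level it started from, while the rise to near the peak takes only tens of milliseconds.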
Type I PPMs
Type II PPMs
Consumer–level Equipment
If we’re talking about consumer–level equipment, either for recording or just
for listening to things at home on your stereo, then the nominal 0 dB VU
point (and all other nominal levels) corresponds to a level of -10 dBV or
0.316 V RMS.
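The consumer and professional nominal levels are easy to compare once both are converted to volts. This sketch assumes the usual references: 0 dBV = 1 V RMS and 0 dBu = 0.775 V RMS (the voltage that dissipates 1 mW in 600 Ω). With those, -10 dBV is about 0.316 V RMS, the +4 dBu nominal from Table 9.1 is about 1.228 V RMS (the level used in Figure 9.14), and the two nominal levels are roughly 11.8 dB apart.

```python
import math

DBV_REF_VOLTS = 1.0             # 0 dBV = 1 V RMS
DBU_REF_VOLTS = math.sqrt(0.6)  # 0 dBu = 0.775 V RMS (1 mW into 600 ohms)

def db_to_volts(level_db, ref_volts):
    """Convert a level in dB relative to ref_volts into an RMS voltage."""
    return ref_volts * 10 ** (level_db / 20.0)

def volts_to_db(volts, ref_volts):
    """Convert an RMS voltage into dB relative to ref_volts."""
    return 20.0 * math.log10(volts / ref_volts)

consumer = db_to_volts(-10.0, DBV_REF_VOLTS)         # about 0.316 V RMS
professional = db_to_volts(4.0, DBU_REF_VOLTS)       # about 1.228 V RMS
difference_db = volts_to_db(professional, consumer)  # about 11.8 dB
```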
Digital Meter
A digital meter is very similar to a PPM because, as we’ve already estab-
lished, your biggest concern with digital audio is that the peak of the signal is
never clipped. Therefore, we’re most interested in the peak or the amplitude
of the signal.
As we’ve said before, the noise floor in a PCM digital audio signal is
typically determined by the dither level which is usually at approximately
one half of an LSB. The maximum digital level we can encode in a PCM
digital signal is determined by the number of bits. If we’re assuming that
we’re talking about a two’s complement system, then the maximum positive
amplitude is a level that is expressed as a 0 followed by as many 1’s as are
allowed in the digital word. For example, in an 8–bit system, the maximum
possible positive level (in binary) is 01111111. Similarly, in a 16–bit system
with 65536 possible quantization values, the maximum possible positive level
is level number 32767. In a 24–bit system, the maximum positive level is
8388607. (If you’d like to do the calculation for this, it’s $\frac{2^n}{2} - 1 = 2^{n-1} - 1$, where $n$ is the
number of bits in the digital word.)
Note that the negative–going signal has one extra LSB in a two’s com-
plement system as is discussed in the chapter on digital conversion.
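The rule for the largest positive value can be checked directly. In n-bit two's complement the largest positive code is 2^(n-1) - 1 and the most negative is -2^(n-1), one LSB further from zero. A minimal sketch (function names are hypothetical):

```python
def max_positive(bits: int) -> int:
    """Largest positive value of an n-bit two's complement word: 2**(n-1) - 1."""
    return 2 ** (bits - 1) - 1

def min_negative(bits: int) -> int:
    """Most negative value: one LSB further from zero than max_positive."""
    return -(2 ** (bits - 1))

for bits in (8, 16, 24):
    # e.g. 8 bits -> 127, -128, and 127 written as the binary word 01111111
    print(bits, max_positive(bits), min_negative(bits), format(max_positive(bits), f"0{bits}b"))
```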
The maximum possible value in the positive direction in a PCM digital
signal is called full scale because a sample that has that maximum value
uses the entire scale that is possible to express with the digital word. (Note
that we’ll see later that this definition is actually a lie – there are a couple
of other things to discuss here, but we’ll get back to them in a minute.)
Figure 9.21: A PCM two’s complement digital representation of a quantized sine wave with a
frequency of 1/20th of the sampling rate. Note that three samples (numbers 5, 6 and 7) have
reached full scale and are indicated in red. By comparison, the symmetrical samples (numbers 15,
16 and 17) are technically at full scale despite the extra LSB in the negative zone.
There’s just one small catch: I lied. There’s one additional piece of
information that I’ve omitted to keep things simple. Take a close look
at Figure 9.21. The way I made this plot was to create a sine wave and
quantize it using a 4–bit system assuming that the sampling rate is 20 times
the frequency of the sine wave itself. Although this works, you’ll notice that
there are some quantization levels that are not used. For example, not one
of the samples in the digital sine wave representation has a value of 0001,
0011 or 0101. This is because the frequency of the sine wave is harmonically
related to the sampling rate. In order to ensure that more quantization levels
are used, we have to use a sine tone whose frequency is not harmonically related to
the sampling rate. The technical definition of “full scale” uses a digitally–generated
sine tone that has a frequency of 997 Hz. Why 997 Hz? Well,
if you divide any of the standard sampling rates (32 kHz, 44.1 kHz, 48
kHz, 88.2 kHz, 96 kHz, etc...) by 997, you get an awkward number that is not an integer
(997 is a prime number, so it shares no common factor with any of these rates). The result is
that every sample in a second lands on a different point of the waveform. You
won’t hit every quantization value, because the whole pattern starts repeating
after one second – but, if your sine tone is 997 Hz and your sampling rate
is 44.1 kHz, you’ll wind up sampling 44100 different points on the waveform. The
higher the sampling rate, the more points you’ll hit, and the
smaller your error from full scale.
The other reason for using this system is to avoid signals that are actually
higher than Full Scale without the system actually knowing. If you have a
sine tone with a frequency that is harmonically related to the sampling rate,
then it’s possible that the very peak of the wave is between two samples, and
that it will always be between two samples. Therefore the signal is actually
greater than 0 dB FS without you ever knowing it. With a 997 Hz tone,
eventually, the peak of the wave will occur as close as is reasonably possible
to the maximum recordable level.
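Two claims above can be checked numerically: a tone at a simple fraction of the sampling rate repeats after only a few samples, and such a tone can keep its true peak permanently between samples. The sketch below uses integer frequencies so that the repeat length is fs / gcd(f, fs). The 8 kHz at 48 kHz case (one sixth of the sampling rate, with zero starting phase) is an example chosen here to show a peak that is never sampled; the 0.3 rad phase offset is arbitrary.

```python
import math

def samples_until_repeat(freq_hz: int, fs_hz: int) -> int:
    """Samples before a sampled sine (integer frequency and rate) repeats exactly."""
    return fs_hz // math.gcd(freq_hz, fs_hz)

def largest_sample(freq_hz: int, fs_hz: int, phase_rad: float = 0.0) -> float:
    """Largest |sample| of a unit-amplitude sine over one full repeat period."""
    n_total = samples_until_repeat(freq_hz, fs_hz)
    return max(abs(math.sin(2 * math.pi * freq_hz * n / fs_hz + phase_rad))
               for n in range(n_total))

print(samples_until_repeat(2205, 44100))  # 20: the fs/20 tone of Figure 9.21 repeats every 20 samples
print(samples_until_repeat(997, 44100))   # 44100: repeats only after one second
print(largest_sample(8000, 48000))        # about 0.866, roughly 1.25 dB below the true peak
print(largest_sample(997, 44100, 0.3))    # very close to 1.0: some sample lands on the peak
```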
This becomes part of the definition of full scale – the amplitude of a
signal is compared to the amplitude of a 997 Hz sine tone at full scale. That
way we’re sure that we’re getting as close as we can to that top quantization
level.
There is one other issue to deal with: the definition of dB FS uses the
RMS value of the signal. Therefore, a signal that is at 0 dB FS has the same
RMS value as a 997 Hz sine wave whose peak positive amplitude reaches full
scale. There are two main implications of this definition. The first has to
do with the crest factor of your signal. Remember that the crest factor is a
measurement of the relationship between the peak and the RMS value of the
signal. In almost all cases, the peak value will be greater than the RMS value
(in fact, the only time this is not the case is a square wave, in which they will be equal).
Ballistics
As far as I’ve been able to tell, there are no standards for digital meter
ballistics or appearances, so I’ll just describe a typical digital meter. Most
of these use what is known as a dot bar mode which actually shows two levels
simultaneously. Looking at Figure 9.22, we can see that the meter shows a
bar that extends to –24 dB. This bar shows the present level of the signal
using ballistics that typically have roughly the same visual characteristics
as a VU Meter. Simultaneously, there is a dot at the –8 dB mark. This
indicates that the most recent peak hit –8 dB. This dot will be erased after
a couple of seconds and replaced by the next peak.
Figure 9.22: A photograph of a typical digital meter on a Tascam DAT machine. There are a couple
of things shown here. The first is the bar graphs of the Left and Right channels just below the –24
dB mark. This is the level of the signal at the moment when the picture was taken. There are
also two “dots” at the –8 dB mark. These are the level of a recent peak and will be replaced by
a new peak in a couple of seconds. Finally, there is the “MARG” (for “margin”) indication of 6.7
dB – this indicates that the maximum peak of the entire program material since the recording was
started hit –6.7 dB. Note that we don’t know which channel that peak was on.
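Since, as noted above, there is no standard for digital meter ballistics, the following is only a hypothetical sketch of dot-bar logic: the bar follows the current level, the dot holds the most recent peak for a fixed number of updates, and a margin figure reports the headroom left by the highest level since reset (so a peak at -6.7 dB FS gives a margin of 6.7 dB, as in Figure 9.22). Counting the hold time in meter updates is an assumption made here.

```python
def dot_bar_meter(levels_db, hold_updates):
    """Hypothetical dot-bar display logic.

    levels_db: successive meter readings in dB FS (already smoothed for the bar).
    hold_updates: how many updates the peak dot stays put before it may fall.
    Returns (bar, dot, margin) per update; margin is the headroom below 0 dB FS
    left by the highest level seen since the meter was reset.
    """
    dot = float("-inf")
    held_for = 0
    highest = float("-inf")
    display = []
    for level in levels_db:
        if level >= dot or held_for >= hold_updates:
            dot, held_for = level, 0  # new peak, or the hold time has expired
        else:
            held_for += 1
        highest = max(highest, level)
        display.append((level, dot, -highest))
    return display

frames = dot_bar_meter([-30.0, -8.0, -24.0, -24.0, -24.0, -20.0], hold_updates=2)
for bar, dot, margin in frames:
    print(bar, dot, margin)
```

The dot stays at -8 dB for two updates after the peak and then drops to the current level, while the margin keeps reporting the 8 dB of headroom left by that peak.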
Finally, digital meters have a warning symbol to indicate that the signal
has clipped. This warning is simply called an “over” since all we’re concerned
with is that the signal went over full scale – we don’t care how far over
full scale it went. The problem here is that different meters use different
definitions for the word “over.” As I’ve already pointed out, some meters
keep track of the number of consecutive samples at full scale and point out
when that number hits 2 or 3 (this is either defined by the manufacturer or
by the user, depending on the equipment and model number – check your
manual). On some equipment (particularly older gear), the “digital” meter
is driven by the analog conversion of the signal and is therefore extremely
inaccurate – again, check your manual. An important thing to note about
these meters is that they are rarely aware that the signal has gone over full
scale when you’re playing back a digital signal, or if you’re using an external
analog-to-digital converter – so be very careful.
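The consecutive-sample rule for an over can be sketched as follows. This hypothetical function counts one event each time a run of full-scale samples reaches a chosen length (2 or 3, per the text). Following Figure 9.21, it treats the largest positive code, its negative mirror image and the extra negative LSB all as full scale; real meters may differ, so check your manual.

```python
def count_overs(samples, bits=16, run_length=3):
    """Count 'over' events: runs of at least run_length consecutive full-scale samples."""
    positive_fs = 2 ** (bits - 1) - 1  # e.g. 32767 for 16 bits
    events = 0
    run = 0
    for s in samples:
        # abs() >= positive_fs catches +FS, its mirror -FS and the extra negative LSB
        if abs(s) >= positive_fs:
            run += 1
            if run == run_length:  # count each run once, when it first reaches the threshold
                events += 1
        else:
            run = 0
    return events

clip = [0, 32767, 32767, 32767, 100, -32768, -32768, 0]
print(count_overs(clip, run_length=3))  # 1
print(count_overs(clip, run_length=2))  # 2
```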
Figure 9.24: A photograph of a phase meter on the output module of a Sony/MCI mixing console.
It is important to note that the values in Table 9.4 are for a single
1. The Left and Right loudspeakers are typically a pair that may not
match any other loudspeaker in the room.
4. A single subwoofer.
we typically see more than two surround loudspeakers and more than one
subwoofer. In smaller systems, people have been told that they don’t need 5
large loudspeakers because all the bass can be produced by the subwoofer using
a bass management system (described below). Consequently, the subwoofer
produces more than just the LFE channel.
So, it is important to remember that delivery channels are not directly
equivalent to loudspeakers. It is an LFE channel – not a subwoofer channel.
[Figure 9.25 diagram: channels L, C, R, LS, RS and LFE (+10 dB) feeding loudspeakers L, C, R, LS, RS and a subwoofer, with a crossover at approximately 80 Hz.]
Figure 9.25: A typical monitoring path for a bass-managed system. Note the 10 dB boost on the
LFE channel.
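The signal flow in Figure 9.25 can be summarized in a few lines: the low-frequency parts of the main channels are summed into the subwoofer feed unchanged, while the LFE channel receives a 10 dB in-band boost first. In this hypothetical sketch the ~80 Hz crossover filters are not modelled; the main-channel inputs are assumed to be already low-passed.

```python
LFE_GAIN_DB = 10.0  # in-band gain applied to the LFE input, as in Figure 9.25

def db_to_gain(db):
    """Convert a gain in dB into a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def subwoofer_feed(main_lows, lfe):
    """Sum bass-managed low frequencies with the boosted LFE channel.

    main_lows: list of per-channel sample lists, ASSUMED already low-passed
               by the ~80 Hz crossover (the filter itself is not modelled).
    lfe: LFE channel samples, same length as each main channel.
    """
    lfe_gain = db_to_gain(LFE_GAIN_DB)  # about 3.16 in amplitude
    return [sum(channel[n] for channel in main_lows) + lfe_gain * lfe[n]
            for n in range(len(lfe))]

result = subwoofer_feed([[0.1, 0.0], [0.1, 0.0]], [0.0, 0.1])
print(result)  # first sample: main lows only; second sample: LFE boosted by 10 dB
```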
9.2.4 Configuration
Two-channel Stereo
A two-channel playback system (typically misnamed “stereo”) has a standard
configuration. Both loudspeakers should be equidistant from the listener
and at angles of -30° and 30°, where 0° is directly forward of the listener.
This means that the listener and the two loudspeakers form the points of
an equilateral triangle as shown in Figure 9.26, producing a loudspeaker
aperture of 60°.
Note that, for all discussions in this book, all positive angles are assumed
to be on the right of centre forward, and all negative angles are assumed to
be left of centre forward.
5-channel Surround
In the case of 5.1 surround sound playback, we are actually assuming that we
have a system composed of 5 full-range loudspeakers and no subwoofer. This
is the recommended configuration for music recording and playback [dol, 1998],
whereas a true 5.1 configuration is intended only for film and television
sound. Again, all loudspeakers are assumed to be equidistant from the
listener, at angles of 0° and ±30°, with two surround loudspeakers
symmetrically placed at an angle between ±100° and ±120°. This configuration
is detailed in ITU-R BS.775.1 [ITU, 1994] (usually called “ITU775” or just
“775” in geeky conversation... say all the numbers... “seven seven five” if
you want to be immediately accepted by the in-crowd) and shown in Figure
9.27. If you have 25 Swiss Francs burning a hole in your pocket, you can
order this document as a pdf or hardcopy from www.itu.ch. Note that the
configuration has 3 different loudspeaker apertures: 30° (the C/L and
C/R pairs), approximately 80° (L/LS and R/RS) and approximately 140°
(LS/RS).
Figure 9.28: 5-channel setup: Step 1. Measure an equilateral triangle with your L and R loud-
speakers and the listening position as the three corners.
Figure 9.29: 5-channel setup: Step 2. Find the midpoint between the L and R loudspeakers.
Figure 9.30: 5-channel setup: Step 3. Measure the distance between the listening position and the
C loudspeaker to match the distances in Step 1.
Figure 9.31: 5-channel setup: Step 4. Measure a triangle created by the C and RS loudspeakers
and the listening position using the distances indicated.
Step 6. Double check your setup by measuring the distance between the
LS and RS loudspeakers. It should be 1.73X. (Therefore the C, LS and RS
loudspeakers should form an equilateral triangle.) See Figure 9.32.
Figure 9.32: 5-channel setup: Step 5. Double check your surround loudspeaker placement by
measuring the distance between them. This should be the same as either surround loudspeaker to
the C.
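The distances used in the setup steps follow from chord lengths on a circle of radius X: two loudspeakers separated by an angle θ are 2X·sin(θ/2) apart. That gives X for the ±30° pair and √3·X ≈ 1.73X for loudspeakers 120° apart, so the 1.73X measurements correspond to surrounds at ±120°, the wide end of the ITU775 range. The sketch below checks this; X = 2 m is an arbitrary example value and the function name is hypothetical.

```python
import math

def speaker_position(angle_deg, radius):
    """(x, y) position: x positive to the right, y positive straight ahead of the listener."""
    a = math.radians(angle_deg)
    return (radius * math.sin(a), radius * math.cos(a))

X = 2.0  # listening distance in metres (arbitrary example value)
angles = {"L": -30, "C": 0, "R": 30, "LS": -120, "RS": 120}
layout = {name: speaker_position(angle, X) for name, angle in angles.items()}

print(math.dist(layout["L"], layout["R"]))    # X: equilateral L/R/listener triangle
print(math.dist(layout["C"], layout["RS"]))   # about 1.73 X (Step 4)
print(math.dist(layout["LS"], layout["RS"]))  # about 1.73 X (the Step 6 check)
```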
7. If the room is small, put the sub in the corner of the room. If the
room is big, put the sub under the centre loudspeaker. Alternatively, you
could just put the sub where you think that it sounds best.
Room Orientation
There is some debate regarding the placement of
the monitor configuration within the listening room. Usually, unless you’ve
spent lots of money getting a listening room or control room designed from
scratch, you’re probably going to be in a room that is essentially rectangular.
This then raises two important questions:
1. Do you use the room symmetrically?
2. Do you use the room so that it’s narrow, but long, or wide but shallow?
Most people don’t think twice about the answer to the first question –
of course you use the room symmetrically. The argument for this is that it
ensures a number of things:
2. The early reflection patterns from left / right pairs of loudspeakers are
matched.
Therefore, your left / right pairs of speakers will “sound the same” (this
also means the left surround / right surround pair) and your imaging will
not pull to one side due to asymmetrical reflections.
Then again, the result of using a room symmetrically is that you are
sitting in the dead centre of the room, which is one of the worst possible
locations with respect to room modes: at the centre of the room, every
odd-numbered axial mode has a pressure null and every even-numbered axial
mode has a pressure maximum, so you hear the deepest dips and the biggest peaks.
In addition, if you listen for the fundamental axial mode in the width of
the room, you’ll notice that your two ears are in opposite polarities at this
frequency. Moving about 15 to 20 cm to one side will alleviate this problem,
which, unfortunately, cannot be ignored once you have heard it.
So, it is up to your logic and preference to decide on whether to use the
room symmetrically.
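The claims about the centre of the room can be illustrated with the textbook model of an axial mode between two parallel walls a distance L apart: its frequency is n·c/(2L) and its relative pressure varies as cos(nπx/L). At x = L/2 the odd modes have a null and the even modes a maximum, and the pressure on either side of an odd-mode null has opposite polarity. The room width, the speed of sound (343 m/s) and the 9 cm ear offset below are assumed values, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature

def axial_mode_frequency(dimension_m, n):
    """Frequency of the n-th axial mode along one room dimension: n*c/(2*L)."""
    return n * SPEED_OF_SOUND / (2.0 * dimension_m)

def axial_mode_pressure(dimension_m, n, x_m):
    """Relative pressure of the n-th axial mode at position x (x = 0 at one wall)."""
    return math.cos(n * math.pi * x_m / dimension_m)

width = 4.0         # hypothetical room width in metres
centre = width / 2.0
half_head = 0.09    # assumed distance from the centre of the head to each ear, metres
for n in (1, 2, 3):
    left_ear = axial_mode_pressure(width, n, centre - half_head)
    right_ear = axial_mode_pressure(width, n, centre + half_head)
    # odd n: small values of opposite sign (near a null); even n: near +/-1, same sign
    print(n, round(axial_mode_frequency(width, n), 1), round(left_ear, 3), round(right_ear, 3))
```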
The second question of width vs. depth depends on your requirements.
Figure 9.33 shows that the choice of room orientation has implications for
the maximum distance to the loudspeakers. Both floorplans in the diagram
show rooms of identical size with a maximum loudspeaker distance for an
ITU775 configuration laid on the diagram. As can be seen, using the room as
a wide, but shallow space allows for a much larger radius for the loudspeaker
placement. Of course, this is a worst-case scenario where the loudspeakers
are placed against boundaries in the room, a practice which is not advisable
due to low-frequency boost and increased coupling to room modes.
Figure 9.33: Two rectangular rooms of identical arbitrary dimensions showing the maximum possible
loudspeaker distance for an ITU775 configuration. Notice that the loudspeakers can be further away
when you use the room “sideways.”
10.2 Surround
From the very beginning, it was recognized that the 5.1 standard was a
compromise. In a perfect system you would have an infinite number of
loudspeakers, but this causes all sorts of budgetary and real estate issues...
So we all decided to agree that 5 channels wasn’t perfect, but it was pretty
good. There are people with a little more money and loftier ideals than the
rest of us who are pushing for a system based on the MIBEIYDIS system
(more-is-better-especially-if-you-do-it-smartly).
One of the most popular of these systems uses the standard 5.1 system
as a starting point and expands on it. Dubbed 10.2 and developed by Tom-
linson Holman (the TH in THX) this is actually a 12.2 system that uses a
total of 16 loudspeakers.
There are a couple of things to discuss about this configuration. Other
than the sheer number of loudspeakers, the first big difference between this
configuration and the ITU775 standard is the use of elevated
loudspeakers. This gives the mixing engineer two possible options. If they are used as a
stereo pair, it becomes possible to generate phantom images higher than the
usual plane of presentation, giving the impression of height as in the IMAX
Figure 9.34: A 10.2 configuration. The light-gray loudspeakers match those in the ITU775
recommendation. The dark-gray speakers have an elevation of 45° relative to the listener as can be
seen in Figure 9.35. The speakers in boxes at ±90° are subwoofers. Note that all loudspeakers are
equidistant from the listener.
Figure 9.35: A simplified diagram of a 10.2 configuration seen from the side. The light-gray
loudspeakers match those in the ITU775 recommendation. The dark-gray speakers have an elevation
of 45° relative to the listener.
system. If diffuse sound is sent to these loudspeakers, then the mix relies on
our impaired ability to precisely localize elevated sound sources (see Section
??) and therefore can give a better sense of envelopment than is possible
with a similar number of loudspeakers distributed in the horizontal plane [].
You will also notice that there are pairs of back-to-back loudspeakers
placed at the ±90° positions. These are called “diffuse radiators”
and are actually wired to create a dipole radiator as is described in Section
??. In essence, you simply send the same signal to both loudspeakers in the
pair, inverting the polarity of one of the two. This produces the dipole effect
and, in theory, cancels all direct sound arriving at the listener’s location.
Therefore, the listener receives predominantly the sound reflected from the front and
rear walls, creating the impression of a more diffuse sound
than is typically available from the direct sound from a single loudspeaker.
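Why the back-to-back pair sends so little direct sound to the listener can be seen from the far-field response of two equal point sources with opposite polarity: relative to one source, the magnitude is |2·sin((kd/2)·cos θ)|, where k = 2πf/c, d is the spacing and θ is measured from the line joining the sources. Broadside (θ = 90°, towards the listener) it is zero; along the axis (towards the front and rear walls) it is not. The 0.3 m spacing and the test frequencies are hypothetical, and real loudspeakers are not ideal point sources.

```python
import math

def dipole_magnitude(theta_deg, freq_hz, spacing_m, c=343.0):
    """Far-field magnitude of two equal, opposite-polarity point sources.

    theta_deg is measured from the line joining the two sources (the dipole axis).
    The result is relative to a single source, so it ranges from 0 to 2.
    """
    k = 2.0 * math.pi * freq_hz / c
    return abs(2.0 * math.sin(k * spacing_m / 2.0 * math.cos(math.radians(theta_deg))))

# Hypothetical back-to-back pair 0.3 m apart.
towards_walls = dipole_magnitude(0.0, 500.0, 0.3)      # along the axis
towards_listener = dipole_magnitude(90.0, 500.0, 0.3)  # broadside: cancels
low_freq_on_axis = dipole_magnitude(0.0, 50.0, 0.3)    # weak at low frequencies
print(towards_walls, towards_listener, low_freq_on_axis)
```

The last value also shows that a dipole's output falls at low frequencies, which is consistent with the reduced low-frequency energy from these loudspeakers noted in the calibration discussion.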
Finally, you will note from the designation “10.2” that this system calls
for two subwoofers. This follows the recommendations of a number of people
[Martens, 1999][?] who have done research showing that uncorrelated signals
from two subwoofers can result in increased envelopment at the listening
position. These subwoofers should be placed symmetrically; more
details are discussed below.
Ambisonics
Subwoofers
9.2.5 Calibration
The calibration of your monitoring system is possibly one of the most sig-
nificant factors that will determine the quality of your mixes. As a simple
example, if you have frequency-independent level differences between your
two-channel monitors, then your centre position is different from the rest of
the world’s. You will compensate for your problem, and consequently create
a problem for everyone else resulting in complaints that your lead vocals
aren’t centered.
Unfortunately, it is impossible to create the perfect monitor, so you
have to realize the limitations of your system and learn to work within
those constraints. Essentially, the better you know the behaviour of your
monitoring system, the more you can trust it, and therefore the more you
can be trusted by the rest of us.
There is a document available from the ITU that outlines a recommended
procedure for doing listening tests on small-scale impairments in audio sys-
tems [ITU, 1997]. Essentially, this is a description of how to do the listening
test itself, and how to interpret the results. However, there is a section in
there that describes the minimum requirements for the reproduction sys-
tem. These requirements can easily be seen as a minimum requirement for a
reference monitoring system, and so I’ll list them here to give you an idea of
what you should have in front of you at a recording or mixing session. Note
that these are not standards for recording studios; I’m just suggesting that
they’re a good set of recommendations that can give you an idea of a “good”
playback system.
Note that all of the specifications listed here are measured in a free field,
1 m from the acoustic centre of the loudspeaker.
Frequency Response
The on-axis frequency response of the loudspeaker should be measured
in one-third octave bands using pink noise as a source signal. The response
should not be outside the range of ±2 dB within the frequency range of 40
Hz to 16 kHz. The frequency response measured at 10◦ off-axis should not
differ from the on-axis response by more than 3 dB. The frequency response
measured at 30◦ off-axis should not differ from the on-axis response by more
than 4 dB [ITU, 1997].
All main loudspeakers should be matched in on-axis frequency response
within 1 dB in the frequency range of 250 Hz to 2 kHz [ITU, 1997].
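The numeric limits above lend themselves to an automated check of one-third-octave measurements. This sketch is hypothetical and rests on assumptions flagged in the docstrings: on-axis values are deviations from the nominal level, the off-axis limits apply over the same 40 Hz to 16 kHz range, and "matched within 1 dB" is read as the spread across all main loudspeakers in a band being at most 1 dB.

```python
def response_failures(on_axis_db, off_10_db, off_30_db, low_hz=40, high_hz=16000):
    """Check one loudspeaker's 1/3-octave measurements against the limits above.

    Each argument maps a band centre frequency in Hz to a level in dB.
    ASSUMPTION: on_axis_db holds deviations from the loudspeaker's nominal level,
    and the off-axis limits are checked over the same 40 Hz to 16 kHz range.
    """
    failures = []
    for f, level in sorted(on_axis_db.items()):
        if not low_hz <= f <= high_hz:
            continue
        if abs(level) > 2.0:
            failures.append((f, "on-axis outside ±2 dB"))
        if f in off_10_db and abs(off_10_db[f] - level) > 3.0:
            failures.append((f, "10° off-axis differs by more than 3 dB"))
        if f in off_30_db and abs(off_30_db[f] - level) > 4.0:
            failures.append((f, "30° off-axis differs by more than 4 dB"))
    return failures

def unmatched_bands(responses_db, low_hz=250, high_hz=2000, tolerance_db=1.0):
    """Bands where the on-axis levels of the main loudspeakers spread by more than tolerance_db."""
    common = set.intersection(*(set(r) for r in responses_db.values()))
    bad = []
    for f in sorted(common):
        if low_hz <= f <= high_hz:
            levels = [r[f] for r in responses_db.values()]
            if max(levels) - min(levels) > tolerance_db:
                bad.append(f)
    return bad

# Hypothetical measurements, for illustration only.
failures = response_failures({40: 1.0, 1000: 0.0, 16000: -2.5}, {1000: -1.0}, {1000: -5.0})
mismatched = unmatched_bands({"L": {250: 0.0, 1000: 0.2}, "R": {250: 0.5, 1000: 1.5}})
print(failures)
print(mismatched)
```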
Directivity Index
In the frequency range of 500 Hz to 10 kHz, the directivity index, C, of
the loudspeakers should be within the limit $6\ \mathrm{dB} \le C \le 12\ \mathrm{dB}$ and “should
Two-channel Stereo
Send the pink noise signal to the amplifier (or the crossover input if
you’re using active crossovers) for one of your loudspeakers. The level of the
signal should be 0 dB VU (or +4 dBu).
Place the SPL meter at the listening position pointing straight up. If you
are holding the meter, hold it as far away from your body as you can and
stand to the side so that the direct sound from the loudspeaker to the meter
reflects off your body as little as possible (yes, this will make a difference).
The SPL meter should be set to C-weighting and a slow response.
Adjust your amplifier gain so that you get 85 dBspl on the meter. (Feel
free to use a different value if you think that you have a really good excuse.
The 85 dBspl reference value is the one used by the film industry. Television
people use 79 dBspl and music people can’t agree on what value to use.)
Repeat this procedure with the other loudspeaker.
Remember that you are measuring one loudspeaker at a time – you
should get 85 dBspl from each loudspeaker, not from both of them combined.
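Here is why you measure one loudspeaker at a time. If the two arrivals add on a power basis (a reasonable approximation for broadband noise, assumed here), two loudspeakers at 85 dBspl each read about 88 dBspl together. The sketch also shows the simple gain correction for a single loudspeaker; both function names are hypothetical.

```python
import math

def combined_spl(levels_db):
    """Power sum of levels from sources assumed to add on an energy basis."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))

def gain_change_db(measured_db, target_db=85.0):
    """Amplifier gain change needed to bring one loudspeaker to the target SPL."""
    return target_db - measured_db

print(round(combined_spl([85.0, 85.0]), 2))  # 88.01: both loudspeakers together
print(gain_change_db(82.5))                  # 2.5: turn this amplifier up by 2.5 dB
```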
A word of warning: It’s possible that your listening position happens
to be in a particular location where you get a big resonance due to a room
mode. In fact, if you have a smaller room and you’ve set up your room sym-
metrically, this is almost guaranteed. We’ll deal with how to cope with this
later, but you have to worry about it now. Remember that the SPL meter
isn’t very smart – if there’s a big resonance at one frequency, that’s basically
what you’re measuring, not the full-band average. If your two loudspeakers
happen to couple differently to the room mode at that frequency, then you’re
going to have your speakers matched at only one frequency and possibly no
others. This is not so good.
There are a couple of ways to avoid this problem. You could change the
laws of physics and have room modes eliminated in your room, but this isn’t
practical. You could move the meter around the listening position to see
if you get any weird fluctuations because many room modes produce very
localized problems. However, this may not tell you anything because if the
mode is a lower frequency, then the wavelength is very long and the whole
area will be problematic. Your best bet is to use a measurement device that
shows you the frequency response of the system at the listening position,
the simplest of which is a real-time analyzer. Using this system, you’ll be
able to see if you have serious problems in localized frequency bands.
Real-Time Analyzer Method
If you’ve got a real-time analyzer (or RTA) lying around, you could be a
little more precise and get a little more information about what’s happening
in your listening room at the listening position. Put an omnidirectional
microphone with a small diaphragm at the listening position and aim it at
5.1 Surround
The method for calibrating a 5-channel system is no different from the procedure
described above for a two-channel system; you just repeat the process
three more times for your Centre, Left Surround and Right Surround chan-
nels. (Notice that I used the word “channels” there instead of “loudspeakers”
because some studios have more than two surround loudspeakers. For ex-
ample, if you do have more than one Left Surround loudspeaker, then your
Left Surround loudspeakers should all be matched in level, and the total
output from all of them combined should be equal to the reference value.)
The only problem that now arises is the question of how to calibrate the
level of the subwoofer, but we’ll deal with that below.
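For a channel fed to several loudspeakers, the same power-sum assumption gives each loudspeaker a target of the reference level minus 10·log10(N). This is a sketch under that assumption only; at low frequencies the arrivals can add coherently, so the real total may differ. The function names are hypothetical.

```python
import math

def per_speaker_target(channel_target_db: float, n_speakers: int) -> float:
    """SPL each of n equally loud loudspeakers should give on its own so that
    their power sum equals the channel reference level (power-sum assumption)."""
    return channel_target_db - 10.0 * math.log10(n_speakers)

def power_sum(levels_db):
    """Combine levels on an energy basis."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))

each = per_speaker_target(85.0, 2)  # two Left Surround loudspeakers
print(round(each, 2), round(power_sum([each, each]), 2))  # 81.99 85.0
```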
10.2 Surround
The same procedure holds true for calibration of a 10.2 system. All channels
should give you the same SPL level (either wide band with an SPL meter
or narrow band with an RTA) at the listening position. The only exceptions
here are the diffuse radiators at ±90°. You’ll probably notice that you won’t
get as much low-frequency energy from these loudspeakers at the listening
position due to the cancellation of the dipole. The easiest way to get around
this problem is to band-limit your pink noise source to a higher frequency
(say, 250 Hz or so...) and measure one of your other loudspeakers that you’ve
already calibrated (Centre is always a good reference). You’ll notice that
you get a lower number because there’s less low-end – write that number
down and match the dipoles to that level using the same band-limited signal.
Ambisonics
Again, the same procedure holds for an Ambisonics configuration.
Subwoofers
Here’s where things get a little ugly. If you talk to someone about how
they’ve calibrated their subwoofer level, you’ll get one of five responses:
1. “It’s perfectly calibrated to +4 dB.”
5. “Huh?”
Oddly enough, it’s possible that the first three of these responses actually
mean exactly the same thing. This is partly due to an issue that I pointed
out earlier in Section 9.2.3. For the remainder of this discussion, remember
that there’s a 10 dB gain applied to the LFE input of a multichannel
monitoring system.
The objective with a subwoofer is to get a low-frequency extension of
your system without exaggerating the low-frequency components. Conse-
quently, if you send a pink-noise signal to a subwoofer and look at its output
level in a one-third octave band somewhere in the middle of its response, it
should have the same level as a one-third octave band in the middle of the
response of one of your other loudspeakers. Right? Well.... maybe not.
Let’s start by looking at a bass-managed signal with no signal sent to the
LFE input. If you send a high-frequency signal to the centre channel and
sweep the frequency down (without changing the signal level) you should
see no change in sound pressure level at the listening position. This is true
even after the frequency has gotten so low that it’s being produced by the
subwoofer. If you look at Figure 9.25 you’ll see that this really is just a
matter of setting the gain of the subwoofer amplifier so that it will give you
the same output as one of your main channels.
What if you are only using the LFE channel and not using bass manage-
ment? In this case, you must remember that you only have one subwoofer
to compete with 5 other speakers, so the signal has been boosted by 10 dB
in the monitoring box. This means that if you send pink noise to the
subwoofer and monitor it in a narrow band in the middle of its range, it
should be 10 dB louder than a similar measurement done with one of your
main channels. This extra 10 dB is produced by the gain in the monitoring
system.
Since the easiest way to send a signal to the subwoofer in your system is
to use the LFE input of your monitoring box, you have to allow for this 10
dB boost in your measurements.
Again, you can do your measurements with any appropriate system, but
we’ll just look at the cases of an SPL meter and an RTA.
SPL Meter Method
We will assume here that you have calibrated all of your main channels
to a reference level of 85 dBspl using +4 dBu pink noise.
Send pink noise at +4 dBu, band-limited from 20 to 80 Hz, to your
subwoofer through the LFE input of your monitor box. Since the pink noise
has been band-limited, we expect to get less output from the subwoofer than
we would get from the main channels. In fact, we expect it to be about 6
dB less. However, the monitoring system adds 10 dB to the signal, so we
should wind up getting a total of 89 dBspl at the listening position, using a
C-weighted SPL meter set to a slow response.
Note that some CDs with test signals for calibrating loudspeakers take
the 10 dB gain into account and therefore reduce the level of the LFE signal
by 10 dB to compensate. If you’re using such a disc instead of producing
your own noise, then be sure to find out the signal’s level to ensure that
you’re not calibrating to an unknown level...
If you choose to send your band-limited pink noise signal through your
bass management circuitry instead of through the LFE input, then you’ll
have to remember that you do not have the 10 dB boost applied to the
signal. This means that you are expecting a level of 79 dBspl at the listening
position.
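The arithmetic in the last few paragraphs can be collected into one hypothetical helper: start from the main-channel reference, subtract the roughly 6 dB lost to band-limiting (the text's estimate, treated here as a fixed parameter), and add the 10 dB LFE gain only when the signal goes through the LFE input.

```python
def expected_subwoofer_spl(main_reference_db=85.0, band_limit_loss_db=6.0,
                           via_lfe_input=True, lfe_gain_db=10.0):
    """Target SPL for +4 dBu pink noise band-limited to 20-80 Hz (C-weighted, slow)."""
    level = main_reference_db - band_limit_loss_db  # less energy in the narrower band
    if via_lfe_input:
        level += lfe_gain_db                        # LFE input adds 10 dB in-band
    return level

print(expected_subwoofer_spl())                     # 89.0 through the LFE input
print(expected_subwoofer_spl(via_lfe_input=False))  # 79.0 through bass management
```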
The same warning about SPL meters as was described for the main
loudspeakers holds true here, but even more so. Don’t forget that room modes
are going to wreak havoc with your measurements here, so be warned. If
all you have is an SPL meter, there’s not really much you can do to avoid
these problems... just be aware that you might be measuring something you
don’t want.
Real-Time Analyzer Method
If you’re using an RTA instead of an SPL meter, your goal is slightly
easier to understand. As was mentioned above, the goal is to have a system
where the subwoofer signal routed through the LFE input is 10 dB louder
in a narrow band than any of the main channels. So, in this case, if you’ve
aligned your main loudspeakers to have a level of 70 dBspl in each band
of the RTA, then the subwoofer should give you 80 dBspl in each band of
the RTA. Again, the signal is still pink noise with a level of +4 dBu and
band-limited from 20 Hz to 80 Hz.
Summary
Interchannel Differences
Panning techniques rely on these same two differences to produce the simulation
of sources located between the loudspeakers at predictable locations.
If we send a signal to just one loudspeaker in a two-channel system, then the
signal will appear to come from that loudspeaker. If the signal is produced by
both loudspeakers at the same level and the same time, then the apparent
location of the sound source is at a point directly in front of the listener,
halfway between the two loudspeakers. Since there is no loudspeaker at that
location, we call the effect a phantom image.
The exact location of a phantom image is determined by the relationship
between the sounds produced by the two loudspeakers. In order to move
the image to the left of centre, we can make the left channel louder,
earlier, or simultaneously louder and earlier than the right channel. This
system uses essentially the same characteristics as our natural localization
system; however, we are now talking about interchannel time differences
and interchannel amplitude differences.
Almost every pan knob on almost every mixing console in the world
controls the interchannel amplitude difference between the output
channels of the mixer. In essence, when you turn the pan knob to the
left, you make the left channel louder and the right channel quieter, and
therefore the phantom image appears to move to the left. Some digital
consoles now being made also change the interchannel time
differences in their panning algorithms; however, these are still very rare.
Crossed cardioids
For example, let's take two cardioid microphones and place them so that
the two diaphragms are vertically aligned, one directly over the other.
This vertical alignment means that sounds reaching the microphones from
the horizontal plane will arrive at the two microphones simultaneously,
so there will be no time-of-arrival differences between the two channels.
Consequently, we call them coincident. Now let's arrange the microphones
such that one is pointing 45° to the left and the other 45° to the right,
remembering that cardioids are most sensitive to a sound source directly in
front of them.
If a sound source is located at 0°, directly in front of the pair of microphones,
then each microphone is pointing 45° away from the sound source.
This means that each microphone is equally sensitive to the sound arriving
at the mic pair, so each mic will have the same output. If each mic's
output is sent to its own loudspeaker in a stereo configuration, then the two
loudspeakers will have the same output and the phantom image will appear
dead centre between the loudspeakers.
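To make the 0° case concrete, here is a minimal Python sketch. It assumes the ideal cardioid sensitivity 0.5 + 0.5 cos θ, where θ is the angle between the source and the microphone's axis, and a hypothetical sign convention in which positive angles are to the right. The function names are illustrative only.

```python
import math


def cardioid(source_deg: float, aim_deg: float) -> float:
    """Ideal cardioid sensitivity: 0.5 + 0.5 * cos(angle between source and mic axis)."""
    return 0.5 + 0.5 * math.cos(math.radians(source_deg - aim_deg))


# Assumed convention: negative angles are to the left, positive to the right.
LEFT_AIM_DEG = -45.0
RIGHT_AIM_DEG = 45.0


def crossed_cardioids(source_deg: float) -> tuple[float, float]:
    """Return (left, right) outputs of a coincident pair aimed at -45 and +45 degrees."""
    return cardioid(source_deg, LEFT_AIM_DEG), cardioid(source_deg, RIGHT_AIM_DEG)


for source in (0.0, -30.0):
    left, right = crossed_cardioids(source)
    print(f"source at {source:+.0f} deg: left = {left:.3f}, right = {right:.3f}")
```

At 0° both outputs are equal (about 0.854, roughly 1.4 dB below on-axis), so the image sits in the centre; with the source at -30° the left output is the larger one.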
However, let's think about what happens when the sound source is not at
0°. If the sound source moves to the left, then the microphone pointing left
in the middle of your sound stage. There are a number of different ways
of thinking about why this is the case. One explanation is presented in
Section 9.4.1. A simpler explanation is given by Jörg Wuttke of Schoeps
Microphones. As he points out, a cardioid is one half omnidirectional and
one half bidirectional. Therefore, a pair of coincident cardioids gives you
a signal, half of which is a pair of coincident omnidirectionals. A pair of
coincident omnis will give you a completely mono signal, which will image
in the dead centre of your loudspeakers; therefore, instruments tend to pull
to this location.
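Wuttke's argument can be checked numerically. The Python sketch below (the function name is hypothetical) splits each ideal cardioid's output, 0.5 + 0.5 cos θ, into a constant omnidirectional half and a cosine bidirectional half, for a coincident pair aimed at ±45°.

```python
import math


def cardioid_halves(source_deg: float, aim_deg: float) -> tuple[float, float]:
    """Split an ideal cardioid's output into (omnidirectional half, bidirectional half)."""
    omni = 0.5  # identical for every angle of incidence
    bidirectional = 0.5 * math.cos(math.radians(source_deg - aim_deg))
    return omni, bidirectional


for source in (-60.0, 0.0, 60.0):
    l_omni, l_bi = cardioid_halves(source, -45.0)  # left mic, aimed 45 deg left
    r_omni, r_bi = cardioid_halves(source, 45.0)   # right mic, aimed 45 deg right
    # The omni halves are the same in both channels (a mono component that images
    # in the centre); only the bidirectional halves differ between the channels.
    print(f"{source:+5.0f} deg: omni {l_omni:.2f} / {r_omni:.2f}, "
          f"bidirectional {l_bi:+.3f} / {r_bi:+.3f}")
```

Whatever the angle of incidence, half of each channel's sensitivity is the same constant, which is the mono component that pulls instruments towards the centre.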
Blumlein
As we'll see in the next chapter, although a pair of coincident cardioid
microphones does indeed give you good imaging characteristics, there are many
problems associated with this technique. In particular, you will find that
the overall sound stage in a two-channel stereo playback tends to “clump”
to the centre quite a bit. In addition, there is no feeling of “spaciousness”
of the kind that can be generated with phase or polarity differences between the two
channels. Both of these problems can be solved by trading in your cardioids
for a pair of bidirectional microphones. An arrangement of two bidirectionals
in a coincident pair with one pointing 45° to the left and the other 45°
to the right is commonly called a Blumlein pair, named after the man who
patented two-channel stereo sound reproduction, Alan Blumlein.
The outputs of these two bidirectional microphones have some interesting
characteristics that will be analyzed in the next chapter; however, we can
look at the basic attributes of the configuration here. To begin with, in the
area in front of the microphones, you have basically the same behaviour as we
saw with the coincident cardioid pair. Changes in the angle of incidence of
the sound source result in changes in the interchannel amplitude differences,
resulting in simple pair-wise power panning. Note, however,
that this pair is more sensitive to changes in angle, so you will experience
bigger swings in the location of sound sources with a Blumlein pair than
with 90° cardioids.
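The greater angular sensitivity of the Blumlein pair can be illustrated with a short Python sketch. It assumes ideal first-order patterns of the form P + G cos θ (P = G = 0.5 for a cardioid, P = 0 and G = 1 for a bidirectional), a pair aimed at ±45°, and a hypothetical convention in which positive angles are to the right. It only samples angles in front of the pair, away from the nulls where one output falls to zero.

```python
import math


def first_order(p: float, g: float, source_deg: float, aim_deg: float) -> float:
    """Magnitude of a first-order pattern P + G*cos(angle off axis)."""
    return abs(p + g * math.cos(math.radians(source_deg - aim_deg)))


def delta_amp_db(p: float, g: float, source_deg: float, half_angle_deg: float = 45.0) -> float:
    """Right-minus-left level (dB) for a coincident pair aimed at +/- half_angle_deg."""
    right = first_order(p, g, source_deg, +half_angle_deg)
    left = first_order(p, g, source_deg, -half_angle_deg)
    return 20 * math.log10(right / left)


for source in (5.0, 10.0, 20.0, 30.0):
    card = delta_amp_db(0.5, 0.5, source)  # cardioid: P = G = 0.5
    blum = delta_amp_db(0.0, 1.0, source)  # bidirectional: P = 0, G = 1
    print(f"{source:4.0f} deg: cardioids {card:5.2f} dB, Blumlein {blum:5.2f} dB")
```

For a source 20° to the right, the sketch gives roughly 2.5 dB for the 90° cardioids but about 6.6 dB for the Blumlein pair, so the same movement of the source produces a larger interchannel amplitude difference.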
Let's consider what's happening at the rear of a Blumlein pair. Since
a bidirectional microphone has a rear lobe that is symmetrical to the front
one, but with a negative polarity, a Blumlein pair of microphones will
have the same response in the rear as it does in the front with only two
exceptions. Sources on the rear left of the pair image on the right and
sources on the rear right image on the left. This is because the rear lobe of the
left microphone is pointing towards the rear right of the pair, consequently,
ES
There are many occasions where you will want to add some extra microphones
out in the hall to capture reverberation with very little direct sound.
This is helpful, particularly in sessions where you don't have a lot of soundcheck
time, or when the hall has problems. (Typically, you can cover up
acoustical problems by adding more microphones to smear things out.)
Many people like to use VERY widely spaced omnis for this, but this
technique really doesn't make much sense. As we'll see later, the further
apart a pair of microphones are in a diffuse field (like a reverberant concert
hall), the less correlated they are. If your microphones are stuck on opposite
sides of the hall (this is not an uncommon practice), you basically get two
completely uncorrelated signals. The result is that the reverberation (and
the audience noise, if they're there) sits in two discrete pockets in the
listening room, one pocket for each loudspeaker. This gives the illusion of a
very wide sound, but there is nothing holding the two sides together; it's
just one big hole in the middle.
So, how can we get a nice, wide hall sound, with an even spread and
avoid picking up too much direct sound?
This is the goal of the ES (possibly “Enhanced Surround”) microphone technique
developed by Wieslaw Woszczyk. The configuration
is simply two cardioids with an included angle of 180° (placed
back-to-back) with one of the outputs reversed in polarity. Each microphone is
panned completely to one channel.
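One property of this arrangement can be shown with a small Python sketch. It assumes ideal cardioids, with channel A aimed at 0° and channel B aimed at 180° and polarity-inverted; the orientation and the names are assumptions for illustration. Under those assumptions the sum of the two channels is a pure cosine (bidirectional) response and their difference is a constant (omnidirectional) response, which appears to relate to the matrixed version with one bidirectional and one omni listed for discussion below.

```python
import math


def cardioid(source_deg: float, aim_deg: float) -> float:
    """Ideal cardioid sensitivity: 0.5 + 0.5 * cos(angle off axis)."""
    return 0.5 + 0.5 * math.cos(math.radians(source_deg - aim_deg))


def es_pair(source_deg: float) -> tuple[float, float]:
    """Back-to-back cardioids; channel B's polarity is reversed."""
    a = cardioid(source_deg, 0.0)     # assumed: channel A aimed at 0 deg
    b = -cardioid(source_deg, 180.0)  # assumed: channel B aimed at 180 deg, inverted
    return a, b


for source in (0.0, 45.0, 90.0, 180.0):
    a, b = es_pair(source)
    # A + B behaves like a bidirectional (cosine) pattern; A - B is constant (omni).
    print(f"{source:5.0f} deg: A = {a:+.3f}, B = {b:+.3f}, "
          f"A + B = {a + b:+.3f}, A - B = {a - b:+.3f}")
```

A source at 90° arrives at equal levels in the two channels but with opposite polarity.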
This technique has a number of interesting properties.
TO DISCUSS: - SURROUND MATRIX SYSTEMS - MONO (FOR
AND AGAINST) - MATRIXED VERSION WITH 1 BIDIRECTIONAL
AND 1 OMNI TO GET LOW END
ORTF
NOS
Figure 9.37: An NOS pair of cardioids with an included angle of 90° and a diaphragm separation
of 30 cm.
Faulkner
One of the problems with the ORTF and NOS configurations is that they
both have their microphones aimed away from the centre of the stage. As
a result, what are likely the more important sound sources are subjected to
the off-axis response of the microphones. In addition, these configurations
may create problems in halls with very strong sidewall reflections, since
there is very little attenuation of these sources as a result of directional
characteristics.
One option that resolves both of these problems is known as the Faulkner
Technique, named after its inventor, Tony Faulkner. This technique uses a
pair of bidirectional microphones, both facing directly forward and with a
separation of 30 cm. In essence, you can consider this configuration to be
very similar to a pair of spaced omnidirectionals but with heavy attenuation
of side sources, particularly sidewall reflections.
Figure 9.38: A Decca Tree configuration. The size of the array varies according to your ensemble
and room, but typical spacings are around 1 m. Note that I've indicated cardioids here; however,
the traditional method is with omnidirectional capsules in 50 mm spheres as in the Neumann M50.
Also feel free to experiment with splaying of the L and R microphones at different angles.
Figure 9.39: An easy way of making a Decca Tree boom using two boom stands and without going
to a lot of hassle making special equipment. Note that you'll need at least 1 clamp (preferably 2)
to attach your mic clips to the boom ends. This diagram was drawn to be hung (as can be seen in
the side view); however, you can stand-mount this configuration as well if you have a sturdy stand.
(Lighting stands with mic stand thread adapters work well.)
Binaural
NOT WRITTEN YET
OSS
NOT WRITTEN YET
Figure 9.40: Apparent angle vs. averaged interchannel amplitude differences for pair-wise power
panned sources in a standard 2-channel loudspeaker configuration. Values are based on those listed
in Table 9.5 and interpolated by the author.
Figure 9.41: Apparent angle vs. averaged interchannel time differences for pair-wise power panned
sources in a standard 2-channel loudspeaker configuration. Values are based on those listed in Table
9.5 and interpolated by the author.
Pair (1 / 2)    Phantom image at 1    Phantom image at 2
C / L           14 dB                 12 dB
L / LS          9 dB                  >16 dB
LS / RS         9 dB                  9 dB
Table 9.6: Minimum interchannel amplitude difference required to locate a phantom image at the
loudspeaker position. For example, it requires an interchannel amplitude difference of at least 14
dB to move a phantom image between the Centre and Left loudspeakers to 0°. The right side is
not shown as it is assumed to be symmetrical.
Pair (1 / 2)    Phantom image at 1    Phantom image at 2
C / L           >2.0 ms               2.0 ms
L / LS          1.6 ms                >2.0 ms
LS / RS         0.6 ms                0.6 ms
Table 9.7: Minimum interchannel time difference required to locate a phantom image at the
loudspeaker position. For example, it requires an interchannel time difference of at least 0.6 ms to
move a phantom image between the Left Surround and Right Surround loudspeakers to 120°. The
right side is not shown as it is assumed to be symmetrical.
Using the smoothed averages of the phantom image locations, polar plots
can be generated to indicate the required differences to produce a desired
phantom image position as are shown in Figures 9.42 and 9.43.
Figure 9.42: Apparent angle vs. averaged interchannel amplitude differences for pair-wise power
panned sources in an ITU.R BS.775-1 loudspeaker configuration. Values taken from the raw data
acquired by the author in an experiment described in [Geoff Martin, 1999].
Figure 9.43: Apparent angle vs. averaged interchannel time differences for pair-wise power panned
sources in an ITU.R BS.775-1 loudspeaker configuration. Values taken from the raw data acquired
by the author in an experiment described in [Geoff Martin, 1999].
the individual drivers. As a result, when you pan a single sound from one
loudspeaker to another, you want to maintain a constant summed power,
rather than a constant summed pressure.
The top plot in Figure 9.44 shows the two gain coefficients determined
by the rotation of a pan knob for two output channels. Since the sum of
the two gains at any given position is 1, this algorithm is called a constant
amplitude panning curve. It works, but if you take a look at the bottom
plot in the same figure, you'll see the problem with it. When the signal is
panned to the centre position, there is a drop in the total summed power;
in fact, it has dropped by half (or 3 dB) relative to an image located in one
of the loudspeakers, since each gain is 0.5 and 0.5² + 0.5² = 0.5.
Consequently, if this system were used for the panning
in a mixing console, as you swept an image from left to centre to right, it
would appear to get further away from you at the centre location because
it would appear to be quieter.
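The difference between the two panning curves can be sketched in Python. The sketch uses a hypothetical normalized pan position (0 = hard left, 1 = hard right) instead of the knob angles in the figures, and uses the sine/cosine law as one common constant-power curve; it is not necessarily the exact curve plotted in Figure 9.45.

```python
import math


def linear_pan(position: float) -> tuple[float, float]:
    """Constant-amplitude law: gains sum to 1. position 0 = hard left, 1 = hard right."""
    return 1.0 - position, position


def constant_power_pan(position: float) -> tuple[float, float]:
    """Sine/cosine law: squared gains sum to 1 at every position."""
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


def summed_power(gains: tuple[float, float]) -> float:
    """Total power of the two channels for a unit-level input."""
    left, right = gains
    return left ** 2 + right ** 2


for pos in (0.0, 0.25, 0.5, 0.75, 1.0):
    lin = summed_power(linear_pan(pos))
    cp = summed_power(constant_power_pan(pos))
    print(f"position {pos:.2f}: linear {10 * math.log10(lin):+.2f} dB, "
          f"constant power {10 * math.log10(cp):+.2f} dB")
```

At the centre position the linear law loses about 3 dB of summed power, while the sine/cosine law keeps it constant by giving each channel a gain of about 0.707.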
Figure 9.44: The top plot shows a linear panning algorithm where the sum of the two amplitudes
will produce the same value at all rotations of the pan knob. The bottom plot shows the resulting
power response vs. pan locations.
Figure 9.45: The top plot shows a constant power panning algorithm where the sum of the two
powers will produce the same value at all rotations of the pan knob. The bottom plot shows the
resulting power response vs. pan locations.
Horizontal plane
Cardioids
Unless you only record pop music and you never use your imagination,
none of the graphs shown above really applies to what happens when you're
recording. This is because your microphone usually isn't pointing directly
forward: you usually have more than one microphone, and they're usually
pointing slightly to the left or right of forward, depending on your
configuration. Therefore, we have to think about what happens to the sensitivity
pattern when you rotate your microphone.
Figure 9.46 shows the sensitivity pattern of a cardioid microphone that
is pointing 45° to the right. Notice that this plot looks essentially
the same as Figure ??; it has just been shifted to the side a little bit.
Now let's consider the case of a pair of coincident cardioid microphones
pointed in different directions. Figure 9.48 shows the plots of two polar
patterns for cardioid microphones pointing at -45° and 45°, giving us an included
angle (the angle subtended by the microphones) of 90°, as is shown in Figure
9.47.
Figure 9.48 gives us two important pieces of information about how a
pair of cardioid microphones with an included angle of 90° will behave.
Firstly, let's look at the vertical difference between the two curves. Since
this plot essentially shows us the output level of each microphone for a given
angle, the distance between the two plots for that angle will tell us the
interchannel amplitude difference. For example, at an angle of incidence (to
the pair) of 0°, the two plots intersect and therefore the microphones have
the same output level, meaning that there is an amplitude difference of 0
dB. This is also true at 180°, despite the fact that the actual output levels
are different from what they are at 0°; remember, we're looking at the difference
between the two channels and ignoring their individual output levels.
Figure 9.46: Cartesian plot of the absolute value of the sensitivity pattern of a cardioid microphone
on a decibel scale turned 45° to the right.
Figure 9.48: Cartesian plot of the absolute value of the sensitivity patterns of two cardioid
microphones on a decibel scale turned ±45°.
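In order to calculate this, we have to find the ratio (because we're thinking
in decibels) between the sensitivities of the two microphones for all angles
of incidence. This is done using Equation 9.1.
$$\Delta\mathrm{Amp.} = 20 \log_{10}\left(\frac{S_1}{S_2}\right) \qquad (9.1)$$
where
$$S_n = P_n + G_n \cos(\alpha + \Omega_n) \qquad (9.2)$$
where α is the angle of incidence of the sound source to the pair, P_n and G_n
are the pressure and pressure-gradient components of microphone n, and Ω_n is
the angle of rotation of microphone n in the horizontal plane.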
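Equations 9.1 and 9.2 translate directly into a short Python sketch. It assumes P_n and G_n are the usual pressure and pressure-gradient weights (0.5 each for a cardioid), takes magnitudes as in the plotted absolute-value sensitivities, and adopts a hypothetical sign convention: because Equation 9.2 uses cos(α + Ω), a microphone aimed towards +45° (the right) gets Ω = -45°. S1 is the right microphone, so positive results mean the right channel is louder.

```python
import math


def sensitivity(p: float, g: float, alpha_deg: float, omega_deg: float) -> float:
    """Equation 9.2: S = P + G * cos(alpha + Omega)."""
    return p + g * math.cos(math.radians(alpha_deg + omega_deg))


def delta_amp(alpha_deg: float, p: float = 0.5, g: float = 0.5,
              included_deg: float = 90.0) -> float:
    """Equation 9.1 on magnitudes, with S1 = right mic and S2 = left mic, in dB."""
    half = included_deg / 2
    # Assumed sign convention: Omega = -half aims a mic towards +half (the right).
    s_right = abs(sensitivity(p, g, alpha_deg, -half))
    s_left = abs(sensitivity(p, g, alpha_deg, +half))
    if s_left == 0.0 or s_right == 0.0:
        return math.inf if s_left == 0.0 else -math.inf  # a null in one microphone
    return 20 * math.log10(s_right / s_left)


for alpha in (-90, 0, 45, 90, 180):
    print(f"{alpha:+4d} deg: {delta_amp(alpha):+6.2f} dB")
```

The differences at 0° and 180° come out as 0 dB, a source at 90° gives about +15.3 dB, and at 135° the left microphone's rear null makes the difference infinite.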
If we plot this difference for a pair of cardioids pointing at ±45°, the
result will look like Figure 9.49. Notice that we do indeed have a ∆Amp.
of 0 dB at 0° and 180°. Also note that the graph has positive and negative
values on the right and left respectively. This is because we're comparing
the output of one microphone with the other: when the values
are positive, the right microphone is louder than the left, and negative numbers
indicate that the left is louder than the right.
Geoff Martin www.tonmeister.ca
Figure 9.49: Interchannel amplitude differences for a pair of coincident cardioid microphones in the
horizontal plane with an included angle of 90°.
just get a mirror image. For example, the response for an included angle of
190° is exactly the same as that for 170°, just pointing towards the rear of
the pair instead of the front.
Of course, if we actually do the calculation for an included angle of 0°,
we're only going to find out that the sensitivities of the two microphones are
Figure 9.50: Interchannel amplitude differences for a pair of coincident cardioid microphones in the horizontal plane with an included angle of 0°.
Figure 9.51: Interchannel amplitude differences for a pair of coincident cardioid microphones in the horizontal plane with an included angle of 45°.
Figure 9.52: Interchannel amplitude differences for a pair of coincident cardioid microphones in the horizontal plane with an included angle of 135°.
Figure 9.53: Interchannel amplitude differences for a pair of coincident cardioid microphones in the horizontal plane with an included angle of 180°.
Figure 9.54: Contour plot showing the difference in sensitivity in dB between two coincident cardioid microphones with included angles of 0° to 180°, angles of rotation from -180° to 180° and a 0° angle of elevation. Note that Figure 9.49 is a horizontal “slice” of this contour plot where the included angle is 90°.
Figure 9.55: Interchannel amplitude differences for a pair of coincident subcardioid microphones in the horizontal plane with an included angle of 45°.
Figure 9.56: Interchannel amplitude differences for a pair of coincident subcardioid microphones in the horizontal plane with an included angle of 90°.
Figure 9.57: Interchannel amplitude differences for a pair of coincident subcardioid microphones with an included angle of 135° in the horizontal plane.
Figure 9.58: Interchannel amplitude differences for a pair of coincident subcardioid microphones with an included angle of 180° in the horizontal plane.
Figure 9.59: Interchannel amplitude differences for a pair of coincident bidirectional microphones in the horizontal plane with an included angle of 90°.
Figure 9.60: Interchannel amplitude differences for a pair of coincident bidirectional microphones in the horizontal plane with an included angle of 45°.
Figure 9.61: Interchannel amplitude differences for a pair of coincident bidirectional microphones in the horizontal plane with an included angle of 135°.
Figure 9.62: Interchannel amplitude differences for a pair of coincident bidirectional microphones in the horizontal plane with an included angle of 180°.
Hypercardioids
Not surprisingly, the response of a pair of hypercardioid microphones
looks like a hybrid of the bidirectional and cardioid pairs. As can be seen
in Figure 9.64, there are four infinite peaks in the value of ∆Amp., similar
to bidirectional pairs; however, the peaks are skewed further left
and right, as in the case of cardioids.
Again, similar to the case of bidirectional microphones, changing the
included angle of the hypercardioids results in a further skewing of the
response curve to one side or the other, as can be seen in Figures 9.65 and
9.66.
Figure 9.67 shows the interesting case of hypercardioid microphones with
an included angle of 180°. In this case the maximum sensitivity point in
the rear lobe of each microphone is perfectly aligned with the maximum
sensitivity point in the other microphone's front lobe. However, since the
rear lobe has a sensitivity with an absolute value that is 6 dB lower than
the front lobe, the value of ∆Amp. remains outside the ±6 dB window for
the larger part of the 360° rotation.
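The 180° hypercardioid case can be checked with a Python sketch of Equations 9.1 and 9.2. It assumes a hypercardioid with P = 0.25 and G = 0.75 (so the rear lobe's magnitude is 0.5, 6 dB below the front), the same hypothetical sign convention as before (Ω = -90° aims the right microphone towards +90°), and one-degree sampling of the rotation.

```python
import math


def sensitivity(p: float, g: float, alpha_deg: float, omega_deg: float) -> float:
    """Equation 9.2: S = P + G * cos(alpha + Omega)."""
    return p + g * math.cos(math.radians(alpha_deg + omega_deg))


def delta_amp(alpha_deg: float, p: float, g: float, included_deg: float) -> float:
    """Equation 9.1 on magnitudes (right mic over left mic), in dB."""
    half = included_deg / 2
    s_right = abs(sensitivity(p, g, alpha_deg, -half))  # assumed: aimed towards +half
    s_left = abs(sensitivity(p, g, alpha_deg, +half))   # assumed: aimed towards -half
    if s_left == 0.0 or s_right == 0.0:
        return math.inf if s_left == 0.0 else -math.inf  # a null in one microphone
    return 20 * math.log10(s_right / s_left)


HYPER_P, HYPER_G = 0.25, 0.75  # assumed hypercardioid: rear lobe |0.25 - 0.75| = 0.5
angles = range(-180, 180)
outside = [a for a in angles if abs(delta_amp(a, HYPER_P, HYPER_G, 180.0)) > 6.0]
print(f"on-axis difference: {delta_amp(90, HYPER_P, HYPER_G, 180.0):.2f} dB")
print(f"{len(outside)} of {len(angles)} one-degree steps lie outside +/-6 dB")
```

On either microphone's axis the difference is about 6.02 dB, the 6 dB gap between front and rear lobes; only a narrow region around 0° and 180° stays inside the window.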
Spaced omnidirectionals
In the case of spaced omnidirectional microphones, it is commonly
assumed that the distance to the sound source is adequate to ensure that the
impinging sound can be considered to be a plane wave. In addition, it is
also assumed that there is no difference in signal levels due to differences
Figure 9.63: Contour plot showing the difference in sensitivity in dB between two coincident bidirectional microphones with included angles of 0° to 180°, angles of rotation from -180° to 180° and a 0° angle of elevation.
Figure 9.64: Interchannel amplitude differences for a pair of coincident hypercardioid microphones in the horizontal plane with an included angle of 90°.
Figure 9.65: Interchannel amplitude differences for a pair of coincident hypercardioid microphones in the horizontal plane with an included angle of 45°.
Figure 9.66: Interchannel amplitude differences for a pair of coincident hypercardioid microphones in the horizontal plane with an included angle of 135°.
Figure 9.67: Interchannel amplitude differences for a pair of coincident hypercardioid microphones in the horizontal plane with an included angle of 180°.
Figure 9.68: Contour plot showing the difference in sensitivity in dB between two coincident hypercardioid microphones with included angles of 0° to 180°, angles of rotation from -180° to 180° and a 0° angle of elevation.
Figure 9.69: Spaced omnidirectional microphones showing the microphone separation d, the angle of rotation ϑ and the resulting extra distance D to the further microphone.
$$D = d \sin\vartheta \qquad (9.3)$$
where d is the distance between the microphone capsules in cm, so D is also in cm.
The additional time ∆Time required for the sound to travel this distance
is calculated using Equation 9.4.
$$\Delta\mathrm{Time} = \frac{10D}{c} \qquad (9.4)$$
where ∆Time is the interchannel time difference in ms, ϑ is the angle
of incidence of the sound source to the pair, and c is the speed of sound in
m/s. (The factor of 10 converts D in cm and c in m/s into a result in ms.)
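Equations 9.3 and 9.4 can be evaluated with a few lines of Python. The speed of sound is an assumed value of 344 m/s (roughly room-temperature air) and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 344.0  # assumed value for room-temperature air


def extra_distance_cm(separation_cm: float, theta_deg: float) -> float:
    """Equation 9.3: D = d * sin(theta), with d and D in cm."""
    return separation_cm * math.sin(math.radians(theta_deg))


def delta_time_ms(separation_cm: float, theta_deg: float,
                  c_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Equation 9.4: Delta Time = 10 * D / c, giving ms for D in cm and c in m/s."""
    return 10 * extra_distance_cm(separation_cm, theta_deg) / c_m_s


for d in (15.0, 30.0, 45.0):
    print(f"d = {d:.0f} cm: {delta_time_ms(d, 90.0):.3f} ms at 90 deg, "
          f"{delta_time_ms(d, 30.0):.3f} ms at 30 deg")
```

With these assumptions, a 30 cm separation gives about 0.87 ms for a source at 90°, and the sign of the result flips for sources on the other side of the pair.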
This time-of-arrival difference is plotted for various microphone
separations in Figures 9.70 through 9.73. Note that the general curve formed by
this calculation is a simple sine wave, scaled by the separation between the
microphones. Also note that the value of ∆Time is 0 ms for sound sources
located at 0° and 180° and reaches its maximum magnitude for sound sources at 90° and -90°.
Figure 9.70: Interchannel time differences for a pair of spaced microphones in the horizontal plane
with a separation of 0 cm.
Three-dimensional analysis
One of the big problems with looking at microphone polar responses in
only the horizontal plane is that our sound sources are usually not
restricted to those two dimensions. Invariably, we tend to raise or lower the
microphone stand to obtain a different direct-to-reverberant ratio, for example,
without considering that we're also changing the vertical angle of incidence
to the microphone pair. In almost all cases, this has significant effects on
the response of the pair, which can be seen in a three-dimensional analysis.
In order to include vertical angles in our calculations of microphone
sensitivity, we need to use a little spherical trigonometry (not to worry,
though). The easiest way to do this is to follow the instructions below:
Figure 9.71: Interchannel time differences for a pair of spaced microphones in the horizontal plane with a separation of 15 cm.
Figure 9.72: Interchannel time differences for a pair of spaced microphones in the horizontal plane with a separation of 30 cm.
Figure 9.73: Interchannel time differences for a pair of spaced microphones in the horizontal plane with a separation of 45 cm.
Figure 9.74: Interchannel time differences vs. microphone separation for a pair of spaced microphones in the horizontal plane.
3. Rotate your right index finger 45° upwards. Make sure that this new
rotation is in a vertical plane, at 90° to the horizontal plane.
4. The question we now need to ask is, “What is the angle subtended by
your two fingers?”
The answer to this last question is actually pretty easy. If we call the
horizontal angle of rotation ϑ, matching the angle in the horizontal plane
we talked about earlier, and the vertical angle of rotation φ, then the total
resulting angle γ can be calculated using Equation 9.5.
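As a worked sketch, a standard spherical-trigonometry relation for γ is shown below; it is assumed here, not quoted, to be the form of Equation 9.5. The example assumes the finger exercise leaves the right index finger rotated 45° in the horizontal plane before the 45° upward rotation, so ϑ = φ = 45°.

```latex
\cos\gamma = \cos\vartheta \, \cos\varphi
\quad\Longrightarrow\quad
\gamma = \arccos\left(\cos\vartheta \, \cos\varphi\right)

% Example: \vartheta = \varphi = 45^\circ
\cos\gamma = \cos 45^\circ \cos 45^\circ
           = \tfrac{1}{\sqrt{2}} \cdot \tfrac{1}{\sqrt{2}} = \tfrac{1}{2}
\quad\Longrightarrow\quad \gamma = 60^\circ
```

For a microphone whose axis is itself rotated in the horizontal plane, ϑ would be the horizontal angle between the source and that axis.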
There’s one more thing. In order to simplify our lives a bit, I’m going to
restrict the included angle between the microphones to the horizontal plane.
Basically, this means that, for all three-dimensional analyses in this paper,
we’re thinking that we have a pair of microphones that’s set up parallel to the
floor, and that the instrument can be at any angle of incidence to the pair.
That angle of incidence is comprised of an angle of rotation and an angle of
elevation. We’re not going to have the luxury of tilting the microphone pair
on its side (unless you’re able to just think of that as moving the instrument
to a new position...).
So, in order to convert Equation 9.2 into a three-dimensional version, we
combine it with Equation 9.7, resulting in Equation 9.8.
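A three-dimensional version of the level-difference calculation can be sketched in Python. This sketch assumes the combined form S = P + G cos(ϑ + Ω) cos φ for a pair kept parallel to the floor, which follows from standard spherical trigonometry; it is an assumption about the shape of Equation 9.8, not a quotation of it. It also reuses the hypothetical sign convention in which Ω = -45° aims a microphone to the right.

```python
import math


def sensitivity_3d(p: float, g: float, theta_deg: float, phi_deg: float,
                   omega_deg: float) -> float:
    """First-order sensitivity with elevation; assumes cos(gamma) = cos(theta + Omega) * cos(phi)."""
    cos_gamma = math.cos(math.radians(theta_deg + omega_deg)) * math.cos(math.radians(phi_deg))
    return p + g * cos_gamma


def delta_amp_3d(theta_deg: float, phi_deg: float, p: float = 0.5, g: float = 0.5,
                 included_deg: float = 90.0) -> float:
    """Right-over-left level difference in dB for a horizontal coincident pair."""
    half = included_deg / 2
    s_right = abs(sensitivity_3d(p, g, theta_deg, phi_deg, -half))  # assumed aimed right
    s_left = abs(sensitivity_3d(p, g, theta_deg, phi_deg, +half))   # assumed aimed left
    if s_left == 0.0 or s_right == 0.0:
        return math.inf if s_left == 0.0 else -math.inf  # a null in one microphone
    return 20 * math.log10(s_right / s_left)


# A source at 90 degrees of rotation, viewed from increasing elevations.
for elevation in (0, 30, 60, 90):
    print(f"elevation {elevation:2d} deg: {delta_amp_3d(90, elevation):5.2f} dB")
```

For a 90° cardioid pair, the difference shrinks steadily as the elevation grows and reaches 0 dB when the source is directly overhead.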
Figure 9.75: A three-dimensional view of a microphone showing the microphone angle of rotation,
Ω, the angle of rotation of the sound source to the pair, ϑ, and the elevation angle of the sound
source, φ.
Figure 9.76: A comparison of the geographic and spherical coordinate systems. The point (ϑ, φ) in
the geographic coordinate system shown on the left is identical to the point (α, δ) in the spherical
coordinate system on the right.
Cardioids
We’ll begin with a pair of coincident cardioid microphones. Figures 9.77
and 9.78 show plots of the interchannel amplitude difference of a pair of
cardioids with an included angle of 90◦ . (Unfortunately, if you’re looking
at a black and white printout of this paper, you’re going to have a bit of a
problem seeing things... sorry...)
Take a look at the spherical plot in Figure 9.77. If we look at the values
in dB plotted on this sphere around its equator, we'd get exactly the same
graph as is shown in Figure 9.49. Now, however, we can see that changes in
the vertical angle might have an effect on the way things sound. For example,
if we have a sound source with a vertical elevation of 0° and a horizontal
rotation of 90°, then the interchannel amplitude difference is about 15 dB,
give or take (to get this number, I cheated and looked back at Figure 9.49).
Now, if we maintain that angle of rotation and change the vertical angle,
we reduce this difference until, with a vertical angle of 90°, the interchannel
amplitude difference is 0 dB; in other words, if the sound source is directly
above the pair, the outputs of the two mics are the same.
What effect does this have on our phantom image location in the real
world? Let's go do a live-to-two-track recording where we set up a 90° pair
of cardioids at eye level in front of an orchestra at the conductor's position.
This means that your principal double bass player is at a vertical angle of
0° and a horizontal angle of 90°, so that instrument will be about 15 dB
louder in the right channel than in the left. This means that its phantom
image will probably be parked in the right speaker. We start to listen to the
sound and we decide that we're too close to the orchestra, and since we
read a rule from an unreliable source that concert halls always sound much
better if you go up about 4 or 5 m, we'll raise the mic stand way up. This
means that the double bass is still at an angle of rotation of 90°, but now
at a vertical angle of about 60° (we went really high). This means that the
interchannel amplitude difference for the instrument has dropped to about
6 dB, which would put it well inside the right loudspeaker in the recording.
So, the moral of the story is, if you have a pair of coincident cardioids
and you raise your mic stand without pointing the mic’s downwards to
compensate, your orchestra gets narrower in the stereo image. Also, don’t
forget that this doesn’t just apply to two-channel stuff. We could just as
easily be talking about the image between your surround loudspeakers.
Figure 9.78 shows a contour map of the same plot shown in Figure 9.77
which makes it a little easier to read. Now, the two angles are the X- and
Y-axes of the graph, and the lines indicate the angles where a given inter-
channel amplitude difference occurs. From this, you can see that, unless
your rotation is 0◦ or 180◦, any change in the vertical angle, upwards or
downwards, will reduce your interchannel difference. This is true for all
included angles (except for 0◦) of a pair of coincident cardioids.
There’s one more thing to consider here, and that is the fact that the
microphones are not just recording an instrument – they’re recording the
room as well. So, you have to keep in mind that, in the case of coincident
cardioids, all reflections and reverberation that come from above or below the
microphones will tend to pull towards the centre of your loudspeaker pair.
Again, the greater the angle of elevation, the more the image collapses.
Subcardioids
Subcardioids have a pretty predictable behaviour, now that we’ve looked
at the response patterns of cardioid microphones. The only big difference is
that, as we’ve seen before, the interchannel amplitude difference never goes
outside the ±6 dB window. Apart from that, the responses are not too
different.
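A brute-force scan over the sphere illustrates that limit. This is a sketch under an assumption of ours: the subcardioid is modelled as S = 0.75 + 0.25 cos θ, whose 2:1 ratio between maximum and minimum sensitivity gives a ceiling of 20 log10(2), or about 6.02 dB. A different subcardioid definition would give a slightly different ceiling.

```python
import math

P, G = 0.75, 0.25  # assumed subcardioid coefficients: S = P + G*cos(theta)

def sens(rotation_deg, elevation_deg, axis_deg):
    """Sensitivity for a microphone whose axis lies in the horizontal plane."""
    cos_theta = (math.cos(math.radians(elevation_deg))
                 * math.cos(math.radians(rotation_deg - axis_deg)))
    return P + G * cos_theta  # always >= 0.5 for these coefficients

def max_abs_diff_db(included_deg, step=2):
    """Largest |interchannel amplitude difference| found on a grid over the sphere."""
    worst = 0.0
    for rot in range(-180, 181, step):
        for elev in range(-90, 91, step):
            right = sens(rot, elev, included_deg / 2)
            left = sens(rot, elev, -included_deg / 2)
            worst = max(worst, abs(20 * math.log10(right / left)))
    return worst

results = {angle: max_abs_diff_db(angle) for angle in (45, 90, 135, 180)}
for angle, worst in results.items():
    print(f"included angle {angle:3d} deg: max |dAmp| = {worst:.2f} dB")
```

The worst case is the 180-degree pair with the source on one microphone's axis, which reaches the 6 dB ceiling exactly.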
Figure 9.77: Interchannel amplitude difference response (in dB) for a pair of coincident cardioid microphones with an included angle of 90◦.
Figure 9.78: Interchannel amplitude difference response for a pair of coincident cardioid microphones with an included angle of 90◦.
Figure 9.79: Interchannel amplitude difference response for a pair of coincident cardioid microphones with an included angle of 45◦.
Figure 9.80: Interchannel amplitude difference response for a pair of coincident cardioid microphones with an included angle of 135◦.
Figure 9.81: Interchannel amplitude difference response for a pair of coincident cardioid microphones with an included angle of 180◦.
Figure 9.82: Interchannel amplitude difference response (in dB) for a pair of coincident subcardioid microphones with an included angle of 90◦.
Figure 9.83: Interchannel amplitude difference response for a pair of coincident subcardioid microphones with an included angle of 45◦.
Figure 9.84: Interchannel amplitude difference response for a pair of coincident subcardioid microphones with an included angle of 90◦.
Figure 9.85: Interchannel amplitude difference response for a pair of coincident subcardioid microphones with an included angle of 135◦.
Figure 9.86: Interchannel amplitude difference response for a pair of coincident subcardioid microphones with an included angle of 180◦.
[Contour plots: Angle of Rotation (deg) on the horizontal axis, Angle of Elevation (deg) on the vertical axis.]
Geoff Martin www.tonmeister.ca
Bidirectionals
Now, let’s compare those results with a pair of coincident bidirectional
microphones instead. One caveat before we begin: because we’re looking at
decibel scales, and because calculators don’t like being asked to find the
logarithm of a negative number, we’re again looking at the absolute value of
the sensitivity of the microphones.
This time, the plots tell a very different story. Notice how, in the spheri-
cal plot in Figure 9.87 and in the contour plots in Figures 9.88 through 9.90,
we get a bunch of vertical lines. The summary of what that means is that
the interchannel amplitude difference of a pair of bidirectional microphones
doesn’t change with changes in the angle of elevation. So you can raise
and lower the mic stand all you want without collapsing the image of the
orchestra. As we get further off axis (because we’ve changed the vertical
angle), the orchestra will get quieter, but it won’t pull to the centre of the
loudspeakers. This is a good thing.
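The reason is easy to see numerically. In this sketch (assuming an ideal figure-eight, S = cos θ, with the same geometry conventions as for the cardioids), the elevation enters both sensitivities as the same factor cos(elevation), which cancels in the ratio. The helper names are hypothetical.

```python
import math

def bidirectional(rotation_deg, elevation_deg, axis_deg):
    """Figure-eight sensitivity S = cos(theta); the sign gives the lobe polarity."""
    return (math.cos(math.radians(elevation_deg))
            * math.cos(math.radians(rotation_deg - axis_deg)))

def amp_diff_db(rotation_deg, elevation_deg, included_deg=90):
    """Right-minus-left level difference in dB, using absolute sensitivities."""
    right = abs(bidirectional(rotation_deg, elevation_deg, included_deg / 2))
    left = abs(bidirectional(rotation_deg, elevation_deg, -included_deg / 2))
    return 20 * math.log10(right / left)

# Same rotation, increasingly high stand: the cos(elevation) factor cancels
diffs = [amp_diff_db(30, elev) for elev in (0, 30, 60, 80)]
print([round(d, 2) for d in diffs])
```

Both outputs still drop by 20 log10(cos(elevation)) dB, which is the "quieter but not narrower" behaviour described above.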
There is a small drawback here, though. Remember that if you have
a big vertical angle, then a small horizontal movement of the instrument
corresponds to a large change in the angle of rotation, so you can get a
violent swing in the image location if you’re not careful. For example, if
your bidirectional pair is really high and you’re recording a singer that likes
to sway back and forth, you might wind up with a phantom image that
is always swinging back and forth between the two speakers, making your
listeners seasick.
Also, note that with a pair of bidirectional microphones with an included
angle of 180 degrees, all angles of incidence produce the same sensitivity
magnitude in the two microphones (an interchannel amplitude difference of
0 dB); just remember that the two signals are of opposite polarity. If you
want to do this, use your “phase flip” button. It’s cheaper than a second
microphone.
Hypercardioids
Once again, hypercardioids exhibit properties that are recognizable as
being something between a cardioid and a bidirectional. If we look at the
spherical plot of a pair of coincident hypercardioids with an included an-
gle of 90◦ shown in Figure 9.92, we can see that there is a dividing line
along the side of the pair, similar to that found in a bidirectional pair. Just
like the bidirectionals, this follows the null point in one of the two micro-
phones, the dividing line between the front and rear lobes. However, like
the cardioid pair, notice that vertical changes alter the interchannel ampli-
tude difference. There is one big difference from the cardioids, however. In
the case of cardioids, a vertical change always results in a reduction in the
interchannel amplitude difference whereas, in the case of a hypercardioid
pair, it is possible to have a vertical change that produces an increase in the
interchannel amplitude difference. This is most easily visible in the contour
plot in Figure 9.94. Notice that if you start with a horizontal angle of 100
degrees, then a vertical change off the equator will cause the interchannel
amplitude difference to increase to ∞ dB (where the source crosses the null
of one of the two microphones) before it reduces back down to 0 dB at the
north or south pole.
Figure 9.87: Interchannel amplitude difference response (in dB) for a pair of coincident bidirectional microphones with an included angle of 90◦.
Figure 9.88: Interchannel amplitude difference response for a pair of coincident bidirectional microphones with an included angle of 45◦.
Figure 9.89: Interchannel amplitude difference response for a pair of coincident bidirectional microphones with an included angle of 90◦.
Figure 9.90: Interchannel amplitude difference response for a pair of coincident bidirectional microphones with an included angle of 135◦.
Figure 9.91: Interchannel amplitude difference response for a pair of coincident bidirectional microphones with an included angle of 180◦.
[Contour plots: Angle of Rotation (deg) on the horizontal axis, Angle of Elevation (deg) on the vertical axis.]
There are three principal practical issues to consider here. Firstly, re-
member that a change in the height of your mic stand with a pair of hyper-
cardioids will change the apparent width of your sound stage. Unlike the
case of cardioids, however, the change might wind up increasing the width
of some components while simultaneously decreasing the width of others.
So you wind up squeezing together the centre part of the orchestra while
you pull apart the sides.
The second issue to consider is similar to that with cardioids. Don’t
forget that you’ve got sound coming in from all angles at the same time - so
it’s possible that some parts of your room sound will be pulled wider while
others are pushed closer together.
Thirdly, there’s just the now-repetitious reminder that a lot of the signals
coming into the pair are arriving at the rear lobes of the microphones, so
you’re going to have signals that are either of opposite polarity in the two
channels or, when they arrive at the rear lobes of both microphones, of the
same (inverted) polarity in both.
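The rise-then-fall behaviour at a rotation of 100 degrees can be reproduced with a short sketch. It assumes a hypercardioid modelled as S = 0.25 + 0.75 cos θ (our assumption; its null lies at about 109.5 degrees off axis) and a 90-degree included angle. The function names are hypothetical.

```python
import math

P, G = 0.25, 0.75  # assumed hypercardioid coefficients: S = P + G*cos(theta)

def hyper(rotation_deg, elevation_deg, axis_deg):
    """Signed sensitivity; negative values come from the rear lobe."""
    cos_theta = (math.cos(math.radians(elevation_deg))
                 * math.cos(math.radians(rotation_deg - axis_deg)))
    return P + G * cos_theta

def amp_diff_db(rotation_deg, elevation_deg, included_deg=90):
    """Right-minus-left level difference in dB, using absolute sensitivities."""
    right = abs(hyper(rotation_deg, elevation_deg, included_deg / 2))
    left = abs(hyper(rotation_deg, elevation_deg, -included_deg / 2))
    if left < 1e-12:
        return math.inf  # source sits on the left microphone's null
    return 20 * math.log10(right / left)

for elev in (0, 30, 45, 60, 65, 75, 90):
    print(f"elevation {elev:2d} deg: {amp_diff_db(100, elev):6.2f} dB")
```

Under this model the difference grows from about 5.4 dB on the equator, becomes unbounded near 66 degrees of elevation (where the source crosses the left microphone's null), and returns to 0 dB at the pole.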
Spaced omnidirectionals
Calculation of the interchannel time of arrival differences for a pair
of spaced microphones in a three-dimensional world requires only a small
change to Equation 9.3 as can be seen in Equation 9.10.
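The following is a sketch of our own, which may differ in form from Equation 9.10. It assumes far-field (plane-wave) arrival, microphones on the left-right axis, and an extra path to the farther microphone of d · sin(rotation) · cos(elevation). The speed of sound of 344 m/s is also an assumed value.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, assumed value

def time_difference_ms(separation_m, rotation_deg, elevation_deg, c=SPEED_OF_SOUND):
    """Far-field interchannel time difference in milliseconds.
    Extra path to the farther microphone: d * sin(rotation) * cos(elevation)."""
    extra_path = (separation_m * math.sin(math.radians(rotation_deg))
                  * math.cos(math.radians(elevation_deg)))
    return 1000.0 * extra_path / c

for d in (0.15, 0.30, 0.45):
    print(f"{d * 100:.0f} cm: {time_difference_ms(d, 90, 0):.2f} ms on the equator, "
          f"{time_difference_ms(d, 90, 60):.2f} ms at 60 deg elevation")
```

Raising the source to 60 degrees of elevation halves the time difference, because cos(60 degrees) = 0.5.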
Figure 9.92: Interchannel amplitude difference response (in dB) for a pair of coincident hypercardioid microphones with an included angle of 90◦.
Figure 9.93: Interchannel amplitude difference response for a pair of coincident hypercardioid microphones with an included angle of 45◦.
Figure 9.94: Interchannel amplitude difference response for a pair of coincident hypercardioid microphones with an included angle of 90◦.
Figure 9.95: Interchannel amplitude difference response for a pair of coincident hypercardioid microphones with an included angle of 135◦.
Figure 9.96: Interchannel amplitude difference response for a pair of coincident hypercardioid microphones with an included angle of 180◦.
[Contour plots: Angle of Rotation (deg) on the horizontal axis, Angle of Elevation (deg) on the vertical axis.]
Figure 9.97: Interchannel time differences in ms for a pair of spaced microphones with a separation of 15 cm.
Figure 9.98: Interchannel time differences in ms for a pair of spaced microphones with a separation of 30 cm.
Figure 9.99: Interchannel time differences in ms for a pair of spaced microphones with a separation of 45 cm.
Figure 9.100: Interchannel time differences in ms for a pair of spaced microphones with a separation of 40 cm.
[Contour plots with Angle of elevation (deg) on the vertical axis.]
Horizontal plane
Cardioids
Figure 9.101 shows the summed power response of a pair of coincident
cardioid microphones with an included angle of 90◦. As you can see, the total
power for sources with an angle of incidence of 0◦ is about 2 dB. As you rotate
away from the front of the pair, the summed power drops to a minimum of
about -14 dB directly behind. Remember from the previous chapter that
the 0◦ and 180◦ locations in the horizontal plane are the two positions where
the interchannel amplitude difference is 0 dB, so instruments in these
two locations will result in phantom images between the two loudspeakers.
However, we can now see that the two images will differ in power by
approximately 15 dB, with sources in the front of the microphone pair being
louder than those behind.
The range of the summed power for a pair of cardioids changes with
the included angle as is shown in Figures 9.101 through 9.104. In fact, the
smaller the included angle, the greater the range. As can be seen in Figure
9.102, the range of summed power is approximately 28 dB compared to only
3 dB for an included angle of 180◦ . Also note that for larger included angles,
there are two symmetrical peaks in the power response rather than one at
an angle of incidence of 0◦ .
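These summed power figures can be approximated with a few lines of code. The sketch assumes that "summed power" means 10 log10(S1² + S2²) for a cardioid modelled as S = 0.5 + 0.5 cos θ, scanned in 1-degree steps around the horizontal plane; both the definition and the helper names are our assumptions.

```python
import math

def cardioid(theta_deg):
    """Cardioid sensitivity S = 0.5 + 0.5*cos(theta)."""
    return 0.5 + 0.5 * math.cos(math.radians(theta_deg))

def summed_power_db(incidence_deg, included_deg):
    """10*log10(S1^2 + S2^2) for a coincident cardioid pair in the horizontal plane."""
    s1 = cardioid(incidence_deg - included_deg / 2)
    s2 = cardioid(incidence_deg + included_deg / 2)
    return 10 * math.log10(s1 ** 2 + s2 ** 2)

def power_range_db(included_deg):
    """Max minus min summed power, scanned in 1-degree steps."""
    values = [summed_power_db(a, included_deg) for a in range(-180, 181)]
    return max(values) - min(values)

front, back = summed_power_db(0, 90), summed_power_db(180, 90)
print(f"90 deg pair: front {front:.1f} dB, back {back:.1f} dB")
for angle in (45, 90, 135, 180):
    print(f"included angle {angle:3d} deg: range {power_range_db(angle):.1f} dB")
```

With these assumptions the 90-degree pair gives about +1.6 dB at the front and about -13.7 dB at the back, and the range grows from about 3 dB (180 degrees) to about 28 dB (45 degrees).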
Each of these analyses, in conjunction with their corresponding inter-
channel amplitude difference plot for the same included angle, gives us an
indication of the general distribution of energy across the reproduced sound
stage. For example, if we look at a pair of coincident cardioids with an
included angle of 45◦, we can see that instruments, reflections and rever-
beration with an angle of incidence of 0◦ are much louder than those away
from the front of the pair. In addition, we can see from the ∆Amp. plot
that a large portion of sources around the pair will image near the centre
position between the loudspeakers. Consequently, the resulting sound stage
appears to “clump” in the middle rather than being spread evenly across
the playback room.
Figure 9.101: Summed power response for a pair of coincident cardioid microphones in the horizontal plane with an included angle of 90◦.
Figure 9.102: Summed power response for a pair of coincident cardioid microphones in the horizontal plane with an included angle of 45◦.
Figure 9.103: Summed power response for a pair of coincident cardioid microphones in the horizontal plane with an included angle of 135◦.
Figure 9.104: Summed power response for a pair of coincident cardioid microphones in the horizontal plane with an included angle of 180◦.
[Plots: Angle of incidence (deg) on the horizontal axis, Summed power (dB) on the vertical axis.]
In addition, for smaller included angles, it can be seen that much more
emphasis is placed on sound sources and reflections in the front of the pair
with sources to the rear attenuated.
Subcardioids
Figures 9.105 through 9.108 show the summed power plots for the hor-
izontal plane of a pair of subcardioid microphones. As is evident, there is
a much smaller range of values than is seen in the cardioid microphones;
however, the general shapes of the curves are similar. As can be seen in these
plots, there is a more evenly distributed sensitivity to sound sources and
reflections around the microphone pair, however, due to the limited range of
values for ∆Amp., these sources typically image between the loudspeakers
as well.
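The same scan can compare the two patterns directly. As before, this is a sketch that assumes summed power = 10 log10(S1² + S2²); it also models the subcardioid as S = 0.75 + 0.25 cos θ, which is our assumption rather than a value given here.

```python
import math

def first_order(p, g, theta_deg):
    """First-order pattern S = P + G*cos(theta)."""
    return p + g * math.cos(math.radians(theta_deg))

def power_range_db(p, g, included_deg):
    """Range (max - min) of 10*log10(S1^2 + S2^2) around the horizontal plane."""
    values = []
    for a in range(-180, 181):
        s1 = first_order(p, g, a - included_deg / 2)
        s2 = first_order(p, g, a + included_deg / 2)
        values.append(10 * math.log10(s1 ** 2 + s2 ** 2))
    return max(values) - min(values)

for angle in (45, 90, 135, 180):
    card = power_range_db(0.5, 0.5, angle)    # cardioid
    sub = power_range_db(0.75, 0.25, angle)   # assumed subcardioid
    print(f"{angle:3d} deg: cardioid {card:5.1f} dB, subcardioid {sub:4.1f} dB")
```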
Figure 9.105: Summed power response for a pair of coincident subcardioid microphones in the horizontal plane with an included angle of 45◦.
[Plot: Angle of incidence (deg) on the horizontal axis, Summed power (dB) on the vertical axis.]
Bidirectionals
Due to the symmetrical double lobes of bidirectional microphones, they
exhibit a considerably different power response as can be seen in Figures
9.109 through 9.112. When the included angle of the microphones is 90◦ ,
Figure 9.106: Summed power response for a pair of coincident subcardioid microphones in the horizontal plane with an included angle of 90◦.
Figure 9.107: Summed power response for a pair of coincident subcardioid microphones in the horizontal plane with an included angle of 135◦.
Figure 9.108: Summed power response for a pair of coincident subcardioid microphones in the horizontal plane with an included angle of 180◦.
Figure 9.109: Summed power response for a pair of coincident bidirectional microphones in the horizontal plane with an included angle of 45◦.
Figure 9.110: Summed power response for a pair of coincident bidirectional microphones in the horizontal plane with an included angle of 90◦.
Figure 9.111: Summed power response for a pair of coincident bidirectional microphones in the horizontal plane with an included angle of 135◦.
Figure 9.112: Summed power response for a pair of coincident bidirectional microphones in the horizontal plane with an included angle of 180◦.
Figure 9.113: Summed power response for a pair of coincident hypercardioid microphones in the horizontal plane with an included angle of 45◦.
Figure 9.114: Summed power response for a pair of coincident hypercardioid microphones in the horizontal plane with an included angle of 90◦.
Figure 9.115: Summed power response for a pair of coincident hypercardioid microphones in the horizontal plane with an included angle of 135◦.
Figure 9.116: Summed power response for a pair of coincident hypercardioid microphones in the horizontal plane with an included angle of 180◦.
[Plots: Angle of incidence (deg) on the horizontal axis, Summed power (dB) on the vertical axis.]
omnidirectional. Both cases happen frequently, but we’ll stick with the as-
sumption for this paper. As a result, we’ll assume that the summed power
output for a pair of omnidirectional microphones is +3 dB relative to either
of the microphones for all angles of rotation and elevation.
Three-dimensional analysis
As before, we can only get a complete picture of the response of the micro-
phones by looking at a three-dimensional response plot.
Cardioids
Figure 9.117: Summed power response for a pair of coincident cardioid microphones with an included angle of 90◦.
Figure 9.118: Summed power response for a pair of coincident cardioid microphones with an included angle of 45◦.
Figure 9.119: Summed power response for a pair of coincident cardioid microphones with an included angle of 90◦.
Figure 9.120: Summed power response for a pair of coincident cardioid microphones with an included angle of 135◦.
Figure 9.121: Summed power response for a pair of coincident cardioid microphones with an included angle of 180◦.
Figure 9.122: Summed power response for a pair of coincident subcardioid microphones with an included angle of 90◦.
Figure 9.123: Summed power response for a pair of coincident subcardioid microphones with an included angle of 45◦.
Figure 9.124: Summed power response for a pair of coincident subcardioid microphones with an included angle of 90◦.
Figure 9.125: Summed power response for a pair of coincident subcardioid microphones with an included angle of 135◦.
Figure 9.126: Summed power response for a pair of coincident subcardioid microphones with an included angle of 180◦.
Figure 9.127: Summed power response for a pair of coincident bidirectional microphones with an included angle of 90◦.
[Contour plots: Angle of Rotation (deg) on the horizontal axis, Angle of Elevation (deg) on the vertical axis.]
microphone pair shows a number of horizontal lines. This means that if you
have a sound source at a given angle of elevation, changes in the angle of
rotation will have no effect on the apparent level of the sound source. This,
in turn, means that all sources at a given angle of elevation have the same
apparent total gain.
Additionally, notice that the total power is greatest in the horizontal
plane, with a value of 0 dB, and that the level decreases as we move away
from the equator.
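A quick calculation shows why the lines are horizontal for the 90-degree pair. For ideal figure-eights (S = cos θ, our modelling assumption), cos²(α − 45) + cos²(α + 45) = 1 for every rotation α, so the summed power reduces to 20 log10(cos φ), which depends on the elevation φ alone. The function name is hypothetical.

```python
import math

def summed_power_db(rotation_deg, elevation_deg, included_deg=90):
    """10*log10(S1^2 + S2^2) for a coincident bidirectional pair, S = cos(theta)."""
    ce = math.cos(math.radians(elevation_deg))
    s1 = ce * math.cos(math.radians(rotation_deg - included_deg / 2))
    s2 = ce * math.cos(math.radians(rotation_deg + included_deg / 2))
    return 10 * math.log10(s1 ** 2 + s2 ** 2)

for elev in (0, 30, 60):
    levels = [summed_power_db(rot, elev) for rot in range(-180, 180, 15)]
    print(f"elevation {elev:2d} deg: min {min(levels):.2f} dB, max {max(levels):.2f} dB")
```

The minimum and maximum coincide at each elevation: 0 dB on the equator and about -6 dB at 60 degrees.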
Figure 9.128: Summed power response for a pair of coincident bidirectional microphones with an included angle of 90◦.
Figure 9.129: Summed power response for a pair of coincident bidirectional microphones with an included angle of 45◦.
Figure 9.130: Summed power response for a pair of coincident bidirectional microphones with an included angle of 135◦.
Figure 9.131: Summed power response for a pair of coincident bidirectional microphones with an included angle of 180◦.
[Contour plots: Angle of Rotation (deg) on the horizontal axis, Angle of Elevation (deg) on the vertical axis.]
important thing to note here is that, although the location with the highest
summed power value is on the horizontal plane as in the case of the cardioid
microphones, the point of minimum power is between the equator and the
poles in all cases but an included angle of 180◦ .
9.4.4 Correlation
Finally, we’ll take a rather holistic view of the pair of microphones and take
a look at the correlation coefficient of their outputs. This can give us a
general idea of the similarity of the two signals which could be interpreted
as a sensation of spaciousness. We have to be very careful here in making
this jump between correlation and spaciousness as will be discussed below,
but first, we’ll look at exactly what is meant by the term “correlation.”
Correlation Coefficient
The term “correlation” is one that is frequently misused and, as a result, mis-
understood in the field of audio engineering. Consequently, some discussion
is required to define the term. Generally speaking, the correlation of two au-
dio signals is a measure of the relationship of these signals in the time domain
[Fahy, 1989]. Specifically, given two two-dimensional variables (in the case of
audio, the two dimensions are amplitude and time), their correlation
coefficient, r, is calculated using their covariance s_xy and their standard deviations
Figure 9.132: Summed power response for a pair of coincident hypercardioid microphones with an included angle of 90◦.
Figure 9.133: Summed power response for a pair of coincident hypercardioid microphones with an included angle of 45◦.
Figure 9.134: Summed power response for a pair of coincident hypercardioid microphones with an included angle of 90◦.
Figure 9.135: Summed power response for a pair of coincident hypercardioid microphones with an included angle of 135◦.
Figure 9.136: Summed power response for a pair of coincident hypercardioid microphones with an included angle of 180◦.
[Contour plots: Angle of Rotation (deg) on the horizontal axis, Angle of Elevation (deg) on the vertical axis.]
$$ r = \cos(\omega\tau) \tag{9.14} $$
where the radian frequency ω is defined in Equation 9.15 [Strawn, 1985]
and where τ is the time separation of the two sinusoids.
$$ \omega \triangleq 2\pi f \tag{9.15} $$
where the symbol ≜ denotes “is defined as” and f is the frequency in
Hz.
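Equation 9.14 can be checked numerically by correlating two sampled sinusoids. The sketch below computes the correlation coefficient as the covariance divided by the product of the standard deviations (the 1/N normalizing factors cancel, so plain sums are used) over a whole number of periods. The sample rate, frequency and delay are arbitrary example values chosen by us.

```python
import math

def pearson(x, y):
    """Correlation coefficient r = s_xy / (s_x * s_y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

fs = 48000      # sample rate in Hz (example value)
f = 100.0       # frequency in Hz
tau = 0.001     # time separation in seconds
n = fs          # one second of signal = a whole number of periods
x = [math.sin(2 * math.pi * f * k / fs) for k in range(n)]
y = [math.sin(2 * math.pi * f * (k / fs - tau)) for k in range(n)]

omega = 2 * math.pi * f
print(round(pearson(x, y), 4), round(math.cos(omega * tau), 4))  # both approximately 0.809
```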
Further investigation of the topic of correlation highlights a number of
interesting points. Firstly, two signals of identical frequency and phase have
So what?
In the field of perception of concert hall acoustics, it has long been known
that there is a link between Interaural Cross Correlation (IACC) and a per-
ceived sensation of diffuseness and auditory source width (ASW) [Ando, 1998].
(IACC is a measure of the cross correlation between the signals at your two
ears.) The closer the IACC approaches 1, the lower the subjective impres-
sion of diffuseness and ASW. The lower the IACC, the more diffuse and
wider the perceived sound field.
One of the nice things about recording is that you can control the IACC
at the listening position by controlling the interchannel correlation coef-
ficient. Although the interchannel correlation coefficient doesn’t directly
correspond to the IACC, they are related. In order to figure out the exact
relationship between these two measurements, you’ll also need to know a
little bit about the room that the speakers are in.
There are a couple of things to remember about interchannel correlation
before we start talking about microphone response characteristics.
3. When the correlation coefficient is -1, you have two signals that are
identical in every respect except polarity and possibly level.
Free field
A free-field situation is one where the waveform is free to expand outwards
forever. This typically happens only in thought experiments; however, we
usually assume that the situation exists in anechoic chambers and on the
top of poles outdoors. To extend our assumptions even further, we can
treat the direct sound from a sound source received at a microphone as a
free-field source. Consequently, the analysis of a microphone pair in a free
field becomes applicable to real life.
Coincident pairs
If we convert Equation 9.13 to something that represents the signal at
the two microphones, then we wind up with Equation 9.16 below.
$$ r_{\alpha,\phi} = \frac{S_1 S_2}{\sqrt{S_1^2}\,\sqrt{S_2^2}} \tag{9.16} $$
where r is the correlation coefficient of the outputs of the two micro-
phones with sensitivities S1 and S2 for the angles of rotation α and elevation
φ.
Luckily, this can be simplified to Equation 9.17.
located at the null point of at least one of the microphones will produce a
correlation coefficient of 0. For sources located in positions where the
receiving lobes have opposite polarities (for example, to the side of a
Blumlein pair of bidirectionals), the correlation coefficient will be -1.
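For a single source, the correlation of a coincident pair therefore depends only on the signs of the two sensitivities. A minimal sketch for a Blumlein pair (ideal figure-eights at plus and minus 45 degrees, horizontal plane only) evaluates Equation 9.16 directly and returns 0 at a null, as described above; the tolerance eps is our own choice to absorb floating-point round-off, and the function names are hypothetical.

```python
import math

def bidirectional(rotation_deg, axis_deg):
    """Horizontal-plane figure-eight sensitivity, S = cos(theta)."""
    return math.cos(math.radians(rotation_deg - axis_deg))

def coincident_correlation(s1, s2, eps=1e-12):
    """S1*S2 / (sqrt(S1^2) * sqrt(S2^2)) for a single source.
    Returns 0 when the source sits (numerically) on a null of either microphone."""
    if abs(s1) < eps or abs(s2) < eps:
        return 0.0
    return (s1 * s2) / (math.sqrt(s1 ** 2) * math.sqrt(s2 ** 2))

# Blumlein pair: bidirectionals at +45 and -45 degrees
for rotation in (0, 45, 90):
    r = coincident_correlation(bidirectional(rotation, 45), bidirectional(rotation, -45))
    print(rotation, round(r, 6))
```

The three rotations correspond to a source straight ahead (same polarity), on one microphone's null, and to the side of the pair (opposite polarities).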
Spaced omnidirectionals
As was discussed above, spaced omnidirectional microphones are used
under the (usually incorrect) assumption that the only difference between
the two microphones in a free field situation will be their time of arrival. As
was shown in Figure 9.69, this time separation is dependent on the spatial
separation of the two microphones and the angles of rotation and elevation
of the source to the pair.
The result of this time of arrival difference caused by the extra propa-
gation distance is a frequency-dependent phase difference between the two
channels. This interchannel phase difference ωτ can be calculated for a given
additional propagation distance using Equation 9.18.
$$\omega\tau = \frac{2\pi D}{\lambda} \qquad (9.18)$$
where D is the additional propagation distance and λ is the acoustic wavelength in air.
This can be further adapted to the issue of microphone separation and
angle of incidence by combining Equations 9.3 and 9.18, resulting in Equa-
tion 9.19.
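Equation 9.18 is easy to evaluate numerically. Here is a minimal sketch, assuming a speed of sound of 344 m/s so that λ = c/f, with D in metres; the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, assumed value for air at room temperature

def interchannel_phase(extra_distance_m, frequency_hz, c=SPEED_OF_SOUND):
    """Equation 9.18: omega*tau = 2*pi*D / lambda, returned in radians."""
    wavelength = c / frequency_hz
    return 2.0 * math.pi * extra_distance_m / wavelength

# The same 10 cm of extra path gives a larger phase difference at higher frequencies.
for f in (100.0, 1000.0, 10000.0):
    phase = interchannel_phase(0.10, f)
    print(f"{f:7.0f} Hz: {math.degrees(phase):8.1f} degrees")
```

Since ωτ = 2πD/λ = 2πfD/c, a fixed extra distance D behaves like a fixed delay τ = D/c, so the phase difference grows linearly with frequency.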
Diffuse field
Now we have to talk about what a diffuse field is. If we get into the offi-
cial definition of a diffuse field, then we have to have a talk about things
like infinity, plane waves, phase relationships and probability distribution...
maybe some other time... Instead, let’s think about a diffuse field in a
couple of different, equally acceptable ways. One way is to think that you
have sound coming from everywhere simultaneously. Another way is to think that
you have sound coming from different directions in succession with no time
in between their arrivals.
If we think of reverberation as a very, very big number of reflections
coming from all directions in fast succession, then we can start to think of
what a diffuse field is like. Typically, we like to think of reverberation as a
diffuse field – this is particularly true for the people that make digital reverb
units because it’s much easier to create random messes that sort of sound
like reverb than it is to calculate everything that happens to sound as it
bounces around a room for a couple of seconds.
We need to pay a lot of attention to the correlation coefficient of the
diffuse component of the recorded signal. This can be used as a rough
guide to the overall sense of “spaciousness” (or whatever word you wish
to use – this area creates a lot of discussion) in your recording. If you
have a correlation coefficient of 1, this will probably mean that you have a
reverberant sound that is completely clumped into one location between the
two loudspeakers. The only possible exception to this is if your signals are
going to the adjacent pair of front and surround loudspeakers (i.e. Left and
Left Surround) where you’ll find it very difficult to obtain a single phantom
location.
If your correlation coefficient is -1, then you have what most people call
two “out of phase” signals, but what they really are is identical signals with
opposite polarity.
If your correlation coefficient is 0, then there could be a number of dif-
ferent explanations behind the result. For example, a pair of coincident
bidirectionals with an included angle of 90° will have a correlation coefficient
of 0. If we broke the signals hitting the two diaphragms into individual
sounds from an infinite number of sources, then each one would have a corre-
lation coefficient of either 1 or -1, but since there are as many 1’s as -1’s, the
whole diffuse field averages out to a total correlation of 0. Although the two
signals appear to be completely uncorrelated according to the math, there
will be an even distribution of sound between the speakers (because there
are some components in there that have a correlation of 1, remember...).
On the other hand, let’s take two omnidirectional microphones and put
them very, very far apart. To start, let’s put them in completely different
rooms. The two signals are then completely unrelated, so the correlation
coefficient will be 0 and you’ll get an image with no phantom sources
at all, just two loudspeakers, each producing its own pocket of sound. The same is
true if you place the omnis very far apart in the same concert hall (you’ll
sometimes see engineers doing this for their ambience microphones). The
resulting correlation coefficient, as we’ll see below, will also be 0 because the
sound fields at the two locations will sound similar, but they’ll be completely
unrelated. The result is a hall with a very large hole in the middle – because
there are no correlated components in the two signals, there cannot be an
even spread of energy between the loudspeakers.
The moral of the story here is that, in order to keep a “spacious” sound
for your reverb, you have to keep your correlation coefficient close to or equal
to 0, but you can’t just rely on that one number to tell you everything.
Spacious isn’t necessarily pretty, or believable...
Coincident pairs
Calculating the correlation of the outputs of a pair of coincident micro-
phones is somewhat less than simple. In fact, at the moment, I have to
confess that I really don’t know the correct equation for doing this. I’ve
searched for this piece of information in all of my books, and I’ve asked
everyone that I think would know the answer, and I haven’t found it yet.
So, I wrote some MATLAB code to model the situation instead of doing
the math the right way. In other words, I did a numerical calculation to
produce the plots in Figures 9.137 and 9.138, but it should still give us the
right answer.
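The MATLAB code itself isn’t reproduced here. As a hedged stand-in, this Python sketch performs the same kind of numerical calculation under explicit assumptions: horizontal plane only, the diffuse field modelled as many uncorrelated, equal-power plane waves evenly spaced in azimuth, and first-order patterns P + G cos(angle) with G = 1 - P. It is not the author’s code, and the names are hypothetical.

```python
import math

def first_order(p, angle_rad):
    """First-order polar pattern: P + G*cos(angle), with G = 1 - P."""
    return p + (1.0 - p) * math.cos(angle_rad)

def diffuse_correlation(included_deg, p, n_directions=3600):
    """Estimate the correlation coefficient of a coincident pair in a
    horizontal-plane diffuse field.

    Assumed model: n_directions uncorrelated plane waves of equal power,
    evenly spaced in azimuth. Because the waves are uncorrelated, only the
    products of each wave with itself survive the averaging, so
    r = sum(S1*S2) / sqrt(sum(S1^2) * sum(S2^2)).
    """
    half = math.radians(included_deg) / 2.0
    cross = power1 = power2 = 0.0
    for k in range(n_directions):
        theta = 2.0 * math.pi * k / n_directions
        s1 = first_order(p, theta - half)   # microphone aimed at +half
        s2 = first_order(p, theta + half)   # microphone aimed at -half
        cross += s1 * s2
        power1 += s1 * s1
        power2 += s2 * s2
    return cross / math.sqrt(power1 * power2)

for p, name in ((1.0, "omni"), (0.5, "cardioid"), (0.0, "bidirectional")):
    row = [diffuse_correlation(angle, p) for angle in (0, 90, 180)]
    print(name, ["%+.3f" % r for r in row])
```

Omnis come out at 1 for any angle, and bidirectionals at 90° and 180° give 0 and -1, which agrees with the discussion of coincident bidirectionals above. For this simplified model only, the loop also matches a closed form, r = (P² + ½G² cos γ)/(P² + ½G²) with γ the included angle; this sketch says nothing about whether that carries over to a full three-dimensional diffuse field.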
Figure 9.137: Correlation coefficients in the horizontal plane of a diffuse field for coincident omnidirectionals (top), subcardioids, cardioids, hypercardioids and bidirectionals (bottom) with included angles from 0° through 180°. [Plot: correlation coefficient (-1 to 1) vs. included angle (0° to 180°).]
Spaced omnidirectionals
If we have a pair of omnidirectionals spaced apart in a diffuse field, then
we can intuitively get an idea of what their correlation coefficient will be. At
0 Hz, the pressure at the two microphone locations will be the same.
This is because all of the sound pressure variations in the room are riding on
the day’s barometric pressure which is, for our purposes, a 0 Hz component. At very low
frequencies, the wavelengths of the sound waves going past the microphones
will be much longer than the distance between the mic’s. As a result, the two
signals will be very correlated because the phase difference between the
mic’s is small. As we go higher and higher in frequency, then the correlation
should be less and less, until, at some high frequency, the wavelengths are
much smaller than the microphone separation. This means that the two
signals will be completely unrelated and the correlation coefficient goes to
0.
In fact, the relationship is a little more complicated than that, but not
much. According to Kuttruff [Kutruff, 1991], the correlation coefficient of
two spaced locations in a theoretical diffuse field can be calculated using
Equation 9.22.
$$r = \frac{\sin(kd)}{kd} \qquad (9.22)$$
where k is the “wave number” and d is the distance between the two microphones. The wave number is a slightly different way of saying
“frequency,” as can be seen in Equation 9.23 below (also see Equation 9.20).
$$k = \frac{2\pi f}{c} \qquad (9.23)$$
where f is the frequency and c is the speed of sound.
Figure 9.138: Correlation coefficients in the horizontal plane of a diffuse field for coincident microphones with an included angle of 180° and various values of P (remember that G = 1 - P). [Plot: correlation coefficient vs. value of P (0 to 1).]
Note that k is proportional to frequency and therefore inversely proportional
to wavelength.
If we were to calculate the correlation coefficient for a given microphone
separation and all frequencies, the plot would look like Figure 9.139. Note
that changes in the distance between the mic’s will only change the frequency
scale of the plot – the closer the mic’s are to each other, the higher the
frequency range of the area of high correlation.
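Equations 9.22 and 9.23 can be combined into a short function. Here is a minimal sketch, assuming c = 344 m/s and the 30 cm spacing of Figure 9.139; the names are hypothetical. The first zero of sin(kd)/kd falls at kd = π, that is at f = c/(2d), which is about 573 Hz for these values.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, assumed value for air

def spaced_omni_correlation(frequency_hz, separation_m, c=SPEED_OF_SOUND):
    """Diffuse-field correlation of two spaced omnis, Equation 9.22: sin(kd)/(kd)."""
    k = 2.0 * math.pi * frequency_hz / c   # wave number, Equation 9.23
    kd = k * separation_m
    if kd == 0.0:
        return 1.0                         # limit of sin(x)/x as x -> 0
    return math.sin(kd) / kd

d = 0.30                                   # 30 cm spacing, as in Figure 9.139
first_zero_hz = SPEED_OF_SOUND / (2.0 * d) # where kd = pi
print(f"first zero crossing at about {first_zero_hz:.0f} Hz")
for f in (50, 100, 200, 500, 1000, 2000, 5000, 10000):
    print(f"{f:6d} Hz  r = {spaced_omni_correlation(f, d):+.3f}")
```

Halving d doubles every frequency on the plot, which is the scaling behaviour described above.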
9.4.5 Conclusions
Figure 9.139: Correlation coefficient vs. Frequency for a pair of omnidirectional microphones with a separation of 30 cm in a diffuse field. [Plot: correlation coefficient vs. frequency, 100 Hz to 10 kHz on a logarithmic axis.]
Virtual Blumlein
We’ll begin with a fairly simple case – two bidirectional microphones, one
facing forward towards the middle of the stage (the “mid” microphone) and
the other facing to the right (the “side” microphone). If we plot these two
individual polar patterns on a Cartesian plot, the result will look like Figure
9.140.
Figure 9.140: Two bidirectional microphones with an included angle of 90°, with one facing forward (blue line) and the other facing to the right (red line). [Plot: sensitivity, -1 to 1.]
The same can be shown as the more familiar polar plot, displayed in
Figure 9.141.
Now, let’s take those two microphone outputs and, instead of sending
them to the left and right outputs as we normally do with stereo microphone
configurations, we’ll send them both to the right output by panning both
channels to the right. We’ll also drop their two levels by 3 dB while we’re
at it (we’ll see why later...)
Figure 9.141: A polar plot showing the same information as is shown in Figure 9.140.
$$M = \cos\sigma \qquad (9.27)$$
$$S = \sin\sigma \qquad (9.28)$$
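Equations 9.27 and 9.28 can be checked numerically. Here is a minimal sketch, assuming σ is the angle (from straight ahead, towards the side microphone) at which the virtual bidirectional should point, and that the mid and side patterns are cos θ and sin θ; the names are hypothetical. Since cos σ cos θ + sin σ sin θ = cos(θ - σ), the result is a bidirectional aimed at σ with unity peak sensitivity. At σ = 45° both gains are 0.707, which is the 3 dB drop used above.

```python
import math

def virtual_bidirectional(theta_rad, sigma_rad):
    """Matrixed sensitivity M*cos(theta) + S*sin(theta), with M = cos(sigma)
    and S = sin(sigma) from Equations 9.27 and 9.28."""
    m_gain = math.cos(sigma_rad)   # gain on the forward ("mid") bidirectional
    s_gain = math.sin(sigma_rad)   # gain on the sideways ("side") bidirectional
    return m_gain * math.cos(theta_rad) + s_gain * math.sin(theta_rad)

sigma = math.radians(45.0)
print(f"gain on each microphone: {20 * math.log10(math.cos(sigma)):.2f} dB")
# Search a 1-degree grid for the direction of maximum sensitivity.
best = max(range(360), key=lambda deg: virtual_bidirectional(math.radians(deg), sigma))
print(f"peak at {best} degrees, sensitivity "
      f"{virtual_bidirectional(math.radians(best), sigma):.3f}")
```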
Figure 9.142: The resulting sensitivity pattern of the sum of the two bidirectional microphones shown in Figure 9.140 with levels dropped by 3 dB. [Plot: sensitivity, -1 to 1.]
low to use to pick up the sidewall reflections, you can use it for the mid,
and something like a Sennheiser MKH30 for the side bidirectional. Once
they’re matrixed, your resulting virtual pair of microphones (assuming that
you have symmetrical gains on your two outputs) will be perfectly matched.
Cool, huh?
Traditional MS
You are not restricted to using two similar polar patterns when you use matrixing
in your microphone techniques. For example, when most people think of MS,
they think of a cardioid microphone for the mid and a bidirectional
for the side. These two are shown in Figures 9.143 and 9.144.
Figure 9.143: Two microphones with an included angle of 90°, with one forward-facing cardioid (blue line) and a side-facing bidirectional (red line). [Plot: sensitivity, -1 to 1.]
What happens when we mix the outputs of these two microphones? Well,
in order to maintain a constant on-axis response for the virtual microphone
that results, we know that we’re going to have to attenuate the outputs of
the real microphones before mixing them. So, let’s look at an example of
the cardioid reduced by 6.02 dB and the bidirectional reduced by 3.01 dB
(we’ll see why I chose these particular values later). If we were to express
the sensitivity of the resulting virtual mic as an equation it would look like
Equation 9.29.
$$S_{virtual} = 0.5\,(0.5 + 0.5\cos\vartheta) + 0.707\,\cos\left(\vartheta - \frac{\pi}{2}\right) \qquad (9.29)$$
What does all this mean? S_virtual is the sensitivity of the virtual microphone.
The first 0.5 is there because we’re dropping the level of the cardioid by
6 dB; similarly, the 0.707 is there to drop the output of the bidirectional by 3
dB. The output of the cardioid should be recognizable as the 0.5 + 0.5 cos(ϑ).
The bidirectional is the part that says cos(ϑ − π/2). Note that the π/2 is there
because we’ve turned the bidirectional 90°. (An easier way to say this is to
use sin(ϑ) instead; it will give you the same results.)
Figure 9.144: Two microphones with an included angle of 90°, with one forward-facing cardioid (blue line) and a side-facing bidirectional (red line). [Polar plot of the same information as Figure 9.143.]
If we graph the result of Equation 9.29, it will look like Figure 9.145.
Note that the result is a traditional hypercardioid pattern “aimed” at 71°.
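The 71° figure can be confirmed by rewriting Equation 9.29 in the form P + G cos(ϑ - aim). In this minimal sketch (names hypothetical), the constant term is 0.25, and the cos ϑ and sin ϑ coefficients (0.25 and 0.707) combine into a single cosine of amplitude about 0.75 aimed at about 70.5°.

```python
import math

def s_virtual(theta_rad):
    """Equation 9.29: cardioid at -6.02 dB plus a side-facing bidirectional at -3.01 dB."""
    cardioid = 0.5 + 0.5 * math.cos(theta_rad)
    side = math.cos(theta_rad - math.pi / 2)    # identical to sin(theta)
    return 0.5 * cardioid + 0.707 * side

# Collect terms: constant + a*cos(theta) + b*sin(theta) = P + G*cos(theta - aim)
P = 0.5 * 0.5          # constant (omni) part
a = 0.5 * 0.5          # cos(theta) coefficient, from the cardioid
b = 0.707              # sin(theta) coefficient, from the bidirectional
G = math.hypot(a, b)
aim_deg = math.degrees(math.atan2(b, a))
print(f"P = {P:.3f}, G = {G:.3f}, aimed at {aim_deg:.1f} degrees")
print(f"on-axis sensitivity: {s_virtual(math.radians(aim_deg)):.4f}")
```

P = 0.25 with G ≈ 0.75 is the usual first-order hypercardioid, and the on-axis sensitivity comes out at about 1, consistent with choosing the attenuations to keep the on-axis response constant.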
There is a common misconception that using a cardioid and a bidirec-
tional as an MS pair will give you a pair of virtual cardioids with a con-
trollable included angle. This is not the case. The polar pattern of the
virtual microphones will change with the included angle. This can be seen
in Figure 9.146 which shows 10 different balances between the cardioid mid
microphone and the bidirectional side.
How do you know what the relative levels of the two microphones should
be? Let’s look at the theory and then the practice.
Theory
If you want to maintain a constant on-axis sensitivity for the virtual
microphone as it rotates, then you can choose an arbitrary number n and
Figure 9.145: The resulting sensitivity of a forward-facing cardioid with an attenuation of 6.02 dB and a side-facing bidirectional with an attenuation of 3.01 dB. Note that the result is a traditional hypercardioid pattern “aimed” at 71°. [Plot: sensitivity, -1 to 1.]
Figure 9.146: The result of the sum of a forward-facing cardioid and a side-facing bidirectional for ten different balances. Notice that the polar pattern changes from cardioid to bidirectional as the angle of rotation changes from 0° to 90°. [Plot: sensitivity, -1 to 1.]
$$S = \sin\left(\frac{n}{2}\right) \qquad (9.31)$$
These M and S values are the gains to be applied to the cardioid and bidirectional
feeds, respectively. If you were to graph this relationship it would look like
Figure 9.147.
Figure 9.147: The gains applied to the mid cardioid (blue) and the side bidirectional (red) required to maintain a constant on-axis sensitivity for the virtual microphone as it rotates from 0° to 90°. Note that the x-axis of the graph does not correspond directly with any angle; however, it is non-linearly related to the direction in which the virtual microphone is aimed. [Plot: gain (0 to -40) vs. a horizontal axis running from 0 to 100.]
You’ll notice that I haven’t spent much time on this theoretical side.
This is because it has so little to do with the practical usage of MS mic’ing.
If you really want to do this the right way, I’d suggest a slightly different
approach. Use the math in Section 9.5.1 to create a
virtual bidirectional pointing in the desired direction, then add some omni
to produce the desired polar pattern. This idea will be elaborated in Section
9.5.2.
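Here is a minimal sketch of that suggested approach. It assumes a separate omnidirectional signal is available (for instance the W channel discussed in Section 9.5.2), that the virtual bidirectional is steered with gains cos(aim) and sin(aim) as in Equations 9.27 and 9.28, and that p sets the omni proportion; the names are hypothetical. Written this way, the aim and the polar pattern are independent controls, unlike the cardioid plus bidirectional case above.

```python
import math

def virtual_mic(theta_rad, aim_rad, p):
    """Sensitivity of a hypothetical virtual microphone built as
    p * omni + (1 - p) * (virtual bidirectional rotated to aim_rad).

    The virtual bidirectional uses mid/side gains cos(aim) and sin(aim),
    following Equations 9.27 and 9.28.
    """
    omni = 1.0
    mid = math.cos(theta_rad)   # forward-facing bidirectional
    side = math.sin(theta_rad)  # side-facing bidirectional
    bidirectional = math.cos(aim_rad) * mid + math.sin(aim_rad) * side
    return p * omni + (1.0 - p) * bidirectional

aim = math.radians(30.0)
for p, name in ((0.5, "cardioid"), (0.25, "hypercardioid")):
    front = virtual_mic(aim, aim, p)
    rear = virtual_mic(aim + math.pi, aim, p)
    print(f"{name}: on-axis {front:.3f}, rear {rear:+.3f}")
```

Whatever the aim, the on-axis sensitivity stays at 1 and p alone decides the pattern (0.5 gives a cardioid with a rear null, 0.25 a hypercardioid).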
Practice
Okay, if you go to do an MS recording, I’ll bet money that you don’t
bring your calculator along to do any of the math I just described. Use the
following steps...
4. Pan the two parallel outputs of your bidirectional to hard left and hard
right.
8. While dropping the level of the cardioid, bring up the two bidirec-
tional channels. The image should get wider and farther away. When
you have bidirectional-only, you should hear an “out-of-phase” (i.e.
opposite polarity) signal of mostly sidewall reflections.
Don’t feel obliged to keep your two bidirectional gains equal. Just re-
member that if they’re not, then the polar patterns of your two virtual
microphones are not matched. Also, they will not be aimed symmetrically,
but that might be a good thing... People listening to the recording at home
won’t know whether you used a matched pair of microphones or not.
9.5.2 Ambisonics
To a large portion of the population who have encountered the term,
“Ambisonics” generally means one of two things:
a recorded soundfield. This idea differs from most stereo and surround
recordings in that the intention is to re-create the acoustic wavefronts which
existed in the space at the time of the recording rather than to synthesize
an interpretation of an acoustic event.
Theory
Go to a room (you may already be in one...) and put a perfect omnidirec-
tional microphone in it. As we discussed in Section ??, an omnidirectional
microphone is also known as a pressure transducer which means that it re-
sponds to the changes in air pressure at the diaphragm of the microphone.
If you make a perfect recording of the output of the perfect omnidirectional
microphone when stuff is happening in the room, you have captured a record
(here, I’m using the word “record” as in a historical record, not as in a record
that you buy at a record shop from a record lady [Lovett, 1994]) of the change
in pressure over time at that location in that room on that day. If, at a later
date, you play back that perfect recording over a perfect loudspeaker in a
perfectly anechoic space, then you will hear a perfect representation (think
“re-presentation”) of that historical record. Interestingly, if you have a per-
fect loudspeaker and you’re in a perfectly anechoic space, then what you
hear from the playback is exactly what the microphone “heard” when you
did the recording.
This is a good idea; however, let’s take it a step further. Since a pressure
transducer has an omnidirectional polar pattern, we don’t have any
information regarding the direction of travel of the sound wavefront. This
information is contained in the velocity of the pressure wave (which is why a
single directional microphone of any sort must have a velocity component).
So, let’s put up a perfect velocity microphone in the same place as our per-
fect pressure microphone. As we saw in Section ?? a velocity microphone
(if we’re talking about directional characteristics and not transducer design)
is a bidirectional microphone. Great, so we put a bidirectional mic facing
forward so we can tell if the wave is coming from the front or the rear. If
the outputs of the omni and the bidirectional have the same polarity, then
the sound source is in the front. If they’re opposite polarity, then the sound
source is in the rear. Also, we can see from the relative levels of the two mic
outputs what the angle to the sound source is, because we know the relative
sensitivities of the two microphones. For example, if the level is 3 dB lower
in the bidirectional than the omni and both have the same polarity, then the
sound source must be 45° away from directly forward. The problem is that
we don’t know if it’s to the left or the right. This problem is easily solved by
• The two side speakers produce equal positive pressures that are one-third
the outputs of the front (because there’s nothing in the X channel
and they don’t play the Y channel).
Figure 9.148: Top views of a two-dimensional version of the system described in the text. These are three examples showing the relationship between the outputs of the omnidirectional and two of the bidirectional microphones for sound sources in various locations producing a positive impulse. [Each example shows voltage-vs-time plots of the microphone outputs.]
Figure 9.149: A simple configuration for playing back the information captured by the three microphones in Figure 9.148. [Diagram: front loudspeaker fed W+2Y, rear W-2Y, left W-2X, right W+2X.]
the negative Y channels cancel each other a little when they’re mixed
together at the speaker, but the negative signal is louder.
The loudspeakers produce a signal at exactly the same time, and the
different waves will propagate towards the sweet spot at the same speed. At
the sweet spot, the waves all add together (think of adding vectors together)
to produce a resulting pressure wave that has a velocity that is moving
towards the rear loudspeaker (because the two side speakers push equally
against each other, so there’s no sideways velocity, and because the front
speaker is pushing towards the rear one which is pulling the wave towards
itself).
If we used perfect microphones and a perfect recording system and per-
fect loudspeakers, the result, at the sweet spot in the listening room, is that
the sound wave has exactly the same pressure and velocity components as
the original wave that existed at the microphones’ position at the time of the
recording.
Consequently, we say that we have re-created the soundfield in the
recording space. If we pretend that the sound wave has only two com-
ponents, the pressure and the velocity, then our perfect system perfectly
duplicates reality.
$$W = P_\Psi \qquad (9.32)$$
$$X = P_\Psi \cos\Psi \qquad (9.33)$$
$$Y = P_\Psi \sin\Psi \qquad (9.34)$$
$$P_n = \frac{W + 2X\cos\varphi_n + 2Y\sin\varphi_n}{N} \qquad (9.35)$$
where P_n is the amplitude of the nth loudspeaker, ϕ_n is the angle of the nth
loudspeaker in the listening room, and N is the number of loudspeakers.
The decoding algorithm used here is one suggested by Vanderkooy and
Lipshitz, which differs from Gerzon’s original equations in that it uses a gain
of 2 on the X and Y channels rather than the standard √2. This is due to
the fact that this method omits the 1/√2 gain from the W channel in the
encoding process for simpler analysis [Bamford and Vanderkooy, 1995].
$$B = 2m + 1 \qquad (9.36)$$
where B is the minimum number of loudspeakers required to accurately
reproduce the panphonic Ambisonics signal and m is the order of the system.
(So far, we have only discussed 1st-order Ambisonics in this book.)
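To see Equations 9.32 to 9.36 working together, here is a minimal Python sketch that encodes a unit source, decodes it with Equation 9.35, and then recombines the loudspeaker feeds at the centre of the ring. It assumes a regular horizontal ring with the first loudspeaker straight ahead, and it uses direction-weighted sums of the feeds as a simple stand-in for the velocity components; the function names are hypothetical.

```python
import math

def encode(pressure, psi_rad):
    """First-order horizontal B-format, Equations 9.32 to 9.34.
    No 1/sqrt(2) weighting on W (the Vanderkooy and Lipshitz variant)."""
    return pressure, pressure * math.cos(psi_rad), pressure * math.sin(psi_rad)

def speaker_angles(n_speakers):
    """Evenly spaced ring with the first speaker straight ahead (an assumption)."""
    return [2.0 * math.pi * n / n_speakers for n in range(n_speakers)]

def decode(w, x, y, n_speakers):
    """Loudspeaker feeds P_n from Equation 9.35."""
    return [(w + 2.0 * x * math.cos(phi) + 2.0 * y * math.sin(phi)) / n_speakers
            for phi in speaker_angles(n_speakers)]

def sweet_spot(feeds):
    """Recombine the feeds at the centre of the ring.

    The plain sum stands in for the pressure; the sums weighted by the
    speaker direction cosines and sines should reproduce X and Y.
    """
    angles = speaker_angles(len(feeds))
    p = sum(feeds)
    vx = sum(f * math.cos(phi) for f, phi in zip(feeds, angles))
    vy = sum(f * math.sin(phi) for f, phi in zip(feeds, angles))
    return p, vx, vy

w, x, y = encode(1.0, math.radians(30.0))
for n_speakers in (2, 3, 4, 8):
    p, vx, vy = sweet_spot(decode(w, x, y, n_speakers))
    print(f"N={n_speakers}: W {w:.3f}->{p:.3f}  X {x:.3f}->{vx:.3f}  Y {y:.3f}->{vy:.3f}")
```

With N = 3, 4 or 8 the recombined values match W, X and Y, while N = 2 does not, which is consistent with B = 2m + 1 = 3 for a first-order system. With four loudspeakers and a source straight ahead, the feeds are 0.75 for the front, 0.25 for each side and -0.25 for the rear: the sides are one-third of the front and the rear is opposite in polarity, as in the four-speaker example above.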
First-order periphonic
NOT YET WRITTEN
B =? (9.37)
Practical Implementation
Practically speaking, it is difficult to put four microphones (the omnidirec-
tional and the three bidirectionals) in a single location in the recording space.
If you’re doing a panphonic recording, you can make a vertical array with
the omni in between the two bidirectionals and come pretty close. There
is also the small problem that the frequency responses of the
bidirectionals and the omni won’t match perfectly. This will make the con-
tributions of the pressure and the velocity components frequency-dependent
when you mix them to send to the loudspeakers.
sound too bright and thin. In order to make the Ambisonics output sound
warm and fuzzy (and therefore good) you have to boost the low-frequency
components in your velocity channels. Technically, this is incorrect; however,
it sounds better, so people do it.
Higher orders
Ambisonics is a system that works on “orders”: the higher the order of
the system, the more accurate the reproduction of the sound field.
• If we just use the W-channel (the omnidirectional component) then
we just get the pressure information and we consider it to be a 0th-
(zeroth) order system. This gives us the change in pressure over time
and nothing else.
• If we add the X-, Y- and Z-channels, we get the velocity information
as well. As a result we can tell not only the change in pressure of the
sound source, but also its direction relative to the microphone array.
This gives us a 1st-order system.
• A 2nd-order Ambisonics system adds information about the curvature
of the sound wave. This information is captured by a microphone that
doesn’t exist yet. It has a strange four-leaved clover shaped pattern
with four lobes.
Second-order periphonic
$$U = P_\Psi \cos 2\Psi \qquad (9.38)$$
$$V = P_\Psi \sin 2\Psi \qquad (9.39)$$
$$W = P_\Psi \qquad (9.40)$$
$$X = P_\Psi \cos\Psi \qquad (9.41)$$
$$Y = P_\Psi \sin\Psi \qquad (9.42)$$
B =? (9.44)
Figure 9.150: The sensitivity function combining the sensitivities of the B-format channels with the mix for the loudspeaker. [Plot: normalized gain, -1 to 1.]
time domain. The analysis you’re about to read uses the HRTF database
measured at MIT using a KEMAR dummy head. This is a public database
available for download via the Internet [Gardner and Martin, 1995].
We’ll begin by looking at the HRTF’s of two sound sources, one directly
in front of the listener and one directly to the right. The impulse responses
for the resulting HRTF’s for these two locations are shown in Figures 9.151
and 9.152 respectively.
There are two things to notice about the two impulse responses shown
in Figure 9.151 for a frontal sound source. Firstly, the times of arrival of
the impulses at the two ears are identical. Secondly, the impulse responses
themselves are identical throughout the entire length of the measurement.
Figure 9.151: The impulse responses measured at the two ears of a KEMAR dummy head for a sound source directly in front [Gardner and Martin, 1995]. The top plot is the left ear and the bottom plot is the right ear. The x-axes are time, measured in samples (0 to 500).
Let’s consider the same two aspects for Figure 9.152 which shows the
HRTF’s for a sound source on the side of a listener. Notice in this case that
the times of arrival of the impulses at the two ears are different. Since the sound
source is on the right side of the listener, the impulse arrives at the right ear
before the left. This makes sense since the right ear is closer to sound sources
on the right side of your head. Now take a look at the impulse response over
time. The first big spike in the right ear goes positive. Similarly, the first big
spike in the left ear also goes positive. This should not come as a surprise,
since your eardrums are not bidirectional transducers. These interaural time
differences (ITD’s) are very significant components that our brains use in
determining where a sound source is.
Let’s now consider a source directly in front of a soundfield microphone,
Figure 9.152: The impulse responses measured at the two ears of a KEMAR dummy head for a sound source directly to the right [Gardner and Martin, 1995]. The top plot is the left ear and the bottom plot is the right ear. The x-axes are time, measured in samples.
Figure 9.153: The 8-channel Ambisonics loudspeaker configuration used in this analysis. [Diagram: loudspeaker ring with the front marked.]
Secondly, notice the differences in the impulse responses at the two ears.
The initial spike in the right ear is positive whereas the first spike in the
left ear is negative. This is caused by the fact that loudspeakers that are
opposite each other in the listening space in a 1st-order Ambisonics system
are opposite in polarity. This can be seen in the sensitivity function shown
in Figure 9.150. The result of this opposite polarity is that sound sources on
the sides sound similar to a stereo signal normally described as being “out of
phase” where the two channels are opposite in polarity [Geoff Martin, 1999].
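The opposite-polarity behaviour follows directly from the decoder. Substituting Equations 9.32 to 9.34 into Equation 9.35 gives a per-loudspeaker gain proportional to 1 + 2cos(Ψ - ϕ_n), which is positive near the source and negative on the opposite side of the ring. Here is a minimal sketch for an eight-loudspeaker ring, assuming evenly spaced loudspeakers with one straight ahead (the exact layout of Figure 9.153 may differ) and a source at 90°; the names are hypothetical.

```python
import math

N_SPEAKERS = 8  # evenly spaced ring, first loudspeaker straight ahead (assumed)

def speaker_gain(source_rad, speaker_rad, n_speakers=N_SPEAKERS):
    """Feed of one loudspeaker for a unit source, from Equations 9.32 to 9.35.

    W + 2X cos(phi) + 2Y sin(phi) with W = 1, X = cos(psi), Y = sin(psi)
    simplifies to 1 + 2 cos(psi - phi).
    """
    return (1.0 + 2.0 * math.cos(source_rad - speaker_rad)) / n_speakers

source = math.radians(90.0)  # a source off to one side of the listener
gains = {}
for n in range(N_SPEAKERS):
    phi_deg = 360.0 * n / N_SPEAKERS
    gains[phi_deg] = speaker_gain(source, math.radians(phi_deg))
    print(f"speaker at {phi_deg:5.1f} deg: {gains[phi_deg]:+.3f}")
```

The loudspeaker nearest the source gets a positive feed (3/8) and the one directly opposite gets a negative feed one-third as large (-1/8), so the two ears receive contributions of opposite polarity from the two sides of the ring.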
It surrounds...
The title of this is pretty obvious... One of the best reasons to use surround
sound is that you can have sound that surrounds. I don’t know if I have to
say anything else. It is, of course, extremely difficult to have a completely
enveloping soundfield for a listener using 2-channel stereo. It’s not easy to
do this in 5-channel surround, but it’s easier than it is with stereo.
Playback configuration
Go to an orchestra concert and sit in a front row. Try to listen to the oboe
when everyone else is playing and you’ll probably find that you’re able to
do this pretty easily. If you could wind back time, you would find that
you could have listened to the clarinet instead. (If you don’t like orchestral
music, go to a bar and eavesdrop on people’s conversations – you can do
this too. If you get caught, tell them it’s research.) You’re able to do this
because you are able to track both the timbre and the location of a sound
source. (Check out a phenomenon called the cocktail party effect for more
information on this.)
If you record in mono, you have to be very careful about your mix. You
have to balance all the components very precisely to ensure that people can
hear what is needed to be heard at that particular moment. This is because
people are unable to use spatial cues to determine where instruments are
and therefore cannot easily concentrate on one of them.
If we graduate to 2-channel stereo, life gets a little easier. By panning
sources across the stereo sound stage between the two speakers, people are
able to concentrate on one instrument and effectively attenuate others within
their brains.
The more spatial cues you give to the listener, the better able they are
to effectively zero in on whatever component of the recording they like. This
doesn’t mean that their perception can’t be manipulated, but it does
mean that you don’t have to do as much manipulation in your mixes.
Mastering engineers such as Bob Ludwig also report that they are finding
that less compression and sweetening is required in surround sound media
for the same reasons.
Back in the old days, we used level almost exclusively to manipulate
people’s attention in a mix. Nowadays, with surround, you can use level,
but also spatial cues to draw attention towards or away from components.
You may notice that I used this same topic as one of the advantages for
recording in surround. I put it here to make you aware that you shouldn’t
just go putting things in the centre channel all willy-nilly. Use the centre
channel with caution. Always remember that it’s much easier to localize
a real source (like a loudspeaker) in a real room than it is to localize a
phantom source. This means that if you want to start playing with people’s
perception of distance to the sound source, you might want to carefully
consider the centre channel.
Another possible problem that the centre channel can create is in timbre.
Take a single channel of pink noise and send it to your Left and Right
speakers. If all goes well, you’ll get a phantom centre. Now, move around
the sweet spot a bit and listen to the timbre of the noise. You should
hear some comb filtering, but it really shouldn’t be too disturbing. (Now
that you’ve heard that, listen for it on the lead vocals of every CD you
own... Sorry to ruin your life...) Repeat this experiment, but this time,
send the pink noise to the Centre and Right channels simultaneously. Now
you should get a phantom image somewhere around 15° off-centre, but if
you move around the sweet spot you’ll hear much more serious problems
with your comb filtering. This is a bigger problem than in stereo because
your head isn’t getting in the way of the signal from the Centre speaker. In
the case of Left interfering with Right, you have a reasonably high degree of
attenuation in the crosstalk (Left speaker getting to the right ear and vice
versa). In the case of the Centre channel, this attenuation is reduced, so you
get a big interference between the Centre and Right channels in your right
ear. The left ear isn’t a problem because the Right channel is attenuated.
So, the moral of this story is to be careful with sources that are panned
between speakers – decide whether you want the comb filtering. It’s okay
to have it as long as it’s intentional [Martin, 2002a][Martin, 2002b].
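The comb filtering in the experiment above comes from the same signal reaching one ear from two loudspeakers at slightly different times. Here is a minimal sketch of that idea, assuming two equal-level arrivals (roughly the Centre-plus-Right case at the right ear, where head shadowing helps least), no other reflections, and a speed of sound of 344 m/s; the 10 cm path difference is a hypothetical example, not a measured value.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, assumed

def comb_notches(path_difference_m, max_hz=20000.0, c=SPEED_OF_SOUND):
    """Frequencies where two equal-level arrivals cancel: the delay
    path_difference / c equals an odd number of half periods,
    so f = (2k + 1) / (2 * delay)."""
    delay = path_difference_m / c
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2.0 * delay)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches

def comb_magnitude_db(frequency_hz, path_difference_m, c=SPEED_OF_SOUND):
    """Level of the sum relative to one arrival alone:
    |1 + exp(-j*2*pi*f*delay)| = |2*cos(pi*f*delay)|."""
    delay = path_difference_m / c
    mag = abs(2.0 * math.cos(math.pi * frequency_hz * delay))
    return 20.0 * math.log10(mag) if mag > 0 else float("-inf")

path_difference = 0.10  # metres; hypothetical off-centre listening position
print([round(f) for f in comb_notches(path_difference)])
print(f"{comb_magnitude_db(100.0, path_difference):+.2f} dB at 100 Hz")
```

Moving around the sweet spot changes the path difference, which slides the whole set of notches up or down in frequency; that moving pattern is what you hear as the timbre shifts in the pink noise.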
Localization
Don’t expect perfect localization for all sources, 360° around the listener. A
single, focused phantom image on the side is probably impossible to achieve.
Phantom images behind the listener appear to get very close, sometimes
causing in-head localization. (Remember that the surround channels are
basically a giant pair of headphones.) Rear images are highly unstable and
dependent on the listener’s movements due to the wide separation of the
loudspeakers.
Note that if you want to search for the holy grail of stable, precise and
accurate side images, you’ll probably have to start worrying about the spatial
distribution and timing of your early reflections [?].
Soundfield continuity
If you watch a movie mixed by a bad re-recording engineer (the film world’s
equivalent of a mixing engineer in the music world), you’ll notice a couple
of obvious things. All of the dialog and foley (all of the little extra sound
effects like zipping zippers, stepping foot steps and shutting doors) comes
from the Centre speaker, the music comes from the Left and Right speakers,
and the Surround speakers are used for the occasional special effect like rain
sounds or crowd noises. Essentially, you’re presented with three completely
unrelated soundfields. You can barely get away with this independence of
signal in a movie because people are busy using their eyes watching beautiful
people in car chases. In music-only recordings, however, we don’t have the
luxury of this distraction, unfortunately.
Listen to a poorly-recorded or mixed surround recording and you’ll notice
a couple of obvious, but common, mistakes. There is no connection between
the front and surround speakers – instruments in the front, reverb in the
surround is a common presentation that comes from the film world. Don’t
get me wrong here, I’m not saying that you shouldn’t have a presentation
where the instruments are in the front only – if that’s what you want, that’s
up to you. What I’m saying is, if you’re going for that spatial representation,
it’s a bad idea to use your surrounds as the only reverb in the mix. They’ll
sound completely disconnected from your instruments. You can correct this
by making some of the signals in the front and the rear the same – either
send instruments to the rear or send reverb to the front. What you do is
up to you, but please be careful to not have a large wall between your front
and your rear. This is the surround equivalent of some of the early days of
stereo where the lead vocals were on the left and the guitar on the right.
(Not that I don’t like the Beatles, but their early stereo recordings weren’t
exactly sophisticated, spatially speaking...)
them at 120°.
Be careful not to think of the surround loudspeakers as rear loudspeakers.
They’re really out to the sides and a little to the rear; they’re not directly
behind you. This becomes really obvious if you try to create a phantom
centre rear image using only the surround channels. You’ll find in this case
that the apparent distance to the sound source is quite close to the back of
your head, but this is not surprising if you draw a straight line between your
two surround loudspeakers... In theory, of course, the apparent distance to
the source should remain constant as it pans from LS to RS, but this is not
the case, probably because the speakers are so far apart.
Figure 9.156: Fukada Tree: a = b = c = 1–1.5 m, d = 0–2 m, L/R angle = 110°–130°, LS/RS
angle = 60°–90°.
This is a very useful technique, particularly in larger halls with big en-
sembles. The large separation of the front three cardioids prevents any
detrimental comb filtering effects in the listening room on the direct sound
of the ensemble (this problem is discussed above). One interesting thing to
try with this configuration is to just listen to the five cardioid outputs with
a large distance to the rear microphones. You will notice that, due to the
large separation between the front and rear signals in the recording space,
the perceived sound field in the listening room has two separate areas – that
is to say that the frontal sound stage appears to be separate from the sur-
round with nothing connecting them. This is caused by the low correlation
between the front and rear signals. Fukada cures this problem by sending
the outputs of the omnis to front and surround. The result is a very spacious
soundfield, but with reasonably reliable imaging characteristics. You may
notice some comb filtering effects caused by having identical signals in the
L/LS and R/RS pairs, but you will have to listen carefully for them...
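The comb filtering mentioned above comes from adding a signal to a delayed copy of itself. A minimal sketch, assuming two equal-level copies and a single fixed time difference between them (real rooms add reflections that this ignores); the function names comb_notches and comb_gain_db are hypothetical. The sum has a magnitude response of 2|cos(πfτ)|, so it cancels wherever f = (2k + 1)/(2τ).

```python
import math


def comb_notches(delay_s: float, f_max: float) -> list[float]:
    """Frequencies (Hz) up to f_max where x(t) + x(t - delay_s) cancels.

    The two-path sum has magnitude 2|cos(pi * f * delay_s)|, which is zero
    at f = (2k + 1) / (2 * delay_s) for k = 0, 1, 2, ...
    """
    if delay_s <= 0:
        raise ValueError("delay must be positive")
    notches = []
    k = 0
    while (f := (2 * k + 1) / (2 * delay_s)) <= f_max:
        notches.append(f)
        k += 1
    return notches


def comb_gain_db(freq: float, delay_s: float) -> float:
    """Gain of the two-path sum relative to one path alone, in dB."""
    magnitude = abs(2 * math.cos(math.pi * freq * delay_s))
    return 20 * math.log10(magnitude) if magnitude > 0 else float("-inf")
```

With a 1 ms difference (roughly 34 cm of extra path at about 343 m/s), comb_notches(0.001, 3000) gives notches near 500, 1500 and 2500 Hz, while comb_gain_db(1000, 0.001) shows the +6 dB peak halfway between them.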
Notice that the distance between the front array and the rear pair of
microphones can be as little as 0 m, so there may be situations where all
microphones but the centre are placed on a single boom.
Figure 9.158: OCT Front + IRT Cross: a = 8 cm, b = 40–90 cm, c ≈ 100 cm, cross side = 20–25 cm
Figure 9.159: OCT Front + Hamasaki Square: a = 8 cm, b = 40–90 cm, c ≈ 100 cm, cross side = 2–3 m
Film
The film industry uses a standard frame rate of 24 fps (Frames per Second).
This is just about the slowest frame rate you can get away with and still have
what appears to be smooth motion (though some internet video–streaming
people might argue that point in their advertising brochures...). The only
exception to this rule is the IMAX format which runs at twice this rate –
48 fps.
In North America (or at least most of it...) the AC power that runs our
curling irons and golf ball washers (and televisions...) has a fundamental
frequency of 60 Hz. As a result, the people that invented black and white
television, thinking that it would be smart to have a frame rate that was
compatible with this frequency, set the rate to 30 fps.
I should mention a little bit about the way televisions work. They’re
slightly different from films in that a film shows you 24 actual pictures on
the screen each second. A television has a single (electron) gun that draws a line of
varying intensity on the screen. This gun traces a bunch of lines, one below
the other on the screen, that, when seen together, look like a single
picture. There are 525 lines in each frame – but each frame is divided into
two “fields” – the TV shows you the odd–numbered lines, and then goes
back to do the even–numbered ones. This system (of alternating between
the odd and even number lines using two fields) is called interlacing. The
important thing to remember is that since we have 30 fps and 2 fields per
frame, this means that there are 60 fields per second.
24 fps
As you’ll probably guess, this time code is used in the film industry. The
system counts 24 frames in each second, as follows:
00:00:00:00
00:00:00:01
00:00:00:02
.
.
.
00:00:00:22
00:00:00:23
00:00:01:00
00:00:01:01
and so on. Notice that the first frame is labelled “00”, so we count up to
frame 23 and then skip to second 01, frame 00. As was previously mentioned,
there is a maximum of 24 hours in a day, so we roll back around to 0 as
follows:
23:59:59:21
23:59:59:22
23:59:59:23
00:00:00:00
00:00:00:01
and so on.
Each of these addresses corresponds to a frame in the film, so while the
film is rolling, our time code reader will display a new address every 24th of
a second.
In theory, if our film started exactly at midnight, and it ran for 24 hours,
then the time code would display the time of day, all day.
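The counting scheme above (frames 00 to fps − 1 within each second, rolling over at 24 hours) can be sketched as a conversion from a running frame count to an address. This is a minimal sketch for integer frame rates only; frames_to_timecode is a hypothetical name, and the frame count is assumed to start at 0 at 00:00:00:00.

```python
def frames_to_timecode(frame_count: int, fps: int) -> str:
    """Label a zero-based frame count as HH:MM:SS:FF at an integer frame rate.

    Frames are numbered 0 .. fps-1 within each second, and the count wraps
    back to 00:00:00:00 after 24 hours.
    """
    frames_per_day = 24 * 60 * 60 * fps
    n = frame_count % frames_per_day      # roll over at midnight
    ff = n % fps                          # frame within the second
    total_seconds = n // fps
    hh, rem = divmod(total_seconds, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


for n in (23, 24, 25):
    print(frames_to_timecode(n, 24))      # 00:00:00:23, 00:00:01:00, 00:00:01:01
```

The same function covers the 25 fps and 30 fps "non–drop" counts described below by changing fps.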
25 fps
This time code is used with PAL and SECAM television formats. The
system counts 25 frames in each second, as follows:
00:00:00:00
00:00:00:01
00:00:00:02
.
.
.
00:00:00:23
00:00:00:24
00:00:01:00
00:00:01:01
and so on. Again, the first frame is labelled “00”, but we count up to
frame 24 before skipping to second 01, frame 00. The roll around to 0 after
24 hours happens the same way as in the 24 fps counterpart, with the obvious
exception that we get to 23:59:59:24 before rolling back to 00:00:00:00.
Again, each of these addresses corresponds to a frame in the video, so
while the program is playing, our time code reader will display a new address
every 25th of a second.
And again, the time code exactly corresponds to the time of day.
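Because the 25 fps address corresponds exactly to the time of day, the mapping can also be run backwards, from an address to a frame count and to elapsed seconds. A minimal sketch with hypothetical names; it rejects addresses whose fields are out of range for the given integer frame rate.

```python
def timecode_to_frames(tc: str, fps: int) -> int:
    """Convert HH:MM:SS:FF to a zero-based frame count since 00:00:00:00."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    if not (0 <= hh < 24 and 0 <= mm < 60 and 0 <= ss < 60 and 0 <= ff < fps):
        raise ValueError(f"{tc!r} is not a valid {fps} fps time code")
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def seconds_since_midnight(tc: str, fps: int) -> float:
    """Elapsed time represented by an address (exact for true integer rates)."""
    return timecode_to_frames(tc, fps) / fps


print(timecode_to_frames("23:59:59:24", 25))   # 2159999, the last frame of the day
```

Adding one frame to 2159999 gives 25 × 86400, which is exactly where the count wraps back to 00:00:00:00.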
30 fps “non–drop”
This time code is designed to be used in black and white NTSC television
formats; however, it’s rarely used these days except in the occasional
commercial. The system counts 30 frames in each second, corresponding with
the frame rate of the format, as follows:
00:00:00:00
00:00:00:01
00:00:00:02
.
.
.
00:00:00:28
00:00:00:29
00:00:01:00
00:00:01:01
and so on.
The question you’re probably asking is “Why is it called ‘non–drop’?”,
but that question is probably best answered by explaining one more format,
called 30 fps “drop–frame.”
the omissions evenly throughout the day (that way, television programs of
less than 24 hours would come close to making sense...) So, this means that
we have to omit 108 frames every hour. The way they decided to do this
was to omit 2 frames every minute on the minute. The problem with this
idea is that omitting 2 frames a minute means losing too many frames (120 per
hour instead of 108) – so they had to add a couple back in. This is accomplished
by NOT omitting the two frames if the minute is at 0, 10, 20, 30, 40 or 50
(six minutes per hour, which puts 12 frames back and leaves 108).
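The figure of 108 per hour can be checked with exact arithmetic. This sketch assumes the NTSC colour frame rate of 30/1.001 fps (about 29.97 fps), which is the rate drop-frame time code is designed to track; the variable names are illustrative only.

```python
from fractions import Fraction

# Assumption: NTSC colour video runs at 30/1.001 fps (about 29.97 fps).
NTSC_COLOUR_FPS = Fraction(30000, 1001)

labels_per_hour = 30 * 3600                  # addresses a 30 fps count uses per hour
frames_per_hour = NTSC_COLOUR_FPS * 3600     # frames that actually go by in an hour
excess = labels_per_hour - frames_per_hour   # 108000/1001, just under 108

# The drop-frame rule: skip 2 numbers per minute, except minutes 0, 10, ..., 50.
dropping_minutes = [m for m in range(60) if m % 10 != 0]
dropped_per_hour = 2 * len(dropping_minutes)

print(f"excess addresses per hour: {float(excess):.3f}")   # 107.892
print(f"numbers skipped per hour:  {dropped_per_hour}")     # 108
print(f"skipping every minute would drop {2 * 60}")          # 120
```

The rule skips 108 numbers against an ideal of about 107.9, so a small residual drift remains over a full day.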
So, now the system counts like this:
00:00:00:00
00:00:00:01
00:00:00:02
.
.
.
00:00:58:29
00:00:59:00
00:00:59:01
.
.
.
00:00:59:28
00:00:59:29
00:01:00:02 (Notice that we’ve skipped two frame numbers)
00:01:00:03
and so on... but...
00:09:59:00
00:09:59:01
00:09:59:02
.
.
.
00:09:59:28
00:09:59:29
00:10:00:00 (Notice that we did not skip two frame numbers)
00:10:00:01
It’s important to keep in mind that we’re not actually leaving out frames
in the video – we’re just skipping numbers... like when you were counting
to 100 while playing hide and seek as a kid... 21, 22, 25, 26, 30... You didn’t
count any faster, but you started looking for your opponents sooner.
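The counting rule shown in the listing above can be written as a conversion from a running frame count to a drop-frame address. A minimal sketch, assuming 30 fps labels, colons as separators (as in the listings), and the hypothetical name frames_to_dropframe; it uses the usual 10-minute-block arithmetic and is not taken from any particular machine.

```python
LABELS_PER_MINUTE = 30 * 60                              # 1800 addresses per minute
FRAMES_PER_10_MIN = 10 * LABELS_PER_MINUTE - 9 * 2       # 17982 labelled frames


def frames_to_dropframe(frame_count: int) -> str:
    """Label a zero-based frame count with 30 fps drop-frame time code.

    Frame numbers 00 and 01 are skipped at the start of every minute except
    minutes 0, 10, 20, 30, 40 and 50. No frames are discarded, only labels.
    """
    blocks, m = divmod(frame_count, FRAMES_PER_10_MIN)
    if m < LABELS_PER_MINUTE:
        label = m                                         # minute x0: nothing skipped
    else:
        minute, offset = divmod(m - LABELS_PER_MINUTE, LABELS_PER_MINUTE - 2)
        label = (minute + 1) * LABELS_PER_MINUTE + 2 + offset   # skip ff 00 and 01

    label += blocks * 10 * LABELS_PER_MINUTE             # earlier 10-minute blocks
    label %= 24 * 60 * LABELS_PER_MINUTE                 # wrap after 24 hours

    seconds, ff = divmod(label, 30)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


print(frames_to_dropframe(1799), frames_to_dropframe(1800))   # 00:00:59:29 00:01:00:02
```

Frame 17982 is labelled 00:10:00:00 with no numbers skipped, matching the second listing above.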
Figure 9.162: The accumulated error in drop frame time code. At time 0, the time code reads
00:00:00:00 and is therefore correct, so there is no error. As time increases up to the 1–minute
mark, the time code is increasingly incorrect, displaying a time that is increasingly later than the
actual time. At the 1–minute mark, two frames are dropped from the count, making the display
slightly earlier than the actual time. This trend is repeated until the 9th minute, where the error
becomes increasingly less early until the display shows the correct time at the 10–minute mark.
Figure 9.163: This shows exactly the same information as the plot in Figure 9.162, showing the error
in seconds rather than frames.
Since we’re dropping out the numbers for the occasional frame, we call
this system “drop–frame,” hence the designation “non–drop” when we don’t
leave things out.
Look back to the explanation of 30 fps “non–drop” and you’ll see that
we said that pretty well the only place it’s used nowadays is for television
commercials. This is because the first frame to get dropped in drop–frame
happens 1 minute after the time code starts running. Most TV commercials
don’t go for longer than 30 seconds, so for that amount of time, drop–frame
and non–drop are identical. There is an error accumulated in the time code
relative to the time of day, but the error wouldn’t be fixed until the clock
read 1 minute anyway... we’ll never get there on a 30–second commercial.
moved the tape. This now meant that the head was going past the tape
quickly, making diagonal streaks across it while the tape just crept along
at a very slow speed indeed.
(Sidebar: the guy in this story was named Alexander M. Poniatoff who
was given some cash to figure all this out by another guy named Bing Crosby
back in the 1940s... see, Bing wanted to tape his shows and then sit at home
relaxing with the family while he watched himself on TV... Now, look at
Alexander’s initials and tack on an “excellence” and you get AMPEX.)
Back to videotape design... The system with the rotating heads is still
used today in your VCR (except that we call it “helical scanning” to sound
fancy) and this head on the drum is used to record and play both the video
information as well as the “hi–fi” audio (if you own a hi–fi VHS machine).
The tape is moving just fast enough to put another head in there which isn’t
on the drum – it records the mono audio on your VCR at home, but it could
be used for other low–bandwidth signals. It can’t handle high–bandwidth
material because the tape is moving so slowly relative to the head that
physics just doesn’t allow it.
The important thing to remember from all this is that there are two ways
of putting the signal (the time code, in our case) on the tape – longitudinally
with the stationary head (because it runs parallel with the tape direction)
and vertically with the rotating head (well, it’s actually not completely
vertical, but it’s getting close, depending on how much the drum is tilted).
Keep in mind that what we’re discussing here now is how the numbers
for the hours, minutes, seconds and frames actually get stored on a piece of
tape or transmitted across a wire.
the appropriate voltage at the appropriate time (if I send a 0 V signal – that
means 0, but if it’s 1 V, that means 1...). This is a nice system until someone
uses a transmission cable that inverts the polarity of the system. Then the
voltages become 0 V and –1 V, so the level that used to mean 0 is now the
higher of the two, and the whole system is screwed up. In order to make the
transmission (and recording) system a little more idiot–proof, we use a
different system.
We’ll keep a high and low voltage, but alternate between them at a
pre–determined rate (think square wave). Now the rule is that every “bit”
(either a 1 or a 0) begins with a transition of the square wave. Each bit is
divided into two “cells” – if there is a voltage transition between cells, then
the value of the bit is a 1; if there is no transition between cells, then the
bit is a 0 (see the following diagram).
Figure 9.164: A bi–phase mark used to distinguish a 0 from a 1 by the division of each bit into two
cells. If the cells are the same, then the bit is a 0, if they are different, then the bit is a 1.
This allows us to invert the polarity and make the voltage of the signal
independent of the value of the bit, essentially making the signal more robust.
There is no DC content in the signal (so we don’t have to worry about the
signal going through DC–blocking capacitors) and it’s self–clocking (that is
to say, if we build a smart receiving device, it can figure out how fast the
signal is coming at it). In addition, if we make the receiver a little tolerant,
we can change the rate of the signal (the tape speed, when shuttling, for
example) and still derive a time code address from it.
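The bi–phase mark rule can be sketched in a few lines. Assumptions: levels are represented as 0 and 1 rather than real voltages, the encoder starts from a low level, and the function names are hypothetical. The decoder only asks whether the two cells of each bit match, which is why flipping every level (the polarity-inverting cable) gives back exactly the same bits.

```python
def biphase_mark_encode(bits, start_level=0):
    """Return two cell levels (0 = low, 1 = high) for each bit."""
    cells = []
    level = start_level
    for bit in bits:
        level ^= 1            # every bit begins with a transition
        cells.append(level)
        if bit:
            level ^= 1        # a 1 also changes level between its two cells
        cells.append(level)
    return cells


def biphase_mark_decode(cells):
    """Recover bits by comparing the two cells of each bit period."""
    if len(cells) % 2:
        raise ValueError("need an even number of cells")
    return [int(cells[i] != cells[i + 1]) for i in range(0, len(cells), 2)]


bits = [0, 1, 1, 0, 1, 0, 0, 1]
cells = biphase_mark_encode(bits)
inverted = [1 - c for c in cells]     # what a polarity-flipping cable delivers
print(biphase_mark_decode(cells) == bits, biphase_mark_decode(inverted) == bits)
```

A real receiver would also have to recover the bit timing from the transitions (the self-clocking property); this sketch assumes the cells are already aligned.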
Each time code address requires one word to define all of its information.
This word consists of 80 bits (which, at 30 fps, means 2400 bits per
second). Not all 80 bits are required for telling the machines what frame
we’re on – in fact only 26 bits are used for this. The rest of the word is
divided up as follows:
There are a number of texts which discuss exactly how these are laid out
in the signal – we won’t get into that. But we should take a quick glance at
what the additional parts of the TC word are used for.
The time address information uses 4 bits to encode each of the decimal
numbers in the time code address. That is to say, for example, to encode
the designation of “12 frames,” the numbers 1 (0001) and 2 (0010) are used
sequentially instead of encoding the number 12 as a binary number. This
means, in theory, that we require 32 bits to store or transmit the time code
address, four bits each for 8 digits (HH:MM:SS:FF). This is not really the
case, however, since we don’t count all the way up to 9 with all of the
digits. In fact, we only require 2 bits each for the tens of hours (because it
never goes past “2”) and tens of frames, and 3 bits for each of the tens of
minutes and tens of seconds. This frees up 6 bits which are used for status
information, meaning 26 bits are used for the time address information.
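The digit widths described above can be checked with a small sketch. It only models how many bits each decimal digit gets and what those bits are (written most significant bit first, as in “2 (0010)”); where each field sits inside the 80-bit word is deliberately left out, as in the text. The names are hypothetical.

```python
# Bits allotted to each BCD digit of the time address.
DIGIT_WIDTHS = {
    "frame tens": 2, "frame units": 4,
    "second tens": 3, "second units": 4,
    "minute tens": 3, "minute units": 4,
    "hour tens": 2, "hour units": 4,
}


def bcd_digits(tc: str) -> dict[str, str]:
    """Encode HH:MM:SS:FF one decimal digit at a time, each in its own width."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    values = {
        "frame tens": ff // 10, "frame units": ff % 10,
        "second tens": ss // 10, "second units": ss % 10,
        "minute tens": mm // 10, "minute units": mm % 10,
        "hour tens": hh // 10, "hour units": hh % 10,
    }
    encoded = {}
    for name, width in DIGIT_WIDTHS.items():
        if values[name] >= 2 ** width:
            raise ValueError(f"{name} digit {values[name]} needs more than {width} bits")
        encoded[name] = format(values[name], f"0{width}b")
    return encoded


ADDRESS_BITS = sum(DIGIT_WIDTHS.values())   # 26, versus 32 for eight 4-bit digits
BITS_PER_SECOND_AT_30_FPS = 80 * 30         # one 80-bit word per frame: 2400

print(bcd_digits("00:00:00:12")["frame units"])   # 0010
```

The 6 bits saved (32 − 26) are the ones the text says go to status information.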
There are 32 bits in the time code word reserved for storing what’s called
“user information.” These 32 bits are divided into eight 4–bit words which
are generally devoted to recording or transmitting things like reel numbers,
or the date on which the recording was made – things which don’t change
while the time code rolls past. There are two options that are not used as
frequently which are:
– encoding ASCII characters to send secret messages (like song lyrics?
credits?... be creative...) This would require 8–bit bytes instead of 4–bit
words, but we’ll come back to that in the Status Information section.
– since we have 32 bits to work with, we could lay down a second time
code address... though I don’t know what for offhand.
Unassigned
This is a bit that has no determined designation.
Other Information
There are a couple of little things to know about LTC that might come in
useful some day.
Firstly, a time code reader should be able to read time code at speeds
between 1/40th of the normal rate and 40 times the normal rate. This would
put the maximum bit rate up to 96000 bits per second (2400 * 40).
Secondly, remember that LTC is recorded using the stationary head on
the video recorder. This means a couple of things:
1) as you slow down, the reading gets less accurate. If you’re stopped (or
paused) you don’t read a signal – therefore no time code.
2) as you slow down, the signal gets lower in amplitude (because the
voltage produced by the read head is proportional to the change in mag-
netism across its gap). Therefore, the slower the tape, the more difficult to
read.
Lastly, if you’re using LTC to lock two machines together, keep in mind
that the “slave” machine is not locked to every single frame to the closest
80th of a frame to the “master.” In fact, more likely than not, the slave is
running on its own internal clock (or some external “house sync” signal) and
checking the incoming time code every once in a while to make sure that
things are on schedule. (This explains the “forward/backward” info stored in
the Sync Word.) This can explain why, when you hit STOP on your master
machine, it might take a couple of moments for the slave device to realize
that it’s time to hit the brakes...
In many respects, these bits are the same as their LTC counterparts, but
we’ll go through them again anyway.
A lot of what I’ve presented in this book relates to the theory of recording
and has very little to do with aesthetics. Please don’t misinterpret this balance
of information – I’m not one of those people who believes that a recording
should be done using math instead of ears.
One of my favourite movies is My Dinner with André. If you haven’t seen
this film, you should. It takes about two hours to watch, and in those two
hours, you see two people having dinner and a very interesting conversation.
One of my other favourite movies is Cinema Paradiso in which, in two hours,
approximately 40 or 50 years pass. Let’s look at the difference between these
two concepts.
In the first, the idea is to present time in real time – watching the movie
is almost the same as sitting at the table with the two people having the
conversation. If there’s a break in the conversation because the waiter brings
the dessert, then you have to wait for the conversation to continue.
In the second, time is compressed. In order to tell the story, we need to
see a long stretch of time packed into two hours. Waiting 10 seconds for a
waiter to bring dessert would not be possible in this movie because there
isn’t time to wait.
The first movie is a lot like real life. The second movie is not, although
it shows the story of real lives. Most movies are like the latter – essentially,
the story is better than real life, because it leaves out the boring bits.
My life is full of normal-looking people doing mundane things; there’s
very little conflict, and therefore little need for resolution, not many car
chases, gun fights or martial arts. There are no robots from the future and
none of the people I know are trying to overthrow governments or large
corporations (assuming that these are different things... ). This is real life.
10. Conclusions and Opinions 780
All of the things that are in real life are the things that aren’t in movies.
Most people don’t go to movies to see real life – they go for the enter-
tainment value. Robots from the future that look like beautiful people, who,
using their talents in car chases, gun fights and martial arts, are trying
to overthrow governments and large corporations.
Basically speaking, movies are better than life.
Recordings are the same. Listen to any commercial recording of an
orchestra and you’ll notice that there is lots of direct sound from the violins
but they also have lots of reverb on them. You’re simultaneously next to and
far from the instruments in the best hall in the world. This doesn’t happen
in real life. Britney Spears (or whoever is popular when you’re reading this)
can’t really sing like that without hundreds of takes and a lot of processing
gear. Commercial recordings are better than life. Or at least, that’s usually
the goal.
So, the moral of the story is that your goal on every recording you do
is to make it sound as good as possible. Of course, I can’t define “good”
– that’s up to your expectations (which are in turn dependent on your taste
and style) and your desired audience’s expectations.
So, to create that recording, whether it’s classical, jazz, pop or sound
effects in mono, stereo or multichannel, you do whatever you can to make
it sound better.
In order to do this, however, you need to know what direction to head
in. How do you move the microphone to make the sound wider, narrower,
drier? What will adding reverb, chorus, flanging or delay do to the signal?
You need to know in advance what direction you want to head in based on
aesthetics, and you need the theory to know how to achieve that goal.
Reference Information
11. Reference Information 782
1/12 Oct  1/6 Oct   1/3 Oct   1/2 Oct   2/3 Oct   1 Oct
1         1         1         1         1         1
1.06
1.12      1.12
1.18
1.25      1.25      1.25
1.32
1.4       1.4                 1.4
1.5
1.6       1.6       1.6                 1.6
1.7
1.8       1.8
1.9
2         2         2         2                   2
2.12
2.24      2.24
2.36
2.5       2.5       2.5                 2.5
2.65
2.8       2.8                 2.8
3
3.15      3.15      3.15
3.35
3.55      3.55
3.75
4         4         4         4         4         4
4.25
4.5       4.5
4.75
5         5         5
5.3
5.6       5.6                 5.6
6
6.3       6.3       6.3                 6.3
6.7
7.1       7.1
7.5
8         8         8         8                   8
8.5
9         9
9.5
Table 11.1: ISO Frequency Centres
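Each column of Table 11.1 keeps every n-th entry of the 1/12-octave column. A minimal sketch, assuming (as the ISO 266 preferred frequencies are usually described) that the nominal values are rounded members of a base-10 series with 40 steps per decade, so one 1/12-octave step is approximated by 10^(1/40) ≈ 1.0593, close to 2^(1/12) ≈ 1.0595. NOMINAL, STEPS, column and worst_deviation are illustrative names.

```python
# Nominal 1/12-octave values for one decade, transcribed from Table 11.1.
NOMINAL = [
    1, 1.06, 1.12, 1.18, 1.25, 1.32, 1.4, 1.5, 1.6, 1.7,
    1.8, 1.9, 2, 2.12, 2.24, 2.36, 2.5, 2.65, 2.8, 3,
    3.15, 3.35, 3.55, 3.75, 4, 4.25, 4.5, 4.75, 5, 5.3,
    5.6, 6, 6.3, 6.7, 7.1, 7.5, 8, 8.5, 9, 9.5,
]

# Width of each column's bands, measured in 1/12-octave steps.
STEPS = {"1/12": 1, "1/6": 2, "1/3": 4, "1/2": 6, "2/3": 8, "1": 12}


def column(octave_fraction: str) -> list[float]:
    """Values in one column of Table 11.1 (every n-th 1/12-octave value)."""
    return NOMINAL[::STEPS[octave_fraction]]


def worst_deviation() -> float:
    """Largest relative difference between a nominal value and 10**(k/40)."""
    return max(abs(value / 10 ** (k / 40) - 1) for k, value in enumerate(NOMINAL))


print(column("1/3"))   # [1, 1.25, 1.6, 2, 2.5, 3.15, 4, 5, 6.3, 8]
```

Multiplying these values by powers of ten gives band centres such as 31.5 Hz or 1250 Hz; every nominal value stays within about 1.3% of the exact base-10 series.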
11.2.2 Prefixes
Similarly, prefixes make our lives much easier by acting as built-in multipliers
in our spoken numbers. It’s too difficult to talk about very short distances
in metres – so we use millimetres. A large resistance is more easily expressed
in kΩ than Ω.
So, for example, if you see something written in kilometres (abbreviated
km), then you know that you must multiply the number you see by 10^3 to
convert to metres. 5 km = 5 * 10^3 m = 5,000 m. If the unit is milligrams
(mg), then you multiply by 10^-3 to convert to grams (g). 15 mg = 15 *
10^-3 g = 0.015 g.
If you’re dealing with computer memory, however, you use the other
column. A computer with 20 Gigabytes (GB) of hard disk space has 20 *
2^30 bytes = 21,474,836,480 bytes.
Note that, if the prefix denotes a multiplier that is less than 1, it starts
with a small letter (e.g. milli- or m). If the multiplier is greater than one,
then a capital letter is used (e.g. mega- and M). The odd exception to this
case is that of kilo- or k (along with the rarely used hecto- (h) and deca-
(da)). I don’t know why this is different; I guess somebody has to be...
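The conversions above are single multiplications, which a lookup table makes explicit. A minimal sketch: the two prefix tables and the to_base_units name are assumptions for illustration (the prefix table the text refers to is not reproduced in this excerpt), with "u" standing in for µ.

```python
# Decimal (SI) multipliers. "u" stands in for the micro sign (µ).
SI_PREFIXES = {
    "G": 10**9, "M": 10**6, "k": 10**3, "": 1,
    "m": 10**-3, "u": 10**-6, "n": 10**-9, "p": 10**-12,
}

# Binary multipliers as commonly used for computer memory (an assumption,
# standing in for the "other column" of the prefix table).
BINARY_PREFIXES = {"k": 2**10, "M": 2**20, "G": 2**30}


def to_base_units(value, prefix, binary=False):
    """Convert a prefixed quantity to base units, e.g. (5, "k") -> 5000."""
    table = BINARY_PREFIXES if binary else SI_PREFIXES
    return value * table[prefix]


print(to_base_units(5, "k"))                  # 5000 (metres, for 5 km)
print(to_base_units(20, "G", binary=True))    # 21474836480 (bytes, for 20 GB)
```

Negative powers of ten come back as floating-point numbers, so 15 mg converts to a value that is 0.015 g only to within rounding error.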
Figure 11.2: A typical moulded composition resistor showing the meanings of the coloured bands.
The colour bands on the resistor in Figure 11.2 are red, green, violet and
silver. Table 11.4 shows how this combination is translated into a value.
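Decoding the bands is mechanical once the colour code is known. This sketch assumes the standard 4-band code (two significant digits, a power-of-ten multiplier, a tolerance band) rather than Table 11.4 itself, which is not reproduced here; with that standard code, red–green–violet–silver works out to 25 × 10^7 Ω = 250 MΩ, ±10%. The names are hypothetical.

```python
# Standard colour code for a 4-band resistor: digit, digit, multiplier, tolerance.
DIGIT = {
    "black": 0, "brown": 1, "red": 2, "orange": 3, "yellow": 4,
    "green": 5, "blue": 6, "violet": 7, "grey": 8, "gray": 8, "white": 9,
}
# The multiplier band uses the same colours as powers of ten, plus gold and silver.
MULTIPLIER_EXPONENT = {**DIGIT, "gold": -1, "silver": -2}
TOLERANCE_PERCENT = {"brown": 1, "red": 2, "gold": 5, "silver": 10}


def decode_resistor(bands):
    """Return (resistance in ohms, tolerance in percent) for four colour bands."""
    first, second, multiplier, tolerance = (b.lower() for b in bands)
    significant = 10 * DIGIT[first] + DIGIT[second]
    ohms = significant * 10 ** MULTIPLIER_EXPONENT[multiplier]
    return ohms, TOLERANCE_PERCENT[tolerance]


print(decode_resistor(["red", "green", "violet", "silver"]))   # (250000000, 10)
```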
This list is just a bunch of things to keep in mind when you’re doing a
recording. It is by no means a complete list, just a collection of things that
I think about when I’m doing a recording. Also note that some of the items
in the list should be taken with a grain of salt...
• When you’re not the only person around the gear, make sure that
it’s damned near impossible to trip on any cables. Tape everything
down. On remote recording gigs, it’s always a good idea to run cables
over door frames rather than across the threshold...
• On a remote recording gig, make friends with the caretaker, the clean-
ing staff, the secretary, the stagehands... anyone that is in a position
to help you out of a jam. It’s always a good idea to bring along a couple
of CDs to give away to people as presents to make friends quicker.
I admit that this is manipulative, but it works, and it pays off.
12. Hints and tips 788
• Put your gain as early in the signal path as is possible. (See Section
9.1.6)
• When you’re recording to a digital medium, try to get the peak level
as close as you can to 0 dBFS without going over. (See Section ??)
• No one that buys a CD cares how tired you were at the end of the
session – they expect $20 worth of perfection. In other words, fix
everything.
• If you’re recording a group with a drum kit, consider your drum over-
heads as a wide stereo pair. Using them, listen to the pan location of
all other instruments in the group (including each individual drum).
Do not try to override that location by panning the instrument’s own
microphone to a different location. If you want a different left-right
arrangement than you’re getting in the overheads, move the instru-
ments or the microphones. (If you want to get really detailed about
this, you have to consider every pair of microphones as a stereo pair
with the resulting imaging issues.)
• If you’re doing a remote recording, get used to your control room. Set
up your loudspeakers first, and play CDs that you know well while
you’re setting up everything else.
• Don’t piss off your musicians in your efforts to find the perfect sound.
Better to get a perfect performance from happy performers than a
perfect recording of a bad performance. Of course, if you can get a
• Don’t get too excited about new gear. Just because you just bought
a fancy new reverb unit doesn’t mean that you have to put tons of
fancy new reverb on every track on the CD you’re recording.
• Don’t believe everything that you read or hear. (Particularly gear re-
views in magazines. Ever notice how an advertisement for that same
piece of gear is very near the review? Suspicions abound...) Sometimes,
the cheapest device works the best. (For example, in a very carefully
calibrated, very fair blind comparison between various mic preamps, a
friend of mine in Boston, along with a number of professional recording
engineers and mic preamp manufacturers, found that the second best one
they heard was a $200 box, sounding better than other fancy tube devices
for thousands of dollars. They threw in the cheap preamp as a joke, and it
wound up surprising everyone.)
• Record your room as well as your instrument. Never forget that you’re
very rarely recording in an anechoic environment.
• If you’re using spot microphones, use stereo pairs instead of mono spots
whenever possible. The reason for this is directly related to the previ-
ous point. If you put a single mic on a guitar amp and pan the output
to the desired location, then you have the sound of the amp as well
as the sound of the room, all clumped into one mono location in your
mix. If you close-mic’ed with a stereo pair instead, your panning can
be the same (with the right orientation of the microphones and the
amp) but the room is spread out over the full image.
• Always trust your ears, but ask people for their opinions to see if they
might be hearing something that you’re not. There are times when the
most inexperienced listener can identify problems that a professional
(that’s you...) misses for one reason or another.
• Wherever possible, keep your audio signal cables away from all other
wires, particularly AC mains cables. If you have to cross wires, do so
at right angles. (See Section ??)
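For the tip about getting the peak level close to 0 dBFS, the number to watch is the sample peak relative to full scale, 20·log10(peak / full scale). A minimal sketch, assuming floating-point samples normalised to ±1.0 (or integer samples with a stated full-scale value); it measures sample peaks only, not peaks that can occur between samples after reconstruction, and peak_dbfs is a hypothetical name.

```python
import math


def peak_dbfs(samples, full_scale=1.0):
    """Sample-peak level in dBFS; 0 dBFS means the peak touches full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")          # digital silence
    return 20 * math.log10(peak / full_scale)


print(round(peak_dbfs([0.1, -0.5, 0.25]), 2))    # -6.02: about 6 dB of headroom left
print(peak_dbfs([-32768], full_scale=32768))      # 0.0 for a 16-bit full-scale sample
```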
Bibliography
[dol, 1998] (1998). Some guidelines for producing music in 5.1 channel sur-
round. Technical report, Dolby Laboratories.
[Ando, 1998] Ando, Y. (1998). Architectural acoustics: blending sound
sources, sound fields, and listeners. AIP series in modern acoustics and
signal processing. AIP Press: Springer, New York.
[Bamford and Vanderkooy, 1995] Bamford, J. S. and Vanderkooy, J. (1995).
Ambisonic sound for us. In 99th Convention of the Audio Engineering
Society, volume 4138, New York. Audio Engineering Society.
[Dempster, 1995] Dempster, S. (1995). Underground overlays from the cis-
tern chapel.
[Dolby, 2000] Dolby (2000). 5.1-Channel Production Guidelines. Dolby Lab-
oratories Inc., San Francisco, 1st edition.
[Fahy, 1989] Fahy, F. J. (1989). Sound intensity. Elsevier Applied Science,
London; New York.
[Gardner and Martin, 1995] Gardner, W. and Martin, K. (1995). HRTF
measurements of a KEMAR. Journal of the Acoustical Society of America,
97:3907–3908.
[Geoff Martin, 1999] Geoff Martin, Wieslaw Woszczyk, J. C. R. Q. (1999).
Controlling phantom image focus in a multichannel reproduction system.
In 107th Convention of the Audio Engineering Society, preprint no. 4996,
New York, USA. Audio Engineering Society.
[Golomb, 1967] Golomb, S. W. (1967). Shift Register Sequences. Holden
Day, San Francisco.
[Isaacs, 1990] Isaacs, A., editor (1990). A Concise Dictionary of Physics.
Oxford University Press, New York, new edition.
BIBLIOGRAPHY 792
INDEX 796
VCA, 319
velocity, instantaneous, 159
Vertical position, 417
VITC, 770
voltage, 58
voltage controlled amplifier, 319
voltage regulator, 127
voltmeter, 411
volume density, 151
Watt’s Law, 60
wave number, 165
wave, longitudinal, 157
wave, plane, 175
wave, torsional, 157
wave, transverse, 157
waveguide, 210
wavelength, 164
wavenumber, 165
wavenumber, acoustic, 165
weighting curve, 262
weighting filter, 262
window, Hamming, 534
window, Hanning, 532
window, rectangular, 530
windowing, 524
windowing function, 527
wolf fifth, 235
Zen, 554
zero, 582