2.8 Recommended Sequence of Practice and Test: Octave Frequencies
Octave Frequencies
1. Practice Type: Matching
Monitor Selection: Pink Noise
Frequency Resolution: Octave
Number of Bands: 1
Gain Combination: +12 dB
Q=2
Frequency Range:
a. 500 to 2000 Hz
b. 63 to 250 Hz
c. 4000 to 16,000 Hz
d. 250 to 4000 Hz
e. 125 to 8000 Hz
f. 63 to 16,000 Hz
2. Same as above except:
Monitor Selection: a variety of sound recordings of your choice
3. Same as above except:
Practice Type: Absolute Identification
Monitor Selection: Pink Noise and a variety of sound recordings of your choice
Frequency Range: 63 to 16,000 Hz
4. Same as above except:
Number of Bands: 2
5. Same as above except:
Number of Bands: 3
6. Same as above except:
Gain Combination: +12/− 12 dB
7. Same as above except:
Gain Combination: +9 dB
8. Same as above except:
Gain Combination: +9/− 9 dB
9. Same as above except:
Gain Combination: +6 dB
10. Same as above except:
Gain Combination: +6/− 6 dB
11. Same as above except:
Gain Combination: +3 dB
12. Same as above except:
Gain Combination: +3/− 3 dB
Third-Octave Frequencies
Progress through the sequence above, but instead of working with octave frequencies, select
“1/3rd Octave” from the Frequency Resolution drop-down menu.
Time Limit
To increase difficulty further, you may wish to use the timer to focus on speed.
Summary
Equalization is perhaps our most important tool as audio engineers. It is possible to learn
how to identify boosts and cuts by ear through practice. The available software practice
module can serve as an effective tool for progress in technical ear training and critical listen-
ing when used for regular and consistent practice.
Chapter 3
Spatial Attributes and Reverberation
Reverberation can create distance, depth, and spaciousness in recordings, whether we capture
it with microphones during the recording process or add it later during mixing. Reverbera-
tion use has evolved into distinct conventions for various music genres and eras of recording.
Specific reverberation techniques do not always translate across musical genres, although the
general principles of reverberation are the same. Film and game sound also make extensive
use of reverberation to reinforce visual scenes or give the viewer information about off-
camera actions or scenes.
In classical music recording, we position microphones to blend direct sound (from instru-
ments and voices) and indirect sound (reflected sound and reverberation), to represent the
natural sound of musicians performing in a reverberant space. As such, we listen closely to
the balance of the dry and reverberant sound and make adjustments to microphone positions
if the dry/reverberant balance is not to our liking. By moving microphones farther away
from the instruments, we increase reverberation and decrease direct sound.
Pop, rock, electronic, and other styles of music that use predominantly electric instruments
and computer-generated sounds are not usually recorded in reverberant acoustic spaces,
although there are some spectacular exceptions (see Chapter 7, Analysis of Sound). Rather,
we often create a sense of space with artificial reverberation and delays after the music has
been recorded in a relatively dry acoustic space with close microphones. We can use artificial
reverberation and delay to mimic real acoustic spaces or to create completely unnatural
sounding spaces. We do not always want every instrument or voice to sound like they are
at the front edge of the stage. We can think of recorded sound images like photography or
a painting. It is often more interesting to have elements in the mid-ground and background,
while we focus a few elements in the foreground. Delay and reverberation are the key tools
that help us create a sense of depth and distance in a recording. More reverberation on a
source makes it sound farther away while dryer elements remain to the front of our sound
image. Not only can we make sounds seem farther away and create the impression of an
acoustic space, but we can also influence the character and mood of a recording with careful
use of reverberation. In addition to depth and distance control, we can control the angular
location (left–right position or azimuth) of sound sources through standard amplitude
panning.
With stereo speakers, we have two dimensions within which to control sound source
location: distance (near to far) and angular location (azimuth). With elevated loudspeaker
arrays such as those found in IMAX movies, theme parks, and audio research environments,
we obviously have a third dimension of height. For the purposes of this book, I will focus
on loudspeakers in the horizontal plane only (no elevated speakers), whether stereo or multi-
channel; but again, the general principles apply to any audio reproduction environment
whether it has two dimensions or three.
Spatial attributes also include correlation and spatial continuity of a sound image. Simply
put, correlation refers to the amount of similarity between two channels. We can measure
the correlation of the left and right channels of a stereo image with a correlation or phase
meter. This type of meter typically ranges from − 1 to +1. A correlation of +1 means that
the left and right channels are identical, although they could be different amplitudes. A
correlation of − 1 means that the left and right channels are identical but that one channel
is opposite polarity. In practice, most stereo recordings have a correlation that ranges from
0 to +1 and with occasional jumps toward the − 1 end of the meter.
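As a rough illustration of what a correlation meter computes, the following sketch (in Python with NumPy, which the text does not assume; the signals are placeholder noise) calculates the normalized correlation between left and right channels over a block of samples. Real meters compute this continuously over short windows.

    import numpy as np

    def stereo_correlation(left, right):
        # Normalized correlation at zero lag, ranging from -1 to +1,
        # like the reading of a correlation (phase) meter.
        denom = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
        return np.sum(left * right) / denom if denom > 0 else 0.0

    left = np.random.randn(48000)                             # placeholder audio block
    print(stereo_correlation(left, left))                     # identical channels: +1
    print(stereo_correlation(left, -left))                    # opposite polarity: -1
    print(stereo_correlation(left, np.random.randn(48000)))   # decorrelated: near 0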
Left and right channel correlation affects the perceived width in a recording. Two perfectly
correlated channels will result in a mono image. A correlation of − 1 creates an artificially
wide-sounding stereo image. We can hear the effect of negatively correlated or “out of
phase” channels easier over loudspeakers than headphones. Where we localize a negatively
correlated sound image depends highly on our listening position. Sitting in the ideal listen-
ing position (see Figure 1.2), we will tend to localize the sound image to the sides of our
head or outside of the loudspeaker locations. If we move ever so slightly to the left or right
of the ideal listening location, the sound image will shift quickly to one side or the other.
Negatively correlated sound images seem unstable and difficult to localize. It is important
to listen for artificially wide stereo mixes or elements within a mix, which would indicate
that there may be a polarity problem somewhere in the signal path that needs to be cor-
rected. We discuss more on listening to opposite polarity channels in Section 3.7.
Decorrelated channels (when the correlation or phase meter reads 0) tend to create a
stereo image with the energy located primarily at the left and right speakers. If you listen
to decorrelated pink noise over stereo speakers you may notice little audible energy in the
center of the stereo image, but the image is not wider than the speakers as a negatively
correlated image would be. What energy you do hear in the center of the image is mainly
low frequency. High frequencies are clearly located at the speakers.
Another meter that is useful for monitoring the stereo image width and location of energy
within the stereo image is a goniometer or vectorscope, which gives a Lissajous pattern. This type
of meter is often presented in multimeter plug-ins in combination with a phase or correlation
meter. To get an idea how a vectorscope represents a stereo image, I find it useful to start with a
sine tone test signal to show some conditions. Starting with a 1 kHz sine tone panned center in
Figure 3.1, we see that the vectorscope displays a vertical line in the center of the meter. The meter
represents where we would localize this center-panned sound—directly in the center of the stereo
image. If we pan the sine tone to one side, we see the effect in Figure 3.2 with a line tilting to the
right at a 45-degree angle from the center. Listening to this condition, we would localize it in the
right speaker or right headphone. If we pan the sine tone back to center and invert the polarity
(flip the phase) of one of the output channels, we get a horizontal line at the bottom of the meter
as in Figure 3.3. The horizontal line represents negatively correlated left and right channels.
Figure 3.1 A vectorscope meter showing the stereo image width and correlation of a 1 kHz sine tone
panned to the center. Note that the energy appears as a straight vertical line in the center of the
meter and the phase meter is reading +1. (Screenshot of iZotope Insight plug-in.)
Figure 3.2 A vectorscope meter showing the stereo image width and correlation of a 1 kHz sine tone
panned to the right. Note that the energy appears as a straight line at a 45-degree angle from
the center of the meter and the phase meter is reading 0. (Screenshot of iZotope Insight plug-in.)
Figure 3.3 A vectorscope meter showing the stereo image width and correlation of a 1 kHz sine tone with
phase reversed (polarity inverted) on one channel of the stereo bus. Note that the energy appears
as a straight horizontal line at the bottom of the meter and the phase meter is reading −1. (Screen-
shot of iZotope Insight plug-in.)
Figure 3.4 A vectorscope meter showing the stereo image width and correlation of a stereo mix. Note that
the energy is primarily weighted toward the center of the meter and the correlation is at almost
+1. Contrast this with Figure 3.1. (Screenshot of iZotope Insight plug-in.)
Figure 3.5 A vectorscope meter showing the stereo image width and correlation of a stereo mix. Note that
the energy is widely spread across the meter with seemingly random squiggles. The correlation
is about two-thirds of the distance from 0 to +1, but the gray region of the meter shows that
its recent history fluctuated widely from a high point close to 1 and down to slightly below 0.
(Screenshot of iZotope Insight plug-in.)
Sine tones are useful for illustrating some basic vectorscope conditions, but in practice we
are more likely to meter more complex signals such as music, speech, and sound effects.
Figure 3.4 shows a vectorscope screenshot of a moment in time of a pop stereo mix. You
can see that, although it is not a single vertical line like the sine tone above, the energy is
primarily located in the center of the vectorscope image and the correlation meter sits close
to +1. Because the lead vocal, bass, kick drum, snare drum, and guitar are panned center,
with subtle reverb and auxiliary percussion panned to the sides in this recording, the meter
reflects the stereo image that we hear: a center-heavy mix, typical of what we find in pop
music recordings.
With different mixes we get different representations in the meter. Figure 3.5 shows
another stereo mix of a more experimental type of music. Note the much wider rep-
resentation on the meter, which is reflected in the stereo image width when we listen
to it.
Find the polarity or phase invert switch, usually labeled with “ø” or “Φ.” Listen to the effect that an “out
of phase” channel creates. Once you hear the out of phase sound, you will likely remember
it and recognize it immediately when you hear it again.
Record producer Phil Spector famously made use of reverberant spaces to create emotional impact in his productions. The quintessential song in this production style is The Ronettes’ “Be My Baby” from 1963. A couple
of decades later with the help of producers Brian Eno and Daniel Lanois, U2’s albums The
Joshua Tree and The Unforgettable Fire employed extensive use of delays and reverb to create
the sense of big open spaces. Eno and Lanois had become well known for their ambient
music recordings only a few years prior, and they brought some of these spatial processing
methods and “treatments” to subsequent pop music recordings. German record producer
Manfred Eicher and his label ECM use more prominent reverb on their jazz recordings than
American jazz labels such as Impulse! Records and Blue Note Records. More recently, indie
pop bands such as Fleet Foxes and Sigur Rós have produced albums with clearly noticeable
washes of reverb.
Although some perceive prominent reverb in music recordings as gimmicky, I often find
that reverb makes a recording interesting and engaging. On the other hand, when reverb
seems to be an add-on that does not blend well with the music, makes the music muddy,
or does not have any apparent musical role, it can detract from our listening experience.
Production decisions often come down to choice and personal taste, but the music should
guide us. Experiment, try something new, take a risk, and do something out of your comfort
zone. Maybe nothing useful will come of it. Or maybe you will discover something really
interesting by following your ears, trying unconventional things, improvising, and being open
to new possibilities. Regardless of your stance on reverb use, listen to the way reverb, echo,
and delay are used in commercial recordings and try emulating them.
The spatial layout of sources in a sound image can influence clarity and cohesion partly
due to spatial masking. We know that a loud sound will partially or completely mask a
quiet sound. It is difficult to have a conversation in a noisy environment because the noise
masks our voices. It turns out that if the masker (noise) and the maskee (speaking voices,
for example) arrive from two different locations then less masking occurs. The same effect
can occur in stereo and multichannel audio images. Sound sources panned to the same
location may partially or completely mask other sound sources panned to that location. Pan
the sounds to opposite sides and suddenly we can hear a quieter sound that was previously
masked. Sometimes reverberation in a recording can seem inaudible or at least difficult to
identify because it blends with and is partially masked by direct sound. This is especially
true for recordings with sustained elements. Transient elements such as percussion and drums
allow us to hear reverb that may be present because, by definition, transient sounds decay
quickly, usually much more quickly than the reverb.
We must factor in subjective impressions of spatial processing as we translate between
controllable parameters on digital reverb such as decay time, predelay time, and early reflec-
tions, and their sonic results. Conceptually, we might link sound source distance control to
reverb simulation, but there is usually not a parameter labeled “distance” in a conventional
digital reverb processor. If we want to make a sound source seem more distant, we need to
control distance indirectly by adjusting reverberation parameters, such as decay time, predelay,
and mix level, in a coordinated way until we have the desired sense of distance. We must
translate between objective parameters of reverberation to create the desired subjective impres-
sion of source placement and simulated acoustic environment.
Our choice of reverberation parameter settings depends on a number of things such as
the transient nature and width of our dry sound sources, as well as the decay and early
reflection characteristics of our reverberation algorithm. Professional engineers often rely on
subjective qualities of reverb to accomplish their goals for each individual mix rather than
simply choosing parameter settings that worked in other situations. In other words, they
adjust parameters until the reverb sounds right for the mix, rather than simply pulling up
a preset they used on a previous mix and assuming that it will work. A particular combina-
tion of parameter settings for one source and reverberation usually cannot simply be dupli-
cated for an identical distance and spaciousness effect with a different source or reverberation
algorithm.
We can benefit from analyzing spatial properties from both objective and subjective per-
spectives, because the tools have objective parameters, but our end goal in recording is to
achieve great sounding mixes, not to identify specific parameter settings. As with equalization,
we must find ways to translate between what we hear and the parameters available for
control. As mentioned above, spatial attributes can be broken down into the following cat-
egories and subcategories:
Sound Sources
The spatial attributes of sound sources consist of three main categories:
• angular location
• distance
• spatial extent
• Direct sound level. Quieter sounds are judged as being farther away because there is a sound
intensity loss of 6 dB per doubling of distance from a source (in a free field condition, i.e.,
no reflected sound present; see the worked example after this list). This cue can be ambiguous for the listener because a change
in loudness can be the result of either a change in distance or a change in a source’s acous-
tic power.
• Reverberation level. As a source moves farther away from a listener in a room or hall, the
direct sound level decreases and the reverberant sound remains roughly constant, lowering
the direct-to-reverberant sound ratio.
• Distance of microphones from sound sources. Moving microphones farther away decreases the
direct-to-reverberant ratio and therefore creates a greater sense of distance.
• Room microphone placement and level. If we place microphones on the opposite end of a
room relative to the musicians, we will pick up primarily reverberant sound. We can treat
room microphone signals as reverberation to add to our mix.
• Low-pass filtering close-miked direct sounds. High frequencies are attenuated more than lower
frequencies because of air absorption as we move farther from a sound source. Further-
more, the acoustic properties of reflective surfaces in a room affect the spectrum of re-
flected sound reaching a listener’s ears.
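As a small worked example of the free-field level cue in the first bullet above, the drop in level with distance follows the inverse square law; the function and distances here are purely illustrative.

    import math

    def level_change_db(d_near, d_far):
        # Free-field (inverse square law) change in sound pressure level.
        return 20 * math.log10(d_near / d_far)

    print(level_change_db(1.0, 2.0))   # about -6 dB for one doubling of distance
    print(level_change_db(1.0, 4.0))   # about -12 dB for two doublings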
A related concept in concert hall acoustics research is called apparent source width, or ASW, which is related to the strength, timing, and direction of side reflections. Acoustician Michael Barron
found that stronger reflections from the side would result in a wider ASW.
As with concert hall acoustics, we can influence the perceived width of sources reproduced
over loudspeakers by adding early reflections, whether recorded with microphones or gener-
ated artificially. If artificial early reflections (in stereo) are added to a single, close microphone
recording of a sound source, the direct sound tends to fuse perceptually with early reflections
(depending on the time of arrival of the reflections) and produce an image that is wider
than just the dry sound on its own.
The perceived width of a sound image produced over loudspeakers will vary with the
microphone technique used, the sound source, and the acoustic environment in which it is
recorded. Spaced microphones produce a wider sound source because the level of correlation
of direct sounds between the two microphone signals is reduced as the microphones are
spread farther apart. As we discussed above, a stereo image correlation of 0 (decorrelated
left and right channels) creates a wide image with energy that seems to originate in the left
and right loudspeakers primarily, with little energy in the center. We can affect correlation
with the spacing of a stereo pair of microphones. In most cases, two microphones placed
close together will produce highly correlated signals, except for certain cases with the Blum-
lein technique that I describe in the next paragraph. Because pairs of coincident microphones
occupy nearly the same physical location, the acoustic energy reaching both will be almost
identical. As we move them apart, correlation will decrease. A small spacing of an inch or
two (a few centimeters) will decorrelate high frequencies, but low frequencies will still be
correlated. With more space between microphones, decorrelation will spread to lower fre-
quencies. Microphone spacing and the lowest frequency of correlation are inversely propor-
tional because as we go lower in frequency, wavelengths increase, thus requiring greater
spacing for low frequencies to be decorrelated. In other words, as we widen a pair of
microphones, our resulting stereo image also widens (as correlation decreases), assuming one
mic is panned hard left and the other is panned hard right.
As I mentioned above, the Blumlein stereo microphone technique, which uses coincident
figure-8 or bidirectional microphones angled 90 degrees apart, creates a slightly more compli-
cated stereo image. Sounds arriving at the fronts and backs of the microphones are in phase, so
we have no decorrelation. Sounds arriving at the sides are picked up by each microphone at the
same time, but the polarity of the microphones is opposite. For example, a sound arriving from
the right side of a Blumlein pair will be picked up by the front, positive lobe of the right-facing
microphone and also by the rear, negative lobe of the left-facing microphone. As a result, sounds arriv-
ing from the side are negatively correlated in the stereo image. See Figure 3.6, which shows the
polar patterns of the figure-8 microphones and a sound source arriving from the side.
Figure 3.6 A Blumlein stereo microphone technique uses two coincident figure-8 microphones angled
90 degrees apart. Sounds arriving from the sides are negatively correlated in the resulting stereo
image.
Spatial extent of sound sources can be controlled through physical parameters such as the
following:
• Early reflection patterns originating from a real acoustic space or generated artificially
with reverberation.
• Type of stereo microphone technique used: spaced microphones generally yield a wider
spatial image than coincident microphone techniques, as we discussed above.
The Space: Spatial Extent (Width and Depth) of the Sound Stage
A sound stage is the acoustic environment within which we hear a sound source, and it
should be differentiated from a sound source. The environment may be a recording of a
real space, or it may be something that has been created artificially using delay and artificial
reverberation.
• Correlation of +1: Left and right channels are identical, composed completely of signals
that are panned center.
• Correlation of 0: Left and right channels are different. As mentioned above, the channels
could be in musical unison and still be decorrelated if the two parts were played by differ-
ent musicians or by the same musician as an overdub.
• Correlation of − 1: Left and right channels are identical but opposite in polarity, or nega-
tively correlated.
Phase meters provide one objective way of determining the relative polarity of stereo chan-
nels, but if no such meters are available, we must rely on our ears.
On occasion we may find an individual instrument that we recorded in stereo has opposite
polarity channels panned hard right and left. If such a signal is present, a phase meter on
the stereo bus may not register it strongly enough to give an unambiguous visual indication,
or we may not be using a phase meter. Sometimes stereo line outputs from electric instru-
ments are opposite polarity, or perhaps a polarity flip cable was used during recording by
mistake. Often stereo line outputs from electronic instruments are not truly stereo but mono.
When one output is opposite polarity, the two channels will cancel when summed to mono.
Listening for spatial continuity, we want to notice whether sound energy is concentrated in either the left or right channels. Often pop and rock music mixes have a strong center
component (as seen in the vectorscope image Figure 3.4) because of the number and strength
of instruments that are typically panned center, such as kick drum, snare drum, bass, and
vocals. Classical and acoustic music recordings may not have a similarly strong central image,
and it is possible to be deficient in the center image energy—sometimes referred to as hav-
ing a “hole in the middle” of the stereo image. We should strive to have an even and
continuous spread of sound energy from left to right.
Time Delay
Although a simple concept, time delay can serve as a fundamental building block for a wide
variety of complex effects. Figure 3.7 shows a block diagram of a signal being added or
mixed to a delayed version of itself, known as a feedforward comb filter, and its associated
impulse response. By simply delaying an audio signal and mixing it with the original non-
delayed signal, the product is either comb filtering (for shorter delay times, less than about
10 ms) or echo (for longer delay times). By adding hundreds of delayed versions of a signal
in an organized way, early reflection patterns such as those found in real acoustic spaces can
be mimicked. Chorus and flange effects are created through the use of delay times that are
modulated or vary over time. Figure 3.8 shows a block diagram of a delay with feedback
and its associated impulse response. We can see that the shape of this feedback comb filter’s
decay looks a little bit like the decay of sound in a room. A single feedback comb filter
will not sound like real reverb. To make it sound like actual reverb, we need to have numer-
ous feedback comb filters in parallel all set to slightly different delay times and gain amounts.
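The two comb filters described above can be sketched in a few lines of Python with NumPy and SciPy (neither of which the text assumes); the delay time and gain values are illustrative only.

    import numpy as np
    from scipy.signal import lfilter

    fs = 48000                    # assumed sample rate
    D = int(0.010 * fs)           # 10 ms delay expressed in samples
    g = 0.5                       # gain of the delayed signal

    x = np.zeros(fs // 2)
    x[0] = 1.0                    # unit impulse input

    # Feedforward comb (as in Figure 3.7): y[n] = x[n] + g * x[n - D]
    b_ff = np.zeros(D + 1); b_ff[0] = 1.0; b_ff[D] = g
    y_ff = lfilter(b_ff, [1.0], x)      # impulse response: spikes at n = 0 and n = D

    # Feedback comb (as in Figure 3.8): y[n] = x[n] + g * y[n - D]
    a_fb = np.zeros(D + 1); a_fb[0] = 1.0; a_fb[D] = -g
    y_fb = lfilter([1.0], a_fb, x)      # decaying spikes every D samples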
Figure 3.7 The top part (A) shows a block diagram of a signal combined with a delayed version of itself, also
known as a feedforward comb filter. The delay time amount is represented by the variable t, and
gain amount by g. The bottom part (B) shows the impulse response of the block diagram with a
gain of 0.5: a signal (in this case an impulse) plus a delayed version of itself at half the amplitude.
Figure 3.8 The top part (A) shows a block diagram of a signal combined with a delayed version of itself with the
output connected back into the delay, also known as a feedback comb filter.The delay time amount is
represented by the variable t, and gain amount by g.The bottom part (B) shows the impulse response
of the block diagram with a gain of 0.5: a signal (in this case an impulse) plus a repeating delayed
version of itself where each subsequent delayed output is half the amplitude of the previous one.
Figure 3.9 A block diagram of an all-pass filter, which is essentially a combination of a feedforward and feed-
back comb filter. All-pass filters have a flat frequency response, but they can be set to produce a
decaying time response. There is one delay time, t, and three gain variables: blend (non-delayed
signal) = g1, feedforward delay = g2, feedback delay = g3.
If we combine a feedforward and feedback comb filter, we can create what is known as an
all-pass filter, as shown in Figure 3.9. All-pass filters have a flat frequency response, thus the
name “all” pass, but can be set to produce a decaying time response. As we will see below,
they are an essential building block of digital reverbs.
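A sketch of the all-pass filter follows, again in Python. It uses the common single-coefficient Schroeder form, which is a special case of the more general three-gain structure shown in Figure 3.9; the delay and gain values are assumptions for illustration.

    import numpy as np
    from scipy.signal import lfilter

    def schroeder_allpass(x, delay_samples, g):
        # y[n] = -g*x[n] + x[n-D] + g*y[n-D]: flat magnitude response,
        # but a decaying (reverberant) time response.
        b = np.zeros(delay_samples + 1); b[0] = -g; b[-1] = 1.0
        a = np.zeros(delay_samples + 1); a[0] = 1.0; a[-1] = -g
        return lfilter(b, a, x)

    impulse = np.zeros(24000); impulse[0] = 1.0
    ir = schroeder_allpass(impulse, 221, 0.7)   # decaying impulse response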
Reverberation
Whether originating from a real acoustic space or an artificially generated one, reverberation
is a powerful effect that can provide a sense of spaciousness, depth, cohesion, and distance
in recordings. Reverberation helps blend sounds and create the illusion of being immersed
in an environment different from our physical surroundings.
On the other hand, reverberation, like any other type of audio processing, can also create
problems in sound recordings. Mixed too high or with a decay time that is excessively long,
reverberation can destroy the clarity of direct sounds or, as in the case of speech, affect
intelligibility. The quality of reverberation must be optimized to suit the musical and artistic
style being recorded.
Reverberation and delay have important functions in music recording, such as helping
the instruments and voices in a recording blend and “gel.” Through the use of reverbera-
tion, we can create the illusion of sources performing in a common acoustic space. Additio-
nal layers of reverberation and delay can be added to accentuate and highlight specific
soloists.
The sound of a close-miked instrument or singer played back over loudspeakers creates
an intimate or perhaps even uncomfortable feeling when listening over headphones. When
we hear a close-miked voice over headphones, it sounds like the singer is only a few cen-
timeters from our ears. This is not something we are accustomed to hearing acoustically
from a live music performance and it can make listeners feel uncomfortable. Concert goers
hear live music performances at least several feet away from the performers—certainly more
than a few centimeters—which means that reflected sound from walls, floor, and ceiling of
a room fuses perceptually with sound coming directly from a sound source. When recording
a performer with a close microphone, we can add delay or reverberation to the dry signal
to create the perception of a more comfortable distance between the listener and sound
source.
Conventional digital reverberation algorithms use a network of delays, all-pass filters, and
comb filters as their building blocks. Even the most sophisticated digital reverberation algo-
rithms are based on the basic ideas found in the first digital reverb invented by Manfred Schroe-
der in 1962. Figure 3.10 shows a block diagram of Schroeder’s digital reverb with four parallel
comb filters that feed into two all-pass filters. Each time a signal goes through the feedback
loop it is reduced in level by a preset amount so that its strength decays over time as we saw
in Figure 3.8.
At their most basic level, conventional artificial reverberation algorithms are just combina-
tions of delays with feedback or recursion. Although simple in concept, current reverb
plug-in designers use large numbers of comb and all-pass filters connected together in
sophisticated ways, with manually tuned delay and gain parameters to create realistic-sounding
Figure 3.10 A block diagram of Manfred Schroeder’s original digital reverberation algorithm, showing four
comb filters in parallel that feed two all-pass filters in series, upon which modern conventional
reverb algorithms are based.
reverb decays. They also add equalization and filters to mimic reflected sound in a real room,
and subtle modulation to reduce repeating patterns that might catch our attention and
remind us that the reverb is artificial.
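To make the structure concrete, here is a minimal sketch of the Schroeder topology in Figure 3.10: four parallel feedback comb filters summed and fed through two all-pass filters in series. The delay lengths and gains are illustrative values chosen for this sketch, not settings taken from the text or from Schroeder's design.

    import numpy as np
    from scipy.signal import lfilter

    fs = 48000

    def feedback_comb(x, D, g):
        a = np.zeros(D + 1); a[0] = 1.0; a[-1] = -g
        return lfilter([1.0], a, x)

    def allpass(x, D, g):
        b = np.zeros(D + 1); b[0] = -g; b[-1] = 1.0
        a = np.zeros(D + 1); a[0] = 1.0; a[-1] = -g
        return lfilter(b, a, x)

    def schroeder_reverb(x):
        comb_delays = [1687, 1601, 2053, 2251]      # mutually staggered delays (samples)
        comb_gains = [0.773, 0.802, 0.753, 0.733]
        wet = sum(feedback_comb(x, d, g) for d, g in zip(comb_delays, comb_gains))
        wet = allpass(wet, 347, 0.7)
        wet = allpass(wet, 113, 0.7)
        return wet

    impulse = np.zeros(2 * fs); impulse[0] = 1.0
    ir = schroeder_reverb(impulse)   # a crude but clearly decaying reverb impulse response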
Another type of digital reverberation convolves an impulse response of a real acoustic space
with the incoming dry signal. Without getting into the mathematics, we might say that
convolution basically combines two signals by applying the features of one signal to another.
When we convolve a dry signal with the impulse response from a large hall, we create a
new signal that sounds like our dry signal recorded in a large hall. Hardware units capable
of convolution-based reverberation have been commercially available since the mid-1990s,
and software implementations are now commonly released as plug-ins with digital audio
workstations. Convolution reverberation is sometimes called “sampling” or “IR” reverb
because a sample or impulse response of an acoustic space is convolved with a dry audio
signal. Although possible to compute in the time domain, convolution reverb is usually
computed in the frequency domain to make the computation fast enough for real-time
processing. The resulting reverb from a convolution reverberator is arguably more realistic
sounding than that from conventional digital reverberation using comb and all-pass filters.
The main drawback is that there is not as much flexibility or control of parameters in
convolution reverberation as is possible with digital reverberation based on comb and all-pass
filters.
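A sketch of the convolution approach, assuming mono WAV files and the third-party soundfile library (the file names are placeholders, not assets from the book):

    import numpy as np
    import soundfile as sf
    from scipy.signal import fftconvolve

    dry, fs = sf.read("dry_vocal.wav")          # hypothetical mono dry recording
    ir, fs_ir = sf.read("large_hall_ir.wav")    # hypothetical mono impulse response
    assert fs == fs_ir

    wet = fftconvolve(dry, ir)                  # fast, frequency-domain convolution
    mix = 0.3                                   # 30% wet, 70% dry
    out = (1 - mix) * np.pad(dry, (0, len(wet) - len(dry))) + mix * wet
    out /= np.max(np.abs(out))                  # normalize to avoid clipping
    sf.write("vocal_in_hall.wav", out, fs)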
In conventional digital reverberation units, we usually find a number of possible parameters
to control. Although these parameters vary from one manufacturer to another, a few of the
most common include the following:
Although most digital reverberation algorithms represent simplified models of the acoustics
of a real space, they are widely used in recorded sound to help augment the recorded acoustic
space or to create a sense of spaciousness that did not exist in the original recording due to
close-miking techniques.
V = volume in m3, S = surface area in m2 for a given type of surface material, and α =
absorption coefficient of the respective surface.
Because the RT60 will be some value greater than zero even if α is 1.0 (100% absorption
on all surfaces), the Sabine equation is typically only valid for α values less than 0.3. In other
words, the shortcoming of the Sabine equation is that we would calculate a reverberation
time greater than 0 for an anechoic chamber, even though we would measure no reverbera-
tion acoustically. The Norris–Eyring equation is a slight variation that remains valid over a
wider range of absorption values (Howard & Angus, 2006):
RT60 = (−0.161 × V) / (S × ln(1 − α))
V = volume in m3, S = surface area in m2 for a given type of surface material, ln is the
natural logarithm, and α = absorption coefficient of the respective surface.
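As a worked example with assumed values (a 5,000 m3 hall, 2,000 m2 of surface area, and an average absorption coefficient of 0.25), the Sabine equation (RT60 = 0.161 × V / (S × α), with the variables defined above) and the Norris–Eyring equation give slightly different results:

    import math

    V = 5000.0     # room volume, m^3
    S = 2000.0     # total surface area, m^2
    alpha = 0.25   # average absorption coefficient

    rt60_sabine = 0.161 * V / (S * alpha)                   # about 1.61 s
    rt60_eyring = -0.161 * V / (S * math.log(1.0 - alpha))  # about 1.40 s
    print(rt60_sabine, rt60_eyring)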
It is helpful to have an intuitive sense of the sound of various decay times. A decay time of
2 seconds will have a much different sonic effect than a decay time of less than 1 second.
Delay Time
We can mix a straight delay (without feedback or recursion) with a dry signal to create a
sense of space, and it can supplement or substitute for reverberation. With shorter delay times—
around 25–35 milliseconds—our auditory systems tend to fuse the direct and delayed sounds;
we localize the combined sound based on the location of the first-arriving direct sound.
Helmut Haas discovered that a single reflection added to a speech signal fused perceptually
with the dry sound unless the reflection arrived more than approximately 25–35 milliseconds
after the dry sound, at which point we perceive the delayed sound as an echo or separate
sound. The phenomenon is known as the precedence effect, the Haas effect, or the law of
the first wavefront.
When we add a signal to a delayed version of itself and the delay time is greater
than 25–35 milliseconds, we hear the delayed signal as a distinct echo of a direct sound.
The actual amount of delay time required to create a distinct echo depends on the
nature of the audio signal being delayed. Transient, percussive signals reveal distinct
echoes with shorter delay times (less than 30 milliseconds), whereas sustained, steady-
state signals require much longer delay times (more than 50 milliseconds) to create an
audible echo.
Predelay Time
Predelay time is typically defined as the time delay between the direct sound and the onset
of reverberation. Predelay can give the impression of a larger space even with a short decay
time. In a real acoustic space with no physical obstructions between a sound source and a
listener, there will always be a short delay between the arrival of direct and reflected sounds.
The longer this initial delay is, the larger we perceive the space to be.
Most digital reverberation units include factory presets, which are simply stored collections of parameter settings for the same algorithm. The presets are individually named to indicate an application or space
such as large hall, bright vocal, studio drums, or theater. All of the presets using a given type
of algorithm represent identical types of processes and will sound identical if the parameters
of each preset are matched.
Because engineers adjust many reverberation parameters to create the most suitable
reverberation for each application, it makes sense to pick any preset and start tuning
parameters instead of searching for the perfect preset. The main drawback of trying to
find the right preset for each instrument and voice during a mix is that the right preset
might not exist. Or if something close does exist, it will likely require parameter adjust-
ments anyway, so why not just start by adjusting parameters. It is more efficient to simply
start with any preset and spend our time editing parameters to suit our mix. As we edit
parameters, we learn a reverb’s capabilities and what each parameter sounds like. In the
parameter-editing phase for an unfamiliar reverb, I find it helpful to turn parameters to
their range extremes to make sure I can hear their contributions, and then dial in the
settings I want.
On the other hand, we can learn more about the capabilities of a reverb algorithm by
going through the factory presets. Searching through endless lists of presets may not be the
best use of a mixing session, but it can be useful to listen carefully to presets during
downtime.
Figure 3.11 Impulse responses of three different reverb plug-ins with parameters set as identically as pos-
sible: reverb decay time: 2.0 s; predelay time: 0 ms; room type: hall. From these three impulse
responses, we can see that the decays look different, but perhaps more importantly, the decays
also sound distinctly different. Interestingly, according to FuzzMeasure audio test and measure-
ment software, all three impulse responses measure close to 2.0 seconds decay time.
Figure 3.11 shows impulse responses of three different reverb plug-ins set to as close to the same parameters
as possible, but with three distinctly different decay patterns. Reverb plug-ins do not all share
the same set of controllable parameters, thus it is impossible to have two different plug-ins
with exactly the same settings.
Reverb parameter settings do not sound consistent across digital reverb algorithms because
there are many different reverb algorithms and there are thousands of acoustic spaces to
model. This is one reason why it can be worth exploring different reverb models to find
out what works best for your projects. There are hundreds of options with varying levels
of quality that appeal to different tastes. Reverberation is a powerful sonic tool available to
recording engineers who mix it with recorded sound to create the aural illusion of real
acoustics and spatial context.
Just as it is critical to learn to recognize spectral resonances (with EQ), it is equally
important to improve our perception of artificial reverberation. At least one researcher has
demonstrated that listeners can “learn” reverberation for a given room (Shinn-Cunningham,
2000). Other work in training listeners to identify spatial attributes of sound has been con-
ducted as well. Neher et al. (2003) have documented a method of training listeners to
identify spatial attributes using verbal descriptors for the purpose of spatial audio quality
evaluation. Other researchers have used graphical assessment tools to describe the spatial
attributes of reproduced sound (such as Ford et al., 2003; Usher & Woszczyk, 2003).
This training software has an advantage because you compare one spatial scene with
another by ear; you are never required to translate your auditory sensation to another sensory
modality or means of expression, such as by drawing an image or choosing a word. Using
the software, you compare and match two sound scenes, within a given set of artificial
reverberation parameters, using only your auditory system. Thus, there is no isomorphism
between different senses and methods of communication. Additionally, this method has
ecological validity, as it mimics the process of a sound engineer sculpting sonic details of a
sound recording by ear rather than through graphs and words.
Training Module
The included software training module “Technical Ear Trainer—Reverb” is available for
listening drills. The computer randomizes the exercises and gives a choice of difficulty and
parameters for an exercise. It works in much the same way as the EQ module described in
Chapter 2.
Sound Sources
I encourage you to begin the reverb training with simple, transient, or impulsive sounds such
as percussion—a single snare drum hit is great—and progress to more complex sounds such as
speech and music recordings. In the same way that we use pink noise in EQ ear training
because it exposes the spectral changes better than most music samples, we use percussive
or impulsive sounds for training with time-based effects processing. Reverberation decay time is
easier to hear with transient signals than with steady-state sources, which tend to mask or
blend with reverberation, making judgments about it more difficult.
User Interface
A graphical user interface (GUI), shown in Figure 3.12, provides a control surface for you
to interact with the system.
Figure 3.12 A screenshot of the user interface for the reverb trainer.
The graphical interface also keeps track of the current question and the average score up
to that point, and it provides the score and correct answer for the current question.
• delay time
• reverb decay time
• predelay time
• reverberation level (mix)
As with the EQ module, your task with the exercises and tests is to duplicate a reference
sound scene by listening and comparing your answer to the reference and making the appropriate
changes to the parameters until they sound the same. The software randomly chooses parameter
values based on the level of difficulty and test parameters you choose, and it asks you to identify the
reverberation parameters of the reference by adjusting the appropriate parameter to the value that
most closely matches the sound of the reference. You can toggle between the reference question
and your answer either by clicking on the switches labeled “Question” and “Your Response”
(see Figure 3.12) or by pressing the space bar on the computer keyboard. Once the two sound
scenes are matched, you can click on “Check Answer” or hit the [Enter] key to submit the answer
and see the correct answer. Clicking on the “Next” button moves on to the next question.
Delay Time
Delay times range from 0 milliseconds to 200 milliseconds, with an initial resolution of 40 milliseconds that decreases to a resolution of 10 milliseconds as the difficulty increases.
Predelay Time
Predelay time is the amount of time delay between the direct (dry) sound and the beginning
of early reflections and reverberation. Predelay times vary between 0 and 200 ms, with an
initial resolution of 40 ms and decreasing to a resolution of 10 ms.
Mix Level
Often when mixing reverberation with recorded sound, the level of the reverberation is
adjusted as an auxiliary return on the recording console or digital audio workstation. The
training system allows you to practice learning various “mix” levels of reverberation. A mix
level of 100% means that there is no direct (unprocessed) sound at the output of the algo-
rithm, whereas a mix level of 50% represents an output with equal levels of processed and
unprocessed sound. The mix value resolution at the lowest level of difficulty is 25% and
progresses up to a resolution of 5%, covering the range from 0% to 100% mix.
Figure 3.13 A block diagram (A) and a mixer signal flow diagram (B) to convert Left and Right stereo signals
into Mid (Left + Right) and Side (Left − Right) signals, and subsequent mixing back into Left and
Right channels. Both diagrams result in equivalent signal processing, where diagram A is a basic
block diagram and diagram B shows one way to route signals on a mixer to achieve the process-
ing in diagram A. Dashed signal lines in the diagrams represent audio signal flow the same as solid
lines but are used to clarify signal flow for crossing lines. Dotted lines indicate fader grouping.
Mastering engineers sometimes split a stereo recording into its M and S components for
processing and then convert them back into L and R. Although there are plug-ins that auto-
matically convert the L and R channels to M and S, the process is quite simple. We can derive
the mid or sum component by adding the L and R channels together. Practically, we can do it
by bringing the two audio channels in on two faders and panning them both to the center. To
derive the side or difference channel, we send the L and R into two other pairs of channels. One pair can be panned hard left with the R channel polarity inverted, giving L − R. The final pair of L and R channels can be panned hard right with the L channel polarity inverted, giving R − L, so that the mid and side components sum back to the original left and right signals. See Figure 3.13 for
details on the signal routing information. Now that the signals are split into M and S, we can
simply rebalance these two components, or we can apply processing to them independently. The
S signal represents the components of the signal that meet either of the following conditions:
• signals panned away from the center of the stereo image, which appear at different levels in the left and right channels
• signal components that differ between the left and right channels, such as decorrelated reverberation or opposite-polarity content
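The arithmetic of the mid-side conversion and its inverse can also be sketched directly; this is independent of the console routing in Figure 3.13, and the signals and the 3 dB side boost below are placeholders for illustration.

    import numpy as np

    def ms_encode(L, R):
        mid = L + R        # sum (mid) component
        side = L - R       # difference (side) component
        return mid, side

    def ms_decode(mid, side):
        # The factor of 0.5 makes encode followed by decode return the original L/R.
        L = 0.5 * (mid + side)
        R = 0.5 * (mid - side)
        return L, R

    L = np.random.randn(48000); R = np.random.randn(48000)   # placeholder stereo audio
    mid, side = ms_encode(L, R)
    side *= 10 ** (3 / 20)          # e.g., raise the side component by 3 dB to widen
    L2, R2 = ms_decode(mid, side)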
Summary
This chapter covers the spatial attributes of sound, focusing primarily on reverberation and
mid-side processing. The goal of the spatial software practice module is to systematically
familiarize listeners with aspects of artificial reverberation, delay, and panning. By comparing
two audio scenes by ear, we can match one or more parameters of artificial reverberation
to a reference randomly chosen by the software. We can progress from comparisons using
percussive sound sources and coarse resolution between parameter values to more steady-state
musical recordings and finer resolution between parameter values. Often very minute changes
in reverberation parameters can have a significant influence on the depth, blend, spaciousness,
and clarity of the final mix of a sound recording.
Chapter 4
Dynamic Range Control
In this chapter we will discuss level control and dynamics processing. To inform our critical
listening, we will cover some of the theory of dynamics processors.
Mix balance has a direct effect on an artist’s musical expression. If one or multiple ele-
ments in a mix are too loud or too quiet, we as listeners may not be able to hear a musical
part or we may think the emphasis is on a different part than the artist intended. Achieving
an appropriate balance of a musical ensemble is essential for expressing an artist’s musical
intention. Conductors and composers understand the idea of finding optimal ensemble bal-
ance for each performance and piece of music. If an instrumental part within an ensemble
is not loud enough to be heard clearly, listeners do not receive the full impact of a piece of
music. Overall balance depends on the control of individual vocal and instrumental ampli-
tudes in an ensemble.
When recording spot microphone signals on multiple tracks and mixing those tracks, we
have direct control over balance and therefore also musical expression. When mixing multiple
tracks, we may need to continually adjust the level of certain instruments or voices for
consistent balance from the beginning to the end of a track. We can do this manually with
fader automation, automatically with dynamics processors, or use a hybrid approach that
uses both.
Dynamic range describes the difference between the loudest and quietest levels of an
audio signal. For microphone signals that have a dynamic range that is excessively wide
for the type of music, we can adjust fader levels over time to compensate for variations in
signal level and therefore maintain a consistent perceived loudness. We can manually boost
levels during quiet sections and attenuate loud sections. In this way, our fader level adjust-
ments made through a recording amount to manual dynamic range compression. Dynamic
range controllers—compressors/limiters and expanders/gates—adjust levels automatically
based on an audio signal’s level and can be applied to individual audio tracks or to a mix
as a whole.
Some signals have an inherently wide dynamic range; others have a relatively narrow range.
Distorted guitars generally have a small dynamic range, because distortion results from limit-
ing the amplitude of a signal, with instantaneous attack and release times. A close-miked
lead vocal, on the other hand, can have an extremely wide dynamic range. In extreme cases,
a singer’s dynamic range may vary from a loud scream to just a whisper, all within a single
song. If a vocal track’s fader is set to one level and left for the duration of a piece with no
compression or other level change, there will be moments when the vocal will be much too
loud and other moments when it will be too quiet. When a vocal level rises too high it
becomes uncomfortable for a listener, who may then want to turn the entire mix down. In
the opposite situation, a vocal that is too low in level becomes difficult to understand,
leaving an unsatisfying musical experience for a listener. Finding a satisfactory static fader
level without compression for a sound source as dynamic as pop vocals is likely to be impos-
sible unless the singer intentionally sings within a narrow dynamic range. One way of
compensating for a wide dynamic range is to manually adjust the fader level for each word
or phrase that a singer sings. Although some tracks do call for such detailed manual control
of fader level, compression is still helpful in getting partway to consistent, intelligible, and
musically satisfying levels, especially for tracks with a wide dynamic range.
Consistent levels for instruments and vocals in a pop music recording may help com-
municate the musical intentions of an artist more effectively than levels with a wide dynamic
range. Most recordings in the pop music genre have very limited dynamic range. Yet wide
dynamic contrasts are still essential to help convey musical emotion, especially in acoustic
music. This raises the question: if the level of a vocal track is adjusted so that the loud (fortissimo or ff) passages are the same loudness as the quiet (pianissimo or pp) passages, how is a
listener going to hear any dynamic contrast? Before we address this question we should be
aware that level control partly depends on genre. Classical music recordings, for example,
usually do not benefit from highly controlled dynamic range because listeners expect
dynamic range variation in classical music and too much dynamic range control can make
it sound too processed. Although signal processing artifacts such as distortion, limiting, EQ,
and delays are often an expected part of pop, rock, and electronic music (e.g., Brian Eno’s
concept of the recording studio as a musical instrument), we try to avoid any processing
in classical music recording. It is as though classical music recordings should not sound like
recordings, but should mimic the concert hall experience. For most other genres of music,
at least some amount of dynamic range control is desirable. And specifically for pop, rock,
and electronic music recordings, a limited dynamic range is the goal partly to make record-
ings sound loud.
Fortunately, even with extreme dynamic range control we can still perceive dynamic range
changes partly because of timbre changes between quiet and loud levels. We know from
acoustic measurements that there is a significant increase in the number and strength of
higher-frequency harmonics as dynamic level goes from quiet to loud for almost all instru-
ments, including voice. So even with a heavily compressed vocal performance, we still perceive
dynamic range because of changes in timbre in the voice.
Nevertheless, overuse of compression and limiting can leave a performance sounding life-
less. We need to be aware of using too much dynamics processing because it can be fairly
destructive when used excessively. Once we record a track with compression, there is no
way to completely undo the effect. Some types of audio processing such as reciprocal peak/
dip equalization allow us to undo alterations with equal parameter and opposite gain settings,
but compression and limiting do not offer such transparent flexibility.
The effect of a compressor is a form of amplitude modulation in which the gain applied depends on, and in turn modifies, an audio signal’s amplitude envelope. Compression is simply gain reduction
where the gain reduction varies over time based on a signal’s level, with the amount of
reduction based on the threshold and ratio settings. Compression and expansion are examples
of nonlinear processing because the amount of gain reduction applied is amplitude-dependent
and the gain applied to a signal changes over time.
Dynamics processing such as compression, limiting, expansion, and gating all offer means
to sculpt and shape audio signals in unique and time-varying ways. We say it is time-varying
because the amount of gain reduction varies over time as the original signal level changes
over time. Dynamic range control can help in the mixing process by not only smoothing
out audio signal levels but by acting like a glue that helps add cohesion to various musical
parts in a mix.
Figure 4.2 A square wave has equal peak and RMS levels, so the crest factor is 0 dB.
Figure 4.3 A pulse wave is similar to a square wave except that we are shortening the amount of time
the signal is at its peak level. The length of the pulse determines the RMS level, where a shorter
pulse will give a lower RMS level and therefore a larger crest factor. The RMS level shown in the
figure is approximate.
When compression reduces the peaks of a signal and makeup gain is applied to restore the peaks to their original level, the RMS level is increased as well,
making the overall signal louder.
By reducing the crest factor through compression and limiting, we can make an audio
signal sound louder even if its peak level is unchanged. We may be tempted to normalize
a recorded audio signal in an attempt to make it sound louder. Normalizing is a process
whereby an audio editing program scans an audio signal, finds the highest signal level for
the entire clip, calculates the difference in dB between the maximum recordable level (0 dBFS)
and the peak level of an audio signal, and then raises the entire audio clip by this difference
so that the peak level will reach 0 dBFS. If the peak levels are two or three decibels below
0 dBFS, we may only get a couple of decibels of gain at best by normalizing an audio signal.
This is one reason why the process of digitally normalizing a sound file will not necessarily
make a recording sound significantly louder. The only way to make a normalized signal
sound significantly louder is through compression and limiting to raise the RMS level and
reduce the crest factor.
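The quantities discussed here can be illustrated with a short sketch; the test signal is an assumed sine wave at half of full scale, and the dB values are relative to full scale (dBFS).

    import numpy as np

    def db(x):
        return 20 * np.log10(x)

    t = np.arange(48000) / 48000
    signal = 0.5 * np.sin(2 * np.pi * 1000 * t)   # 1 kHz sine peaking at -6 dBFS

    peak = np.max(np.abs(signal))
    rms = np.sqrt(np.mean(signal ** 2))

    crest_factor_db = db(peak) - db(rms)   # about 3 dB for any sine wave
    normalize_gain_db = 0.0 - db(peak)     # gain peak-normalization would apply (about 6 dB)
    print(crest_factor_db, normalize_gain_db)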
As a side note, normalizing a mix is not necessarily a good idea, because even if the
original sample peaks are only as high as 0 dBFS, the peaks between samples (inter-sample
peaks) may actually go above 0 dBFS when the signal is reconstructed or oversampled on playback, and cause
clipping. Many mastering engineers recommend staying at least a few decibels below 0 dBFS.
For recordings that will be submitted for sale to the iTunes Store, Apple says that “digital
masters should have a small amount of headroom (roughly 1 dB) in order to avoid such
clipping.”1
In addition to learning how to identify the artifacts produced by dynamic range compres-
sion, it is also important to learn how to identify static changes in gain. If the overall level
of a recording is increased, it is important to be able to recognize the amount of gain change
applied in decibels.
A handful of parameters are typically controllable on a compressor. These include threshold, ratio, attack time, release
time, and knee.
It may be worth making a clarification here. According to conventional sound synthesis
theory, we describe the amplitude envelope of a synthesized sound in terms of four main prop-
erties: attack, decay, sustain, and release, or simply ADSR. (See Figure 4.4a for a visualization of
a generic ADSR amplitude envelope.) The “attack” refers to the note onset, from silence to its
peak amplitude. Acoustic instruments have their own respective attack times, which can vary
somewhat depending on the performer. Some instruments have a fast attack or rise in ampli-
tude (such as piano or percussion) while other instruments produce a slightly slower attack
(such as violin or cello). While the term “attack” with respect to an instrument or synthesized
sound refers to a note onset, or quick rise in amplitude, “attack time” on a compressor refers to
the time it takes to reduce the level once a signal rises above a set threshold level. Similarly, a note “decay”
or “release” and a compressor “release time” represent opposite level changes as a note fades
out. The attack time of an expander is, in fact, more equivalent to the attack of a musical note
in that it is a rising amplitude change.
In the following sections I will be referring to the “attack” of a note onset as well as the
“attack time” of a compressor, the “decay” of an instrument, the “release” of a note, and
the “release time” of a compressor. One group of terms refers to sound sources (note attack,
decay, release) and the other refers to the result of processes applied to a sound source
(compressor attack time, release time).
Figure 4.4 The top graph (A) shows the four components of an ADSR (attack, decay, sustain, release) ampli-
tude envelope that describe and generate a synthesized sound. The attack starts when we press
a key on a keyboard with the note sustained as long as we press the key. As soon as we let go
of the key, the release portion of the envelope starts. The bottom graph (B) shows an amplitude
envelope for an acoustic sound, such as from a string or drum, which can have a relatively fast
attack but immediately starts to decay after being struck. Actual attack and decay times vary
across instruments and even within the range of a single instrument. For example, a low piano
note will have a much longer decay than a high piano note, assuming the piano key is held to
allow the string to vibrate.
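For readers who find a numerical example helpful, the following sketch builds a simple linear ADSR envelope and applies it to a sine tone. It is a generic illustration with arbitrary times and levels, not a description of any synthesizer or of the software mentioned in this book.

```python
import numpy as np

def adsr_envelope(attack, decay, sustain_level, sustain, release, sr=44100):
    """Build a simple linear ADSR amplitude envelope (times in seconds)."""
    a = np.linspace(0.0, 1.0, int(attack * sr), endpoint=False)           # rise to peak
    d = np.linspace(1.0, sustain_level, int(decay * sr), endpoint=False)  # fall to the sustain level
    s = np.full(int(sustain * sr), sustain_level)                          # hold while the key is down
    r = np.linspace(sustain_level, 0.0, int(release * sr))                 # fade out after key-up
    return np.concatenate([a, d, s, r])

env = adsr_envelope(attack=0.01, decay=0.1, sustain_level=0.7, sustain=0.5, release=0.3)
tone = env * np.sin(2 * np.pi * 220 * np.arange(len(env)) / 44100)        # enveloped sine tone
```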
Threshold
We can usually set the threshold level of a compressor, although some models instead have
a fixed threshold with a variable input gain. For fixed thresholds we raise the input to reach
the threshold and therefore have less makeup gain to apply at the end, possibly reducing the
added noise introduced by an analog compressor. A compressor starts to reduce the gain of
an input signal as soon as the amplitude of the signal itself or a side-chain input signal goes
above the threshold. Compressors with a side-chain or key input can accept an alternate
signal input to determine the gain function to be applied to the main audio signal input.
Compression to the input signal is triggered when the side-chain signal rises above the
threshold, regardless of the input signal level.
Attack Time
Although a compressor begins to reduce the gain of the audio signal as soon as its amplitude
rises above the threshold, it usually takes some amount of time to achieve maximum gain
reduction. The actual amount of gain reduction applied depends on the ratio and how far the
signal is above the threshold. In practice, the attack time can help us either define (that is, make
more prominent) or round off the attack of a percussive sound or the beginning of a musical
note. With appropriate adjustment of attack time, we can help a recording sound more “punchy.”
Release Time
The release time is the time that it takes for a compressor to stop applying gain reduction
after an audio signal has gone below the threshold. As soon as the signal level falls below
the threshold, the compressor begins to return it to unity gain and reaches unity gain in
the amount of time specified by the release time.
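Different compressor designs realize attack and release ballistics in different ways. One common approach, sketched below under that assumption, is to smooth the desired gain with a one-pole filter whose time constant depends on whether the gain is falling (attack) or rising (release); the function and parameter names are my own and do not describe any particular unit.

```python
import numpy as np

def smooth_gain(target_gain, attack_s, release_s, sr=44100):
    """Smooth a desired gain curve with separate attack and release time constants."""
    a_coef = np.exp(-1.0 / (attack_s * sr))    # smoothing coefficient while gain is falling
    r_coef = np.exp(-1.0 / (release_s * sr))   # smoothing coefficient while gain is rising
    out = np.empty_like(target_gain)
    g = 1.0                                    # start at unity gain
    for n, target in enumerate(target_gain):
        coef = a_coef if target < g else r_coef
        g = coef * g + (1.0 - coef) * target   # one-pole smoothing toward the target gain
        out[n] = g
    return out

# Example: request 6 dB of gain reduction (gain 0.5) for 200 ms, then return to unity
target = np.concatenate([np.ones(4410), np.full(8820, 0.5), np.ones(8820)])
actual = smooth_gain(target, attack_s=0.01, release_s=0.2)
```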
Knee
The knee describes the transition of level control from below the threshold (no gain reduc-
tion) to above the threshold (gain reduction). A smooth transition from one to the other is
called a soft knee, whereas an abrupt change at the threshold is known as a hard knee.
Ratio
The compression ratio determines the amount of gain reduction applied once the signal
rises above the threshold. It is the ratio of input level to output level in dB above the
threshold. For instance, with a 2:1 (input:output) compression ratio, the portion of the output
signal that is above the threshold will be half the level (in dB) of the input signal that is
above the threshold in dB. Compressors set to ratios of about 10:1 or higher are generally
considered to be limiters. Higher ratios are going to give more gain reduction when a signal
goes above threshold, and therefore the compression will be more apparent.
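A short worked example may help. The static input/output relationship described above, assuming an ideal hard-knee design, can be written in a few lines of Python; the names and values are illustrative only.

```python
def compressed_level(input_db, threshold_db, ratio):
    """Static output level of an ideal hard-knee compressor (levels in dB)."""
    if input_db <= threshold_db:
        return input_db                    # below threshold: level is unchanged
    over = input_db - threshold_db         # dB above the threshold
    return threshold_db + over / ratio     # the overshoot is divided by the ratio

# A signal 8 dB over a -20 dB threshold at 2:1 comes out only 4 dB over the threshold
print(compressed_level(-12.0, threshold_db=-20.0, ratio=2.0))   # -16.0
```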
Rather than offering only a fixed level detection timing, some compressors allow us to switch between two or three
options. Typically the options differ in how quickly the level detection responds to a signal's
level. For instance, peak level detection is good for responding to steep transients, and RMS
level detection responds to less transient signals. Some dynamics processors (such as the
George Massenburg Labs 8900 Dynamic Range Controller) have fast and slow RMS detec-
tion settings, where the fast RMS averages over a shorter period of time and thus responds
more to transients.
When a compressor is set to detect levels using slow RMS, it responds very little to short
transients. Because RMS detection averages over time, a steep transient will not have
much influence on the averaged signal level.
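The difference between peak and averaged (RMS) detection can be illustrated with a small sketch; the window length and the test signal below are arbitrary choices of mine.

```python
import numpy as np

def peak_level(signal):
    """Instantaneous peak detection: reacts fully to a single-sample transient."""
    return np.max(np.abs(signal))

def rms_level(signal, window, sr=44100):
    """RMS detection averaged over `window` seconds: transients are smoothed out."""
    n = max(1, int(window * sr))
    return np.sqrt(np.mean(signal[-n:] ** 2))   # mean square over the most recent window

# A lone transient barely moves a slow (50 ms) RMS reading but dominates the peak reading
x = np.zeros(4410)
x[2205] = 1.0
print(peak_level(x))               # 1.0
print(rms_level(x, window=0.05))   # roughly 0.02
```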
Figure 4.5 This figure shows a step function, an amplitude-modulated sine wave, that we can use to test the
attack and release times of a compressor.
Figure 4.6 The step response of a compressor showing three different attack and release times: long (A),
medium (B), and short (C).
Some compressor models have attack and release curves that look a bit different. Figure 4.7
shows a step function audio signal (A) that has been processed by a compressor and the
resulting step response (B) that the compressor produced, based on the input signal level and
compressor parameter settings. The step response shows the amount of gain reduction applied
over time, which varies with the amplitude of the audio signal input. In this compressor there
appears to be an overshoot in the amount of gain reduction in the attack before it settles into a
constant level of about 0.5. The threshold was set to −6 dB, which corresponds to 0.5 in audio
signal amplitude, so every time the signal goes above 0.5 in level (−6 dB), the gain function
shows a reduction in level.
[Figure 4.7, panel A: the step-function input signal; panel B: output from an analog compressor (attack time = 50 ms, release time = 200 ms, ratio = 30:1). Both panels plot amplitude (−1 to 1) against time in samples (0 to 4.5 × 10^4; sampling rate = 44.1 kHz).]
Figure 4.7 The same modulated 40-Hz sine tone through a commercially available analog compressor with
an attack time of approximately 50 ms and a release time of 200 ms. Note the difference in the
gain curve from Figure 4.6. There appears to be an overshoot in the amount of gain reduction in
the attack before it settles into a constant level. A visual representation of a compressor’s attack
and release times such as this is not something that would be included in the specifications for a
device. The difference that is apparent between Figures 4.6 and 4.7 is typically something that an
engineer would listen for but could not visualize without doing the measurement.
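If you want to generate a comparable test signal yourself, a sine tone whose amplitude steps between two levels is easy to synthesize. The 40-Hz frequency matches the tone mentioned in the caption; the step levels and timing below are my own assumptions.

```python
import numpy as np

def step_test_tone(freq=40.0, low=0.25, high=1.0, seg=0.25, repeats=4, sr=44100):
    """Sine tone whose amplitude alternates between a low and a high level,
    forming a step function we can feed to a compressor under test."""
    t = np.arange(int(seg * sr)) / sr
    quiet = low * np.sin(2 * np.pi * freq * t)    # segment intended to sit below the threshold
    loud = high * np.sin(2 * np.pi * freq * t)    # segment intended to sit above the threshold
    return np.tile(np.concatenate([quiet, loud]), repeats)

test_signal = step_test_tone()   # 40-Hz tone stepping between 0.25 and 1.0 every 250 ms
```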
Dynamic range compression is so prevalent in music reproduced over loudspeakers that listeners can come to expect it to be part of all musical sound. Listening
to acoustic music without sound reinforcement can help in our ear training process to refresh
our perspective and remind ourselves what music sounds like without compression.
Because dynamics processing is dependent on an audio signal’s variations in amplitude,
the amount of gain reduction varies with changes in the signal. As we said above, dynamic
range compression results in amplitude modulation synchronized with amplitude fluctuations
of an audio signal. Because the gain reduction is synchronized with the amplitude envelope
of the audio signal itself, the gain reduction or modulation can be difficult to hear because
we do not know if the modulation was part of the original signal or not. Amplitude modu-
lation becomes almost inaudible because it reduces signal amplitude at a rate equivalent but
opposite to the amplitude variations in an audio signal. Compression or limiting can be
made easier to hear when we set the parameters of a device to their maximum or minimum
values—a high ratio, a short attack time, a long release time, and a low threshold.
If we apply amplitude modulation that does not vary synchronously with an audio signal,
we can hear the modulation much more easily. The resulting amplitude envelope does not
correlate with the signal’s envelope, and we can detect the modulation as a separate event.
For instance, with a sine wave modulator as used in a tremolo guitar effect, amplitude
modulation is periodic and not synchronous with any type of music signal from an acoustic
instrument and is therefore highly audible. In the case of a tremolo effect, amplitude modu-
lation with a sine wave can produce desirable effects on an audio signal. With tremolo
processing, the goal is usually to highlight the effect rather than make it transparent.
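As a point of comparison with the signal-synchronized gain changes of a compressor, a tremolo effect can be sketched in a few lines; the LFO rate and depth are arbitrary example values.

```python
import numpy as np

def tremolo(signal, rate_hz=5.0, depth=0.5, sr=44100):
    """Amplitude-modulate a signal with a sine-wave LFO (a classic tremolo effect)."""
    t = np.arange(len(signal)) / sr
    lfo = 1.0 - depth * 0.5 * (1.0 + np.sin(2 * np.pi * rate_hz * t))  # swings between 1 and 1 - depth
    return signal * lfo

# The modulation is periodic and unrelated to the music's own envelope, so it is easy to hear
guitar_like = np.sin(2 * np.pi * 196.0 * np.arange(44100) / 44100)
effected = tremolo(guitar_like, rate_hz=6.0, depth=0.7)
```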
Through the action of gain reduction, compressors can create audible artifacts—such as
through timbre changes—that are completely intentional and contribute meaningfully to the
sound of a recording. In other situations, control of dynamic range is applied without creating
any artifacts or changing the timbre of sounds. We may want to turn down the loud parts
Figure 4.8 From an audio signal (A) sent to the input of a compressor, a gain function (B) is derived based
on compressor parameters and signal level. The resulting audio signal output (C) from the
compressor is the input signal with the gain function applied to it. The gain function shows the
amount of gain reduction applied over time, which varies with the amplitude of the audio signal
input. For example, a gain of 1 (unity gain) results in no change in level, and a gain of 0.5 reduces
the signal by 6 dB. The threshold was set to −6 dB, which corresponds to 0.5 in audio signal
amplitude, so every time the signal goes above 0.5 in level (−6 dB), the gain function shows a
reduction in level.
in a way that still controls the peaks but that does not distract the listener with artifacts. In
either case, we need to know what the artifacts sound like to decide how much or little
dynamic range control to apply to a recording. On many dynamic range controllers, the
user-adjustable parameters are interrelated to a certain extent and affect how we use and hear
them.
When mixing a multitrack recording, we are concerned with levels, dynamics, and balance
of each track. We want to be attentive to any sound sources that get masked at any point
in a piece. At a more subtle level, even if a sound source is not masked, we strive to find
the best possible musical balance, adjusting as necessary over time and across each note and
phrase of music. Focused listening helps us find the best compromise on the overall levels
of each sound source. It is often a compromise because it is not likely that every note of
every sound source will be heard perfectly clearly, even with extensive dynamic range control.
If we turn up each sound source to be heard above all others, we will run out of headroom
in our mix bus, so it becomes a balancing act where we need to set priorities. For instance,
vocals on a pop, rock, country, or jazz recording are typically the most important element.
Generally we want to make sure that each word of a vocal recording is heard clearly. Vocals
are often particularly dynamic in amplitude, and the addition of some dynamic range com-
pression can help make each word and phrase of a performance more consistent in level.
With recorded sound, we can guide a listener’s perspective and perception of a musical
performance through the use of level control on individual sound sources. We can bring
instruments and voices dynamically to the forefront and send them farther back, as the artistic
vision of a performance dictates. Sound source level automation can create a changing per-
spective that is obvious to the listener. Or we might create dynamic changes that are trans-
parent in order to maintain a perspective for the listener. Depending on the condition of the
raw tracks in a multitrack recording, we may need to make drastic changes behind the scenes
in order to create coherency and a focused musical vision. Listeners may not be consciously
aware that levels are being manipulated, and, in fact, engineers often try to make the changing
of levels as transparent and musical as possible. Listeners should only be able to hear that each
moment of a music recording is clear and musically satisfying, not that continuous level
changes are being applied to a mix. Again, we often strive to make the effect of technology
transparent to an artistic vision of the music we are recording. The old joke about recording
and live sound engineers is that we know we are doing a good job when no one notices
our work. Other engineers will notice our work, but listeners and musicians should be able
to focus on the art and not be distracted by engineering artifacts.
Compression artifacts such as pumping are most audible on sounds with a strong rhythmic
pulse, where the signal clearly rises above the threshold and then drops below it, such as
those produced by drums, other percussion instruments, and sometimes bass. If any lower-
level sounds or background noise is present with the main sound being compressed, we will
hear a modulated background sound. Sounds that are more constant in level such as distorted
electric guitar will not exhibit such an audible pumping effect.
As we lengthen the attack time to just a few milliseconds, we begin to hear a clicking sound
emerge at the onset of a transient. The click is produced by a few milliseconds of the
original audio passing through as gain reduction occurs, and the timbre of the click is directly
dependent on the length of the attack time. The abrupt gain reduction reshapes the ampli-
tude envelope of a drum hit. By increasing the compressor’s attack time further, the onset
sound gains prominence relative to the decay portion of the sound, because the compressor’s
attack time is lagging behind the drum attack time and therefore the gain reduction happens
after the drum’s attack and during its decay. By bringing down the decay relative to the
drum’s attack, we create a larger difference between the two components of the sound. So
the attack is more prominent relative to the decay.
If we increase a compressor’s attack time when compressing low-frequency drums such
as a bass/kick drum or even an entire drum set, we will typically hear an increase in low-
frequency energy. Because low frequencies have longer periods, a longer attack time will
allow more cycles of a low-frequency sound to occur before attack time gain reduction, and
therefore low-frequency content will be more audible on each rhythmic bass pulse. By
increasing the attack time from a very short value to a longer time, we increase the low-
frequency energy coming from the bass drum. As we increase a compressor’s attack time
from near zero to several tens or hundreds of milliseconds, the spectral effect is similar to
adding a low-shelf filter to the mix and increasing the low-frequency energy.
The release time mostly affects the decay of the sound, the portion that becomes quieter
after the loud onset. If we set the release time to be long,
the compressor gain reduction does not quickly return to unity gain after the signal level
has fallen below the threshold (which would typically happen during the decay), and there-
fore the natural decay of the drum sound becomes significantly reduced.
• Use low ratios. The lower the ratio, the less gain reduction that will be applied. A ratio of 2:1
is a good place to start.
• Use more than one compressor in series. By chaining two or three compressors in series on a
vocal, each set to a low ratio, each compressor can provide some gain reduction, and the overall
effect is more transparent than using a single compressor to do all of the gain reduction (see the sketch after this list).
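The sketch below applies the static 2:1 curve from the earlier ratio example twice in series, assuming an ideal hard-knee response, to show how two gentle stages can add up to the reduction of one steeper stage.

```python
def chained_compression(input_db, threshold_db=-18.0, ratio=2.0, stages=2):
    """Apply the same gentle static compression curve several times in series (levels in dB)."""
    level = input_db
    for _ in range(stages):
        if level > threshold_db:
            level = threshold_db + (level - threshold_db) / ratio
    return level

# Two 2:1 stages give roughly the reduction of a single 4:1 stage, but each works less hard
print(chained_compression(-6.0))                          # -15.0 (two gentle stages)
print(chained_compression(-6.0, ratio=4.0, stages=1))     # -15.0 (one steeper stage)
```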
To help identify when compression is applied too aggressively, listen for changes in timbre while
watching the compressor's gain reduction meter. If the timbre changes whenever gain
reduction happens, the solution may be to lower the ratio, raise the threshold, or both. Some-
times a track sounds slightly darker during extreme gain reduction, and watching the meter
makes it easier to connect that side effect to the compressor's action.
A slight popping sound at the start of a singer’s word or phrase may indicate that the
attack time is too slow. Generally a very long attack time is not effective on a vocal, since
it accentuates the attack of the vocal and can be distracting to listeners.
Compression of a vocal usually brings out lower-level detail in a vocal performance such
as breaths and “s” sounds. A de-esser, which can reduce the “s” sound, is simply a compres-
sor that has a high-pass filtered (around 5 kHz) version of the vocal as its side-chain or key
input. De-essers tend to work most effectively with very fast attack and release times.
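A very rough sketch of this side-chain arrangement is shown below. It uses an instantaneous level detector with no attack or release smoothing, and the threshold, ratio, and filter order are assumptions for illustration, not settings from any commercial de-esser.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def deess(vocal, threshold=0.1, ratio=4.0, sr=44100):
    """Very simplified de-esser: gain reduction driven by a high-passed copy of the vocal."""
    sos = butter(4, 5000, btype="highpass", fs=sr, output="sos")
    sidechain = sosfilt(sos, vocal)                  # key input: sibilance region only
    level = np.abs(sidechain)                        # crude instantaneous level detector
    over = np.maximum(level / threshold, 1.0)        # how far the key rises above the threshold
    gain = over ** (1.0 / ratio - 1.0)               # gain reduction only when the key exceeds the threshold
    return vocal * gain                              # apply the gain to the full-band vocal
```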
Threshold
Expanders modify the dynamic range of an audio signal by attenuating it when its level
falls below some predefined threshold, as opposed to compressors, which act on signal levels
above a threshold. A gate is simply an extreme version of an expander and usually mutes a
signal when it drops below a threshold.
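The static behavior of a downward expander can be sketched in the same style as the earlier compressor example; the threshold, the ratio, and the convention that the ratio multiplies the distance below the threshold are illustrative assumptions.

```python
def expanded_level(input_db, threshold_db, ratio):
    """Static output level of an ideal downward expander (levels in dB)."""
    if input_db >= threshold_db:
        return input_db                     # above threshold: level is unchanged
    under = threshold_db - input_db         # dB below the threshold
    return threshold_db - under * ratio     # the drop below the threshold is expanded

# With a 2:1 expansion ratio, a signal 10 dB under a -40 dB threshold falls to 20 dB under it
print(expanded_level(-50.0, threshold_db=-40.0, ratio=2.0))   # -60.0
# A gate behaves like an expander with a very high ratio: quiet passages are effectively muted
```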
Attack Time
The attack time on an expander is the amount of time it takes for an audio signal to return
to its original level once it has gone above the threshold. As with a compressor, the attack time
is the amount of time it takes to make a gain change after a signal goes above the threshold.
In the case of a compressor, a signal is attenuated above threshold. With an expander, a
signal returns to unity gain above threshold.
Release Time
The release time on an expander is the time it takes to complete its gain reduction once
the input signal has dropped below the threshold. Release time, for both expanders and
compressors, is not determined by a particular direction of level control (that is, boost or
cut); it is defined with respect to the signal level relative to the threshold. During the release
time of an expander, the signal level is reduced; during the release time of a compressor, the
signal level is increased. In both cases, the gain change happens because the signal level has
fallen below the threshold.
Figure 4.10 From an audio signal (A) sent to the input of an expander, a gain function (B) is derived based on
expander parameters and signal level. The resulting audio signal output (C) from the expander is the
input signal with the gain function applied to it. The gain function shows the amount of gain reduc-
tion applied over time, which varies with the amplitude of the audio signal input. For example, a gain
of 1 (unity gain) results in no change in level, and a gain of 0.5 reduces the signal by 6 dB. For these
measurements, the threshold was set to −6 dB, which corresponds to 0.5 in audio signal amplitude,
so every time the signal drops below 0.5 in level (−6 dB), the gain function shows a reduction in level.
Practice Types
There are three practice types in the dynamics software practice module: Matching, Match-
ing Memory, and Absolute Identification:
• Matching. Working in Matching mode, the goal is to duplicate the dynamics processing
that has been applied by the software. In this mode, you are free to switch back and forth
Figure 4.11 A screenshot of the software user interface for the Technical Ear Trainer practice module for
dynamic range compression.
between the “Question” and “Your Response” to determine if the dynamics processing
chosen matches the unknown processing applied by the computer.
• Matching Memory. Similar to Matching, this mode allows free switching between “Ques-
tion,” “Your Response,” and “Bypass” until one of the question parameters is changed.
At that point, the “Question” is no longer selectable and you should have memorized its
sound well enough to determine if the response is correct.
• Absolute Identification. This practice mode is the most difficult and requires identification of
the applied dynamics processing without having the opportunity to listen to what is chosen
as the correct response. You can audition only “Bypass” (no processing) and “Question” (the
computer’s randomly chosen processing parameters); you cannot audition “Your Response.”
Sound Source
Any sound recording in the format of AIFF or WAV at a 44,100- or 48,000-Hz sampling
rate can be used for practice. There is also an option to listen to the sound source in mono
or stereo. If a loaded sound file contains only one track of audio (as opposed to two),
the audio signal will be sent to the left output only. Pressing the mono button feeds the
audio to both left and right output channels.
Summary
This chapter discusses the functionality of compressors and expanders and their sonic effects
on an audio signal. Dynamic range controllers can be used to smooth out fluctuating levels
of a track, or to create interesting timbral modifications that are not possible with other
types of signal processing devices. The compression and expansion software practice modules
are described, and I encourage readers to use them to practice hearing the sonic effects of
various parameter settings.
Note
1. “Mastered for iTunes: Music as the Artist and Sound Engineer Intended” http://images.apple.com/itunes/
mastered-for-itunes/docs/mastered_for_itunes.pdf
Chapter 5
Throughout the recording, live sound, mixing, and post-production processes, we encounter
technical issues that can introduce noise or degrade our audio signals inadvertently. If we
do not resolve technical issues that create noise and distortion, or if we cannot remove noise
and distortion from our audio, listeners’ attentions can get pulled toward these undesired
artifacts and away from the intended artistic experience of the audio. You may have heard
the saying that the only time average listeners notice sound quality is when there is a prob-
lem with the audio. In other words, if average listeners do not think about the audio but
simply enjoy the artistic experience of a recording, concert, game, or film, then the audio
engineer has done a great job. The audio engineer’s job is to help transmit an artist’s inten-
tions to an audience. It becomes difficult for listeners to completely enjoy an artist when
engineering choices add unwanted sonic artifacts that cause listeners’ attentions to be dis-
tracted from an artistic experience. When recording technology contributes negatively to a
recording, listeners’ attentions become focused on artifacts created by the technology and
drift away from the musical performance. Likely almost everyone, sound engineer or not, is
familiar with the screech of feedback or howlback when a microphone-amplifier-speaker
sound reinforcement system feeds back on itself. Although sound engineers work hard to
avoid feedback, it can be loud and offensive to listeners and artists, and unfortunately it
reminds listeners that there is audio technology between them and the artist they are hear-
ing. Feedback is so common in live sound reinforcement that film and TV sound designers
add a short bit of feedback sound at the beginning of a scene in which a character is speak-
ing into a voice reinforcement system. Once we hear that little feedback sound cue, we
know the character’s mic is amplified through a public address (PA) system. Feedback is
probably the most extreme negative artifact produced by audio systems, and when it’s loud
it can be painful to our ears. Many artifacts are much more subtle than howling feedback,
and even though average listeners may not consciously identify them as problems, the artifacts
detract from listeners' experiences. As sound engineers we want to be aware of as many as
possible of the sonic artifacts that can detract from a sound recording, and as we gain expe-
rience in critical listening, we increase our sensitivity to various types of noise and
distortion.
Distortion and noise are the two broad categories of sonic artifacts, each with its own variations
and subcategories. Most of the time we try to avoid them, but sometimes we use them for
creative effect. They can be present in a range of levels or intensities, so it is not always easy
to detect lower levels of unwanted distortion or noise. In this chapter we focus on extrane-
ous noises that sometimes find their way into a recording as well as forms of distortion,
both intentional and unintentional.
5.1 Noise
Some composers and performers intentionally use noise for artistic effect. In fact there are
whole genres of music that emphasize noise as an artistic effect, such as noise rock, industrial
music, Japanese noise music, musique concrète, sampling, and glitch. Experimental and avant-
garde electronic and electroacoustic music composers and performers often use noise to
create musical effects, and they delight in blurring the line between music and noise. One
of the earliest examples is "Étude aux chemins de fer" [Railway Study] by French composer
Pierre Schaeffer, a musique concrète piece that he composed in 1948 from his recordings
of train sounds.
From a conventional recording point of view, we treat noise, in its various forms, as an
unwanted signal that enters into our desired signals. As we discussed above, noise distracts
listeners from the art we are trying to present. We need to consider whether extraneous
noises, which may enter into our recording, serve an artistic goal or simply distract listeners.
Sources of noise include those discussed in the sections that follow.
First, let’s discuss unwanted noise that detracts from the quality of a sound recording. Ground
hum and buzz, loud exterior sounds, radio frequency interference, and air-handling (HVAC)
noise are some of the many sources and types of noise that we seek to avoid when making
recordings in the studio. Frequently noise exists at a low, yet still audible, level and may not
register significantly on a meter, especially in the presence of musical audio signals. Therefore
we need to use our ears to constantly track sound quality. Noises of all kinds can start and
stop at seemingly random times, so we must remain attentive at all times.
Clicks
Clicks are various types of short-duration, transient sounds, containing significant high-
frequency energy, that originate from electronic equipment. Malfunctioning analog equip-
ment, loose analog cable connections, connecting and disconnecting analog cables, and digital
audio synchronization errors are all causes of unwanted clicks.
Clicks resulting from analog equipment malfunction can often be random and sporadic,
making it difficult to identify their exact source. In this case, meters can be helpful to indi-
cate which audio channel contains a click, especially if clicks are present in the absence of
program material. A peak hold meter can be invaluable in chasing down a problematic piece
of equipment, because the meter holds the peak level if we happen to miss seeing it when
the click occurs.
Loose or dirty analog connections may randomly break a connection, causing dropouts,
clicks, and sporadic noise bursts. When we make analog connections in a patch bay or
directly on a piece of equipment, we create signal discontinuities and therefore also clicks
and pops. Breaking a phantom powered microphone signal can make a particularly loud
pop or click that can damage not only the microphone but also any loudspeakers that may
try to reproduce the loud click.
With digital connections between equipment, it is important to ensure that sampling rates
are identical across all interconnected equipment and that clock sources are consistent.
Without properly selected clock sources in digital audio, clicks are inevitable and will likely
occur at some regular interval, usually spaced by several seconds. Clicks that originate from
improper clock sources are often fairly subtle, and they require vigilance to identify them
aurally. Depending on the digital interconnections in a studio, the clock source for each
device needs to be set to internal, a digital input, or word clock.
Pops
Pops are transient thump-like sounds that typically have more significant low-frequency
energy than clicks. Usually pops occur as a result of vocal plosives that are produced in
front of a microphone. Plosives are consonant sounds, such as those that result from pro-
nouncing the letters p, b, and d, in which a singer or speaker releases a burst of air. If you
hold your hand up in front of your mouth and
make a “p” sound, you can feel the little burst of air coming from your mouth. When this
burst of air reaches a microphone capsule, the microphone produces a low-frequency, thump-
like sound. Usually we try to counter pops during vocal recording by placing a pop filter
in front of a vocal microphone. Pop filters are usually made of thin, acoustically transparent
fabric stretched across a circular frame.
We do not hear pops from a singer when we listen acoustically in the same space as the
singer. The pop artifact is purely a result of a microphone’s response to a burst of air pro-
duced by a vocalist. Pops distract listeners from a vocal performance because they are not
expecting to hear a low-frequency thump from a singer. Even if the song has a kick drum
in the mix, often a vocalist’s pop will not line up with a kick drum hit. We can filter out
a pop with a high-pass filter, making sure the cutoff frequency is low enough not to affect
low harmonics in the voice, or we can insert the filter only during the brief moment while
the pop is sounding.
Listen for low-frequency thumps when recording, mixing, or providing live sound rein-
forcement for sung or spoken voice. In live sound situations, the best way to remove pops
is to turn on a high-pass filter on the respective mixer channel or turn on the high-pass
filter on the microphone itself if it has one.
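As a sketch of the first option, a simple high-pass filter can be applied to a voice track; the 80-Hz cutoff and filter order below are assumptions, and in practice the cutoff should sit below the lowest vocal harmonics you want to keep.

```python
from scipy.signal import butter, sosfilt

def pop_filter(voice, cutoff_hz=80.0, sr=44100):
    """High-pass a voice track to attenuate low-frequency plosive thumps."""
    sos = butter(2, cutoff_hz, btype="highpass", fs=sr, output="sos")
    return sosfilt(sos, voice)
```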
Hum and Buzz
When a ground problem is present, there is either a hum or a buzz generated with a
fundamental frequency equal to the power source's alternating current frequency, 50 or 60
Hz, plus additional harmonics above the fundamental. A hum is a sound containing primarily
lower harmonics, whereas a buzz contains mainly higher harmonics.
We want to make sure we identify any hum or buzz before recording, when the problem
is easier to solve. Trying to remove such noises in postproduction is possible but will take
extra time. Because a hum or buzz often includes numerous harmonics of 50 or 60 Hz, a
number of narrow notch filters are needed, each tuned to a harmonic, to effectively remove
all of the offending sound. Sometimes this is the only option to remove the offending noise,
but these notch filters also affect our program material, of course.
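As a sketch of this approach, the following Python fragment cascades narrow notch filters at the mains frequency and a handful of its harmonics; the number of harmonics and the Q value are arbitrary starting points rather than recommended settings.

```python
from scipy.signal import iirnotch, filtfilt

def remove_hum(signal, mains_hz=60.0, harmonics=5, q=35.0, sr=44100):
    """Cascade narrow notch filters at the mains frequency and its harmonics."""
    cleaned = signal
    for k in range(1, harmonics + 1):
        b, a = iirnotch(k * mains_hz, Q=q, fs=sr)   # one narrow notch per harmonic
        cleaned = filtfilt(b, a, cleaned)           # zero-phase filtering of the clip
    return cleaned
```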
Hum can also be caused by electromagnetic interference (EMI). If we place audio cables
(especially those carrying microphone level signals) alongside power cables, the power
cables can induce hum in the adjacent audio lines. An audio cable’s proximity to power cables
matters, so the farther away the two can be, the better. If they do need to cross, try to
make the crossing at a 90-degree angle to minimize the electromagnetic coupling into the
audio cable. Although we are not going to discuss the exact technical and
wiring problems that can cause hum and buzz and how such problems might be solved,
there are many excellent references that cover the topic in great detail, such as Giddings’s
book titled Audio Systems Design and Installation, a classic reference that has recently been
republished.
One of the best ways we can check for low-level ground hum is to bring up monitor
levels with microphones on and powered but while musicians are not playing. If we eventu-
ally apply dynamic range compression with makeup gain to an audio signal, what was once
inaudible low-level noise could become much more audible. If we can catch any ground
hum before getting to that stage, our recording will be much cleaner.