
2.8 Recommended Sequence of Practice and Test


Although the EQ module allows you to select any exercise parameter combination for
practice and testing, here is a possible progression from easy to more difficult:

Octave Frequencies
1. Practice Type: Matching
Monitor Selection: Pink Noise
Frequency Resolution: Octave
Number of Bands: 1
Gain Combination: +12 dB
Q=2
Frequency Range:
a. 500 to 2000 Hz
b. 63 to 250 Hz
c. 4000 to 16,000 Hz
d. 250 to 4000 Hz
e. 125 to 8000 Hz
f. 63 to 16,000 Hz
2. Same as above except:
Monitor Selection: a variety of sound recordings of your choice
3. Same as above except:
Practice Type: Absolute Identification
Monitor Selection: Pink Noise and a variety of sound recordings of your choice
Frequency Range: 63 to 16,000 Hz
4. Same as above except:
Number of Bands: 2
5. Same as above except:
Number of Bands: 3
6. Same as above except:
Gain Combination: +12/− 12 dB
7. Same as above except:
Gain Combination: +9 dB
8. Same as above except:
Gain Combination: +9/− 9 dB
9. Same as above except:
Gain Combination: +6 dB
10. Same as above except:
Gain Combination: +6/− 6 dB
11. Same as above except:
Gain Combination: +3 dB
12. Same as above except:
Gain Combination: +3/− 3 dB

Third-Octave Frequencies
Progress through the sequence above but instead of working with octave frequencies, select
“1/3rd Octave” from the Frequency Resolution drop-down menu.
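
If you want to hear one of these practice conditions outside the training module, the sketch below is one minimal way to build it in Python with NumPy and SciPy. It generates pink noise and applies a +12 dB, Q = 2 peaking boost at a randomly chosen octave centre frequency, following step 1 of the sequence above. The RBJ cookbook biquad used here is an assumption standing in for whatever filter the module itself uses, and the output file names are arbitrary.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.io import wavfile

def pink_noise(n, fs, seed=None):
    """Pink (1/f) noise via spectral shaping of white noise."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, 1 / fs)
    freqs[0] = freqs[1]                     # avoid dividing by zero at DC
    return np.fft.irfft(spectrum / np.sqrt(freqs), n)

def peaking_eq(fs, f0, gain_db, q):
    """RBJ cookbook peaking filter coefficients (b, a)."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
    a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
    return b / a[0], a / a[0]

fs = 48000
octave_centres = [63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000]
f0 = np.random.default_rng().choice(octave_centres)

dry = pink_noise(5 * fs, fs, seed=0)
b, a = peaking_eq(fs, f0, gain_db=12.0, q=2.0)
boosted = lfilter(b, a, dry)

# Save the flat and boosted versions so they can be compared by ear.
for name, sig in (("flat.wav", dry), ("boosted.wav", boosted)):
    wavfile.write(name, fs, (0.5 * sig / np.max(np.abs(sig)) * 32767).astype(np.int16))
print(f"Boost applied at {f0} Hz")
```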

Time Limit
To increase difficulty further, you may wish to use the timer to focus on speed.

Summary
Equalization is perhaps our most important tool as audio engineers. It is possible to learn
how to identify boosts and cuts by ear through practice. The available software practice
module can serve as an effective tool for progress in technical ear training and critical listen-
ing when used for regular and consistent practice.
Chapter 3

Spatial Attributes and Reverberation

Reverberation can create distance, depth, and spaciousness in recordings, whether we capture
it with microphones during the recording process or add it later during mixing. Reverbera-
tion use has evolved into distinct conventions for various music genres and eras of recording.
Specific reverberation techniques do not always translate across musical genres, although the
general principles of reverberation are the same. Film and game sound also make extensive
use of reverberation to reinforce visual scenes or give the viewer information about off-
camera actions or scenes.
In classical music recording, we position microphones to blend direct sound (from instru-
ments and voices) and indirect sound (reflected sound and reverberation), to represent the
natural sound of musicians performing in a reverberant space. As such, we listen closely to
the balance of the dry and reverberant sound and make adjustments to microphone positions
if the dry/reverberant balance is not to our liking. By moving microphones farther away
from the instruments, we increase reverberation and decrease direct sound.
Pop, rock, electronic, and other styles of music that use predominantly electric instruments
and computer-generated sounds are not usually recorded in reverberant acoustic spaces,
although there are some spectacular exceptions (see Chapter 7, Analysis of Sound). Rather,
we often create a sense of space with artificial reverberation and delays after the music has
been recorded in a relatively dry acoustic space with close microphones. We can use artificial
reverberation and delay to mimic real acoustic spaces or to create completely unnatural
sounding spaces. We do not always want every instrument or voice to sound like they are
at the front edge of the stage. We can think of a recorded sound image like a photograph or
a painting. It is often more interesting to have elements in the mid-ground and background,
while we focus a few elements in the foreground. Delay and reverberation are the key tools
that help us create a sense of depth and distance in a recording. More reverberation on a
source makes it sound farther away while dryer elements remain to the front of our sound
image. Not only can we make sounds seem farther away and create the impression of an
acoustic space, but we can also influence the character and mood of a recording with careful
use of reverberation. In addition to depth and distance control, we can control the angular
location (left–right position or azimuth) of sound sources through standard amplitude
panning.
With stereo speakers, we have two dimensions within which to control sound source
location: distance (near to far) and angular location (azimuth). With elevated loudspeaker
arrays such as those found in IMAX movies, theme parks, and audio research environments,
we obviously have a third dimension of height. For the purposes of this book, I will focus
on loudspeakers in the horizontal plane only (no elevated speakers), whether stereo or multi-
channel; but again, the general principles apply to any audio reproduction environment
whether it has two dimensions or three.

Spatial attributes apply to sound sources and spaces:

• sound source locations within a given loudspeaker arrangement
  ◦ azimuth as determined by panning
  ◦ distance as determined by level and reverberation/echo
• simulated/real acoustic space characteristics
  ◦ reverberation decay time
  ◦ early reflection patterns
  ◦ prominent and/or long delayed echoes
  ◦ perceived size of the space

Spatial attributes also include correlation and spatial continuity of a sound image. Simply
put, correlation refers to the amount of similarity between two channels. We can measure
the correlation of the left and right channels of a stereo image with a correlation or phase
meter. This type of meter typically ranges from − 1 to +1. A correlation of +1 means that
the left and right channels are identical, although they could be different amplitudes. A
correlation of − 1 means that the left and right channels are identical but that one channel
is opposite polarity. In practice, most stereo recordings have a correlation that ranges from
0 to +1 and with occasional jumps toward the − 1 end of the meter.
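
As a rough sketch of what such a meter computes, the zero-lag normalized correlation below (assuming NumPy) returns +1 for identical channels regardless of their relative level and −1 when one channel is polarity inverted. Real meters typically compute this over short sliding windows with some ballistics; this is only the underlying arithmetic.

```python
import numpy as np

def channel_correlation(left, right, eps=1e-12):
    """Zero-lag normalized correlation between two channels, in the range -1 to +1."""
    return np.sum(left * right) / max(np.sqrt(np.sum(left**2) * np.sum(right**2)), eps)

fs = 48000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)

print(channel_correlation(tone, tone))        # identical channels        -> +1.0
print(channel_correlation(tone, 0.5 * tone))  # identical, quieter right  -> +1.0
print(channel_correlation(tone, -tone))       # one channel inverted      -> -1.0
```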
Left and right channel correlation affects the perceived width in a recording. Two perfectly
correlated channels will result in a mono image. A correlation of − 1 creates an artificially
wide-sounding stereo image. We can hear the effect of negatively correlated or “out of
phase” channels more easily over loudspeakers than over headphones. Where we localize a negatively
correlated sound image depends highly on our listening position. Sitting in the ideal listen-
ing position (see Figure 1.2), we will tend to localize the sound image to the sides of our
head or outside of the loudspeaker locations. If we move ever so slightly to the left or right
of the ideal listening location, the sound image will shift quickly to one side or the other.
Negatively correlated sound images seem unstable and difficult to localize. It is important
to listen for artificially wide stereo mixes or elements within a mix, which would indicate
that there may be a polarity problem somewhere in the signal path that needs to be cor-
rected. We discuss more on listening to opposite polarity channels in Section 3.7.
Decorrelated channels (when the correlation or phase meter reads 0) tend to create a
stereo image with the energy located primarily at the left and right speakers. If you listen
to decorrelated pink noise over stereo speakers you may notice little audible energy in the
center of the stereo image, but the image is not wider than the speakers as a negatively
correlated image would be. What energy you do hear in the center of the image is mainly
low frequency. High frequencies are clearly located at the speakers.
Another meter that is useful for monitoring the stereo image width and location of energy
within the stereo image is a goniometer or vectorscope, which gives a Lissajous pattern. This type
of meter is often presented in multimeter plug-ins in combination with a phase or correlation
meter. To get an idea how a vectorscope represents a stereo image, I find it useful to start with a
sine tone test signal to show some conditions. Starting with a 1 kHz sine tone panned center in
Figure 3.1, we see that the vectorscope displays a vertical line in the center of the meter. The meter
represents where we would localize this center-panned sound—directly in the center of the stereo
image. If we pan the sine tone to one side, we see the effect in Figure 3.2 with a line tilting to the
right at a 45-degree angle from the center. Listening to this condition, we would localize it in the
right speaker or right headphone. If we pan the sine tone back to center and invert the polarity
(flip the phase) of one of the output channels, we get a horizontal line at the bottom of the meter
as in Figure 3.3. The horizontal line represents negatively correlated left and right channels.
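
The numbers behind those three displays can be reproduced with a few lines of code. The sketch below (NumPy assumed) builds the three sine-tone conditions and derives the mid/side coordinates that a typical goniometer plots; the 45-degree mid/side rotation is a common display convention, not a description of iZotope Insight's internals.

```python
import numpy as np

fs = 48000
t = np.arange(fs // 100) / fs                 # 10 ms of a 1 kHz tone
tone = np.sin(2 * np.pi * 1000 * t)

cases = {
    "panned center":        (tone, tone),
    "panned hard right":    (np.zeros_like(tone), tone),
    "one channel inverted": (tone, -tone),
}

for name, (left, right) in cases.items():
    mid = (left + right) / np.sqrt(2)         # vertical axis of the display
    side = (left - right) / np.sqrt(2)        # horizontal axis of the display
    corr = np.sum(left * right) / max(np.sqrt(np.sum(left**2) * np.sum(right**2)), 1e-12)
    print(f"{name:22s} vertical extent {np.ptp(mid):4.2f}  "
          f"horizontal extent {np.ptp(side):4.2f}  correlation {corr:+.1f}")
```

A center-panned tone has no side component (a vertical line, correlation +1), a hard-panned tone has equal mid and side components (a 45-degree line), and a polarity-inverted channel leaves only the side component (a horizontal line, correlation −1), matching Figures 3.1 through 3.3.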
Figure 3.1 A vectorscope meter showing the stereo image width and correlation of a 1 kHz sine tone
panned to the center. Note that the energy appears as a straight vertical line in the center of the
meter and the phase meter is reading +1. (Screenshot of iZotope Insight plug-in.)

Figure 3.2 A vectorscope meter showing the stereo image width and correlation of a 1 kHz sine tone
panned to the right. Note that the energy appears as a straight line at a 45-degree angle from
the center of the meter and the phase meter is reading 0. (Screenshot of iZotope Insight plug-in.)
Figure 3.3 A vectorscope meter showing the stereo image width and correlation of a 1 kHz sine tone with
phase reversed (polarity inverted) on one channel of the stereo bus. Note that the energy appears
as a straight horizontal line at the bottom of the meter and the phase meter is reading −1. (Screen-
shot of iZotope Insight plug-in.)

Figure 3.4 A vectorscope meter showing the stereo image width and correlation of a stereo mix. Note that
the energy is primarily weighted toward the center of the meter and the correlation is at almost
+1. Contrast this with Figure 3.1. (Screenshot of iZotope Insight plug-in.)

Figure 3.5 A vectorscope meter showing the stereo image width and correlation of a stereo mix. Note that
the energy is widely spread across the meter with seemingly random squiggles. The correlation
is about two-thirds of the distance from 0 to +1, but the gray region of the meter shows that
its recent history fluctuated widely from a high point close to 1 and down to slightly below 0.
(Screenshot of iZotope Insight plug-in.)

Sine tones are useful for illustrating some basic vectorscope conditions, but in practice we
are more likely to meter more complex signals such as music, speech, and sound effects.
Figure 3.4 shows a vectorscope screenshot of a moment in time of a pop stereo mix. You
can see that, although it is not a single vertical line like the sine tone above, the energy is
primarily located in the center of the vectorscope image and the correlation meter sits close
to +1. Because the lead vocal, bass, kick drum, snare drum, and guitar are panned center,
with subtle reverb and auxiliary percussion panned to the sides in this recording, the meter
reflects the stereo image that we hear: a center-heavy mix, typical of what we find in pop
music recordings.
With different mixes we get different representations in the meter. Figure 3.5 shows
another stereo mix of a more experimental type of music. Note the much wider rep-
resentation on the meter, which is reflected in the stereo image width when we listen
to it.

LISTENING EXERCISE: NEGATIVELY CORRELATED CHANNELS
Open up a DAW and import any stereo recording. Pan the left and right channels to center.
Now the stereo recording should sound like it is mono. On the stereo master bus—not the
input tracks—invert the polarity (or phase) of either the left or right channel but not both.
It does not matter which one. On some DAWs such as Pro Tools and Logic Pro, you need
to add a trim or gain plug-in to the stereo bus, and inside the plug-in there is a polarity
or phase invert switch, usually labeled with “ø” or “Φ.” Listen to the effect that an “out
of phase” channel creates. Once you hear the out of phase sound, you will likely remember
it and recognize it immediately when you hear it again.

LISTENING EXERCISE: DECORRELATED CHANNELS


Open up a DAW and create two mono tracks. Add a pink noise generator plug-in (it might
be under “Utilities”) to each channel and pan one channel hard left and the other channel
hard right. Depending on the DAW this may or may not produce decorrelated noise in the
stereo bus. To make sure they are decorrelated, add a straight delay (that is, a single delay
with no feedback/repeats, filtering, modulation, or crossfeed) to one of the two channels and
turn its delay time to the maximum amount. In other words, we just want to offset one channel relative to the
other. The stereo bus should now have two decorrelated channels of pink noise. Inverting
the polarity in one channel of the stereo bus should produce no audible effect or measur-
able change in the correlation. You can also create the same effect with musical signals.
Record a musical part, let’s say a guitar part, and then record the same musical part (in
unison) a second time on a different track. Play the recorded tracks back and pan them
hard left and hard right. These tracks, even though they are the same musical notes played
in time together, are decorrelated.
A vectorscope shows a relatively random distribution of decorrelated pink noise evenly
across the stereo image. In this case at least, the visual image does not correspond to what
we hear. The stereo image sounds more like the energy is anchored mainly in the speaker
locations rather than evenly spread across the image. Although a goniometer or vectorscope
gives us some clues about what we are hearing in a stereo image, it does not always cor-
respond directly with what we hear. This is another reason why we cannot rely solely on
meters to make final decisions about stereo image, tonal balance, and sound quality; we must
use our ears.
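
A quick way to convince yourself of these numbers without a DAW is to generate the two noise channels directly. The sketch below (NumPy assumed) uses two independently seeded pink-noise generators in place of the two plug-in instances from the exercise; the correlation sits near zero, and inverting the polarity of one channel leaves it near zero, just as described above.

```python
import numpy as np

def pink_noise(n, fs=48000, seed=None):
    """Pink (1/f) noise via spectral shaping of white noise."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, 1 / fs)
    freqs[0] = freqs[1]                            # avoid dividing by zero at DC
    return np.fft.irfft(spectrum / np.sqrt(freqs), n)

def channel_correlation(left, right):
    return np.sum(left * right) / np.sqrt(np.sum(left**2) * np.sum(right**2))

n = 10 * 48000                                     # ten seconds per channel
left = pink_noise(n, seed=1)                       # two independent generators,
right = pink_noise(n, seed=2)                      # like the two plug-ins in the exercise

print(f"left vs right:                {channel_correlation(left, right):+.3f}")   # near 0
print(f"after inverting one channel:  {channel_correlation(left, -right):+.3f}")  # still near 0
print(f"left vs itself:               {channel_correlation(left, left):+.3f}")    # +1
```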

3.1 Analysis of Perceived Spatial Attributes


The human auditory system decodes the spatial attributes of any sound source, whether the
source is an acoustic musical instrument or a phantom image of a musical instrument record-
ing reproduced over loudspeakers. Spatial attributes help us determine the azimuth, elevation,
and distance of sound sources, as well as information about the environment or enclosure in
which they are produced. Because the human auditory system operates with two ears, it relies
on interaural time differences, interaural intensity differences, and filtering by the pinnae or
outer ear to determine the location of a sound source (Moore, 1997). The process of local-
ization of sound images reproduced over loudspeakers is somewhat different from localization
of single acoustic sources, and in this chapter we will concentrate on the spatial attributes
that are relevant to audio production and therefore sound reproduction over loudspeakers.
As audio engineers we need to be attuned to any spatial processing already present in or
added to a recording, however subtle. Panning, delay, and reverb affect the balance and blend
of elements in a mix, which in turn influence the way in which listeners perceive a musical
recording and react to it emotionally. For example, long reverb times can create drama and
excitement, as though the music is anthemic and emanating from a big space. Alternatively,
with the use of short reverberation times, we can create a warm and intimate or conversely
a stark and cold sound image.
Reverb is important in classical music recordings but it also plays an important role in
recordings of other music genres. Phil Spector’s Wall of Sound recordings from the 1960s
made use of reverberant spaces to create emotional impact in his productions. The quintes-
sential song in this production style is The Ronettes’ “Be My Baby” from 1963. A couple
of decades later with the help of producers Brian Eno and Daniel Lanois, U2’s albums The
Joshua Tree and The Unforgettable Fire employed extensive use of delays and reverb to create
the sense of big open spaces. Eno and Lanois had become well known for their ambient
music recordings only a few years prior, and they brought some of these spatial processing
methods and “treatments” to subsequent pop music recordings. German record producer
Manfred Eicher and his label ECM use more prominent reverb on their jazz recordings than
American jazz labels such as Impulse! Records and Blue Note Records. More recently, indie
pop bands such as Fleet Foxes and Sigur Rós have produced albums with clearly noticeable
washes of reverb.
Although some perceive prominent reverb in music recordings as gimmicky, I often find
that reverb makes a recording interesting and engaging. On the other hand, when reverb
seems to be an add-on that does not blend well with the music, makes the music muddy,
or does not have any apparent musical role, it can detract from our listening experience.
Production decisions often come down to choice and personal taste, but the music should
guide us. Experiment, try something new, take a risk, and do something out of your comfort
zone. Maybe nothing useful will come of it. Or maybe you will discover something really
interesting by following your ears, trying unconventional things, improvising, and being open
to new possibilities. Regardless of your stance on reverb use, listen to the way reverb, echo,
and delay are used in commercial recordings and try emulating them.
The spatial layout of sources in a sound image can influence clarity and cohesion partly
due to spatial masking. We know that a loud sound will partially or completely mask a
quiet sound. It is difficult to have a conversation in a noisy environment because the noise
masks our voices. It turns out that if the masker (noise) and the maskee (speaking voices,
for example) arrive from two different locations then less masking occurs. The same effect
can occur in stereo and multichannel audio images. Sound sources panned to the same
location may partially or completely mask other sound sources panned to that location. Pan
the sounds to opposite sides and suddenly we can hear a quieter sound that was previously
masked. Sometimes reverberation in a recording can seem inaudible or at least difficult to
identify because it blends with and is partially masked by direct sound. This is especially
true for recordings with sustained elements. Transient elements such as percussion and drums
allow us to hear reverb that may be present because, by definition, transient sounds decay
quickly, usually much more quickly than the reverb.
We must factor in subjective impressions of spatial processing as we translate between
controllable parameters on digital reverb such as decay time, predelay time, and early reflec-
tions, and their sonic results. Conceptually, we might link sound source distance control to
reverb simulation, but there is usually not a parameter labeled “distance” in a conventional
digital reverb processor. If we want to make a sound source seem more distant, we need to
control distance indirectly by adjusting reverberation parameters, such as decay time, predelay,
and mix level, in a coordinated way until we have the desired sense of distance. We must
translate objective parameters of reverberation into the desired subjective impression of
source placement and simulated acoustic environment.
Our choice of reverberation parameter settings depends on a number of things such as
the transient nature and width of our dry sound sources, as well as the decay and early
reflection characteristics of our reverberation algorithm. Professional engineers often rely on
subjective qualities of reverb to accomplish their goals for each individual mix rather than
simply choosing parameter settings that worked in other situations. In other words, they
adjust parameters until the reverb sounds right for the mix, rather than simply pulling up
a preset they used on a previous mix and assuming that it will work. A particular combina-
tion of parameter settings for one source and reverberation usually cannot simply be dupli-
cated for an identical distance and spaciousness effect with a different source or reverberation
algorithm.
We can benefit from analyzing spatial properties from both objective and subjective per-
spectives, because the tools have objective parameters, but our end goal in recording is to
achieve great sounding mixes, not to identify specific parameter settings. As with equalization,
we must find ways to translate between what we hear and the parameters available for
control. As mentioned above, spatial attributes can be broken down into the following cat-
egories and subcategories:

• placement of direct/dry sound sources
• characteristics of acoustic spaces and phantom image sound stages
• characteristics of an overall sonic image produced by loudspeakers

LISTENING EXERCISE: HEARING REVERB IN YOUR WORK
When you mix a track with a small amount of reverb, try muting and unmuting added
reverberation to make sure you hear its contribution to a mix.

Sound Sources
The spatial attributes of sound sources consist of three main categories:

• angular location
• distance
• spatial extent

Sound Sources: Angular Location


A sound source’s angular location or azimuth is its perceived location in a stereo image,
generally between the left and right loudspeakers. We can spread sources out across the
stereo image to lessen spatial masking and optimize clarity for each sound source. Spatial
masking is more likely to occur when sources not only occupy the same spatial location
but also the same frequency range.
We can pan each microphone signal to a specific location between loudspeakers using
conventional constant-power panning found on most mixers. We can also pan sources by
delaying a signal’s output to one loudspeaker channel relative to the other loudspeaker output,
but delay-based panning is not common in part because its effectiveness depends highly on
a listener’s location relative to the loudspeakers. Also, panning produced with time delays
does not sum to mono very well, since we will likely introduce comb filtering. Furthermore,
delay-based panning tools are not as common as the ubiquitous amplitude-based panner
found on every mixer, software or hardware.
With spaced stereo microphone techniques (e.g., ORTF, NOS, A-B), we automatically
employ delay-based panning without any special processing. Stereo microphone techniques
usually require microphone signals to be panned hard left and right, and because of the spac-
ing it can take a little extra time for sound to travel from one microphone to another for
sources that are not centered. Although the time differences are small—a maximum of 0.5 ms
for 17 cm spacing in ORTF—the interchannel time difference works in combination with
the interchannel amplitude difference to create a more natural source placement in the stereo
image. The time difference reinforces our perception of source location that the amplitude
difference provides. The resulting positions of sound sources will depend on the stereo micro-
phone technique used and the respective locations of each source. Spaced stereo microphone
techniques such as ORTF will produce a wider sound image than a coincident technique
such as X-Y, partly because there is no interchannel time difference with X-Y. Experiment
with stereo microphone techniques in your recordings. Legendary recording engineer Bruce
Swedien, perhaps best known for his work with Michael Jackson, reports that he used stereo
microphone techniques almost exclusively in his recordings. Perhaps that is one reason why
his recordings sound so good.

Sound Sources: Distance


Although human perception of absolute distance is often inaccurate, relative distance of
sounds within a stereo image is important to give depth to a recording. Large ensembles
recorded in acoustically live spaces are likely to exhibit a natural sense of depth, analogous
to what we would hear as an audience member in the same space. This effect in classical
music can happen quite naturally with a stereo pair of microphones in front of an ensemble.
Musicians at the front of the stage (closer to the mics) sound closer than those upstage
(farther from the mics).
When we make recordings in acoustically dry spaces such as studios, we often create depth
using delays and artificial reverberation. We can control sound source distance by adjusting
physical parameters such as the following (a brief numeric sketch of the first two cues appears after the list):

• Direct sound level. Quieter sounds are judged as being farther away because there is a sound
intensity loss of 6 dB per doubling of distance from a source (in a free field condition, i.e.,
no reflected sound present). This cue can be ambiguous for the listener because a change
in loudness can be the result of either a change in distance or a change in a source’s acous-
tic power.
• Reverberation level. As a source moves farther away from a listener in a room or hall, the
direct sound level decreases and the reverberant sound remains roughly constant, lowering
the direct-to-reverberant sound ratio.
• Distance of microphones from sound sources. Moving microphones farther away decreases the
direct-to-reverberant ratio and therefore creates a greater sense of distance.
• Room microphone placement and level. If we place microphones on the opposite end of a
room relative to the musicians, we will pick up primarily reverberant sound. We can treat
room microphone signals as reverberation to add to our mix.
• Low-pass filtering close-miked direct sounds. High frequencies are attenuated more than lower
frequencies because of air absorption as we move farther from a sound source. Further-
more, the acoustic properties of reflective surfaces in a room affect the spectrum of re-
flected sound reaching a listener’s ears.
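
As a rough numeric illustration of the first two cues in the list above, the sketch below applies the free-field 6 dB-per-doubling rule to the direct sound and holds the reverberant level constant. The −20 dB reverberant level is an arbitrary assumption chosen only to show how the direct-to-reverberant ratio falls with distance.

```python
import math

reference_distance_m = 1.0
reverb_level_db = -20.0   # assumed constant reverberant level, relative to the direct sound at 1 m

for distance_m in (1.0, 2.0, 4.0, 8.0, 16.0):
    # Inverse square law: about -6 dB per doubling of distance in a free field.
    direct_db = -20.0 * math.log10(distance_m / reference_distance_m)
    direct_to_reverberant_db = direct_db - reverb_level_db
    print(f"{distance_m:5.1f} m   direct {direct_db:6.1f} dB   "
          f"direct-to-reverberant {direct_to_reverberant_db:6.1f} dB")
```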

Sound Sources: Spatial Extent


Sometimes we can localize sound sources precisely within a mix, that is, we can point directly
to their virtual locations within a stereo image. Other times sound source location may be
fuzzier or more ambiguous. Spatial extent describes a source’s perceived width. A related
concept in concert hall acoustics research is called apparent source width or ASW, which is
related to strength, timing, and direction of side reflections. Acoustician Michael Barron
found that stronger reflections from the side would result in a wider ASW.
As with concert hall acoustics, we can influence the perceived width of sources reproduced
over loudspeakers by adding early reflections, whether recorded with microphones or gener-
ated artificially. If artificial early reflections (in stereo) are added to a single, close microphone
recording of a sound source, the direct sound tends to fuse perceptually with early reflections
(depending on the time of arrival of the reflections) and produce an image that is wider
than just the dry sound on its own.
The perceived width of a sound image produced over loudspeakers will vary with the
microphone technique used, the sound source, and the acoustic environment in which it is
recorded. Spaced microphones produce a wider sound source because the level of correlation
of direct sounds between the two microphone signals is reduced as the microphones are
spread farther apart. As we discussed above, a stereo image correlation of 0 (decorrelated
left and right channels) creates a wide image with energy that seems to originate in the left
and right loudspeakers primarily, with little energy in the center. We can affect correlation
with the spacing of a stereo pair of microphones. In most cases, two microphones placed
close together will produce highly correlated signals, except for certain cases with the Blum-
lein technique that I describe in the next paragraph. Because pairs of coincident microphones
occupy nearly the same physical location, the acoustic energy reaching both will be almost
identical. As we move them apart, correlation will decrease. A small spacing of an inch or
two (a few centimeters) will decorrelate high frequencies, but low frequencies will still be
correlated. With more space between microphones, decorrelation will spread to lower fre-
quencies. Microphone spacing and the lowest frequency of correlation are inversely propor-
tional because as we go lower in frequency, wavelengths increase, thus requiring greater
spacing for low frequencies to be decorrelated. In other words, as we widen a pair of
microphones, our resulting stereo image also widens (as correlation decreases), assuming one
mic is panned hard left and the other is panned hard right.
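
The frequency dependence is easier to picture with wavelengths in hand. The short sketch below simply computes wavelength at a few frequencies, assuming roughly 343 m/s for the speed of sound and a 5 cm spacing chosen only for illustration. A spacing of a few centimetres is larger than the wavelength at the top of the audio band but a tiny fraction of the wavelength at 63 Hz, which is why small spacings decorrelate only the high frequencies.

```python
speed_of_sound_m_s = 343.0   # approximate value at room temperature
mic_spacing_cm = 5.0         # an illustrative "small" spacing

for freq_hz in (63, 250, 1000, 4000, 16000):
    wavelength_cm = speed_of_sound_m_s / freq_hz * 100
    print(f"{freq_hz:6d} Hz   wavelength {wavelength_cm:7.1f} cm   "
          f"spacing / wavelength = {mic_spacing_cm / wavelength_cm:.2f}")
```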
As I mentioned above, the Blumlein stereo microphone technique, which uses coincident
figure-8 or bidirectional microphones angled 90 degrees apart, creates a slightly more compli-
cated stereo image. Sounds arriving at the fronts and backs of the microphones are in phase, so
we have no decorrelation. Sounds arriving at the sides are picked up by each microphone at the
same time, but the polarity of the microphones is opposite. For example, a sound arriving from
the right side of a Blumlein pair will be picked up by the front, positive lobe of the right-facing
microphone and also by the rear, negative lobe of the left-facing microphone. As a result, sounds arriv-
ing from the side are negatively correlated in the stereo image. See Figure 3.6, which shows the
polar patterns of the figure-8 microphones and a sound source arriving from the side.

Figure 3.6 A Blumlein stereo microphone technique uses two coincident figure-8 microphones angled
90 degrees apart. Sounds arriving from the sides are negatively correlated in the resulting stereo
image.

Spatial extent of sound sources can be controlled through physical parameters such as the
following:

• Early reflection patterns originating from a real acoustic space or generated artificially
with reverberation.
• Type of stereo microphone technique used: spaced microphones generally yield a wider
spatial image than coincident microphone techniques, as we discussed above.

Acoustic Spaces and Sound Stages


We can control additional spatial attributes such as the perceived characteristics, qualities,
and size of the acoustic environment in which each sound source is placed in a stereo image.
The environment or sound stage may consist of a real acoustic space captured with room
microphones, or we can create a virtual sound stage with artificial reverberation added dur-
ing mixing. We can use a common reverberation for all sounds, or a variety of different
reverberation sounds to accentuate differences among the elements in a mix. For instance,
it is fairly common to treat vocals or solo instruments with a different reverberation than
the rest of an accompanying ensemble.

The Space: Reverberation Decay Character


Decay time is perhaps the most common parameter in artificial reverberation algorithms.
Although reverberation decay time is often not adjustable in a real acoustic space, some halls
and studios have panels on the walls and ceiling that can rotate to expose different sound-
absorbing or reflecting materials, to allow a variable reverberation decay time.
Reverb decay time is defined as the time in which sound continues to linger after the
direct sound has stopped sounding. Decay time or RT60 is technically defined as the amount
of time it takes a sound to decay by 60 dB after the source stops sounding. Longer rever-
beration times are typically more audible than shorter reverberation times for a given rever-
beration level. Transient sounds such as drums or percussion expose decay time more than
sustained sounds, allowing us to hear the rate of decay more clearly.
Some artificial reverberation algorithms incorporate modulation into the decay to give it
variation and hopefully make it sound less artificial. The idea is that moving air currents
and slight variations in air temperature in a large space affect ever so slightly the way sound
propagates through a room. Modulation of artificial reverb is one way to mimic this effect.
Artificial reverberation can sound unnaturally smooth, and modulation can help create the
illusion that the reverb is real, or at least less artificial.

The Space: Spatial Extent (Width and Depth) of the Sound Stage
A sound stage is the acoustic environment within which we hear a sound source, and it
should be differentiated from a sound source. The environment may be a recording of a
real space, or it may be something that has been created artificially using delay and artificial
reverberation.

The Space: Spaciousness


Spaciousness represents the perception of physical and acoustical characteristics of a recording
space, and in concert hall acoustics, it is related to envelopment. We can use the term spa-
ciousness simply to describe the feeling of space within a recording.

Overall Characteristics of Stereo Images


Also grouped under spatial attributes are items describing overall impressions and character-
istics of a stereo image reproduced by loudspeakers. A stereo image is the illusion of sound
source localization from loudspeakers. Although there are only two loudspeakers for stereo,
the human binaural auditory system allows us to hear phantom images at locations between
the loudspeakers. We call them phantom images because they seem to be originating from
locations where there is no speaker. In this section, we consider the overall qualities of a
stereo image rather than those specific to the source and sound stage.

Stereo Image: Coherence and Relative Polarity between Channels


Despite widespread use of stereo and multichannel playback systems among consumers, mono
compatibility remains critically important, mainly because music is often played back
through computers and mobile devices with single speakers. When we check a mix for
mono compatibility, we listen for changes in timbre that result from destructive interference
between the left and right channels. In the worst-case scenario with opposite polarity stereo
channels, summation to mono will cancel a significant portion of a mix. We need to check
each project that we mix to make sure that the channels are not opposite polarity. When
left and right channels are both identical and opposite polarity, or negatively correlated, they
will cancel completely when summed together. If both channels are identical, or completely
correlated, then the mix is monophonic and not truly stereo. Most stereo mixes include
some combination of mono and stereo components, or correlated and decorrelated compo-
nents. As we discussed above, we can describe the correlation between signal components
in the left and right channels along a scale between − 1 and +1:

• Correlation of +1: Left and right channels are identical, composed completely of signals
that are panned center.
• Correlation of 0: Left and right channels are different. As mentioned above, the channels
could be in musical unison and still be decorrelated if the two parts were played by differ-
ent musicians or by the same musician as an overdub.
• Correlation of − 1: Left and right channels are identical but opposite in polarity, or nega-
tively correlated.

Phase meters provide one objective way of determining the relative polarity of stereo chan-
nels, but if no such meters are available, we must rely on our ears.
On occasion we may find an individual instrument that we recorded in stereo has opposite
polarity channels panned hard right and left. If such a signal is present, a phase meter on
the stereo bus may not register it strongly enough to give an unambiguous visual indication,
or we may not be using a phase meter. Sometimes stereo line outputs from electric instru-
ments are opposite polarity, or perhaps a polarity flip cable was used during recording by
mistake. Often stereo line outputs from electronic instruments are not truly stereo but mono.
When one output is opposite polarity, the two channels will cancel when summed to mono.
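
When no phase meter is handy, a crude offline check is to compare the level of the mono fold-down with the level of the stereo original. The sketch below (NumPy assumed) does this with a sine tone; on real program material, a mono sum that collapses in level rather than dropping by a dB or two is the warning sign described above.

```python
import numpy as np

def rms_db(x):
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def mono_fold_drop_db(left, right):
    """Level lost when a stereo pair is summed to mono."""
    stereo_db = rms_db(np.concatenate([left, right]))
    mono_db = rms_db(0.5 * (left + right))
    return stereo_db - mono_db

fs = 48000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 220 * t)

print(f"identical channels:  {mono_fold_drop_db(tone, tone):6.1f} dB lost")   # ~0 dB
print(f"opposite polarity:   {mono_fold_drop_db(tone, -tone):6.1f} dB lost")  # near-total cancellation
```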

Stereo Image: Spatial Continuity of a Sound Image from One Loudspeaker to Another
As an overall attribute of a mix, we should consider the continuity and balance of a sound
image from one loudspeaker to another. An ideal stereo image will be balanced between
left and right and will not have too much or too little energy located in the center and
either the left or right channels. Often pop and rock music mixes have a strong center
component (as seen in the vectorscope image Figure 3.4) because of the number and strength
of instruments that are typically panned center, such as kick drum, snare drum, bass, and
vocals. Classical and acoustic music recordings may not have a similarly strong central image,
and it is possible to be deficient in the center image energy—sometimes referred to as hav-
ing a “hole in the middle” of the stereo image. We should strive to have an even and
continuous spread of sound energy from left to right.

3.2 Basic Building Blocks of Digital Reverberation


Next we will explore two fundamental processes found in most digital reverberation units:
time delay and reverberation.

Time Delay
Although a simple concept, time delay can serve as a fundamental building block for a wide
variety of complex effects. Figure 3.7 shows a block diagram of a signal mixed with a
delayed version of itself, known as a feedforward comb filter, and its associated
impulse response. By simply delaying an audio signal and mixing it with the original non-
delayed signal, the product is either comb filtering (for shorter delay times, less than about
10 ms) or echo (for longer delay times). By adding hundreds of delayed versions of a signal
in an organized way, early reflection patterns such as those found in real acoustic spaces can
be mimicked. Chorus and flange effects are created through the use of delay times that are
modulated or vary over time. Figure 3.8 shows a block diagram of a delay with feedback
and its associated impulse response. We can see that the shape of this feedback comb filter’s
decay looks a little bit like the decay of sound in a room. A single feedback comb filter
will not sound like real reverb. To make it sound like actual reverb, we need to have numer-
ous feedback comb filters in parallel all set to slightly different delay times and gain amounts.

Figure 3.7 The top part (A) shows a block diagram of a signal combined with a delayed version of itself, also
known as a feedforward comb filter. The delay time amount is represented by the variable t, and
gain amount by g. The bottom part (B) shows the impulse response of the block diagram with a
gain of 0.5: a signal (in this case an impulse) plus a delayed version of itself at half the amplitude.

Figure 3.8 The top part (A) shows a block diagram of a signal combined with a delayed version of itself with the
output connected back into the delay, also known as a feedback comb filter. The delay time amount is
represented by the variable t, and gain amount by g. The bottom part (B) shows the impulse response
of the block diagram with a gain of 0.5: a signal (in this case an impulse) plus a repeating delayed
version of itself where each subsequent delayed output is half the amplitude of the previous one.
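
The two impulse responses in Figures 3.7 and 3.8 are easy to reproduce numerically. The sketch below (NumPy assumed) implements both comb filters directly from their difference equations, with the same gain of 0.5 used in the captions; the delay of 8 samples is an arbitrary choice for illustration.

```python
import numpy as np

def feedforward_comb(x, delay, g):
    """y[n] = x[n] + g * x[n - delay]  (Figure 3.7)."""
    y = np.copy(x)
    y[delay:] += g * x[:-delay]
    return y

def feedback_comb(x, delay, g):
    """y[n] = x[n] + g * y[n - delay]  (Figure 3.8)."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

impulse = np.zeros(40)
impulse[0] = 1.0

print(feedforward_comb(impulse, delay=8, g=0.5)[:25])  # one echo at half amplitude
print(feedback_comb(impulse, delay=8, g=0.5)[:25])     # echoes that halve each time, like a room decay
```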

Figure 3.9 A block diagram of an all-pass filter, which is essentially a combination of a feedforward and feed-
back comb filter. All-pass filters have a flat frequency response, but they can be set to produce a
decaying time response. There is one delay time, t, and three gain variables: blend (non-delayed
signal) = g1, feedforward delay = g2, feedback delay = g3.

If we combine a feedforward and feedback comb filter, we can create what is known as an
all-pass filter, as shown in Figure 3.9. All-pass filters have a flat frequency response, thus the
name “all” pass, but can be set to produce a decaying time response. As we will see below,
they are an essential building block of digital reverbs.

Reverberation
Whether originating from a real acoustic space or an artificially generated one, reverberation
is a powerful effect that can provide a sense of spaciousness, depth, cohesion, and distance
in recordings. Reverberation helps blend sounds and create the illusion of being immersed
in an environment different from our physical surroundings.
On the other hand, reverberation, like any other type of audio processing, can also create
problems in sound recordings. Mixed too high or with a decay time that is excessively long,
reverberation can destroy the clarity of direct sounds or, as in the case of speech, affect
intelligibility. The quality of reverberation must be optimized to suit the musical and artistic
style being recorded.
Reverberation and delay have important functions in music recording, such as helping
the instruments and voices in a recording blend and “gel.” Through the use of reverbera-
tion, we can create the illusion of sources performing in a common acoustic space. Additio-
nal layers of reverberation and delay can be added to accentuate and highlight specific
soloists.
The sound of a close-miked instrument or singer played back over loudspeakers creates
an intimate or perhaps even uncomfortable feeling when listening over headphones. When
we hear a close-miked voice over headphones, it sounds like the singer is only a few cen-
timeters from our ears. This is not something we are accustomed to hearing acoustically
from a live music performance and it can make listeners feel uncomfortable. Concert goers
hear live music performances at least several feet away from the performers—certainly more
than a few centimeters—which means that reflected sound from walls, floor, and ceiling of
a room fuses perceptually with sound coming directly from a sound source. When recording
a performer with a close microphone, we can add delay or reverberation to the dry signal
to create the perception of a more comfortable distance between the listener and sound
source.
Conventional digital reverberation algorithms use a network of delays, all-pass filters, and
comb filters as their building blocks. Even the most sophisticated digital reverberation algo-
rithms are based on the basic ideas found in the first digital reverb invented by Manfred Schroe-
der in 1962. Figure 3.10 shows a block diagram of Schroeder’s digital reverb with four parallel
comb filters that feed into two all-pass filters. Each time a signal goes through the feedback
loop it is reduced in level by a preset amount so that its strength decays over time as we saw
in Figure 3.8.
At their most basic level, conventional artificial reverberation algorithms are just combina-
tions of delays with feedback or recursion. Although simple in concept, current reverb
plug-in designers use large numbers of comb and all-pass filters connected together in
sophisticated ways, with manually tuned delay and gain parameters to create realistic-sounding

reverb decays. They also add equalization and filters to mimic reflected sound in a real room,
and subtle modulation to reduce repeating patterns that might catch our attention and
remind us that the reverb is artificial.

Figure 3.10 A block diagram of Manfred Schroeder’s original digital reverberation algorithm, showing four
comb filters in parallel that feed two all-pass filters in series, upon which modern conventional
reverb algorithms are based.
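
A bare-bones rendition of the structure in Figure 3.10 follows, assuming NumPy. The comb and all-pass sections implement the standard difference equations; the specific delay times and gains are illustrative values in the spirit of Schroeder's design rather than settings taken from this book, and a practical plug-in would add the equalization and modulation mentioned above.

```python
import numpy as np

def feedback_comb(x, delay, g):
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    # Schroeder all-pass: y[n] = -g*x[n] + x[n - delay] + g*y[n - delay]
    y = np.zeros_like(x)
    for n in range(len(x)):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + x_d + g * y_d
    return y

def schroeder_reverb(x, fs):
    # Four parallel combs feeding two series all-passes, as in Figure 3.10.
    comb_delays_s = (0.0297, 0.0371, 0.0411, 0.0437)   # illustrative delay times
    wet = sum(feedback_comb(x, int(fs * d), 0.78) for d in comb_delays_s)
    for delay_s, g in ((0.0050, 0.7), (0.0017, 0.7)):
        wet = allpass(wet, int(fs * delay_s), g)
    return wet

fs = 24000
impulse = np.zeros(fs)            # one second of impulse response
impulse[0] = 1.0
tail = schroeder_reverb(impulse, fs)
print(f"tail level after 0.5 s: {20 * np.log10(np.max(np.abs(tail[fs // 2:])) + 1e-12):.1f} dB")
```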
Another type of digital reverberation convolves an impulse response of a real acoustic space
with the incoming dry signal. Without getting into the mathematics, we might say that
convolution basically combines two signals by applying the features of one signal to another.
When we convolve a dry signal with the impulse response from a large hall, we create a
new signal that sounds like our dry signal recorded in a large hall. Hardware units capable
of convolution-based reverberation have been commercially available since the mid-1990s,
and software implementations are now commonly released as plug-ins with digital audio
workstations. Convolution reverberation is sometimes called “sampling” or “IR” reverb
because a sample or impulse response of an acoustic space is convolved with a dry audio
signal. Although possible to compute in the time domain, convolution reverb is usually
computed in the frequency domain to make the computation fast enough for real-time
processing. The resulting reverb from a convolution reverberator is arguably more realistic
sounding than that from conventional digital reverberation using comb and all-pass filters.
The main drawback is that there is not as much flexibility or control of parameters in
convolution reverberation as is possible with digital reverberation based on comb and all-pass
filters.
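
A minimal frequency-domain convolution sketch follows, assuming NumPy and SciPy. Since a measured impulse response of a real hall is not included here, exponentially decaying noise stands in for one; with a real IR loaded from a file, the fftconvolve call would be the same.

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000
rng = np.random.default_rng(0)

# Stand-in impulse response: decaying noise that falls about 60 dB over 2 seconds.
ir_len = 2 * fs
impulse_response = rng.standard_normal(ir_len) * 10 ** (-60 * np.arange(ir_len) / ir_len / 20)

# Stand-in dry signal: a sparse click train in place of a close-miked recording.
dry = np.zeros(fs)
dry[::12000] = 1.0

# FFT-based convolution, the frequency-domain approach most convolution reverbs rely on for speed.
wet = fftconvolve(dry, impulse_response)
mix = np.pad(dry, (0, len(wet) - len(dry))) + 0.3 * wet    # simple dry/wet blend
print(f"output length: {len(mix) / fs:.2f} s")
```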
In conventional digital reverberation units, we usually find a number of possible parameters
to control. Although these parameters vary from one manufacturer to another, a few of the
most common include the following:

• Reverberation decay time (RT60)
• Delay time
• Predelay time
• Some control over early reflection patterns, either by choice of predefined sets of early
reflections or control over individual reflections
• Low-pass filter cutoff frequency
• High-pass filter cutoff frequency
• Decay time multipliers for different frequency bands
• Gate—threshold, attack time, hold time, release or decay time, depth

Although most digital reverberation algorithms represent simplified models of the acoustics
of a real space, they are widely used in recorded sound to help augment the recorded acoustic
space or to create a sense of spaciousness that did not exist in the original recording due to
close-miking techniques.

Reverberation Decay Time


The reverberation time is defined as the amount of time it takes for a sound to decay 60
dB once the source is turned off, and it is usually referred to as RT60. W. C. Sabine proposed
an equation for calculating it in a real acoustic space (Howard & Angus, 2006):
RT60 = 0.161 ∗ V / (S ∗ α)

V = volume in m3, S = surface area in m2 for a given type of surface material, and α =
absorption coefficient of the respective surface.

Because the RT60 will be some value greater than zero even if α is 1.0 (100% absorption
on all surfaces), the Sabine equation is typically only valid for α values less than 0.3. In other
words, the shortcoming of the Sabine equation is that we would calculate a reverberation
time greater than 0 for an anechoic chamber, even though we would measure no reverbera-
tion acoustically. The Norris-Eyring equation is a slight variation that holds over a wider
range of α values (Howard & Angus, 2006):

RT60 = −0.161 ∗ V / (S ∗ ln(1 − α))

V = volume in m3, S = surface area in m2 for a given type of surface material, ln is the
natural logarithm, and α = absorption coefficient of the respective surface.
It is helpful to have an intuitive sense of the sound of various decay times. A decay time of
2 seconds will have a much different sonic effect than a decay time of less than 1 second.
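
To build that intuition, both formulas are easy to evaluate. The sketch below computes them for a hypothetical shoebox hall with a single average absorption coefficient (the room dimensions and coefficients are made up for illustration); note how the two estimates diverge as α grows, which is the shortcoming of the Sabine equation described above.

```python
import math

def rt60_sabine(volume_m3, surface_m2, alpha):
    return 0.161 * volume_m3 / (surface_m2 * alpha)

def rt60_norris_eyring(volume_m3, surface_m2, alpha):
    return -0.161 * volume_m3 / (surface_m2 * math.log(1.0 - alpha))

# Hypothetical 20 m x 15 m x 8 m hall with one average absorption coefficient for all surfaces.
volume = 20 * 15 * 8
surface = 2 * (20 * 15 + 20 * 8 + 15 * 8)

for alpha in (0.1, 0.3, 0.6):
    print(f"alpha = {alpha:.1f}:  Sabine {rt60_sabine(volume, surface, alpha):5.2f} s   "
          f"Norris-Eyring {rt60_norris_eyring(volume, surface, alpha):5.2f} s")
```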

Delay Time
We can mix a straight delay (without feedback or recursion) with a dry signal to create a
sense of space, and it can supplement or substitute reverberation. With shorter delay times—
around 25–35 milliseconds—our auditory systems tend to fuse the direct and delayed sounds;
we localize the combined sound based on the location of the first-arriving direct sound.
Helmut Haas discovered that a single reflection added to a speech signal fused perceptually
with the dry sound unless the reflection arrived more than approximately 25–35 milliseconds
after the dry sound, at which point we perceive the delayed sound as an echo or separate
sound. The phenomenon is known as the precedence effect, the Haas effect, or the law of
the first wavefront.
When we add a signal to a delayed version of itself and the delay time is greater
than 25–35 milliseconds, we hear the delayed signal as a distinct echo of a direct sound.
The actual amount of delay time required to create a distinct echo depends on the
nature of the audio signal being delayed. Transient, percussive signals reveal distinct
echoes with shorter delay times (less than 30 milliseconds), whereas sustained, steady-
state signals require much longer delay times (more than 50 milliseconds) to create an
audible echo.

Predelay Time
Predelay time is typically defined as the time delay between the direct sound and the onset
of reverberation. Predelay can give the impression of a larger space even with a short decay
time. In a real acoustic space with no physical obstructions between a sound source and a
listener, there will always be a short delay between the arrival of direct and reflected sounds.
The longer this initial delay is, the larger we perceive the space to be.

Digital Reverberation Presets


Most digital reverberation units currently available, whether in plug-in or hardware form,
offer hundreds if not thousands of reverberation presets. What may not be immediately obvi-
ous to the novice engineer is that there are typically only a handful of unique algorithms for
a given reverberation plug-in or unit. The presets simply give variations in parameter settings
for the same algorithm. The presets are individually named to indicate an application or space
such as large hall, bright vocal, studio drums, or theater. All of the presets using a given type
of algorithm represent identical types of processes and will sound identical if the parameters
of each preset are matched.
Because engineers adjust many reverberation parameters to create the most suitable
reverberation for each application, it makes sense to pick any preset and start tuning
parameters instead of searching for the perfect preset. The main drawback of trying to
find the right preset for each instrument and voice during a mix is that the right preset
might not exist. Or if something close does exist, it will likely require parameter adjust-
ments anyway, so why not just start by adjusting parameters. It is more efficient to simply
start with any preset and spend our time editing parameters to suit our mix. As we edit
parameters, we learn a reverb’s capabilities and what each parameter sounds like. In the
parameter-editing phase for an unfamiliar reverb, I find it helpful to turn parameters to
their range extremes to make sure I can hear their contributions, and then dial in the
settings I want.
On the other hand, we can learn more about the capabilities of a reverb algorithm by
going through the factory presets. Searching through endless lists of presets may not be the
best use of a mixing session, but it can be useful to listen carefully to presets during
downtime.

3.3 Reverberation in Multichannel Audio


From a practical point of view, my informal research and listening seem to indicate that,
in general, higher levels of reverberation are possible in multichannel audio recordings
than two-channel stereo, while maintaining an acceptable level of clarity. More formal
tests need to be conducted to verify this point, but it may make sense from what we
know about spatial masking. As we discussed earlier, spatial separation of two sound
sources reduces the masking that occurs when they are located in the same place (Kidd
et al., 1998; Saberi et al., 1991). The effect seems to be consistent for real sound sources
as well as virtual sound sources panned across a multichannel loudspeaker array. It appears
that because of the larger spatial distribution of sound in multichannel audio, relative to
two-channel stereo, reverberation is less likely to obscure or mask the direct sound and
therefore can be more prominent in multichannel audio. We could argue that reverbera-
tion is increasingly critical in recordings mixed for multichannel audio reproduction
because multichannel audio offers a much greater possibility to re-create a sense of
immersion in a virtual acoustic space than two-channel stereo. We can benefit from a
systematic training method to learn to match parameter settings of artificial reverberation
by ear and to further develop the ability to consistently identify subtle details of sound
reproduced over loudspeakers.
Recording music and sound for multichannel reproduction also presents new challenges
over two-channel stereo in terms of creating a detailed and enveloping sound image. One
of the difficulties with multichannel audio reproduction using the ITU-R BS.775 (ITU-R,
1994) loudspeaker layout is the large space on the sides (between the front and rear loud-
speakers, 80° to 90° spacing; see Fig. 1.4). Because of the spacing between the loudspeakers
and the nature of our binaural sound localization abilities, side phantom images are typically
unstable. Furthermore, it is a challenge to produce phantom images that join the front sound
image to the rear. I have found that reverberation can be helpful in creating the illusion of
sound images that span the space between loudspeakers, even though I am unclear why it
seems to help.

3.4 Software Training Module


The “Technical Ear Trainer—Reverb” software module and the other software modules are
included on the companion website: www.routledge.com/cw/corey.
I designed the associated software training module to focus on hearing subtle details and para-
meters of artificial digital reverberation. Although not focused on real room acoustics, it is possible
that improved listening skills in digital reverb may transfer to real room acoustics because we
increase our abilities to distinguish reverb decay times, echoes, reflections, and source locations.
Most conventional digital reverberation algorithms are based on various combinations of
comb and all-pass filters after Schroeder’s model, as we discussed earlier. Although these
algorithms are computationally efficient and provide many controllable parameters, they
simply approximate the behavior of sound in a real room; the reverb tails are not physical
models of sound in a room. As such, we cannot be sure exactly how the reverberation decay
time (RT60) of a given artificial reverberation algorithm relates to decay time of sound in
a real room. For instance, if we set a variety of artificial reverb plug-ins to the same reverb
decay time, we may hear roughly the same decay time, but other qualities of the reverb tails
may sound different, such as the stereo spread or the shape of the decay. Figure 3.11 shows

Figure 3.11 Impulse responses of three different reverb plug-ins with parameters set as identically as pos-
sible: reverb decay time: 2.0 s; predelay time: 0 ms; room type: hall. From these three impulse
responses, we can see that the decays look different, but perhaps more importantly, the decays
also sound distinctly different. Interestingly, according to FuzzMeasure audio test and measure-
ment software, all three impulse responses measure close to 2.0 seconds decay time.

impulse responses of three different reverb plug-ins set to as close to the same parameters
as possible, but with three distinctly different decay patterns. Reverb plug-ins do not all share
the same set of controllable parameters, thus it is impossible to have two different plug-ins
with exactly the same settings.
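The figure caption above notes that measurement software reports a decay time for each impulse response. As a rough illustration of how such a number can be computed, here is a minimal Python sketch using Schroeder backward integration and a line fit over part of the decay curve. This is a generic approach, not the algorithm used by FuzzMeasure or any particular plug-in, and the function name, the ir/fs arguments, and the −5 to −25 dB fitting range are my own assumptions.

import numpy as np

def estimate_rt60(ir, fs):
    # Schroeder backward integration: energy remaining after each sample.
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)
    # Fit a straight line to the portion of the decay between -5 and -25 dB,
    # then extrapolate to a 60 dB drop (a T20-style estimate).
    idx = np.where((edc_db <= -5.0) & (edc_db >= -25.0))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # slope in dB per second
    return -60.0 / slope

A real measurement tool treats the noise floor and the integration limits much more carefully; this sketch assumes a reasonably long, clean impulse response.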
Reverb parameter settings do not sound consistent across digital reverb algorithms because
there are many different reverb algorithms and there are thousands of acoustic spaces to
model. This is one reason why it can be worth exploring different reverb models to find
out what works best for your projects. There are hundreds of options with varying levels
of quality that appeal to different tastes. Reverberation is a powerful sonic tool available to
recording engineers who mix it with recorded sound to create the aural illusion of real
acoustics and spatial context.
Just as it is critical to learn to recognize spectral resonances (with EQ), it is equally
important to improve our perception of artificial reverberation. At least one researcher has
demonstrated that listeners can “learn” reverberation for a given room (Shinn-Cunningham,
2000). Other work in training listeners to identify spatial attributes of sound has been con-
ducted as well. Neher et al. (2003) have documented a method of training listeners to
identify spatial attributes using verbal descriptors for the purpose of spatial audio quality
evaluation. Other researchers have used graphical assessment tools to describe the spatial
attributes of reproduced sound (such as Ford et al., 2003; Usher & Woszczyk, 2003).
This training software has an advantage because you compare one spatial scene with
another by ear; you are never required to translate your auditory sensation to another sensory
modality or means of expression, such as by drawing an image or choosing a word. Using
the software, you compare and match two sound scenes, within a given set of artificial
reverberation parameters, using only your auditory system. Thus, there is no isomorphism
between different senses and methods of communication. Additionally, this method has
ecological validity, as it mimics the process of a sound engineer sculpting sonic details of a
sound recording by ear rather than through graphs and words.

3.5 Description of the Software Training Module

The included software training module “Technical Ear Trainer—Reverb” is available for
listening drills. The computer randomizes the exercises and gives a choice of difficulty and
parameters for an exercise. It works in much the same way as the EQ module described in
Chapter 2 works.

Sound Sources
I encourage you to begin the reverb training with simple, transient, or impulsive sounds such
as percussion—a single snare drum hit is great—and progress to more complex sounds such as
speech and music recordings. In the same way that we use pink noise in EQ ear training
because it exposes the spectral changes better than most music samples, we use percussive
or impulsive sounds when training with time-based effects processing. Reverberation decay time is
easier to hear with transient signals than with steady-state sources, which tend to mask or
blend with reverberation, making judgments about it more difficult.

User Interface
A graphical user interface (GUI), shown in Figure 3.12, provides a control surface for you
to interact with the system.

Figure 3.12 A screenshot of the user interface for the reverb trainer.

With the GUI you can do the following:

• Choose the level of difficulty.
• Select the parameter(s) with which to work.
• Choose a sound file.
• Adjust parameters of the reverberation.
• Toggle between the reference and your answer.
• Control the overall level of the sound output.
• Submit a response to each question and move to the next example.

The graphical interface also keeps track of the current question and the average score up
to that point, and it provides the score and correct answer for the current question.

3.6 Getting Started with Practice


The training curriculum covers a few of the most commonly found parameters in digital
reverberation units, including the following:

• delay time
• reverb decay time
• predelay time
• reverberation level (mix)

As with the EQ module, your task with the exercises and tests is to duplicate a reference
sound scene by listening and comparing your answer to the reference and making the appropriate
changes to the parameters until they sound the same. The software randomly chooses parameter
values based on the level of difficulty and test parameters you choose, and it asks you to identify the
reverberation parameters of the reference by adjusting the appropriate parameter to the value that
most closely matches the sound of the reference. You can toggle between the reference question
and your answer either by clicking on the switches labeled “Question” and “Your Response”
(see Figure 3.12) or by pressing the space bar on the computer keyboard. Once the two sound
scenes are matched, you can click on “Check Answer” or hit the [Enter] key to submit the answer
and see the correct answer. Clicking on the “Next” button moves on to the next question.

Delay Time
Delay times range from 0 milliseconds to 200 milliseconds with an initial resolution of
40 milliseconds and increasing in difficulty to a resolution of 10 milliseconds.

Reverb Decay Time


Decay times range from 0.5 seconds to 2.5 seconds with an initial resolution of 1.5 seconds
and increasing in difficulty to a resolution of 0.25 seconds.

Predelay Time
Predelay time is the amount of time delay between the direct (dry) sound and the beginning
of early reflections and reverberation. Predelay times vary between 0 and 200 ms, with an
initial resolution of 40 ms and decreasing to a resolution of 10 ms.

Mix Level
Often when mixing reverberation with recorded sound, the level of the reverberation is
adjusted as an auxiliary return on the recording console or digital audio workstation. The
training system allows you to practice learning various “mix” levels of reverberation. A mix
level of 100% means that there is no direct (unprocessed) sound at the output of the algo-
rithm, whereas a mix level of 50% represents an output with equal levels of processed and
unprocessed sound. The mix value resolution at the lowest level of difficulty is 25% and
progresses up to a resolution of 5%, covering the range from 0% to 100% mix.
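To make the percentages concrete, a wet/dry mix control of this kind can be thought of as a simple crossfade between the unprocessed and processed signals. The sketch below only illustrates that idea with a linear crossfade; the function name and arguments are assumptions, and the training module or a given plug-in may use a different mixing law (for example, equal power).

def apply_mix(dry, wet, mix_percent):
    # 0% = dry only, 100% = reverb only, 50% = equal parts of each.
    m = mix_percent / 100.0
    return (1.0 - m) * dry + m * wet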

3.7 Mid-Side Matrixing


Mathematician Michael Gerzon (1986, 1994) made important contributions to audio engi-
neering, specifically with his mathematical explanations of matrixing and shuffling of stereo
recordings to enhance and rebalance correlated and decorrelated components in a mix. His
suggested techniques are useful for technical ear training because they can help in the analysis
and deconstruction of recordings by bringing forth components of a sound image that might
not otherwise be as audible.
By applying principles of the stereo mid-side microphone technique to stereo recordings,
we can rebalance aspects of a recording and learn more about some of the techniques used.
Although this process takes its name from a specific stereo microphone technique, any stereo
recording can be post-processed to convert the left and right channels to mid (M) and side
(S) or sum and difference, regardless of the mixing or microphone technique used.

Figure 3.13 A block diagram (A) and a mixer signal flow diagram (B) to convert Left and Right stereo signals
into Mid (Left + Right) and Side (Left − Right) signals, and subsequent mixing back into Left and
Right channels. Both diagrams result in equivalent signal processing, where diagram A is a basic
block diagram and diagram B shows one way to route signals on a mixer to achieve the process-
ing in diagram A. Dashed signal lines in the diagrams represent audio signal flow the same as solid
lines but are used to clarify signal flow for crossing lines. Dotted lines indicate fader grouping.

Mastering engineers sometimes split a stereo recording into its M and S components for
processing and then convert them back into L and R. Although there are plug-ins that auto-
matically convert the L and R channels to M and S, the process is quite simple. We can derive
the mid or sum component by adding the L and R channels together. Practically, we can do it
by bringing the two audio channels in on two faders and panning them both to the center. To
derive the side or difference channel, we send the L and R into two other pairs of channels. One
pair can be panned hard left, with the L channel set to opposite polarity. The final pair of L and
R channels can be panned hard right, with the R channel set to opposite polarity. See Figure 3.13 for
details on the signal routing information. Now that the signals are split into M and S, we can
simply rebalance these two components, or we can apply processing to them independently. The
S signal represents the components of the signal that meet either of the following conditions:

• exist in only the L channel or only the R channel
• are opposite in polarity, L relative to R

The Mid or Sum Component


The mid signal represents all components from a stereo mix that are not opposite polarity
between the two channels—that is, anything that is common to both channels or just pres-
ent in one side. As we can see from the block diagram and mixer signal flow presented in
Figure 3.13, the M component is derived from L + R.

The Side or Difference Component


The side signal is derived by subtracting the L and R channels: side = L − R. Anything that
is common to both L and R will be canceled out and will not form part of the S compo-
nent. In other words, any signal that is panned center in a mix will be canceled from the
S component. Any stereo signal that has opposite polarity components, and any signal panned
left or right (partially or completely), will form the S signal.
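For readers who prefer to see the arithmetic, here is a minimal Python/NumPy sketch of the same sum-and-difference idea: encode left/right into mid/side, optionally rebalance, and decode back. The function names, the 0.5 scaling on decode, and the example gain values are illustrative assumptions, not the routing of any particular plug-in or of the mixer diagram in Figure 3.13.

import numpy as np

def lr_to_ms(left, right):
    # Encode: mid is the sum (L + R), side is the difference (L - R).
    return left + right, left - right

def ms_to_lr(mid, side):
    # Decode back to left/right. The 0.5 factor restores the original
    # levels, since L = (M + S) / 2 and R = (M - S) / 2.
    return 0.5 * (mid + side), 0.5 * (mid - side)

# Rebalancing example: raise the side level to widen the stereo image
# (the 1.5 gain is an arbitrary illustration, not a recommendation).
left = np.array([1.0, 0.5])
right = np.array([0.2, 0.5])
mid, side = lr_to_ms(left, right)
wide_left, wide_right = ms_to_lr(mid, 1.5 * side)

Raising the side gain relative to the mid widens the image; lowering it narrows the image toward mono.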

Exercise: Listening to Mid-Side Processing


All of the “Technical Ear Trainer” software modules are available on the companion website:
www.routledge.com/cw/corey.
The practice module “Technical Ear Trainer—Mid-Side” offers an easy way to audition
mid and side components of any stereo recording (AIFF or WAV file formats) and hear
what it sounds like if they are rebalanced. By converting a stereo mix (L and R) into M
and S signals, we can sometimes hear mix elements that may have been masked in the
standard L/R format. Besides being able to hear stereo reverberation better (assuming the
reverb is not mono), sometimes other artifacts become apparent. Artifacts such as punch-ins/
edits, distortion, dynamic range compression, and fader level changes can become more
audible as we listen to only the S component. Many stereo mixes have a strong center
component, and when we listen to just the S component, everything panned center will be
missing. Punch-ins and edits, usually more problematic in analog tape recordings, are more
audible when listening to the S component in isolation.
By splitting a stereo mix into its M and S components, we can highlight artifacts created
by perceptual encoding processes (e.g., MP3, AAC, Ogg Vorbis). Although these artifacts are
mostly masked by the stereo audio with a reasonably high bit rate, removing the M com-
ponent does make the artifacts more audible. Because the Mid-Side module has a slider
which allows us to transition gradually from hearing only the Mid signal, to an equal mix
of Mid and Side (i.e., the original stereo image), to just the Side component, the Side signal
is routed to the left channel and a duplicate opposite polarity version of the Side (i.e., − S)
to the right channel. So by listening to 100% Side component, we hear a correlation of − 1,
because the left channel is producing the original S component and the right channel is
producing an opposite polarity S (or − S) component.

Summary
This chapter covers the spatial attributes of sound, focusing primarily on reverberation and
mid-side processing. The goal of the spatial software practice module is to systematically
familiarize listeners with aspects of artificial reverberation, delay, and panning. By comparing
two audio scenes by ear, we can match one or more parameters of artificial reverberation
to a reference randomly chosen by the software. We can progress from comparisons using
percussive sound sources and coarse resolution between parameter values to more steady-state
musical recordings and finer resolution between parameter values. Often very minute changes
in reverberation parameters can have a significant influence on the depth, blend, spaciousness,
and clarity of the final mix of a sound recording.
Chapter 4

Dynamic Range Control

In this chapter we will discuss level control and dynamics processing. To inform our critical
listening, we will cover some of the theory of dynamics processors.
Mix balance has a direct effect on an artist’s musical expression. If one or multiple ele-
ments in a mix are too loud or too quiet, we as listeners may not be able to hear a musical
part or we may think the emphasis is on a different part than the artist intended. Achieving
an appropriate balance of a musical ensemble is essential for expressing an artist’s musical
intention. Conductors and composers understand the idea of finding optimal ensemble bal-
ance for each performance and piece of music. If an instrumental part within an ensemble
is not loud enough to be heard clearly, listeners do not receive the full impact of a piece of
music. Overall balance depends on the control of individual vocal and instrumental ampli-
tudes in an ensemble.
When recording spot microphone signals on multiple tracks and mixing those tracks, we
have direct control over balance and therefore also musical expression. When mixing multiple
tracks, we may need to continually adjust the level of certain instruments or voices for
consistent balance from the beginning to the end of a track. We can do this manually with
fader automation, automatically with dynamics processors, or use a hybrid approach that
uses both.
Dynamic range describes the difference between the loudest and quietest levels of an
audio signal. For microphone signals that have a dynamic range that is excessively wide
for the type of music, we can adjust fader levels over time to compensate for variations in
signal level and therefore maintain a consistent perceived loudness. We can manually boost
levels during quiet sections and attenuate loud sections. In this way, our fader level adjust-
ments made through a recording amount to manual dynamic range compression. Dynamic
range controllers—compressors/limiters and expanders/gates—adjust levels automatically
based on an audio signal’s level and can be applied to individual audio tracks or to a mix
as a whole.
Some signals have an inherently wide dynamic range; others have a relatively narrow range.
Distorted guitars generally have a small dynamic range, because distortion results from limit-
ing the amplitude of a signal, with instantaneous attack and release times. A close-miked
lead vocal, on the other hand, can have an extremely wide dynamic range. In extreme cases,
a singer’s dynamic range may vary from a loud scream to just a whisper, all within a single
song. If a vocal track’s fader is set to one level and left for the duration of a piece with no
compression or other level change, there will be moments when the vocal will be much too
loud and other moments when it will be too quiet. When a vocal level rises too high it
becomes uncomfortable for a listener, who may then want to turn the entire mix down. In
the opposite situation, a vocal that is too low in level becomes difficult to understand,
leaving an unsatisfying musical experience for a listener. Finding a satisfactory static fader
level without compression for a sound source as dynamic as pop vocals is likely to be impos-
sible unless the singer intentionally sings within a narrow dynamic range. One way of
compensating for a wide dynamic range is to manually adjust the fader level for each word
or phrase that a singer sings. Although some tracks do call for such detailed manual control
of fader level, compression is still helpful in getting partway to consistent, intelligible, and
musically satisfying levels, especially for tracks with a wide dynamic range.
Consistent levels for instruments and vocals in a pop music recording may help com-
municate the musical intentions of an artist more effectively than levels with a wide dynamic
range. Most recordings in the pop music genre have very limited dynamic range. Yet wide
dynamic contrasts are still essential to help convey musical emotion, especially in acoustic
music. This raises the question: if the level of a vocal track is adjusted so that the loud (fortissimo
or ff) passages are the same loudness as the quiet (pianissimo or pp) passages, how is a
listener going to hear any dynamic contrast? Before we address this question we should be
aware that level control partly depends on genre. Classical music recordings, for example,
usually do not benefit from highly controlled dynamic range because listeners expect
dynamic range variation in classical music and too much dynamic range control can make
it sound too processed. Although signal processing artifacts such as distortion, limiting, EQ,
and delays are often an expected part of pop, rock, and electronic music (e.g., Brian Eno’s
concept of the recording studio as a musical instrument), we try to avoid any processing
in classical music recording. It is as though classical music recordings should not sound like
recordings, but should mimic the concert hall experience. For most other genres of music,
at least some amount of dynamic range control is desirable. And specifically for pop, rock,
and electronic music recordings, a limited dynamic range is the goal partly to make record-
ings sound loud.
Fortunately, even with extreme dynamic range control we can still perceive dynamic range
changes partly because of timbre changes between quiet and loud levels. We know from
acoustic measurements that there is a significant increase in the number and strength of
higher-frequency harmonics as dynamic level goes from quiet to loud for almost all instru-
ments, including voice. So even with a heavily compressed vocal performance, we still perceive
dynamic range because of changes in timbre in the voice.
Nevertheless, overuse of compression and limiting can leave a performance sounding life-
less. We need to be aware of using too much dynamics processing because it can be fairly
destructive when used excessively. Once we record a track with compression, there is no
way to completely undo the effect. Some types of audio processing such as reciprocal peak/
dip equalization allow us to undo alterations with equal parameter and opposite gain settings,
but compression and limiting do not offer such transparent flexibility.
The effect of a compressor is amplitude modulation where the modulation depends on
an audio signal’s amplitude envelope and modifies it. Compression is simply gain reduction
where the gain reduction varies over time based on a signal’s level, with the amount of
reduction based on the threshold and ratio settings. Compression and expansion are examples
of nonlinear processing because the amount of gain reduction applied is amplitude-dependent
and the gain applied to a signal changes over time.
Dynamics processing such as compression, limiting, expansion, and gating all offer means
to sculpt and shape audio signals in unique and time-varying ways. We say it is time-varying
because the amount of gain reduction varies over time as the original signal level changes
over time. Dynamic range control can help in the mixing process by not only smoothing
out audio signal levels but by acting like a glue that helps add cohesion to various musical
parts in a mix.

4.1 Signal Detection in Dynamics Processors


Dynamics processors work with objective audio signal levels, usually measured in decibels.
The first reason for working on a decibel scale is that the decibel is a logarithmic scale that
is comparable to the way the human auditory system interprets changes in loudness. There-
fore, the decibel as a measurement scale seems to correlate to our perception of sound. The
second main reason for using decibels is to scale the range of audible sound levels to a more
manageable range. For instance, human hearing ranges from the threshold of hearing, at
about 0.00002 pascals (or Pa), to the threshold of pain, around 20 Pa, a range that represents
a factor of 1 million. Pascals are a unit of pressure that measure force per unit area. When
this range is converted to decibels, it scales from 0 to 120 dB sound pressure level (SPL), a
much more meaningful and manageable range.
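The conversion in the previous paragraph is just 20 times the base-10 logarithm of the pressure ratio. A short Python check of the two endpoints (the function name is mine):

import math

def pa_to_db_spl(pressure_pa, reference_pa=0.00002):
    # dB SPL = 20 * log10(p / p0), with p0 = 20 micropascals
    return 20.0 * math.log10(pressure_pa / reference_pa)

print(pa_to_db_spl(0.00002))  # 0.0 dB SPL, the threshold of hearing
print(pa_to_db_spl(20.0))     # 120.0 dB SPL, the threshold of pain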
To control the level of a track, a compressor needs some way of measuring and indicating
the amplitude of an audio signal. As it turns out, there are many ways to meter a signal, but
they are all typically based on two common representations of audio signal level: peak level and
RMS level (which stands for root-mean-square level). Peak level simply indicates the high-
est amplitude of a signal at any given time. Digital recorders (hardware or software) usually
have peak level meters because we need to see precisely how close a signal is to the 0 dBFS
(decibels relative to full scale) digital clip point. The RMS is somewhat like an average signal
level, although not mathematically equivalent to the average. With audio signals where there is
a voltage that varies between positive and negative values, a mathematical average calculation
is not useful, because the average will always be around zero. The RMS, on the other hand, is
highly useful and is calculated by squaring the signal, taking the average of some predefined
window of time, and then taking the square root of that. For sine tones the RMS is easily calcu-
lated because it will always be 3 dB below the peak level, or 70.7% of the peak level. For more
complex audio signals such as music or speech, the RMS level must be measured directly from
a signal and cannot be calculated by simply subtracting 3 dB from the peak value. Although
RMS and average are not mathematically identical, RMS can be thought of as a type of signal
average, and we will use the terms RMS and average interchangeably. VU (or Volume Unit)
meters give the RMS level for a sine tone and approximate the RMS for more complex signals
such as those we encounter in recording and mixing. Figures 4.1, 4.2, and 4.3 illustrate peak,
RMS, and crest factor levels for three different signals.
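For a signal stored as an array of sample values, peak, RMS, and crest factor can be computed in a few lines. The sketch below assumes a NumPy array in full-scale units; the function name is mine, and real meters add ballistics (integration times) that this omits.

import numpy as np

def level_stats(signal):
    # Return peak level, RMS level, and crest factor, all in dB.
    peak = np.max(np.abs(signal))
    rms = np.sqrt(np.mean(signal ** 2))
    peak_db = 20.0 * np.log10(peak)
    rms_db = 20.0 * np.log10(rms)
    return peak_db, rms_db, peak_db - rms_db  # crest factor in dB

fs = 44100
t = np.arange(fs) / fs
sine = np.sin(2 * np.pi * 440 * t)
print(level_stats(sine))  # crest factor of a sine is about 3 dB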
The dynamic range can have a significant effect on the loudness of recorded music. The
term loudness is used to describe the perceived level rather than the physical, measured sound
pressure level. A number of factors contribute to perceived loudness, such as power spectrum
and crest factor (the ratio of the peak level to the RMS level). Given two musical recordings
with the same peak level, the one with a smaller crest factor will generally sound louder
because its RMS level is higher. When judging the loudness of sounds, our ears respond
more to average levels than to peak levels.
Dynamic range compression increases the average level through a two-stage process start-
ing with a gain reduction of the loudest or peak levels followed by a linear output gain,
sometimes called makeup gain. Compressors and limiters lower the loudest sections of an
audio signal and then apply a linear gain stage to bring the entire audio signal back. The
linear gain stage after compression is usually called makeup gain because it makes up for peak
level reduction. Some compressors and limiters apply an automatic makeup gain at the output
stage so that the gain-reduced loud sections remain at roughly the same level. Makeup gain
brings up the entire signal (quiet and loud levels), so if we match the audio peaks to their
pre-compression levels, we have essentially brought up the quieter audio sections. The process
of compression and limiting reduces the crest factor of an audio signal, and when makeup
Figure 4.1 The RMS value of a sine wave is always 70.7% of the peak value, which is the same as saying that the
RMS value is 3 dB below the peak level. This is only true for a sine wave. The crest factor is the difference
between the peak and RMS levels, usually measured in dB, thus a sine wave has a crest factor of 3 dB.

Figure 4.2 A square wave has equal peak and RMS levels, so the crest factor is 0 dB.

Figure 4.3 A pulse wave is similar to a square wave except that we are shortening the amount of time
the signal is at its peak level. The length of the pulse determines the RMS level, where a shorter
pulse will give a lower RMS level and therefore a larger crest factor. The RMS level shown in the
figure is approximate.

gain is applied to restore the peaks to their original level, the RMS level is increased as well,
making the overall signal louder.
By reducing the crest factor through compression and limiting, we can make an audio
signal sound louder even if its peak level is unchanged. We may be tempted to normalize
a recorded audio signal in an attempt to make it sound louder. Normalizing is a process
whereby an audio editing program scans an audio signal, finds the highest signal level for
the entire clip, calculates the difference in dB between the maximum recordable level (0 dBFS)
and the peak level of an audio signal, and then raises the entire audio clip by this difference
so that the peak level will reach 0 dBFS. If the peak levels are two or three decibels below
0 dBFS, we may only get a couple of decibels of gain at best by normalizing an audio signal.
This is one reason why the process of digitally normalizing a sound file will not necessarily
make a recording sound significantly louder. The only way to make a normalized signal
sound significantly louder is through compression and limiting to raise the RMS level and
reduce the crest factor.
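A sample-peak normalizer of the kind described above can be sketched in a few lines of Python (assuming a floating-point signal array; the function name and target argument are mine). Note that it only rescales the file: the crest factor, and therefore the loudness relationship, is unchanged, and it does not account for the inter-sample peaks discussed next.

import numpy as np

def normalize_peak(signal, target_dbfs=0.0):
    # Scale the signal so its highest sample peak reaches the target level.
    peak = np.max(np.abs(signal))
    target_linear = 10.0 ** (target_dbfs / 20.0)
    return signal * (target_linear / peak)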
As a side note, normalizing a mix is not necessarily a good idea, because even if the
original sample peaks are only as high as 0 dBFS, the peaks between samples (inter-sample
peaks) may actually go above 0 dBFS, in the case of oversampling on playback, and cause
clipping. Many mastering engineers recommend staying at least a few decibels below 0 dBFS.

For recordings that will be submitted for sale to the iTunes Store, Apple says that “digital
masters should have a small amount of headroom (roughly 1 dB) in order to avoid such
clipping.”1
In addition to learning how to identify the artifacts produced by dynamic range compres-
sion, it is also important to learn how to identify static changes in gain. If the overall level
of a recording is increased, it is important to be able to recognize the amount of gain change
applied in decibels.

4.2 Compressors/Limiters and Expanders/Gates


To reduce the dynamic range of a recording, we use dynamics processing in the form of
compressors and limiters. Typically a compressor or limiter will attenuate the level of a
signal once it has reached or gone above a threshold level. Compressors and expanders
belong to a group of sound processing effects that are adaptive, meaning that the amount
or type of processing is determined by some component of the signal itself (Verfaille et al.,
2006). In the case of compressors and expanders, the amount of gain reduction applied to
a signal is dependent on the level of the signal itself or a secondary signal known as a
side-chain or key input. With other types of processing such as equalization and reverbera-
tion, the type, amount, or quality of processing remains the same, regardless of the input
signal characteristics. Because signal-dependent processors alter a signal when the signal
changes, it can be difficult to recognize the processing. Compression is sometimes difficult
to hear precisely because gain reduction is being applied at the same moment a signal level
is increasing. Gain changes occur synchronously with changes in the audio signal itself,
and sometimes the actual signal will mask these changes or our auditory system will assume
that they are part of the original sound (as in the case of compression). So-called “look-ahead”
limiters, which are sometimes used in broadcasting, are highly effective at detecting
and attenuating peaks because they delay the incoming signal by some amount in order to
reduce the gain before a dangerous peak arrives. Without hearing the original signal we
do not know exactly how a signal varied dynamically before compression. Thus it can be
useful to listen for side effects or artifacts produced from attack and release times to identify
compression.
Alternatively, some signal-dependent processing is much more obvious. In signal-dependent
quantization errors at low bit rates, also known as bit-crushing when used as a creative tool,
the distortion (error) will be modulated by the amplitude of the signal and will therefore
be much more noticeable, as we will discuss in Section 5.2.
Other forms of dynamic processing increase the dynamic range by attenuating lower-
amplitude sections of a recording. These types of processors are often referred to as expanders
or gates. In contrast to a compressor, an expander attenuates the signal when it is below the
threshold level. Expanders are commonly used when mixing drums for pop and rock music.
Each component of a drum kit is often close-miked, but there is still some “leakage” of the
sound of adjacent drums into each microphone. To reduce this effect, expanders or gates
can be used to attenuate a microphone signal between hits on its respective drum.
There are many different types of compressors and limiters, and each make and model has
its own unique “sound.” This sonic signature is based on a number of factors such as the
signal detection circuit or algorithm used to determine the level of an input audio signal and
therefore whether to apply dynamics processing or not, and how much to apply based on
parameters settings. Attack and release time curves of each compressor also contribute to the
unique sound of a compressor. In analog processors, the actual electrical components in the
audio signal chain and power supply also affect the audio signal. A number of parameters
are typically controllable on a compressor. These include threshold, ratio, attack time, release
time, and knee.
It may be worth making a clarification here. According to conventional sound synthesis
theory, we describe the amplitude envelope of a synthesized sound in terms of four main prop-
erties: attack, decay, sustain, and release, or simply ADSR. (See Figure 4.4a for a visualization of
a generic ADSR amplitude envelope.) The “attack” refers to the note onset, from silence to its
peak amplitude. Acoustic instruments have their own respective attack times, which can vary
somewhat depending on the performer. Some instruments have a fast attack or rise in ampli-
tude (such as piano or percussion) while other instruments produce a slightly slower attack
(such as violin or cello). While the term “attack” with respect to an instrument or synthesized
sound refers to a note onset, or quick rise in amplitude, “attack time” on a compressor refers to
a reduction in amplitude once a signal rises above a set threshold level. Similarly, a note “decay”
or “release” and a compressor “release time” represent opposite level changes as a note fades
out. The attack time of an expander is, in fact, more equivalent to the attack of a musical note
in that it is a rising amplitude change.
In the following sections I will be referring to the “attack” of a note onset as well as the
“attack time” of a compressor, the “decay” of an instrument, the “release” of a note, and
the “release time” of a compressor. One group of terms refers to sound sources (note attack,
decay, release) and the other refers to the result of processes applied to a sound source
(compressor attack time, release time).

A.

B.

Figure 4.4 The top graph (A) shows the four components of an ADSR (attack, decay, sustain, release) ampli-
tude envelope that describe and generate a synthesized sound. The attack starts when we press
a key on a keyboard with the note sustained as long as we press the key. As soon as we let go
of the key, the release portion of the envelope starts. The bottom graph (B) shows an amplitude
envelope for an acoustic sound, such as from a string or drum, which can have a relatively fast
attack but immediately starts to decay after being struck. Actual attack and decay times vary
across instruments and even within the range of a single instrument. For example, a low piano
note will have a much longer decay than a high piano note, assuming the piano key is held to
allow the string to vibrate.

Threshold
We can usually set the threshold level of a compressor, although some models instead have
a fixed threshold with a variable input gain. For fixed thresholds we raise the input to reach
the threshold and therefore have less makeup gain to apply at the end, possibly reducing the
added noise introduced by an analog compressor. A compressor starts to reduce the gain of
an input signal as soon as the amplitude of the signal itself or a side-chain input signal goes
above the threshold. Compressors with a side-chain or key input can accept an alternate
signal input to determine the gain function to be applied to the main audio signal input.
Compression to the input signal is triggered when the side-chain signal rises above the
threshold, regardless of the input signal level.

Attack Time
Although a compressor begins to reduce the gain of the audio signal as soon as its amplitude
rises above the threshold, it usually takes some amount of time to achieve maximum gain
reduction. The actual amount of gain reduction applied depends on the ratio and how far the
signal is above the threshold. In practice, the attack time can help us either define (that is, make
more prominent) or round off the attack of a percussive sound or the beginning of a musical
note. With appropriate adjustment of attack time, we can help a recording sound more “punchy.”

Release Time
The release time is the time that it takes for a compressor to stop applying gain reduction
after an audio signal has gone below the threshold. As soon as the signal level falls below
the threshold, the compressor begins to return it to unity gain and reaches unity gain in
the amount of time specified by the release time.
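One common way digital compressors realize attack and release times is to smooth the instantaneous target gain with one-pole filters, using one coefficient when the gain is moving downward (attack) and another when it is recovering toward unity (release). The sketch below is a generic formulation, not the ballistics of any particular unit; the function name, arguments, and default times are assumptions.

import numpy as np

def smooth_gain(target_gain, fs, attack_ms=10.0, release_ms=200.0):
    # Exponential smoothing coefficients derived from the time constants.
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    out = np.empty_like(target_gain)
    g = 1.0  # start at unity gain
    for n, target in enumerate(target_gain):
        # Falling gain (more reduction) uses the attack coefficient;
        # rising gain (recovery toward unity) uses the release coefficient.
        coeff = a_att if target < g else a_rel
        g = coeff * g + (1.0 - coeff) * target
        out[n] = g
    return out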

Knee
The knee describes the transition of level control from below the threshold (no gain reduc-
tion) to above the threshold (gain reduction). A smooth transition from one to the other is
called a soft knee, whereas an abrupt change at the threshold is known as a hard knee.

Ratio
The compression ratio determines the amount of gain reduction applied once the signal
rises above the threshold. It is the ratio of input level to output level in dB above the
threshold. For instance, with a 2:1 (input:output) compression ratio, a signal that exceeds the
threshold by some number of decibels at the input will exceed it by only half that many decibels
at the output. Compressors set to ratios of about 10:1 or higher are generally
considered to be limiters. Higher ratios are going to give more gain reduction when a signal
goes above threshold, and therefore the compression will be more apparent.
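The static (hard-knee) input/output relationship described by the threshold and ratio can be written directly. The following sketch assumes levels expressed in decibels and ignores attack, release, and knee; the function name and default values are illustrative.

def compressed_level_db(input_db, threshold_db=-20.0, ratio=2.0):
    # Below the threshold the level is unchanged; above it, every
    # `ratio` dB of input above the threshold yields 1 dB of output.
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

print(compressed_level_db(-10.0))            # -15.0: 10 dB over becomes 5 dB over
print(compressed_level_db(-10.0, ratio=10))  # -19.0: limiter-like, 10 dB over becomes 1 dB over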

Level Detection Timing


To apply a gain function to an input signal, dynamics processors need to determine the
amplitude of the signal and compare that to the set threshold. As mentioned earlier, there
are different ways to measure the amplitude of a signal, and although most compressors have
fixed level detection timing, some compressors allow us to switch between two or three
options. Typically the options differ in how fast the level detection is responding to a signal’s
level. For instance, peak level detection is good for responding to steep transients, and RMS
level detection responds to less transient signals. Some dynamics processors (such as the
George Massenburg Labs 8900 Dynamic Range Controller) have fast and slow RMS detec-
tion settings, where the fast RMS averages over a shorter period of time and thus responds
more to transients.
When a compressor is set to detect levels using slow RMS, it responds very little to short
transients. Because RMS detection is averaging over time, a steep transient will not have
much influence on the averaged signal level.
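A quick way to see the difference between the two detection styles is to compare an instantaneous peak measurement with a windowed RMS measurement of a signal containing a single-sample spike. This sketch is only illustrative; the function names, window length, and test signal are assumptions.

import numpy as np

def peak_detect(signal):
    # Instantaneous peak detection: responds fully to a one-sample spike.
    return np.max(np.abs(signal))

def rms_detect(signal, fs, window_ms=50.0):
    # Windowed RMS detection: averages energy over a window, so a short
    # transient barely moves the detected level. A longer window behaves
    # like the "slow RMS" setting described above.
    n = int(fs * window_ms / 1000.0)
    window = np.ones(n) / n
    rms = np.sqrt(np.convolve(signal ** 2, window, mode="same"))
    return np.max(rms)

fs = 44100
noise = 0.05 * np.random.randn(fs)  # low-level noise bed
noise[fs // 2] = 1.0                # one-sample transient spike
print(peak_detect(noise))           # about 1.0: the peak detector sees the spike
print(rms_detect(noise, fs))        # far lower: the average barely moves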

Visualizing the Output of a Compressor


To fully understand the effect of dynamics processing on an audio signal, we need to look
beyond just the input/output transfer function that is commonly seen with explanations of
dynamics processors. I find it helpful to visualize the way a compressor’s output changes
over time given a specific type of signal and thus take into account the ever-critical param-
eters: attack and release time. Dynamics processors change the gain of an audio signal over
time, so they are classified as nonlinear time-varying devices. They are considered nonlinear
because compressing the sum of two signals is generally going to result in something dif-
ferent from compressing the two signals individually and subsequently adding them together
(Smith, accessed August 4, 2009).
To view the effect of a compressor on an audio signal, a step function is the best type of test
signal. A step function is a signal that instantaneously changes its amplitude and stays at the
new amplitude for some period of time. By using a step function, it is possible to illustrate
how a compressor responds to an immediate change in the amplitude of an input signal and
eventually settles to its target gain. For the following visualizations, an amplitude-modulated
sine wave acts as a step function (see Figure 4.5). The modulator is a square wave with a period
of 1 second. The peak amplitude of the sine wave was chosen to switch between 1 (0 dB) and
0.25 (− 12 dB).
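A test signal like this is easy to generate. The sketch below builds an amplitude-modulated sine in Python/NumPy with the values described above (a 1-second modulation period and peak amplitudes of 1.0 and 0.25); the 40 Hz carrier matches the tone mentioned with Figure 4.7, and the function name and other defaults are my own.

import numpy as np

def step_test_signal(fs=44100, freq=40.0, period_s=1.0, seconds=4.0,
                     high=1.0, low=0.25):
    # Sine carrier whose peak amplitude switches between 1.0 (0 dBFS)
    # and 0.25 (about -12 dBFS) under a square-wave modulator.
    t = np.arange(int(fs * seconds)) / fs
    carrier = np.sin(2 * np.pi * freq * t)
    modulator = np.where((t % period_s) < (period_s / 2.0), high, low)
    return carrier * modulator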
Figure 4.6 shows the step response of a compressor for long (A), medium (B), and short
(C) attack and release times. These responses are usually not published with compressors’
specifications, but we can visualize them by recording the output when we send an amplitude-
modulated sine tone as an input signal (as I did for Figure 4.5). If we measured the step
response of various types of analog and digital compressors, we would find that most
look like those in Figure 4.6.

Figure 4.5 This figure shows a step function, an amplitude-modulated sine wave, that we can use to test the
attack and release times of a compressor.

Figure 4.6 The step response of a compressor showing three different attack and release times: long (A),
medium (B), and short (C).

Some compressor models have attack and release curves that look a bit different. Figure 4.7
shows a step function audio signal (A) that has been processed by a compressor and the
resulting step response (B) that the compressor produced, based on the input signal level and
compressor parameter settings. The step response shows the amount of gain reduction applied
over time, which varies with the amplitude of the audio signal input. In this compressor there
appears to be an overshoot in the amount of gain reduction in the attack before it settles into a
constant level of about 0.5. The threshold was set to −6 dB, which corresponds to 0.5 in audio
signal amplitude, so every time the signal goes above 0.5 in level (− 6 dB), the gain function
shows a reduction in level.

Automated Level Control through Compression


Dynamic range compression may be one of the most difficult types of processing for the
beginning engineer to learn how to hear and use. Likely it is difficult to hear because often
the goal of compression is to be transparent. Engineers employ a compressor when they
want to remove amplitude inconsistencies in an instrument or voice or an entire mix.
Depending on the nature of the signal being compressed and the parameter settings chosen,
compression can range from being highly transparent to entirely obvious.
Perhaps another reason why novice engineers find it difficult to identify compression is
that nearly all recorded sound that listeners hear has been compressed to some extent.
Compression has become such an integral part of almost all music heard through

[Graphs for Figure 4.7: (A) the input signal, an amplitude-modulated sine tone; (B) the output of an analog compressor with attack time = 50 ms, release time = 200 ms, and ratio = 30:1; the horizontal axis is time in samples at a 44.1 kHz sampling rate.]

Figure 4.7 The same modulated 40-Hz sine tone through a commercially available analog compressor with
an attack time of approximately 50 ms and a release time of 200 ms. Note the difference in the
gain curve from Figure 4.6. There appears to be an overshoot in the amount of gain reduction in
the attack before it settles into a constant level. A visual representation of a compressor’s attack
and release times such as this is not something that would be included in the specifications for a
device. The difference that is apparent between Figures 4.6 and 4.7 is typically something that an
engineer would listen for but could not visualize without doing the measurement.

loudspeakers that listeners can come to expect it to be part of all musical sound. Listening
to acoustic music without sound reinforcement can help in our ear training process to refresh
our perspective and remind ourselves what music sounds like without compression.
Because dynamics processing is dependent on an audio signal’s variations in amplitude,
the amount of gain reduction varies with changes in the signal. As we said above, dynamic
range compression results in amplitude modulation synchronized with amplitude fluctuations
of an audio signal. Because the gain reduction is synchronized with the amplitude envelope
of the audio signal itself, the gain reduction or modulation can be difficult to hear because
we do not know if the modulation was part of the original signal or not. Amplitude modu-
lation becomes almost inaudible because it reduces signal amplitude at a rate equivalent but
opposite to the amplitude variations in an audio signal. Compression or limiting can be
made easier to hear when we set the parameters of a device to their maximum or minimum
values—a high ratio, a short attack time, a long release time, and a low threshold.
If we apply amplitude modulation that does not vary synchronously with an audio signal,
we can hear the modulation much more easily. The resulting amplitude envelope does not
correlate with the signal’s envelope, and we can detect the modulation as a separate event.
For instance, with a sine wave modulator as used in a tremolo guitar effect, amplitude
modulation is periodic and not synchronous with any type of music signal from an acoustic
instrument and is therefore highly audible. In the case of a tremolo effect, amplitude modu-
lation with a sine wave can produce desirable effects on an audio signal. With tremolo
processing, the goal is usually to highlight the effect rather than make it transparent.
Through the action of gain reduction, compressors can create audible artifacts—such as
through timbre changes—that are completely intentional and contribute meaningfully to the
sound of a recording. In other situations, control of dynamic range is applied without creating
any artifacts or changing the timbre of sounds. We may want to turn down the loud parts

Figure 4.8 From an audio signal (A) sent to the input of a compressor, a gain function (B) is derived based
on compressor parameters and signal level. The resulting audio signal output (C) from the
compressor is the input signal with the gain function applied to it. The gain function shows the
amount of gain reduction applied over time, which varies with the amplitude of the audio signal
input. For example, a gain of 1 (unity gain) results in no change in level, and a gain of 0.5 reduces
the signal by 6 dB. The threshold was set to −6 dB, which corresponds to 0.5 in audio signal
amplitude, so every time the signal goes above 0.5 in level (−6 dB), the gain function shows a
reduction in level.

in a way that still controls the peaks but that does not distract the listener with artifacts. In
either case, we need to know what the artifacts sound like to decide how much or little
dynamic range control to apply to a recording. On many dynamic range controllers, the
user-adjustable parameters are interrelated to a certain extent and affect how we use and hear
them.

Manual Dynamic Range Control


Because dynamic range controllers are responding to an objective measure of signal level,
peak or RMS, rather than subjective signal levels, such as loudness, it is possible that the level
reduction provided by a compressor does not suit an audio signal as well as desired. The
automated dynamic range control of a compressor may not be as transparent as we would
like for a given application. The amount that a compressor is acting on an audio signal is
based on how much it determines an audio signal is going above a specified threshold and
as a result applies gain reduction based on objective measures of signal level. Objective signal
levels do not always correspond to our perceptions of loudness. As a result, a compressor
may measure a signal to be louder or quieter than we perceive it to be and therefore apply
more or less attenuation than we desire.

When mixing a multitrack recording, we are concerned with levels, dynamics, and balance
of each track. We want to be attentive to any sound sources that get masked at any point
in a piece. At a more subtle level, even if a sound source is not masked, we strive to find
the best possible musical balance, adjusting as necessary over time and across each note and
phrase of music. Focused listening helps us find the best compromise on the overall levels
of each sound source. It is often a compromise because it is not likely that every note of
every sound source will be heard perfectly clearly, even with extensive dynamic range control.
If we turn up each sound source to be heard above all others, we will run out of headroom
in our mix bus, so it becomes a balancing act where we need to set priorities. For instance,
vocals on a pop, rock, country, or jazz recording are typically the most important element.
Generally we want to make sure that each word of a vocal recording is heard clearly. Vocals
are often particularly dynamic in amplitude, and the addition of some dynamic range com-
pression can help make each word and phrase of a performance more consistent in level.
With recorded sound, we can guide a listener’s perspective and perception of a musical
performance through the use of level control on individual sound sources. We can bring
instruments and voices dynamically to the forefront and send them farther back, as the artistic
vision of a performance dictates. Sound source level automation can create a changing per-
spective that is obvious to the listener. Or we might create dynamic changes that are trans-
parent in order to maintain a perspective for the listener. Depending on the condition of the
raw tracks in a multitrack recording, we may need to make drastic changes behind the scenes
in order to create coherency and a focused musical vision. Listeners may not be consciously
aware that levels are being manipulated, and, in fact, engineers often try to make the changing
of levels as transparent and musical as possible. Listeners should only be able to hear that each
moment of a music recording is clear and musically satisfying, not that continuous level
changes are being applied to a mix. Again, we often strive to make the effect of technology
transparent to an artistic vision of the music we are recording. The old joke about recording
and live sound engineers is that we know we are doing a good job when no one notices
our work. Other engineers will notice our work, but listeners and musicians should be able
to focus on the art and not be distracted by engineering artifacts.

4.3 Timbral Effects of Compression


In addition to being a utilitarian device for managing the dynamic range of recording media,
dynamics processing has become a tool for altering the color and timbre of recorded sound.
When applied to a full mix, compression and limiting can help the elements of a mix
coalesce. The compressed musical parts will have what is known in auditory perception as
common fate because their amplitude changes are similar. When two or more elements (e.g.,
instruments or voices) in a mix have synchronously changing amplitudes, our auditory sys-
tems will tend to fuse these elements together perceptually. The result is that dynamics
processing can help blend elements of a mix together. Although compressors are not equal-
izers or filters by any stretch, we can use compressors to do some spectral shaping. In this
section we will move beyond the use of compression for simply maintaining consistent signal
levels to the use of compression as a tool to sculpt the timbre of a track.

Effect of Attack Time


With a compressor set to a long attack time—in the 100-millisecond range or greater—with
a low threshold and high ratio we can hear the sound plunge down in level when the input
signal goes above the threshold. The audible effect of the sound being brought down at this
rate is what is known as a pumping sound and can be most audible on sounds with a strong
pulse where the signal clearly rises above the threshold and then drops below it, such as
those produced by drums, other percussion instruments, and sometimes bass. If any lower-
level sounds or background noise is present with the main sound being compressed, we will
hear a modulated background sound. Sounds that are more constant in level such as distorted
electric guitar will not exhibit such an audible pumping effect.

Effect of Release Time


Another related effect is present if we set a compressor to have a long release time, in the
100-millisecond range or greater. Listening again with a low threshold and high ratio, be
attentive for the sound to come back up in level after a strong pulse. The audible effect of
the sound being brought back up in level after significant gain reduction is called breathing
because it can sound like someone taking a breath. As with the pumping effect, you may
notice the effect most prominently on background sounds, hiss, or higher overtones that
ring after a strong pulse.
Although compression tends to be explained as a process that reduces the dynamic range
of an audio signal, there are ways to use a compressor that can accentuate the difference
between transient peak levels and any sustained resonance that may follow. In essence, what
can be achieved with compression can be similar to dynamic range expansion because peaks
or strong pulses can be highlighted relative to quieter sounds that immediately follow them.
It may seem completely counterintuitive to try to think of compressors performing dynamic
range expansion, but in the following section we will work through what happens when
experimenting with various attack times.

Compression and Drums


A recording with a strong pulse, such as from drums or percussion, with a regularly repeat-
ing transient will trigger gain reduction in a compressor and can serve as a useful sound to
highlight the effect of dynamics processing. By processing a stereo mix of a full drum kit
through a compressor at a fairly high ratio of 6:1, we can adjust attack and release times to
hear their effect on the sound of the drums. On a typical snare drum, kick drum, and tom
drums that have not been compressed, there is a natural attack or onset, and then a release
or decay, all of which are dependent on the drums’ physical characteristics and tuning. A
compressor can influence all of these properties depending on how the parameters are set.
Let us explore the sonic effect of a compressor with a low threshold, high ratio, and very
short attack time (e.g., down to 0 milliseconds) on drums. The compressor attack time gives
us the greatest influence in shaping the drum sound onset. With a short (or fast) attack
time, a compressor brings transients immediately down in level and the naturally sharp onset
of the snare drum is dulled. Where the rate of gain reduction nearly matches the rate at
which a transient signal rises in level, a compressor significantly reduces a signal’s transient
nature. So with a very short attack time (accompanied by a short release time), a compressor
can nearly erase transients because the gain reduction is bringing the signal’s level down at
nearly the same rate that the signal was originally rising up during a transient. As a result,
the initial attack of a transient signal is reduced to the level of the resonant part of the
amplitude envelope. Very short attack times can be useful in some instances such as on
limiters that are used to avoid clipping. For shaping drum and percussion sounds, short attack
times are quite destructive and tend to take the life out of the original sounds.
On the other hand, if we mix our original, uncompressed drums with short-attack-time
compressed drums, we maintain the original transients and bring out the drum decay. As
we lengthen the attack time to just a few milliseconds, we begin to hear a clicking sound
emerge at the onset of a transient. The click is produced by a few milliseconds of the
original audio passing through as gain reduction occurs, and the timbre of the click is directly
dependent on the length of the attack time. The abrupt gain reduction reshapes the ampli-
tude envelope of a drum hit. By increasing the compressor’s attack time further, the onset
sound gains prominence relative to the decay portion of the sound, because the compressor’s
attack time is lagging behind the drum attack time and therefore the gain reduction happens
after the drum’s attack and during its decay. By bringing down the decay relative to the
drum’s attack, we create a larger difference between the two components of the sound. So
the attack is more prominent relative to the decay.
If we increase a compressor’s attack time when compressing low-frequency drums such
as a bass/kick drum or even an entire drum set, we will typically hear an increase in low-
frequency energy. Because low frequencies have longer periods, a longer attack time will
allow more cycles of a low-frequency sound to occur before attack time gain reduction, and
therefore low-frequency content will be more audible on each rhythmic bass pulse. By
increasing the attack time from a very short value to a longer time, we increase the low-
frequency energy coming from the bass drum. As we increase a compressor’s attack time
from near zero to several tens or hundreds of milliseconds, the spectral effect is similar to
adding a low-shelf filter to the mix and increasing the low-frequency energy.
The release time affects mostly the decay of the sound. The decay portion of the sound
is that which becomes quieter after the loud onset. If we set the release time to be long,
the compressor gain reduction does not quickly return to unity gain after the signal level
has fallen below the threshold (which would typically happen during the decay), and there-
fore the natural decay of the drum sound becomes significantly reduced.

Compression and Vocals


Because vocal performances tend to have a wide dynamic range, engineers often find that
some sort of dynamic range control helps them reach their artistic goals in a recording.
Compression can be very useful in reducing the dynamic range and de-essing a vocal track.
Unfortunately, compression does not always work as transparently as desired, and artifacts
from the automated gain control of a compressor sometimes come through.
Here are a couple of simple tips to help reduce dynamic range without adding too many
of the side effects that can detract from a performance:

• Use low ratios. The lower the ratio, the less gain reduction will be applied. A ratio of 2:1
is a good place to start.
• Use more than one compressor in series. By chaining two or three compressors in series on a
vocal, each set to a low ratio, each compressor can provide some of the gain reduction, and
the overall effect is more transparent than using a single compressor to do all of it (a simple
static calculation follows this list).
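
As a rough, static (steady-state) illustration of why series compression can reach the same
overall amount of gain reduction while each stage works less hard, here is a hypothetical
Python calculation that ignores attack and release behavior entirely; the peak level and
threshold values are made up for the example:

    # Static view: two 2:1 compressors in series, sharing one threshold,
    # versus a single 4:1 compressor acting on the same peak.
    def static_out(level_db, threshold_db, ratio):
        if level_db <= threshold_db:
            return level_db                       # below threshold: unity gain
        return threshold_db + (level_db - threshold_db) / ratio

    peak_db = -6.0                                # hypothetical vocal peak
    thresh_db = -18.0
    stage1 = static_out(peak_db, thresh_db, 2.0)  # -12 dB (6 dB of reduction)
    stage2 = static_out(stage1, thresh_db, 2.0)   # -15 dB (3 dB more)
    single = static_out(peak_db, thresh_db, 4.0)  # -15 dB (9 dB at once)
    print(stage1, stage2, single)

The steady-state output level is the same, but each stage in the chain applies only part of
the gain reduction, so the audible side effects of its time constants tend to be gentler.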

To help identify when compression is applied too aggressively, listen for changes in timbre
while watching the compressor's gain reduction meter. If the timbre changes audibly as gain
reduction happens, the solution may be to lower the ratio, raise the threshold, or both. A
track may sound slightly darker during extreme gain reduction, and watching the gain
reduction meter makes it easier to connect such side effects to the compressor's activity.
A slight popping sound at the start of a singer's word or phrase may indicate that the
attack time is too slow. Generally, a very long attack time is not effective on a vocal, since
it accentuates the onset of each word or phrase and can be distracting to listeners.

Compression of a vocal usually brings out lower-level details in a performance, such
as breaths and "s" sounds. A de-esser, which can reduce the "s" sound, is simply a compres-
sor that uses a high-pass filtered (around 5 kHz) version of the vocal as its side-chain or key
input. De-essers tend to work most effectively with very fast attack and release times.
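
As a rough illustration of that de-esser signal flow, here is a hypothetical Python sketch; it
assumes a fourth-order Butterworth high-pass for the key signal and a simple static gain
computer, and it omits the fast attack/release smoothing that a real de-esser applies to the
gain signal:

    import numpy as np
    from scipy.signal import butter, sosfilt

    def deess(vocal, fs, key_hz=5000.0, threshold_db=-30.0, ratio=6.0):
        sos = butter(4, key_hz, btype="highpass", fs=fs, output="sos")
        key = sosfilt(sos, vocal)                        # side-chain / key signal
        level_db = 20.0 * np.log10(np.abs(key) + 1e-10)  # level of the sibilant energy only
        over_db = np.maximum(level_db - threshold_db, 0.0)
        gain_db = -over_db * (1.0 - 1.0 / ratio)         # reduction driven by the key signal
        return vocal * 10.0 ** (gain_db / 20.0)          # but applied to the full-band vocal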

4.4 Expanders and Gates


Most of the controllable parameters on an expander are similar in function to those on a
compressor, with a couple of exceptions: attack and release times. These two parameters need
to be considered in relation to the signal level relative to the threshold, rather than in relation
to a particular direction of gain change.

Threshold
Expanders modify the dynamic range of an audio signal by attenuating it when its level
falls below some predefined threshold, as opposed to compressors, which act on signal levels
above a threshold. A gate is simply an extreme version of an expander and usually mutes a
signal when it drops below a threshold.
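
As a static sketch of this behavior (ignoring attack and release), assuming the common
downward-expander convention in which each decibel the input falls below the threshold
becomes several decibels at the output, with a gate being the extreme case of a very high
ratio:

    # Static downward-expander gain law; ratio conventions vary between manufacturers.
    def expander_out(level_db, threshold_db=-40.0, ratio=4.0):
        if level_db >= threshold_db:
            return level_db                              # above threshold: unity gain
        return threshold_db + (level_db - threshold_db) * ratio

    print(expander_out(-46.0))  # 6 dB below the threshold becomes 24 dB below: -64.0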

Attack Time
The attack time on an expander is the amount of time it takes for an audio signal to return
to its original level once it has gone above the threshold. As with a compressor, the attack time
is the amount of time it takes to make a gain change after a signal goes above the threshold.
In the case of a compressor, a signal is attenuated above threshold. With an expander, a
signal returns to unity gain above threshold.

Release Time
The release time on an expander is the time it takes to complete its gain reduction once
the input signal has dropped below the threshold. Release time, for both expanders and
compressors, is not determined by a particular direction of level control (that is, boost or
cut); it is defined with respect to the signal level relative to the threshold. During the release
time on an expander, the signal level is reduced; during the release time on a compressor,
the gain returns toward unity and the signal level comes back up. In both cases, the gain
change happens because the signal level has fallen below the threshold.

Visualizing the Output of an Expander


Figure 4.9 shows the effect that an expander can have on an amplitude-modulated sine
wave; these three output signals are also known as the expander’s step responses. Like com-
pressors, expanders often have side-chain inputs that can be used to control an audio signal
with a secondary signal. For instance, we can gate a low-frequency sine tone (around 40
or 50 Hz) with a kick drum signal sent to the side-chain input of the gate. The result is
a low-frequency sine tone that sounds only with each kick drum hit. The two sounds—
kick and sine tone—can be mixed together to create a new timbre that accentuates the
low end of the drum.
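
Here is a hypothetical Python sketch of that side-chain arrangement; it uses a crude
absolute-value envelope follower and a hard on/off gate, whereas a real gate would add
attack, hold, and release smoothing so the tone does not switch on and off with a click:

    import numpy as np

    def kick_keyed_sine(kick, fs, tone_hz=50.0, threshold=0.2, mix=0.5):
        t = np.arange(len(kick)) / fs
        tone = np.sin(2 * np.pi * tone_hz * t)      # low-frequency sine to reinforce the kick
        envelope = np.abs(kick)                     # crude envelope of the key (kick) signal
        gate = (envelope > threshold).astype(float) # 1.0 while the kick sounds, else 0.0
        return kick + mix * gate * tone             # gated tone mixed under the original kick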
Figure 4.10 shows a clip from a music recording with the gain function derived from the
audio signal and parameter settings, and the resulting output audio signal. Low-level sections
of an audio signal are reduced even further in the expanded audio signal.
Figure 4.9 This figure shows the step response of an expander for three different attack and release times:
long (A), medium (B), and short (C). The input signal was a step function as shown in Figure 4.5.

Figure 4.10 From an audio signal (A) sent to the input of an expander, a gain function (B) is derived based on
expander parameters and signal level. The resulting audio signal output (C) from the expander is the
input signal with the gain function applied to it. The gain function shows the amount of gain reduc-
tion applied over time, which varies with the amplitude of the audio signal input. For example, a gain
of 1 (unity gain) results in no change in level, and a gain of 0.5 reduces the signal by 6 dB. For these
measurements, the threshold was set to −6 dB, which corresponds to 0.5 in audio signal amplitude,
so every time the signal drops below 0.5 in level (−6 dB), the gain function shows a reduction in level.
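
For reference, the dB figures in this caption follow from the standard conversion between
linear amplitude and decibels, 20 times the base-10 logarithm of the amplitude ratio:

    import math
    print(20 * math.log10(0.5))  # about -6.02: a gain of 0.5 is roughly -6 dB
    print(10 ** (-6 / 20))       # about 0.501: -6 dB corresponds to an amplitude near 0.5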

4.5 Getting Started with Practice


All of the “Technical Ear Trainer” software modules are available on the companion website:
www.routledge.com/cw/corey.
The recommendations on Getting Started with Practice in Section 2.5 are applicable to
all of the software exercises described in the book, and I encourage you to review those
recommendations on frequency and duration of practice. The overall functionality of the
software modules focused on dynamics processing, “Technical Ear Trainer—Dynamic Range
Compression” and “Technical Ear Trainer—Dynamic Range Expansion,” is very similar to
that of the equalization module. With the focus on dynamics there are different parameters
and qualities of sound to explore than there were with equalization.
The dynamics modules allow you to practice with up to three test parameters at a time:
attack time, release time, and ratio. You can practice with each parameter on its own or in
combination with one or two of the other parameters, depending on what “Parameter
Combination” you choose. Threshold is completely variable for all exercises and controls
the threshold for both the computer-generated “Question” as well as “Your Response.”
Because the signal level of a sound recording will determine how much time a signal spends
above a threshold, and I cannot predict how the level of every recording is going to relate
to a given threshold, I decided to maintain a fully variable threshold. In the compressor
module, the threshold level should initially be set fairly low so that the effect of the com-
pression is most audible. A makeup gain fader is included so that you can match the subjec-
tive levels of compressed and bypassed signals by ear if desired. In the case of the expander
module, a higher threshold will cause the expander to produce more pronounced changes
in level. Additionally, the input level can be reduced to further highlight dynamic level
changes.
The Level of Difficulty option controls the number of choices available for a given
parameter. With higher levels of difficulty, a greater number of parameter choices are avail-
able within each range of values.
The Parameter Combination determines which parameters will be included in a given
exercise. When working with a Parameter Combination that tests only one or two param-
eters, the remaining user controllable parameter(s) that are not being tested will control the
processing for both the “Question” and “Your Response” compressors.
The dynamic range control practice modules are the only ones of the entire ear training
software collection in which the computer can choose “no effect” or “flat” as a possible
question. Practically, this means that a ratio of 1:1 could be chosen, but only when the
Parameter Combination includes “ratio” as one of the options. When you encounter a
question in which you hear no dynamic range control, indicate as such by selecting a ratio
of 1:1, which is equivalent to bypassing the module. If a question has a ratio of 1:1, all other
parameters will be ignored in the calculation of question and average scores.
Figure 4.11 shows a screenshot of the dynamic range compression software practice module.
Figure 4.11 A screenshot of the software user interface for the Technical Ear Trainer practice module for
dynamic range compression.

Practice Types
There are three practice types in the dynamics software practice module: Matching, Match-
ing Memory, and Absolute Identification:

• Matching. Working in Matching mode, the goal is to duplicate the dynamics processing
that has been applied by the software. In this mode, you are free to switch back and forth
between the “Question” and “Your Response” to determine if the dynamics processing
chosen matches the unknown processing applied by the computer.
• Matching Memory. Similar to Matching, this mode allows free switching between “Ques-
tion,” “Your Response,” and “Bypass” until one of the question parameters is changed.
At that point, the “Question” is no longer selectable and you should have memorized its
sound well enough to determine if the response is correct.
• Absolute Identification. This practice mode is the most difficult and requires identification of
the applied dynamics processing without having the opportunity to listen to what is chosen
as the correct response. You can audition only “Bypass” (no processing) and “Question” (the
computer’s randomly chosen processing parameters); you cannot audition “Your Response.”

Sound Source
Any sound recording in the format of AIFF or WAV at a 44,100- or 48,000-Hz sampling
rate can be used for practice. There is also an option to listen to the sound source in mono
or stereo. If a loaded sound file contains only one track of audio (as opposed to two),
the audio signal will be sent out of the left output only. Pressing the mono button feeds the
audio to both the left and right output channels.

Recommended Recordings for Practice


A few artists are making multitrack stems available for purchase or free download. Single
drum hits are useful to begin training, and then it makes sense to progress through to drum
kits, as well as other solo instruments and voice.
A few websites exist with free sound samples and loops that can be used for practice,
such as www.freesound.org, www.royerlabs.com, www.telefunken-elektroakustik.com, and
www.cambridge-mt.com/ms-mtk.htm, among many others. There are also excerpts or loops
of various solo instruments bundled with Apple’s GarageBand and Logic Pro that can be
used with the ear training software.

Summary
This chapter discusses the functionality of compressors and expanders and their sonic effects
on an audio signal. Dynamic range controllers can be used to smooth out fluctuating levels
of a track, or to create interesting timbral modifications that are not possible with other
types of signal processing devices. The compression and expansion software practice modules
are described, and I encourage readers to use them to practice hearing the sonic effects of
various parameter settings.

Note
1. “Mastered for iTunes: Music as the Artist and Sound Engineer Intended” http://images.apple.com/itunes/
mastered-for-itunes/docs/mastered_for_itunes.pdf
Chapter 5

Distortion and Noise

Throughout the recording, live sound, mixing, and post-production processes, we encounter
technical issues that can introduce noise or degrade our audio signals inadvertently. If we
do not resolve technical issues that create noise and distortion, or if we cannot remove noise
and distortion from our audio, listeners’ attentions can get pulled toward these undesired
artifacts and away from the intended artistic experience of the audio. You may have heard
the saying that the only time average listeners notice sound quality is when there is a prob-
lem with the audio. In other words, if average listeners do not think about the audio but
simply enjoy the artistic experience of a recording, concert, game, or film, then the audio
engineer has done a great job. The audio engineer’s job is to help transmit an artist’s inten-
tions to an audience. It becomes difficult for listeners to completely enjoy an artist when
engineering choices add unwanted sonic artifacts that cause listeners’ attentions to be dis-
tracted from an artistic experience. When recording technology contributes negatively to a
recording, listeners’ attentions become focused on artifacts created by the technology and
drift away from the musical performance. Likely almost everyone, sound engineer or not, is
familiar with the screech of feedback or howlback when a microphone-amplifier-speaker
sound reinforcement system feeds back on itself. Although sound engineers work hard to
avoid feedback, it can be loud and offensive to listeners and artists, and unfortunately it
reminds listeners that there is audio technology between them and the artist they are hear-
ing. Feedback is so common in live sound reinforcement that film and TV sound designers
add a short bit of feedback sound at the beginning of a scene in which a character is speak-
ing into a voice reinforcement system. Once we hear that little feedback sound cue, we
know the character’s mic is amplified through a public address (PA) system. Feedback is
probably the most extreme negative artifact produced by audio systems, and when it’s loud
it can be painful to our ears. Many artifacts are much more subtle than howling feedback,
and even though average listeners may not consciously identify them as problems, the artifacts
detract from listeners’ experiences. As sound engineers we want to be aware of as many of
the sonic artifacts as possible that can detract from a sound recording, and as we gain expe-
rience in critical listening, we increase our sensitivity to various types of noise and
distortion.
Distortion and noise are the two broad categories of sonic artifacts, each with its own variations
and subcategories. Most of the time we try to avoid them, but sometimes we use them for
creative effect. They can be present in a range of levels or intensities, so it is not always easy
to detect lower levels of unwanted distortion or noise. In this chapter we focus on extrane-
ous noises that sometimes find their way into a recording as well as forms of distortion,
both intentional and unintentional.

5.1 Noise
Some composers and performers intentionally use noise for artistic effect. In fact there are
whole genres of music that emphasize noise as an artistic effect, such as noise rock, industrial
music, Japanese noise music, musique concrète, sampling, and glitch. Experimental and avant-
garde electronic and electroacoustic music composers and performers often use noise to
create musical effects, and they delight in blurring the line between music and noise. One
of the earliest examples is French composer Pierre Schaeffer's "Étude aux chemins
de fer" [Railway Study], a musique concrète piece that he composed in 1948 from his record-
ings of train sounds.
From a conventional recording point of view, we treat noise, in its various forms, as an
unwanted signal that enters into our desired signals. As we discussed above, noise distracts
listeners from the art we are trying to present. We need to consider whether extraneous
noises, which may enter into our recording, serve an artistic goal or simply distract listeners.
Sources of noise include the following:

• Clicks: Transient sounds resulting from equipment malfunction or digital synchronization
errors.
• Pops: Sounds resulting from plosive vocal sounds.
• Ground hum and buzz: Sounds originating from improperly grounded systems.
• Hiss, which is essentially low-level white noise: Sounds originating from analog electronics,
dither, or analog tape.
• Extraneous acoustic sounds: Sounds that are not intended to be recorded but that exist in
a recording space, such as air-handling systems or sound sources outside of a recording
room.
• Radio frequency interference (RFI): Audio production equipment can sometimes make an
excellent, but undesired, radio receiver.

First, let’s discuss unwanted noise that detracts from the quality of a sound recording. Ground
hum and buzz, loud exterior sounds, radio frequency interference, and air-handling (HVAC)
noise are some of the many sources and types of noise that we seek to avoid when making
recordings in the studio. Frequently noise exists at a low, yet still audible, level and may not
register significantly on a meter, especially in the presence of musical audio signals. Therefore
we need to use our ears to constantly track sound quality. Noises of all kinds can start and
stop at seemingly random times, so we must remain attentive at all times.

Clicks
Clicks are short-duration, transient sounds, containing significant high-
frequency energy, that originate from electronic equipment. Malfunctioning analog equip-
ment, loose analog cable connections, connecting and disconnecting analog cables, and digital
audio synchronization errors are all causes of unwanted clicks.
Clicks resulting from analog equipment malfunction can often be random and sporadic,
making it difficult to identify their exact source. In this case, meters can be helpful to indi-
cate which audio channel contains a click, especially if clicks are present in the absence of
program material. A peak hold meter can be invaluable in chasing down a problematic piece
of equipment, because the meter holds the peak level if we happen to miss seeing it when
the click occurs.

Loose or dirty analog connections may randomly break contact, causing dropouts,
clicks, and sporadic noise bursts. When we make or break analog connections in a patch bay or
directly on a piece of equipment, we create signal discontinuities and therefore clicks
and pops. Breaking a phantom-powered microphone connection can make a particularly loud
pop or click that can damage not only the microphone but also any loudspeakers that
reproduce it.
With digital connections between equipment, it is important to ensure that sampling rates
are identical across all interconnected equipment and that clock sources are consistent.
Without properly selected clock sources in digital audio, clicks are inevitable and will likely
occur at some regular interval, usually spaced by several seconds. Clicks that originate from
improper clock sources are often fairly subtle, and they require vigilance to identify them
aurally. Depending on the digital interconnections in a studio, the clock source for each
device needs to be set to internal, digital input, or word clock.

Pops
Pops are transient thump-like sounds that typically have more significant low-frequency
energy than clicks. Usually pops occur as a result of vocal plosives that are produced in
front of a microphone. Plosives are consonant sounds, such as those that result from pro-
nouncing the letters p, b, and d, for which a singer or speaker emits a small burst of air.
If you hold your hand up in front of your mouth and
make a “p” sound, you can feel the little burst of air coming from your mouth. When this
burst of air reaches a microphone capsule, the microphone produces a low-frequency, thump-
like sound. Usually we try to counter pops during vocal recording by placing a pop filter
in front of a vocal microphone. Pop filters are usually made of thin, acoustically transparent
fabric stretched across a circular frame.
We do not hear pops from a singer when we listen acoustically in the same space as the
singer. The pop artifact is purely a result of a microphone’s response to a burst of air pro-
duced by a vocalist. Pops distract listeners from a vocal performance because they are not
expecting to hear a low-frequency thump from a singer. Even if the song has a kick drum
in the mix, often a vocalist's pop will not line up with a kick drum hit. We can filter out
a pop with a high-pass filter, making sure the cutoff frequency is low enough not to affect
the low harmonics of the voice, or by inserting the filter only during the brief moment while
a pop is sounding.
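
As a rough sketch of the high-pass-filter approach, assuming a gentle second-order
Butterworth filter with a cutoff around 80 Hz (the exact cutoff depends on the voice, and
the filter can be automated so it is active only around the pop):

    from scipy.signal import butter, sosfilt

    def remove_pop(vocal, fs, cutoff_hz=80.0, order=2):
        sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
        return sosfilt(sos, vocal)   # attenuates the low-frequency thump of a plosive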
Listen for low-frequency thumps when recording, mixing, or providing live sound rein-
forcement for sung or spoken voice. In live sound situations, the best way to remove pops
is to turn on a high-pass filter on the respective mixer channel or turn on the high-pass
filter on the microphone itself if it has one.

Hum and Buzz


Improperly grounded analog circuits and signal chains can cause noise in the form of hum
or buzz that is introduced into analog audio signals. Both are related to the frequency of
electrical alternating current (AC) power sources, also referred to as mains frequency in some
places. The frequency of a power source will be either 50 Hz or 60 Hz depending on
geographic location and the power source being used. Power distribution in North America
is 60 Hz, in Europe it is 50 Hz, in Japan it will be either 50 or 60 Hz depending on the
specific location within the country, and in most other countries it is 50 Hz.

When a ground problem is present, there is either a hum or a buzz generated with a
fundamental frequency equal to the power source alternating current frequency, 50 or 60
Hz, with additional harmonics above the fundamental. A hum is a sound that contains
primarily the lower harmonics, whereas a buzz contains mainly higher harmonics.
We want to make sure we identify any hum or buzz before recording, when the problem
is easier to solve. Trying to remove such noises in postproduction is possible but will take
extra time. Because a hum or buzz often includes numerous harmonics of 50 or 60 Hz, a
number of narrow notch filters are needed, each tuned to a harmonic, to effectively remove
all of the offending sound. Sometimes this is the only option to remove the offending noise,
but these notch filters also affect our program material, of course.
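
As a hedged sketch of that approach, here is a hypothetical cascade of narrow notch filters,
one per harmonic, using SciPy; the mains frequency, number of harmonics, and Q would be
chosen by ear for the material, and a higher Q keeps each notch narrower so less program
material is affected:

    from scipy.signal import iirnotch, lfilter

    def remove_hum(audio, fs, mains_hz=60.0, num_harmonics=6, q=35.0):
        cleaned = audio
        for k in range(1, num_harmonics + 1):
            b, a = iirnotch(k * mains_hz, q, fs=fs)   # notch centered on each harmonic
            cleaned = lfilter(b, a, cleaned)
        return cleaned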
Hum can also be caused by electromagnetic interference (EMI). If we place audio cables
(especially those carrying microphone level signals) alongside power cables, the power
cables can induce hum in the adjacent audio lines. An audio cable’s proximity to power cables
matters, so the farther apart the two can be, the better. If they do need to cross, try to
make the crossing at a 90-degree angle to reduce the strength of the electromagnetic field
coupling into the audio cable. Although we are not going to discuss the exact technical and
wiring problems that can cause hum and buzz and how such problems might be solved,
there are many excellent references that cover the topic in great detail, such as Giddings’s
book titled Audio Systems Design and Installation, a classic reference that has recently been
republished.
One of the best ways we can check for low-level ground hum is to bring up monitor
levels with microphones on and powered but while musicians are not playing. If we eventu-
ally apply dynamic range compression with makeup gain to an audio signal, what was once
inaudible low-level noise could become much more audible. If we can catch any ground
hum before getting to that stage, our recording will be much cleaner.

Extraneous Acoustic Sounds


Despite the hope for perfectly quiet recording spaces, there are often numerous sources of
noise both inside and outside of a recording space that we must deal with. Some of these
are relatively constant, steady-state sounds, such as air-handling noise, whereas other sounds
are unpredictable and somewhat random, such as car horns, people talking, footsteps, noise
from storms, or simply when we drop items or bump a microphone stand in the studio.
With most of the population concentrated in cities, sound isolation can be particularly
challenging as noise levels rise and our physical proximity to others increases. Besides air-
borne noise there is also structure-borne noise, where vibrations are transmitted through
building structures and end up producing sound in our recording spaces. Professionally built
recording studios are often constructed with what are called floating floors and room-in-room
construction to maximize sound isolation.
Keep your ears open for extraneous acoustic sounds. They can pop up at seemingly ran-
dom times. We need to monitor our audio constantly to identify them.

Radio Frequency Interference (RFI)


Radio station and cell phone signals are sometimes demodulated down to the audio range
and then amplified by our audio equipment. With radio station interference we hear what
a local FM radio station is broadcasting. The resulting audio is mainly high-frequency con-
tent, but it is annoying and distracting nonetheless. Cell phone interference usually sounds
