Soundhack Manual v0.888
version 0.888
Tom Erbe
School of Music
CalArts
INTRODUCTION
SoundHack is a soundfile processing program for the Macintosh. It performs
many utility and esoteric sound processing functions available nowhere else.
These functions make SoundHack invaluable to computer musicians, sound
effects designers, multimedia artists, webmasters and anyone else who enjoys
working with sound.
Sound Processing
Utility Functions
• Play almost any type of soundfile (including AU, AIFF and WAVE).
• Record any size soundfile from the Macintosh sound input.
• Import soundfiles from audio CDs.
• Convert between different types of soundfiles with optional gain scaling
and sample rate conversion.
• Change values in the soundfile header (sample rate, number of channels,
loop points and marker info).
• Read and write Sound Designer II, Audio IFC, Audio IFF, BICSF (IRCAM),
DSP Designer, QuickTime/AIFF, Microsoft WAVE (RIFF), NeXT .snd, Sun .au,
ULaw, IMA4, TEXT and headerless (raw) soundfiles.
The rest of this document is a small tutorial (in progress) then a menu by menu
description of SoundHack. Please write me if you have any problems or
suggestions!
Tom Erbe
CONTENTS
Introduction
Contents
System Requirements
Shareware Info
On the Internet
New Features and Bug Fixes
Tutorial
File Menu
Record New...
Open...
Open Any...
Close
Close & Edit
Save a Copy...
Split Into Mono Files...
Play File
Import SND Resource...
Export SND Resource...
Import CD Track...
Quit
Edit Menu
Hack Menu
Header Change...
Loops & Markers...
Binaural Filter...
Convolution...
Gain Change...
Mutation...
Phase Vocoder...
Spectral Dynamics...
Varispeed...
Spectral Extractor...
Spectral Analysis...
QT Coder
Normalize
Draw Function...
SoundFile Menu
Control Menu
Show Signal
Show Spectrum
Show Sonogram
Pause Process
Continue Process
Stop Process
Load Settings...
Save Settings...
Default Settings
Preferences...
Future Plans
History of Bug Fixes And Revisions
Bibliography
Acknowledgments
SYSTEM REQUIREMENTS
SoundHack requires System 7.0 to operate, Sound Manager 3.0 to play back
soundfiles and QuickTime 2.0 to import CD audio tracks and read and write
sonograms. It comes in 3 flavors to accommodate different hardware
platforms.
SoundHack NF runs on all Macintoshes, albeit very slowly. This is the only version
that will work on 68LC040 based Macintoshes.
SoundHack FPU runs on 680x0 Macintoshes with hardware floating point. The
following 3 CPU/FPU configurations will work: 68020+68881, 68030+68882 and
68040. It will also run on FPUless Macintoshes (except 68LC040 machines) with
John Neil's excellent SoftwareFPU floating point emulator. SoftwareFPU is
available on most Macintosh software archives or from John Neil & Associates
([email protected]).
SHAREWARE INFO
SoundHack was once shareware/musicware, but no more. You may freely
do whatever you want with it. If you still want to send me some of your art,
music or favorite postcard my address is:
Tom Erbe
608 Carla Way
La Jolla, CA 92093
If you do not have internet access, you can buy it for $65 from my distributor.
ON THE INTERNET
The current shareware version of SoundHack (SoundHack, SoundHackNF
and documentation) is available at www.soundhack.com
Bug Fixes
+ AIFF uses the common chunk order
+ AIFF uses correct blocking and chunk boundaries
+ AIFF, AIFF-C and WAVE no longer overrun the end of the sound chunk (no
end-of-file click)
+ Large Kaiser windows don't crash program.
+ Playback returns to beginning after reaching end.
+ Very short soundfiles playback OK.
+ Lack of QuickTime doesn't crash PPC version.
+ Better memory conservation.
+ QuickTime files now contain proper Moov resource.
1. Spatialization
SoundHack provides spatialization of monaural soundfiles with its binaural
filter. This filter, known as the head-related transfer function (HRTF), emulates
the filtering action of your head and outer ear for any position around your
head. The following example will spin a sound twice around your head.
Note that some sounds spatialize better than others. One of the main cues for
spatialization is the inter-aural time difference or ITD. This is the delay from the
moment the sound is heard at the near ear to the time the sound is heard at
the far ear. Because the ITD is so important, sounds with little temporal
variation will not spatialize well. For example, a snare drum is much easier to
locate than a constant flute tone. In addition to the ITD, keep in mind that
most of the spectral differences between the near and far ear are above
1500 Hz, as frequencies below this point will diffract around your head. Sounds
with little high frequency energy will therefore spatialize poorly.
Once the soundfile is open, type command - B to bring up the binaural filter
dialog. Here you can set the various parameters for a binaural spatialization.
For instance, you could set a specific azimuth angle by typing in the Angle:
box or by clicking on one of the radio buttons. You can set the height to be above,
below or at ear level (though the detection of height is usually dependent on
head movement, so it is not well simulated with a static filter).
To spin the sound around your head, click the box called Moving Angle.
When you do this, the Angle Function button will appear. Click on this button
to bring up the Draw Function Window.
This window allows you to draw a curve to control the angle of the sound as
it moves around your head. You will see two legends on the bottom of this
window: Time:, which indicates the current time in the input soundfile, and
Degrees:, which is the azimuth (0 degrees is straight ahead). You will now use
one of the presets to smoothly spin a sound twice around your head. First,
type "2.0" in the Cycles box, then click on the ramp function icon (see the mouse arrow in the picture to
the left). This creates two cycles of a ramp function, which will control the
azimuth angle during processing. Now click the Done button in this dialog (the
Draw Function Window will disappear), and then click the Process button in
the binaural function dialog. The "Save Soundfile as: " dialog will now appear,
which has options to set the soundfile type and format. You will probably just
want to set the soundfile type to Audio IFF and 16 Bit Linear, as this is the
format most commonly used by sound editors. After you click Save,
SoundHack will start processing. All you need to do now is wait. On a 33MHz
68040, the compute time to real time ratio is 55 to 1 (that is, it takes 55 seconds
to compute 1 second of sound).
2. Varispeed
One of the main features of SoundHack is the ability to distort the timebase
and/or pitch of the soundfile. In this example, we will use a simple varispeed
(variable sample rate conversion) to modulate both the pitch and timebase.
3. Pitch shifting
SoundHack provides pitch shifting without time scaling with a technique
known as the phase vocoder. In this technique, the sound to be shifted is sent
into a bank of band-pass filters which are evenly spaced from 0 Hz to half the
sample rate. SoundHack measures the amplitude and phase for each
frequency at the output of this filter bank. These amplitudes, phases and
frequencies are used to control a bank of oscillators. Pitch shifting simply
involves multiplying each frequency by a factor.
In this example, you will shift a soundfile up one octave. You should open a
soundfile as you did in the previous example (command - O). Pitch shifting is
not limited by sample rate or number of channels; any type of soundfile will
work. However, this effect sounds best with sounds that are not harmonically
dense. If more than one partial appears in any band of the filter bank,
intermodulation distortion will result in that band.
4. Time scaling
Just as the phase vocoder technique allows pitch shifting without time scaling,
it also allows time scaling without pitch shifting. In this example, you will stretch
a soundfile to twice its length.
5. Cross-synthesis nr. 1
Cross-synthesis is the combining of two sounds to create a new sound. This is
done by analyzing and extracting significant characteristics from the two
source soundfiles, then combining these characteristics in the synthesis of the
new soundfile. SoundHack has two functions which perform cross-synthesis,
convolution and mutation. In this example you will try convolution.
Convolution takes two sounds, analyses them for spectral content, then
multiplies the two spectra to create a new sound. This emphasizes frequencies
which are held in common and greatly reduces frequencies which are not.
For an interesting convolution it is good to start with sounds that have spectra
which are neither too similar nor too dissimilar.
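The spectral multiplication described above can be sketched in a few lines of C. This is only an illustration of the idea, assuming both sounds have already been transformed into arrays of complex bins of equal length; the Bin type and function name are hypothetical, and SoundHack's own block handling is not shown.

```c
#include <stddef.h>

/* One complex spectral bin (real and imaginary parts). */
typedef struct { double re, im; } Bin;

/* Multiply two spectra bin by bin: out[k] = a[k] * b[k].
   A bin is strong in the output only when it is strong in both
   inputs, which is why common frequencies are emphasized. */
void multiply_spectra(const Bin *a, const Bin *b, Bin *out, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        double re = a[k].re * b[k].re - a[k].im * b[k].im;
        double im = a[k].re * b[k].im + a[k].im * b[k].re;
        out[k].re = re;
        out[k].im = im;
    }
}
```

A bin that is empty in either spectrum comes out empty in the product, however loud it is in the other sound.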
Open one of the two source soundfiles by typing command - O. Type
command - C to bring up the Convolve dialog. Open the second of the
source soundfiles by clicking on the Pick Impulse button. Click Moving to have
convolution move through the second soundfile, otherwise this process will
convolve against a fixed segment of the second soundfile. Click Brighten to
avoid over-attenuation of high frequencies. Select Triangle in the Window:
popup menu to smooth the convolution. If you prefer an unsmoothed
convolution, select Rectangle.
Now set the Length Used: number. This is used to designate how much sound
gets processed in each block. You can treat this number like the decay time
of a reverb, because convolution has a sound which is like a very complex,
Now that everything is set, click Process. Because the gain in convolution is
unpredictable, save the new soundfile in a format with a large dynamic
range. When the Save Soundfile as: dialog appears, select a File Type: of
NeXT (.snd) and a File Format: of 32 Bit Floating Point. Click Save and wait.
Once your convolution is done, you will need to convert the new soundfile
from a floating point format to something usable (usually 16 bit linear). You will
also need to adjust the optimum gain for this conversion. To do this, select the
new soundfile, then type command - G. The Gain Change dialog will appear.
Click Analyze to find the peak amplitude, then click Change Gain to create a
new, normalized soundfile. In the Save Soundfile as: dialog, set the File Format:
to 16 Bit Linear.
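The analyze-then-scale step can be pictured with the following minimal sketch, which assumes the floating point samples are already in memory. The function names are hypothetical, and DC offset correction is left out.

```c
#include <math.h>
#include <stddef.h>

/* Return the largest absolute sample value in the buffer. */
float find_peak(const float *samples, size_t n)
{
    float peak = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float mag = fabsf(samples[i]);
        if (mag > peak)
            peak = mag;
    }
    return peak;
}

/* Scale the buffer so that its peak lands on target_peak (1.0 for
   full scale). Returns the gain factor applied; a silent buffer is
   left untouched and a gain of 1.0 is returned. */
float normalize(float *samples, size_t n, float target_peak)
{
    float peak = find_peak(samples, n);
    if (peak == 0.0f)
        return 1.0f;
    float gain = target_peak / peak;
    for (size_t i = 0; i < n; i++)
        samples[i] *= gain;
    return gain;
}
```

A convolution output that peaks at 4.0 would be scaled by 0.25 here, which brings it into the range a 16 bit linear file can hold.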
6. Noise reduction
As you have seen in the previous examples, much of SoundHack's processing
involves the spectral analysis of an input soundfile and subsequent
resynthesis of an output soundfile. One very practical use of spectral
analysis/resynthesis is noise reduction, where noise components are identified
and reduced before resynthesis. A very simple scheme for identifying noise
components is to assume that any spectral component with an amplitude
below a certain threshold is noise. This works well when the noise is fairly
constant, and not too loud (tape hiss, for instance).
Now click Process and let it go. After SoundHack creates a few seconds of
sound, it is probably a good idea to pause processing and play the output
soundfile. The parameters for noise reduction usually take a lot of
experimentation.
With a 33 MHz 68040 and 44100 Hz monaural soundfiles, the process time to real
time ratio for spectral dynamics is 70 to 1.
7. Cross-synthesis nr. 2
In this final example, you will use the second form of cross-synthesis in
SoundHack, spectral mutation. Mutation measures the spectral change over
time in two soundfiles (called the source and target) and resynthesizes a new
soundfile (the mutant) through various strategies of combination.
For mutation, you should choose soundfiles that have similar spectral
characteristics. When the soundfiles are too different, the mutant seems to
just fluctuate between the source and target sounds, without producing
many interesting merged sounds. Open one of the soundfiles (the source)
with command-O. Type command - M to bring up the Spectral Mutation
Function dialog and open the second soundfile (the target) by clicking on the
Pick Target button. Set the mutation Type: to LCM/IUIM. The differences
between various mutation types are discussed later in this manual (page 21).
Click the Function box so that you can vary the mutation index. For the
LCM/IUIM type, an index of 0.0 produces a mutant of all source and an index
of 1.0 produces a mutant of all target. Now click Edit Function... to bring up the
Draw Function Window.
To keep the mutation index near the center of its range, set the function limits
to 0.7 and 0.3. Set the number of cycles to 4.0 and click the sine wave preset
icon (see mouse arrow in picture). This will produce a mutant which oscillates
slowly from almost the source to almost the target. Click Done in this window
and Process in the Spectral Mutation Function dialog. Mutations are very hard
to predict, so a lot of experimentation is required before you will get satisfying
results. The main factor in the success or failure of a mutation is the choice of
source and target soundfiles.
SOUNDHACK REFERENCE
FILE MENU
(Command - N) Record New...
If you have an input device, this will allow you to record an AIFF soundfile. You
should first set your desired sample rate, sample size and number of channels,
and then click Create File... to create the soundfile. Once you have initialized
the soundfile the popup menus will no longer be accessible. Options... allows
you to change settings peculiar to your input device. You will then be able to
start recording by clicking Record. Stop will stop the recording, open the
soundfile and add it to the SoundFile menu. Some slower hard disks may
cause glitches while recording at the highest rate.
(Command - O) Open...
Clicking Open All will open all of the soundfiles in the current folder. Clicking
Play will play the soundfile.
Soundfiles with the following file types will appear in this dialog box:
If the file doesn't have a Macintosh 4 character file type, it will still appear in
the open file dialog box provided it has one of the following name extensions:
Once the soundfile is open, it is added to the SoundFile menu and the
soundfile information dialog box appears. This dialog box gives the name,
sample rate, length in seconds, number of channels, type and numeric
format of the soundfile. You are no longer restricted (as you were in earlier
versions of SoundHack) to having only one soundfile open at a time. The file
with the front-most dialog box (which I will refer to as the "selected soundfile") is
used as the input soundfile to the processes under the Hack menu.
Files can be opened with the open document Apple Event ('ODOC') in
SoundHack version 0.860 and later.
-0.054688
-0.015625
-0.007812
0.015625
0.000000
-0.117188
Soundfiles opened with Open Any... must be saved to another format before
being processed. You should set the proper sample rate and data format
with the Header Change... dialog before making this conversion.
(Command - W) Close
Closes the selected soundfile.
Saves a copy of the selected soundfile in any supported soundfile format. This
is the main command for those of you who are just copying from one
soundfile format to another.
This will allow you to convert an Apple sound resource ('snd ') to a soundfile.
This does not yet work with compressed 'snd ' resources or 16-bit resources.
(Command - E) Export SND resource...
This will allow you to write part of the selected soundfile into an Apple sound
resource (also known as a double-clickable soundfile). The length of the
sound resource exported is limited to the amount of memory allocated to
SoundHack. This will only make 8-bit 'snd ' resources.
(Command - =) Import CD Track...
This allows you to import a CD audio track and save it to an AIFF soundfile.
After selecting the track (make sure you know the one you want
beforehand!), the QuickTime Audio CD Import Options dialog box will appear. From
this box you can set the sample rate, sample size, number of channels and
the portion of sound to import for the AIFF soundfile. You can also preview the
sound (though with only 8-bit quality). Importing sounds takes a long time, so
be prepared to wait. QuickTime 2.0 and a CD-ROM drive are required for this
feature. Importing may only work reliably on an Apple CD-ROM drive, though
some have reported success with the HDT CDROM Tool kit and third party
drives.
(Command - Q) Quit
This command allows you to do something else.
EDIT MENU
The required Macintosh functions. They are not too useful in this program
though (as SoundHack is not an editor).
HACK MENU
This is where all the soundfile processing is. Most of the things in this menu
involve a lot of calculations, and take time! SoundHack takes over your Mac
to do these, so if it appears that your Mac has frozen up, it probably hasn't.
Spectral Mutation is the slowest of these functions, so have patience.
Allows you to change the sample rate, number of channels, and data format
of the selected soundfile. If you open a headerless file, you should use this
dialog to set things properly before saving a copy.
To use the binaural filter, enter the desired position in the Angle text box (in
degrees) or click the appropriate radio button. This processing module has
filter data for 12 positions. If you enter an angle between 2 positions you will
get a filter which is the mix of the 2 filters around it. The Moving Angle box will
allow you to do moving spatialization.
(Command - C) Convolution...
This process takes 2 soundfiles: an input (the selected soundfile) and an
impulse response file. It multiplies the spectra of the 2 files together, producing
a new soundfile. The effect is a type of cross-synthesis, in which common
frequencies are reinforced. In this implementation of convolution the sound is
processed block by block, with each block as large as the impulse response.
The Length Used window allows you to designate how much of the impulse
response file to use. The kilobytes Needed number is an estimate of the
application memory size that needs to be set for processing. If you want to
use large impulse responses, that is, convolutions which cause this number to
go over 1200 (the default size), you will have to quit SoundHack and reset the
application memory size.
Checking the Ring Modulate box allows for ring modulation (or convolution in
frequency) between 2 soundfiles. The Brighten box applies a simple +6dB per
octave high-pass filter to the impulse. This is useful as most natural sounds
have a roll-off from 6-12 dB per octave. Convolve two of these sounds and
your result will have a roll-off from 12-24 dB per octave, much too dull. Brighten
is a simple (probably too simple) fix for this.
Selecting an envelope in the Window popup will cause SoundHack to apply it
to the impulse before convolution, resulting in a smoother
convolution. A smoothing window is desirable when performing a moving
impulse response convolution (described below), since the impulse response
will be changing for each block of samples processed. This is also true for
moving ring modulation. The triangular window is probably the best for
smoothing; a rectangular window is the same as no window at all. Check the
Normalize box if you want the output to be normalized after
computation.
The first 1.0 second frame of the input file (A) will be convolved with the first 1.0
second frame of the impulse response file (ab). Then the window on the input
file is moved 1.0 second forward to B, but the window on the impulse response
file is moved only 0.5 seconds to bc. This is so both files will finish at the same
time. (Actually, the impulse response file reaches the end first and the last
impulse response is zero-padded). In the other case, when the impulse
response file is longer than the soundfile, sections of the impulse response file
will be skipped over. It is a good idea to save the output in a floating point
format if using the moving impulse. Since the impulse is continually changing,
the scaling is unpredictable.
Click on Analyze and the peak amplitude, peak position (in samples), RMS values
and DC offset will be calculated. The gain factors will be set to normalize both
channels independently and the additional offsets (a number between 1.0
and -1.0) will be set to correct the file. Change Gain will create a new file
adjusted by the gain factors set. If you are dealing with a monaural file, only
the channel 1 information is applicable. Analyze will tie up your machine
when it is doing its stuff, so please be patient. I will fix this in a future version.
(Command - M) Mutation...
(This section of the manual is written by Larry Polansky.)
The seven different spectral mutation functions (USIM, ISIM, IUIM, UUIM, LCM,
LCM/IUIM, LCM/UUIM) produce different types of timbral "cross-fades." Each
mutation takes 2 soundfiles: a source and a target, and returns a third
soundfile, called the mutant. The mutation functions operate on the
phase/amplitude pair of each frequency band of the source and target
spectra. The output of the functions is a phase/amplitude pair for each
frequency band in the mutant soundfile. Each phase/amplitude pair in the
mutant is some "combination" of the phase/amplitude pairs of the source
and target, for the corresponding frequency band. The mutations work on
the sign (Contour) or the magnitude of an interval, or both. They change
completely a selected number of bands from the source to the target
(Irregular) or partially change all frames (Uniform).
Type
The Type box allows you to select between seven quite different mutation
functions: USIM (Uniform Signed Interval Mutation), ISIM (Irregular Signed), IUIM
(Irregular Unsigned), UUIM (Uniform Unsigned), LCM (Linear Contour
Mutation), and the concatenations LCM/IUIM and LCM/UUIM. (For specific
definitions of the functions, see below. For more information see the two
articles cited in the bibliography).
Try starting with the simplest ones, the USIM (a simple spectral crossfade) and
the ISIM (a spectral replacement). These two mutations, unlike the UUIM and
LCM, actually arrive at the source or target, depending on which direction
you mutate (that is, you will actually hear the source or target with Ω = 0.0 or Ω
= 1.0, respectively). The IUIM, UUIM and the LCM will, with Ω = 1.0, give an
image of the target. These "incomplete mutations" mutate either the sign or
the magnitude, but not both, of the intervals between the amplitudes of
successive spectral bands.
Mutation Index (Ω)
Each of the mutation functions uses an index, called Ω (omega), or an Ω-
function. This determines the amount of spectral mix, from 0 to 1, between the
source and target resulting in the mutant. Ω = 0 results in all source file, Ω = 1 all
target. Ω may vary over the course of the mutation. A constant index will result
in a sound which is a spectral mix of the source and target. More dynamic
sounds are produced with an index function, which changes Ω over time.
Absolute Interval
There are two methods used by the mutation functions to compute intervals
between frequency bands: Absolute (the default) and Relative. You may
check or uncheck the Absolute Interval box to get these two methods. If
Absolute intervals are used, you may specify an absolute amplitude value
between 0.0 and 1.0 for the source and target (Source Abs. Value, Target
Abs. Value), from which all intervals will be taken. The choice of values can
produce interesting effects, often "centering" the frequencies in which the
mutation happens, making the mutations themselves less extreme. If two
different values are used, amplitudes will be "transposed" from the source to
the target. The use of Absolute intervals rather than Relative will be most
noticeable for the LCM, IUIM and UUIM, as well as the concatenated
mutations. Low values (around .1 - .2) are a good place to start (note that .1
means 1/10th of the total amplitude of the soundfile's spectra).
Delta Emphasis
If the mutation uses Relative Intervals (Absolute Interval box unchecked), you
can set a value for Delta Emphasis (DE). DE allows control over the degree to
which successive mutation intervals are emphasized in the resulting mutant.
DE values range from -1.0 to 1.0, with the default at 0.0 (no emphasis or de-
emphasis). For positive DE values, the current frame's intervalic characteristics
will be emphasized more than the previous mutant frames. For negative
values, the current frame will be "damped," emphasizing the previous
information. One way to think about this is as a way of "slowing down" the
mutation: a negative DE value will keep the more chaotic mutations from
getting "out of control." A negative DE value will function as a low-pass filter,
averaging the previous spectral frames into the current output. Positive DE
values will accentuate the often high-frequency activity of the mutations.
Delta Emphasis can be useful in tailoring the relative interval mutations,
especially the "incomplete" ones.
In irregular mutations, not every frequency band is mutated for each FFT
frame. Ω determines the percentage of bands that are mutated for a given
frame. If a band is mutated, it completely assumes the particular
characteristic (interval sign or magnitude) of the target interval, and retains
either the sign or the magnitude of the source interval. For example, the LCM
takes the sign of the target interval, and "pastes" it onto the magnitude of the
source. However, it only does that for (Ω * #-of-bands). The selection of which
bands to mutate in irregular mutations is done stochastically, but setting Band
Persist high will ensure that once a band is mutated, it will keep being mutated
as long as possible. That is why a high value for Band Persist will stabilize these
highly unusual mutations, making them a bit more "well-behaved." A good
experiment is to try an irregular mutation (LCM, IUIM, ISIM, the
concatenations) with a fixed Ω, and two different values of Band Persist, one
high and one low.
The USIM and the ISIM are the simplest mutations, a spectral cross-fade and
spectral band-replacement, respectively. The UUIM and the IUIM are two
different ways of "pasting" the magnitude differences of the target spectra
onto the sign of the source, resulting in a mutant which, when completely
mutated, is still some combination of the source and target. The LCM, perhaps
the most unusual sounding and difficult mutation to control, does the
opposite, pasting the signs of the target onto the magnitudes of the
source.
The functions are defined below. S and T are the source and target soundfiles.
Si , Ti are the amplitudes for a given frequency band of the FFT for the ith
frame of the sound. Sj , Tj are either the amplitudes of the same band in the
previous frame (Relative Interval) or some absolute amplitude (Absolute
Interval). Mi is the new amplitude of the given frequency band of the current
frame of the output sound, Mj is the amplitude for that band in the previous
frame of the output sound (Relative Interval), or some absolute amplitude
value between 0.0 and 1.0 (Absolute Interval). Tint , Sint and Mint are the
signed magnitude intervals between the amplitude of the current frame for a
given band, and the amplitude of that band in the previous frame (Relative)
or to some fixed amplitude (Absolute). Each of the equations below applies
to one frequency band of the source, target and mutant soundfile spectra.
In other words, for all of these functions, Si , Ti , and Mi run from 0 to the
number of bands in the FFT.
Notes:
1) In Absolute Interval mutations, Mj, the absolute amplitude to which the new
interval is added, is interpolated between the absolute values for source and
target, according to the value for Ω.
2) The Irregular mutations are in two forms: one for the case when that
particular frequency band is chosen for mutation, one for when it is not.
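As a rough illustration of the prose descriptions above, the following C sketch computes the mutant interval for one frequency band under three of the mutations. It is a literal reading of the text (USIM as a crossfade of signed intervals, UUIM as the target magnitude on the source sign, LCM as the target sign on the source magnitude) and should not be taken as the exact equations used by SoundHack. All function names are hypothetical. The intervals are Sint = Si - Sj and Tint = Ti - Tj as defined above.

```c
#include <math.h>

/* Sign of x as -1.0, 0.0 or +1.0. */
static double sign_of(double x)
{
    return (x > 0.0) - (x < 0.0);
}

/* USIM: move the signed interval uniformly from source to target. */
double usim_interval(double s_int, double t_int, double omega)
{
    return s_int + omega * (t_int - s_int);
}

/* UUIM: keep the sign of the source interval and move its magnitude
   uniformly toward the magnitude of the target interval. */
double uuim_interval(double s_int, double t_int, double omega)
{
    double mag = fabs(s_int) + omega * (fabs(t_int) - fabs(s_int));
    return sign_of(s_int) * mag;
}

/* LCM for one band: if the band was chosen for mutation, paste the
   sign of the target interval onto the magnitude of the source
   interval; otherwise keep the source interval unchanged. */
double lcm_interval(double s_int, double t_int, int band_selected)
{
    return band_selected ? sign_of(t_int) * fabs(s_int) : s_int;
}

/* New mutant amplitude Mi: reference amplitude Mj plus the interval. */
double mutant_amplitude(double m_j, double m_int)
{
    return m_j + m_int;
}
```

In this reading, USIM with Ω = 1.0 reproduces the target interval exactly, while UUIM with Ω = 1.0 still carries the sign of the source, which matches the description of it as an incomplete mutation.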
To use the phase vocoder, set the number of Bands to the number of filter-
oscillator pairs one would like to use. A large number of bands will give one
better frequency resolution, a small number of bands will give one better time
resolution. The Window menu allows one to choose different pre-FFT windows for
different filtering characteristics. Only the Hamming, von Hann and Kaiser will
give good results (the others are there only because I wanted to use a single
menu throughout the program for all window selection). The Overlap setting
adjusts the size of the filter window (relative to the number of filter bands) for
analysis and synthesis and thus, the sharpness of the filter. A large setting (4x)
will give the sharpest filter. A sharper filter will differentiate better between
frequencies which are between bands, but responds more slowly to amplitude
changes. Click the Time Scale button for time scaling, Pitch Scale for pitch
scaling. Type the scale factor in the Scale box. Click on the word Scale (a
popup menu) to specify time scaling by the length desired, or pitch scaling
by equal tempered semitones. If one wants the time expansion factor or the
pitch transposition factor to change during processing, click the Scaling
Function box, and the Draw Function... button. This will bring up the Draw
Function Window, which is described later in this document.
Resynthesis Gating performs a simple spectral gate which lets only some of
the spectral data through. If a band is below the Minimum Amplitude it is not
let through. Threshold Under Max. cuts off all bands which are lower than the
threshold below the peak band in a given block of samples. So if the peak
band has an amplitude of -7 dB and Threshold Under Max. is set to -40 dB, all
bands below -47 dB will be cut off.
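The two gates can be sketched as follows, assuming the level of each band is already available in dB. The function name and the pass-flag output are hypothetical. Bands exactly at a limit are let through here, which is a choice of this sketch.

```c
#include <stddef.h>

/* Decide which bands of one block pass the resynthesis gate.
   level_db:            level of each band in dB
   pass:                set to 1 for bands let through, 0 otherwise
   min_amp_db:          absolute floor (Minimum Amplitude)
   thresh_under_max_db: negative offset from the loudest band of this
                        block (Threshold Under Max.)
   Returns the number of bands let through. */
size_t gate_bands(const double *level_db, int *pass, size_t n,
                  double min_amp_db, double thresh_under_max_db)
{
    if (n == 0)
        return 0;

    /* Find the peak band of this block. */
    double peak_db = level_db[0];
    for (size_t i = 1; i < n; i++)
        if (level_db[i] > peak_db)
            peak_db = level_db[i];

    /* For example, a -7 dB peak and a -40 dB setting give -47 dB. */
    double cutoff_db = peak_db + thresh_under_max_db;

    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        pass[i] = level_db[i] >= min_amp_db && level_db[i] >= cutoff_db;
        kept += (size_t)pass[i];
    }
    return kept;
}
```

Because the cutoff follows the peak band, a quiet block is gated at a lower absolute level than a loud one.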
Most controls are self-explanatory. The first popup menu allows you to select
the type of process to use: gating/ducking, expansion or compression. The
second popup sets the number of filter bands to separate the sound into. 512
is a good compromise for the number of bands at a 44100 sample rate as
each band is about 43 Hz apart and the filters used have a (512*2)/44100 or
.023 second delay. In other words, a pretty good frequency resolution
(provided no partials are closer than 43 Hz) and not too much time smearing.
The Highest Band and Lowest Band boxes allow one to limit the frequency
range affected. 3rd Octave Band Grouping causes the bands to be grouped
with a 3rd octave spacing with a threshold trigger for each group (instead of
each band). This octave grouping may give a more natural sounding
dynamics process. The next box is either labelled Gain/Reduction, Expand
Ratio or Compress Ratio. It allows you to set the amount of gain or reduction
for the bands which are past the threshold when gating. For compression
and expansion it allows you to set the gain ratio. When affecting sounds
below the threshold, the compressor and expander hold the highest level
steady and affect lower levels (also known as "downward" expansion or
compression) . When the process is set to affect sounds above the threshold,
the compressor and expander hold the threshold level steady and
compress/expand up from there.
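The band spacing and delay figures quoted above for 512 bands can be recomputed for other settings with a small sketch like this one. The function names are hypothetical; the formulas are the ones given in the text.

```c
/* Spacing between adjacent bands in Hz: the range from 0 Hz to half
   the sample rate is divided into num_bands bands. */
double band_spacing_hz(double sample_rate, int num_bands)
{
    return (sample_rate / 2.0) / num_bands;
}

/* Filter delay in seconds, using the (bands * 2) / sample rate figure
   from the text. */
double filter_delay_s(double sample_rate, int num_bands)
{
    return (num_bands * 2.0) / sample_rate;
}
```

Doubling the number of bands halves the spacing but doubles the delay, which is the trade between frequency resolution and time smearing.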
(Command - V) Varispeed...
You can control the separation with this dialog box. Set the Bands to a high
number if the sound being processed is harmonically dense, otherwise keep it
around 512. Setting the number of Frames allows you to set the size of the
analysis frame (in multiples of FFT frames). Set this higher if you are having
difficulty separating the pitched material, lower if you are having difficulty
separating the transient material. The two frequency values specify the
amount of change allowed during each analysis frame. In this example, if the
harmonic deviates by more than 5 hertz in 0.035 seconds, it is put into the
transient soundfile; if the harmonic deviates by less than 2 hertz in 0.035
seconds, it is put in the stable soundfile.
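The two-threshold test described above might look something like this in C. It is a sketch under assumptions: the names are hypothetical, the deviation is taken as the absolute frequency change across one analysis frame, and the text does not say what happens to a band whose deviation falls between the two values, so that case is simply reported separately.

```c
#include <assert.h>

typedef enum { BAND_STABLE, BAND_TRANSIENT, BAND_BETWEEN } BandClass;

/* Classify one band by how far its frequency moved during one
   analysis frame. stable_max_hz and transient_min_hz are the two
   frequency values from the dialog (2 and 5 Hz in the example).
   BAND_BETWEEN marks the range the text leaves unspecified. */
BandClass classify_band(double prev_hz, double curr_hz,
                        double stable_max_hz, double transient_min_hz)
{
    assert(stable_max_hz <= transient_min_hz);
    double deviation = curr_hz - prev_hz;
    if (deviation < 0.0)
        deviation = -deviation;          /* absolute change in Hz */
    if (deviation > transient_min_hz)
        return BAND_TRANSIENT;
    if (deviation < stable_max_hz)
        return BAND_STABLE;
    return BAND_BETWEEN;
}
```

With the example settings, a partial moving from 440 Hz to 446 Hz in one frame is classed as transient, and one moving from 440 Hz to 441 Hz is classed as stable.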
This function draws heavily from the work of Zack Settel and Cort Lippe on the
ISPW workstation using Max-DSP. Thank you Zack and Cort for sharing a great
idea.
typedef struct
{
    long  magic;        // 517730 for Csound files,
                        // 'Erbe' for SoundHack files
    long  headBsize;    // byte offset from start to data
                        // (usually sizeof(SpectHeader))
    long  dataBsize;    // number of bytes of data not including
                        // the header
    long  dataFormat;   // (short) format specifier
                        // always 36 for floating point
    float samplingRate;
    long  channels;
    long  frameSize;    // number of points in FFT
                        // (number of bands * 2)
    long  frameIncr;    // number of new samples each frame
                        // (frames overlap)
    long  frameBsize;   // bytes in each file frame
                        // frameBsize = sizeof(float)
                        //     * ((frameSize >> 1) + 1) * 2;
    long  frameFormat;  // this is either 3 for SoundHack
                        // files (amplitude & phase) or 7 for
                        // Csound files (amplitude & frequency)
    float minFreq;      // 0.0
    float maxFreq;      // maxFreq = samplingRate/2.0;
    long  freqFormat;   // flag for log/lin frequency
                        // (always 1 for linear)
    char  info[4];      // extendible byte area
} SpectHeader;
If the spectral file is stereo, the frames are interleaved, first left then right.
Included with SoundHack is the source code for a simple spectral data
processor (Spectral Assistant) which should illustrate how to read and write this
format.
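As a small companion to the header description, the following C sketch shows how the frame size and the position of a frame in the file could be computed. The helper names are hypothetical. It assumes frameBsize counts the bytes of one channel's frame, that 'Erbe' is stored as a big-endian four-character code, and that frames are laid out left then right as stated above. It is not taken from the Spectral Assistant source.

```c
#include <assert.h>

#define CSOUND_MAGIC    517730L
#define SOUNDHACK_MAGIC 0x45726265L  /* 'Erbe' as a four-character code (assumed big-endian) */

/* Nonzero if the magic number is one of the two documented values. */
int spect_is_known_magic(long magic)
{
    return magic == CSOUND_MAGIC || magic == SOUNDHACK_MAGIC;
}

/* Bytes in one frame: (frameSize / 2 + 1) bands, two floats per band. */
long spect_frame_bytes(long frameSize)
{
    assert(frameSize > 0 && (frameSize % 2) == 0);
    return (long)sizeof(float) * ((frameSize >> 1) + 1) * 2;
}

/* Byte offset of one channel's frame, assuming frames are interleaved
   by channel (left then right) and frameBsize is per channel. */
long spect_frame_offset(long headBsize, long frameBsize, long channels,
                        long frameIndex, long channel)
{
    assert(frameIndex >= 0 && channel >= 0 && channel < channels);
    return headBsize + (frameIndex * channels + channel) * frameBsize;
}
```

For a 1024 point FFT (512 bands) with 4 byte floats, spect_frame_bytes(1024) is 4104. Note the parentheses around the shift: in C, `frameSize >> 1 + 1` would be read as `frameSize >> 2`.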
(Command - U) QT Coder
This function will convert your soundfile into a QuickTime™ movie and will
convert the QuickTime™ movie back into the same sound. You can also turn
any QuickTime™ movie into a soundfile.
There are very few settings in this function. The number of bands one chooses
will affect the size of the movie frame. 256 bands will require a 343x257 frame
(a 4:3 aspect ratio). Since the QT Coder uses 32-bit uncompressed images, it
consumes a lot of memory, and the memory need grows roughly with the square
of the number of bands. For example: a 512 band, monaural QT
Coding will require about 3 megabytes allocated to SoundHack.
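To get a feel for the numbers, here is a C sketch that estimates the frame dimensions and the size of one uncompressed frame. The formulas are my guess from the single 343x257 example (height = bands + 1, width = 4/3 of the height, rounded to the nearest pixel) and are not confirmed by the source; the function names are hypothetical.

```c
#include <assert.h>

/* Assumed: one pixel row per band, plus one. */
long qt_frame_height(long bands)
{
    assert(bands > 0);
    return bands + 1;
}

/* Assumed: 4:3 aspect ratio, rounded to the nearest pixel. */
long qt_frame_width(long bands)
{
    long h = qt_frame_height(bands);
    return (8 * h + 3) / 6;              /* round(4 * h / 3) */
}

/* Bytes for one 32-bit (4 bytes per pixel) uncompressed frame. */
long qt_frame_bytes(long bands)
{
    return qt_frame_width(bands) * qt_frame_height(bands) * 4;
}
```

Under these assumptions 256 bands gives a 343x257 frame of 352604 bytes, and 512 bands gives a 684x513 frame of 1403568 bytes, about four times as much. The 3 megabyte figure quoted above presumably covers more than a single frame buffer.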
Window: selects the FFT windowing function. Kaiser is best for band separation,
Hamming is best for smooth band transition. The choice of window is probably
not critical to most uses of this function. Phase Center Color: selects which
color will represent a ∆ phase change of zero degrees. This color usually
becomes the predominant color in the sonogram. Amplitude Range (dB):
determines which sounds are encoded in the QuickTime™ movie. For a high
fidelity encoding, 120 dB seems sufficient. Color Inversion inverts the RGB color
values.
After the QT Coder creates the QuickTime™ movie, you will be able to open it
in any QuickTime™ application and view, edit or modify the sonogram. After
doing this, you will be able to open the modified movie in SoundHack and
convert it back into sound using the QT Coder again.
One could also use the QT Coder to convert any QuickTime™ movie into a
soundfile. The resultant sound will tend to be repetitive. It will also tend to be
biased toward the high frequencies since the frequency in the frame is
interpreted in a linear fashion.
This function is rather experimental and I am not quite sure what it will be good
for yet.
(Command - ;) Normalize
This does a simple, no-questions-asked normalization of the selected
soundfile.
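The manual does not spell out the method, so the following C sketch is only an assumption of what a simple normalization usually means: find the largest absolute sample and scale everything so that peak reaches full scale (1.0 for floating point samples). The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scale samples in place so the largest absolute value becomes 1.0.
   Returns the gain applied (1.0 if the input is silent or empty).
   Peak normalization is assumed here; the source does not say
   which method SoundHack uses. */
float normalize_peak(float *samples, size_t count)
{
    assert(samples != NULL || count == 0);
    float peak = 0.0f;
    for (size_t i = 0; i < count; i++) {
        float mag = samples[i] < 0.0f ? -samples[i] : samples[i];
        if (mag > peak)
            peak = mag;
    }
    if (peak == 0.0f)
        return 1.0f;                     /* nothing to scale */
    float gain = 1.0f / peak;
    for (size_t i = 0; i < count; i++)
        samples[i] *= gain;
    return gain;
}
```

For samples of 0.25, -0.5 and 0.125 the peak is 0.5, so the gain is 2 and the result is 0.5, -1.0 and 0.25.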
The boxes in the upper right and lower right corners allow a very primitive type
of zooming. There is no facility for selecting, copy, paste or cut. One can read
or write control functions as soundfiles by clicking the Read and Write buttons.
The Time: legend refers to the time in the input soundfile and the other legend
(Scaling:) is updated depending on how the control function is applied.
SOUNDFILE MENU
This simply allows you to select between the open soundfiles. The first ten
soundfiles open are given command key equivalents (Command - 1 to
Command - 0).
CONTROL MENU
Show Signal
This will bring up a window to show the sound whenever SoundHack reads or
writes sound (except during file copying and normalization). There are
separate windows available for all active soundfiles.
Show Spectrum
This will bring up a window to show the spectral data in all spectral operations
(most everything but varispeed, which I do in the time domain). There are
separate windows available for all active soundfiles.
Show Sonogram
This will also show spectral information but over time. Intensity is represented
by different colors where purple-red is the lowest intensity, green is mid
intensity and bright red is the highest intensity. This display will slow down any
Macintosh dramatically.
Pause Process
This allows you to pause during a long process. This is especially useful if you
would like to hear the sound you have processed so far. If you are running a
convolution, it sometimes takes a while to pause (up to 3 minutes on a slow
Mac II).
Continue Process
This will resume processing where you left off.
Stop Process
This will kill your process and close the output soundfile.
Load Settings...
This loads a previously created settings file. This is also done automatically on
startup to the "SoundHack Preferences" file in the ":System Folder:Preferences"
folder.
Save Settings...
This will create a settings file which contains the current settings from the
binaural, convolution, spectral analysis, spectral dynamics, mutation, phase
vocoder, varispeed, gain and preferences dialog panels. This is done
automatically when you quit SoundHack to the "SoundHack Preferences" file
in the ":System Folder:Preferences" folder.
Default Settings
This will reset all internal settings to a default set. Recommended if you suspect
your "SoundHack Preferences" file is corrupt.
Preferences...
This allows you to set a few things. You can set SoundHack to automatically
play a soundfile when it is opened. This is a very nice way to set things if you
are using SoundHack as your web browser helper application. If you set an
editor, then you can use the Close & Edit command (described earlier). The
Default File Type, when set, will give you the selected file type and format
whenever you create a new soundfile. Play On Open will play the file as soon
as it is opened, Play After Processing will play the output file as soon as the
current Hack process is done with it.
FUTURE PLANS
1.0 SoundHack is finished. Tom and Betsy take a holiday.
.90 Last major release, only bug-fixes and minor updates from
now on. Make a suggestion for final features. Time is running out!
Port to IRIX, Linux? Solaris, Rhapsody, Wintel. AppleEvents and
scriptability. MPEG. Spectral plugins?
.89 Mark Dolson phase vocoder enhancements. Simple graphic filtering.
SDS, SDI, TX16W, OMF, Sonic, ProTools, AIFF resource soundfile formats.
QT coder: log frequency sonograms, new color encodings. Buffered read
and write routines.
BIBLIOGRAPHY
Apple Computer, Inc., Inside Macintosh, Volume 1-6, Addison Wesley,
Reading, Mass., 1985-91.
Begault, Durand R., Control of Auditory Distance, Ph.D. dissertation,
University of California, San Diego, 1987.
Begault, Durand R., 3-D sound for virtual reality and multimedia, Academic
Press Professional, Cambridge, MA, 1994.
Blauert, J., Spatial Hearing, MIT Press, Cambridge, Mass., 1983.
Dolson, Mark, "The Phase Vocoder: A Tutorial", Computer Music Journal
10:4, 1986.
Ellis, Dan, "pvanal.c", part of the Csound distribution, MIT, 1991.
Gordon, J. W. and Strawn, J., "An Introduction to the Phase Vocoder",
Digital Audio Signal Processing: An Anthology, editor J. Strawn,
Kaufmann, Los Altos, Calif., 1985.
Mark, David and Reed, Cartwright, Macintosh C Programming PRIMER,
Volume I, Addison Wesley, Reading, Mass., 1989.
Moore, F. Richard, Elements of Computer Music, Prentice Hall, Englewood
Cliffs, NJ, 1990.
Polansky, L. "More on Morphological Mutations: Recent Techniques and
Developments," Proceedings of the ICMC, San Jose, 1992.
Polansky, L. and McKinney, M. "Morphological Mutation Functions:
Applications to Motivic Transformations and to a New Class of Cross-
Synthesis Techniques," Proceedings of the ICMC, Montreal, 1991.
Reed, C. E. and Passin, T. B., Signal Processing in C, John Wiley, New York,
NY, 1992.
Settel, Z. and Lippe, C. "Real-time Musical Applications using FFT-based
Resynthesis", Proceedings of the 1994 International Computer Music
Conference, DIEM, Aarhus, Denmark, 1994.
Vaseghi, S. and Frayling-Cork, R., "Restoration of Old Gramophone
Recordings", Journal of the AES, 40:10, 1992.
ACKNOWLEDGMENTS
Betsy Edwards for unending support, for teaching me how to write and for
editing this long winded and technical document.
Larry Polansky for his mutation functions, for helping me edit this document,
and for his unending encouragement, criticism and support.
Dr. Durand Begault of NASA-Ames for letting me use his binaural filter
coefficients.
F. Richard Moore, D. Gareth Loy and Mark Dolson for my initial exposure to the
wonders of computer music.
Dan Ellis of MIT's Media Lab for helping me with the Csound analysis feature.
Scott Morgan and Geoff Hufford for all their Macintosh toolbox help.
Tim Walters, Jeanne Parson, Kent Clelland, Vincent Carte, George Taylor, Zach
Belica, Phil Burk, Curtis Roads, Richard Boulanger, douglas repetto and many
others for their many comments, bug reports and encouragement.
The Center for Contemporary Music at Mills College and the CalArts School of
Music for sponsoring all of the rest of my activities.