
AMATH 482 - Coding Project 2

Parsing Musical Frequency Signatures

Jonathan Aalto

Abstract
In this project, a sound clip was analyzed to identify the pre-
dominant frequencies over time and to isolate frequencies associated
with the bass/drums and guitar. The discrete Gabor transform was
used to identify the predominant frequencies. A sliding Gaussian fil-
ter was applied to the signature in physical space, which produced a
set of signatures that were localized in time. Each localized signature
was converted to k-space, where the highest-amplitude frequency was
identified. A second Gaussian filter, centered at the highest-amplitude
frequency, was applied to each localized signature in k-space to isolate
the predominant frequencies. To isolate the bass/drums and guitar, a
simple threshold, corresponding to the frequency range of the desired
instrument(s), was applied to the unfiltered signature in k-space. The
signature was then converted back to physical space. These results
were plotted in MATLAB.

1 Introduction
Musical notes are designated using the letters A through G, though some
letters are used to denote multiple notes (e.g. C and C#). Each note corre-
sponds to a specific sound frequency, though in practice, the notes produced
by instruments often vary slightly from these "true" frequencies. The total
range of sound frequencies relevant in music (approximately 15 to 8000 Hz)
is divided into octaves, with each octave containing a set of notes A, A#, B,
C, etc. Converting between consecutive octaves can be done by multiplying
or dividing a note’s frequency by 2. For example, the G note in octave 2 (98
Hz) can be multiplied by two to obtain the G note in octave 3 (196 Hz) or

divided by two to obtain the G note in octave 1 (49 Hz). The sound clip
analyzed in this project consists largely of bass, drums, and guitar. The bass
and drums produce lower frequencies (approximately 40 to 300 Hz), while
the guitar produces higher frequencies (approximately 300 to 514 Hz).
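The octave doubling rule described above can be sketched in a few lines. This is an illustrative Python snippet; the function name `shift_octave` is our own and is not part of the project code:

```python
# Moving a note up or down by one octave multiplies or divides its frequency by 2.
def shift_octave(freq_hz, octaves):
    """Return the frequency of the same note shifted by `octaves` octaves."""
    return freq_hz * (2.0 ** octaves)

# G2 is approximately 98 Hz.
g3 = shift_octave(98.0, 1)    # one octave up: 196 Hz (G3)
g1 = shift_octave(98.0, -1)   # one octave down: 49 Hz (G1)
```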

The following sections of this report cover the mathematical basis, com-
putational implementation, and graphical results of techniques used to de-
compose and process musical signatures. First, an overview of the Gabor
transform is provided, and a discrete version of the transform, used in sci-
entific computing, is discussed. The subsequent section details the structure
of the MATLAB code used to process the sound signal. The results of this
analysis are then presented, which include 1) spectrograms detailing the pre-
dominant frequencies over time and 2) musical signatures of the sound signal
after it has been filtered in frequency space to isolate the contributions of
specific instruments. After discussing these results, a conclusion is presented.

2 Theoretical Background
This section discusses both the theoretical basis and computational imple-
mentation of the Gabor Transform, which expands upon the standard meth-
ods of Fourier analysis to provide additional resolution in the temporal do-
main.

2.1 The Gabor Transform


The Fourier transform of a signal f(t) is
$$F(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-ikt}\, dt.$$
This converts the signal to k-space, but because this integral is carried out
over the entire span of t-values, the resulting frequency signature does not
contain any information about how the frequencies are distributed in time.
The Gabor Transform is similar to the Fourier transform, but instead of
multiplying the function f (t) by exp(−ikt), the function is instead multiplied
by ḡ(τ −t) exp(−iωτ ). In this case, ḡ(τ −t) denotes the complex conjugate of
a function, generally a Gaussian distribution, that is centered about a specific

time τ . Multiplying by ḡ(τ − t) isolates the values of f (t) near t = τ , and by
then transforming to k-space, the frequency distribution at time t = τ can
be obtained. The localized frequency distributions about every time point
can be obtained by integrating over all τ . Because τ is now the variable of
integration, f (t) is rewritten as f (τ ). If g(τ − t) is real, then ḡ(τ − t) =
g(τ − t), and the integral expression for the Gabor Transform becomes:
$$G[f](t, \omega) = \bar{f}_g(t, \omega) = \int_{-\infty}^{\infty} f(\tau)\, g(\tau - t)\, e^{-i\omega\tau}\, d\tau.$$

The Gabor transform has a number of useful properties. As an example,


it offers a convenient method for calculating the energy about a point (t, ω),
which is done by taking the square norm:

$$E(t, \omega) = \left| \int_{-\infty}^{\infty} f(\tau)\, g(\tau - t)\, e^{-i\omega\tau}\, d\tau \right|^2.$$

Like the Fourier transform, the Gabor transform has an inverse. This
makes it useful for signal processing, as one can apply the Gabor transform
to convert to frequency space, apply modifications to the frequencies, then
convert back to physical space. Because the frequency space has increased
time resolution, the filters applied to the frequencies can also vary with time.
The integral expression for the inverse Gabor transform is:

$$f(\tau) = \frac{1}{2\pi \|g\|^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \bar{f}_g(t, \omega)\, g(\tau - t)\, e^{i\omega\tau}\, d\omega\, dt.$$

2.2 The Discrete Gabor Transform


In order to implement the Gabor transform on a computer, the temporal and
frequency domains must be discretized. A given value of τ can be written as
at0 , where a is an integer and t0 is the difference between consecutive points.
Similarly, a given frequency value, ω, can be written as 2πbν0 . Plugging these
substitutions into the standard Gabor transform equation gives the Discrete
Gabor Transform:

$$\bar{f}(a, b) = \int_{-\infty}^{\infty} f(t)\, \bar{g}(t - a t_0)\, e^{-i(2\pi b \nu_0) t}\, dt = \int_{-\infty}^{\infty} f(t)\, \bar{g}_{a,b}(t)\, dt.$$

When implementing the discrete Gabor transform in a scientific computing
language, such as MATLAB, a "for" loop is often used to cycle through
the set of time values. For each time point, the corresponding Gabor filter,
g(t), is calculated and applied to f (t), and the resulting function is then
subjected to a Fourier transform to convert to frequency space.
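The loop just described can be sketched as follows. This is a Python/NumPy illustration of the idea rather than the MATLAB code used in the project, and the test signal, window width, and variable names are assumptions:

```python
import numpy as np

def gabor_spectrogram(f, t, tau_centers, a):
    """At each center tau, apply the Gaussian Gabor filter g(t - tau) to the
    signal in physical space, then FFT the result to obtain the local spectrum."""
    spec = []
    for tau in tau_centers:
        g = np.exp(-a * (t - tau) ** 2)         # Gaussian window centered at tau
        spec.append(np.abs(np.fft.fft(f * g)))  # localized slice converted to k-space
    return np.array(spec)                       # rows: time centers; columns: frequency bins

# Example: a pure 10 Hz tone sampled at 1000 Hz for 1 second.
t = np.linspace(0, 1, 1000, endpoint=False)
f = np.sin(2 * np.pi * 10 * t)
spec = gabor_spectrogram(f, t, tau_centers=np.linspace(0.2, 0.8, 7), a=100)
# Each row of spec peaks at the 10 Hz bin, regardless of window position.
```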

3 Numerical Methods
In the first section of the code (lines 1-27), the sound clip is imported
into MATLAB and separated into four components (S1 through S4) due to
its size. Using the length of each component in seconds, an appropriate set
of discrete time and frequency values is constructed. Another set of time
values, designated tau, establishes the set of time values that will be used as
center points of the g(τ − t) Gabor filter function. This is much smaller than
the entire span of t-values, as it would be impractical to iterate through the
hundreds of thousands of discrete values in t. This section also establishes
the width of the Gaussian Gabor filter (a), the sample rate (Fs), and the
vector into which the data about the predominant frequencies will be entered
(Sgt spec).
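A hedged sketch of this setup in Python/NumPy (the sample rate, component length, and variable names here are placeholders, not the project's actual values):

```python
import numpy as np

Fs = 44100               # sample rate in Hz (placeholder value)
L = 4.0                  # length of one component in seconds (placeholder value)
n = int(L * Fs)          # number of samples in the component

t = np.arange(n) / Fs                  # full set of discrete time values
k = np.fft.fftfreq(n, d=1 / Fs)        # discrete frequencies (Hz) matching the FFT output
tau = np.linspace(0, L, 110)           # much coarser set of Gabor filter center points
Sgt_spec = np.zeros((len(tau), n))     # rows will hold one filtered spectrum per tau
```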

In the second section of the code (lines 49-201), the Gabor trans-
form is applied to each of the four components of the sound clip. For each
component, a "for" loop is used to move through the values of tau. For each
tau(i), the Gaussian Gabor filter, centered at tau(i), is calculated and applied
to the sound component. The filtered sound component is then transformed
using the fft function. In Fourier space, the index of the maximum frequency
on the range [1:1800] (0 to 514 Hz) is identified using the max function. A
second Gaussian filter is then constructed, centered about the maximum fre-
quency, and applied to the Gabor-transformed function. This produces a
vector (gauss filtered transformed ) that has been processed in both physical
space (using the Gabor filter) and in frequency space (using the second Gaus-
sian filter). The effect of this processing is that the predominant frequency
at the time tau(i) has been identified and isolated. The vector is then added
to (Sgt spec), and this process is repeated for all 110 time values in tau, pro-
ducing a matrix containing information about the predominant frequencies
as a function of time.
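One iteration of this process can be sketched in Python/NumPy. This is an illustration of the approach rather than the project's MATLAB code; the filter widths and names are assumptions:

```python
import numpy as np

def predominant_frequency_slice(S, t, k, tau_i, a=100.0, b=0.1, kmax_bins=1800):
    """Gabor-filter the signal about tau_i, transform to k-space, locate the
    highest-amplitude frequency on a restricted range, and isolate it with a
    second Gaussian filter centered at that frequency."""
    g = np.exp(-a * (t - tau_i) ** 2)          # Gaussian Gabor filter centered at tau_i
    Sgt = np.fft.fft(S * g)                    # localized signal converted to k-space
    idx = np.argmax(np.abs(Sgt[:kmax_bins]))   # index of the predominant frequency
    k_filter = np.exp(-b * (k - k[idx]) ** 2)  # second Gaussian about the peak frequency
    return np.abs(Sgt) * k_filter, k[idx]      # filtered slice and peak frequency (Hz)

# Example: a 5 Hz tone sampled at 1000 Hz for 1 second, windowed at tau_i = 0.5 s.
t = np.linspace(0, 1, 1000, endpoint=False)
k = np.fft.fftfreq(1000, d=1 / 1000)
S = np.sin(2 * np.pi * 5 * t)
row, peak = predominant_frequency_slice(S, t, k, tau_i=0.5, kmax_bins=500)
```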

The second section (lines 49-201) also includes code for visualizing
both the process and results of the Gabor transform. The subplot function
is used to display the S1 signal as it is processed at time t = tau(42) = 4.1
seconds. This produces a figure containing four graphs: 1) the raw signal
and Gabor Gaussian filter in physical space, 2) the signal after application of
the Gabor Gaussian filter, 3) the Fourier transform of the filtered signal and
the max-frequency Gaussian filter, and 4) the final, fully-processed signal
in frequency space. This figure is shown below. The second section also
contains code for visualizing the full Sgt spec matrix for each component
using the pcolor function. This produces four spectrograms, which contain
information about the predominant frequencies over time.

Figure 1: Visualization of the Gabor transform process for S1 at t = tau(42) = 4.1
seconds. From top to bottom: 1) original signal and Gabor Gaussian filter, 2) signal
after application of the Gabor Gaussian filter, 3) filtered signal after conversion to
frequency space and the frequency Gaussian filter, and 4) frequency-filtered signal.

In the third section of the code (lines 223-343), the entire sound clip
is processed in frequency space (without time resolution) to produce signatures
associated with the bass/drums and with the guitar. Appropriate discrete time
and frequency domains are initialized, and for each instrument, the sound signal
is transformed using the fft function. In frequency space, a simple threshold is
applied. This threshold function returns a value of 1 for frequencies within the
range of the desired instrument(s) (40 to 300 Hz for bass/drums and 300 to 514 Hz for
guitar), and a value of zero for frequencies outside that range. After multiplying
by the threshold, the processed signal is returned to physical space using the ifft
function. This produces a sound signature consisting of the frequencies associated
with the desired instrument(s). In this section of the code, the subplot function
is used to graph both the signal filtering process (see the figure below) and to
produce graphs of both the bass/drum and guitar signatures. These graphs are
shown in the "Results" section.
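The thresholding workflow just described can be sketched in Python/NumPy. This is an illustration using a synthetic signal and names of our own choosing, not the project's MATLAB code:

```python
import numpy as np

def isolate_band(S, Fs, f_lo, f_hi):
    """Transform to frequency space, zero every frequency outside [f_lo, f_hi],
    and return to physical space with the inverse FFT."""
    k = np.fft.fftfreq(len(S), d=1 / Fs)                   # frequency (Hz) of each FFT bin
    threshold = (np.abs(k) >= f_lo) & (np.abs(k) <= f_hi)  # 1 inside the band, 0 outside
    return np.real(np.fft.ifft(np.fft.fft(S) * threshold))

# Example: a 100 Hz tone (bass/drum range) mixed with a 400 Hz tone (guitar range).
Fs = 2000
t = np.arange(Fs) / Fs     # one second of samples
S = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 400 * t)
bass = isolate_band(S, Fs, 40, 300)      # recovers only the 100 Hz component
guitar = isolate_band(S, Fs, 300, 514)   # recovers only the 400 Hz component
```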

Figure 2: Visualization of the process for isolating bass/drum frequencies using
thresholding in frequency space. From top to bottom: 1) original signal, 2)
Fourier-transformed signal and threshold, 3) signal after applying threshold, and
4) filtered signal after converting back to physical space with ifft.

4 Results

Figure 3: Spectrogram for the first quarter of the sound clip (A1), displaying
intensity vs. frequency and time. A bright spot indicates a set of high intensity
frequencies associated with a particular time.

The first portion of the sound clip includes only a few instruments, and this is
reflected in the spectrogram of A1. The predominant frequencies lie almost entirely
in the vicinity of 260 Hz, which is near the upper limit of the bass/drum
range. These predominant frequencies are also distributed regularly in time, with
each high-intensity cluster separated by approximately 1.8 seconds. These clusters
correspond to the beats in the sound clip, which, as the spectrogram suggests, are
evenly spaced and distinct from one another. As more instruments are added,
as seen in the following spectrograms, the individual beats will become harder to
distinguish.

Figure 4: Spectrogram for the second quarter of the sound clip (A2), displaying
intensity vs. frequency and time. A bright spot indicates a set of high intensity
frequencies associated with a particular time.

In the second portion of the sound clip, the music becomes more complex. The
"frequency clusters" are still separated by approximately 1.8 seconds, but the
predominant frequencies within each cluster are more variable than in the spectrogram
for A1. This indicates that the bass and drums are producing a greater range of
notes during this section of the sound clip. Additionally, toward the end of the clip
(approximately t = 10.5), there is a beat cluster centered around 360 Hz, which
is well within the guitar range. This indicates that the guitar is the predominant
instrument in this cluster.

Figure 5: Spectrogram for the third quarter of the sound clip (A3), displaying
intensity vs. frequency and time. A bright spot indicates a set of high intensity
frequencies associated with a particular time.

In the third portion of the sound clip, the music continues to increase in com-
plexity. The frequency clusters are no longer distinct and well-defined, which
indicates that additional notes have been inserted in the gaps between clusters.
As shown in the spectrogram, the standard 250-270 Hz bass/drum frequencies
are often interrupted by short clusters of high-frequency guitar notes or lower fre-
quency bass/drum notes. Toward the end of this section of the sound clip (t = 7
through t = 10.5), the gaps between clusters have almost completely disappeared,
and there is a near-continuous stream of high-intensity notes.

Figure 6: Spectrogram for the fourth quarter of the sound clip (A4), displaying
intensity vs. frequency and time. A bright spot indicates a set of high intensity
frequencies associated with a particular time.

In the final portion of the sound clip, nearly every t-value is associated with
a high-intensity frequency, indicating that the music no longer has any "quiet
moments" between beats. Instead, there is a stream of bass/drum notes, ranging
from 250 to 300 Hz, which is occasionally interrupted by high-frequency guitar
notes between 300 and 500 Hz. The high-frequency notes are less common than
the lower frequency notes, which indicates that the bass and drums are still the
predominant instruments in the sound clip. Additionally, the high notes have a
larger range of frequencies, which indicates that the guitar is producing a greater
range of notes than the bass/drums.

Figure 7: Signal intensity vs. time for original signature (top) and signature
after transforming to frequency space, applying threshold to isolate bass/drum
frequencies, and transforming back to physical space (bottom).

As can be seen in the graph above, the signature associated with the bass/drum
frequencies closely resembles the original signature. The bass/drum signature has
a lower intensity than the original signature, which makes sense because the thresh-
olding in Fourier space eliminated the contributions of any frequencies higher than
300 Hz or lower than 40 Hz. Another interesting difference is that the gaps between
the major beats are much flatter in the bass/drum signature, which indicates that
the majority of notes between the major beats are a result of other instruments,
such as the guitar. Finally, the beats in the bass/drum signature become more
difficult to distinguish at later time points, indicating that the music is acquir-
ing more layers as time increases. This agrees with the results obtained from the
spectrograms.

Figure 8: Signal intensity vs. time for original signature (top) and signature after
transforming to frequency space, applying threshold to isolate guitar frequencies,
and transforming back to physical space (bottom).

As can be seen in the graph above, the signature associated with the guitar
frequencies is much lower in intensity than the original signature. This is under-
standable, as the previous graph indicated that the bass/drum frequencies were
the predominant components of the sound clip. The guitar notes don’t line up
as well with the major beats of the original signature, which provides evidence
that the guitar significantly contributes to the audio between the major beats. As
with the bass/drum signature, the notes from the guitar become more difficult to
distinguish as time increases, reflecting the rising complexity that was observed in
the spectrograms.

5 Conclusion
This report outlined the mathematical basis and computational implementation
of the Gabor transform, and it discussed the advantage of the method over tra-
ditional Fourier analysis, namely its ability to offer increased resolution in time.
The report also provided an explanation of the MATLAB code used to apply the
Gabor transform to a sound clip, as well as an explanation of code used to isolate
frequencies associated with particular instruments. The results of this code were
then discussed. The spectrograms produced from the Gabor transform offered in-
sight into the predominant frequencies and increasing complexity of the audio clip
over time. The bass/drum beats were found to be the major component of the
audio, with guitar notes interspersed throughout and becoming more common as
time increased. These conclusions were supported by the signatures obtained from
frequency thresholding, which isolated the contributions from the bass/drums and
from the guitar.
As someone who knows next to nothing about music, this project was a great
opportunity to learn about the methods by which musical signatures can be de-
composed. I also learned which general frequencies are associated with which
instruments, and I was able to practice interpreting spectrograms. The Gabor
transform proved to be a useful technique, but it is not perfect. The primary
drawback of the Gabor transform is that the frequency data is not as accurate
as the frequency data obtained from the standard Fourier transform, but in this
case, the increased time resolution more than compensated for this disadvantage.
Probably the biggest issue with this technique was the loss of data when isolating
about the maximum frequency after transforming the Gabor-filtered signature. If,
at a certain time point, there had been two frequency regions with approximately
equal intensity, only one region would be selected by the filter, and the spectro-
gram, therefore, would not show the second region. As a result, some conclusions
made about the music based on the spectrograms may not accurately reflect the
original sound clip.
I did not initially recognize the song, but the Shazam application on my phone
was able to identify it as "I'm Shipping Up to Boston" by Dropkick Murphys.

Acknowledgment
I would like to thank Michael Kupperman, one of the TAs for this course, for providing
helpful advice on how to properly format the spectrograms for A1 through A4.
