INTRODUCTION TO THE
SHORT-TIME FOURIER TRANSFORM (STFT)
Richard M. Stern
18-491 lecture
April 22, 2020
Department of Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213
Slide 1 ECE and LTI Robust Speech Group
Why consider short-time Fourier transforms?
Conventional DTFT sums over all time:
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}$$
An example: “Welcome to DSP-I”
The DTFT averages frequency components over time
– (from the creation of the universe until ???)
“Welcome to DSP-I” in time and frequency
Why we want the STFT …
We are more interested in how the frequency components of
real sounds like speech and music vary over time
Example: the spectrogram of “Welcome to DSP-I”
[Figure: spectrogram; vertical axis Frequency, 0 to 5000; horizontal axis Time, 0 to 1.2]
The direct (Fourier transform) approach to STFTs
Multiply the time function by a sliding window, and take
the DTFT of the product:
$$X[n,\omega] = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}$$
Comments:
– Note that m is a dummy variable and that the window is time-reversed
» Notation is consistent with chapter by Nawab and Quatieri in book edited by
Lim and Oppenheim; OSPY notation is a little different
– Results are plotted as a vector function of n, which is called the index of
the analysis frame
– Windows most commonly used are Hamming, rectangular, and
exponential
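The definition above can be sketched directly in code. This is a minimal illustration, assuming the form X[n,ω] = Σ x[m] w[n–m] e^{–jωm} used in the Nawab and Quatieri chapter, with x and w stored as finite lists that are zero elsewhere. The function name `stft_point` and the test signal are illustrative choices, not part of the lecture.

```python
import cmath

def stft_point(x, w, n, omega):
    """Evaluate X[n, omega] = sum over m of x[m] * w[n - m] * exp(-j * omega * m).

    x and w are finite lists, assumed to be zero outside their index ranges.
    """
    total = 0j
    for m in range(len(x)):
        lag = n - m                      # index into the time-reversed window
        if 0 <= lag < len(w):
            total += x[m] * w[lag] * cmath.exp(-1j * omega * m)
    return total

# Complex exponential at 0.3 rad/sample, rectangular window of length 8
x = [cmath.exp(1j * 0.3 * m) for m in range(50)]
w = [1.0] * 8
value = stft_point(x, w, n=20, omega=0.3)
```

At the tone's own frequency every term of the sum equals 1, so `value` equals the window length, 8, up to rounding.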
An example with exponential windowing
Impact of window size and shape
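The DTFT (taken over m) of the shifted, time-reversed window w[n–m] is
$$\sum_{m=-\infty}^{\infty} w[n-m]\, e^{-j\omega m}$$
Letting l = n–m, so that m = n–l, we obtain
$$\sum_{l=-\infty}^{\infty} w[l]\, e^{-j\omega (n-l)} = e^{-j\omega n}\, W(e^{-j\omega})$$
where W(e^{jω}) is the DTFT of w[n]. Hence …
$$X[n,\omega] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\theta})\, W(e^{-j(\omega-\theta)})\, e^{-j(\omega-\theta) n}\, d\theta$$
The STFT can be thought of as the circular convolution in
frequency of the DTFT of x[m] with the DTFT of w[n–m]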
Effect of window duration
The window duration mediates the tradeoff between resolution
in time and frequency:
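– Short-duration window: good resolution in time, poor resolution in frequency
– Long-duration window: good resolution in frequency, poor resolution in time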
Best choice of window duration depends on the application
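One way to see the frequency side of this tradeoff is to measure how much of a window's spectrum remains at a small frequency offset from its peak. The sketch below assumes a symmetric Hamming window and an offset of 0.05π radians. The helper names `hamming` and `relative_response` and the two window lengths are illustrative choices.

```python
import cmath
import math

def hamming(L):
    """Symmetric L-point Hamming window."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (L - 1)) for i in range(L)]

def relative_response(w, delta):
    """Return |W(e^{j delta})| / |W(e^{j0})| for the window w."""
    dc = sum(w)
    off = sum(w[i] * cmath.exp(-1j * delta * i) for i in range(len(w)))
    return abs(off) / abs(dc)

delta = 0.05 * math.pi
short_leak = relative_response(hamming(16), delta)    # offset lies inside the wide main lobe
long_leak = relative_response(hamming(256), delta)    # offset lies out in the sidelobes
```

With the 16-point window a component 0.05π away is barely attenuated, so two such components blur together. With the 256-point window it is strongly suppressed, at the cost of averaging over 16 times as many samples in time.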
Can the STFT be inverted?
Yes, but ….
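Consider the STFT as the transform of the windowed time
function:
$$x[m]\, w[n-m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X[n,\omega]\, e^{j\omega m}\, d\omega$$
For n=m we can write
$$x[n]\, w[0] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X[n,\omega]\, e^{j\omega n}\, d\omega$$
Or, of course
$$x[n] = \frac{1}{2\pi\, w[0]}\int_{-\pi}^{\pi} X[n,\omega]\, e^{j\omega n}\, d\omega$$
So the only absolute constraint for inversion is
$$w[0] \neq 0$$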
The discrete STFT
Normally we would like the STFT to be discrete in frequency as
well as time (for practical reasons)
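We use
$$X[n,k] = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j 2\pi k m/N}$$
which is the STFT evaluated at
$$\omega_k = \frac{2\pi k}{N}, \quad k = 0, 1, \ldots, N-1$$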
Summary: the Fourier transform
implementation of the STFT
The Fourier transform implementation of the STFT:
– Window input function
– Take Fourier transform
– Repeat, after shifting window
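The three steps above can be sketched as a loop over analysis frames. This is a rough illustration, assuming a causal window of length L, N frequency samples at ω_k = 2πk/N, and a first frame placed at n = L–1 so that every frame lies fully inside the signal. The name `stft_columns` and the `hop` parameter are illustrative, not from the lecture.

```python
import cmath
import math

def stft_columns(x, w, N, hop):
    """Return a list of (n, column) pairs, where
    column[k] = X[n, k] = sum over m of x[m] * w[n - m] * exp(-j * 2*pi*k*m / N)."""
    L = len(w)
    columns = []
    for n in range(L - 1, len(x), hop):            # repeat, after shifting the window
        # Window the input: keep (m, x[m] * w[n - m]) where the window is nonzero
        frame = [(m, x[m] * w[n - m]) for m in range(n - L + 1, n + 1)]
        # Take the Fourier transform of the windowed frame at N frequencies
        column = [
            sum(v * cmath.exp(-2j * math.pi * k * m / N) for m, v in frame)
            for k in range(N)
        ]
        columns.append((n, column))
    return columns

# Constant input, abutting rectangular windows of length 4
x = [1.0] * 12
columns = stft_columns(x, w=[1.0] * 4, N=4, hop=4)
```

For this constant input each column has the value 4 at k = 0 and (up to rounding) zero at the other frequencies, and the frames fall at n = 3, 7, and 11.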
There are other ways of computing the STFT!
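Again, the STFT equation is
$$X[n,\omega] = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}$$
Rearranging the terms, we obtain the convolution
$$X[n,\omega] = \sum_{m=-\infty}^{\infty} \left(x[m]\, e^{-j\omega m}\right) w[n-m] = \left(x[n]\, e^{-j\omega n}\right) * w[n]$$
This can be expressed as the lowpass implementation of the
STFT: multiply x[n] by e^{–jωn}, then filter the product with w[n]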
The lowpass implementation of the STFT
Note that the frequency response of practical windows w[n] is
almost invariably that of a lowpass filter
The lowpass implementation translates the spectrum of x[n] to
the left by ω radians and passes the result through a lowpass filter
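A minimal sketch of the lowpass implementation for one fixed frequency follows, assuming x starts at index 0 and the window is causal. The function name `stft_row_lowpass`, the 16-point Hamming window, and the test signal are illustrative choices.

```python
import cmath
import math

def stft_row_lowpass(x, w, omega):
    """Return X[n, omega] for n = 0 .. len(x)-1 as (x[n] * e^{-j omega n}) convolved with w[n]."""
    # Shift the spectrum of x to the left by omega, so that omega lands at zero frequency
    shifted = [x[m] * cmath.exp(-1j * omega * m) for m in range(len(x))]
    L = len(w)
    # Filter with the window, which acts as a lowpass impulse response
    return [
        sum(shifted[m] * w[n - m] for m in range(max(0, n - L + 1), n + 1))
        for n in range(len(x))
    ]

w = [0.54 - 0.46 * math.cos(2 * math.pi * i / 15) for i in range(16)]  # 16-point Hamming
x = [math.cos(0.4 * m) for m in range(40)]
row = stft_row_lowpass(x, w, omega=0.4)
```

Each entry `row[n]` agrees with the direct sum over m of x[m] w[n–m] e^{–jωm}, which is the sense in which this produces one row of the STFT at a time.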
The Hamming window as a lowpass filter
The width of the main lobe of a Hamming window is 8π/M
We will think of it as if it were an ideal LPF with the same
bandwidth
[Figure: spectrum of a Hamming window, M = 40, alongside the approximating ideal rectangular spectrum; single-sided BW is 4π/M]
Also, the bandpass implementation of the STFT
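The original STFT equation remains
$$X[n,\omega] = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}$$
Pre-multiplying by $e^{-j\omega n}$ and post-multiplying by $e^{j\omega n}$ (whose product is 1) produces
$$X[n,\omega] = e^{-j\omega n} \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{j\omega (n-m)}$$
Which can be expressed as the bandpass implementation of
the STFT:
$$X[n,\omega] = e^{-j\omega n} \left( x[n] * \left( w[n]\, e^{j\omega n} \right) \right)$$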
The bandpass implementation of the STFT
The bandpass implementation can be thought of as passing
the signal through a (single-channel) bandpass filter and then
shifting the output down to “baseband”
All three implementations are mathematically equivalent
representations of the STFT
The signal at the output of the BPF has the same magnitude as
X[n,k] but different phase
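The bandpass view can be sketched the same way for a single frequency. This is an illustration under the assumptions that x starts at index 0 and the window is causal; the name `stft_row_bandpass` and the test signal are illustrative choices.

```python
import cmath
import math

def stft_row_bandpass(x, w, omega):
    """Return (bpf_out, stft_row) for n = 0 .. len(x)-1 at one frequency omega."""
    L = len(w)
    # Bandpass impulse response: the window modulated up to omega
    h = [w[i] * cmath.exp(1j * omega * i) for i in range(L)]
    bpf_out = [
        sum(x[m] * h[n - m] for m in range(max(0, n - L + 1), n + 1))
        for n in range(len(x))
    ]
    # Shift the bandpass output down to baseband
    stft_row = [bpf_out[n] * cmath.exp(-1j * omega * n) for n in range(len(x))]
    return bpf_out, stft_row

w = [0.54 - 0.46 * math.cos(2 * math.pi * i / 15) for i in range(16)]  # 16-point Hamming
x = [math.cos(0.4 * m) for m in range(40)]
bpf_out, stft_row = stft_row_bandpass(x, w, omega=0.4)
```

The final multiplication by e^{–jωn} has unit magnitude, so `bpf_out[n]` and `stft_row[n]` differ only in phase, as the slide states.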
Some additional comments on implementations
In the Fourier transform implementation we develop the STFT
on a column-by-column (or time frame by time frame) basis
In the LP and BP implementations we work on a row-by-row (or
frequency-by-frequency) basis
Because the STFT is lowpass in nature, it can be
downsampled. The downsampling ratio depends on the size
and shape of the window.
Reconstructing the time function
Two major methods used:
– Filterbank summation (FBS), based on LP and BP implementations
– Overlap-add (OLA), based on the Fourier transform implementation
Reconstructing the time function using FBS
Filterbank summation:
• Multiply each channel X[n,k] by e^{jω_k n}, where ω_k = 2πk/N and k = 0, 1, …, N–1 (that is, by e^{j0n}, e^{j2πn/N}, …, e^{j2π(N–1)n/N})
• Add channels together and multiply by a constant K to obtain y[n]
• This will work if all filters add to a constant in frequency
$$y[n] = K \sum_{k=0}^{N-1} X[n,k]\, e^{j\omega_k n}$$
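A small numerical sketch of filterbank summation follows. It assumes a causal window no longer than N with w[0] ≠ 0; under those assumptions the channel sum equals N·w[0]·x[n], so the constant is taken as K = 1/(N·w[0]). That value of K and the name `fbs_reconstruct` are assumptions of this sketch, not taken from the slide.

```python
import cmath
import math

def fbs_reconstruct(x, w, N):
    """Analyze x into N channels X[n, k], then resynthesize y[n] = K * sum_k X[n, k] e^{j w_k n}."""
    L = len(w)
    if L > N or w[0] == 0:
        raise ValueError("this sketch assumes len(w) <= N and w[0] != 0")
    K = 1.0 / (N * w[0])                 # assumed normalizing constant
    y = []
    for n in range(len(x)):
        total = 0j
        for k in range(N):
            omega_k = 2 * math.pi * k / N
            # Channel k of the STFT at time n
            X_nk = sum(
                x[m] * w[n - m] * cmath.exp(-1j * omega_k * m)
                for m in range(max(0, n - L + 1), n + 1)
            )
            total += X_nk * cmath.exp(1j * omega_k * n)   # multiply channel by e^{j w_k n}
        y.append(K * total)              # add channels, multiply by the constant
    return y

w = [0.54 - 0.46 * math.cos(2 * math.pi * i / 7) for i in range(8)]   # 8-point Hamming
x = [math.sin(0.7 * m) + 0.25 * m for m in range(20)]
y = fbs_reconstruct(x, w, N=8)
```

Here `y` matches `x` to within rounding error, with a negligible imaginary part.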
The overlap-add (OLA) method of reconstruction
Procedure:
– Compute the IDTFT for each column of the STFT
– Add the IDTFTs together at the original window locations
The OLA resynthesis approach will work if all of the windows
add up to a constant. Two (of many) solutions:
– Abutting rectangular windows
– Hamming windows spaced by 50% of their length
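The OLA procedure can be sketched with a naive DFT pair. Several assumptions are made here: the Hamming window is the "periodic" form (denominator L rather than L–1), for which copies spaced by L/2 sum to exactly 1.08; L is even; each frame is indexed locally and the window is applied without time reversal, which does not affect the overlap-add sum. All function names are illustrative.

```python
import cmath
import math

def dft(v):
    N = len(v)
    return [sum(v[m] * cmath.exp(-2j * math.pi * k * m / N) for m in range(N)) for k in range(N)]

def idft(V):
    N = len(V)
    return [sum(V[k] * cmath.exp(2j * math.pi * k * m / N) for k in range(N)) / N for m in range(N)]

def periodic_hamming(L):
    # Assumption: periodic form, so that w[i] + w[i + L/2] = 1.08 for every i
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / L) for i in range(L)]

def ola_roundtrip(x, L):
    """Analyze x with half-overlapped Hamming frames, then resynthesize by overlap-add."""
    hop = L // 2
    w = periodic_hamming(L)
    # Zero-pad so that every sample of x lies under exactly two windows
    padded = [0.0] * hop + list(x) + [0.0] * L
    y = [0j] * len(padded)
    for start in range(0, len(padded) - L + 1, hop):
        column = dft([padded[start + i] * w[i] for i in range(L)])   # one STFT column
        piece = idft(column)                                         # inverse transform of the column
        for i in range(L):
            y[start + i] += piece[i]                                 # add at the window's location
    window_sum = w[0] + w[hop]           # the constant the overlapped windows add up to
    return [(y[hop + i] / window_sum).real for i in range(len(x))]

x = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(40)]
y = ola_roundtrip(x, L=8)
```

Because the overlapped windows add to a constant, dividing the overlap-added output by that constant returns the original samples to within rounding error.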
How many numbers do we need to keep?
The answer depends on the method used for analysis and
synthesis.
For the Fourier transform STFT analysis with OLA resynthesis:
– Need at least N samples in frequency for windows of length N (as is
always true for DFTs)
– The analysis frames can be separated by N samples for rectangular
windows or N/2 samples for Hamming windows
– This means that the total number of STFT coefficients per second
needed will be N·Fs/N = Fs for rectangular windows or N·Fs/(N/2) = 2Fs for
Hamming windows
Hence, the STFT requires the same or double the number of
numbers in the original waveform. (And these numbers are
complex!) We accept this for the benefits that STFTs provide
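The count above is simple arithmetic: N coefficients per frame times Fs/hop frames per second. The sketch below uses illustrative values (Fs = 16000 samples per second, N = 400) that are not from the lecture, and the function name is hypothetical.

```python
def stft_coefficients_per_second(fs, n_freq, hop):
    """Complex STFT coefficients per second: n_freq per frame, fs / hop frames per second."""
    return n_freq * fs / hop

fs = 16000        # illustrative sampling rate
N = 400           # illustrative window length and number of frequency samples

rectangular = stft_coefficients_per_second(fs, N, hop=N)        # abutting frames
hamming = stft_coefficients_per_second(fs, N, hop=N // 2)       # frames spaced by N/2
```

With these values `rectangular` is 16000.0 (equal to Fs) and `hamming` is 32000.0 (equal to 2Fs).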
Summary
Short-time Fourier transforms enable us to analyze how
frequency components evolve over time. The most
straightforward approach is to window the time function and
compute the DFT
The duration of the window mediates temporal versus spectral
resolution
The original waveform can be resynthesized from the STFT
representation
The number of numbers needed for the representation is
somewhat greater, but that is a small price to pay for the ability
to analyze and manipulate the input.