Speech compression-using-gsm

Speech Compression
using
GSM RPE-LTP
Faiza Nawaz
Bisma Hashmi
Mehrin Kiani

2
Introduction to GSM
 The Global System for Mobile Communications is the most
popular standard for mobile phones in the world.
 GSM service is used by over 2 billion people across more than
212 countries and territories.
 The ubiquity of the GSM standard makes international roaming
very common between mobile phone operators.
 GSM differs significantly from its predecessors in that both
signaling and speech channels are digital, so it is considered a
second-generation (2G) mobile phone system.

4
What is Speech?
 Speech Generation:

5
GSM 6.10 Vocoder
 Key principle: mathematical modeling of the human vocal tract,
leading to an efficient compression method for transmitting
speech.
 The term vocoder (a combination of "voice" and "coder") describes
systems tailored for the compression of speech.
 The sampling rate is 8000 samples/s, and the encoded bit stream has
an average bit rate of 13 kbit/s.
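As a quick check of these figures, the arithmetic below compares the raw and coded bit rates. The 13-bit sample resolution and the 20 ms frame length are taken from later slides in this deck; the variable names are only illustrative.

```python
SAMPLE_RATE_HZ = 8000     # sampling rate from the slide
BITS_PER_SAMPLE = 13      # input resolution (stated on a later slide)
CODED_RATE_BPS = 13_000   # 13 kbit/s coded bit stream
FRAME_MS = 20             # frame length (stated on a later slide)

raw_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE          # uncompressed rate
compression_ratio = raw_rate_bps / CODED_RATE_BPS        # raw vs. coded
samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000    # samples in 20 ms
bits_per_frame = CODED_RATE_BPS * FRAME_MS // 1000       # coded bits in 20 ms

print(raw_rate_bps, compression_ratio, samples_per_frame, bits_per_frame)
# prints: 104000 8.0 160 260
```

So each 20 ms frame of 160 samples is reduced to 260 bits, an 8:1 reduction relative to the 104 kbit/s input.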

6
GSM 6.10 Vocoder
 The coding scheme used by the GSM 6.10 vocoder is the Regular Pulse
Excitation - Long Term Prediction - Linear Predictive Coder
(RPE-LTP).
 Vocoder sends three kinds of information to the receiver:
 Voiced or unvoiced signal
 (If it is voiced) The period of the excitation signal
 The parameters of the prediction filter.

7
Linear Predictive Coder (LPC)
 LPC algorithm assumes that each speech sample is a linear
combination of previous samples.
 Speech is sampled, stored and analyzed.
 Coefficients calculated from the samples are transmitted and
processed in the receiver.
 The receiver accurately processes and categorizes voiced and
unvoiced sounds.

8
Regular Pulse Excitation (RPE) Coder
 Determines if the signal is voiced or unvoiced
 Determines the period for voiced sounds, encodes periodicity and
transmits the coefficient
 When the signal changes from voiced to unvoiced, RPE transmits a
code that stops the receiver from generating periodic pulses.
 The receiver then starts generating random pulses to match the
noise-like nature of unvoiced sounds.
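The voiced/unvoiced switch described above can be pictured with a toy excitation generator. This is only a sketch of the idea, not the GSM bit-exact procedure: the function name `excitation`, the unit pulse amplitude, and the uniform noise range are all assumptions made for illustration.

```python
import random

def excitation(num_samples, voiced, period=None, seed=0):
    """Return a toy excitation signal.

    Voiced: one unit pulse every `period` samples (periodic pulse train).
    Unvoiced: uniformly distributed random values (noise-like signal).
    """
    if voiced:
        if not period or period < 1:
            raise ValueError("voiced excitation needs a positive period")
        return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]

voiced_frame = excitation(10, voiced=True, period=4)
unvoiced_frame = excitation(10, voiced=False)
```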

9
GSM Compression Technologies
 Four compression technologies are:
 Full Rate
 Enhanced Full Rate (EFR)
 Adaptive Multi-Rate (AMR)
 Half Rate

10
GSM Full Rate Vocoder Using RPE-LTP
 Described as an RPE-LTP linear predictive coder.
 Models the human vocal tract as a series of cylinders of
different widths.
 By forcing air through these cylinders, speech sounds
can be generated— the LPC coder models this with a
set of simultaneous equations.

11
GSM Full Rate Vocoder Using RPE-LTP
(…contd)
 The input data to the RPE-LTP coder is 20 ms of speech
composed of 160 samples, each with 13-bit resolution.
 The data is first passed through a pre-emphasis filter:
 Enhances high-frequency components of the signal (better
transmission efficiency).
 Also removes any offset on the signal (simplifies computation).
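A floating-point sketch of this pre-processing is shown below. It splits the work into two first-order filters, one for offset removal and one for pre-emphasis. The filter constants (32735/32768 and 28180/32768) are the values I believe GSM 06.10 uses, but treat them, the function names, and the floating-point arithmetic as assumptions; the real coder works in fixed point.

```python
def offset_compensation(samples, alpha=32735 / 32768):
    """Remove DC offset: y[k] = x[k] - x[k-1] + alpha * y[k-1]."""
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + alpha * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out

def pre_emphasis(samples, beta=28180 / 32768):
    """Boost high frequencies: y[k] = x[k] - beta * x[k-1]."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - beta * prev)
        prev = x
    return out

# A constant (pure offset) input decays toward zero after compensation.
flat = offset_compensation([100.0] * 8000)
emphasized = pre_emphasis([1.0, 1.0])
```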

12
LPC Speech Generation
 The model of speech generation can be thought of as air passing
through a set of different size cylinders.

13
Short Term Analysis Stage
 Uses autocorrelation to calculate a set of eight reflection
coefficients.
 Schur recursion is used to efficiently solve the set of
equations resulting from it.
 The parameters are then converted into log-area ratios
(LARs), which can be quantized well in a smaller
number of bits. The LARs are the first eight parameters of the
transmission stream.
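The chain autocorrelation, reflection coefficients, LARs can be sketched as follows. Note that this sketch uses the Levinson-Durbin recursion rather than the Schur recursion named on the slide (both solve the same equations), works in floating point, and uses one common sign convention for the reflection coefficients. The function names and the clamping limit are assumptions.

```python
import math

def autocorrelation(frame, order):
    """acf[k] = sum over n of frame[n] * frame[n - k], for k = 0..order."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(order + 1)]

def reflection_coefficients(acf):
    """Levinson-Durbin recursion; returns one reflection coefficient per order."""
    order = len(acf) - 1
    a = [0.0] * (order + 1)   # predictor coefficients a[1..i]
    err = acf[0]              # prediction error energy
    refl = []
    for i in range(1, order + 1):
        if err <= 0.0:        # silent or perfectly predicted frame
            refl.append(0.0)
            continue
        acc = acf[i] - sum(a[j] * acf[i - j] for j in range(1, i))
        k = acc / err
        refl.append(k)
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return refl

def log_area_ratios(refl, limit=0.9999):
    """LAR = log10((1 + r) / (1 - r)), with r clamped away from +/-1."""
    lars = []
    for r in refl:
        r = max(-limit, min(limit, r))
        lars.append(math.log10((1.0 + r) / (1.0 - r)))
    return lars

# Toy frame: a decaying exponential, so the first coefficient is close to 0.9.
frame = [0.9 ** n for n in range(160)]
refl = reflection_coefficients(autocorrelation(frame, 8))
lars = log_area_ratios(refl)
```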

14
Short Term Analysis Stage (…contd)
 The coded LARs are then decoded back to coefficients
and used to filter the input samples.
 The reason for decoding the LARs is to ensure that the
encoder uses the same information available at the
decoder to perform the filtering.
 An array of weights lpc[P] is computed such that
s[n] ≈ lpc[0]*s[n-1] + lpc[1]*s[n-2] + ... + lpc[P-1]*s[n-P]
(P is usually between 8 and 14; GSM uses 8.)
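The prediction formula on this slide leads directly to the short-term residual: the part of each sample that the weighted sum of previous samples does not explain. The sketch below uses a direct-form filter with the lpc[] weights for clarity; the actual GSM coder is usually described as a lattice filter driven by reflection coefficients, so treat this form and the function name as assumptions.

```python
def short_term_residual(samples, lpc):
    """e[n] = s[n] - (lpc[0]*s[n-1] + ... + lpc[P-1]*s[n-P])."""
    order = len(lpc)
    residual = []
    for n, s in enumerate(samples):
        # Samples before the start of the frame are treated as zero.
        prediction = sum(lpc[k] * samples[n - 1 - k]
                         for k in range(order) if n - 1 - k >= 0)
        residual.append(s - prediction)
    return residual

# A signal that halves every sample is predicted exactly by lpc = [0.5],
# so only the very first sample survives in the residual.
signal = [0.5 ** n for n in range(10)]
residual = short_term_residual(signal, [0.5])
```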

15
Long Term Prediction Stage
 The 160 samples are split into 4 sub-windows of 40
samples each.

16
Long Term Prediction Stage (…contd)
 The long-term predictor produces two parameters for
each sub-window: the lag and the gain.
 The LTP lag describes the source of the copy in time.
 The LTP gain describes the scaling factor.

17
Calculating Lag and Gain
 LAG:
Compute resemblance by correlation and pick the lag that maximizes it.
correlation of x[n] and y[n] at a given lag =
sum over n of x[n]*y[n-lag]
 GAIN:
Maximum correlation divided by the energy of the
reconstructed short-term residual signal.
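A minimal sketch of this search is given below. The lag range of 40 to 120 samples is what I understand GSM full rate to use, but it is an assumption here, as are the function name and the unquantized floating-point gain (the real coder quantizes both parameters).

```python
def ltp_parameters(sub_window, history, min_lag=40, max_lag=120):
    """Return (lag, gain) for one sub-window.

    sub_window: current short-term residual samples (40 in GSM).
    history:    past reconstructed residual samples, most recent last.
    """
    n_hist = len(history)
    if n_hist < max_lag or len(sub_window) > min_lag:
        raise ValueError("history too short or sub-window too long")
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        start = n_hist - lag
        # Sum of products x[n] * y[n - lag].
        corr = sum(x * history[start + i] for i, x in enumerate(sub_window))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    start = n_hist - best_lag
    energy = sum(v * v for v in history[start:start + len(sub_window)])
    gain = best_corr / energy if energy > 0.0 else 0.0
    return best_lag, gain

# Toy data: the sub-window is a half-amplitude copy of what happened 57 samples ago.
pattern = [float((i * 7) % 11 - 5) for i in range(40)]
history = [0.0] * 120
for i, value in enumerate(pattern):
    history[120 - 57 + i] = value
lag, gain = ltp_parameters([0.5 * v for v in pattern], history)
```

With this toy input the search recovers the lag of 57 and a gain of 0.5.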

18
Residual Pulse Encoding
 To remove the long-term predictable signal from
its input, the algorithm then subtracts the 40 samples
selected by the LTP lag, scaled by the LTP gain.
 The residual signal is either weak or random and
consequently cheaper to encode and transmit.

19
Residual Signal (…contd)
 The algorithm down-samples by a factor of three,
discarding two out of three sample values.
 This results in four evenly spaced 13-value subsequences to
choose from, starting with samples 1, 2, 3, and 4.
 The algorithm picks the sequence with the most energy.
 That leaves 13 3-bit sample values and a 6-bit
scaling factor, which turns the PCM encoding into
APCM (adaptive PCM).
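The grid selection and block-adaptive quantization can be sketched as below. Grid offsets 0 to 3 here correspond to "samples 1, 2, 3, and 4" on the slide. The quantizer is a simplified uniform 8-level one, and the scale is kept as a plain float instead of being coded in 6 bits, so the function names and these simplifications are assumptions rather than the standard's exact tables.

```python
def select_rpe_grid(residual40):
    """Pick the every-third-sample subsequence (13 values) with the most energy."""
    best_offset, best_seq, best_energy = 0, [], -1.0
    for offset in range(4):
        seq = residual40[offset:offset + 37:3]   # 13 values per grid
        energy = sum(v * v for v in seq)
        if energy > best_energy:
            best_offset, best_seq, best_energy = offset, seq, energy
    return best_offset, best_seq

def apcm_quantize(seq, levels=8):
    """Scale by the block maximum, then code each value in 3 bits (8 levels)."""
    scale = max(abs(v) for v in seq) or 1.0
    codes = [min(levels - 1, int((v / scale + 1.0) / 2.0 * levels)) for v in seq]
    return scale, codes

def apcm_dequantize(scale, codes, levels=8):
    """Map each code back to the midpoint of its quantization interval."""
    return [((c + 0.5) / levels * 2.0 - 1.0) * scale for c in codes]

# Toy residual whose energy is concentrated on grid offset 2.
residual40 = [0.1] * 40
for i in range(2, 40, 3):
    residual40[i] = 1.0
offset, seq = select_rpe_grid(residual40)
scale, codes = apcm_quantize(seq)
approx = apcm_dequantize(scale, codes)
```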

20
Speech Decoder
 The decoder consists of three parts:
 RPE Decoding
 LTP synthesis filter
 LPC short term synthesis filter

22
Speech Decoder (…contd)
 The algorithm multiplies the 13 3-bit samples by the scaling factor and
expands them back into 40 samples, zero-padding the gaps.
 The resulting residual pulse is fed to the long-term synthesis filter.
 A 40-sample segment is cut from the old estimated short-term residual
signal, scaled by the LTP gain, and added to the incoming pulse.
 The estimated short-term residual signal passes through the short-term
synthesis filter, whose reflection coefficients are calculated by the
LPC module.
 Noise from the excited long-term synthesis filter passes through the
tubes of the simulated vocal tract and emerges as speech.
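The three decoder steps can be sketched in floating point as follows. All function names are hypothetical, the inputs are assumed to be already dequantized, and the short-term synthesis filter is written in direct form with lpc[] weights rather than as the lattice structure driven by reflection coefficients, so this is an illustration of the data flow only.

```python
def rpe_upsample(samples13, grid_offset, length=40):
    """Place the 13 decoded values on their grid; gaps stay zero."""
    out = [0.0] * length
    for i, value in enumerate(samples13):
        out[grid_offset + 3 * i] = value
    return out

def ltp_synthesis(pulse40, history, lag, gain):
    """Add the scaled segment from `lag` samples back; return (new samples, new history)."""
    n = len(history)
    if lag < len(pulse40) or lag > n:
        raise ValueError("lag must lie between the segment length and the history length")
    segment = history[n - lag:n - lag + len(pulse40)]
    new = [p + gain * s for p, s in zip(pulse40, segment)]
    return new, (history + new)[-n:]

def short_term_synthesis(residual, lpc):
    """s[n] = e[n] + lpc[0]*s[n-1] + ... + lpc[P-1]*s[n-P]."""
    out = []
    for n, e in enumerate(residual):
        prediction = sum(lpc[k] * out[n - 1 - k]
                         for k in range(len(lpc)) if n - 1 - k >= 0)
        out.append(e + prediction)
    return out

pulse = rpe_upsample([1.0] * 13, grid_offset=2)
history = [float(v) for v in range(120)]
segment, history = ltp_synthesis([0.0] * 40, history, lag=40, gain=0.5)
speech = short_term_synthesis([1.0, 0.0, 0.0, 0.0], [0.5])
```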
