Source Codes
3.1 INTRODUCTION
In Chapter 2, it was discussed that both source and channel coding are essential for error-free transmission over a communication channel
(Figure 2.1). The task of the source encoder is to transform the source
output into a sequence of binary digits (bits) called the information
sequence. If the source is a continuous source, it involves analog-to-
digital (A/D) conversion. An ideal source encoder should have the
following properties:
1. The average bit rate required for representation of the source output should
be minimized by reducing the redundancy of the information source.
2. The source output can be reconstructed from the information sequence
without any ambiguity.
The average code word length L is defined as

L = Σ_{j=1}^{m} n_j P(x_j)        (3.3)

where n_j is the number of bits in the code word assigned to symbol x_j. It is the average number of bits per source symbol in the source coding process, and it should be minimum for efficient transmission. The source coding theorem sets the lower bound

L ≥ H(X)        (3.4)
xj P(xj) Code
x1 0.8 0
x2 0.2 1
The entropy is H(X) = −0.8 log2 0.8 − 0.2 log2 0.2 ≈ 0.72 bits/symbol.
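These quantities can also be checked numerically. The following minimal sketch (in Python, using the probabilities and the 1-bit code words from the table above) computes the entropy H(X) and the average code word length L of Eq. (3.3):

```python
import math

# Symbol probabilities and code word lengths from the table above
probs = [0.8, 0.2]      # P(x1), P(x2)
lengths = [1, 1]        # the code words "0" and "1" are each 1 bit long

# Entropy H(X) = -sum P(xj) log2 P(xj), in bits/symbol
H = -sum(p * math.log2(p) for p in probs)

# Average code word length L = sum nj P(xj)   (Eq. 3.3)
L = sum(n * p for n, p in zip(lengths, probs))

print(f"H(X) = {H:.4f} bits/symbol")   # approximately 0.7219
print(f"L    = {L:.4f} bits/symbol")   # 1.0, consistent with L >= H(X) in Eq. (3.4)
```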
A. Fixed-length Codes If the code word length for a code is fixed, the
code is called a fixed-length code. A fixed-length code assigns a fixed number of bits to the source symbols, irrespective of their statistics of appearance. A typical example of this type of code is the ASCII code, for which all source symbols (A to Z, a to z, 0 to 9, punctuation marks, commas, etc.) have a 7-bit code word.
Let us consider a DMS having source alphabet {x1, x2, ..., xm}. If m is a
power of 2, the number of bits required for unique coding is log2 m. When m is not a power of 2, the number of bits required is ⌈log2 m⌉, i.e., ⌊log2 m⌋ + 1.
xj Code Word
x1 00
x2 01
x3 10
x4 11
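The rule above is easy to verify for any alphabet size. A small illustrative sketch:

```python
import math

def fixed_length_bits(m: int) -> int:
    """Bits needed to assign each of m symbols a distinct fixed-length code word."""
    return math.ceil(math.log2(m))

print(fixed_length_bits(4))    # 2 bits, as in the table above (x1-x4 -> 00, 01, 10, 11)
print(fixed_length_bits(5))    # 3 bits, since 5 is not a power of 2
print(fixed_length_bits(128))  # 7 bits, e.g. the ASCII character set
```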
Symbol    Code 1    Code 2
A         00        0
B         01        1
C         10        00
D         11        01
Symbol    Code
A         0
B         10
C         110
D         1110
Example 3.2: Consider Table 3.5 where a source of size 4 has been
encoded in binary codes with 0 and 1. Identify different codes.
(Note that code 5 does not satisfy the prefix-free property, and still it is
uniquely decodable since the bit 0 indicates the beginning of each code
word.)
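The prefix-free property can be tested mechanically by comparing every pair of code words. A minimal sketch; the second code word list below is only illustrative of a code that, like code 5, begins every word with 0 and is therefore not prefix-free:

```python
def is_prefix_free(codewords):
    """Return True if no code word is a prefix of any other code word."""
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "1110"]))  # True: the prefix code listed above
print(is_prefix_free(["0", "01", "011", "0111"]))  # False: "0" is a prefix of the other words
```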
Example 3.3: Consider Table 3.6 illustrating two binary codes having
four symbols. Compare their efficiency.
The entropy is
The entropy is
The code efficiency is
Example 3.4: Verify that L ≥ H(X), where L and H(X) are the average
code word length per symbol and the source entropy, respectively.
Example 3.5: Consider a DMS with four source symbols encoded with
four different binary codes as shown in Table 3.7. Determine which of these codes satisfy the Kraft inequality and which of them are uniquely decodable.
Solution:
1. For code 1: n1 = n2 = n3 = n4 = 2, so Σj 2^(−nj) = 4 × 2^(−2) = 1 ≤ 1.
Hence, all codes except code 2 satisfy the Kraft inequality.
2. Codes 1 and 4 are prefix codes; therefore, they are uniquely decodable.
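The Kraft inequality check used in this solution can be written out directly. A minimal sketch; the lengths for code 1 follow the solution above, while the second set is illustrative of a failing case:

```python
def kraft_sum(lengths):
    """Kraft sum K = sum of 2**(-nj); K <= 1 is necessary for a uniquely decodable code."""
    return sum(2 ** -n for n in lengths)

# Code 1 of Table 3.7: all four code words are 2 bits long
print(kraft_sum([2, 2, 2, 2]))   # 1.0  -> Kraft inequality satisfied
# An illustrative set of lengths that violates the inequality
print(kraft_sum([1, 1, 2, 2]))   # 1.5  -> no uniquely decodable code exists with these lengths
```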
An image file format is a standard way to organize and store image data.
It specifies how the data is arranged and which type of compression
technique (if any) is used. An image container is akin to a file format but
deals with multiple types of image data. Image compression
standards specify the procedures for compressing and decompressing
images. Table 3.8 provides a list of the image compression standards, file
formats, and containers presently used.
Table 3.8 (partial): the listed entries include TIFF, MPEG-4 Part 10 (Advanced Video Coding), AVS (Audio Video Standard), HDV (High-Definition Video), M-JPEG (Motion JPEG), QuickTime, and VC-1 (or WMV 9).
The conventional digital format for these signals is the Pulse Code
Modulation (PCM). Earlier, the compact disc (CD) quality stereo audio
was used as a standard for digital audio representation having sampling
frequency 44.1 kHz and 16 bits/sample for each of the two stereo
channels. Thus, the stereo net bit rate required is 2 × 16 × 44.1 kHz ≈ 1.41
Mbps. However, the CD needs a significant overhead (extra bits) for
synchronization and error correction, resulting in a 49-bit representation
of each 16-bit audio sample. Hence, the total stereo bit rate requirement
is 1.41 × 49/16 = 4.32 Mbps.
Any voice coder, regardless of the algorithm it exploits, will have to make
trade-offs between these attributes.
Solution:
2.
3.9 HUFFMAN CODING
Huffman coding produces a prefix code with the lowest possible average code word length for a given set of symbol probabilities. Thus, it is an optimal code with the highest efficiency (equivalently, the lowest redundancy). Hence, it is also known as the minimum redundancy code or optimum code.
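The merging procedure behind Huffman coding (repeatedly combining the two least probable groups) can be sketched compactly. The following is a minimal illustration, not the book's construction tables; the symbol names and probabilities are placeholders chosen so that the probabilities are negative powers of two:

```python
import heapq

def huffman_code(probabilities):
    """Build a binary Huffman code for a dict {symbol: probability}."""
    # Heap entries: (group probability, tie-breaker, {symbol: partial code word})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # least probable group
        p1, _, group1 = heapq.heappop(heap)   # next least probable group
        merged = {s: "0" + c for s, c in group0.items()}        # prepend 0 to one group
        merged.update({s: "1" + c for s, c in group1.items()})  # and 1 to the other
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"x1": 0.5, "x2": 0.25, "x3": 0.125, "x4": 0.125}
code = huffman_code(probs)
print(code)  # one valid assignment, e.g. {'x1': '0', 'x2': '10', 'x3': '110', 'x4': '111'}
print(sum(probs[s] * len(w) for s, w in code.items()))  # 1.75 bits/symbol = H(X) for this source
```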
Example 3.8: A DMS X has seven symbols x1, x2, x3, x4, x5, x6,
and x7 with
respectively.
3.10 ARITHMETIC CODING
It has already been shown that Huffman codes attain an average length equal to the source entropy only when the probabilities of the source symbols are negative powers of two.
This condition of probability is not always valid in practical situations. A
more efficient way to match the code word lengths to the symbol
probabilities is implemented by using arithmetic coding. No one-to-one
correspondence between source symbols and code words exists in this
coding scheme; instead, an entire sequence of source symbols (message)
is assigned a single code word. The arithmetic code word itself defines an
interval of real numbers between 0 and 1. If the number of symbols in
the message increases, the interval used to represent it becomes
narrower. As a result, the number of information units (say, bits)
required to represent the interval becomes larger. Each symbol in the
message reduces the interval in accordance with its probability of
occurrence. The more likely symbols reduce the range by less, and
therefore add fewer bits to the message.
We first divide the interval [0, 1) into four intervals proportional to the
probabilities of occurrence of the symbols. The symbol A is thus
associated with subinterval [0, 0.2). B, C, and D correspond to [0.2, 0.4),
[0.4, 0.8), and [0.8, 1.0), respectively. Since A is the first symbol of the message being coded, the interval is narrowed to [0, 0.2). Now, this
range is expanded to the full height of the figure with its end points
labelled as 0 and 0.2 and subdivided in accordance with the original
source symbol probabilities. The next symbol B of the message now
corresponds to [0.04, 0.08). We repeat the process to find the intervals
for the subsequent symbols. The third symbol C further narrows the
range to [0.056, 0.072). The fourth symbol C corresponds to [0.0624, 0.0688). The final message symbol D narrows the subinterval to [0.06752, 0.0688). Any number within this range (say, 0.0685) can be
used to represent the message.
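The interval narrowing described above can be reproduced step by step. A minimal sketch in Python that follows the message A B C C D with the symbol intervals used in this example:

```python
# Cumulative intervals of the four symbols, as in the example above
intervals = {"A": (0.0, 0.2), "B": (0.2, 0.4), "C": (0.4, 0.8), "D": (0.8, 1.0)}

def narrow(message):
    """Return the final [low, high) interval that represents the whole message."""
    low, high = 0.0, 1.0
    for sym in message:
        width = high - low
        s_low, s_high = intervals[sym]
        low, high = low + width * s_low, low + width * s_high
        print(f"after {sym}: [{low:.5f}, {high:.5f})")
    return low, high

narrow("ABCCD")
# after A: [0.00000, 0.20000)
# after B: [0.04000, 0.08000)
# after C: [0.05600, 0.07200)
# after C: [0.06240, 0.06880)
# after D: [0.06752, 0.06880)
```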
3.11 LEMPEL-ZIV-WELCH CODING
Example 3.10: Encode and decode the following text message using
LZW coding:
Value Character
32 Space
98 b
105 i
110 n
116 t
121 y
Dictionary entries 256 and 257 are reserved for the clear dictionary and
end of transmission commands, respectively. During the encoding and decoding processes, new dictionary entries are created for all phrases present in the text that are not yet in the dictionary.
The encoding algorithm is as follows.
Accumulate characters of the message until the string does not match
any dictionary entry. Then define this string as a new entry, but send the
entry corresponding to the string without the last character, which will
be used as the first character of the next string to match.
In the given text message, the first character is 'i', and the string consisting of just that character is already present in the dictionary. So the next character is added, and the accumulated string becomes 'it'. This string is not in the dictionary. At this point, the code for 'i' is sent and 'it' is added to the dictionary at the next available entry, i.e., 258. The accumulated string is reset to just the last character, which was not sent, so it is 't'. Now, the next character is added; hence, the accumulated string becomes 'tt', which is not in the dictionary. The process repeats.
Output the character string whose code is transmitted. For each code
transmission, add a new dictionary entry as the previous string plus the
first character of the string just received. It is to be noted that the coder
and decoder create the dictionary on the fly; the dictionary therefore
does not need to be explicitly transmitted, and the coder deals with the
text in a single pass.
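The single-pass behaviour just described can be captured in a short sketch. The initial dictionary and the reserved codes 256 and 257 follow the table above; the sample message at the end is purely illustrative and is not the message of Example 3.10:

```python
def lzw_encode(text):
    """Single-pass LZW encoder: emits dictionary indices and grows the dictionary on the fly."""
    dictionary = {chr(i): i for i in range(256)}   # single-character ASCII entries
    next_code = 258                                # 256 and 257 are reserved (clear / end of transmission)
    string = ""
    output = []
    for ch in text:
        if string + ch in dictionary:              # keep accumulating while the string matches an entry
            string += ch
        else:
            output.append(dictionary[string])      # send the longest matching entry
            dictionary[string + ch] = next_code    # define the new phrase
            next_code += 1
            string = ch                            # restart from the character that was not sent
    if string:
        output.append(dictionary[string])
    return output

print(lzw_encode("itty bitty bit bin"))   # illustrative text; repeated phrases reuse dictionary entries
```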
As seen from Table 3.14, we sent eighteen 8-bit characters (144 bits) in
fourteen 9-bit transmissions (126 bits). It is a saving of 12.5% for such a
short text message. In practice, larger text files often compress by a
factor of 2, and drawings by even more.
RLE was developed in the 1950s and became, along with its 2-D
extensions, the standard compression technique in facsimile (FAX)
coding. FAX is a two-colour (black and white) image which is
predominantly white. If these images are sampled for conversion into
digital data, many horizontal lines are found to be entirely white (long
runs of 0s). Moreover, whether a given pixel is black or white, the probability that the next pixel will be the same colour is also very high. The code used in a fax machine is actually a combination of a Huffman code and a run-length code. The coding of run-lengths is also used in CCITT, JBIG2, JPEG, M-JPEG, MPEG-1/2/4, BMP, etc.
11111111111111110000000000000000000011
Find the run-length code and its compression ratio.
Solution: The stream can be represented as sixteen 1s, twenty 0s and two 1s, i.e., (16, 1), (20, 0), (2, 1). Since the maximum number of repetitions is 20, which can be represented with 5 bits, we can encode the bit stream as (10000, 1), (10100, 0), (00010, 1). The encoded stream therefore takes 3 × 6 = 18 bits against the original 38 bits, giving a compression ratio of 38/18 ≈ 2.11.
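The fixed-width run-length encoding used in this solution can be sketched as follows; the 5-bit run-length field matches the solution above:

```python
from itertools import groupby

def rle_encode(bits, count_bits=5):
    """Encode a bit string as (count, value) pairs, each count written with a fixed number of bits."""
    pairs = [(len(list(group)), value) for value, group in groupby(bits)]
    encoded = "".join(format(count, f"0{count_bits}b") + value for count, value in pairs)
    return pairs, encoded

stream = "1" * 16 + "0" * 20 + "1" * 2
pairs, encoded = rle_encode(stream)
print(pairs)                          # [(16, '1'), (20, '0'), (2, '1')]
print(encoded)                        # 100001 101000 000101 (shown here with spaces removed)
print(len(stream) / len(encoded))     # compression ratio = 38 / 18 ≈ 2.11
```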
A. MPEG-1 In the MPEG-1 standard, out of a total bit rate of 1.5 Mbps
for CD quality multimedia storage, 1.2 Mbps is provided to video and 256
kbps is allocated to two-channel audio. It finds applications in web
movies, MP3 audio, video CD, etc.
The human auditory system (the inner ear) is fairly complicated. Results
of numerous psychoacoustic tests reveal that the human auditory system performs short-term critical-band analysis and can be modelled as a bank of band-pass filters with overlapping frequencies. The power spectrum is not on a linear frequency scale, and the bandwidths are of the order of 50 to 100 Hz for signals below 500 Hz and up to 5000 Hz at higher frequencies. Such frequency bands of the auditory system are called critical bands. Twenty-six critical bands covering frequencies of up to 24 kHz are taken into account.
It is observed that the ear is less sensitive to low level sound when there
is a higher level sound at a nearby frequency. When this occurs, the low
level audio signal becomes either less audible or inaudible. This
phenomenon is known as masking. The stronger signal that masks the
weaker signal is called the masker, and the weaker one that is masked is known as the maskee. It is also found that the masking is largest in the critical band within which the masker is present, and it is also slightly effective in the neighbouring bands.
In Figure 3.1, the 1-kHz signal acts as a masker. The masking threshold
(solid line) falls off sharply as we go away from the masker frequency.
The slope of the masking threshold is found to be steeper towards the
lower frequencies. Hence, it can be concluded that the lower frequencies
are not masked to the extent that the higher frequencies are masked. In
the above diagram, the three solid bars represent the maskee frequencies
and their respective SPLs are well below the masking threshold. The
dotted curve represents quiet threshold in the absence of any masker.
The quiet threshold has a lower value in the frequency range from 500
Hz to 5 kHz of the audio spectrum.
Figure 3.2 shows a masking threshold curve. The masking signal appears
at a frequency fm. The SMR, the signal-to-noise ratio (SNR) and the
MNR for a particular frequency f corresponding to a noise level have also
been shown in the figure. It is evident that
SMR(f) = SNR(f) − MNR(f)        (3.7)
So far we have considered only one masker. If more than one masker is present, then each masker has its own masking threshold, and
a global masking threshold is evaluated that describes just noticeable
distortion as a function of frequency.
An efficient audio source coding algorithm must satisfy the following two
conditions:
As seen from the figure, Fast Fourier Transform (FFT) of the incoming
PCM audio samples is computed to obtain the complete audio spectrum,
from which the tonal components of masking signals can be determined.
Using this, a global masking threshold and also the SMR over the entire audio spectrum are evaluated. The dynamic bit allocator uses the SMR
information while encoding the bit stream. A coding scheme is
called perceptually transparent if the quantization noise is below the
global masking threshold. The perceptually transparent encoding
process will produce the decoded output indistinguishable from the
input.
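As a rough illustration of this idea only, the following sketch compares per-band quantization noise against a global masking threshold; all numbers are invented for illustration and are not taken from any standard:

```python
# Hypothetical per-band levels in dB SPL (purely illustrative values)
masking_threshold = [42.0, 35.5, 30.0, 28.0, 33.0]   # global masking threshold per band
quantization_noise = [40.0, 34.0, 31.5, 25.0, 30.0]  # noise produced by a chosen bit allocation

def perceptually_transparent(noise, threshold):
    """Transparent coding requires the noise to stay below the masking threshold in every band."""
    return all(n < t for n, t in zip(noise, threshold))

print(perceptually_transparent(quantization_noise, masking_threshold))  # False: the third band exceeds its threshold
```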
However, our knowledge in computing the global masking threshold is
limited as the perceptual model considers only simple and stationary
maskers, and it can sometimes fail in practical situations. To address this problem, a sufficient safety margin should be maintained.
3.15 DOLBY
Dolby Digital breaks the entire audio spectrum into narrow bands of
frequency using mathematical models derived from the characteristics of
the ear and then analyzes each band to determine the audibility of those
signals. A greater number of bits is used to represent the more audible signals, which, in turn, increases data efficiency. In determining the audibility of signals,
the system makes use of masking. As mentioned earlier, a low level audio
signal becomes inaudible if a stronger audio signal occurs simultaneously at a nearby frequency. This is
known as masking. By taking advantage of this phenomenon, audio
signals can be encoded much more efficiently than in other coding
systems with comparable audio quality, such as linear PCM. Dolby
Digital is an excellent choice for those systems where high audio quality
is desired, but bandwidth or storage space is limited. This is especially
true for multichannel audio. The compact Dolby Digital bit stream allows
full 5.1-channel audio to take less space than a single channel of linear
PCM audio.
1. Is the segment voiced or unvoiced? (Voiced sounds are usually vowels and
often have high average energy levels. They have very distinct resonant or
formant frequencies. Unvoiced sounds are usually consonants and generally
have less energy. They have higher frequencies than voiced sounds.)
2. What is the pitch of the segment?
3. What parameters are needed to construct a filter that models the vocal tract
for the current segment?
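A highly simplified sketch of the analysis step for one segment is given below: signal energy and zero-crossing rate serve as a crude voiced/unvoiced cue, and autocorrelation gives a crude pitch estimate. All thresholds and parameters are illustrative, and this is not a complete LPC vocoder:

```python
import numpy as np

def analyze_segment(x, fs):
    """Crude per-segment analysis: voiced/unvoiced decision and pitch estimate from autocorrelation."""
    energy = np.mean(x ** 2)
    zero_crossings = np.mean(np.abs(np.diff(np.sign(x)))) / 2
    voiced = energy > 0.01 and zero_crossings < 0.2           # thresholds are illustrative only
    # Autocorrelation pitch search restricted to roughly 50-400 Hz
    corr = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / 400), int(fs / 50)
    pitch_hz = fs / (lo + np.argmax(corr[lo:hi])) if voiced else 0.0
    return voiced, pitch_hz

fs = 8000                                                     # 8 kHz sampling, typical for speech coders
t = np.arange(0, 0.03, 1 / fs)                                # one 30 ms segment
segment = 0.5 * np.sin(2 * np.pi * 120 * t)                   # synthetic "voiced" segment at 120 Hz
print(analyze_segment(segment, fs))                           # (True, approximately 120 Hz)
```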
Solution:
The result indicates that the Kraft inequality is satisfied. The bound on K is K ≤ 1.
Problem 3.2: Show that a code constructed with code word lengths satisfying the condition given in Problem 3.1 will satisfy the following
relation:
H(X) ≤ L < H(X) + 1
where H(X) and L are the source entropy and the average code word
length, respectively.
−log2 Pj ≤ nj < −log2 Pj + 1
Solution:
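A sketch of the standard argument (it may differ in detail from the derivation intended here): multiply each part of the condition above by Pj and sum over all source symbols.

```latex
% Multiply  -\log_2 P_j \le n_j < -\log_2 P_j + 1  by P_j and sum over j:
\sum_{j} P_j\left(-\log_2 P_j\right) \;\le\; \sum_{j} P_j\, n_j \;<\; \sum_{j} P_j\left(-\log_2 P_j\right) + \sum_{j} P_j
% The left-hand sum is H(X), the middle sum is L, and \sum_j P_j = 1, hence
H(X) \;\le\; L \;<\; H(X) + 1
```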
Solution:
Table 3.18 Construction of Huffman Code
0.
1.
2. = Lmin L
3. none of these
Ans. (a)
0.
1.
2.
3.
Ans. (c)
7. The signal-to-mask ratio (SMR), mask to noise ratio (MNR) and signal to
noise ratio (SNR) are related by the formula
(a) SMR(f) = SNR(f) − MNR(f)
(b) SMR(f) = MNR(f) − SNR(f)
(c) SMR(f) = SNR(f) + MNR(f)
(d) none of these
Ans. (a)
10. LPC is a
(a) waveform-following coder
(b) model-based coder
(c) lossless vocoder
(d) none of these
Ans. (b)
REVIEW QUESTIONS
1.
(a) Define the following terms:
(i) average code length
(ii) code efficiency
(iii) code redundancy.
(b) State the source coding theorem.
2. With suitable examples explain the following codes:
(a) fixed-length code
(b) variable-length code
(c) distinct code
(d) uniquely decodable code
(e) prefix-free code
(f) instantaneous code
(g) optimal code.
3. Write short notes on
(a) Shannon-Fano algorithm
(b) Huffman coding.
4.
(a) Write down the advantages of Huffman coding over Shannon-Fano coding.
(b) A discrete memoryless source has seven symbols with probabilities of occurrence 0.05, 0.15, 0.2, 0.05, 0.15, 0.3 and 0.1. Construct the Huffman code and determine
(i) the entropy
(ii) the average code length
(iii) the code efficiency.
5. A discrete memoryless source has five symbols with probabilities of occurrence 0.4, 0.19, 0.16, 0.15 and 0.1. Construct both the Shannon-Fano code and the Huffman code and compare their code efficiencies.
6. With a suitable example explain arithmetic coding. What are the advantages
of the arithmetic coding scheme over Huffman coding?
7. Encode and decode the following text messages using LZW coding:
(a) Dolby digital
(b) Linear predictive coding.