ECE421 Digital Communication
Huffman and Shannon Fano
Coding
Lecturer: Dr. Reham Samir
Lossless and Lossy Compression
n Lossless compression refers to the process of encoding data more efficiently so that it occupies fewer bits or bytes, but in such a way that the original data can be reconstructed, bit for bit, when the data is decompressed.
n Lossy compression techniques achieve compression by discarding
some of the original data.
n Lossless compression techniques produce an exact duplicate of the
original data but cannot achieve high levels of compression.
n Example: an RGB color image with an original size of 9.9 megabytes can only be reduced to 6.5 megabytes using the lossless PNG format.
Huffman Coding
n Huffman coding is a source coding method for creating instantaneous codes with minimum average code length; it is therefore called a compact code.
n Any other code for the same alphabet cannot have a lower expected
length than the code constructed by the algorithm.
Huffman Coding
n It is a lossless data compression technique that generates variable-length codes for different symbols.
n It uses the frequency/probability of the symbols to generate the codes.
n The length of the code for a character is inversely related to the frequency of its occurrence (the more often a symbol occurs in the original data, the shorter the binary string used to represent it in the compressed data).
n No codeword is a prefix of another codeword.
Huffman Coding
n Given
n Set of symbol probabilities [Pi], i = 1, 2, 3, ..., N, where N is the number of symbols.
n D is the base of representation: D = 2 for c = {0, 1} (binary code), D = 3 for c = {0, 1, 2} (ternary code).
n The number of Huffman stages = (N - D)/(D - 1).
n If this number is not an integer, add a dummy symbol (see the sketch after this list).
n The dummy symbols have probability 0 and are inserted only to complete the tree.
n At each stage, the number of symbols is reduced by D - 1.
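A small hypothetical helper (not from the lecture) that carries out this bookkeeping: given N and D, it returns how many zero-probability dummy symbols must be added and how many merge stages follow.

    def huffman_stage_count(num_symbols, d):
        """Return (dummies, stages) for a D-ary Huffman code.

        Dummy symbols of probability 0 are added until (N - D) is a
        multiple of (D - 1); each stage then merges D symbols into one.
        """
        n = num_symbols
        dummies = 0
        while (n - d) % (d - 1) != 0:
            n += 1
            dummies += 1
        return dummies, (n - d) // (d - 1)

    # Examples from the following slides:
    print(huffman_stage_count(5, 2))  # (0, 3)  -> 3 binary stages
    print(huffman_stage_count(5, 3))  # (0, 1)  -> 1 ternary stage
    print(huffman_stage_count(6, 3))  # (1, 2)  -> 1 dummy symbol, 2 ternary stages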
Huffman Coding
n Huffman steps
1. Sort the symbols from the largest probability to the smallest.
2. Combine the D smallest probabilities into one new symbol.
3. Repeat steps 1 and 2 until only D symbols remain.
4. Assign the digits of c to the D symbols (e.g. for c = {0, 1}, assign 0 to the higher branch and 1 to the lower branch).
5. Work backwards through the stages, expanding each combined symbol and appending one more code digit to each of the symbols it was built from (see the sketch below).
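A minimal Python sketch of the binary (D = 2) case of these steps; the helper uses a heap in place of explicit re-sorting and is an illustrative assumption, not the lecturer's implementation.

    import heapq

    def huffman_code(probs):
        """Binary (D = 2) Huffman code for a dict {symbol: probability}."""
        # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            p_low, _, low = heapq.heappop(heap)    # smallest probability (lower branch)
            p_high, _, high = heapq.heappop(heap)  # next smallest (higher branch)
            # Step 4/5: prepend 0 to the higher branch, 1 to the lower branch
            merged = {s: "0" + c for s, c in high.items()}
            merged.update({s: "1" + c for s, c in low.items()})
            heapq.heappush(heap, (p_low + p_high, tie, merged))
            tie += 1
        return heap[0][2]

    # First example on the following slides: P = (0.25, 0.25, 0.2, 0.15, 0.15)
    probs = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}
    code = huffman_code(probs)
    avg = sum(probs[s] * len(code[s]) for s in probs)
    print(code)   # codeword lengths (2, 2, 2, 3, 3)
    print(avg)    # average length = 2.3 bits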
Huffman Coding
n Example: Consider a random variable X taking values in the set X
= {1, 2, 3, 4, 5} with probabilities 0.25, 0.25, 0.2, 0.15, 0.15,
respectively. Create a compact code, then calculate the average code length and the entropy. Use the binary code c = {0, 1}.
Huffman Coding
n Sol: The number of stages = (5 - 2)/(2 - 1) = 3 stages (an integer, so no dummy symbol is needed). The resulting codeword lengths are (2, 2, 2, 3, 3).
n This code has an average length of L = 2.3 bits.
n Entropy: H(X) = 2.286 bits (both worked out below).
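Worked out from the given probabilities and the codeword lengths (2, 2, 2, 3, 3):

    L = 0.25(2) + 0.25(2) + 0.2(2) + 0.15(3) + 0.15(3) = 2.3 bits
    H(X) = -[2(0.25) log2 0.25 + 0.2 log2 0.2 + 2(0.15) log2 0.15]
         = 0.5 + 0.5 + 0.464 + 0.821 ≈ 2.286 bits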
Huffman Coding
n Example: Repeat the last example using the ternary code c = {0, 1, 2}.
n Sol: The number of stages = (5 - 3)/(3 - 1) = 1 stage (an integer, so no dummy symbol is needed).
n This code has an average length of 1.5 ternary digits.
n Entropy: H(X) = 1.442 ternary digits.
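The single stage merges the three smallest probabilities (0.2 + 0.15 + 0.15), giving codeword lengths (1, 1, 2, 2, 2); worked out:

    L = 0.25(1) + 0.25(1) + 0.2(2) + 0.15(2) + 0.15(2) = 1.5 ternary digits
    H(X) = 2.286 / log2 3 = 2.286 / 1.585 ≈ 1.442 ternary digits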
Huffman Coding
n Example: Consider a random variable X taking values in the set X
= {1, 2, 3, 4, 5, 6} with probabilities 0.25, 0.25, 0.2, 0.1, 0.1, 0.1
respectively. Create a compact code, then calculate the average code length. Use the ternary code c = {0, 1, 2}.
n Sol:
n In this example N = 6 and D = 3, so (N - D)/(D - 1) = 3/2 is not an integer; we need to add a dummy symbol of probability 0.
n Then N = 7 and D = 3, so (N - D)/(D - 1) = 2 stages.
Huffman Coding
n Sol:
n This code has an average length of 1.7 ternary digits.
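With the dummy symbol included, the two stages give codeword lengths (1, 1, 2, 3, 3, 2) for the probabilities (0.25, 0.25, 0.2, 0.1, 0.1, 0.1); which of the tied 0.1 symbols receives length 2 depends on the ordering, but the average is the same. Worked out:

    L = 0.25(1) + 0.25(1) + 0.2(2) + 0.1(3) + 0.1(3) + 0.1(2) = 1.7 ternary digits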
Huffman Coding
n Example: Using Huffman coding, calculate the average codeword length required to encode the message ‘mississippi’. Use the binary code c = {0, 1}.
Huffman Coding
n Sol: The number of stages = (4 - 2)/(2 - 1) = 2 stages (an integer, so no dummy symbol is needed).
Symbol   P       Code     Stage 1: P   Code     Stage 2: P   Code
i        4/11    1        4/11         1        7/11         0
s        4/11    00       4/11         00       4/11         1
p        2/11    010      3/11         01
m        1/11    011
n The average length of code = 1(4/11) + 2(4/11) + 3(2/11) + 3(1/11) = 21/11 ≈ 1.909 bits
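A quick check of this result with a short Python snippet (an illustrative helper, not from the lecture): count the letters of ‘mississippi’ and average the codeword lengths taken from the table above.

    from collections import Counter

    freq = Counter("mississippi")               # {'i': 4, 's': 4, 'p': 2, 'm': 1}
    lengths = {"i": 1, "s": 2, "p": 3, "m": 3}  # codeword lengths from the table
    total = sum(freq.values())                  # 11 letters
    avg = sum(freq[ch] * lengths[ch] for ch in freq) / total
    print(avg)                                  # 21/11 ≈ 1.909 bits per symbol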
Huffman Coding
n Note that: depending on where one places a merged probability in the sorted list, the Huffman coding procedure can result in different sets of codeword lengths (with the same average length).
n Example: Consider a random variable X with a distribution
n The Huffman coding procedure results in codeword lengths of (2, 2, 2, 2) or (1, 2, 3, 3) with the binary code c = {0, 1}. Prove this.
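The distribution itself is not reproduced above; a distribution commonly used for this exercise, assumed here purely for illustration, is p = (1/3, 1/3, 1/4, 1/12). Under that assumption both sets of lengths are valid Huffman codes with the same average length:

    L(2, 2, 2, 2) = 2(1/3 + 1/3 + 1/4 + 1/12) = 2 bits
    L(1, 2, 3, 3) = 1(1/3) + 2(1/3) + 3(1/4) + 3(1/12) = 1/3 + 2/3 + 3/4 + 1/4 = 2 bits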
Huffman Coding
n Disadvantages:
n Huffman coding requires two passes: one to build a statistical model of the data and a second to encode it, so it is a relatively slow process. This in turn means that lossless encoding techniques that use Huffman coding are notably slower than other techniques when reading or writing files.
n Another disadvantage of Huffman coding is that the binary strings or codes in the encoded data all have different lengths. This makes it difficult for the decoding software to determine when it has reached the last bit of data, and whether the encoded data has been corrupted.
Shannon-Fano-Elias (S-F-E) Coding
n It is a lossless coding scheme used in digital communication.
n Compared to Huffman encoding, the Shannon-Fano-Elias coding method is simpler.
n S-F-E steps
1. Sort the symbols.
2. Calculate the cumulative probability.
n We will use the cumulative distribution function to allot codewords.
n We can take X = {1, 2, ..., m}. Assume that p(x) > 0 for all x.
n The cumulative distribution function F(x) is defined as:
F(x) = Σ_(a ≤ x) p(a)
Shannon-Fano-Elias (S-F-E) Coding
n S-F-E steps
3. Calculate the modified cumulative distribution function:
F̄(x) = Σ_(a < x) p(a) + p(x)/2
n F̄(x) is the sum of the probabilities of all symbols less than x plus half the probability of the symbol x itself.
4. Write the modified cumulative distribution F̄(x) in binary.
5. Find the length of the codeword:
l(x) = ⌈log2(1/p(x))⌉ + 1
n A codeword of l(x) bits suffices to describe F̄(x).
6. Generate the codeword: take the first l(x) bits of the binary expansion of F̄(x) (a minimal Python sketch of these steps is given below).
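A minimal Python sketch of steps 1-6; the function name and the example distribution are illustrative assumptions, not taken from the slides.

    import math

    def sfe_code(probs):
        """Shannon-Fano-Elias code for a dict {symbol: probability}.

        Symbols are processed in the given (already sorted) order. For each
        symbol the modified CDF F-bar(x) is computed, the codeword length
        l(x) = ceil(log2(1/p(x))) + 1 is chosen, and the codeword is the
        first l(x) bits of the binary expansion of F-bar(x).
        """
        code = {}
        cumulative = 0.0
        for sym, p in probs.items():
            f_bar = cumulative + p / 2                 # step 3: modified CDF
            length = math.ceil(math.log2(1 / p)) + 1   # step 5: codeword length
            bits = ""                                  # steps 4 and 6: binary expansion
            frac = f_bar
            for _ in range(length):
                frac *= 2
                bit = int(frac)
                bits += str(bit)
                frac -= bit
            code[sym] = bits
            cumulative += p
        return code

    # Illustrative distribution (not from the slides):
    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    print(sfe_code(probs))   # {'a': '01', 'b': '101', 'c': '1101', 'd': '1111'}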
Shannon-Fano-Elias (S-F-E) Coding
n The average codeword length is:
L = Σ p(x) l(x) = Σ p(x) [⌈log2(1/p(x))⌉ + 1] < H(X) + 2
n Thus, this coding scheme achieves an average codeword length that is within 2 bits of the entropy (a short derivation follows).
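The bound follows by dropping the ceiling (a standard step filled in here):

    L = Σ p(x) (⌈log2(1/p(x))⌉ + 1) < Σ p(x) (log2(1/p(x)) + 1 + 1) = H(X) + 2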
n Example: Create a Shannon-Fano-Elias code for the source given in the following table, then calculate the average codeword length and the entropy.
Shannon-Fano-Elias (S-F-E) Coding
n Sol:
n The average codeword length is 2.75 bits
n The entropy is 1.75 bits.
n Note that:
n For the Huffman code, the average codeword length is 1.75 bits. Prove this.
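The slide's table is not reproduced here; a distribution consistent with the quoted results, assumed purely for illustration, is p = (0.25, 0.5, 0.125, 0.125). A quick check under that assumption:

    import math

    # Hypothetical distribution (the slide's table is not reproduced)
    p = {"x1": 0.25, "x2": 0.5, "x3": 0.125, "x4": 0.125}

    lengths = {x: math.ceil(math.log2(1 / px)) + 1 for x, px in p.items()}
    avg_len = sum(px * lengths[x] for x, px in p.items())
    entropy = -sum(px * math.log2(px) for px in p.values())
    print(lengths)           # {'x1': 3, 'x2': 2, 'x3': 4, 'x4': 4}
    print(avg_len, entropy)  # 2.75 bits (S-F-E), 1.75 bits (entropy)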
Shannon-Fano-Elias (S-F-E) Coding
n Example: Create a Shannon-Fano-Elias code for the source given in the following table, then calculate the average codeword length.
Shannon-Fano-Elias (S-F-E) Coding
n Sol:
n Note that: we denote by ⌊F̄(x)⌋_l(x) the truncation of F̄(x) to l(x) bits; this truncation is used as the codeword.
n The average codeword length is 3.5 bits.
n Compared with this, the Huffman code has an average codeword length of 2.3 bits; the Shannon-Fano-Elias code is therefore 1.2 bits longer on average than the Huffman code.
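The table for this example is not reproduced; if it reuses the earlier five-symbol distribution (0.25, 0.25, 0.2, 0.15, 0.15), an assumption consistent with the quoted 3.5-bit and 2.3-bit averages, the numbers work out as:

    S-F-E lengths l(x) = ⌈log2(1/p(x))⌉ + 1 = (3, 3, 4, 4, 4)
    L(S-F-E)    = 0.25(3) + 0.25(3) + 0.2(4) + 0.15(4) + 0.15(4) = 3.5 bits
    L(Huffman)  = 0.25(2) + 0.25(2) + 0.2(2) + 0.15(3) + 0.15(3) = 2.3 bits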
Shannon-Fano-Elias (S-F-E) Coding
n Note that:
n The average code length of the Huffman code is never longer than the average code length of the Shannon-Fano-Elias code, since Huffman coding is optimal.