
ECE421 Digital Communication

Huffman and Shannon-Fano Coding

Lecturer: Dr. Reham Samir


Lossless and Lossy Compression

- Lossless compression refers to encoding data more efficiently so that it occupies fewer bits or bytes, but in such a way that the original data can be reconstructed, bit-for-bit, when the data is decompressed.

- Lossy compression techniques achieve compression by discarding some of the original data.

- Lossless compression techniques produce an exact duplicate of the original data but cannot achieve high levels of compression.

- Example: an RGB color image with an original size of 9.9 megabytes can only be reduced to 6.5 megabytes using the lossless PNG format.
Huffman Coding

- Huffman coding is a source coding method for creating instantaneous codes with minimum average code length; for this reason it is called a compact code.

- No other code for the same alphabet can have a lower expected length than the code constructed by the Huffman algorithm.
Huffman Coding

- It is a lossless data compression technique that generates variable-length codes for different symbols.

- It uses the frequency/probability of the symbols to generate the codes.

- The length of the code for a symbol is inversely related to its frequency of occurrence: the more often a symbol occurs in the original data, the shorter the binary string used to represent it in the compressed data.

- No codeword is a prefix of another codeword, so the code is prefix-free and can be decoded instantaneously.
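A minimal sketch of why the prefix-free property allows instantaneous decoding (the helper name and the example code table below are illustrative, not taken from the slides):

    def decode(bits: str, code: dict) -> list:
        """Decode a bit string with a prefix-free code by scanning left to right.
        Because no codeword is a prefix of another, the first codeword that
        matches is always the correct one, so no look-ahead is needed."""
        inverse = {codeword: symbol for symbol, codeword in code.items()}
        decoded, current = [], ""
        for bit in bits:
            current += bit
            if current in inverse:      # a complete codeword has been read
                decoded.append(inverse[current])
                current = ""
        return decoded

    # Example with a prefix-free code: "100010" splits unambiguously into 1 | 00 | 010
    print(decode("100010", {"i": "1", "s": "00", "p": "010", "m": "011"}))   # ['i', 's', 'p']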


Huffman Coding

- Given:
  - A set of symbol probabilities [Pi], i = 1, 2, 3, …, N, where N is the number of symbols.
  - D, the base of representation: D = 2 for c = {0, 1} (binary code); D = 3 for c = {0, 1, 2} (ternary code).

- The number of Huffman stages = (N − D)/(D − 1).
  - If this number is not an integer, add dummy symbols until it is (see the sketch below).
  - The dummy symbols have probability 0 and are inserted only to fill the tree.
  - At each stage, the number of symbols is reduced by D − 1.
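A minimal sketch of this bookkeeping, assuming the convention above (the helper names are illustrative):

    def dummies_needed(n_symbols: int, d: int) -> int:
        # Number of zero-probability dummy symbols required so that
        # (N - D) is an exact multiple of (D - 1).  For D = 2 this is always 0.
        return (-(n_symbols - d)) % (d - 1)

    def num_stages(n_symbols: int, d: int) -> int:
        # Number of merging stages once dummies have been added.
        n = n_symbols + dummies_needed(n_symbols, d)
        return (n - d) // (d - 1)

    print(num_stages(5, 2))   # 3 stages (binary example below)
    print(num_stages(6, 3))   # 2 stages after adding 1 dummy (ternary example below)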


Huffman Coding

- Huffman steps (a Python sketch of the whole procedure is given after this list):

  1. Sort the symbols from the largest probability to the smallest.

  2. Add the probabilities of the D smallest symbols together into one compound symbol.

  3. Repeat steps 1 and 2 until only D symbols remain.

  4. Assign the digits of c to those D symbols (e.g. for c = {0, 1}, assign 0 to the higher branch and 1 to the lower branch).

  5. Work backwards through the compound symbols: expand each one and append one more digit of c to each of its components, until every original symbol has a codeword.
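The following is a minimal Python sketch of these steps, assuming the conventions above (D-ary code alphabet, zero-probability dummies). It merges all the way down to a single root, which is equivalent to stopping at D symbols and then working backwards; the exact codewords depend on how ties are broken, but the codeword lengths, and hence the average length, do not.

    import heapq
    from itertools import count

    def huffman_code(probs, symbols=None, d=2):
        """Build a D-ary Huffman code for the given probabilities.
        Returns a dict mapping each symbol to its codeword (a string of
        digits from {0, ..., d-1}).  Dummy symbols are added internally
        and discarded from the result."""
        n = len(probs)
        if symbols is None:
            symbols = list(range(1, n + 1))          # symbols 1..N by default
        tie = count()                                # tie-breaker so nodes are never compared
        heap = [(p, next(tie), i) for i, p in enumerate(probs)]
        for _ in range((-(n - d)) % (d - 1)):        # zero-probability dummies
            heap.append((0.0, next(tie), None))
        heapq.heapify(heap)

        # Repeatedly merge the d least probable nodes into one compound node.
        while len(heap) > 1:
            children = [heapq.heappop(heap) for _ in range(min(d, len(heap)))]
            children.sort(key=lambda c: c[0], reverse=True)   # digit 0 -> most probable branch
            heapq.heappush(heap, (sum(c[0] for c in children), next(tie),
                                  [c[2] for c in children]))

        code = {}
        def assign(node, prefix):
            if isinstance(node, list):               # compound node: recurse into children
                for digit, child in enumerate(node):
                    assign(child, prefix + str(digit))
            elif node is not None:                   # real symbol (None marks a dummy)
                code[symbols[node]] = prefix
        assign(heap[0][2], "")
        return code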
Huffman Coding

- Example: Consider a random variable X taking values in the set X = {1, 2, 3, 4, 5} with probabilities 0.25, 0.25, 0.2, 0.15, 0.15, respectively. Create a compact code, then calculate the average length of the code and the entropy. Use the binary code c = {0, 1}.
Huffman Coding

- Sol: The number of stages = (5 − 2)/(2 − 1) = 3 stages (an integer, so no dummy symbol is needed).

- The resulting codeword lengths are (2, 2, 2, 3, 3), so this code has average length:
  L = 2(0.25) + 2(0.25) + 2(0.2) + 3(0.15) + 3(0.15) = 2.3 bits

- Entropy: H(X) = −Σ Pi log2 Pi ≈ 2.285 bits
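As a quick check with the huffman_code sketch above (the exact codewords depend on tie-breaking, but the lengths do not):

    import math

    probs = [0.25, 0.25, 0.2, 0.15, 0.15]
    code = huffman_code(probs)                      # codeword lengths (2, 2, 2, 3, 3)
    avg_len = sum(p * len(code[i + 1]) for i, p in enumerate(probs))   # 2.3 bits
    entropy = -sum(p * math.log2(p) for p in probs)                    # ~2.285 bits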
Huffman Coding

- Example: Repeat the last example using the ternary code c = {0, 1, 2}.

- Sol: The number of stages = (5 − 3)/(3 − 1) = 1 stage (an integer, so no dummy symbol is needed).

- This code has an average length of 1.5 ternary digits.

- Entropy: H(X) = 1.442 ternary digits.
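Using the same sketch with d = 3:

    probs = [0.25, 0.25, 0.2, 0.15, 0.15]
    code3 = huffman_code(probs, d=3)                # codeword lengths (1, 1, 2, 2, 2)
    avg3 = sum(p * len(code3[i + 1]) for i, p in enumerate(probs))   # 1.5 ternary digits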


Huffman Coding

- Example: Consider a random variable X taking values in the set X = {1, 2, 3, 4, 5, 6} with probabilities 0.25, 0.25, 0.2, 0.1, 0.1, 0.1, respectively. Create a compact code, then calculate the average length of the code. Use c = {0, 1, 2}.

- Sol:
  - In this example N = 6 and D = 3, so (N − D)/(D − 1) = 3/2 is not an integer, and we need to add a dummy symbol.
  - Then N = 7 and D = 3, so (N − D)/(D − 1) = 2 stages.


Huffman Coding

- Sol: The resulting codeword lengths are (1, 1, 2, 2, 3, 3) once the dummy symbol is discarded, so this code has an average length of
  L = 1(0.25) + 1(0.25) + 2(0.2) + 2(0.1) + 3(0.1) + 3(0.1) = 1.7 ternary digits.

Huffman Coding

- Example: Using Huffman coding, calculate the average length of the code required for encoding the message ‘mississippi’. Use the binary code c = {0, 1}.
Huffman Coding

- Sol: The symbols are i, s, p, m with probabilities 4/11, 4/11, 2/11, 1/11. The number of stages = (4 − 2)/(2 − 1) = 2 stages (an integer).


  Symbol   P      Code   |   P      Code   |   P      Code
  i        4/11   1      |   4/11   1      |   7/11   0
  s        4/11   00     |   4/11   00     |   4/11   1
  p        2/11   010    |   3/11   01     |
  m        1/11   011    |                 |

- The average length of the code =
  1(4/11) + 2(4/11) + 3(2/11) + 3(1/11) = 21/11 ≈ 1.909 bits
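The same result can be checked with the huffman_code sketch above (tie-breaking may swap which symbol receives the single-bit codeword, but the average length is 21/11 either way):

    from collections import Counter

    message = "mississippi"
    counts = Counter(message)                       # m: 1, i: 4, s: 4, p: 2
    symbols = list(counts)
    probs = [counts[s] / len(message) for s in symbols]
    code = huffman_code(probs, symbols=symbols)
    avg = sum(counts[s] / len(message) * len(code[s]) for s in symbols)   # 21/11 ~ 1.909 bits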


Huffman Coding

- Note that: depending on where one places the merged probabilities, the Huffman coding procedure can result in different codeword lengths.

- Example: Consider a random variable X with a distribution such that the Huffman coding procedure results in codeword lengths of (2, 2, 2, 2) or (1, 2, 3, 3) with the binary code c = {0, 1}. Prove it.
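The distribution used on the original slide is not reproduced here, but a standard distribution with exactly this property is p = (1/3, 1/3, 1/4, 1/12). The first merge combines 1/4 + 1/12 = 1/3, which ties with the two original 1/3 probabilities. If the next merge combines the two original symbols, the resulting codeword lengths are (2, 2, 2, 2); if it instead combines the merged node with one of the original symbols, the lengths are (1, 2, 3, 3). Both codes are optimal, and both have the same average length of 2 bits.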
Huffman Coding

- Disadvantages:
  - Huffman coding requires two passes over the data: one to build a statistical model and a second to encode, so it is a relatively slow process. This in turn means that lossless encoding techniques that use Huffman coding are noticeably slower than other techniques when reading or writing files.

  - Another disadvantage of Huffman coding is that the binary strings or codes in the encoded data all have different lengths. This makes it difficult for the decoding software to determine when it has reached the last bit of data, and to detect whether the encoded data has been corrupted.
Shannon-Fano-Elias (S-F-E) Coding

- It is a lossless coding scheme used in digital communication.

- Compared to Huffman coding, the Shannon-Fano-Elias coding method is simpler.

- S-F-E steps

  1. Sort the symbols.

  2. Calculate the cumulative probability.
     - We will use the cumulative distribution function to allot the codewords.
     - We can take X = {1, 2, …, N}. Assume that p(x) > 0 for every x.
     - The cumulative distribution function F(x) is defined as:
       F(x) = Σ_{a ≤ x} p(a)

Shannon-Fano-Elias (S-F-E) Coding

- S-F-E steps (continued; a Python sketch of the full procedure follows this list)

  3. Calculate the modified cumulative distribution function:
     F̄(x) = Σ_{a < x} p(a) + ½ p(x)
     where F̄(x) denotes the sum of the probabilities of all symbols less than x plus half the probability of the symbol x itself.

  4. Write the modified cumulative distribution F̄(x) in binary.

  5. Find the length of the codeword:
     l(x) = ⌈log2 (1/p(x))⌉ + 1
     A codeword of l(x) bits suffices to describe F̄(x).

  6. Generate the codeword: take F̄(x) rounded off (truncated) to its first l(x) bits.
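A minimal Python sketch of these steps, assuming the symbols are supplied in the order in which the cumulative probabilities are accumulated (the helper name is illustrative):

    import math

    def sfe_code(probs, symbols=None):
        """Shannon-Fano-Elias code: for each symbol, take the first
        l(x) = ceil(log2(1/p(x))) + 1 bits of the binary expansion of the
        modified cumulative distribution F_bar(x)."""
        if symbols is None:
            symbols = list(range(1, len(probs) + 1))
        code, cumulative = {}, 0.0
        for sym, p in zip(symbols, probs):
            f_bar = cumulative + p / 2                 # F_bar(x) = F(x-1) + p(x)/2
            length = math.ceil(math.log2(1 / p)) + 1   # codeword length l(x)
            bits, frac = "", f_bar
            for _ in range(length):                    # binary expansion of F_bar(x)
                frac *= 2
                bits += str(int(frac))
                frac -= int(frac)
            code[sym] = bits
            cumulative += p                            # running F(x)
        return code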


Shannon-Fano-Elias (S-F-E) Coding

- The average codeword length is:
  L = Σ p(x) l(x) = Σ p(x) [⌈log2 (1/p(x))⌉ + 1] < H(X) + 2

- Thus, this coding scheme achieves an average codeword length that is within 2 bits of the entropy.

- Example: Create a Shannon-Fano-Elias code given the following table, then calculate the average codeword length and the entropy.
Shannon-Fano-Elias (S-F-E) Coding

- Sol:

- The average codeword length is 2.75 bits.

- The entropy is 1.75 bits.

- Note that: for the Huffman code, the average codeword length is 1.75 bits. Prove it.
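The slide's table is not reproduced here. The stated numbers (average length 2.75 bits, entropy 1.75 bits, Huffman average 1.75 bits) are consistent with the dyadic distribution p = (0.25, 0.5, 0.125, 0.125) used in the standard textbook version of this example, so, as an illustrative check only:

    probs = [0.25, 0.5, 0.125, 0.125]        # assumed distribution, see note above
    code = sfe_code(probs)                    # {1: '001', 2: '10', 3: '1101', 4: '1111'}
    avg = sum(p * len(code[i + 1]) for i, p in enumerate(probs))   # 2.75 bits
    H = -sum(p * math.log2(p) for p in probs)                      # 1.75 bits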
Shannon-Fano-Elias (S-F-E) Coding

- Example: Create a Shannon-Fano-Elias code given the following table, then calculate the average codeword length.
Shannon-Fano-Elias (S-F-E) Coding

- Sol:

- Note that: we write ⌊F̄(x)⌋_l(x) for F̄(x) rounded off (truncated) to l(x) bits; this truncated value is the codeword.

- The average codeword length is 3.5 bits.

- For comparison, the Huffman code for the same source has an average codeword length of 2.3 bits, so the Shannon-Fano-Elias code is 1.2 bits longer on average than the Huffman code.
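The stated figures (3.5 bits for S-F-E versus 2.3 bits for Huffman) are consistent with the five-symbol source p = (0.25, 0.25, 0.2, 0.15, 0.15) used in the earlier Huffman example, so, assuming that distribution:

    probs = [0.25, 0.25, 0.2, 0.15, 0.15]    # assumed: same source as the earlier Huffman example
    code = sfe_code(probs)                    # codeword lengths (3, 3, 4, 4, 4)
    avg = sum(p * len(code[i + 1]) for i, p in enumerate(probs))   # 3.5 bits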
Shannon-Fano-Elias (S-F-E) Coding

- Note that: the average codeword length of a Huffman code is never longer, and is usually shorter, than the average codeword length of a Shannon-Fano-Elias code.
