Lecture 3-Huffman Coding
Sudipta Mahapatra
Resources:
1. www.cis.upenn.edu/~matuszek/cit594-2002/slides/
huffman.ppt
2. NPTEL Lecture slides of Prof. Somnath Sen Gupta
3. Introduction to Data Compression, Khalid Sayood
Entropy?
• Entropy is a measure of information content: the
number of bits actually required to store data.
• Entropy is sometimes called a measure of surprise
– A highly predictable sequence contains little actual
information
• Example: 11011011011011011011011011 (what’s next?)
• Example: I didn’t win the lottery this week
– A completely unpredictable sequence of n bits contains
n bits of information
• Example: 01000001110110011010010000 (what’s next?)
• Example: I just won $10 million in the lottery!!!!
– Note that nothing says the information has to have any
“meaning” (whatever that is)
Actual information content
• A partially predictable sequence of n bits carries
less than n bits of information
– Example #1: 111110101111111100101111101100
Blocks of 3: 111 110 101 111 111 100 101 111 101 100
– Example #2: 101111011111110111111011111100
Unequal probabilities: p(1) = 0.75, p(0) = 0.25
– Example #3: "We, the people, in order to form a..."
Unequal character probabilities: e and t are common, j
and q are uncommon
– Example #4: {we, the, people, in, order, to, ...}
Unequal word probabilities: the is very common
Information Theory
• Suppose the event A is a set of outcomes of some
random experiment.
• Let P(A)=Pr[Event A will occur]
• Then, the self information associated with A is
i(A) = log_b [1/P(A)] = -log_b [P(A)]
Example: for a source z with symbol probabilities 0.2, 0.1, 0.05, 0.6 and 0.05,
H(z) = -0.2 log2 0.2 - 0.1 log2 0.1 - 0.05 log2 0.05 - 0.6 log2 0.6 - 0.05 log2 0.05
= 1.67 bits
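As a sanity check on the figure above, the entropy can be computed directly as the expected self-information. A minimal Python sketch, using the probabilities assumed in the example:

import math

# Symbol probabilities of the source z from the example above
probs = [0.2, 0.1, 0.05, 0.6, 0.05]

# Self-information of an outcome with probability p, in bits
def self_information(p):
    return -math.log2(p)

# Entropy is the expected self-information over all symbols
H = sum(p * self_information(p) for p in probs)
print(f"H(z) = {H:.2f} bits")  # H(z) = 1.67 bits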
Shannon’s Coding Theorem
• For a noise-free channel, the average code word length of any lossless (uniquely decodable) coding scheme for a source of symbols can at best be equal to the source entropy and can never be less than it.
• If m(z) is the minimum average code word length achievable over all uniquely decodable coding schemes, then, as per this theorem,
0 ≤ m(z) - H(z) < 1
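For instance (a worked application, not part of the slides): for the source z above with H(z) ≈ 1.67 bits, the bound says the best achievable average code word length satisfies 1.67 ≤ m(z) < 2.67 bits per symbol.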
Fixed and variable bit widths
• To encode English text, we need 26 lower case
letters, 26 upper case letters, and a handful of
punctuation
• We can get by with 64 characters (6 bits) in all
• Each character is therefore 6 bits wide
• We can do better, provided:
– Some characters are more frequent than others
– Characters may have different bit widths, so that for
example, e uses only one or two bits, while x uses several
– We have a way of decoding the bit stream
• Must tell where each character begins and ends
Huffman Coding: Basic Principles
• Shorter code words are assigned to more probable
symbols and longer code words are assigned to
less probable symbols. (Optimum Code)
• No code word of a symbol is a prefix of another
code word. This makes Huffman coded symbols
uniquely decodable.
• Every source symbol must have a unique code
word assigned to it.
Example Huffman encoding
• A=0
B = 100
C = 1010
D = 1011
R = 11
• ABRACADABRA = 01001101010010110100110
• This is eleven letters in 23 bits
• A fixed-width encoding would require 3 bits for
five different letters, or 33 bits for 11 letters
• Notice that the encoded bit string can be decoded!
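A minimal sketch of how such a bit string can be decoded, assuming the code table above: because no code word is a prefix of another, the decoder can simply accumulate bits until they match a code word.

# Code table from the example above
codes = {'A': '0', 'B': '100', 'C': '1010', 'D': '1011', 'R': '11'}
decode_table = {bits: sym for sym, bits in codes.items()}

def decode(bitstring):
    out, buf = [], ''
    for bit in bitstring:
        buf += bit
        # The prefix property guarantees the first match is the right one
        if buf in decode_table:
            out.append(decode_table[buf])
            buf = ''
    return ''.join(out)

print(decode('01001101010010110100110'))  # ABRACADABRA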
Why it works
• In this example, A was the most common letter
• In ABRACADABRA:
– 5 As: code for A is 1 bit long
– 2 Rs: code for R is 2 bits long
– 2 Bs: code for B is 3 bits long
– 1 C: code for C is 4 bits long
– 1 D: code for D is 4 bits long
Note: Codes for the two least frequently occurring
symbols have the same length. Can you prove this?
Moreover, these differ only in the last bit.
Creating a Huffman encoding
• For each encoding unit (letter, in this example),
associate a frequency (number of times it occurs)
– You can also use a percentage or a probability
• Create a binary tree whose two children are the encoding units (or subtrees) with the smallest frequencies
– The frequency of the new root is the sum of the frequencies of its leaves
• Repeat this procedure until all the encoding units
are in the binary tree
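One common way to implement this procedure is with a min-heap of subtrees. The sketch below is an assumed implementation, not taken from the slides; it only tracks code word lengths rather than the full tree.

import heapq

def huffman_code_lengths(freqs):
    # Heap entries: (frequency, tie-breaker, {symbol: depth in its subtree})
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)  # two smallest frequencies
        f2, _, d2 = heapq.heappop(heap)
        # Merging pushes every symbol in both subtrees one level deeper
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

lengths = huffman_code_lengths({'A': 40, 'B': 20, 'C': 10, 'D': 10, 'R': 20})
print(lengths)
# The exact lengths depend on how ties are broken; the code in the
# ABRACADABRA example (1, 2, 3, 4, 4 bits) and the tree produced here
# both average 2.2 bits per symbol.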
Example, step I
• Assume that relative frequencies are:
– A: 40
– B: 20
– C: 10
– D: 10
– R: 20
Symbol   Probability   Codeword
a2       0.4           c(a2)
a1       0.2           c(a1)
a3       0.2           c(a3)
a4       0.1           c(a4)
a5       0.1           c(a5)
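As a worked check (assuming the a_i are simply the A/B/C/D/R frequencies above written as probabilities: a2 = A = 0.4, a1 and a3 = B and R = 0.2, a4 and a5 = C and D = 0.1), one valid Huffman code assigns them lengths 1, 2, 3, 4 and 4 bits, as in the ABRACADABRA example. The average code word length is then
0.4 × 1 + 0.2 × 2 + 0.2 × 3 + 0.1 × 4 + 0.1 × 4 = 2.2 bits/symbol,
which lies within 1 bit of the source entropy of about 2.12 bits, as Shannon’s theorem requires.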