
Huffman Encoding

Sudipta Mahapatra
Resources:
1. www.cis.upenn.edu/~matuszek/cit594-2002/slides/huffman.ppt
2. NPTEL Lecture slides of Prof. Somnath Sen Gupta
3. Introduction to Data Compression, Khalid Sayood
Entropy?
• Entropy is a measure of information content: the
number of bits actually required to store data.
• Entropy is sometimes called a measure of surprise
– A highly predictable sequence contains little actual
information
• Example: 11011011011011011011011011 (what’s next?)
• Example: I didn’t win the lottery this week
– A completely unpredictable sequence of n bits contains
n bits of information
• Example: 01000001110110011010010000 (what’s next?)
• Example: I just won $10 million in the lottery!!!!
– Note that nothing says the information has to have any
“meaning” (whatever that is)
Actual information content
• A partially predictable sequence of n bits carries
less than n bits of information
– Example #1: 111110101111111100101111101100
Blocks of 3: 111 110 101 111 111 100 101 111 101 100
– Example #2: 101111011111110111111011111100
Unequal probabilities: p(1) = 0.75, p(0) = 0.25
– Example #3: "We, the people, in order to form a..."
Unequal character probabilities: e and t are common, j
and q are uncommon
– Example #4: {we, the, people, in, order, to, ...}
Unequal word probabilities: the is very common
Information Theory
• Suppose the event A is a set of outcomes of some
random experiment.
• Let P(A)=Pr[Event A will occur]
• Then, the self-information associated with A is

i(A) = \log_b \frac{1}{P(A)} = -\log_b P(A).

• We see that Pr[event] = high ⇒ amount of information is low, and vice versa.
Information Theory
• The self-information obtained from the occurrence of two independent events A and B is

i(AB) = \log_b \frac{1}{P(AB)} = i(A) + i(B).
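As a quick numeric check of these two definitions, here is a minimal Python sketch (the function name self_information and the example probabilities are ours, chosen purely for illustration):

```python
import math

def self_information(p, base=2):
    """Self-information i(A) = -log_b P(A) of an event with probability p."""
    return -math.log(p, base)

# A likely event carries little information; an unlikely one carries a lot.
print(self_information(0.9))    # ~0.152 bits
print(self_information(0.01))   # ~6.644 bits

# For independent events, self-information adds: i(AB) = i(A) + i(B).
pa, pb = 0.5, 0.25
print(self_information(pa * pb))                      # 3.0 bits
print(self_information(pa) + self_information(pb))    # 3.0 bits
```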
Entropy
• If P(E) is the probability of an event, its information content I(E), also known as its self-information, is measured as

I(E) = \log_b \frac{1}{P(E)} = -\log_b P(E).

• If P(E) = 1, that is, the event always occurs, there is no information associated with it.
• Usually, the base of the logarithm is 2, and the unit of entropy is bits.
Average Self Information
• Consider an alphabet of n symbols a_i, i = 1 to n, with probabilities of occurrence P(a_i).
• If k is the number of source outputs generated (considered to be sufficiently large), the average number of occurrences of symbol a_i is kP(a_i).
• The average self-information obtained from k outputs is then

-k \sum_{i=1}^{n} P(a_i) \log_b P(a_i).

• And the average information per source output for the source z is given by

H(z) = -\sum_{i=1}^{n} P(a_i) \log_b P(a_i).
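A minimal Python sketch of this average (the function name entropy and the example distributions are ours, for illustration only):

```python
import math

def entropy(probs, base=2):
    """H(z) = -sum_i P(a_i) * log_b P(a_i): average information per source output."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin (two equally likely symbols) carries exactly 1 bit per output.
print(entropy([0.5, 0.5]))     # 1.0
# Four equally likely symbols carry log2(4) = 2 bits per output.
print(entropy([0.25] * 4))     # 2.0
```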
Uncertainty and Entropy
• Consider two symbols a1 and a2 with corresponding probabilities P(a1) and P(a2), where P(a2) = 1 - P(a1).
• The entropy is then

H(z) = -P(a1) \log P(a1) - (1 - P(a1)) \log (1 - P(a1)).
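Evaluating this for a few values of P(a1) shows that the uncertainty, and hence the entropy, is largest when the two symbols are equally likely; a small self-contained sketch (the probed values are ours, for illustration):

```python
import math

def binary_entropy(p):
    """H(z) = -p*log2(p) - (1-p)*log2(1-p) for a two-symbol source."""
    if p in (0.0, 1.0):
        return 0.0      # a certain outcome carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"P(a1) = {p}: H(z) = {binary_entropy(p):.3f} bits")
# The maximum, 1 bit, occurs at P(a1) = 0.5; near 0 or 1 the entropy approaches 0.
```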
Example
• Say we have five symbols with probabilities 0.2, 0.1, 0.05, 0.6, and 0.05.
• The source entropy is given by

H(z) = -(0.2 \log 0.2 + 0.1 \log 0.1 + 0.05 \log 0.05 + 0.6 \log 0.6 + 0.05 \log 0.05) \approx 1.67 bits.
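Expanding the sum term by term (all logarithms base 2):

-0.2 \log_2 0.2 \approx 0.464, \quad -0.1 \log_2 0.1 \approx 0.332, \quad -0.05 \log_2 0.05 \approx 0.216 \ (\text{twice}), \quad -0.6 \log_2 0.6 \approx 0.442,

so H(z) \approx 0.464 + 0.332 + 0.216 + 0.216 + 0.442 \approx 1.67 bits.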
Shannon’s Coding Theorem
• For a noise-free channel, the average code word length of a source of symbols using any coding scheme can at best be equal to the source entropy and can never be less than it. The coding is assumed to be lossless.
• If m(z) is the minimum of the average code word lengths obtained over all uniquely decodable coding schemes, then as per this theorem

m(z) \ge H(z), where H(z) is the source entropy.
Coding Efficiency
• The coding efficiency is the ratio of the source entropy to the average length of the code word:

\eta = \frac{H(z)}{L(z)}

• As per Shannon’s theorem,

0 \le \eta \le 1.
Fixed and variable bit widths
• To encode English text, we need 26 lower case
letters, 26 upper case letters, and a handful of
punctuation
• We can get by with 64 characters (6 bits) in all
• Each character is therefore 6 bits wide
• We can do better, provided:
– Some characters are more frequent than others
– Characters may have different bit widths, so that for
example, e uses only one or two bits, while x uses several
– We have a way of decoding the bit stream
• Must tell where each character begins and ends
Huffman Coding: Basic Principles
• Shorter code words are assigned to more probable
symbols and longer code words are assigned to
less probable symbols. (Optimum Code)
• No code word of a symbol is a prefix of another
code word. This makes Huffman coded symbols
uniquely decodable.
• Every source symbol must have a unique code
word assigned to it.
Example Huffman encoding
• A=0
B = 100
C = 1010
D = 1011
R = 11
• ABRACADABRA = 01001101010010110100110
• This is eleven letters in 23 bits
• A fixed-width encoding would require 3 bits for
five different letters, or 33 bits for 11 letters
• Notice that the encoded bit string can be decoded!
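A small Python sketch that applies this code table (the table is taken from the slide; the variable names are ours):

```python
# Code table from the slide.
code = {'A': '0', 'B': '100', 'C': '1010', 'D': '1011', 'R': '11'}

message = "ABRACADABRA"
encoded = ''.join(code[ch] for ch in message)

print(encoded)        # 01001101010010110100110
print(len(encoded))   # 23 bits, versus 11 * 3 = 33 bits at fixed width
```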
Why it works
• In this example, A was the most common letter
• In ABRACADABRA:
– 5 As → code for A is 1 bit long
– 2 Rs → code for R is 2 bits long
– 2 Bs → code for B is 3 bits long
– 1 C → code for C is 4 bits long
– 1 D → code for D is 4 bits long
Note: Codes for the two least frequently occurring
symbols have the same length. Can you prove this?
Moreover, these differ only in the last bit.
Creating a Huffman encoding
• For each encoding unit (letter, in this example),
associate a frequency (number of times it occurs)
– You can also use a percentage or a probability
• Create a binary tree whose children are the two encoding units with the smallest frequencies
– The frequency of the new root is the sum of the frequencies of its leaves
• Repeat this procedure until all the encoding units
are in the binary tree
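A minimal Python sketch of this procedure, assuming the symbol frequencies are given as a dictionary (the helper names and the use of heapq are ours; when frequencies tie, the individual codewords may differ from those shown on the following slides, although the average code length stays the same):

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a Huffman code from {symbol: frequency} by repeatedly
    merging the two lowest-frequency nodes."""
    tie = count()  # tie-breaker so heapq never has to compare tree payloads
    # Heap entries are (frequency, tie, tree); a tree is either a symbol
    # (leaf) or a (left, right) pair (internal node).
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node
            walk(node[0], prefix + "0")   # 0 on the left branch
            walk(node[1], prefix + "1")   # 1 on the right branch
        else:                             # leaf: record the accumulated path
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Frequencies from the example that follows.
print(huffman_code({'A': 40, 'B': 20, 'C': 10, 'D': 10, 'R': 20}))
```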
Example, step I
• Assume that relative frequencies are:
– A: 40
– B: 20
– C: 10
– D: 10
– R: 20

• Smallest numbers are 10 and 10 (C and D), so connect those


Example, step II
• C and D have already been used, and the new node
above them (call it C+D) has value 20
• The smallest values are B, C+D, and R, all of
which have value 20
– Connect any two of these
Example, step III
• The smallest value is R (20), while A and B+C+D both have value 40
• Connect R to either of the others
Example, step IV
• Connect the final two nodes
Example, step V
• Assign 0 to left branches, 1 to right branches
• Each encoding is a path from the root
• A=0
B = 100
C = 1010
D = 1011
R = 11
• Each path terminates at a leaf
• Do you see why encoded strings are decodable?
Unique prefix property
• A=0
B = 100
C = 1010
D = 1011
R = 11
• No bit string is a prefix of any other bit string
• For example, if we added E=01, then A (0) would
be a prefix of E
• Similarly, if we added F=10, then it would be a
prefix of three other encodings (B=100, C=1010,
and D=1011)
• The unique prefix property holds because, in a
binary tree, a leaf is not on a path to any other node
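A short sketch that checks the unique prefix property of any code table (the function name is ours):

```python
def has_unique_prefix_property(codes):
    """Return True if no codeword in the table is a prefix of another."""
    syms = list(codes)
    return not any(s != t and codes[t].startswith(codes[s])
                   for s in syms for t in syms)

good = {'A': '0', 'B': '100', 'C': '1010', 'D': '1011', 'R': '11'}
bad = dict(good, E='01')   # adding E = 01 makes A = 0 a prefix of E

print(has_unique_prefix_property(good))   # True
print(has_unique_prefix_property(bad))    # False
```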
Example
A=[a1, a2, a3, a4, a5]
P(a1) = P(a3) = 0.2, P(a2) = 0.4, P(a4) = P(a5) = 0.1
Determine the corresponding codewords.
Letter Probability Codeword

a2 0.4 c(a2)
a1 0.2 c(a1)
a3 0.2 c(a3)
a4 0.1 c(a4)
a5 0.1 c(a5)

Average length of the code=2.2 bits/symbol


Example

Letter  Probability  Codeword
a2      0.4          1
a1      0.2          01
a3      0.2          000
a4      0.1          0010
a5      0.1          0011

Source: K. Sayood, DC
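Using these probabilities and code lengths, a short check of the average length and of the coding efficiency η = H(z)/L(z) defined earlier (variable names are ours):

```python
import math

probs = {'a2': 0.4, 'a1': 0.2, 'a3': 0.2, 'a4': 0.1, 'a5': 0.1}
codes = {'a2': '1', 'a1': '01', 'a3': '000', 'a4': '0010', 'a5': '0011'}

avg_len = sum(probs[s] * len(codes[s]) for s in probs)      # 2.2 bits/symbol
H = -sum(p * math.log2(p) for p in probs.values())          # ~2.122 bits/symbol
print(avg_len, H, H / avg_len)                              # efficiency ~0.96
```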
Decoding
Step 1: Examine the leftmost bit in the bit stream. If it corresponds to any symbol, add that symbol to the list of decoded symbols and remove the examined bit(s). Else, follow Step 2.
Step 2: Append the next bit from the left to the examined bits and check again. If any symbol results, add it to the decoded stream and go back to Step 1. Else, repeat Step 2.
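A Python sketch of this two-step procedure (the function name is ours; with a prefix-free code table it recovers the original symbol stream):

```python
def huffman_decode(bits, codes):
    """Decode a bit string by growing a candidate codeword one bit at a
    time (Step 2) and emitting a symbol whenever it matches (Step 1)."""
    reverse = {cw: sym for sym, cw in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit                    # append the next bit (Step 2)
        if current in reverse:            # matches a codeword? (Step 1)
            decoded.append(reverse[current])
            current = ""                  # start a fresh codeword
    return ''.join(decoded)

codes = {'A': '0', 'B': '100', 'C': '1010', 'D': '1011', 'R': '11'}
print(huffman_decode("01001101010010110100110", codes))   # ABRACADABRA
```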
Adaptive Huffman Coding
Non-adaptive Huffman coding requires two passes over the data: one to collect symbol statistics and one to encode.
How can you make it adaptive?
Applications: Image (b/w) compression.
Text compression
Audio compression
Practical considerations
• It is not practical to create a Huffman encoding for
a single short string, such as ABRACADABRA
– To decode it, you would need the code table
– If you include the code table with the message, the whole thing is bigger than just the ASCII message
• Huffman encoding is practical if:
– The encoded string is large relative to the code table, OR
– We agree on the code table beforehand
• For example, it’s easy to find a table of letter frequencies for
English (or any other alphabet-based language)
Data compression
• Huffman encoding is a simple example of data
compression: representing data in fewer bits than
it would otherwise need
• A more sophisticated method is GIF (Graphics
Interchange Format) compression, for .gif files
• Another is JPEG (Joint Photographic Experts
Group), for .jpg files
– Unlike the others, JPEG is lossy—it loses information
– Generally OK for photographs (if you don’t compress
them too much), because decompression adds “fake”
data very similar to the original
References
1. Huffman, D. A., “A Method for the Construction of Minimum-Redundancy Codes,” Proc. IRE, vol. 40, no. 9, pp. 1098-1101, 1952.
2. Shannon, C. E., “A Mathematical Theory of Communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379-423, 1948.
