
Huffman Encoding

Sudipta Mahapatra
Resources:
1. www.cis.upenn.edu/~matuszek/cit594-2002/slides/huffman.ppt
2. NPTEL Lecture slides of Prof. Somnath Sen Gupta
3. Introduction to Data Compression, Khalid Sayood
Entropy?
• Entropy is a measure of information content: the
number of bits actually required to store data.
• Entropy is sometimes called a measure of surprise
– A highly predictable sequence contains little actual
information
• Example: 11011011011011011011011011 (what’s next?)
• Example: I didn’t win the lottery this week
– A completely unpredictable sequence of n bits contains
n bits of information
• Example: 01000001110110011010010000 (what’s next?)
• Example: I just won $10 million in the lottery!!!!
– Note that nothing says the information has to have any
“meaning” (whatever that is)
Actual information content
• A partially predictable sequence of n bits carries
less than n bits of information
– Example #1: 111110101111111100101111101100
Blocks of 3: 111 110 101 111 111 100 101 111 101 100
– Example #2: 101111011111110111111011111100
Unequal probabilities: p(1) = 0.75, p(0) = 0.25
– Example #3: "We, the people, in order to form a..."
Unequal character probabilities: e and t are common, j
and q are uncommon
– Example #4: {we, the, people, in, order, to, ...}
Unequal word probabilities: the is very common
Information Theory
• Suppose the event A is a set of outcomes of some
random experiment.
• Let P(A)=Pr[Event A will occur]
• Then, the self-information associated with A is

i(A) = \log_b \frac{1}{P(A)} = -\log_b P(A).

• We see that Pr[event] = high ⇒ amount of information is low, and vice versa.
Information Theory
• The self-information obtained from the occurrence of two independent events A and B is

i(AB) = \log_b \frac{1}{P(AB)} = i(A) + i(B).
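As a quick numeric check of these two definitions, here is a minimal Python sketch (the function name self_information and the example probabilities are ours, chosen purely for illustration):

```python
import math

def self_information(p, base=2):
    """Self-information i(A) = -log_b P(A) of an event with probability p."""
    return -math.log(p, base)

# A likely event carries little information; an unlikely one carries a lot.
print(self_information(0.9))    # ~0.152 bits
print(self_information(0.01))   # ~6.644 bits

# For independent events, self-information adds: i(AB) = i(A) + i(B).
pa, pb = 0.5, 0.25
print(self_information(pa * pb))                      # 3.0 bits
print(self_information(pa) + self_information(pb))    # 3.0 bits
```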
Entropy
• If P(E) is the probability of an event, its information content I(E), also known as its self-information, is measured as

I(E) = \log_b \frac{1}{P(E)} = -\log_b P(E).

• If P(E) = 1, that is, the event always occurs, there is no information associated with it.
• Usually, the base of the logarithm is 2, and the unit of entropy is bits.
Average Self Information
• Consider an alphabet of n symbols a_i, i = 1 to n, with probabilities of occurrence P(a_i).
• If k is the number of source outputs generated (considered to be sufficiently large), the average number of occurrences of symbol a_i is kP(a_i).
• The average self-information obtained from k outputs is then

-k \sum_{i=1}^{n} P(a_i) \log_b P(a_i).

• And the average information per source output for the source z is given by

H(z) = -\sum_{i=1}^{n} P(a_i) \log_b P(a_i).
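A minimal Python sketch of this average (the function name entropy and the example distributions are ours, for illustration only):

```python
import math

def entropy(probs, base=2):
    """H(z) = -sum_i P(a_i) * log_b P(a_i): average information per source output."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin (two equally likely symbols) carries exactly 1 bit per output.
print(entropy([0.5, 0.5]))     # 1.0
# Four equally likely symbols carry log2(4) = 2 bits per output.
print(entropy([0.25] * 4))     # 2.0
```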
Uncertainty and Entropy
• Consider two symbols a1 and a2 with corresponding probabilities P(a1) and P(a2), where P(a2) = 1 - P(a1).
• The entropy is then

H(z) = -P(a1) \log P(a1) - (1 - P(a1)) \log (1 - P(a1)).
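Evaluating this for a few values of P(a1) shows that the uncertainty, and hence the entropy, is largest when the two symbols are equally likely; a small self-contained sketch (the probed values are ours, for illustration):

```python
import math

def binary_entropy(p):
    """H(z) = -p*log2(p) - (1-p)*log2(1-p) for a two-symbol source."""
    if p in (0.0, 1.0):
        return 0.0      # a certain outcome carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"P(a1) = {p}: H(z) = {binary_entropy(p):.3f} bits")
# The maximum, 1 bit, occurs at P(a1) = 0.5; near 0 or 1 the entropy approaches 0.
```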
Example
• Say we have five symbols with probabilities 0.2, 0.1, 0.05, 0.6, and 0.05.
• The source entropy is given by

H(z) = -(0.2 \log 0.2 + 0.1 \log 0.1 + 0.05 \log 0.05 + 0.6 \log 0.6 + 0.05 \log 0.05) \approx 1.67 bits.
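Expanding the sum term by term (all logarithms base 2):

-0.2 \log_2 0.2 \approx 0.464, \quad -0.1 \log_2 0.1 \approx 0.332, \quad -0.05 \log_2 0.05 \approx 0.216 \ (\text{twice}), \quad -0.6 \log_2 0.6 \approx 0.442,

so H(z) \approx 0.464 + 0.332 + 0.216 + 0.216 + 0.442 \approx 1.67 bits.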
Shannon’s Coding Theorem
• For a noise-free channel, the average code word length of a source of symbols using any coding scheme can at best be equal to the source entropy and can never be less than it. The coding is assumed to be lossless.
• If m(z) is the minimum of the average code word lengths obtained over all uniquely decodable coding schemes, then as per this theorem

m(z) \ge H(z), where H(z) is the source entropy.
Coding Efficiency
• The coding efficiency is the ratio of the source entropy to the average length of the code word:

\eta = \frac{H(z)}{L(z)}

• As per Shannon’s theorem,

0 \le \eta \le 1.
Fixed and variable bit widths
• To encode English text, we need 26 lower case
letters, 26 upper case letters, and a handful of
punctuation
• We can get by with 64 characters (6 bits) in all
• Each character is therefore 6 bits wide
• We can do better, provided:
– Some characters are more frequent than others
– Characters may have different bit widths, so that for
example, e uses only one or two bits, while x uses several
– We have a way of decoding the bit stream
• Must tell where each character begins and ends
Huffman Coding: Basic Principles
• Shorter code words are assigned to more probable
symbols and longer code words are assigned to
less probable symbols. (Optimum Code)
• No code word of a symbol is a prefix of another
code word. This makes Huffman coded symbols
uniquely decodable.
• Every source symbol must have a unique code
word assigned to it.
Example Huffman encoding
• A=0
B = 100
C = 1010
D = 1011
R = 11
• ABRACADABRA = 01001101010010110100110
• This is eleven letters in 23 bits
• A fixed-width encoding would require 3 bits for
five different letters, or 33 bits for 11 letters
• Notice that the encoded bit string can be decoded!
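A small Python sketch that applies this code table (the table is taken from the slide; the variable names are ours):

```python
# Code table from the slide.
code = {'A': '0', 'B': '100', 'C': '1010', 'D': '1011', 'R': '11'}

message = "ABRACADABRA"
encoded = ''.join(code[ch] for ch in message)

print(encoded)        # 01001101010010110100110
print(len(encoded))   # 23 bits, versus 11 * 3 = 33 bits at fixed width
```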
Why it works
• In this example, A was the most common letter
• In ABRACADABRA:
– 5 As → code for A is 1 bit long
– 2 Rs → code for R is 2 bits long
– 2 Bs → code for B is 3 bits long
– 1 C → code for C is 4 bits long
– 1 D → code for D is 4 bits long
Note: Codes for the two least frequently occurring
symbols have the same length. Can you prove this?
Moreover, these differ only in the last bit.
Creating a Huffman encoding
• For each encoding unit (letter, in this example),
associate a frequency (number of times it occurs)
– You can also use a percentage or a probability
• Create a binary tree whose children are the two encoding units with the smallest frequencies
– The frequency of the new root is the sum of the frequencies of its leaves
• Repeat this procedure until all the encoding units
are in the binary tree
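A minimal Python sketch of this procedure, assuming the symbol frequencies are given as a dictionary (the helper names and the use of heapq are ours; when frequencies tie, the individual codewords may differ from those shown on the following slides, although the average code length stays the same):

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a Huffman code from {symbol: frequency} by repeatedly
    merging the two lowest-frequency nodes."""
    tie = count()  # tie-breaker so heapq never has to compare tree payloads
    # Heap entries are (frequency, tie, tree); a tree is either a symbol
    # (leaf) or a (left, right) pair (internal node).
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node
            walk(node[0], prefix + "0")   # 0 on the left branch
            walk(node[1], prefix + "1")   # 1 on the right branch
        else:                             # leaf: record the accumulated path
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Frequencies from the example that follows.
print(huffman_code({'A': 40, 'B': 20, 'C': 10, 'D': 10, 'R': 20}))
```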
Example, step I
• Assume that relative frequencies are:
– A: 40
– B: 20
– C: 10
– D: 10
– R: 20

• Smallest numbers are 10 and 10 (C and D), so connect those


Example, step II
• C and D have already been used, and the new node
above them (call it C+D) has value 20
• The smallest values are B, C+D, and R, all of
which have value 20
– Connect any two of these
Example, step III
• The smallest value is R (20), while A and B+C+D both have value 40
• Connect R to either of the others
Example, step IV
• Connect the final two nodes
Example, step V
• Assign 0 to left branches, 1 to right branches
• Each encoding is a path from the root
• A=0
B = 100
C = 1010
D = 1011
R = 11
• Each path terminates at a leaf
• Do you see why encoded strings are decodable?
Unique prefix property
• A=0
B = 100
C = 1010
D = 1011
R = 11
• No bit string is a prefix of any other bit string
• For example, if we added E=01, then A (0) would
be a prefix of E
• Similarly, if we added F=10, then it would be a
prefix of three other encodings (B=100, C=1010,
and D=1011)
• The unique prefix property holds because, in a
binary tree, a leaf is not on a path to any other node
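A short sketch that checks the unique prefix property of any code table (the function name is ours):

```python
def has_unique_prefix_property(codes):
    """Return True if no codeword in the table is a prefix of another."""
    syms = list(codes)
    return not any(s != t and codes[t].startswith(codes[s])
                   for s in syms for t in syms)

good = {'A': '0', 'B': '100', 'C': '1010', 'D': '1011', 'R': '11'}
bad = dict(good, E='01')   # adding E = 01 makes A = 0 a prefix of E

print(has_unique_prefix_property(good))   # True
print(has_unique_prefix_property(bad))    # False
```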
Example
A=[a1, a2, a3, a4, a5]
P(a1) = P(a3) = 0.2, P(a2) = 0.4, P(a4) = P(a5) = 0.1
Determine the corresponding codewords.
Letter Probability Codeword

a2 0.4 c(a2)
a1 0.2 c(a1)
a3 0.2 c(a3)
a4 0.1 c(a4)
a5 0.1 c(a5)

Average length of the code=2.2 bits/symbol


Example

Letter  Probability  Codeword
a2      0.4          1
a1      0.2          01
a3      0.2          000
a4      0.1          0010
a5      0.1          0011

Source: K. Sayood, DC
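Using these probabilities and code lengths, a short check of the average length and of the coding efficiency η = H(z)/L(z) defined earlier (variable names are ours):

```python
import math

probs = {'a2': 0.4, 'a1': 0.2, 'a3': 0.2, 'a4': 0.1, 'a5': 0.1}
codes = {'a2': '1', 'a1': '01', 'a3': '000', 'a4': '0010', 'a5': '0011'}

avg_len = sum(probs[s] * len(codes[s]) for s in probs)      # 2.2 bits/symbol
H = -sum(p * math.log2(p) for p in probs.values())          # ~2.122 bits/symbol
print(avg_len, H, H / avg_len)                              # efficiency ~0.96
```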
Decoding
Step 1: Examine the leftmost bit in the bit stream. If it corresponds to any symbol, add that symbol to the list of decoded symbols and remove the examined bit(s). Else, follow Step 2.
Step 2: Append the next bit from the left to the examined bits and check again. If any symbol results, add it to the decoded stream and go back to Step 1. Else, repeat Step 2.
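A Python sketch of this two-step procedure (the function name is ours; with a prefix-free code table it recovers the original symbol stream):

```python
def huffman_decode(bits, codes):
    """Decode a bit string by growing a candidate codeword one bit at a
    time (Step 2) and emitting a symbol whenever it matches (Step 1)."""
    reverse = {cw: sym for sym, cw in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit                    # append the next bit (Step 2)
        if current in reverse:            # matches a codeword? (Step 1)
            decoded.append(reverse[current])
            current = ""                  # start a fresh codeword
    return ''.join(decoded)

codes = {'A': '0', 'B': '100', 'C': '1010', 'D': '1011', 'R': '11'}
print(huffman_decode("01001101010010110100110", codes))   # ABRACADABRA
```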
Adaptive Huffman Coding
Non-adaptive Huffman coding requires two passes over the data: one to collect symbol statistics and one to encode.
How can you make it adaptive?
Applications: Image (b/w) compression.
Text compression
Audio compression
Practical considerations
• It is not practical to create a Huffman encoding for
a single short string, such as ABRACADABRA
– To decode it, you would need the code table
– If you include the code table with the message, the whole thing is bigger than just the ASCII message
• Huffman encoding is practical if:
– The encoded string is large relative to the code table, OR
– We agree on the code table beforehand
• For example, it’s easy to find a table of letter frequencies for
English (or any other alphabet-based language)
Data compression
• Huffman encoding is a simple example of data
compression: representing data in fewer bits than
it would otherwise need
• A more sophisticated method is GIF (Graphics
Interchange Format) compression, for .gif files
• Another is JPEG (Joint Photographic Experts
Group), for .jpg files
– Unlike the others, JPEG is lossy—it loses information
– Generally OK for photographs (if you don’t compress
them too much), because decompression adds “fake”
data very similar to the original
References
1. Huffman, D. A., “A Method for the Construction of Minimum-Redundancy Codes,” Proc. IRE, vol. 40, no. 9, pp. 1098-1101, 1952.
2. Shannon, C. E., “A Mathematical Theory of Communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379-423, 1948.
