Shannon Compression Techniques Explained

The document discusses lossless compression techniques including Shannon-Fano coding and Huffman coding, which are based on information theory and use minimum redundancy coding to assign codes to symbols based on their frequency of occurrence, allowing the most common symbols to be encoded with fewer bits. It also discusses adaptive coding techniques like adaptive Huffman and arithmetic coding, as well as dictionary-based coding like LZW.



Lossless Compression

Multimedia Systems (Module 2)

• Lesson 1:
  Minimum Redundancy Coding based on Information Theory:
  • Shannon-Fano Coding
  • Huffman Coding
• Lesson 2:
  Adaptive Coding based on Statistical Modeling:
  • Adaptive Huffman
  • Arithmetic coding
• Lesson 3:
  Dictionary-based Coding:
  • LZW
Lossless Compression
Multimedia Systems (Module 2 Lesson 1)

Summary:
• Compression
  • With loss
  • Without loss
• Shannon: Information Theory
• Shannon-Fano Coding Algorithm
• Huffman Coding Algorithm

Sources:
• The Data Compression Book, 2nd Ed., Mark Nelson and Jean-Loup Gailly
• Dr. Ze-Nian Li's course material
Compression
Why Compression?
All media, be it text, audio, graphics or video, has "redundancy". Compression attempts to eliminate this redundancy.

What is Redundancy?
• If one representation of a media content, M, takes X bytes and another takes Y bytes (Y < X), then X is a redundant representation relative to Y.
• Other forms of redundancy: if the representation of a media captures content that is not perceivable by humans, then removing such content will not affect the quality of the content.
  • For example, audio frequencies outside the human hearing range can be left out without any harm to the audio's quality.
• Is there a representation with an optimal size Z that cannot be improved upon? This question is tackled by information theory.
Compression
• Lossless compression: M → Compress → m → Uncompress → M
  (the original M is recovered exactly)
• Compression with loss: M → Compress → m → Uncompress → M′, where M′ ≠ M
Information Theory
According to Shannon, the entropy@ of an information source S is defined as:

H(S) = Σi pi log2(1/pi)

• log2(1/pi) indicates the amount of information contained in symbol Si, i.e., the number of bits needed to code symbol Si.
• For example, in an image with a uniform distribution of gray-level intensities, pi = 1/256, so the number of bits needed to code each gray level is 8; the entropy of the image is 8 bits.
• Q: What is the entropy of a source with M symbols where each symbol is equally likely?
  • Entropy, H(S) = log2 M
• Q: How about an image in which half of the pixels are white and half are black?
  • Entropy, H(S) = 1

@Here is an excellent primer by Dr. Schnieder on this subject.
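The entropy formula above is easy to check numerically. Below is a minimal Python sketch (not from the slides; the function name `entropy` is ours) that reproduces the answers to the two questions:

```python
import math

def entropy(probs):
    """Shannon entropy H(S) = sum over i of p_i * log2(1/p_i), in bits per symbol."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Uniform gray-level image: p_i = 1/256 for each of 256 levels -> H = 8 bits
print(entropy([1 / 256] * 256))  # 8.0

# Half white, half black -> H = 1 bit
print(entropy([0.5, 0.5]))       # 1.0

# M equally likely symbols -> H = log2(M), e.g. M = 32
print(entropy([1 / 32] * 32))    # 5.0
```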

Information Theory
Discussion:
• Entropy is a measure of how much information is encoded in a message. The higher the entropy, the higher the information content.
  • We could also say entropy is a measure of uncertainty in a message. Information and uncertainty are equivalent concepts.
• The units (in coding theory) of entropy are bits per symbol. The unit is determined by the base of the logarithm: base 2 gives binary (bits); base 10 gives decimal (digits).
• Entropy gives the actual number of bits of information contained in a message source.
  • Example: If the probability of the character 'e' appearing in this slide is 1/16, then the information content of this character is log2(16) = 4 bits. So the character string "eeeee" has a total content of 20 bits (contrast this with an 8-bit ASCII coding, which would use 40 bits to represent "eeeee").
Data Compression = Modeling + Coding
Data compression consists of taking a stream of symbols and transforming them into codes.
• The model is a collection of data and rules used to process input symbols and determine their probabilities.
• A coder uses the model's probabilities to emit codes when it is given input symbols.

Let's take Huffman coding to demonstrate the distinction:

Input Stream (symbols) → Model → Probabilities → Encoder → Output Stream (codes)

• The output of the Huffman encoder is determined by the model (probabilities). The higher the probability, the shorter the code.
• Model A could determine the raw probabilities of each symbol occurring anywhere in the input stream: pi = (# of occurrences of Si) / (total number of symbols).
• Model B could determine probabilities based on the last 10 symbols in the input stream (continuously re-computing the probabilities).
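The Model A / Model B distinction above can be sketched in a few lines of Python (an illustration under our own naming; `model_a` and `model_b` are hypothetical helpers, and the window size of 10 follows the slide's description):

```python
from collections import Counter

def model_a(stream):
    """Model A: raw probability of each symbol over the whole stream."""
    counts = Counter(stream)
    total = len(stream)
    return {sym: c / total for sym, c in counts.items()}

def model_b(stream, window=10):
    """Model B: probabilities computed from only the last `window`
    symbols seen so far (re-run this as the stream is consumed)."""
    recent = stream[-window:]
    counts = Counter(recent)
    return {sym: c / len(recent) for sym, c in counts.items()}

print(model_a("AAAB"))                      # {'A': 0.75, 'B': 0.25}
print(model_b("A" * 20 + "BB", window=10))  # last 10 symbols: 8 A's, 2 B's
```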
The Shannon-Fano Encoding Algorithm
Example
1. Calculate the frequencies of the list of symbols (organize as a list).
2. Sort the list in (decreasing) order of frequencies.
3. Divide the list into two halves, with the total frequency counts of each half being as close as possible to each other.
4. The upper half is assigned a code of 0 and the lower half a code of 1.
5. Recursively apply steps 3 and 4 to each of the halves, until each symbol has become a corresponding code leaf on a tree.

Symbol   A   B   C   D   E
Count   15   7   6   6   5

Symbol  Count  Info. (Count × -log2(pi))  Code  Subtotal (# of bits)
A       15     15 × 1.38                  00    30
B        7      7 × 2.48                  01    14
C        6      6 × 2.70                  10    12
D        6      6 × 2.70                  110   18
E        5      5 × 2.96                  111   15
Total          85.25                            89

It takes a total of 89 bits to encode 85.25 bits of information (pretty good, huh!).
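The five steps above can be sketched as a short Python function (an illustrative implementation, not code from the slides). On the A–E example it reproduces exactly the codes in the table:

```python
def shannon_fano(freqs):
    """Shannon-Fano: sort symbols by decreasing frequency, split the list
    into two halves with total counts as close as possible, prefix the
    halves with 0 and 1, and recurse until each symbol is a leaf."""
    symbols = sorted(freqs.items(), key=lambda kv: -kv[1])
    codes = {}

    def split(group, prefix):
        if len(group) == 1:
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(c for _, c in group)
        # find the cut point that makes the two halves' counts closest
        best_diff, cut, running = float("inf"), 1, 0
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_diff, cut = diff, i
        split(group[:cut], prefix + "0")
        split(group[cut:], prefix + "1")

    split(symbols, "")
    return codes

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
print(shannon_fano(freqs))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```

Encoding with these codes costs 15·2 + 7·2 + 6·2 + 6·3 + 5·3 = 89 bits, matching the table's total.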
The Huffman Algorithm
Example
1. Initialization: Put all nodes in an OPEN list L; keep it sorted at all times (e.g., ABCDE).
2. Repeat the following steps until the list L has only one node left:
   1. From L pick the two nodes having the lowest frequencies, and create a parent node for them.
   2. Assign the sum of the children's frequencies to the parent node and insert it into L.
   3. Assign codes 0 and 1 to the two branches of the tree, and delete the children from L.

Symbol   A   B   C   D   E
Count   15   7   6   6   5

Resulting tree (node weights): the root (39) branches to A (15, code 1) and an internal node (24); node 24 branches to node 13 = B (7) + C (6) and node 11 = D (6) + E (5).

Symbol  Count  Info. (Count × -log2(pi))  Code  Subtotal (# of bits)
A       15     15 × 1.38                  1     15
B        7      7 × 2.48                  000   21
C        6      6 × 2.70                  001   18
D        6      6 × 2.70                  010   18
E        5      5 × 2.96                  011   15
Total          85.25                            87
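The algorithm above can likewise be sketched in Python (an illustrative implementation using a heap for the sorted list L; the exact 0/1 bit patterns depend on how ties between equal frequencies are broken, but the code lengths, and hence the 87-bit total, match the table):

```python
import heapq
from itertools import count

def huffman(freqs):
    """Huffman: repeatedly merge the two lowest-frequency nodes into a
    parent; prepend 0/1 to every code in each merged subtree."""
    tie = count()  # unique tiebreaker so the heap never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # lowest frequency
        f2, _, c2 = heapq.heappop(heap)  # second lowest
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman(freqs)
print(sum(freqs[s] * len(c) for s, c in codes.items()))  # 87
```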

Huffman Alg.: Discussion
Decoding for the above two algorithms is trivial as long as the coding table (the statistics) is sent before the data. There is an overhead for sending this, which is negligible if the data file is big.
Unique Prefix Property: no code is a prefix of any other code (all symbols are at the leaf nodes) → great for the decoder: decoding is unambiguous (unique decipherability).
If prior statistics are available and accurate, then Huffman coding is very good.
Number of bits (per symbol) needed for Huffman coding: 87 / 39 ≈ 2.23
Number of bits (per symbol) needed for Shannon-Fano coding: 89 / 39 ≈ 2.28