Huffman Coding
• Huffman codes can be used to compress
information
• The basic idea is that instead of storing each
character in a file as an 8-bit ASCII value, we
store the more frequently occurring characters
using fewer bits and the less frequently occurring
characters using more bits
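• We first count how often each character occurs
and create one node per character, giving a set
of nodes
• We then pick the two nodes with the smallest
frequencies and combine them to form a new node
whose frequency is the sum of the two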
• The two selected nodes are removed from
the set and replaced by the combined node
• This continues until we have only 1 node
left in the set
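The combining loop described above can be sketched with a priority queue. This is only a minimal illustration, not the slides' own implementation: the function name build_huffman_tree and the tuple layout for nodes are assumptions made for the example, and it expects a non-empty string.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(text):
    """Return the root of a Huffman tree for a non-empty `text`.

    Hypothetical node layout: a leaf is (weight, char),
    an internal node is (weight, left, right).
    """
    tie = count()  # unique tie-breaker so the heap never compares nodes
    heap = [(w, next(tie), (w, ch)) for ch, w in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Remove the two nodes with the smallest frequencies...
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # ...and put the combined node back into the set.
        heapq.heappush(heap, (w1 + w2, next(tie), (w1 + w2, a, b)))
    return heap[0][2]

root = build_huffman_tree("duke blue devils")
print(root[0])  # weight of the root: the total number of characters
```

Ties between equal frequencies may be broken differently than in a hand-worked example, so the tree shape can differ, but the total encoded length is the same for every valid Huffman tree.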
• Example string: "duke blue devils"
• Character counts:
d = 2
u = 2
k = 1
e = 3
b = 1
l = 2
v = 1
i = 1
s = 1
space (sp) = 2
• Initial set of nodes, one per character:
e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 i,1 s,1
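The character counts for the example string can be reproduced in a few lines. This is a small sketch using Python's collections.Counter; the "sp" label for the space is just the slides' shorthand, and the order among equal counts may differ from the hand-written list.

```python
from collections import Counter

text = "duke blue devils"
freq = Counter(text)  # maps each character to its number of occurrences

# Print the nodes from most to least frequent, as "char,count".
for ch, n in sorted(freq.items(), key=lambda kv: -kv[1]):
    label = "sp" if ch == " " else ch
    print(f"{label},{n}")
```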
• Step 1: i,1 and s,1 are combined into a new node of weight 2, written 2[i,1 s,1]
e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 2[i,1 s,1]
• Step 2: b,1 and v,1 are combined into 2[b,1 v,1]
e,3 d,2 u,2 l,2 sp,2 k,1 2[b,1 v,1] 2[i,1 s,1]
• Step 3: k,1 and 2[b,1 v,1] are combined into 3[k,1 2[b,1 v,1]]
e,3 d,2 u,2 l,2 sp,2 3[k,1 2[b,1 v,1]] 2[i,1 s,1]
• Step 4: l,2 and sp,2 are combined into 4[l,2 sp,2]
e,3 d,2 u,2 4[l,2 sp,2] 3[k,1 2[b,1 v,1]] 2[i,1 s,1]
• Step 5: d,2 and u,2 are combined into 4[d,2 u,2]
e,3 4[d,2 u,2] 4[l,2 sp,2] 3[k,1 2[b,1 v,1]] 2[i,1 s,1]
• Step 6: 2[i,1 s,1] and 3[k,1 2[b,1 v,1]] are combined into a node of weight 5
e,3 4[d,2 u,2] 4[l,2 sp,2] 5[2[i,1 s,1] 3[k,1 2[b,1 v,1]]]
• Step 7: e,3 and 4[d,2 u,2] are combined into a node of weight 7
7[e,3 4[d,2 u,2]] 4[l,2 sp,2] 5[2[i,1 s,1] 3[k,1 2[b,1 v,1]]]
• Step 8: 4[l,2 sp,2] and the node of weight 5 are combined into a node of weight 9
7[e,3 4[d,2 u,2]] 9[4[l,2 sp,2] 5[2[i,1 s,1] 3[k,1 2[b,1 v,1]]]]
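• Step 9: the nodes of weight 7 and 9 are combined into the root, of weight 16; only 1 node is left in the set
16[7[e,3 4[d,2 u,2]] 9[4[l,2 sp,2] 5[2[i,1 s,1] 3[k,1 2[b,1 v,1]]]]]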
• Now we assign codes to the tree by placing
a 0 on every left branch and a 1 on every
right branch
• A traversal of the tree from root to leaf gives
the Huffman code for that particular leaf
character
• Note that no code is the prefix of another
code, so an encoded bit string can be decoded
unambiguously
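The labeling rule above (0 for a left branch, 1 for a right branch) can be sketched as a recursive traversal. The nested-tuple representation of the example's final tree and the helper names assign_codes and is_prefix_free are assumptions made for this illustration.

```python
# The example's final tree as nested (left, right) pairs (an assumed
# layout); a leaf is its character label, with "sp" for the space.
tree = (
    ("e", ("d", "u")),
    (("l", "sp"), (("i", "s"), ("k", ("b", "v")))),
)

def assign_codes(node, prefix="", table=None):
    """Walk from the root, appending 0 for left and 1 for right."""
    if table is None:
        table = {}
    if isinstance(node, str):      # reached a leaf: record its code
        table[node] = prefix
    else:
        left, right = node
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table

def is_prefix_free(table):
    """True if no code is a prefix of a different code."""
    words = list(table.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

codes = assign_codes(tree)
for ch, code in codes.items():
    print(ch, code)
print(is_prefix_free(codes))
```

Because every character sits at a leaf, the path to one character can never continue on to another, which is why the prefix check succeeds.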
• The resulting code for each character:
e 00
d 010
u 011
l 100
sp 101
i 1100
s 1101
k 1110
b 11110
v 11111
• Final tree (left branch = 0, right branch = 1):
16[7[e,3 4[d,2 u,2]] 9[4[l,2 sp,2] 5[2[i,1 s,1] 3[k,1 2[b,1 v,1]]]]]
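Since no code is a prefix of another, a bit string can be decoded by reading bits until they match a code. The following is a minimal sketch using the code table above; the function name decode is hypothetical, and a tree walk would work equally well.

```python
# Code table from the example; the space is stored under " ".
codes = {
    "e": "00", "d": "010", "u": "011", "l": "100", " ": "101",
    "i": "1100", "s": "1101", "k": "1110", "b": "11110", "v": "11111",
}

def decode(bits, codes):
    """Decode a string of '0'/'1' characters using a prefix-free table."""
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:      # a complete code has been read
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

print(decode("0100111110", codes))  # 010 011 1110 -> duk
```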
• These codes are then used to encode the string
• Thus, “duke blue devils” turns into:
010 011 1110 00 101 11110 100 011 00 101 010 00 11111 1100 100 1101
• When grouped into 8-bit bytes (x marks unused
padding bits in the last byte):
01001111 10001011 11101000 11001010 10001111 11100100 1101xxxx
• Thus the 52 encoded bits take 7 bytes of space,
compared to 16 characters * 1 byte/char = 16 bytes
uncompressed
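The encoding and byte-packing arithmetic above can be checked with a short script. This is a sketch under one assumption not stated in the example: the unused bits in the last byte are filled with 0.

```python
# Code table from the example; the space is stored under " ".
codes = {
    "e": "00", "d": "010", "u": "011", "l": "100", " ": "101",
    "i": "1100", "s": "1101", "k": "1110", "b": "11110", "v": "11111",
}

text = "duke blue devils"
bits = "".join(codes[ch] for ch in text)   # concatenated codes

# Pad with zeros (an assumption) up to a multiple of 8 bits.
padding = (-len(bits)) % 8
padded = bits + "0" * padding
packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

print(len(bits))    # number of encoded bits
print(len(packed))  # bytes after packing
print(len(text))    # bytes uncompressed, at 1 byte per character
```

A real decoder would also need to know how many bits are meaningful, for example by storing the bit count alongside the data, so that the padding is not read as extra characters.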