In this article, we will learn the implementation of Huffman Coding in C++.
Huffman Coding is a popular algorithm used for lossless data compression. It assigns variable-length codes to input characters, with shorter codes assigned to more frequent characters. This technique ensures that the most common characters are represented by shorter bit strings, reducing the overall size of the encoded data.
How does Huffman Coding work in C++?
Huffman Coding works by building a binary tree called the Huffman Tree from the input characters. The algorithm processes the input characters to construct this tree, where each leaf node represents a character and the path from the root to the leaf node determines the code for that character.
Steps to Build Huffman Tree in C++
Take an array of unique characters along with their frequency of occurrences as input and output the Huffman Tree.
- Create a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap is used as a priority queue. The value of frequency field is used to compare two nodes in min heap. Initially, the least frequent character is at root)
- Extract two nodes with the minimum frequency from the min heap.
- Create a new internal node with a frequency equal to the sum of the two nodes frequencies. Make the first extracted node as its left child and the other extracted node as its right child. Add this node to the min heap.
- Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node and the tree is complete.
Algorithm to Implement Huffman Coding
- Frequency Calculation:
- Calculate the frequency of each character in the input data.
- Priority Queue Initialization:
- Initialize a priority queue to store nodes of the Huffman Tree based on their frequencies.
- Building the Huffman Tree:
- Construct the Huffman Tree by repeatedly combining the two nodes with the lowest frequencies into a new node until only one node remains, which becomes the root of the Huffman Tree.
- Generating Huffman Codes:
- Traverse the Huffman Tree to generate the Huffman codes for each character. Assign '0' and '1' based on left and right traversal in the tree.
- Encoding the Input Data:
- Encode the input data using the generated Huffman codes to produce the compressed output.
- Decoding the Encoded Data:
- Decode the encoded data back to the original input using the Huffman Tree.
C++ Program to Implement Huffman Coding
The below program demonstrates how we can implement huffman coding in C++.
C++
// C++ program to implement huffman coding
#include <iostream>
#include <queue>
#include <unordered_map>
#include <vector>
using namespace std;
// A Huffman tree node
struct Node {
char ch;
int freq;
Node* left;
Node* right;
Node(char ch, int freq)
: ch(ch)
, freq(freq)
, left(nullptr)
, right(nullptr)
{
}
Node(char ch, int freq, Node* left, Node* right)
: ch(ch)
, freq(freq)
, left(left)
, right(right)
{
}
};
// Comparison object to be used to order the heap
struct compare {
bool operator()(Node* l, Node* r)
{
return l->freq > r->freq;
}
};
// Function to print the Huffman Codes
void printCodes(Node* root, string str,
unordered_map<char, string>& huffmanCode)
{
if (root == nullptr)
return;
// Found a leaf node
if (!root->left && !root->right) {
huffmanCode[root->ch] = str;
}
printCodes(root->left, str + "0", huffmanCode);
printCodes(root->right, str + "1", huffmanCode);
}
// Function to build the Huffman Tree and generate Huffman
// Codes
void buildHuffmanTree(string text)
{
// Count frequency of appearance of each character and
// store it in a map
unordered_map<char, int> freq;
for (char ch : text) {
freq[ch]++;
}
// Create a priority queue to store live nodes of the
// Huffman tree
priority_queue<Node*, vector<Node*>, compare> pq;
// Create a leaf node for each character and add it to
// the priority queue
for (auto pair : freq) {
pq.push(new Node(pair.first, pair.second));
}
// Do till there is more than one node in the queue
while (pq.size() != 1) {
// Remove the two nodes of highest priority (lowest
// frequency) from the queue
Node* left = pq.top();
pq.pop();
Node* right = pq.top();
pq.pop();
// Create a new internal node with these two nodes
// as children and with frequency equal to the sum
// of the two nodes' frequencies
int sum = left->freq + right->freq;
pq.push(new Node('\0', sum, left, right));
}
// Root stores pointer to the root of the Huffman Tree
Node* root = pq.top();
// Traverse the Huffman Tree and store Huffman Codes in
// a map
unordered_map<char, string> huffmanCode;
printCodes(root, "", huffmanCode);
// Print Huffman Codes
cout << "Huffman Codes:\n";
for (auto pair : huffmanCode) {
cout << pair.first << " " << pair.second << "\n";
}
// Print original string
cout << "\nOriginal string:\n" << text << "\n";
// Print encoded string
string str = "";
for (char ch : text) {
str += huffmanCode[ch];
}
cout << "\nEncoded string:\n" << str << "\n";
// Function to decode a given Huffman encoded string
auto decode = [&](string str) {
cout << "\nDecoded string:\n";
Node* curr = root;
for (char bit : str) {
if (bit == '0') {
curr = curr->left;
}
else {
curr = curr->right;
}
// Reached a leaf node
if (!curr->left && !curr->right) {
cout << curr->ch;
curr = root;
}
}
cout << "\n";
};
decode(str);
}
int main()
{
string text = "HUFFMAN";
buildHuffmanTree(text);
return 0;
}
OutputHuffman Codes:
M 111
A 110
U 00
F 01
N 100
H 101
Original string:
HUFFMAN
Encoded string:
101000101111110100
Decoded string:
HUFFMAN
Time complexity: O(nlogn), where n is the number of unique characters. If there are n nodes, extractMin() is called 2*(n – 1) times. extractMin() takes O(logn) time as it calls minHeapify(). So, the overall complexity is O(nlogn).
If the input array is sorted, there exists a linear time algorithm. We will soon be discussing this in our next post.
Auxiliary Space: O(N)
Working example of Huffman Coding in C
Consider the string "HUFFMAN".
1. Count Frequencies:
H: 1, U: 1, F: 2, M: 1, A: 1, N: 1
2. Build a Priority Queue:
[(1, H), (1, U), (2, F), (1, M), (1, A), (1, N)]
3. Build the Huffman Tree:
Combine U and N:
(2, UN)
(2)
/ \
U(1) N(1)
Combine H and A:
(2, HA)
(2)
/ \
H(1) A(1)
Combine M and the subtree containing H and A:
(3, MHA)
(3)
/ \
M(1) (2)
/ \
H(1) A(1)
Combine F and the subtree containing U and N:
(4, UNF)
(4)
/ \
(2) F(2)
/ \
U(1) N(1)
Combine the two subtrees:
(7, UNFMHA)
(7)
/ \
(3) (4)
/ \ / \
M(1) (2) (2) F(2)
/ \ / \
H(1) A(1) U(1) N(1)
Final Huffman Tree
Here's the final Huffman Tree for the string "HUFFMAN":
(7)
/ \
(3) (4)
/ \ / \
M(1) (2) (2) F(2)
/ \ / \
H(1) A(1) U(1) N(1)
Character Codes
H: 101, U: 00, F: 01, M: 111, A: 110, N: 100
- M: 111
- A: 110
- U: 00
- F: 01
- N: 100
- H: 101
Encode Data:
"HUFFMAN" -> 101000101111110100
Decode Data:
"101000101111110100" -> "HUFFMAN"
Applications of Huffman Coding
- It is used in file compression for reducing the size of files such as text, images, and videos (ZIP, GZIP).
- It can be used to efficiently transmits data over networks by reducing the amount of data to be sent.
- It is commonly used in formats like JPEG, MP3, and MPEG.
Similar Reads
Huffman Coding in C
In this article, we will learn the implementation of Huffman Coding in C. What is Huffman Coding?Huffman Coding is a lossless data compression algorithm. It assigns variable-length codes to input characters, with shorter codes assigned to more frequent characters. This algorithm makes sure that the
8 min read
Huffman Coding Java
The Huffman coding is a popular algorithm used for lossless data compression. It works by assigning the variable-length codes to the input characters with the shorter codes assigned to the more frequent characters. This results in the prefix-free code meaning no code is a prefix of any other code. T
5 min read
Huffman Coding in Python
Huffman Coding is one of the most popular lossless data compression techniques. This article aims at diving deep into the Huffman Coding and its implementation in Python. What is Huffman Coding?Huffman Coding is an approach used in lossless data compression with the primary objective of delivering r
3 min read
Naming Convention in C++
Naming a file or a variable is the first and the very basic step that a programmer takes to write clean codes, where naming has to be appropriate so that for any other programmer it acts as an easy way to read the code. In C++, naming conventions are the set of rules for choosing the valid name for
6 min read
Header Files in C++
C++ offers its users a variety of functions, one of which is included in header files. In C++, all the header files may or may not end with the ".h" extension unlike in C, Where all the header files must necessarily end with the ".h" extension. Header files in C++ are basically used to declare an in
6 min read
String Functions In C++
A string is referred to as an array of characters. In C++, a stream/sequence of characters is stored in a char array. C++ includes the std::string class that is used to represent strings. It is one of the most fundamental datatypes in C++ and it comes with a huge set of inbuilt functions. In this ar
9 min read
Function Pointer in C++
Prerequisites: Pointers in C++Function in C++ Pointers are symbolic representations of addresses. They enable programs to simulate call-by-reference as well as to create and manipulate dynamic data structures. Iterating over elements in arrays or other data structures is one of the main use of point
4 min read
Edit Distance in C++
The Edit Distance problem is a common dynamic programming problem in which we need to transform one string into another with the minimum number of operations. In this article, we will learn how to solve this problem in C++ for two given strings, str1 and str2, and find the minimum number of edits re
9 min read
C++ Map of Functions
Prerequisites: Functions in C++Map in C++ Maps in C++ are a very useful data structure for storing key-value pairs. A map is the best way to store where we can directly find and access the value using a key with the best time complexity. However, did you know that you can also store functions as val
5 min read
Calling Conventions in C/C++
In C/C++ programming, a calling convention is a set of rules that specify how a function will be called. You might have seen keywords like __cdecl or __stdcall when you get linking errors. For example: error LNK2019: unresolved external symbol "void __cdecl A(void)" (?A@@YAXXZ) referenced in functio
11 min read