Huffman Coding in C++

Last Updated : 16 Jul, 2024

In this article, we will learn how to implement Huffman Coding in C++.

What is Huffman Coding?

Huffman Coding is a popular algorithm for lossless data compression. It assigns variable-length codes to input characters, giving shorter codes to more frequent characters, so the most common characters take the fewest bits and the overall size of the encoded data is reduced.

How does Huffman Coding work in C++?

Huffman Coding works by building a binary tree, called the Huffman Tree, from the character frequencies of the input. Each leaf node represents a character, and the path from the root to that leaf determines the character's code: each step to a left child adds a '0' and each step to a right child adds a '1'.

Steps to Build Huffman Tree in C++

The input is an array of unique characters along with their frequencies of occurrence, and the output is the Huffman Tree.

  1. Create a leaf node for each unique character and build a min heap of all leaf nodes. (The min heap is used as a priority queue, and the frequency field is used to compare two nodes, so initially the least frequent character is at the root.)
  2. Extract two nodes with the minimum frequency from the min heap.
  3. Create a new internal node with a frequency equal to the sum of the two nodes' frequencies. Make the first extracted node its left child and the other extracted node its right child. Add this node to the min heap.
  4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node and the tree is complete.

Algorithm to Implement Huffman Coding

  • Frequency Calculation:
    • Calculate the frequency of each character in the input data.
  • Priority Queue Initialization:
    • Initialize a priority queue to store nodes of the Huffman Tree based on their frequencies.
  • Building the Huffman Tree:
    • Construct the Huffman Tree by repeatedly combining the two nodes with the lowest frequencies into a new node until only one node remains, which becomes the root of the Huffman Tree.
  • Generating Huffman Codes:
    • Traverse the Huffman Tree to generate the Huffman code for each character, appending '0' when moving to a left child and '1' when moving to a right child.
  • Encoding the Input Data:
    • Encode the input data using the generated Huffman codes to produce the compressed output.
  • Decoding the Encoded Data:
    • Decode the encoded data back to the original input using the Huffman Tree.

C++ Program to Implement Huffman Coding

The following program demonstrates how we can implement Huffman coding in C++.

// C++ program to implement Huffman coding

#include <iostream>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

// A Huffman tree node
struct Node {
    char ch; // character (meaningful only for leaf nodes)
    int freq; // frequency of the character or of the subtree
    Node* left;
    Node* right;
    Node(char ch, int freq)
        : ch(ch)
        , freq(freq)
        , left(nullptr)
        , right(nullptr)
    {
    }
    Node(char ch, int freq, Node* left, Node* right)
        : ch(ch)
        , freq(freq)
        , left(left)
        , right(right)
    {
    }
};

// Comparison object that makes priority_queue a min heap
// ordered by frequency
struct compare {
    bool operator()(Node* l, Node* r) const
    {
        return l->freq > r->freq;
    }
};

// Function to generate the Huffman Codes: walk the tree,
// appending '0' for a left edge and '1' for a right edge
void printCodes(Node* root, string str,
                unordered_map<char, string>& huffmanCode)
{
    if (root == nullptr)
        return;

    // Found a leaf node
    if (!root->left && !root->right) {
        // If the whole tree is a single leaf (only one unique
        // character), give it the code "0" instead of ""
        huffmanCode[root->ch] = str.empty() ? "0" : str;
        return;
    }

    printCodes(root->left, str + "0", huffmanCode);
    printCodes(root->right, str + "1", huffmanCode);
}

// Function to free all nodes of the Huffman Tree
void deleteTree(Node* root)
{
    if (root == nullptr)
        return;
    deleteTree(root->left);
    deleteTree(root->right);
    delete root;
}

// Function to build the Huffman Tree and generate Huffman
// Codes
void buildHuffmanTree(const string& text)
{
    // An empty string has nothing to encode (and calling
    // pq.top() on an empty queue is undefined behaviour)
    if (text.empty()) {
        cout << "Input string is empty\n";
        return;
    }

    // Count frequency of appearance of each character and
    // store it in a map
    unordered_map<char, int> freq;
    for (char ch : text) {
        freq[ch]++;
    }

    // Create a priority queue to store live nodes of the
    // Huffman tree
    priority_queue<Node*, vector<Node*>, compare> pq;

    // Create a leaf node for each character and add it to
    // the priority queue
    for (const auto& entry : freq) {
        pq.push(new Node(entry.first, entry.second));
    }

    // Do till there is more than one node in the queue
    while (pq.size() > 1) {
        // Remove the two nodes of highest priority (lowest
        // frequency) from the queue
        Node* left = pq.top();
        pq.pop();
        Node* right = pq.top();
        pq.pop();

        // Create a new internal node with these two nodes
        // as children and with frequency equal to the sum
        // of the two nodes' frequencies
        int sum = left->freq + right->freq;
        pq.push(new Node('\0', sum, left, right));
    }

    // Root stores pointer to the root of the Huffman Tree
    Node* root = pq.top();

    // Traverse the Huffman Tree and store Huffman Codes in
    // a map
    unordered_map<char, string> huffmanCode;
    printCodes(root, "", huffmanCode);

    // Print Huffman Codes
    cout << "Huffman Codes:\n";
    for (const auto& entry : huffmanCode) {
        cout << entry.first << " " << entry.second << "\n";
    }

    // Print original string
    cout << "\nOriginal string:\n" << text << "\n";

    // Print encoded string
    string str = "";
    for (char ch : text) {
        str += huffmanCode[ch];
    }
    cout << "\nEncoded string:\n" << str << "\n";

    // Function to decode a given Huffman encoded string
    auto decode = [&](const string& encoded) {
        cout << "\nDecoded string:\n";
        Node* curr = root;
        for (char bit : encoded) {
            // A single-leaf tree has no edges to follow: every
            // bit stands for the root's character
            if (root->left || root->right) {
                if (bit == '0') {
                    curr = curr->left;
                }
                else {
                    curr = curr->right;
                }
            }

            // Reached a leaf node
            if (!curr->left && !curr->right) {
                cout << curr->ch;
                curr = root;
            }
        }
        cout << "\n";
    };

    decode(str);

    // Release the memory allocated for the tree
    deleteTree(root);
}

int main()
{
    string text = "HUFFMAN";
    buildHuffmanTree(text);
    return 0;
}

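Output (one possible run: the exact codes and the order of the lines can differ between standard-library implementations, because the C++ standard fixes neither the order in which equal-frequency nodes leave the priority queue nor the iteration order of unordered_map)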
Huffman Codes:
M 111
A 110
U 00
F 01
N 100
H 101

Original string:
HUFFMAN

Encoded string:
101000101111110100

Decoded string:
HUFFMAN

Time complexity: O(n log n) to build the tree, where n is the number of unique characters. The loop performs n - 1 merges, and each merge calls pop() twice and push() once on the priority queue, each in O(log n) time. Counting the frequencies of an input text of length m adds O(m).
If the frequencies are already sorted, the tree can be built in linear time.

Auxiliary Space: O(n) for the 2n - 1 tree nodes and the priority queue, where n is the number of unique characters.

Working example of Huffman Coding in C++

Consider the string "HUFFMAN".

1. Count Frequencies:

H: 1, U: 1, F: 2, M: 1, A: 1, N: 1

2. Build a Priority Queue:

[(1, H), (1, U), (2, F), (1, M), (1, A), (1, N)]

3. Build the Huffman Tree:

Combine U and N:

(2, UN)
     (2)
    /   \
 U(1)   N(1)

Combine H and A:

(2, HA)
     (2)
    /   \
 H(1)   A(1)

Combine M and the subtree containing H and A:

(3, MHA)
      (3)
     /   \
  M(1)   (2)
        /   \
     H(1)   A(1)

Combine F and the subtree containing U and N:

(4, UNF)
        (4)
       /   \
     (2)   F(2)
    /   \
 U(1)   N(1)

Combine the two subtrees:

(7, UNFMHA)
                   (7)
               /         \
          (3)                (4)
         /   \              /   \
      M(1)   (2)          (2)   F(2)
            /   \        /   \
         H(1)   A(1)  U(1)   N(1)

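Character Codes

Reading the final tree from the root, with 0 for a left branch and 1 for a right branch:
  • M: 00
  • H: 010
  • A: 011
  • U: 100
  • N: 101
  • F: 11

These codes differ from the program's output above because several nodes share the same frequency and the ties can be broken in different orders. Both trees are optimal and encode "HUFFMAN" in 18 bits.

Encode Data:

"HUFFMAN" -> 010100111100011101

Decode Data:

"010100111100011101" -> "HUFFMAN"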

Applications of Huffman Coding

  • It is used in file compression tools such as ZIP and GZIP, whose DEFLATE format combines LZ77 with Huffman coding.
  • It can be used to transmit data efficiently over networks by reducing the amount of data to be sent.
  • It is used as the entropy-coding stage of formats such as JPEG, MP3, and MPEG.
