Domain Specific Hierarchical Huffman Encoding

[Link], [Link] Manik, [Link], [Link]

Department of Computer Science and Engineering,


Indian Institute of Information Technology, Design and Manufacturing, Kancheepuram, Chennai, India.
{coe09b007,coe09b008,sadagopan,sivaselvanb}@[Link]

Abstract. In this paper, we revisit the classical data compression problem for domain specific texts. It is well known that the classical Huffman algorithm is optimal with respect to prefix encoding and that the compression is done at the character level. Since many data transfers are domain specific, for example, downloading of lecture notes, web-blogs, etc., it is natural to think of data compression in larger dimensions (i.e., at the word level rather than the character level). Our framework employs a two-level compression scheme in which the first level identifies frequent patterns in the text using classical frequent pattern algorithms. The identified patterns are replaced with special strings and, to achieve a better compression ratio, the length of a special string is ensured to be shorter than the length of the corresponding pattern. After this transformation, we employ the classical Huffman data compression algorithm on the resultant text. In short, in the first level compression is done at the word level and in the second level at the character level. Interestingly, this two-level compression technique for domain specific text outperforms the classical Huffman technique. To support our claim, we present both theoretical and simulation results for domain specific texts.

1 Introduction

Data transfer between a pair of nodes is a fundamental problem in computer networks. Data compression is a technique that speeds up data transfer by compressing the data at the sender; the original data is recovered at the receiver by employing a decompression technique. Data compression (decompression) is a classical problem in computer science; it has attracted many researchers in the past, and the most popular technique is due to Huffman. The Huffman data compression technique performs character-level compression and assumes nothing about the underlying domain. Huffman's approach is the following: assign a shorter code to a character that occurs more often in the text to be compressed. Interestingly, this approach is optimal with respect to prefix encoding. With the advent of data mining, and in particular its data compression perspective, one looks at the text from a larger dimension and focuses on identifying patterns (words) that occur frequently in the text. This line of research was initiated in [9,10]. In both approaches there is little assumption about the input text, and the patterns to be searched are drawn precisely from standard dictionary words. However, many data transfer operations are domain specific, for example, downloading of lecture notes, web-blogs, etc. Moreover, we noticed that the data available on web servers (academic servers) are tagged or classified according to the domain from which the text is derived; for example, the blog websites or news aggregators running on the web servers display posts (news) tagged with the domain of the text. A technical blog on programming (computer science) containing posts related to computer science can be better compressed and sent to the readers. These observations motivate us to look at the data compression perspective for domain specific text. Moreover, the existing approaches identify patterns using a dictionary as the reference, which is not efficient enough for domain specific text, as most of the domain specific keywords do not appear in the dictionary. This calls for a different data compression approach for domain specific text. Therefore, the combinatorial problem at hand is to compress a domain specific text based on the frequency of patterns generated from the text, and the objective is to maximize the compression ratio by minimizing the size of the file to be transferred between the sender and the receiver.
Related Work: The study of data compression was initiated by Huffman [3]. His technique focuses on character-level compression using the frequency count of the characters in the text. Huffman's result is very well known in the literature and is in use even today. In [5], the possibility of data compression by replacing certain characters in words with a special character '*' and retaining a few characters so that the word is still retrievable unambiguously was discussed. It is important to note that any compression technique must be lossless; towards this end, Witten et al. [6] proposed a language model for representing the data efficiently through the identification of new tokens and tokens in the context of the text under consideration. A slightly different approach, inspired by Pitman Shorthand, was adopted in [7]. The approach is to encode a group of 2-3 successive text characters into a single code. In [7], it is also shown that further application of Huffman coding on the generated codes is possible and is expected to result in greater compression. As far as pattern learning and discovery algorithms are concerned, apart from classical frequent pattern mining algorithms [2,1], the use of genetic algorithms to arrive at rules or hypotheses for pattern learning in text compression is presented in [4]. Searching for fixed-length patterns in a text can be done very efficiently using the 'TARA' algorithm proposed in [8]. A two-level dictionary-based text compression scheme is proposed in [10]. It involves the transformation of the original text with a dictionary of fixed frequent English words. The disadvantage is the need for a huge dictionary to be present at both the compressor and the de-compressor.
Our work: In this paper, we present a data compression algorithm for domain specific text. As mentioned before, many data transfers are domain specific, and it is natural to think of domain specific data compression algorithms. We propose a framework for compression that works for any domain in general. For our discussion purposes, we work with the computer science domain. Our framework employs a two-level compression scheme in which the first level identifies frequent patterns in the text using classical frequent pattern algorithms. The identified patterns are replaced with special strings and, to ensure a better compression ratio, the length of a special string is shorter than the length of the corresponding pattern. After this transformation, we employ the classical Huffman data compression algorithm on the resultant text. In short, in the first level compression is done at the word level and in the second level at the character level. Interestingly, this two-level compression technique for domain specific text outperforms the classical Huffman technique. To support our claim, we present both theoretical and simulation results for domain specific texts.
Road map: In Section 2, we present the theoretical results of our framework. Simulation results of our proposed algorithm are presented in Section 3. We conclude with a flowchart describing how our proposed approach is employed in the compression and decompression stages during data transfer.

2 Hierarchical Huffman Encoding: Theory and Simulation

In this section, we present a theoretical study of Hierarchical Huffman encoding followed by simulation results. We first discuss our approach to finding frequent patterns, following which we present a polynomial-time algorithm for Hierarchical Huffman encoding; it is polynomial in the size of the input text. In the subsequent sections, we present an implementation using a Hash table data structure, followed by a run-time analysis of the algorithm. Further, we show that Hierarchical Huffman encoding achieves a better compression ratio than classical Huffman for domain specific text. We support our theoretical study by a thorough simulation of the Hierarchical Huffman algorithm for various domain specific text inputs. Our findings, both theoretical and experimental, reveal that the proposed two-level compression outperforms the classical Huffman encoding algorithm.

2.1 Identifying Frequent Patterns from the Text

We mine the input text to identify frequent patterns. We make use of the Python Natural Language Processing Toolkit (NLTK).

– The NLTK toolkit supports reading the training data into the so-called Corpus Reader, a specialized object class that enables faster access to large text files stored in secondary memory.
– The training data are loaded into the Corpus Reader in NLTK along with the parameters Length and Frequency. The clustering is done based on these parameter values: the words extracted from the text are clustered into four clusters.

– The four clusters are infrequent short, infrequent long, frequent short, and frequent long. The NLTK-based clustering algorithm is designed to select only the frequent long cluster. The other three clusters are avoided because their contribution to the improvement of the compression ratio is minimal; i.e., the infrequent long and infrequent short clusters are not replaced with special strings and are simply passed to the next stage of the algorithm. Similarly, the frequent short patterns are also not replaced with special strings, as the overhead of replacing a short pattern with a special string exceeds the benefit of leaving the pattern as such in the text.
– Since our approach is for domain specific text, we also append the keywords (frequent patterns) from standard textbooks to the corpus generated by the NLTK algorithm. The complete set of frequent patterns is then sorted in the order of their length and written to a file for further processing. A sketch of this clustering step is given below.
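To make this concrete, the following Python sketch illustrates the clustering step. It is a minimal sketch, not the exact NLTK pipeline: the thresholds MIN_LEN and MIN_FREQ and all function names are our assumptions (the paper fixes neither parameter), and a plain word_tokenize pass stands in for the Corpus Reader.

# A minimal sketch of the clustering step; MIN_LEN and MIN_FREQ are assumed
# illustrative values, not parameters prescribed by the paper.
from collections import Counter
from nltk.tokenize import word_tokenize  # may require nltk.download('punkt')

MIN_LEN = 3    # assumed Length threshold: "long" means at least 3 characters
MIN_FREQ = 10  # assumed Frequency threshold: "frequent" means >= 10 occurrences

def cluster_words(text):
    """Bucket the words of the text into the four clusters."""
    counts = Counter(word_tokenize(text))
    clusters = {"infrequent_short": [], "infrequent_long": [],
                "frequent_short": [], "frequent_long": []}
    for word, freq in counts.items():
        key = ("frequent_" if freq >= MIN_FREQ else "infrequent_") + \
              ("long" if len(word) >= MIN_LEN else "short")
        clusters[key].append(word)
    return clusters

def frequent_long_patterns(text, domain_keywords=()):
    """Keep only the frequent long cluster, append the domain keywords,
    and sort by length, as described above."""
    patterns = set(cluster_words(text)["frequent_long"])
    patterns.update(k for k in domain_keywords if len(k) >= MIN_LEN)
    return sorted(patterns, key=len)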

Fig. 1. Flowchart of the Mining Algorithm to Find Clusters

2.2 Hierarchical Huffman Algorithm


The Hierarchical Huffman algorithm for domain specific texts is presented in Algorithm 1.

Algorithm 1 Hierarchical Huffman Encoding: Hierarchical-Huffman(Text T)
1: Get patterns of length at least 3 using a Frequent Pattern Algorithm to populate PATTERN-DATABASE
2: Also, get keywords of length at least 3 from standard texts (specific to the domain) to populate PATTERN-DATABASE
3: Let P = {P_1, P_2, ..., P_k} denote the set of patterns such that |P_i| ≥ 3 and let R = {r_1, ..., r_k} denote the set of replacement strings for encoding
4: while text T is not exhausted do
5:   for each word w of T, check whether w is a pattern in P do
6:     if w is P_i for some i then
7:       Replace P_i by r_i in T
8:     else
9:       Retain w as is
10:    end if
11:  end for
12: end while
13: Let T_level1 be the updated text of T
14: Call Classical-Huffman(T_level1)
15: Let T_H be the Huffman tree corresponding to T_level1 and C_H be the corresponding codes for encoding
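For concreteness, the sketch below gives a runnable Python rendering of Algorithm 1. It is a minimal sketch under stated assumptions: the function names are ours, the word-level pass splits on single spaces, the second level is a standard heapq-based Huffman construction, and the encode table (pattern P_i to string r_i) is assumed to be built as in Section 2.3.

# A minimal runnable sketch of Algorithm 1 (our naming; the replacement pass
# and the heapq-based Huffman step stand in for the paper's implementation).
import heapq
from collections import Counter

def level1(text, encode):
    """Level 1: replace every frequent pattern P_i by its special string r_i."""
    return " ".join(encode.get(w, w) for w in text.split(" "))

def classical_huffman(text):
    """Level 2: classical Huffman; returns a code table char -> bitstring."""
    heap = [(freq, i, {ch: ""})
            for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)   # two least-frequent subtrees
        f1, _, c1 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c0.items()}
        merged.update({ch: "1" + code for ch, code in c1.items()})
        heapq.heappush(heap, (f0 + f1, i, merged))
        i += 1
    return heap[0][2]

def hierarchical_huffman(text, encode):
    """Both levels: pattern replacement followed by classical Huffman."""
    t1 = level1(text, encode)
    codes = classical_huffman(t1)
    return "".join(codes[ch] for ch in t1), codes

# Hypothetical usage: one pattern, 'algorithm', replaced by the code '~a'.
bits, codes = hierarchical_huffman("the algorithm runs the algorithm",
                                   {"algorithm": "~a"})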

2.3 Implementation of Hierarchical-Huffman


– We maintain a Hash table to store the set P of patterns obtained. Against each pattern P_i, we store the replacement string r_i. For a given P_i, the location in the Hash table is identified using the function MAP-CONTAINER() available in UNIX systems.
– At the decoding stage, for a given r_i, we can uniquely retrieve P_i using a bijective function.
– To implement classical Huffman, we make use of a Min-heap data structure. A sketch of the pattern table appears below.
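The following Python sketch shows one way to realize the pattern table. It is an illustration under assumptions: a Python dict plays the role of the Hash table (Python's built-in hashing stands in for MAP-CONTAINER()), and the special strings are '~'-prefixed base-62 codes, assuming the character '~' does not occur in the input text.

# A minimal sketch of the pattern table; the dict and the '~'-prefixed codes
# are our assumptions, not the paper's exact data layout.
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols for short codes

def make_code(i):
    """Encode index i in base 62; code i is never longer than code j for i < j."""
    digits = ""
    while True:
        i, rem = divmod(i, len(ALPHABET))
        digits = ALPHABET[rem] + digits
        if i == 0:
            return "~" + digits  # '~' marks a replacement string

def build_tables(patterns):
    """patterns: frequent long patterns, pre-sorted by increasing length,
    so that shorter patterns receive the shorter codes."""
    encode = {}  # pattern P_i -> replacement string r_i
    for i, p in enumerate(patterns):
        r = make_code(i)
        if len(r) < len(p):          # keep the invariant |r_i| < |P_i|
            encode[p] = r
    decode = {r: p for p, r in encode.items()}  # the bijective inverse
    return encode, decode

Since the patterns arrive sorted by increasing length and the codes grow monotonically with the index, the invariants |r_i| < |P_i| and |r_i| ≤ |r_j| for i < j hold by construction.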

2.4 Run-time Analysis of Hierarchical-Huffman


Let the size of the text T be n, where n is the number of characters in T. Clearly, |P| ≤ n. On average, the Hash table operations Insert and Search take O(1) time. The operations supported by the Min-heap for Classical
Huffman can be implemented in O(n log n) time in total. Therefore, the overall time complexity is O(n log n). In the worst case, a Hash table operation incurs O(n) time, n being the size of the Hash table; the overall run-time is still O(n log n). A B-tree can also be used to maintain the set of patterns instead of the Hash table. For B-trees, the dictionary operations incur O(log n) time; note that the overall run-time is still O(n log n).

2.5 Hierarchical Huffman Outperforms Classical Huffman

In this section, we show that Hierarchical Huffman achieves a better compression ratio than classical Huffman
for domain specific input text.
Theorem 1. Hierarchical Huffman Outperforms Classical Huffman.

Proof. Consider the set P = {P_1, P_2, ..., P_k} of patterns such that |P_i| ≥ 3. Since each P_i occurs at random in the text T, we denote the number of occurrences of each P_i by a random variable: let X_i denote the number of occurrences of P_i in T. In T_level1, each P_i is replaced with a special string r_i such that |r_i| ≤ |P_i|. This is an invariant maintained by our algorithm during each scan of the text T. Moreover, patterns of length at most two are discarded and not stored in the Hash table. The set P is organized in such a way that for each P_i and P_j, i < j, the replacement strings r_i and r_j satisfy |r_i| ≤ |r_j|. Therefore, the size of the text resulting from the first-level compression is as follows.

– |T_level1| = |r_1| · X_1 + . . . + |r_k| · X_k + |T_infrequent| + |T_shortpattern|, where T_shortpattern denotes the patterns of length at most 2 and T_infrequent denotes the patterns that are infrequent.
– The compressed text T_level1 contains replacement strings for patterns which are frequent (say, for example, occurring at least 10 times in T), and each such pattern is of length at least 3 in T.
– Since |r_i| ≤ |P_i|, we have |T_level1| ≤ |T|, where |T| is the original text size; that is, the size of the text resulting from the first-level compression (replacement of patterns by special strings) is at most the size of the input text.
– In the second-level compression, T_level1 is given as input to Classical Huffman. It is a well-known fact that Classical Huffman is optimal with respect to prefix encoding. Since the first level never increases the size of the text, the optimal prefix encoding of T_level1 is at most the size of the optimal prefix encoding of T, and Hierarchical Huffman is therefore at least as good as Classical Huffman. Hence, the claim follows. ⊓⊔

Inference: Note that if |r_i| < |P_i| for many P_i's and the frequency count of each P_i is also high, then the size of the compressed text satisfies |T_level1| ≪ |T|.
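As a hypothetical numerical illustration, suppose a single pattern P_1 with |P_1| = 10 occurs X_1 = 200 times and is replaced by r_1 with |r_1| = 2. The replaced portion then shrinks from 10 · 200 = 2000 characters to 2 · 200 = 400 characters, a saving of 1600 characters before the second-level Huffman pass is even applied.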

Compression Ratio: Compression Ratio = (Performance of Hierarchical Huffman) / (Performance of Classical Huffman), where
Performance of Hierarchical Huffman = (Two-level compression on T) / (Input text T), and
Performance of Classical Huffman = (Classical Huffman on T) / (Input text T).
Essentially, Compression Ratio = (Two-level compression on T) / (Classical Huffman on T).
Two-level compression on T refers to the replacement of the P_i's by the r_i's followed by Classical Huffman. Clearly, the lower the ratio, the better the two-level compression.
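For example, with hypothetical numbers: if the two-level scheme compresses a 1000 kB text to 300 kB while Classical Huffman alone yields 400 kB, the performances are 300/1000 = 0.3 and 400/1000 = 0.4 respectively, and the compression ratio is 0.3/0.4 = 0.75 < 1, i.e., the two-level scheme wins.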

3 Simulation of Hierarchical-Huffman Algorithm


In this section, we validate our theoretical study by a thorough simulation of Hierarchical Huffman for different input texts. We have considered input texts from the computer science domain. Keywords from standard computer science texts are also considered as patterns for the study. Our simulation includes various text files whose sizes range from 500 kB to 2 MB. Various plots and findings from the simulation are given below:
– Figure 2 illustrates the performance of Hierarchical Huffman and Classical Huffman for different text inputs. From the plot we infer that, for large input texts, the performance of Hierarchical Huffman is much better than that of Classical Huffman. This is because, for large input texts, the frequency of patterns increases, which leads to a reduction in the size of the text output by the two-level compression routine.

Fig. 2. Input Text vs Performance

– In Figure 3, we show the plot of compression ratio against various input texts. Recall that our theoretical study inferred that the larger the input text, the better the compression ratio. This is precisely evident in Figure 3 as well.
– Theoretical analysis reveals that, to get a good compression ratio, the input text must contain smaller patterns with large frequencies. For input texts with large patterns, Hierarchical Huffman appears to be only about as good as Classical Huffman; this observation is clearly evident from our simulation study and is illustrated in Figure 4.

Fig. 3. Input Text vs Compression Ratio

Fig. 4. Comparison of Two Techniques for Large Patterns

– Note that during data transmission, the Hash table of reference strings for the patterns is sent along with the input text. Since the downloads are domain specific, it is sufficient to send the Hash table exactly once. Although this is an overhead, the gain becomes visible when the number of downloads is large. This is precisely illustrated in Figure 5. Due to this overhead, if the number of downloads is small, then Classical Huffman performs better than Hierarchical Huffman. There is a critical point, which denotes the minimum number of downloads beyond which Hierarchical Huffman outperforms Classical Huffman.
– As mentioned before, shorter patterns with a good frequency count give a better compression ratio than larger patterns with the same frequency count. This observation is validated for different texts and is illustrated in Figure 6. The plot is obtained by taking the average of the compression ratios for different input texts against the vector (length of the pattern, frequency of the pattern).

Fig. 5. Identifying Critical Point for the two Compression Schemes

Fig. 6. Effects of Shorter vs Larger Patterns on the Compression Ratio

3.1 Conclusions and Further Research

In this paper, we have proposed a framework for data compression of domain specific input text. Our framework involves two-level compression in which the first level operates at the word level and the second at the character level. We have also shown that this two-level approach outperforms classical Huffman for domain specific input text. All our claims are supported by theoretical results and validated with a thorough simulation. An interesting problem for further research is to analyse our two-level scheme for video compression.

References
1. Naren Ramakrishnan, Ananth Grama: Data Mining: From Serendipity to Science. IEEE Computer 32(8): 34-37 (1999).
2. Jiawei Han, Micheline Kamber: Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000, ISBN 1-55860-489-8.
3. Alfred V. Aho, John E. Hopcroft: Data Structures and Algorithms. Academic Press, 1990.
4. David E. Goldberg: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989, ISBN 0-201-15767-5.
5. Robert Franceschini, Amar Mukherjee: Data Compression Using Encrypted Text. ADL 1996: 130-138.
6. Ian H. Witten, Zane Bray, Malika Mahoui, W. J. Teahan: Text Mining: A New Frontier for Lossless Compression. Data Compression Conference 1999: 198-207.
7. P. Nagabhushan, S. Murali: Pitman Shorthand Inspired Model for Plain Text Compression. ICDAR 2001: 132.
8. TARA: An Algorithm for Fast Searching of Multiple Patterns on Text Files. Technical Report, Turkish Army Gendarme Headquarters, Bestepe, Ankara, Turkey, 2007.
9. Weifeng Sun, Nan Zhang, Amar Mukherjee: A Dictionary-Based Multi-Corpora Text Compression System. DCC 2003: 448.
10. Md. Ziaul Karim Zia, Dewan Md. Fayzur Rahman, Chowdhury Mofizur Rahman: Two-Level Dictionary Based Text Compression Scheme. Proceedings of the 11th International Conference on Computer and Information Technology (ICCIT 2008), 2008, pp. 13-18.
