Modified Run Length Encoding Scheme With Introduction of Bit Stuffing For Efficient Data Compression
Emirates
Asjad Amin, Haseeb Ahmad Qureshi, Muhammad Junaid, Muhammad Yasir Habib, Waqas Anjum
Department of Telecommunication and Electronic Engineering
The Islamia University of Bahawalpur, Pakistan
Abstract: This paper presents a modified scheme for run length encoding. The proposed scheme can achieve a significant improvement in compression ratio for almost any kind of data. The limitations and problems of the original run length encoding scheme are highlighted and discussed in detail, and a solution is proposed and implemented for each problem to achieve intelligent and efficient coding. One of the major problems with the original design is that a large number of bits is used to represent the length of each run. This is resolved by introducing bit stuffing in RLE: long sequences that degrade the compression ratio are broken into smaller sequences using bit stuffing. To allow more compression and flexibility, the maximum allowable sequence length is not fixed and can be adjusted to the input. Secondly, we ignore the large number of small sequences that are largely responsible for expansion of the data instead of compression. Four random sequences have been analyzed, and with the modified scheme a compression ratio as high as 50% is observed.

I. INTRODUCTION

Data compression is a process that reduces the amount of data to be transmitted and thereby decreases transfer time [1]. Data compression is commonly used in modern database systems. Compression can be utilized for different reasons, including: 1) reducing storage/archival costs, which is particularly important for large data warehouses; 2) improving query workload performance by reducing I/O costs [2].

Data compression involves transforming a string of characters in some representation (such as ASCII) into a new string that contains the same information but has the smallest possible length. Data compression has important applications in the areas of data transmission and data storage. Compressing data reduces storage and communication costs. Similarly, compressing a file to half of its original size is equivalent to doubling the capacity of the storage medium. Data compression is rapidly becoming a standard component of communications hardware and data storage devices [3].

The paper is organized as follows: Section II presents the original run length encoding scheme, Section III presents the modified run length encoding scheme that overcomes its problems, Section IV verifies the results of the proposed encoding scheme for four randomly chosen inputs, and Section V presents concluding remarks.

II. RUN LENGTH ENCODING

Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs: for example, simple graphic images such as icons, line drawings, and animations. It is not useful with files that do not have many runs, as it could greatly increase the file size.

The RLE algorithm performs a lossless compression of input data based on sequences of identical values (runs). It is a historical technique, originally exploited by fax machines and later adopted in image processing. The algorithm is quite simple: each run, instead of being represented explicitly, is translated by the encoding algorithm into a pair (l,v), where l is the length of the run and v is the value of the run elements. The longer the runs in the sequence to be compressed, the better the compression ratio [4].

A. Working of Run Length Encoding

Data of n bits is compressed by arranging it in the form of runs and the count of each run. For binary data, the count of each run is then represented in binary. The amount of compression is directly related to the length and number of the longer runs. Run-length encoding, when applied to data with information bits "11111111111110000000000000000011111", gives a set of 3 pairs, each of the form (number of times the bit occurs, bit). Hence, the above-mentioned bit pattern is represented in the run length encoding scheme as (13,1)(17,0)(5,1). In binary form the counts are expressed as follows: 13=01101, 17=10001 and 5=00101. The final output comes out to be 011011100010001011. In this way, the original pattern of n bits can be compressed to a great extent (here, 35 bits are reduced to 18), thereby reducing the data.

This encoding scheme does not always perform data compression. In scenarios where runs of small length are in excess, the scheme performs poorly and, instead of compressing the data, produces an output that is an expanded form of the input. Consider the pattern "101010": when Run Length Encoding is applied, the final output is (1,1)(1,0)(1,1)(1,0)(1,1)(1,0) or 111011101110, which is larger in size than the input.

In the case of large consecutive runs of 1's or 0's, RLE performs efficient compression, whereas for data with a large number of single 0's or 1's the output is an expanded form of the input; sometimes the output is twice the size of the input. This expansion of data instead of compression makes the RLE technique less reliable. That is why run length encoding is a poor technique and practically not efficient for
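The two worked examples of Section II-A can be reproduced with a short sketch. It assumes that every run is written as a fixed-width binary count followed by the bit value; the function names and the `count_width` parameter are illustrative and do not come from the paper.

```python
from itertools import groupby


def rle_pairs(bits: str) -> list[tuple[int, str]]:
    """Split a bit string into (run length, bit value) pairs."""
    return [(len(list(group)), bit) for bit, group in groupby(bits)]


def rle_encode(bits: str, count_width: int) -> str:
    """Write each run as a fixed-width binary count followed by its bit.

    count_width (an assumed parameter) is the number of bits spent on
    every run length; longer runs cannot be represented and are rejected.
    """
    out = []
    for length, bit in rle_pairs(bits):
        if length >= 2 ** count_width:
            raise ValueError(f"run of {length} does not fit in {count_width} bits")
        out.append(format(length, f"0{count_width}b") + bit)
    return "".join(out)


example = "1" * 13 + "0" * 17 + "1" * 5
print(rle_pairs(example))       # [(13, '1'), (17, '0'), (5, '1')]
print(rle_encode(example, 5))   # 011011100010001011  (35 bits -> 18 bits)
print(rle_encode("101010", 1))  # 111011101110        (6 bits -> 12 bits)
```

With a 5-bit count every pair costs 6 bits, so any run shorter than 6 bits expands under this layout; this is the behaviour the "101010" example shows in its most extreme form.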
because we have ignored single and double ones/zeros, and the smallest sequence that is included in RLE is 000 or 111.

It is clear that a single zero/one or double zero/one occurs more often than any other bit sequence. This can be verified from all four figures. It can also be seen that some very large sequences occur in almost every input. These sequences

Figure 3. Distribution of consecutive bit sequences for Input 3 (x-axis: consecutive bit sequence; y-axis: number of times sequence occurs)
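A distribution of consecutive bit sequences like the ones plotted in these figures can be computed by counting runs by length. The following is a minimal sketch; the function name and the sample string are hypothetical and are not the paper's input data.

```python
from collections import Counter
from itertools import groupby


def run_length_histogram(bits: str) -> Counter:
    """Count how many runs of each length occur in a bit string."""
    return Counter(len(list(group)) for _, group in groupby(bits))


# Hypothetical sample; the paper's actual inputs are not reproduced here.
sample = "1011001110100011110"
hist = run_length_histogram(sample)
for length in sorted(hist):
    print(length, hist[length])
# 1 5
# 2 2
# 3 2
# 4 1
```

Even in this tiny sample, runs of length 1 and 2 account for 7 of the 10 runs, which mirrors the observation that single and double zeros/ones dominate.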
bits to 5. The bit-stuffed sequences for the above input data are shown in Figures 5, 6, 7 and 8.
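One way to limit runs to 5 bits is sketched below. The exact stuffing rule is not spelled out in the surviving text, so this sketch assumes the common convention of inserting the complementary bit after every `max_run` identical bits; `stuff_bits`, `destuff_bits` and `max_run` are illustrative names.

```python
def _flip(bit: str) -> str:
    return "1" if bit == "0" else "0"


def stuff_bits(bits: str, max_run: int = 5) -> str:
    """Insert a complementary bit after every max_run identical bits.

    Assumed rule (not taken from the paper): the stuffed bit itself
    starts a new run, so no run in the output exceeds max_run.
    """
    if max_run < 2:
        raise ValueError("max_run must be at least 2")
    out, prev, run = [], None, 0
    for bit in bits:
        out.append(bit)
        run = run + 1 if bit == prev else 1
        prev = bit
        if run == max_run:
            prev = _flip(bit)   # stuffed bit breaks the run
            out.append(prev)
            run = 1
    return "".join(out)


def destuff_bits(stuffed: str, max_run: int = 5) -> str:
    """Remove the bits inserted by stuff_bits, mirroring its run tracking."""
    out, prev, run, skip = [], None, 0, False
    for bit in stuffed:
        if skip:                # this bit was stuffed: drop it
            prev, run, skip = bit, 1, False
            continue
        out.append(bit)
        run = run + 1 if bit == prev else 1
        prev = bit
        if run == max_run:
            skip = True
    return "".join(out)


print(stuff_bits("1" * 13))                # 111110111110111
print(destuff_bits(stuff_bits("1" * 13)))  # 1111111111111
```

Under this rule every run length fits in 3 bits when `max_run` is 5, at the cost of one extra bit per broken run.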
Figure 2. Distribution of consecutive bit sequences for Input 2 (x-axis: consecutive bit sequence; y-axis: number of times sequence occurs)

Figure 5. Data of Input 1 after Bit Stuffing (x-axis: consecutive bit sequence; y-axis: number of times sequence occurs)
Figure 6. Data of Input 2 after Bit Stuffing

Figure 7. Data of Input 3 after Bit Stuffing

Figure 8. Data of Input 4 after Bit Stuffing

Figure 9. Data of Input 1 after bit stuffing & ignoring small sequences

Figure 10. Data of Input 2 after bit stuffing & ignoring small sequences

Figure 11. Data of Input 3 after bit stuffing & ignoring small sequences

(Figures 6 to 11 share the same axes; x-axis: consecutive bit sequence; y-axis: number of times sequence occurs.)
In the last step, we ignore single 0's/1's and double 0's/1's and apply the Run Length Encoding Scheme on the remaining data. The data for the above inputs after bit
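The step of ignoring single and double 0's/1's can be illustrated as follows. The surviving text does not specify how the ignored bits and the encoded runs are combined for transmission, so the token layout below (literal chunks and run pairs) is a hypothetical one chosen only to show which parts of the data reach the run length encoder; `MIN_RUN`, `split_runs` and `join_runs` are assumed names.

```python
from itertools import groupby

MIN_RUN = 3  # smallest run passed to RLE (000 or 111)


def split_runs(bits: str) -> list[tuple]:
    """Keep runs shorter than MIN_RUN literal; turn longer runs into pairs.

    Returns ("raw", bits) and ("run", length, bit) tokens. This layout
    is an assumption for illustration, not the paper's wire format.
    """
    tokens: list[tuple] = []
    for bit, group in groupby(bits):
        length = len(list(group))
        if length >= MIN_RUN:
            tokens.append(("run", length, bit))
        elif tokens and tokens[-1][0] == "raw":
            # Merge neighbouring short runs into one literal chunk.
            tokens[-1] = ("raw", tokens[-1][1] + bit * length)
        else:
            tokens.append(("raw", bit * length))
    return tokens


def join_runs(tokens: list[tuple]) -> str:
    """Rebuild the original bit string from the tokens."""
    return "".join(t[1] if t[0] == "raw" else t[2] * t[1] for t in tokens)


tokens = split_runs("1011000001")
print(tokens)             # [('raw', '1011'), ('run', 5, '0'), ('raw', '1')]
print(join_runs(tokens))  # 1011000001
```

Because short runs are copied through unchanged, they cost no count bits in this sketch, so a pattern such as "101010" is not expanded by the run encoding itself.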
Figure 13 and Figure 14 show the amount of compression achieved using modified run length encoding for five randomly chosen input sequences, four of which are shown previously in Figures 1, 2, 3 and 4. Figure 15 shows a comparative chart of original and compressed data.

Input Sequence     1      2      3       4       5
Total Bits         6327   6287   36488   16455   23381
Bits Saved         1583   681    18280   4959    2333
% of Data Saved    25     10.8   50.1    30.1    10

Figure 13. TCP Frame with Data segment divided into cells

Figure 14. TCP Frame with Data segment divided into cells (chart: % of Data Saved for input sequences 1 to 5)

Figure 15. TCP Frame with Data segment divided into cells

V. CONCLUSION

This research paper provides a new and more reliable technique for data compression. It addresses the limitations present in the Run Length Encoding scheme. Problems in traditional run length encoding are highlighted and discussed in detail. A solution to each problem is then proposed in the modified run length encoding scheme. Four random input sequences are taken and analyzed. To make RLE work, we first use bit stuffing to break larger sequences and then ignore single 0's/1's and double 0's/1's. Intelligent coding is done on the remaining data. Then, a combination of the ignored single and double 0's/1's with the run length encoded data is sent to the receiver. The receiver applies all the steps in reverse order: run length decoding followed by de-stuffing of bits. The original data is then recovered at the receiver end. In this way, expansion of data can be avoided in cases where Run Length Encoding fails. Our results show a compression of 10% to 50%. Even in the worst scenario, the modified algorithm will not expand the input.

REFERENCES

[1] Eugène Pamba Capo-Chichi, Hervé Guyennet, Jean-Michel Friedt, "A new Data Compression Algorithm for Wireless Sensor Network," in Proc. Third International Conference on Sensor Technologies and Applications, 2009, pp. 1-6, doi: 10.1109/SENSORCOMM.2009.84
[2] Stratos Idreos, Raghav Kaushik, Vivek Narasayya, Ravishankar Ramamurthy, "Estimating the Compression Fraction of an Index using Sampling," in Proc. International Conference on Data Engineering (ICDE), 2010, doi: 10.1109/ICDE.2010.5447694
[3] James A. Storer, "Data Compression Methods and Theory," Computer Science Press, 1988, 413 pp., ISBN-10: 0716781565
[4] Stefano Ferilli, "Automatic Digital Document Processing and Management: Problems, Algorithms and Techniques," ISBN: 0857291971
[5] Martin H. Weik, "Computer Science and Communications Dictionary," 2000, Volume 1, p. 129