6th International Conference on Internet Technology and Secured Transactions, 11-14 December 2011, Abu Dhabi, United Arab Emirates

Modified Run Length Encoding Scheme with Introduction of Bit Stuffing for Efficient Data Compression
Asjad Amin, Haseeb Ahmad Qureshi, Muhammad Junaid, Muhammad Yasir Habib, Waqas Anjum
Department of Telecommunication and Electronic Engineering
The Islamia University of Bahawalpur, Pakistan

[[email protected],[email protected], [email protected], [email protected],


[email protected]]

978-1-908320-00-1/11/$26.00 ©2011 IEEE

Abstract—This paper presents a modified scheme for run length encoding. A significant improvement in compression ratio for almost any kind of data can be achieved by the proposed scheme. The limitations and problems of the original run length encoding scheme are highlighted and discussed in detail. A solution is proposed and carried out for each problem to achieve intelligent and efficient coding. One of the major problems with the original design is that a large number of bits is used to represent the length of each run. This is resolved by introducing bit stuffing in RLE: the long sequences that hurt the compression ratio are broken into smaller sequences using bit stuffing. To allow more compression and flexibility, the maximum allowable length of a bit sequence is not fixed and can be adjusted to the input. Secondly, we ignore the large number of small sequences that are largely responsible for expansion of data instead of compression. Four random sequences have been analyzed and, when the modified scheme is applied, a compression ratio as high as 50% is observed.

I. INTRODUCTION

Data compression is a process that reduces the amount of data in order to reduce the data transmitted, and it decreases transfer time because the size of the data is reduced [1]. Data compression is commonly used in modern database systems. Compression can be utilized for different reasons, including: 1) reducing storage/archival costs, which is particularly important for large data warehouses; 2) improving query workload performance by reducing the I/O costs [2].

Data compression involves transforming a string of characters in some representation (such as ASCII) into a new string which contains the same information but with the smallest possible length. Data compression has important applications in the areas of data transmission and data storage. Compressing data reduces storage and communication costs. Similarly, compressing a file to half of its original size is equivalent to doubling the capacity of the storage medium. Data compression is rapidly becoming a standard component of communications hardware and data storage devices [3].

The paper is organized as follows: Section II presents the original run length encoding scheme, Section III presents the modified run length encoding scheme that overcomes its problems, Section IV verifies the results of the proposed encoding scheme for randomly chosen inputs, and Section V presents concluding remarks.

II. RUN LENGTH ENCODING

Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs: for example, simple graphic images such as icons, line drawings, and animations. It is not useful with files that do not have many runs, as it could greatly increase the file size.

The RLE algorithm performs a lossless compression of input data based on sequences of identical values (runs). It is a historical technique, originally exploited by fax machines and later adopted in image processing. The algorithm is quite easy: each run, instead of being represented explicitly, is translated by the encoding algorithm into a pair (l,v), where l is the length of the run and v is the value of the run elements. The longer the runs in the sequence to be compressed, the better the compression ratio [4].

A. Working of Run Length Encoding

Data of n bits is compressed by arranging it in the form of runs and the count of each run. For binary data, the count of each run is then represented in binary. The amount of compression is directly related to the length and number of longer runs. Run-length encoding applied to the information bits "11111111111110000000000000000011111" gives a set of 3 pairs, each pair holding the length of a run and its bit (number of times the bit occurs, bit). Hence, the above bit pattern is represented in run length encoding as (13,1)(17,0)(5,1). In binary form the run lengths are expressed as 13 = 01101, 17 = 10001 and 5 = 00101. The final output comes out to be 011011100010001011. In this way, the original pattern of n bits (here 35 bits) can be compressed to a great extent (here 18 bits), thereby reducing the data.

This encoding scheme does not always compress data. In scenarios where runs of small length dominate, the scheme performs poorly and, instead of compressing the data, produces an output that is an expanded form of the input. Consider the pattern "101010". When Run Length Encoding is applied to it, the final output is (1,1)(1,0)(1,1)(1,0)(1,1)(1,0) or 111011101110, which is larger than the input (12 bits against 6).

In case of large consecutive runs of 1's or 0's, RLE performs efficient compression, whereas for data with a large number of single 0's or 1's the output is an expanded form of the input; sometimes the output is twice the size of the input. This expansion of data instead of compression makes the RLE technique less reliable. That is why run length encoding is a poor technique and practically not efficient for larger data. The core objective of this research paper is to improve this encoding technique.

The length of the largest run decides the number of bits needed to represent the count or length of each run in the run length encoding technique. Consider the bit pattern 01010011111111111. The largest run in the data is 11 consecutive 1's. Therefore this run decides the number of bits needed to represent the length of each run in the data. The number of bits needed to represent the length of a run is 4 for the given scenario. The above sequence is written in run length encoding as:

TABLE I. RUN LENGTH ENCODING PAIRS FOR ABOVE DATA

BIT            RUN LENGTH ENCODING
0              0001,0
1              0001,1
0              0001,0
1              0001,1
00             0010,0
11111111111    1011,1

B. Problem with Run Length Encoding

There are two basic problems that degrade the performance of run length encoding schemes.

First, most of the bits in data are arranged in runs of small length, such as single zeros or single ones, double zeros or double ones. Such runs require more bits than their actual size to represent them in run length encoding. A single 0 may be represented as (000001, 0), which is seven times larger than the input data. This is one of the major performance-limiting factors and results in expansion of data instead of compression.

Second, data may sometimes contain a very large sequence of consecutive ones or zeros. Such sequences are themselves represented in fewer bits in RLE, but they can affect the overall compression negatively, because the largest sequence decides the number of bits used to represent the length of a run in each pair. As a result, the length of the run in all the other sequences is also represented by the same number of bits as the largest run. For the scenario given below we need 4 extra bits to represent the length of each single bit 0/1, because the longest sequence is represented by 4 bits:

0                  0001,0
111111111111111    1111,1

III. MODIFIED RUN LENGTH ENCODING SCHEME

A. Proposed Solution

As highlighted above, there are two very clear problems that decrease the performance of the run length encoding scheme. We propose some modifications to the run length encoding scheme that are specially designed to counter these problems. The modified run length encoding scheme gives a significant improvement in compression ratio for almost any kind of data.

The above mentioned problems in the run length encoding scheme can be overcome by intelligent compression. Analyzing the input data is the first and core step. We analyze the data to find any very long sequences that would increase the number of bits needed to represent the length of each run. Secondly, we identify the small sequences of single zeros/ones, double zeros/ones or triple zeros/ones that may result in expansion of data instead of compression.

B. Bit Stuffing:

In data transmission, bit stuffing is the insertion of non-information bits into data. Stuffed bits should not be confused with overhead bits. Bit stuffing is used for various purposes, such as bringing bit streams that do not necessarily have the same or rationally related bit rates up to a common rate, or filling buffers or frames. The location of the stuffing bits is communicated to the receiving end of the data link, where these extra bits are removed to return the bit streams to their original bit rates or form [5].

In run length encoding, bit stuffing is used to limit the number of consecutive bits of the same value in the data to be transmitted. A bit of the opposite value is inserted after the maximum allowed number of consecutive bits. Since this is a general rule, the receiver does not need extra information about the location of the stuffing bits in order to do the de-stuffing. The receiver only needs to know the maximum allowable number of consecutive bits.

We first break the large sequences, which are usually very few in number but increase the number of bits needed to represent the sequence length. We break such sequences by introducing bit stuffing in RLE. The length of 15 consecutive ones is represented by 4 bits, while the length of 17 consecutive ones needs 5 bits. Therefore stuffing a zero after 15 ones in a 17-bit sequence breaks the overall sequence into two parts. This limits the largest sequence to 15 consecutive ones, and we need only 4 bits to represent the length of each sequence instead of 5 bits. The length of the sequence after which a bit is stuffed is not fixed. For the above case it is taken as 15, but it can be adjusted to the distribution of bits in specific data. A single zero is used to break a sequence of consecutive ones, and a single one is used to break a sequence of consecutive zeros.

C. Leaving small sequences out of RLE

All the small sequences that may result in expansion of data are kept out of RLE. We ignore the single zeros/ones and double zeros/ones that contribute to expanding the data. Such sequences are left untouched and the run length encoding scheme is not applied to them. The v in (l,v), the value of a run, is not a single zero or one, as a single zero or one would be confused with the ignored sequences. We take the value of v as the smallest consecutive sequence that is considered for RLE. Consider the data 0101010000001111111. If we apply the modified scheme and ignore single and double ones/zeros, the output comes out to be 010101(110,000)(111,111). The value of a run for the above case will always be 000 for a sequence of zero bits and 111 for a sequence of one bits. This is because we have ignored single and double ones/zeros, and the smallest sequence that is included in RLE is 000 or 111.
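
The two encodings discussed so far can be compared with a minimal Python sketch. The names `runs`, `classic_rle`, `modified_tokens` and the parameter `min_run` are illustrative and do not come from the paper. The sketch assumes a non-empty string of '0'/'1' characters. Because the text does not specify how literal bits and encoded pairs are delimited in the transmitted stream, the modified variant returns a list of tokens rather than a final bit string.

```python
from itertools import groupby


def runs(bits):
    """Split a bit string into (length, bit) pairs, one pair per run."""
    return [(len(list(group)), bit) for bit, group in groupby(bits)]


def classic_rle(bits):
    """Classic RLE: each run becomes a fixed-width binary count plus the bit.

    The count width is set by the longest run, as described in Section II.
    Assumes `bits` is non-empty.
    """
    pairs = runs(bits)
    width = max(length for length, _ in pairs).bit_length()
    return "".join(format(length, f"0{width}b") + bit for length, bit in pairs)


def modified_tokens(bits, min_run=3):
    """Keep runs shorter than `min_run` as literal bits; encode the rest.

    Encoded runs are (length, value) tuples where value is the shortest
    encoded run, e.g. '000' or '111' when min_run is 3 (an illustrative
    token format, not the paper's wire format).
    """
    tokens = []
    for length, bit in runs(bits):
        if length < min_run:
            literal = bit * length
            if tokens and isinstance(tokens[-1], str):
                tokens[-1] += literal  # merge adjacent literal bits
            else:
                tokens.append(literal)
        else:
            tokens.append((length, bit * min_run))
    return tokens
```

Under these assumptions, `classic_rle("1" * 13 + "0" * 17 + "1" * 5)` gives 011011100010001011 and `classic_rle("101010")` gives 111011101110, matching the worked examples in Section II. For the 19-bit pattern 0101010000001111111, `modified_tokens` keeps the six leading alternating bits as the literal 010101 and returns the runs as (6, '000') and (7, '111').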

IV. VERIFYING MODIFIED RLE RESULTS

We have taken different random input sequences to verify our algorithm's results. These sequences are shown in figures 1, 2, 3 and 4. We can fairly analyze the data and highlight the problems with the help of the given figures. Each figure shows the number of times a consecutive bit sequence of a given length appears in an input.

A. Step I, Analyzing Input Data

It is clear that a single zero/one or double zero/one occurs more often than any other bit sequence. This can be verified in all four figures. It can also be seen that some very long sequences occur in almost every input. These sequences are very few in number, but even a single such sequence is enough to increase the number of bits needed to represent each run in the original run length encoding.

Figure 1. Distribution of consecutive bit sequences for Input 1 [plot of "No of times sequence occur" against "Consecutive Bit Sequence"; series: Input data]

Figure 2. Distribution of consecutive bit sequences for Input 2 [plot of "No of times sequence occur" against "Consecutive Bit Sequence"; series: Input data]

Figure 3. Distribution of consecutive bit sequences for Input 3 [plot of "No of times sequence occur" against "Consecutive Bit Sequence"; series: Input data]

Figure 4. Distribution of consecutive bit sequences for Input 4 [plot of "No of times sequence occur" against "Consecutive Bit Sequence"; series: Input data]

B. Step II, Bit stuffing to break larger sequences

We get rid of the larger sequences by applying bit stuffing. The maximum allowable number of consecutive bits is different for each case and depends on the distribution of the input. In figure 5 we break all sequences longer than 18 consecutive 0's/1's, as such sequences are very few in number and would increase the number of bits used to represent the length of every sequence. Breaking the sequences longer than 18 bits, as shown in figure 6, lets us represent the length of each sequence in 4 bits instead of 5: since we ignore single and double zeros/ones, the remaining sequence lengths are 3 to 18, and the count can be represented by 4 bits. For the case of figures 7 and 8, the maximum allowable number of consecutive bits is taken as 34 to limit the number of count bits to 5. The bit stuffed sequences for the above input data are shown in figures 5, 6, 7 and 8.

Figure 5. Data of Input 1 after Bit Stuffing [plot of "No of times sequence occur" against "Consecutive Bit Sequence"; series: Bit Stuffing of Input data]
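
The stuffing rule of Step II can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `stuff_bits`, `unstuff_bits` and `max_run` are hypothetical. It assumes that a bit of the opposite value is inserted after every `max_run` identical bits, even when the next data bit already differs, and that the inserted bit counts as the start of the next run. That convention is what lets the receiver remove the stuffed bits knowing only `max_run`.

```python
def stuff_bits(bits, max_run):
    """Insert an opposite bit after every `max_run` identical consecutive bits."""
    if max_run < 2:
        raise ValueError("max_run must be at least 2")
    out = []
    run_bit, run_len = None, 0
    for bit in bits:
        if bit == run_bit:
            run_len += 1
        else:
            run_bit, run_len = bit, 1
        out.append(bit)
        if run_len == max_run:
            stuffed = "1" if bit == "0" else "0"
            out.append(stuffed)
            # The stuffed bit starts a new run in the output stream.
            run_bit, run_len = stuffed, 1
    return "".join(out)


def unstuff_bits(bits, max_run):
    """Remove the bits inserted by stuff_bits, given the same `max_run`."""
    if max_run < 2:
        raise ValueError("max_run must be at least 2")
    out = []
    run_bit, run_len = None, 0
    drop_next = False
    for bit in bits:
        if drop_next:
            # This bit was stuffed by the sender: drop it, but track its run.
            run_bit, run_len = bit, 1
            drop_next = False
            continue
        if bit == run_bit:
            run_len += 1
        else:
            run_bit, run_len = bit, 1
        out.append(bit)
        if run_len == max_run:
            drop_next = True
    return "".join(out)
```

With `max_run` set to 15, a run of 17 ones becomes 15 ones, a stuffed zero, and 2 ones, which is the example given in Section III. No run in the stuffed output is longer than `max_run`, and `unstuff_bits` recovers the original bits.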

Figure 6. Data of Input 2 after Bit Stuffing [plot of "No of times sequence occur" against "Consecutive Bit Sequence"; series: Bit Stuffing of Input data]

Figure 7. Data of Input 3 after Bit Stuffing [plot of "No of times sequence occur" against "Consecutive Bit Sequence"; series: Bit Stuffing of Input data]

Figure 8. Data of Input 4 after Bit Stuffing [plot of "No of times sequence occur" against "Consecutive Bit Sequence"; series: Bit Stuffing of Input data]

C. Step III, Ignoring small runs and applying modified RLE

In the last step, we ignore single 0's/1's and double 0's/1's and apply the run length encoding scheme to the remaining data. The data for the above inputs after bit stuffing and ignoring single 0/1 and double 0/1 is shown in figures 9, 10, 11 and 12. We observe that this data does not include any larger sequences that may affect the compression. This data also does not contain a large number of smaller sequences that result in expansion of data. Therefore, when RLE is applied to such data, we get a remarkable improvement in compression.

Figure 9. Data of Input 1 after bit stuffing & ignoring small sequences [plot of "No of times sequence occur" against "Consecutive Bit Sequence"; series: Data after ignoring small sequences]

Figure 10. Data of Input 2 after bit stuffing & ignoring small sequences [plot of "No of times sequence occur" against "Consecutive Bit Sequence"; series: Data after ignoring small sequences]

Figure 11. Data of Input 3 after bit stuffing & ignoring small sequences [plot of "No of times sequence occur" against "Consecutive Bit Sequence"; series: Data after ignoring small sequences]

Figure 12. Data of Input 4 after bit stuffing & ignoring small sequences [plot of "No of times sequence occur" against "Consecutive Bit Sequence"; series: Data after ignoring small sequences]
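
The distributions plotted in the figures, and the count widths quoted in Step II, can be reproduced with a short Python sketch. The function names and the parameters `min_run` and `max_run` are illustrative only. Storing a run length as its offset from `min_run` is an assumption made here to match the statement that lengths 3 to 18 fit in 4 bits and lengths 3 to 34 fit in 5 bits; the worked example in Section III writes the count directly instead.

```python
from collections import Counter
from itertools import groupby


def run_length_histogram(bits):
    """Count how many times each run length occurs in a bit string."""
    return Counter(len(list(group)) for _, group in groupby(bits))


def count_field_width(min_run, max_run):
    """Bits needed to store a length in min_run..max_run as (length - min_run)."""
    return max(1, (max_run - min_run).bit_length())


def encode_count(length, min_run, max_run):
    """Fixed-width binary field for a run length (offset encoding, assumed)."""
    if not min_run <= length <= max_run:
        raise ValueError("run length outside the encodable range")
    width = count_field_width(min_run, max_run)
    return format(length - min_run, f"0{width}b")
```

For example, `count_field_width(3, 18)` is 4 and `count_field_width(3, 34)` is 5, so under this assumption `encode_count(18, 3, 18)` yields 1111. Calling `run_length_histogram` on the stuffed data gives the counts per run length that Step I inspects before `max_run` is chosen.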

Figure 13 and figure 14 show the amount of compression achieved using modified run length encoding for five randomly chosen input sequences, four of which are shown previously in figures 1, 2, 3 and 4. Figure 15 shows a comparative chart of original and compressed data.

Input Sequence     1       2       3        4        5
Total Bits         6327    6287    36488    16455    23381
Bits Saved         1583    681     18280    4959     2333
% of Data Saved    25      10.8    50.1     30.1     10

Figure 13. TCP Frame with Data segment divided into cells

Figure 14. TCP Frame with Data segment divided into cells [chart titled "% of Data Saved" for input sequences 1 to 5: 25, 10.8, 50.1, 30.1, 10]

Figure 15. TCP Frame with Data segment divided into cells [chart titled "Total Bits Vs Bits Saved" for input sequences 1 to 5; Total Bits: 6327, 6287, 36488, 16455, 23381; Bits Saved: 1583, 681, 18280, 4959, 2333]

At the receiver end, the output data is fed as an input and the same steps are performed in reverse order (i.e. in the sequence of steps III, II and I) to extract the same data that was originally used as an input in Step I.

V. CONCLUSION

This research paper provides a new and more reliable technique for data compression. It addresses the limitations present in the run length encoding scheme. Problems in traditional run length encoding are highlighted and discussed in detail. A solution to each problem is then proposed in the modified run length encoding scheme. Random input sequences are taken and analyzed. To make RLE work, we first use bit stuffing to break larger sequences and then ignore single 0's/1's and double 0's/1's. Intelligent coding is done on the remaining data. Then a combination of the ignored single and double 0's/1's and the run length encoded data is sent to the receiver. The receiver applies all the steps in reverse order: it applies a run length decoding scheme followed by de-stuffing of bits. The original data is then recovered at the receiver end. In this way expansion of data can be avoided in cases where run length encoding fails. Our results show a compression of between 10% and 50%. Even in the worst scenario, the modified algorithm will not expand the input.

REFERENCES

[1] Eugène Pamba Capo-Chichi, Hervé Guyennet, Jean-Michel Friedt, "A new Data Compression Algorithm for Wireless Sensor Network," in Proc. Third International Conference on Sensor Technologies and Applications, 2009, pp. 1-6, DOI 10.1109/SENSORCOMM.2009.84
[2] Stratos Idreos, Raghav Kaushik, Vivek Narasayya, Ravishankar Ramamurthy, "Estimating the Compression Fraction of an Index using Sampling," in Proc. International Conference on Data Engineering (ICDE), 2010, doi 10.1109/ICDE.2010.5447694
[3] James A. Storer, "Data Compression Methods and Theory," Computer Science Press, 1988, 413 pp, ISBN-10: 0716781565
[4] Stefano Ferilli, "Automatic Digital Document Processing and Management: Problems, Algorithms and Techniques," ISBN: 0857291971
[5] Martin H. Weik, "Computer Science and Communications Dictionary," 2000, Volume 1, p. 129
