Lecture19 PDF

Dictionary coding is a lossless data compression technique that does not use statistical knowledge of the data. The encoder builds a dictionary of strings found in the input and transmits the dictionary index of each repeated substring; the decoder reconstructs the same dictionary to invert the encoding. Examples include LZW, LZ77, and Sequitur; applications include Unix compress, gzip, and GIF.

Uploaded by

Aadil Nazir

Dictionaries for Data Compression

CSE 326
Autumn 2005
Lecture 19

Dictionary Coding

• Does not use statistical knowledge of data.
• Encoder: As the input is processed, develop a dictionary and transmit the index of strings found in the dictionary.
• Decoder: As the code is processed, reconstruct the dictionary to invert the process of encoding.
• Examples: LZW, LZ77, Sequitur
• Applications: Unix compress, gzip, GIF

Dictionary Data Compression - Lecture 19 2

LZW Encoding Algorithm

Repeat
    find the longest match w in the dictionary
    output the index of w
    put wa in the dictionary where a was the unmatched symbol

LZW Encoding Example (1)

Input: ababababa

Dictionary
0 a
1 b
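The encoding loop can be sketched in Python as follows. This is a minimal sketch rather than the lecture's own code: the name lzw_encode and the alphabet parameter (the symbols that seed the dictionary, in index order) are assumptions made for illustration, and every input symbol is assumed to be in the alphabet.

```python
def lzw_encode(text, alphabet):
    """Return (output indexes, final dictionary) for an LZW encoding of text.

    alphabet is a hypothetical parameter: the initial single-symbol entries,
    numbered 0, 1, 2, ... in the order given.
    """
    dictionary = {sym: i for i, sym in enumerate(alphabet)}
    output = []
    w = ""  # longest match found so far
    for sym in text:
        if w + sym in dictionary:
            w += sym  # the match can be extended
        else:
            output.append(dictionary[w])            # output the index of w
            dictionary[w + sym] = len(dictionary)   # put wa in the dictionary
            w = sym                                 # restart from the unmatched symbol
    if w:
        output.append(dictionary[w])  # flush the final match
    return output, dictionary


codes, dictionary = lzw_encode("ababababa", "ab")
print(codes)
```

With the input ababababa and the base dictionary a, b, the sketch builds the entries ab, ba, aba, abab in that order.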

LZW Encoding Example (2)

Input: ababababa
Output: 0

Dictionary
0 a
1 b
2 ab

LZW Encoding Example (3)

Input: ababababa
Output: 0 1

Dictionary
0 a
1 b
2 ab
3 ba

LZW Encoding Example (4)

Input: ababababa
Output: 0 1 2

Dictionary
0 a
1 b
2 ab
3 ba
4 aba

LZW Encoding Example (5)

Input: ababababa
Output: 0 1 2 4

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab

LZW Encoding Example (6)

Input: ababababa
Output: 0 1 2 4 3

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab

LZW Decoding Algorithm

• Emulate the encoder in building the dictionary. Decoder is slightly behind the encoder.

initialize dictionary;
decode first index to w;
put w? in dictionary;
repeat
    decode the first symbol s of the index;
    complete the previous dictionary entry with s;
    finish decoding the remainder of the index;
    put w? in the dictionary where w was just decoded;
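A minimal Python sketch of the decoder follows. It is an equivalent formulation rather than a literal transcription of the pseudocode: instead of storing an incomplete entry w? and completing it later, it adds each entry once its last symbol is known, and it treats an index equal to the current dictionary size as the one case where the decoder is behind the encoder. The names lzw_decode and alphabet are assumptions made for illustration.

```python
def lzw_decode(codes, alphabet):
    """Rebuild the text from LZW indexes, reconstructing the dictionary on the fly."""
    dictionary = list(alphabet)  # the decoder's dictionary is an array of strings
    if not codes:
        return ""
    w = dictionary[codes[0]]
    pieces = [w]
    for code in codes[1:]:
        if code < len(dictionary):
            entry = dictionary[code]
        elif code == len(dictionary):
            # The index refers to the entry still waiting for its last symbol
            # (w?). Its first symbol is the first symbol of w, so it is w + w[0].
            entry = w + w[0]
        else:
            raise ValueError(f"invalid index {code}")
        dictionary.append(w + entry[0])  # complete the previous entry
        pieces.append(entry)
        w = entry
    return "".join(pieces)


print(lzw_decode([0, 1, 2, 4, 3, 6], "ab"))
```

The index 6 in this input is the interesting case: when it arrives, the decoder holds only entries 0 to 5, so it has to infer entry 6 from the string it decoded last.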

LZW Decoding Example (1)

Code: 0 1 2 4 3 6
Decoded: a

Dictionary
0 a
1 b
2 a?

LZW Decoding Example (2a)

Code: 0 1 2 4 3 6
Decoded: a b

Dictionary
0 a
1 b
2 ab

LZW Decoding Example (2b)

Code: 0 1 2 4 3 6
Decoded: a b

Dictionary
0 a
1 b
2 ab
3 b?

LZW Decoding Example (3a)

Code: 0 1 2 4 3 6
Decoded: a b a

Dictionary
0 a
1 b
2 ab
3 ba

LZW Decoding Example (3b)

Code: 0 1 2 4 3 6
Decoded: a b ab

Dictionary
0 a
1 b
2 ab
3 ba
4 ab?

LZW Decoding Example (4a)

Code: 0 1 2 4 3 6
Decoded: a b ab a

Dictionary
0 a
1 b
2 ab
3 ba
4 aba

LZW Decoding Example (4b)

Code: 0 1 2 4 3 6
Decoded: a b ab aba

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 aba?

LZW Decoding Example (5a)

Code: 0 1 2 4 3 6
Decoded: a b ab aba b

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab

LZW Decoding Example (5b)

Code: 0 1 2 4 3 6
Decoded: a b ab aba ba

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab
6 ba?

LZW Decoding Example (6a)

Code: 0 1 2 4 3 6
Decoded: a b ab aba ba b

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab
6 bab

LZW Decoding Example (6b)

Code: 0 1 2 4 3 6
Decoded: a b ab aba ba bab

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab
6 bab
7 bab?

Decoding Exercise

Code: 0 1 4 0 2 0 3 5 7

Base Dictionary
0 a
1 b
2 c
3 d
4 r

Trie Data Structure for Encoder's Dictionary

• Fredkin (1960)

Dictionary
0 a
1 b
2 c
3 d
4 r
5 ab
6 br
7 ra
8 ac
9 ca
10 ad
11 da
12 abr
13 raa
14 abra

Trie (each node shows its symbol and dictionary index; children are indented)
a 0
    b 5
        r 12
            a 14
    c 8
    d 10
b 1
    r 6
c 2
    a 9
d 3
    a 11
r 4
    a 7
        a 13

Encoder Uses a Trie (1)

Input: abracadabraabracadabra
Output: 0 1 4 0 2 0 3 5 7 12

Trie
a 0
    b 5
        r 12
            a 14
    c 8
    d 10
b 1
    r 6
c 2
    a 9
d 3
    a 11
r 4
    a 7
        a 13
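One way to hold the encoder's dictionary in a trie is sketched below in Python. Each node stores its dictionary index and a map from symbol to child, so finding the longest match is a walk down from the root. The class TrieNode and the function lzw_encode_trie are hypothetical names, and the sketch assumes every input symbol appears in the base alphabet.

```python
class TrieNode:
    """One dictionary entry: its index plus children keyed by the next symbol."""

    def __init__(self, index):
        self.index = index
        self.children = {}


def lzw_encode_trie(text, alphabet):
    root = TrieNode(None)  # the root stands for the empty string
    for i, sym in enumerate(alphabet):
        root.children[sym] = TrieNode(i)
    next_index = len(alphabet)

    output = []
    node = root
    for sym in text:
        child = node.children.get(sym)
        if child is not None:
            node = child  # extend the current match by one symbol
        else:
            output.append(node.index)                    # longest match ends here
            node.children[sym] = TrieNode(next_index)    # new entry: match + sym
            next_index += 1
            node = root.children[sym]                    # restart at the unmatched symbol
    if node is not root:
        output.append(node.index)
    return output


print(lzw_encode_trie("abracadabraabracadabra", "abcdr"))
```

Each input symbol costs one child lookup, so the encoder never rescans the string it has already matched.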

Encoder Uses a Trie (2)

Input: abracadabraabracadabra
Output: 0 1 4 0 2 0 3 5 7 12 8

Trie
a 0
    b 5
        r 12
            a 14
    c 8
        a 15
    d 10
b 1
    r 6
c 2
    a 9
d 3
    a 11
r 4
    a 7
        a 13

Decoder's Data Structure

• Simply an array of strings

Code: 0 1 4 0 2 0 3 5 7 12 8 ...
Decoded: a b r a c a d ab ra abr

Dictionary
0 a
1 b
2 c
3 d
4 r
5 ab
6 br
7 ra
8 ac
9 ca
10 ad
11 da
12 abr
13 raa
14 abr?

Bounded Size Dictionary

• Bounded Size Dictionary
  – n bits of index allows a dictionary of size 2^n
  – Doubtful that long entries in the dictionary will be useful.
• Strategies when the dictionary reaches its limit.
  1. Don't add more, just use what is there.
  2. Throw it away and start a new dictionary.
  3. Double the dictionary, adding one more bit to indices.
  4. Throw out the least recently visited entry to make room for the new entry.

Implementing the LRV Strategy

Data structures on the trie nodes:
• Doubly linked queue (from least recent to most recent)
• Circular sibling lists
• Parent pointers

Input: abracadabraabracadabra
Output: 0 1 4 0 2 0 3 5 7 12

Trie
a 0
    b 5
        r 12
            a 14
    c 8
    d 10
b 1
    r 6
c 2
    a 9
d 3
    a 11
r 4
    a 7
        a 13
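The first two strategies are simple enough to sketch in Python. This is only an illustration under stated assumptions: the function name lzw_encode_bounded and the parameters index_bits and on_full are hypothetical, the base alphabet is assumed to fit within 2^index_bits entries, and a matching decoder would have to apply the same policy at the same moments to stay in step.

```python
def lzw_encode_bounded(text, alphabet, index_bits, on_full="freeze"):
    """LZW encoding with at most 2**index_bits dictionary entries.

    on_full="freeze": strategy 1, stop adding entries once the limit is hit.
    on_full="reset":  strategy 2, discard everything but the base alphabet.
    """
    max_size = 2 ** index_bits
    base = {sym: i for i, sym in enumerate(alphabet)}
    dictionary = dict(base)
    output = []
    w = ""
    for sym in text:
        if w + sym in dictionary:
            w += sym
            continue
        output.append(dictionary[w])
        if len(dictionary) < max_size:
            dictionary[w + sym] = len(dictionary)  # still room for a new entry
        elif on_full == "reset":
            dictionary = dict(base)                # start a new dictionary
        # on_full == "freeze": keep using the dictionary as it is
        w = sym
    if w:
        output.append(dictionary[w])
    return output


print(lzw_encode_bounded("ababababa", "ab", index_bits=2, on_full="freeze"))
print(lzw_encode_bounded("ababababa", "ab", index_bits=2, on_full="reset"))
```

With 2-bit indexes the dictionary fills after the entries ab and ba are added, so the rest of the input is coded with at most those four entries (freeze) or with a dictionary that is rebuilt from scratch (reset).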

Implementing the LRV Strategy

Data structures on the trie nodes:
• Doubly linked queue (from least recent to most recent)
• Circular sibling lists
• Parent pointers

Input: abracadabraabracadabra
Output: 0 1 4 0 2 0 3 5 7 12 8

Trie (index 6, previously br, now labels the new entry aca)
a 0
    b 5
        r 12
            a 14
    c 8
        a 6
    d 10
b 1
c 2
    a 9
d 3
    a 11
r 4
    a 7
        a 13

Notes on LZW

• Extremely effective when there are repeated patterns in the data that are widely spread.
• Negative: Creates entries in the dictionary that may never be used.
• Applications:
  – Unix compress, GIF, V.42 bis modem standard

LZ77

• Ziv and Lempel, 1977
• Dictionary is implicit
• Use the string coded so far as a dictionary.
• Given that x_1 x_2 ... x_n has been coded we want to code x_{n+1} x_{n+2} ... x_{n+k} for the largest k possible.

Solution A

• If x_{n+1} x_{n+2} ... x_{n+k} is a substring of x_1 x_2 ... x_n then x_{n+1} x_{n+2} ... x_{n+k} can be coded by <j,k> where j is the beginning of the match.
• Example
    ababababa babababababababab....     (coded: ababababa)
    ababababa babababa babababab....    <2,8>

Solution A Problem

• What if there is no match at all in the dictionary?
    ababababa cabababababababab....     (coded: ababababa)
• Solution B. Send tuples <j,k,x> where
  – If k = 0 then x is the unmatched symbol
  – If k > 0 then the match starts at j and is k long and the unmatched symbol is x.

Solution B

• If x_{n+1} x_{n+2} ... x_{n+k} is a substring of x_1 x_2 ... x_n and x_{n+1} x_{n+2} ... x_{n+k} x_{n+k+1} is not then x_{n+1} x_{n+2} ... x_{n+k} x_{n+k+1} can be coded by <j,k,x_{n+k+1}> where j is the beginning of the match.
• Examples
    ababababa cabababababababab....
    ababababa c ababababab ababab....   <0,0,c> <1,9,b>

Solution B Example

a bababababababababababab.....              <0,0,a>
a b ababababababababababab.....             <0,0,b>
a b aba bababababababababab.....            <1,2,a>
a b aba babab ababababababab.....           <2,4,b>
a b aba babab abababababa bab.....          <1,10,a>

Surprise Code!

a bababababababababababab$                  <0,0,a>
a b ababababababababababab$                 <0,0,b>
a b ababababababababababab$                 <1,22,$>

Surprise Decoding

<0,0,a><0,0,b><1,22,$>

<0,0,a>     a
<0,0,b>     b
<1,22,$>    a
<2,21,$>    b
<3,20,$>    a
<4,19,$>    b
...
<22,1,$>    b
<23,0,$>    $
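The reason a tuple such as <1,22,$> can be decoded, even though only two symbols exist when it arrives, is that the copy proceeds one symbol at a time and each copied symbol becomes available as a source for the next. A minimal Python sketch, with lz77_decode as a hypothetical name and j taken as a 1-based position as in the tuples above:

```python
def lz77_decode(tuples):
    """Decode a list of (j, k, x) tuples; j is the 1-based start of the match."""
    out = []
    for j, k, x in tuples:
        start = j - 1  # convert to a 0-based index into the decoded text
        for i in range(k):
            # Copy symbol by symbol: out[start + i] may have been written
            # earlier in this same loop, which is what lets a match overlap
            # the text it is producing.
            out.append(out[start + i])
        out.append(x)  # the unmatched symbol
    return "".join(out)


print(lz77_decode([(0, 0, "a"), (0, 0, "b"), (1, 22, "$")]))
```

Copying the whole match as a single slice would fail here, because the source range runs past the end of the text decoded so far.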

Solution C

• The matching string can include part of itself!
• If x_{n+1} x_{n+2} ... x_{n+k} is a substring of x_1 x_2 ... x_n x_{n+1} x_{n+2} ... x_{n+k} that begins at j < n and x_{n+1} x_{n+2} ... x_{n+k} x_{n+k+1} is not then x_{n+1} x_{n+2} ... x_{n+k} x_{n+k+1} can be coded by <j,k,x_{n+k+1}>

In Class Exercise

• Use Solution C to code the string
  – abaabaaabaaaab$
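A brute-force Python sketch of the tuple encoder may help in comparing Solution B with Solution C. It is an illustration only: lz77_encode and its allow_overlap flag are hypothetical names, ties between equally long matches go to the earliest position, the last symbol of the input is always kept back as the unmatched symbol, and a match is allowed to start at any already coded position, which is an assumption of this sketch. The search is quadratic and has no bounded buffer.

```python
def lz77_encode(text, allow_overlap=True):
    """Encode text as (j, k, x) tuples with 1-based match positions.

    allow_overlap=False: the match must lie inside the coded text (Solution B).
    allow_overlap=True:  the match may run into the text being coded (Solution C).
    """
    tuples = []
    n = 0  # number of symbols coded so far
    while n < len(text):
        best_j, best_k = 0, 0
        max_k = len(text) - n - 1  # keep one symbol back for x
        for start in range(n):
            limit = max_k if allow_overlap else min(max_k, n - start)
            k = 0
            while k < limit and text[start + k] == text[n + k]:
                k += 1
            if k > best_k:
                best_j, best_k = start + 1, k
        tuples.append((best_j, best_k, text[n + best_k]))
        n += best_k + 1
    return tuples


print(lz77_encode("ab" * 12 + "$", allow_overlap=True))
print(lz77_encode("ab" * 10 + "a", allow_overlap=False))
```

The only difference between the two modes is the limit on the match length: without overlap a match starting at position j can use at most the coded symbols from j onward.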

Bounded Buffer – Sliding Window

• We want the triples <j,k,x> to be of bounded size. To achieve this we use bounded buffers.
  – Search buffer of size s is the symbols x_{n-s+1} ... x_n; j is then the offset into the buffer.
  – Look-ahead buffer of size t is the symbols x_{n+1} ... x_{n+t}
• Match pointer can start in search buffer and go into the look-ahead buffer but no farther.

[Figure: sliding window over aaaabababaaab$, made up of the search buffer (coded) followed by the look-ahead buffer (uncoded), with the match pointer and the uncoded text pointer marked; tuple <2,5,a>]

Search in the Sliding Window

aaaabababaaab$

offset  length
1       0
2       1
2       2
2       3
2       4
2       5

tuple <2,5,a>
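The search can be sketched in Python as a brute-force scan over the offsets in the search buffer. The function name lz77_encode_window is hypothetical, and the sketch assumes that an offset counts back from the uncoded text pointer and that the source of a match may extend to the end of the look-ahead buffer, so a match at offset o is at most o + t symbols long.

```python
def lz77_encode_window(text, s, t):
    """Encode text as (offset, length, symbol) tuples using a sliding window.

    s is the search buffer size, t the look-ahead buffer size.
    """
    tuples = []
    n = 0  # 0-based index of the first uncoded symbol
    while n < len(text):
        best_offset, best_length = 0, 0
        for offset in range(1, min(s, n) + 1):
            start = n - offset
            # Leave one symbol for x, and keep the match pointer inside the
            # window: it may go into the look-ahead buffer but no farther.
            limit = min(len(text) - n - 1, offset + t)
            length = 0
            while length < limit and text[start + length] == text[n + length]:
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        tuples.append((best_offset, best_length, text[n + best_length]))
        n += best_length + 1
    return tuples


print(lz77_encode_window("aaaabababaaab$", s=4, t=4))
```

Under these assumptions the longest possible match has length s + t, which bounds the size of the length field in each tuple.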

Coding Example

s = 4, t = 4, a = 3

                    tuple
aaaabababaaab$      <0,0,a>
aaaabababaaab$      <1,3,b>
aaaabababaaab$      <2,5,a>
aaaabababaaab$      <4,2,$>

Coding the Tuples

• Simple fixed length code
    ⌈log_2(s + 1)⌉ + ⌈log_2(s + t + 1)⌉ + ⌈log_2 a⌉ bits per tuple, where a is the alphabet size

    s = 4, t = 4, a = 3
    tuple       fixed code
    <2,5,a>     010 0101 00
• Variable length code using adaptive Huffman or arithmetic code on tuples
  – Two passes, first to create the tuples, second to code the tuples
  – One pass, by pipelining tuples into a variable length coder
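The fixed length code can be reproduced with a few lines of Python. This is a sketch under assumptions: fixed_code and ceil_log2 are hypothetical names, each field is rounded up to a whole number of bits, and the symbols are numbered in the order a, b, $ (the slides only show that a maps to 00).

```python
def ceil_log2(m):
    """Number of bits needed to distinguish m values (m >= 1)."""
    return (m - 1).bit_length()


def fixed_code(tup, s, t, alphabet):
    """Fixed length code for one (offset, length, symbol) tuple.

    alphabet is assumed to hold at least two symbols; a symbol is coded
    by its position in it.
    """
    offset, length, symbol = tup
    fields = [
        (offset, ceil_log2(s + 1)),          # offsets 0..s
        (length, ceil_log2(s + t + 1)),      # lengths 0..s+t
        (alphabet.index(symbol), ceil_log2(len(alphabet))),
    ]
    return " ".join(format(value, f"0{bits}b") for value, bits in fields)


print(fixed_code((2, 5, "a"), s=4, t=4, alphabet="ab$"))
```

For s = 4, t = 4 and a three-symbol alphabet the field widths come out as 3, 4 and 2 bits, so every tuple costs 9 bits.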

Zip and Gzip

• Search Window
  – Search buffer 32KB
  – Look-ahead buffer 258 Bytes
• How to store such a large dictionary
  – Hash table that stores the starting positions for all three byte sequences.
  – Hash table uses chaining with newest entries at the beginning of the chain. Stale entries can be ignored.
• Second pass for Huffman coding of tuples.
• Coding done in blocks to avoid disk accesses.

Example

aaaabababaaabaaaababababaaabba$
Current position: 12

Hash chains (newest first)
aaa: 10 2 1
aab: 11 3
aba: 8 6 4
baa: 9
bab: 7 5

Lookup of aba:
Offset = 12 – 8 = 4
Length = 5
Tuple = <4,5,a>
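The hash table of three-symbol sequences can be sketched in Python with a dictionary of lists standing in for the chained table. The names build_chains and find_match are hypothetical, positions are 1-based as in the example, and the search buffer size of 8 used at the end is an assumption chosen for illustration (the real search buffer is 32KB).

```python
from collections import defaultdict

TEXT = "aaaabababaaabaaaababababaaabba$"


def build_chains(text, last_pos):
    """Map each 3-symbol sequence starting at positions 1..last_pos to its
    start positions, newest entry at the front of the chain."""
    chains = defaultdict(list)
    for pos in range(1, last_pos + 1):
        chains[text[pos - 1:pos + 2]].insert(0, pos)
    return chains


def find_match(text, pos, chains, max_offset):
    """Return (offset, length) of the longest match for the text at pos."""
    best_offset, best_length = 0, 0
    for cand in chains.get(text[pos - 1:pos + 2], []):
        offset = pos - cand
        if offset > max_offset:
            break  # stale entry; the rest of the chain is older still
        length = 0
        # Stop one symbol early so an unmatched symbol is always left over.
        while pos + length < len(text) and text[cand - 1 + length] == text[pos - 1 + length]:
            length += 1
        if length > best_length:
            best_offset, best_length = offset, length
    return best_offset, best_length


chains = build_chains(TEXT, 11)
print(chains["aba"], find_match(TEXT, 12, chains, max_offset=8))
```

Because new positions go to the front of each chain, the search meets the nearest candidates first and can stop as soon as it reaches one that has slid out of the search buffer.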

Example

aaaabababaaabaaaababababaaabba$
Current position: 18

Hash chains (newest first)
aaa: 15 14 10 2 1
aab: 16 11 3
aba: 17 12 8 6 4
baa: 13 9
bab: 7 5

Lookup of bab:
No match
Tuple = <0,0,b>

Notes on LZ77

• Very popular, especially in the Unix world
• Many variants and implementations
  – Zip, Gzip, PNG, PKZip, Lharc, ARJ
• Tends to work better than LZW
  – LZW has dictionary entries that are never used
  – LZW has past strings that are not in the dictionary
  – LZ77 has an implicit dictionary. Common tuples are coded with few bits.
