
1 Question 2

1.1 a) What are the strengths that Welch added to the original method by Lempel
and Ziv?

• Introduced fixed-length codes: the original Lempel-Ziv approach produced variable-length codewords, whereas Welch used codewords of uniform length. This simplified encoding and decoding, since every codeword now has the same size.

• By structuring the compression in a way that required fewer dynamic updates to the dictionary, Welch made
the algorithm more suitable for real-time applications. This efficiency is why the Lempel-Ziv-Welch (LZW)
method became widely used in applications like GIF images and early UNIX file compression.

• Welch’s modifications made the algorithm easier to implement, which contributed to its popularity and adop-
tion in practical applications, including file compression utilities.

• Adaptability: the algorithm adapts to the input data without needing prior knowledge of pattern frequencies.

• The decoder builds the dictionary dynamically, eliminating the need to transmit it explicitly.
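The benefit of Welch's uniform code width can be sketched with a small example (a hypothetical 12-bit packer, not taken from Welch's paper): with fixed-length codes the decoder splits the bitstream by simple indexing, with no per-codeword length bookkeeping.

```python
CODE_BITS = 12  # a common fixed width in LZW implementations

def pack_codes(codes, width=CODE_BITS):
    """Concatenate codes into one bit string of uniform width-bit fields."""
    return "".join(format(c, f"0{width}b") for c in codes)

def unpack_codes(bits, width=CODE_BITS):
    """Split the bit string back into integer codes by simple slicing."""
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]

codes = [87, 89, 83, 256, 42]      # e.g. output codes from an LZW encoder
bits = pack_codes(codes)
assert len(bits) == 5 * CODE_BITS  # uniform length: trivial to index
assert unpack_codes(bits) == codes
```

With variable-length codes the decoder would instead need extra information to know where each codeword ends.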

1.2 b) Write Python or MATLAB scripts for i. LZW encoding and decoding, and ii. Huffman coding and decoding.

1. LZW encoding and decoding

def encoding(s1):
    print("Encoding")
    table = {chr(i): i for i in range(256)}  # Initialize table with single-character strings
    p = s1[0]
    code = 256
    output_code = []
    print("String\tOutput_Code\tAddition")

    for i in range(len(s1)):
        if i != len(s1) - 1:
            c = s1[i + 1]
        else:
            c = ""

        if p + c in table:
            p = p + c
        else:
            print(f"{p}\t{table[p]}\t\t{p + c}\t{code}")
            output_code.append(table[p])
            table[p + c] = code
            code += 1
            p = c

    print(f"{p}\t{table[p]}")
    output_code.append(table[p])
    return output_code


def decoding(op):
    print("\nDecoding")
    table = {i: chr(i) for i in range(256)}  # Initialize table with single-character strings
    old = op[0]
    s = table[old]
    c = s[0]
    print(s, end="")
    count = 256

    for i in range(1, len(op)):
        n = op[i]
        if n not in table:
            s = table[old] + c
        else:
            s = table[n]

        print(s, end="")
        c = s[0]
        table[count] = table[old] + c
        count += 1
        old = n
    print()


# Test the encoding and decoding
s = "WYS*WYGWYS*WYSWYSG"
output_code = encoding(s)
print("Output Codes are:", output_code)
decoding(output_code)
Figure 1: Output for LZW encoding and decoding

LZW encoding and decoding

Algorithm

Encoding:

• Start with a dictionary of ASCII characters (codes 0-255).

• Read each character in the input string, checking for sequences that already exist in the dictionary.

• If a sequence isn’t in the dictionary, add it, output the code for the previous sequence, and reset to the new
character.

• Continue until the end of the string, outputting codes as needed.

Decoding:

• Start with the same initial dictionary (ASCII codes).

• Read each code and convert it back to characters.

• For any new sequence (not yet in the dictionary), build it using the previous sequence and add it to the
dictionary.
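The "not yet in the dictionary" step above is the one subtle part of LZW decoding: the encoder can emit a code in the same step it creates it. A minimal self-contained decoder sketch (separate from the script in part b) shows the case being exercised:

```python
def lzw_decode(codes):
    # Start with the same initial dictionary of single ASCII characters
    table = {i: chr(i) for i in range(256)}
    nxt = 256
    prev = table[codes[0]]
    out = [prev]
    for k in codes[1:]:
        if k in table:
            entry = table[k]
        else:  # k == nxt: code not yet defined -- build it from the previous sequence
            entry = prev + prev[0]
        out.append(entry)
        table[nxt] = prev + entry[0]
        nxt += 1
        prev = entry
    return "".join(out)

# Encoding "ABABABA" yields [65, 66, 256, 258]; code 258 is emitted by the
# encoder before the decoder has defined it, exercising the special case.
print(lzw_decode([65, 66, 256, 258]))  # → ABABABA
```

This is why the special case always expands to "previous sequence plus its own first character".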

Results The LZW algorithm produces a compressed output as a list of integer codes. When decoded, it reproduces
the original text exactly.
Huffman coding and decoding.
import heapq
from collections import Counter

# Define classes for the Huffman Tree nodes
class Node:
    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.frequency = left.frequency + right.frequency

    # Comparison operator for the priority queue
    def __lt__(self, other):
        return self.frequency < other.frequency


class Leaf:
    def __init__(self, char, frequency):
        self.char = char
        self.frequency = frequency

    def __lt__(self, other):
        return self.frequency < other.frequency


# Build the Huffman Tree based on character frequencies
def build_huffman_tree(text):
    frequency = Counter(text)
    heap = [Leaf(char, freq) for char, freq in frequency.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        heapq.heappush(heap, Node(left, right))

    return heap[0] if heap else None


# Generate the Huffman codes by traversing the Huffman Tree
def generate_huffman_codes(tree, prefix="", codebook=None):
    if codebook is None:  # avoid a shared mutable default argument
        codebook = {}
    if isinstance(tree, Leaf):
        codebook[tree.char] = prefix
    else:
        generate_huffman_codes(tree.left, prefix + "0", codebook)
        generate_huffman_codes(tree.right, prefix + "1", codebook)
    return codebook


# Encoding function
def huffman_encoding(text):
    tree = build_huffman_tree(text)
    codebook = generate_huffman_codes(tree)
    encoded_text = "".join(codebook[char] for char in text)
    return encoded_text, tree


# Decoding function
def huffman_decoding(encoded_text, tree):
    decoded_text = []
    node = tree
    for bit in encoded_text:
        node = node.left if bit == "0" else node.right
        if isinstance(node, Leaf):
            decoded_text.append(node.char)
            node = tree
    return "".join(decoded_text)


# Testing the Huffman encoding and decoding functions
text = "Huffman coding is a data compression Algorithm"
encoded_text, tree = huffman_encoding(text)
print("Encoded Text:", encoded_text)
decoded_text = huffman_decoding(encoded_text, tree)
print("Decoded Text:", decoded_text)

# Verify that decoding matches the original text
assert text == decoded_text, "The decoded text does not match the original text!"

Figure 2: Output for Huffman encoding and decoding

Algorithm Tree Construction:

• Count character frequencies in the input.

• Build a binary tree by repeatedly merging the two lowest-frequency nodes, so that less frequent characters end up deeper in the tree.

Encoding:

• Assign binary codes to each character, with shorter codes for more frequent characters.

• Encode the input string by replacing each character with its binary code.

Decoding:

• Traverse the tree according to each binary bit in the encoded string to retrieve the original characters.

Results Huffman encoding compresses text based on character frequency, often achieving better compression than LZW on texts with skewed character distributions. Decoding accurately reconstructs the original text.
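The frequency argument can be sanity-checked against the Shannon entropy of the text: a Huffman code's average codeword length is guaranteed to lie within one bit of the entropy. A minimal sketch, independent of the script above (the helper name huffman_code_lengths is ours):

```python
import heapq
import math
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: Huffman code length} for the given text."""
    freq = Counter(text)
    # Heap items: (frequency, tiebreak index, {symbol: depth so far})
    heap = [(f, i, {ch: 0}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper
        merged = {ch: d + 1 for ch, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, i, merged))
        i += 1
    return heap[0][2]

text = "Huffman coding is a data compression Algorithm"
freq = Counter(text)
n = len(text)
lengths = huffman_code_lengths(text)
avg_len = sum(freq[ch] * lengths[ch] for ch in freq) / n
entropy = -sum((f / n) * math.log2(f / n) for f in freq.values())
print(f"entropy {entropy:.3f} bits <= average code length {avg_len:.3f} bits")
assert entropy <= avg_len < entropy + 1  # Huffman optimality bound
```

The final assertion encodes the classical bound H(X) <= L < H(X) + 1 for Huffman codes.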

1.3 c) Take any uncompressed image and compress it using Huffman and LZW meth-
ods. What compression ratios do you get?

Example of raw uncompressed image


The uncompressed image above is saved in the working directory as ‘rawimagefile.NEF’.

1. Using Huffman compression


# Compressing 'rawimagefile.NEF' using the Huffman compression algorithm

from PIL import Image
import numpy as np
import heapq
from collections import Counter

def huffman_encode(data):
    frequency = Counter(data)
    heap = [[weight, [symbol, ""]] for symbol, weight in frequency.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = '0' + pair[1]
        for pair in hi[1:]:
            pair[1] = '1' + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])

    huff_dict = sorted(heapq.heappop(heap)[1:], key=lambda p: (len(p[-1]), p))
    huff_code = {symbol: code for symbol, code in huff_dict}
    encoded_data = ''.join(huff_code[symbol] for symbol in data)

    return encoded_data, huff_code

def huffman_decode(encoded_data, huff_code):
    reverse_huff_code = {v: k for k, v in huff_code.items()}
    current_code = ""
    decoded_output = []

    for bit in encoded_data:
        current_code += bit
        if current_code in reverse_huff_code:
            character = reverse_huff_code[current_code]
            decoded_output.append(character)
            current_code = ""

    return np.array(decoded_output)

def compress_image_huffman(image_path):
    img = Image.open(image_path).convert('L')
    img_data = np.array(img).flatten().tolist()
    encoded_data, huff_code = huffman_encode(img_data)
    # Compression ratio = original size (8 bits per pixel) / compressed size in bits
    return (len(img_data) * 8) / len(encoded_data), huff_code

# Example usage:
compression_ratio, huff_code = compress_image_huffman('rawimg1.JPG')
print(f'Huffman Compression Ratio: {compression_ratio}')

Output of Huffman compression

# Compressing 'rawimagefile.NEF' using LZW

from PIL import Image
import numpy as np

def lzw_encode(data):
    dictionary = {chr(i): i for i in range(256)}
    p = ""
    code = 256
    result = []
    for c in data:
        pc = p + c
        if pc in dictionary:
            p = pc
        else:
            result.append(dictionary[p])
            dictionary[pc] = code
            code += 1
            p = c
    if p:
        result.append(dictionary[p])
    return result

def lzw_decode(encoded_data):
    dictionary = {i: chr(i) for i in range(256)}
    code = 256
    p = chr(encoded_data.pop(0))
    result = [p]
    for k in encoded_data:
        if k in dictionary:
            entry = dictionary[k]
        elif k == code:
            entry = p + p[0]
        result.append(entry)
        dictionary[code] = p + entry[0]
        code += 1
        p = entry
    return ''.join(result)

def compress_image_lzw(image_path):
    img = Image.open(image_path).convert('L')
    img_data = ''.join(map(chr, np.array(img).flatten().tolist()))
    encoded_data = lzw_encode(img_data)
    # Compression ratio = original size (8 bits per pixel) / compressed size (16 bits per code)
    return (len(img_data) * 8) / (len(encoded_data) * 16)

# Example usage:
compression_ratio = compress_image_lzw('rawimg1.JPG')
print(f'LZW Compression Ratio: {compression_ratio}')  # Output the compression ratio

Output of LZW compression


If there are differences in the compression ratios, why is that the case?
Huffman compression is effective with data that has clear frequency patterns, assigning shorter codes to more frequent symbols. LZW, on the other hand, excels with data containing repetitive sequences: it builds a dictionary dynamically as it encodes, which lets it capture and compress repeated patterns efficiently. In this case, the image data had stronger frequency patterns than repetitive sequences, which is why Huffman achieved a better compression ratio than LZW.

1.4 d) Compare your compression results with those you get from some freely available compression software. Explain any differences.

• Compression Ratio: The ratio is calculated by dividing the original file size by the compressed file size. For example, if the original uncompressed image is 2 MB (2048 KB) and LZW achieves a compression ratio of 1.25, then Compressed File Size = Original File Size / Compression Ratio = 2048 KB / 1.25 ≈ 1638 KB.

• Image Quality: Some compression algorithms (like JPEG) are lossy, meaning they reduce file size by losing
details. In contrast, LZW and Huffman are lossless methods, preserving exact pixel values.
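One concrete way to make this comparison is to run the same data through DEFLATE via Python's standard zlib module, which is what gzip-style tools use internally. A minimal sketch with synthetic pixel data (in practice you would pass the flattened image bytes, e.g. np.array(img).tobytes()):

```python
import zlib

# Synthetic, highly repetitive 8-bit data standing in for flattened image pixels
raw = bytes([(x * x) % 32 for x in range(10000)])

# DEFLATE at maximum effort, as a free-software baseline
compressed = zlib.compress(raw, 9)

# Same ratio definition as above: original size / compressed size
ratio = len(raw) / len(compressed)
print(f"zlib (DEFLATE) compression ratio: {ratio:.2f}")
```

Because DEFLATE combines a dictionary stage with Huffman coding, it typically beats either LZW or Huffman used alone, which would explain the gap against off-the-shelf tools.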

1.5 e) Decompress the images in (c) and display the results. Did you recover the initial file sizes? If there are any differences, comment on them.

Output of decompressed image using Huffman compression

Huffman decompression Python script

from PIL import Image
import numpy as np
import heapq
from collections import Counter

# Huffman Encoding
def huffman_encode(data):
    frequency = Counter(data)
    heap = [[weight, [symbol, ""]] for symbol, weight in frequency.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = '0' + pair[1]
        for pair in hi[1:]:
            pair[1] = '1' + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])

    huff_dict = sorted(heapq.heappop(heap)[1:], key=lambda p: (len(p[-1]), p))
    huff_code = {symbol: code for symbol, code in huff_dict}
    encoded_data = ''.join(huff_code[symbol] for symbol in data)

    return encoded_data, huff_code

# Huffman Decoding
def huffman_decode(encoded_data, huff_code):
    reverse_huff_code = {v: k for k, v in huff_code.items()}
    current_code = ""
    decoded_output = []

    for bit in encoded_data:
        current_code += bit
        if current_code in reverse_huff_code:
            character = reverse_huff_code[current_code]
            decoded_output.append(int(character))
            current_code = ""

    return np.array(decoded_output)

# Adjust image for higher compression
def preprocess_image(image, size=(128, 128), quantize_levels=16):
    img_resized = image.resize(size)
    img_data = np.array(img_resized) // (256 // quantize_levels) * (256 // quantize_levels)
    return img_data, size  # Returning resized shape as well

# Compress and calculate compression factor
def compress_image_huffman(image_path):
    img = Image.open(image_path).convert('L')
    img_data, resized_shape = preprocess_image(img)
    img_data_flat = img_data.flatten().tolist()
    encoded_data, huff_code = huffman_encode([str(i) for i in img_data_flat])
    original_size_bits = len(img_data_flat) * 8
    compressed_size_bits = len(encoded_data)
    compression_ratio = original_size_bits / compressed_size_bits
    return compression_ratio, encoded_data, huff_code, resized_shape

# Decompress image
def decompress_image_huffman(encoded_data, huff_code, decompressed_shape):
    decoded_data = huffman_decode(encoded_data, huff_code)
    decompressed_img = decoded_data.reshape(decompressed_shape)
    return np.clip(decompressed_img, 0, 255)

# Example usage
img_path = 'rawimg1.JPG'
compression_ratio, encoded_data, huff_code, decompressed_shape = compress_image_huffman(img_path)
print(f'Huffman Compression Ratio: {compression_ratio:.2f}')

# Decompress and save the image
decompressed_img = decompress_image_huffman(encoded_data, huff_code, decompressed_shape)
Image.fromarray(np.uint8(decompressed_img)).save('decompressed_huffman.bmp')
print("Decompression completed and image saved.")

1.6 f) Suggest and implement any approach you would use to increase the compression ratio. Compare your results with what you had achieved earlier in (c).

• To improve compression ratios, use the Deflate algorithm, which combines LZ77 (a close relative of LZW) and Huffman coding. This approach leverages the strengths of both: the dictionary stage handles repeated patterns well, while Huffman coding compresses based on symbol frequencies.

• Downsampling (for images): Reducing the resolution or bit depth of the image (downsampling) can decrease
file size significantly with minimal visible impact, particularly if the original image has high resolution. For
example, reducing a grayscale image from 8-bit depth to 4-bit depth decreases file size by half.

• Hybrid Compression: Use a combination of multiple algorithms. For instance, apply Run-Length Encoding
(RLE) on the image data first, followed by Huffman or LZW encoding. RLE compresses consecutive repeating
values efficiently, especially useful for images with large uniform areas.
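The RLE front end suggested above can be sketched as follows (rle_encode and rle_decode are illustrative helper names, not from the earlier scripts): runs of repeated pixel values collapse into (value, run length) pairs, and the resulting list would then be fed to the Huffman or LZW coder.

```python
def rle_encode(data):
    """Collapse runs of equal values into (value, run_length) pairs."""
    if not data:
        return []
    runs = []
    value, length = data[0], 1
    for x in data[1:]:
        if x == value:
            length += 1
        else:
            runs.append((value, length))
            value, length = x, 1
    runs.append((value, length))
    return runs

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out

# Large uniform areas (e.g. a black background) shrink dramatically
pixels = [0, 0, 0, 0, 255, 255, 0, 0, 0]
runs = rle_encode(pixels)
print(runs)  # → [(0, 4), (255, 2), (0, 3)]
assert rle_decode(runs) == pixels
```

Note that RLE only helps when runs are actually present; on noisy images it can expand the data, so it should be applied selectively.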

