ADA UNIT 3 Complete Notes

Chapter 6

Space and Time Trade-offs


Definition:
In computer science, a space-time trade-off is a way of solving a problem in less time by using
more storage space, or in very little space by spending more time.
Example: Hash table data structure is used for fast lookup. However, the size of the hash table
is large.
Precomputing or Input Enhancement:
Precomputing or Input enhancement is a strategy in algorithm design where certain
calculations are performed in advance, before the main execution of the algorithm. It requires
additional storage space to hold the precomputed values.
Example:
1. Counting methods for sorting
2. Boyer-Moore Algorithm for string matching
3. Horspool’s algorithm
Prestructuring:
Prestructuring is a method used in algorithm design that organizes data in a specific way in
advance to facilitate faster or more flexible access to the data during the algorithm’s execution.
This makes faster access but it occupies more space.
Example:
1. Hash table
2. Binary Search tree
SORTING BY COUNTING
Sorting by counting is an algorithm that uses precomputing or input enhancement to sort a list of
numbers. The basic idea is to count for each element of the list, the total number of elements
smaller than that element and record the results in the table. These numbers will indicate the
positions of the elements in the sorted list.
Algorithm:

CountingSort(A[0......n-1])
{
for i=0 to n-1
count[i]=0
for i=0 to n-2
{
for j=i+1 to n-1
{
if A[i] ≤ A[j] //on a tie, credit the later index so the sort stays stable
count[j]=count[j]+1
else
count[i]=count[i]+1
}
}
for i=0 to n-1
R[count[i]]=A[i]
}
Example:
Consider an array A=[4,3,3,2,1]
Step 1: initialize count[ ] with the same size as A[ ] and set all its elements to 0.
Step 2: compare the elements A[i] and A[j] where i < j
When i=0
• For j=1 (A[0]=4, A[1]=3) since A[0]>A[1] ⸫ increment count[0]. Now count=[1,0,0,0,0]
• For j=2 (A[0]=4, A[2]=3) since A[0]>A[2] ⸫ increment count[0]. Now count=[2,0,0,0,0]
• For j=3 (A[0]=4, A[3]=2) since A[0]>A[3] ⸫ increment count[0]. Now count=[3,0,0,0,0]
• For j=4 (A[0]=4, A[4]=1) since A[0]>A[4] ⸫ increment count[0]. Now count=[4,0,0,0,0]
When i=1
• For j=2 (A[1]=3, A[2]=3) since A[1] ≤ A[2] ⸫ increment count[2]. Now count=[4,0,1,0,0]
• For j=3 (A[1]=3, A[3]=2) since A[1]>A[3] ⸫ increment count[1]. Now count=[4,1,1,0,0]
• For j=4 (A[1]=3, A[4]=1) since A[1]>A[4] ⸫ increment count[1]. Now count=[4,2,1,0,0]
When i=2
• For j=3 (A[2]=3, A[3]=2) since A[2]>A[3] ⸫ increment count[2]. Now count=[4,2,2,0,0]
• For j=4 (A[2]=3, A[4]=1) since A[2]>A[4] ⸫ increment count[2]. Now count=[4,2,3,0,0]
When i=3
• For j=4 (A[3]=2, A[4]=1) since A[3]>A[4] ⸫ increment count[3]. Now count=[4,2,3,1,0]
Step 3: place each element in R array at the position specified by count[i]
• For i=0, count[0]=4 ⸫ R[4]=A[0]=4
• For i=1, count[1]=2 ⸫ R[2]=A[1]=3
• For i=2, count[2]=3 ⸫ R[3]=A[2]=3
• For i=3, count[3]=1 ⸫ R[1]=A[3]=2
• For i=4, count[4]=0 ⸫ R[0]=A[4]=1
Hence the sorted array R=[1,2,3,3,4]
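For reference, here is a minimal runnable Python sketch of the comparison-counting sort above
(the function name and comments are illustrative, not part of the original notes):

    def comparison_counting_sort(a):
        n = len(a)
        count = [0] * n
        for i in range(n - 1):
            for j in range(i + 1, n):
                if a[i] <= a[j]:       # ties credit the later index, keeping the sort stable
                    count[j] += 1
                else:
                    count[i] += 1
        result = [None] * n
        for i in range(n):
            result[count[i]] = a[i]    # count[i] is the final position of a[i]
        return result

    print(comparison_counting_sort([4, 3, 3, 2, 1]))  # [1, 2, 3, 3, 4]

Running it on A=[4,3,3,2,1] produces the same count table [4,2,3,1,0] as the trace above.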
Complexity:
The efficiency of the counting sort technique is

T(n) = Σ (i=0 to n-2) Σ (j=i+1 to n-1) 1
     = Σ (i=0 to n-2) [(n-1) - (i+1) + 1]
     = Σ (i=0 to n-2) (n-1-i)
     = (n-1) + (n-2) + ... + 1, i.e., the sum of the first (n-1) natural numbers
     = (n-1)(n-1+1)/2 = n(n-1)/2

∴ T(n) = O(n²)
Disadvantages:
• Limited scope
• Inefficient for large inputs (quadratic number of comparisons)
DISTRIBUTION COUNTING
Distribution counting is a variation of the counting sort algorithm that can be used to sort an
array of integers with a known range of values. The basic idea is to compute the frequency of
each value in the input array and use this information to determine the position of each element
in the sorted array.
Algorithm: DistributionCounting(A[0......n-1], l, u)
{
for j=0 to u-l do //initialize frequencies
D[j]=0
for i=0 to n-1 do // compute frequencies
D[A[i]-l]=D[A[i]-l]+1
for j=1 to u-l do //reuse for distribution
D[j]=D[j-1]+D[j]
for i=n-1 downto 0 do
{
j=A[i]-l
S[D[j]-1]=A[i]
D[j]=D[j]-1
}
}
Example:
Consider
A= 21 21 23 21 22 23 22 23
The frequency and distribution array are shown below.
Distinct array element 21 22 23
Frequency array 3 2 3
Distribution array 3 5 8
The rightmost element of A is 23; its corresponding distribution array value is 8, so 23 is
placed in S[8-1]=S[7].
S= 23
S[0] S[1] S[2] S[3] S[4] S[5] S[6] S[7]

Moving towards left, the next element is 22, its corresponding distribution array element is 5. So
22 is placed in S[5-1]=S[4]
S= 22 23
S[0] S[1] S[2] S[3] S[4] S[5] S[6] S[7]
Moving towards the left, the next element is 23; its corresponding distribution array value is 8,
but one 23 has already been placed. Therefore, this 23 is placed in S[8-2]=S[6].
S= 22 23 23
S[0] S[1] S[2] S[3] S[4] S[5] S[6] S[7]
Moving towards the left, the next element is 22; its corresponding distribution array value is 5,
but one 22 has already been placed. Therefore, this 22 is placed in S[5-2]=S[3].
S= 22 22 23 23
S[0] S[1] S[2] S[3] S[4] S[5] S[6] S[7]
Moving towards left, the next element is 21, its corresponding distribution array element is 3. So
21 is placed in S[3-1]=S[2]
S= 21 22 22 23 23
S[0] S[1] S[2] S[3] S[4] S[5] S[6] S[7]
The next element is 23; its corresponding distribution array value is 8, but two 23s have already
been placed. Therefore, this 23 is placed in S[8-3]=S[5].
S= 21 22 22 23 23 23
S[0] S[1] S[2] S[3] S[4] S[5] S[6] S[7]
The next element is 21; its corresponding distribution array value is 3, but one 21 has already
been placed. Therefore, this 21 is placed in S[3-2]=S[1].
S= 21 21 22 22 23 23 23
S[0] S[1] S[2] S[3] S[4] S[5] S[6] S[7]
The next element is 21; its corresponding distribution array value is 3, but two 21s have already
been placed. Therefore, this 21 is placed in S[3-3]=S[0].
S= 21 21 21 22 22 23 23 23
S[0] S[1] S[2] S[3] S[4] S[5] S[6] S[7]
Complexity: O(n + (u - l)), i.e., linear in the number of elements plus the size of the value range.
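A runnable Python sketch of distribution counting, assuming the input values lie in a known
range [l, u] (the decrement-before-placing form below is equivalent to the pseudocode above):

    def distribution_counting_sort(a, l, u):
        d = [0] * (u - l + 1)
        for v in a:                    # compute frequencies
            d[v - l] += 1
        for j in range(1, u - l + 1):  # turn frequencies into distribution values (prefix sums)
            d[j] += d[j - 1]
        s = [None] * len(a)
        for v in reversed(a):          # scan right to left, as in the worked example
            d[v - l] -= 1
            s[d[v - l]] = v
        return s

    print(distribution_counting_sort([21, 21, 23, 21, 22, 23, 22, 23], 21, 23))
    # [21, 21, 21, 22, 22, 23, 23, 23]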
INPUT ENHANCEMENT IN STRING MATCHING
String matching is finding an occurrence of a given string of ‘m’ characters called the pattern
in a string of ‘n’ characters called the text.
Horspool’s Algorithm:
The Horspool algorithm, suggested by R. N. Horspool, is a simplification of the Boyer-Moore
algorithm and is used for string matching.
Horspool’s algorithm pre-processes the pattern to create a table of shifts for every character.
A character that does not occur in the pattern (and the pattern’s last character, unless it also
occurs earlier) gets a shift equal to the pattern length m. Every other character’s shift is
computed from its rightmost occurrence among the first m-1 characters of the pattern using
Shift value of a character = Length of pattern - Index of the character - 1
1. Preprocessing step
Create a table with the size of the pattern. Calculate the shift value using the following
formula
Shift value of a character=Length of pattern-Index of the character-1
2. Search step
Start comparing the pattern with the text from the right end of the pattern. If all the characters
match, we have found an occurrence of the pattern. If a mismatch occurs, look at the text
character aligned with the last character of the pattern, shift the pattern by that character’s
entry in the shift table, and continue with the comparison.

Algorithm:
ShiftTable(P[0....m-1])
for j=0 to size-1 do //initialize all table entries to m
TB[j]=m
for j=0 to m-2 do //overwrite entries for pattern characters
TB[P[j]]=m-j-1

Algorithm:
HorspoolMatching(P[0...m-1], T[0...n-1])
i = m-1
while i ≤ n-1 do
k = 0
while k ≤ m-1 and P[m-1-k] = T[i-k] do
k = k+1
if k == m
return i-m+1
else
i = i + TB[T[i]]
return -1

Example:
Consider the Text (T) as “THATCOLOURISNOTBROWN” and the Pattern (P) as
“BROWN”
Character B R O W N *
Shift value 4 3 2 1 5 5

T= T H A T C O L O U R I S N O T B R O W N

P= B R O W N
Here, the last character of the pattern, N, is not equal to C in the text, and C does not occur in
the pattern, so the entire pattern is shifted right by its length (5).
T= T H A T C O L O U R I S N O T B R O W N

P= B R O W N
Here, the last character of pattern N is not equal to R in the text. But R is present in the pattern
and its shift value is 3 therefore the entire pattern is shifted by 3
T= T H A T C O L O U R I S N O T B R O W N

P= B R O W N
Here, the last character of the pattern, N, equals N in the text. Next, W is compared with S: a
mismatch. The shift is taken from the table entry of the text character aligned with the last
pattern position, N, whose shift value is 5, so the entire pattern is shifted by 5.
T= T H A T C O L O U R I S N O T B R O W N

P= B R O W N

Here, the last character of pattern N is not equal to O in the text. But O is present in the pattern
and its shift value is 2 therefore the entire pattern is shifted by 2
T= T H A T C O L O U R I S N O T B R O W N

P= B R O W N
All the characters of the pattern match the corresponding characters in the text, so an
occurrence is found at position 15.
Complexity

             Time Complexity     Space Complexity
Best Case    O(n/m)              O(m)
Worst Case   O(mn)               O(m)
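A minimal Python sketch of Horspool’s algorithm as described above; a dictionary plays the role
of the shift table, so characters absent from it default to a shift of m:

    def shift_table(pattern):
        m = len(pattern)
        table = {}                          # absent characters shift by m (handled via get)
        for j in range(m - 1):              # the last character is deliberately skipped
            table[pattern[j]] = m - 1 - j   # shift = length of pattern - index - 1
        return table

    def horspool(pattern, text):
        m, n = len(pattern), len(text)
        table = shift_table(pattern)
        i = m - 1
        while i <= n - 1:
            k = 0
            while k <= m - 1 and pattern[m - 1 - k] == text[i - k]:
                k += 1
            if k == m:
                return i - m + 1            # index of the first occurrence
            i += table.get(text[i], m)      # shift by the text char aligned with the pattern's end
        return -1

    print(horspool("BROWN", "THATCOLOURISNOTBROWN"))  # 15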
Hashing
Definition

Hashing is the process of mapping keys to their appropriate locations in a hash table. Hashing
in data structures is a two-step process:
1. The hash function converts the item into a small integer or hash value
2. This hash value is used to store the data in a hash table.

Hash Table
Hash table is a data structure which stores data in an associative manner.

Hash function
The function of converting the key into table or array index is called hash function. It is
denoted by H.
Or
A hash function is a mathematical formula which when applied to a key that produces an
integer which can be used as an index for the key in the hash table.

Characteristics of Good hash function


1. Easy to Compute
2. Uniform distribution
3. Less collisions
4. High load factor
Hash Collision
Whenever more than one key points to the same slot in the hash table, this
phenomenon is called a collision.
For example, let the keys be 16, 12, 23, 42, 51 and let the hash function be h(k) = k mod 10
h(16)=16 mod 10 = 6
h(12)= 12 mod 10 = 2
h(23)= 23 mod 10 = 3
h(42)= 42 mod 10 = 2
h(51)= 51 mod 10 = 1
Thus, both the keys 12 and 42 generate the same index, 2. Since a slot can store only one of
them, a collision occurs.

Types of hash Function


1. Division method
2. Mid square method
3. Folding method
4. Mixed method

Division Method
This method divides the key x by the table size M and uses the remainder as the index of the key in the hash table.
ℎ(𝑥) = 𝑥 𝑚𝑜𝑑 𝑀
Example
H(1675)= 1675 mod 97 =1675 % 97 = 26
H(2432)= 2432 mod 97 = 2432 % 97 = 07
H(5209)= 5209 mod 97 = 5209 % 97 = 68
Mid Square Method
In this method, we square the key first, then we take some digits from the middle of this number
as the generating address.

Example

K       1675        2432        5209
K²      2805625     5914624     27133681
H(K)    56          46          36

The third and fourth digits, counting from the right, are chosen as the generated hash address.
Folding Method
The key is broken into pieces, which are added together to get the hash address. Each piece
should have the same number of digits, except possibly the last piece.
𝐻(𝐾) = 𝐾1 + 𝐾2 + ⋯ + 𝐾𝑛
Example
H(1675) = 16 + 75 = 91
H(2432) = 24 + 32 = 56
H(5209) = 52 + 09 = 61
H(8677) = 86 + 77 = 163 → 63 (the leading carry digit is discarded)

Mixed Method
If we use more than one type of hash function for generating an address in the hash table, it
is called the mixed method.
Consider the following example with 8 digit key 27862123
i) Folding method: H (27862123) = 27 + 8621 + 23 = 8671
ii) Division method: H ( 8671 ) = 8671 % 97 = 38
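The three basic methods above translate directly into code. Below is an illustrative Python
sketch; the table size 97 and the digit positions follow the examples given, and the mid-square
version assumes the square has at least four digits:

    def division_hash(key, m=97):
        return key % m                     # remainder is the table index

    def mid_square_hash(key):
        square = str(key * key)
        return int(square[-4:-2])          # 3rd and 4th digits from the right

    def folding_hash(key, piece=2):
        digits = str(key)
        pieces = [int(digits[i:i + piece]) for i in range(0, len(digits), piece)]
        return sum(pieces) % 100           # keep two digits, discarding any carry

    print(division_hash(1675))    # 26
    print(mid_square_hash(1675))  # 56
    print(folding_hash(8677))     # 63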

Collision Resolution
A method used to solve the problem of collision is called collision resolution technique. The
techniques are:

1. Open Addressing
2. Chaining
1. Open Addressing
The process of examining memory locations in the hash table is called probing. Open
addressing computes new position using a probe sequence and the next record is stored in that
position. Open addressing can be implemented using
i. Linear probing
ii. Quadratic probing
iii. Double probing
iv. Rehashing
i. Linear probing
This technique finds the hash key through the hash function and maps the key to a
position in the hash table. If that position is already occupied, the key is
placed in the next empty position in the hash table.
Example
Consider a hash table of size 11 with hash function h(k) = k mod 11 and the elements
25, 46, 10, 36, 18, 29 and 43.
Here, 25 is inserted at the 3rd position in the array. Next, 46 is inserted at the
2nd position and 10 at the 10th position. Now 36 has the same hash address 3,
which is already occupied by 25, so it is inserted in the next free place, the 4th position.
Similarly, 18 and 29 have the same hash address 7: 18 is inserted at the 7th
position and 29 at the 8th position, which is free. Finally, 43 has the same hash
address 10, already occupied by 10, so 43 is inserted at the next free place,
which is the 0th position (wrapping around the end of the table).

Disadvantage:
• Records tend to cluster, i.e., if half the table is full, it becomes difficult to find
free space
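A small sketch of linear probing in Python, assuming h(k) = k mod 11 as in the example above:

    def linear_probe_insert(table, key, size=11):
        pos = key % size
        while table[pos] is not None:      # on collision, try the next slot, wrapping around
            pos = (pos + 1) % size
        table[pos] = key

    table = [None] * 11
    for key in [25, 46, 10, 36, 18, 29, 43]:
        linear_probe_insert(table, key)
    print(table)  # [43, None, 46, 25, 36, None, None, 18, 29, None, 10]

The printed table matches the placements traced in the example: 43 wraps around to position 0.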
ii. Quadratic probing
In quadratic probing, insertion and searching probe the locations (a + i²)
where i = 0, 1, 2, ..., that is, the locations a, a+1, a+4, a+9, ... This reduces the
clustering problem. If the table size is a prime number, the first half of the probe
sequence is guaranteed to visit distinct hash table positions.

iii. Double probing

If the hash function H generates address 'a' and a collision occurs, we
again apply a hash function to this hash value to compute the next position.

Disadvantage
• Requires two times calculation of hash function
iv. Rehashing
If hash table is full then we will use a new hash function and insert all the elements
of the previous hash table one by one and calculate the hash key with new hash
function and insert them into the new hash table. This technique is called rehashing.

2. Chaining
Collision resolution by chaining combines a linked representation with the hash table. When two or
more records hash to the same location, these records are constituted into a
singly-linked list called a chain. In chaining, we store all the values with the same index with the
help of a linked list.
Operations on a chained hash table


• Insertion in a chained hash table
• Deletion from a chained hash table
• Searching in a chained hash table

Disadvantage of chaining
• Linked list requires extra pointer
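The operations listed above can be sketched in a few lines of Python; here Python lists stand in
for the chains, and h(k) = k mod size is an assumed example hash function:

    class ChainedHashTable:
        def __init__(self, size=10):
            self.size = size
            self.slots = [[] for _ in range(size)]

        def insert(self, key):
            self.slots[key % self.size].append(key)   # colliding keys share one chain

        def search(self, key):
            return key in self.slots[key % self.size]

        def delete(self, key):
            self.slots[key % self.size].remove(key)

    t = ChainedHashTable()
    for k in [16, 12, 23, 42, 51]:
        t.insert(k)
    print(t.slots[2])    # [12, 42] - both keys hash to index 2
    print(t.search(42))  # True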
Chapter 7
DYNAMIC PROGRAMMING

Dynamic programming is an algorithm design technique that solves complex problems by
breaking them into smaller overlapping subproblems. It is the process of planning and making a
sequence of choices, where the choices can change based on the current situation.
Examples of Dynamic programming :
• Budgeting and expense tracking
• Travel planning
• Navigation apps
• Video games
Difference between Divide and Conquer vs Dynamic programming
Sl. No   Divide and Conquer                          Dynamic Programming
1        Follows a top-down approach                 Follows a bottom-up approach
2        Used to solve decision problems             Used to solve optimization problems
3        The solution of a subproblem may be         The solution of each subproblem is computed
         computed recursively more than once.        once and stored in a table for later use.
4        Less memory is required.                    More memory is required to store the
                                                     solutions of subproblems for later use.
5        Breaks the problem into non-overlapping     Breaks the problem into overlapping
         sub-problems, solves the sub-problems       sub-problems and saves the solutions of
         independently, and then combines them       these sub-problems to avoid repetitive
         to solve the original problem.              work.
6        Sub-problems are independent and do         Sub-problems are not independent; the
         not affect each other.                      solution of the current sub-problem may use
                                                     solutions of previous sub-problems.
7        Examples: Merge Sort, Binary Search         Examples: Knapsack problem, All-pairs
                                                     shortest paths

Applications of Dynamic Programming :


• Computer Science
• Operations Research
• Artificial Intelligence
• Bioinformatics
• Computer Graphics
• Natural Language Processing
• Networking
• Mathematics
Process of solving problems using dynamic programming :
• Identifying the subproblems
• Establishing the recurrence that relates subproblems
• Recognizing and solving the base cases
• Build up the solution
• Get the final solution
BINOMIAL COEFFICIENT :
A binomial coefficient C(n, k) gives the number of ways, that k objects can be chosen from
among n objects without considering the order. It is the number of k-element subsets (or k-
combinations) of a n-element set.
C(n, k) = n! / (k! × (n−k)!)
Example :
Consider n = 5 and k = 2; calculate C(n, k) using the brute force approach.

C(5, 2) = 5! / (2! × (5−2)!)
        = 5! / (2! × 3!)
        = (5 × 4 × 3 × 2 × 1) / ((2 × 1) × (3 × 2 × 1))
        = 10
Computation of the Binomial Coefficient using Dynamic Programming
Computation of the binomial coefficient is a classic example of applying dynamic
programming to a non-optimization problem. The binomial coefficient satisfies the recurrence:

C(n, k) = 1                              if k = 0 or k = n
C(n, k) = C(n−1, k−1) + C(n−1, k)        if 0 < k < n
C(n, k) = 0                              otherwise

Algorithm:
Binomial_Coefficient(n,k)
for i=0 to n do
for j=0 to min(i,k) do
if j == 0 or j == i
C[i,j]=1
else
C[i,j]=C[i-1,j-1]+C[i-1,j]
return C[n,k]
The Binomial_Coefficient algorithm computes C(n,k) using the dynamic programming
technique. Here, the input parameters n and k are positive integers where n ≥ k, and it returns
C[n,k] as output.
Example
Find C(4,3) using dynamic programming.
Step 1: Make a table of n+1 rows and k+1 columns (5 rows and 4 columns).
Step 2: Set the first-column entries and the diagonal (j = i) entries to 1.

      0   1   2   3
  0   1
  1   1   1
  2   1       1
  3   1           1
  4   1

Step 3: Fill in the remaining entries using the formula C[i,j] = C[i-1,j-1] + C[i-1,j]

      0   1   2   3
  0   1
  1   1   1
  2   1   2   1
  3   1   3   3   1
  4   1   4   6   4

Hence C(4,3) = C[4,3] = 4.
Time complexity : O(n*k)
Space complexity : O(n*k)
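A direct, runnable Python translation of the Binomial_Coefficient algorithm above (the table is
kept as a list of lists; names are illustrative):

    def binomial_coefficient(n, k):
        c = [[0] * (k + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            for j in range(min(i, k) + 1):
                if j == 0 or j == i:       # base cases along the first column and diagonal
                    c[i][j] = 1
                else:
                    c[i][j] = c[i - 1][j - 1] + c[i - 1][j]
        return c[n][k]

    print(binomial_coefficient(4, 3))  # 4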
PRINCIPLE OF OPTIMALITY :
The principle of optimality is a key concept in dynamic programming: an optimal solution to a
problem contains within it optimal solutions to its subproblems. It allows a complex problem to
be broken down into smaller subproblems, each of which is solved optimally.
OPTIMAL BINARY SEARCH TREE:
A binary search tree is a data structure that allows for efficient searching, insertion, and deletion
of elements. It is a binary tree where each node has at most two children and the left child of a
node contains elements that are smaller than the node, while the right child contains elements
that are greater than the node. This property allows for efficient searching as we can eliminate
half of the remaining elements at each step of the search.
An Optimal Binary Search Tree (OBST), also known as a Weighted Binary Search Tree, is
a binary search tree that minimizes the expected search cost.
An optimal binary search tree is a binary search tree for which the nodes are arranged on
levels such that the tree cost is as small as possible. Binary search trees are used to store data
in a way that makes it quick to look up, add and remove items.
Each search in a BST has a cost. In the calculations below, the cost of reaching a node equals
its level, i.e., the number of nodes on the path from the root to it (the root costs 1). If the
tree has fewer levels, the search cost is lower; otherwise it is higher.
The total cost of a binary search tree is calculated as the sum of the cost of accessing each
node multiplied by the probability of accessing each node.
Example 1:
Input: keys[] = {10, 12, 20}, freq[] = {34, 8, 50}
There can be the following possible BSTs (the original figures are described textually):

Tree 1: 10 as root, 12 as its right child, 20 as 12's right child
Tree 2: 12 as root, 10 as its left child, 20 as its right child
Tree 3: 20 as root, 12 as its left child, 10 as 12's left child
Tree 4: 10 as root, 20 as its right child, 12 as 20's left child
Tree 5: 20 as root, 10 as its left child, 12 as 10's right child

Among all possible BSTs, the cost of the fifth BST is minimum.
Cost of the first BST is 1*34 + 2*8 + 3*50 = 34 + 16 + 150 = 200
Cost of the second BST is 1*8 + 2*34 + 2*50 = 176
Cost of the third BST is 1*50 + 2*8 + 3*34 = 168
Cost of the fourth BST is 1*34 + 2*50 + 3*8 = 158
Cost of the fifth BST is 1*50 + 2*34 + 3*8 = 142
Thus the fifth tree has the minimum cost and it is the optimal binary search tree.

Finding the optimal Binary Search tree using Dynamic Programming


Algorithm:
OptimalBST(P[1....n])
for i ← 1 to n do
C[i,i-1]=0
C[i,i]=P[i]
R[i,i]=i
C[n+1,n]=0
for d=1 to n-1 do
for i=1 to n-d do
j=i+d
minval ← ∞
for k=i to j do
if C[i,k-1]+C[k+1,j] < minval
minval=C[i,k-1]+C[k+1,j]
kmin=k
R[i,j]=kmin
sum=P[i]
for s ← i+1 to j do
sum=sum+P[s]
C[i,j]=minval+sum
return C[1,n], R
The above algorithm finds an optimal binary search tree by dynamic programming. It takes an
array P[1…n] of search probabilities for a sorted list of n keys and returns the cost of the
optimal BST and the table R of subtree roots.
Example
Input: keys[] = {10, 12, 20}, freq[] = {34, 8, 50}
Step 1: Identify the input
Keys :{10,12,20}
Probabilities: {34,8,50}
Step 2: Create tables C and R, each with rows 1 to n+1 and columns 0 to n.
Step 3: Initialize the C and R tables:
C[i][i] = P[i] for all i
R[i][i] = i for all i
C[i][i-1] = 0 for all i
C[n+1][n] = 0

C Table                        R Table
      0    1    2    3               0    1    2    3
 1    0   34                    1         1
 2         0    8               2              2
 3              0   50          3                   3
 4                   0          4

Step 4: Compute the remaining entries of C and R using the formula

C(i, j) = min over i ≤ k ≤ j of { C(i, k−1) + C(k+1, j) } + Σ (s=i to j) P_s,   for 1 ≤ i ≤ j ≤ n

To calculate C[1][2], k ranges from 1 to 2
C[1][2] = min { k=1: C[1][0] + C[2][2] + p1 + p2
                k=2: C[1][1] + C[3][2] + p1 + p2 }
        = min { k=1: 0 + 8 + 34 + 8 = 50
                k=2: 34 + 0 + 34 + 8 = 76 }
C[1][2] = 50, R[1][2] = kmin = 1
To calculate C[2][3], k ranges from 2 to 3
C[2][3] = min { k=2: C[2][1] + C[3][3] + p2 + p3
                k=3: C[2][2] + C[4][3] + p2 + p3 }
        = min { k=2: 0 + 50 + 8 + 50 = 108
                k=3: 8 + 0 + 8 + 50 = 66 }
C[2][3] = 66, R[2][3] = kmin = 3
To calculate C[1][3], k ranges from 1 to 3
C[1][3] = min { k=1: C[1][0] + C[2][3] + p1 + p2 + p3
                k=2: C[1][1] + C[3][3] + p1 + p2 + p3
                k=3: C[1][2] + C[4][3] + p1 + p2 + p3 }
        = min { k=1: 0 + 66 + 34 + 8 + 50 = 158
                k=2: 34 + 50 + 34 + 8 + 50 = 176
                k=3: 50 + 0 + 34 + 8 + 50 = 142 }
C[1][3] = 142, R[1][3] = kmin = 3
C Table                        R Table
      0    1    2    3               0    1    2    3
 1    0   34   50  142          1         1    1    3
 2         0    8   66          2              2    3
 3              0   50          3                   3
 4                   0          4

Step 5: In the above table, C[1][3] indicates the optimal cost of the binary search tree and
R[1][3] indicates the root node of the binary tree.
Step 6: Create the BST with 20 (key 3) as the root node, then split the list into two parts. In
this example there is no key greater than 20, so the root has no right child. Find the root of
the left subtree by looking at R[1][2] = 1 in the R table; therefore the left subtree's root is
10. Repeating the process gives the final BST: 20 is the root, 10 is its left child, and 12 is
the right child of 10.

Time complexity : O(n³)
Space complexity : O(n²)
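A Python sketch of the OptimalBST algorithm above, using 1-based indexing to mirror the
pseudocode (index 0 of p is unused; names are illustrative):

    INF = float('inf')

    def optimal_bst(p):
        n = len(p) - 1                      # p[1..n] are the search probabilities/frequencies
        C = [[0] * (n + 1) for _ in range(n + 2)]
        R = [[0] * (n + 1) for _ in range(n + 2)]
        for i in range(1, n + 1):
            C[i][i] = p[i]
            R[i][i] = i
        for d in range(1, n):               # d is the difference j - i
            for i in range(1, n - d + 1):
                j = i + d
                minval, kmin = INF, i
                for k in range(i, j + 1):   # try each key as the subtree root
                    val = C[i][k - 1] + C[k + 1][j]
                    if val < minval:
                        minval, kmin = val, k
                R[i][j] = kmin
                C[i][j] = minval + sum(p[i:j + 1])
        return C[1][n], R

    cost, roots = optimal_bst([0, 34, 8, 50])   # frequencies of keys 10, 12, 20
    print(cost)         # 142
    print(roots[1][3])  # 3, i.e. the third key (20) is the overall root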
WARSHALL’S ALGORITHM
Warshall’s Algorithm is a classic dynamic programming algorithm used to find the transitive
closure of a directed graph.
Definition: The transitive closure of a directed graph is a matrix that represents the
reachability between pairs of vertices in the graph. It provides information about whether there
exists a directed path from one vertex to another considering all possible paths of any length.
Algorithm: Warshall(C)
D ← C
for k ← 1 to n do
for i ← 1 to n do
for j ← 1 to n do
D^k[i,j] = D^(k-1)[i,j] ∨ [D^(k-1)[i,k] ∧ D^(k-1)[k,j]]
end for
end for
end for
return D^n
Example
Apply Warshall’s algorithm to find the transitive closure of the digraph with vertices 1-4 and
edges 1→2, 2→4, 4→1 and 4→3 (original figure omitted).
The adjacency matrix is shown below.


D0 1 2 3 4

1 0 1 0 0
2 0 0 0 1
3 0 0 0 0
4 1 0 1 0
Using the formula, calculate the values: D^k[i,j] = D^(k-1)[i,j] ∨ [D^(k-1)[i,k] ∧ D^(k-1)[k,j]]
Step 1: When K=1 calculate D1
D0 1 2 3 4

1 0 1 0 0
2 0 0 0 1
3 0 0 0 0
4 1 0 1 0

D1[2,2] = D0[2,2] ∨ [D0[2,1] ∧ D0[1,2]] = 0 ∨ [0 ∧ 1] = 0
D1[2,3] = D0[2,3] ∨ [D0[2,1] ∧ D0[1,3]] = 0 ∨ [0 ∧ 0] = 0
D1[2,4] = D0[2,4] ∨ [D0[2,1] ∧ D0[1,4]] = 1 ∨ [0 ∧ 0] = 1
D1[3,2] = D0[3,2] ∨ [D0[3,1] ∧ D0[1,2]] = 0 ∨ [0 ∧ 1] = 0
D1[3,3] = D0[3,3] ∨ [D0[3,1] ∧ D0[1,3]] = 0 ∨ [0 ∧ 0] = 0
D1[3,4] = D0[3,4] ∨ [D0[3,1] ∧ D0[1,4]] = 0 ∨ [0 ∧ 0] = 0
D1[4,2] = D0[4,2] ∨ [D0[4,1] ∧ D0[1,2]] = 0 ∨ [1 ∧ 1] = 1
D1[4,3] = D0[4,3] ∨ [D0[4,1] ∧ D0[1,3]] = 1 ∨ [1 ∧ 0] = 1
D1[4,4] = D0[4,4] ∨ [D0[4,1] ∧ D0[1,4]] = 0 ∨ [1 ∧ 0] = 0
D1 1 2 3 4
1 0 1 0 0
2 0 0 0 1
3 0 0 0 0
4 1 1 1 0
Step 2: When K=2 calculate D2

D1 1 2 3 4
1 0 1 0 0
2 0 0 0 1
3 0 0 0 0
4 1 1 1 0
D2[1,1] = D1[1,1] ∨ [D1[1,2] ∧ D1[2,1]] = 0 ∨ [1 ∧ 0] = 0
D2[1,3] = D1[1,3] ∨ [D1[1,2] ∧ D1[2,3]] = 0 ∨ [1 ∧ 0] = 0
D2[1,4] = D1[1,4] ∨ [D1[1,2] ∧ D1[2,4]] = 0 ∨ [1 ∧ 1] = 1
D2[3,1] = D1[3,1] ∨ [D1[3,2] ∧ D1[2,1]] = 0 ∨ [0 ∧ 0] = 0
D2[3,3] = D1[3,3] ∨ [D1[3,2] ∧ D1[2,3]] = 0 ∨ [0 ∧ 0] = 0
D2[3,4] = D1[3,4] ∨ [D1[3,2] ∧ D1[2,4]] = 0 ∨ [0 ∧ 1] = 0
D2[4,1] = D1[4,1] ∨ [D1[4,2] ∧ D1[2,1]] = 1 ∨ [1 ∧ 0] = 1
D2[4,3] = D1[4,3] ∨ [D1[4,2] ∧ D1[2,3]] = 1 ∨ [1 ∧ 0] = 1
D2[4,4] = D1[4,4] ∨ [D1[4,2] ∧ D1[2,4]] = 0 ∨ [1 ∧ 1] = 1

D2 1 2 3 4
1 0 1 0 1
2 0 0 0 1
3 0 0 0 0
4 1 1 1 1

Step 3: When K=3 calculate D3

D2 1 2 3 4
1 0 1 0 1
2 0 0 0 1
3 0 0 0 0
4 1 1 1 1
D3[1,1] = D2[1,1] ∨ [D2[1,3] ∧ D2[3,1]] = 0 ∨ [0 ∧ 0] = 0
D3[1,2] = D2[1,2] ∨ [D2[1,3] ∧ D2[3,2]] = 1 ∨ [0 ∧ 0] = 1
D3[1,4] = D2[1,4] ∨ [D2[1,3] ∧ D2[3,4]] = 1 ∨ [0 ∧ 0] = 1
D3[2,1] = D2[2,1] ∨ [D2[2,3] ∧ D2[3,1]] = 0 ∨ [0 ∧ 0] = 0
D3[2,2] = D2[2,2] ∨ [D2[2,3] ∧ D2[3,2]] = 0 ∨ [0 ∧ 0] = 0
D3[2,4] = D2[2,4] ∨ [D2[2,3] ∧ D2[3,4]] = 1 ∨ [0 ∧ 0] = 1
D3[4,1] = D2[4,1] ∨ [D2[4,3] ∧ D2[3,1]] = 1 ∨ [1 ∧ 0] = 1
D3[4,2] = D2[4,2] ∨ [D2[4,3] ∧ D2[3,2]] = 1 ∨ [1 ∧ 0] = 1
D3[4,4] = D2[4,4] ∨ [D2[4,3] ∧ D2[3,4]] = 1 ∨ [1 ∧ 0] = 1
D3 1 2 3 4
1 0 1 0 1
2 0 0 0 1
3 0 0 0 0
4 1 1 1 1

Step 4: When K=4 calculate D4

D3 1 2 3 4
1 0 1 0 1
2 0 0 0 1
3 0 0 0 0
4 1 1 1 1
D4[1,1] = D3[1,1] ∨ [D3[1,4] ∧ D3[4,1]] = 0 ∨ [1 ∧ 1] = 1
D4[1,2] = D3[1,2] ∨ [D3[1,4] ∧ D3[4,2]] = 1 ∨ [1 ∧ 1] = 1
D4[1,3] = D3[1,3] ∨ [D3[1,4] ∧ D3[4,3]] = 0 ∨ [1 ∧ 1] = 1
D4[2,1] = D3[2,1] ∨ [D3[2,4] ∧ D3[4,1]] = 0 ∨ [1 ∧ 1] = 1
D4[2,2] = D3[2,2] ∨ [D3[2,4] ∧ D3[4,2]] = 0 ∨ [1 ∧ 1] = 1
D4[2,3] = D3[2,3] ∨ [D3[2,4] ∧ D3[4,3]] = 0 ∨ [1 ∧ 1] = 1
D4[3,1] = D3[3,1] ∨ [D3[3,4] ∧ D3[4,1]] = 0 ∨ [0 ∧ 1] = 0
D4[3,2] = D3[3,2] ∨ [D3[3,4] ∧ D3[4,2]] = 0 ∨ [0 ∧ 1] = 0
D4[3,3] = D3[3,3] ∨ [D3[3,4] ∧ D3[4,3]] = 0 ∨ [0 ∧ 1] = 0

D4 1 2 3 4
1 1 1 1 1
2 1 1 1 1
3 0 0 0 0
4 1 1 1 1

The above table is the final transitive closure matrix.


Time Complexity: O(n³)
Space Complexity: O(n²)
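A compact Python sketch of Warshall’s algorithm on a 0/1 adjacency matrix, run on the example
above (0-based indices internally, same graph):

    def warshall(adj):
        n = len(adj)
        d = [row[:] for row in adj]         # D^0 is a copy of the adjacency matrix
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    d[i][j] = d[i][j] or (d[i][k] and d[k][j])
        return d

    adj = [[0, 1, 0, 0],   # edges: 1->2, 2->4, 4->1, 4->3
           [0, 0, 0, 1],
           [0, 0, 0, 0],
           [1, 0, 1, 0]]
    for row in warshall(adj):
        print(row)
    # [1, 1, 1, 1]
    # [1, 1, 1, 1]
    # [0, 0, 0, 0]
    # [1, 1, 1, 1]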
THE KNAPSACK PROBLEM :
The knapsack problem is a classic optimization problem in computer science and mathematics.
It involves selecting a subset of items from a given set of items, subject to a weight constraint,
such that the total value of the selected items is maximized.
0/1 KNAPSACK PROBLEM :
In the 0/1 knapsack problem, each item is either placed in the knapsack completely or ignored,
a binary choice (0, 1).
MEMORY FUNCTIONS :
Memoization is a technique for improving the performance of recursive algorithms. It is a
dynamic programming technique where the result of a function call is stored (memoized) for
later use, so that we can significantly speed up the computation by avoiding repetitive
calculations. It follows a top-down approach, and each subproblem is solved exactly once.
Example:
Apply dynamic programming to solve the knapsack problem with n = 5 items,
weights {w1, w2, w3, w4, w5} = {2, 1, 5, 2, 5}, profits {P1, P2, P3, P4, P5} = {20, 5, 15, 10, 12}
and knapsack capacity W = 8.

Let V[i, j] be the value of the most valuable subset of the first i items that fits into a
knapsack of capacity j. The recurrence for the knapsack problem is:

V[i, j] = 0                                       if i = 0 or j = 0
V[i, j] = V[i-1, j]                               if j - wi < 0
V[i, j] = max{ V[i-1, j], V[i-1, j-wi] + Pi }     if j - wi ≥ 0

Solution:
First draw a table of n+1 rows and W+1 columns and set the first row and first column
entries to 0.

Row i = 1: w1 = 2, P1 = 20
V[1,1] = V[0,1] = 0                                     since j - w1 = 1 - 2 < 0
V[1,2] = max{ V[0,2], V[0,0] + 20 } = max{0, 0+20} = 20
V[1,3] = max{ V[0,3], V[0,1] + 20 } = max{0, 0+20} = 20
and similarly V[1,4] = V[1,5] = ... = V[1,8] = 20

Row i = 2: w2 = 1, P2 = 5
V[2,1] = max{ V[1,1], V[1,0] + 5 } = max{0, 0+5} = 5
V[2,2] = max{ V[1,2], V[1,1] + 5 } = max{20, 0+5} = 20
V[2,3] = max{ V[1,3], V[1,2] + 5 } = max{20, 20+5} = 25
and similarly V[2,4] = V[2,5] = ... = V[2,8] = 25

Row i = 3: w3 = 5, P3 = 15
V[3,1] = V[2,1] = 5                                     since 1 - 5 < 0
V[3,2] = V[2,2] = 20                                    since 2 - 5 < 0
V[3,3] = V[2,3] = 25                                    since 3 - 5 < 0
V[3,4] = V[2,4] = 25                                    since 4 - 5 < 0
V[3,5] = max{ V[2,5], V[2,0] + 15 } = max{25, 0+15} = 25
V[3,6] = max{ V[2,6], V[2,1] + 15 } = max{25, 5+15} = 25
V[3,7] = max{ V[2,7], V[2,2] + 15 } = max{25, 20+15} = 35
V[3,8] = max{ V[2,8], V[2,3] + 15 } = max{25, 25+15} = 40

Row i = 4: w4 = 2, P4 = 10
V[4,1] = V[3,1] = 5                                     since 1 - 2 < 0
V[4,2] = max{ V[3,2], V[3,0] + 10 } = max{20, 0+10} = 20
V[4,3] = max{ V[3,3], V[3,1] + 10 } = max{25, 5+10} = 25
V[4,4] = max{ V[3,4], V[3,2] + 10 } = max{25, 20+10} = 30
V[4,5] = max{ V[3,5], V[3,3] + 10 } = max{25, 25+10} = 35
V[4,6] = max{ V[3,6], V[3,4] + 10 } = max{25, 25+10} = 35
V[4,7] = max{ V[3,7], V[3,5] + 10 } = max{35, 25+10} = 35
V[4,8] = max{ V[3,8], V[3,6] + 10 } = max{40, 25+10} = 40

Row i = 5: w5 = 5, P5 = 12
V[5,1] = V[4,1] = 5, V[5,2] = V[4,2] = 20,
V[5,3] = V[4,3] = 25, V[5,4] = V[4,4] = 30              since j - 5 < 0 for j = 1..4
V[5,5] = max{ V[4,5], V[4,0] + 12 } = max{35, 0+12} = 35
V[5,6] = max{ V[4,6], V[4,1] + 12 } = max{35, 5+12} = 35
V[5,7] = max{ V[4,7], V[4,2] + 12 } = max{35, 20+12} = 35
V[5,8] = max{ V[4,8], V[4,3] + 12 } = max{40, 25+12} = 40

The complete V[i, j] table is shown below.

 i\j    0   1   2   3   4   5   6   7   8
 0      0   0   0   0   0   0   0   0   0
 1      0   0  20  20  20  20  20  20  20
 2      0   5  20  25  25  25  25  25  25
 3      0   5  20  25  25  25  25  35  40
 4      0   5  20  25  30  35  35  35  40
 5      0   5  20  25  30  35  35  35  40

The maximal value of the knapsack is V[5,8] = 40.
To find the optimal subset, trace back from V[5,8]:
Since V[5,8] = V[4,8], item 5 is not included.
Since V[4,8] = V[3,8], item 4 is not included.
Since V[3,8] ≠ V[2,8], item 3 is included; the remaining capacity is 8 - 5 = 3.
Since V[2,3] ≠ V[1,3], item 2 is included; the remaining capacity is 3 - 1 = 2.
Since V[1,2] ≠ V[0,2], item 1 is included.
Thus the optimal subset is {item 1, item 2, item 3}, with total weight 2 + 1 + 5 = 8 and total
profit 20 + 5 + 15 = 40.
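As a sketch of the memory-function idea described above, the following memoized Python function
solves the same instance top-down; each V(i, j) subproblem is computed at most once (names are
illustrative):

    def knapsack_memo(weights, profits, capacity):
        n = len(weights)
        memo = {}                            # (i, j) -> best value for first i items, capacity j
        def mf(i, j):
            if i == 0 or j == 0:             # base case of the recurrence
                return 0
            if (i, j) not in memo:           # each subproblem is solved exactly once
                without = mf(i - 1, j)
                if weights[i - 1] > j:       # item i does not fit
                    memo[(i, j)] = without
                else:
                    memo[(i, j)] = max(without,
                                       mf(i - 1, j - weights[i - 1]) + profits[i - 1])
            return memo[(i, j)]
        return mf(n, capacity)

    print(knapsack_memo([2, 1, 5, 2, 5], [20, 5, 15, 10, 12], 8))  # 40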
All Pair Shortest Path
• The all pair shortest path algorithm is also known as Floyd-Warshall
algorithm
• It is used to find all pair shortest path problem from a given weighted
graph.
• The all pair shortest path refers to the problem of finding the shortest path
between every pair of vertices in a graph.
• The result of this algorithm, it will generate a matrix, which will represent
the minimum distance from any node to all other nodes in the graph.
• At first the output matrix is the same as the given cost matrix of the graph. In every
iteration the output matrix is updated, taking each vertex k in turn as the
intermediate vertex.
(Graph figure omitted: a weighted digraph on vertices 1-4 with edges 1→2 of cost 10, 1→4 of
cost 40, 2→4 of cost 20, 3→1 of cost 50 and 4→3 of cost 60, as the cost matrix below shows.)
Cost Matrix
D0 1 2 3 4
1 0 10 ∞ 40
2 ∞ 0 ∞ 20
3 50 ∞ 0 ∞
4 ∞ ∞ 60 0

Dk=min{Dk-1(i,j), Dk-1(i,k)+Dk-1(k,j)}
Algorithm: FLOYD(C)
D ← C
for k ← 1 to n do
for i ← 1 to n do
for j ← 1 to n do
D[i, j]=min{D[i, j], D[i, k] + D[k, j]}
end for
end for
end for
return D
Cost Matrix

D0    1    2    3    4
1     0   10    ∞   40
2     ∞    0    ∞   20
3    50    ∞    0    ∞
4     ∞    ∞   60    0

• Step 1: When K=1
D1(2,3)=min{D0(2,3), D0(2,1)+D0(1,3)} = min(∞, ∞+∞) = ∞
D1(2,4)=min{D0(2,4), D0(2,1)+D0(1,4)} = min(20, ∞+40) = 20
D1(3,2)=min{D0(3,2), D0(3,1)+D0(1,2)} = min(∞, 50+10) = 60
D1(3,4)=min{D0(3,4), D0(3,1)+D0(1,4)} = min(∞, 50+40) = 90
D1(4,2)=min{D0(4,2), D0(4,1)+D0(1,2)} = min(∞, ∞+10) = ∞
D1(4,3)=min{D0(4,3), D0(4,1)+D0(1,3)} = min(60, ∞+∞) = 60

D1    1    2    3    4
1     0   10    ∞   40
2     ∞    0    ∞   20
3    50   60    0   90
4     ∞    ∞   60    0

• Step 2: When K=2
D2(1,3)=min{D1(1,3), D1(1,2)+D1(2,3)} = min(∞, 10+∞) = ∞
D2(1,4)=min{D1(1,4), D1(1,2)+D1(2,4)} = min(40, 10+20) = 30
D2(3,1)=min{D1(3,1), D1(3,2)+D1(2,1)} = min(50, 60+∞) = 50
D2(3,4)=min{D1(3,4), D1(3,2)+D1(2,4)} = min(90, 60+20) = 80
D2(4,1)=min{D1(4,1), D1(4,2)+D1(2,1)} = min(∞, ∞+∞) = ∞
D2(4,3)=min{D1(4,3), D1(4,2)+D1(2,3)} = min(60, ∞+∞) = 60

D2    1    2    3    4
1     0   10    ∞   30
2     ∞    0    ∞   20
3    50   60    0   80
4     ∞    ∞   60    0

• Step 3: When K=3
D3(1,2)=min{D2(1,2), D2(1,3)+D2(3,2)} = min(10, ∞+60) = 10
D3(1,4)=min{D2(1,4), D2(1,3)+D2(3,4)} = min(30, ∞+80) = 30
D3(2,1)=min{D2(2,1), D2(2,3)+D2(3,1)} = min(∞, ∞+50) = ∞
D3(2,4)=min{D2(2,4), D2(2,3)+D2(3,4)} = min(20, ∞+80) = 20
D3(4,1)=min{D2(4,1), D2(4,3)+D2(3,1)} = min(∞, 60+50) = 110
D3(4,2)=min{D2(4,2), D2(4,3)+D2(3,2)} = min(∞, 60+60) = 120

D3    1    2    3    4
1     0   10    ∞   30
2     ∞    0    ∞   20
3    50   60    0   80
4   110  120   60    0

• Step 4: When K=4
D4(1,2)=min{D3(1,2), D3(1,4)+D3(4,2)} = min(10, 30+120) = 10
D4(1,3)=min{D3(1,3), D3(1,4)+D3(4,3)} = min(∞, 30+60) = 90
D4(2,1)=min{D3(2,1), D3(2,4)+D3(4,1)} = min(∞, 20+110) = 130
D4(2,3)=min{D3(2,3), D3(2,4)+D3(4,3)} = min(∞, 20+60) = 80
D4(3,1)=min{D3(3,1), D3(3,4)+D3(4,1)} = min(50, 80+110) = 50
D4(3,2)=min{D3(3,2), D3(3,4)+D3(4,2)} = min(60, 80+120) = 60

D4    1    2    3    4
1     0   10   90   30
2   130    0   80   20
3    50   60    0   80
4   110  120   60    0

The matrix D4 gives the shortest distance between every pair of vertices.
Complexity
• The time complexity of this algorithm is O(n³)
• The space complexity of this algorithm is O(n²)
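A short Python sketch of the Floyd-Warshall algorithm, run on the cost matrix from the example
(float('inf') stands in for ∞):

    INF = float('inf')

    def floyd(cost):
        n = len(cost)
        d = [row[:] for row in cost]        # D^0 is a copy of the cost matrix
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    d[i][j] = min(d[i][j], d[i][k] + d[k][j])
        return d

    cost = [[0, 10, INF, 40],
            [INF, 0, INF, 20],
            [50, INF, 0, INF],
            [INF, INF, 60, 0]]
    for row in floyd(cost):
        print(row)
    # [0, 10, 90, 30]
    # [130, 0, 80, 20]
    # [50, 60, 0, 80]
    # [110, 120, 60, 0]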
Advantages and Disadvantages of Dynamic Programming

Advantages:
• Optimality
• Re-usability of subproblems
• Versatility
• Predictability

Disadvantages:
• Memory usage
• Difficulty of design
• No direct solution
HUFFMAN TREES AND HUFFMAN CODES
Huffman trees and Huffman codes are concepts used in data compression to create efficient
variable-length encoding schemes. They were developed by David A. Huffman in 1952.
Data Compression
Data Compression is a technique that is used to reduce the storage space used by data.
Fixed Length Code
Fixed length codes refer to the coding technique where every symbol or character is represented
by a code of the same length.
Example: ‘A’ is represented as 1000001
Variable Length codes
Variable length codes refer to the coding technique where symbols or characters are represented
by codes whose lengths may differ from symbol to symbol.
Huffman Trees
A Huffman tree is a specific type of binary tree that is used in data compression. It is constructed
based on the frequencies or probabilities of occurrence of symbols in a given text or data.
Huffman Codes
Huffman codes, also known as Huffman encodings, are variable-length prefix codes derived from a
Huffman tree and used for data compression. They are assigned to symbols based on the symbols'
frequencies or probabilities in a given text or data.
Huffman Algorithm
Step1: Create a frequency Queue Q consisting of each unique character.
Step 2: Sort frequencies in ascending order
Step 3: Loop
a) Create a new node
b) Extract the minimum value from Q and assign it to left child of new node.
c) Extract the minimum value from Q and assign it to right child of new node
d) Calculate the sum of these two minimum values and assign it to the value of new node.
e) Insert this new node into the queue.
Step 4: Create Huffman codes i.e., for each non-leaf node, assign 0 to the left edge and 1 to the right
edge.
Decoding Steps
Step 1: Start at the root of the tree
Step 2: Repeat until we reach an external leaf node
a) Read one message bit
b) Take the left branch in the tree if the bit is 0, take the right branch if it is 1
Step 3: Print the character in that external node
Example
Compress the following string using Huffman coding technique
B C A A D D D C C A C A C A C

Step 1: Create a frequency queue Q consisting of each unique character.

B:1   C:6   A:5   D:3

Step 2: Sort the frequencies in ascending order.

B:1   D:3   A:5   C:6

Step 3: Create a new node X, assign the minimum frequency (B:1) as its left child and the second
minimum frequency (D:3) as its right child, and assign the sum (4) as the new node's value. The
tree is now the subtree 4 → (B:1, D:3).
Step 4: Update the queue.

*:4   A:5   C:6

Create a new node X, assign the minimum frequency (the node of value 4) as its left child and
the second minimum frequency (A:5) as its right child, and assign the sum (9) as the new node's
value. The tree is now 9 → (4 → (B:1, D:3), A:5).

Step 5: Update the queue.

C:6   *:9

Create a new node X, assign the minimum frequency (C:6) as its left child and the second minimum
frequency (the node of value 9) as its right child, and assign the sum (15) as the new node's
value. The tree is now 15 → (C:6, 9 → (4 → (B:1, D:3), A:5)).
Step 6: For each non-leaf node, assign 0 to the left edge and 1 to the right edge. Reading the
edge labels from the root down to each leaf gives the codes: C = 0, A = 11, B = 100, D = 101.
Character    Frequency    Code    Size
A            5            11      5*2 = 10
B            1            100     1*3 = 3
C            6            0       6*1 = 6
D            3            101     3*3 = 9
4*8 = 32 bits    15 bits    28 bits

Without encoding, the total size of the string was 15 × 8 = 120 bits. After encoding, it takes
32 bits to store the 4 characters, 15 bits for their frequencies, and 28 bits for the encoded
string, i.e., 32 + 15 + 28 = 75 bits.
Complexity
Time complexity: O (n log n)
Space Complexity: O(n)
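A minimal Python sketch of Huffman coding using the standard-library heapq module as the
priority queue; on this example it reproduces the codes derived above, though tie-breaking may
differ on other inputs:

    import heapq
    from collections import Counter

    def huffman_codes(text):
        freq = Counter(text)
        heap = [(f, [ch]) for ch, f in freq.items()]
        code = {ch: "" for ch in freq}
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, chars1 = heapq.heappop(heap)      # two smallest frequencies
            f2, chars2 = heapq.heappop(heap)
            for ch in chars1:                     # left edge gets 0
                code[ch] = "0" + code[ch]
            for ch in chars2:                     # right edge gets 1
                code[ch] = "1" + code[ch]
            heapq.heappush(heap, (f1 + f2, chars1 + chars2))
        return code

    print(huffman_codes("BCAADDDCCACACAC"))
    # B=100, C=0, A=11, D=101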
