DAA Course
Course Outcomes:
Argue the correctness of algorithms using inductive proofs and invariants.
Analyse worst-case running times of algorithms using asymptotic analysis.
Describe the divide-and-conquer paradigm and explain when an algorithmic design situation
calls for it. Recite algorithms that employ this paradigm. Synthesize divide-and-conquer
algorithms. Derive and solve recurrences describing the performance of divide-and-conquer
algorithms.
Explain the different ways to analyse randomized algorithms (expected running time,
probability of error). Recite algorithms that employ randomization. Explain the difference
between a randomized algorithm and an algorithm with probabilistic inputs.
Analyse randomized algorithms. Employ indicator random variables and linearity of
expectation to perform the analyses. Recite analyses of algorithms that employ this method of
analysis.
Explain what competitive analysis is and to which situations it applies. Perform competitive
analysis.
Compare different data structures. Pick an appropriate data structure for a design
situation.
Explain what an approximation algorithm is, and the benefit of using approximation
algorithms. Be familiar with some approximation algorithms.
Analyse the approximation factor of an algorithm.
Prerequisites:
Discrete Mathematics, Data Structures
Goals and Objectives:
The main goal of this course is to study the fundamental techniques for designing
efficient algorithms and analysing their running time, after a brief review of prerequisite
material (searching, sorting, asymptotic notation).
Required Knowledge:
1. Computer programming skills
2. Knowledge of probability
3. Understanding of basic data structures and algorithms
4. Basic knowledge in discrete mathematics
UNIT - I
Algorithm:
Introduction:
What is an Algorithm?
An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for
obtaining a required output for any legitimate input in a finite amount of time.
The unambiguity requirement for each step of an algorithm cannot be compromised.
The range of inputs for which an algorithm works has to be specified carefully. The same
algorithm can be represented in several different ways. There may exist several algorithms
for solving the same problem. The reference to “instructions” in the definition implies that
there is something or someone capable of understanding and following the instructions given.
We call this a “computer,” keeping in mind that before the electronic computer was invented,
the word “computer” meant a human being involved in performing numeric calculations.
Nowadays, of course, “computers” are those ubiquitous electronic devices that have become
indispensable in almost everything we do. Note, however, that although the majority of
algorithms are indeed intended for eventual computer implementation, the notion of
algorithm does not depend on such an assumption.
How to make an Algorithm:
A person well-trained in computer science knows how to deal with algorithms: how to
construct them, manipulate them, understand them, analyse them. This knowledge is
preparation for much more than writing good computer programs; it is a general-purpose
mental tool that will be a definite aid to the understanding of other subjects, whether
chemistry, linguistics, or music. The reason for this may be understood in the following
way: It has often been said that a person does not really understand something until after
teaching it to someone else. Actually, a person does not really understand something until
after teaching it to a computer, i.e., expressing it as an algorithm . . . An attempt to formalize
things as algorithms leads to a much deeper understanding than if we simply try to
comprehend things in the traditional way.
Algorithms for the same problem can be based on very different ideas and can solve the
problem with dramatically different speeds.
Example 1:
Making Tea:
In order to prepare tea, we need to follow these steps.
1. Boil the water
2. Add Tea powder in boiling water
3. Boil it for 5 minutes
4. Filter the tea extract
5. After that add sugar and milk
6. Ready to serve
This step-by-step procedure is an algorithm.
Algorithm to find the area of a rectangle:
Step 1: Start
Step 2: get l, b values
Step 3: Calculate A=l*b
Step 4: Display A
Step 5: Stop
Example 2:
The greatest common divisor of two nonnegative, not-both-zero integers m and n,
denoted gcd(m, n), is defined as the largest integer that divides both m and n evenly, i.e., with
a remainder of zero. Euclid of Alexandria (third century B.C.) outlined an algorithm for
solving this problem in one of the volumes of his Elements, most famous for its systematic
exposition of geometry. In modern terms, Euclid's algorithm is based on applying repeatedly
the equality
gcd(m, n) = gcd(n, m mod n),
where m mod n is the remainder of the division of m by n, until m mod n is equal to 0. Since
gcd(m, 0) = m (why?), the last value of m is also the greatest common divisor of the
initial m and n.
Step 1 If n = 0, return the value of m as the answer and stop; otherwise, proceed to Step 2.
Step 2 Divide m by n and assign the value of the remainder to r.
Step 3 Assign the value of n to m and the value of r to n. Go to Step 1.
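As an illustration (not part of the original outline), the same procedure translates into a few
lines of Python; the function name euclid_gcd is our own choice:

# A minimal Python sketch of Euclid's algorithm as described above.
def euclid_gcd(m, n):
    # Repeat until the second number becomes 0; then the first is the gcd.
    while n != 0:
        m, n = n, m % n   # gcd(m, n) = gcd(n, m mod n)
    return m

print(euclid_gcd(60, 24))  # prints 12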
Why algorithm?
Why do you need to study algorithms? If you are going to be a computer professional,
there are both practical and theoretical reasons to study algorithms. From a practical
standpoint, you have to know a standard set of important algorithms from different areas of
computing; in addition, you should be able to design new algorithms and analyse their
efficiency. From the theoretical standpoint, the study of algorithms, sometimes
called algorithmics, has come to be recognized as the cornerstone of computer science.
Important Problem Types:
Sorting
Searching
String processing
Graph problems
Sorting
The sorting problem is to rearrange the items of a given list in non-decreasing order.
Of course, for this problem to be meaningful, the nature of the list items must allow such an
ordering. As a practical matter, we usually need to sort lists of numbers, characters from an
alphabet, character strings, and, most important, records similar to those maintained by
schools about their students, libraries about their holdings, and companies about their
employees. In the case of records, we need to choose a piece of information to guide sorting.
For example, we can choose to sort student records in alphabetical order of names or by
student number or by student grade-point average. Such a specially chosen piece of
information is called a key.
Searching
The searching problem deals with finding a given value, called a search key, in a
given set. There are plenty of searching algorithms to choose from. They range from the
straightforward sequential search to a spectacularly efficient but limited binary search and
algorithms based on representing the underlying set in a different form more conducive to
searching. The latter algorithms are of particular importance for real-world applications
because they are indispensable for storing and retrieving information from large databases.
String Processing
In recent decades, the rapid proliferation of applications dealing with non-numerical
data has intensified the interest of researchers and computing practitioners in string-handling
algorithms. A string is a sequence of characters from an alphabet. Strings of particular
interest are text strings, which comprise letters, numbers, and special characters; bit strings,
which comprise zeros and ones; and gene sequences, which can be modelled by strings of
characters from the four-character alphabet {A, C, G, T}. It should be pointed out, however,
that string-processing algorithms have been important for computer science for a long time in
conjunction with computer languages and compiling issues.
One particular problem—that of searching for a given word in a text—has attracted special
attention from researchers. They call it string matching.
Graph Problems
One of the oldest and most interesting areas in algorithmics is graph algorithms.
Informally, a graph can be thought of as a collection of points called vertices, some of which
are connected by line segments called edges. Graphs are an interesting subject to study, for
both theoretical and practical reasons. Graphs can be used for modelling a wide variety of
applications, including transportation, communication, social and economic networks, project
scheduling, and games. Studying different technical and social aspects of the Internet in
particular is one of the active areas of current research involving computer scientists,
economists, and social scientists.
Asymptotic Notations and Basic Efficiency Classes
The efficiency analysis framework concentrates on the order of growth of an
algorithm’s basic operation count as the principal indicator of the algorithm’s efficiency. To
compare and rank such orders of growth, computer scientists use three notations: O (big oh),
Ω (big omega), and Θ (big theta). First, we introduce these notations informally, and then, after
several examples, formal definitions are given. In the following discussion, t(n) and g(n) can
be any non-negative functions defined on the set of natural numbers. In the context we are
interested in, t(n) will be an algorithm's running time.
O(g(n)) is the set of all functions with a lower or same order of growth as g(n) (to within a
constant multiple, as n goes to infinity). Thus, to give a few examples, the following
assertions are all true:
n ∈ O(n^2), 100n + 5 ∈ O(n^2), (1/2)n(n − 1) ∈ O(n^2).
Indeed, the first two functions are linear and hence have a lower order of growth
than g(n) = n^2, while the last one is quadratic and hence has the same order of growth as n^2.
O-notation
A function t(n) is said to be in O(g(n)), denoted t(n) ∈ O(g(n)), if t(n) is bounded above by
some constant multiple of g(n) for all large n, i.e., if there exist some positive constant c and
some nonnegative integer n0 such that
t(n) ≤ c·g(n) for all n ≥ n0.
For example, 100n + 5 ≤ 101n ≤ 101n^2 for all n ≥ 5, so 100n + 5 ∈ O(n^2) with c = 101 and n0 = 5.
Ω-notation
A function t(n) is said to be in Ω(g(n)), denoted t(n) ∈ Ω(g(n)), if t(n) is bounded below by
some positive constant multiple of g(n) for all large n, i.e., if there exist some positive
constant c and some nonnegative integer n0 such that
t(n) ≥ c·g(n) for all n ≥ n0.
Θ-notation:
A function t(n) is said to be in Θ(g(n)), denoted t(n) ∈ Θ(g(n)), if t(n) is bounded both above
and below by some positive constant multiples of g(n) for all large n, i.e., if there exist some
positive constants c1 and c2 and some nonnegative integer n0 such that
c2·g(n) ≤ t(n) ≤ c1·g(n) for all n ≥ n0.
We now extend the general framework for the analysis of algorithms to recursive algorithms.
We start with an example often used to introduce novices to the idea of a recursive algorithm.
Example 3:
Recursion is the process of defining a problem (or the solution to a problem) in terms of (a
simpler version of) itself.
For example, we can define the operation "find your way home" as:
1. If you are at home, stop moving.
2. Take one step toward home.
3. "find your way home".
Here the solution to finding your way home is expressed in three steps. First, we stop if
we are already home. Second, we take a very simple action (one step toward home) that makes
our situation simpler to solve. Finally, we redo the entire algorithm.
Basic steps of recursive programs
Every recursive program follows the same basic sequence of steps:
1. Initialize the algorithm. Recursive programs often need a seed value to start with. This
is accomplished either by using a parameter passed to the function or by providing a
gateway function that is non-recursive but that sets up the seed values for the
recursive calculation.
2. Check to see whether the current value(s) being processed match the base case. If so,
process and return the value.
3. Redefine the answer in terms of a smaller or simpler sub-problem or sub-problems.
4. Run the algorithm on the sub-problem.
5. Combine the results in the formulation of the answer.
6. Return the results.
Properties
A recursive function can run forever, like an infinite loop. To avoid this, there are two
properties that a recursive function must have −
Base criterion − There must be at least one base criterion or condition such that, when
this condition is met, the function stops calling itself recursively.
Progressive approach − The recursive calls should progress in such a way that each
time a recursive call is made, it comes closer to the base criterion.
EXAMPLE Compute the factorial function F(n) = n! for an arbitrary nonnegative
integer n. Since
n! = 1 · 2 · … · (n − 1) · n = (n − 1)! · n for n ≥ 1
and 0! = 1 by definition, we can compute F(n) = F(n − 1) · n with the following recursive algorithm.
ALGORITHM F(n)
//Computes n! Recursively
//Input: A nonnegative integer n
//Output: The value of n!
if n = 0 return 1
else return F (n − 1) ∗ n
For simplicity, we consider n itself as an indicator of this algorithm's input size (rather than
the number of bits in its binary expansion). The basic operation of the algorithm is
multiplication, whose number of executions we denote M(n). Since the function F(n) is
computed according to the formula
F(n) = F(n − 1) · n for n > 0,
the number of multiplications needed to compute it satisfies the recurrence
M(n) = M(n − 1) + 1 for n > 0, with M(0) = 0.
Indeed, M(n − 1) multiplications are spent to compute F(n − 1), and one more multiplication
is needed to multiply the result by n.
The last equation defines the sequence M(n) that we need to find. This equation
defines M(n) not explicitly, i.e., as a function of n, but implicitly as a function of its value at
another point, namely n − 1. Such equations are called recurrence relations
Our goal now is to solve the recurrence relation M(n) = M(n − 1) + 1, i.e., to find an explicit
formula for M(n) in terms of n only. By backward substitution,
M(n) = M(n − 1) + 1 = M(n − 2) + 2 = … = M(0) + n = n.
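A small Python sketch (our own illustration) that computes F(n) recursively and counts
multiplications, confirming the closed form M(n) = n:

# Recursive factorial that also reports the number of multiplications M(n).
def factorial(n):
    if n == 0:
        return 1, 0                # 0! = 1, no multiplications: M(0) = 0
    f, m = factorial(n - 1)        # M(n - 1) multiplications so far
    return f * n, m + 1            # one more multiplication: M(n) = M(n - 1) + 1

for n in range(6):
    f, m = factorial(n)
    print(n, f, m)                 # M(n) equals n, as the recurrence predicts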
Applications of Recursive Algorithm:
Recursion has many, many applications. In this module, we'll see how to use recursion to
compute the factorial function, to determine whether a word is a palindrome, to compute
powers of a number, to draw a type of fractal, and to solve the ancient Towers of Hanoi
problem. Later modules will use recursion to solve other problems, including sorting.
Consider the problem of finding the value of the largest element in a list of n numbers. For
simplicity, we assume that the list is implemented as an array. The following is pseudocode
of a standard algorithm for solving the problem.
ALGORITHM MaxElement(A[0..n − 1])
//Determines the value of the largest element in a given array
//Input: An array A[0..n − 1] of real numbers
//Output: The value of the largest element in A
maxval ← A[0]
for i ← 1 to n − 1 do
if A[i] > maxval
maxval ← A[i]
return maxval
The obvious measure of an input's size here is the number of elements in the array,
i.e., n. The operations that are going to be executed most often are in the algorithm's for loop.
Brute Force:
For example, imagine you have a small padlock with 4 digits, each from 0-9. You
forgot your combination, but you don't want to buy another padlock. Since you can't
remember any of the digits, you have to use a brute force method to open the lock.
So you set all the numbers back to 0 and try them one by one: 0001, 0002, 0003, and
so on until it opens. In the worst case scenario, it would take 10^4, or 10,000 tries to
find your combination.
A classic example in computer science is the traveling salesman problem (TSP).
Suppose a salesman needs to visit 10 cities across the country. How does one
determine the order in which those cities should be visited such that the total distance
traveled is minimized?
The brute force solution is simply to calculate the total distance for every possible
route and then select the shortest one. This is not particularly efficient because it is
possible to eliminate many possible routes through clever algorithms.
The time complexity of brute force string matching is O(mn), which is sometimes written
as O(n*m). So, if we were to search for a string of "n" characters in a string of "m"
characters using brute force, it would take on the order of n * m tries.
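A minimal Python sketch of this brute-force matching idea (our own illustration; the names
are arbitrary): slide the pattern over the text and compare character by character.

# Brute-force string matching: return the first index of pattern in text, or -1.
def brute_force_match(text, pattern):
    n, m = len(pattern), len(text)
    for start in range(m - n + 1):            # each possible alignment of the pattern
        if text[start:start + n] == pattern:
            return start                      # match found at this shift
    return -1                                 # no alignment matched

print(brute_force_match("abracadabra", "cad"))  # prints 4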
UNIT - II
Greedy Algorithms
Structure of a Greedy Algorithm
Greedy algorithms take all of the data in a particular problem, and then set a rule for
which elements to add to the solution at each step of the algorithm. For instance, if the
set of data is all of the numbers in a layered graph and the rule is to select the
largest number available at each level of the graph, the solution that the algorithm
builds is the sum of all of those choices.
If both of the properties below are true, a greedy algorithm can be used to solve the
problem.
Greedy choice property: A global (overall) optimal solution can be reached by
choosing the optimal choice at each step.
Optimal substructure: A problem has an optimal substructure if an optimal solution
to the entire problem contains the optimal solutions to the sub-problems.
In other words, greedy algorithms work on problems for which it is true that, at every
step, there is a choice that is optimal for the problem up to that step, and after the last
step, the algorithm produces the optimal solution of the complete problem.
To make a greedy algorithm, identify an optimal substructure or subproblem in the
problem. Then, determine what the solution will include (for example, the largest
sum, the shortest path, etc.). Create some sort of iterative way to go through all of the
subproblems and build a solution.
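As a concrete illustration (our own example, not from the text above), greedy coin change
with the denominations 25, 10, 5, 1 makes the locally optimal choice at each step, namely
the largest coin that still fits, and for this particular coin system that choice is also
globally optimal:

# Greedy coin change: at each step take the largest coin that still fits.
def greedy_change(amount, coins=(25, 10, 5, 1)):
    result = []
    for coin in coins:                  # coins assumed sorted in decreasing order
        while amount >= coin:           # greedy choice: take this coin again
            result.append(coin)
            amount -= coin
    return result

print(greedy_change(67))  # [25, 25, 10, 5, 1, 1]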
Dynamic Programming
Dynamic Programming (DP) is an algorithmic technique for solving an optimization
problem by breaking it down into simpler subproblems and utilizing the fact that the
optimal solution to the overall problem depends upon the optimal solution to its
subproblems.
Let’s take the example of the Fibonacci numbers. As we all know, Fibonacci
numbers are a series of numbers in which each number is the sum of the two
preceding numbers. The first few Fibonacci numbers are 0, 1, 1, 2, 3, 5, and 8, and
they continue on from there.
If we are asked to calculate the nth Fibonacci number, we can do that with the
following equation,
Fib(n) = Fib(n-1) + Fib(n-2), for n > 1
As we can clearly see here, to solve the overall problem (i.e. Fib(n)), we broke it
down into two smaller subproblems (which are Fib(n-1) and Fib(n-2)). This shows
that we can use DP to solve this problem.
Characteristics of Dynamic Programming
Before moving on to understand different methods of solving a DP problem, let’s first
take a look at what are the characteristics of a problem that tells us that we can apply
DP to solve it.
1. Overlapping Subproblems
Subproblems are smaller versions of the original problem. Any problem has
overlapping sub-problems if finding its solution involves solving the same
subproblem multiple times. Take the example of the Fibonacci numbers; to find
the fib(4), we need to break it down into the following sub-problems:
[Recursion tree for calculating Fibonacci numbers: fib(4) calls fib(3) and fib(2); fib(3)
calls fib(2) and fib(1); each fib(2) calls fib(1) and fib(0).]
We can clearly see the overlapping subproblem pattern here, as fib(2) has been
evaluated twice and fib(1) has been evaluated three times.
2. Optimal Substructure Property
Any problem has optimal substructure property if its overall optimal solution can be
constructed from the optimal solutions of its subproblems. For Fibonacci numbers, as
we know,
Fib(n) = Fib(n-1) + Fib(n-2)
This clearly shows that a problem of size ‘n’ has been reduced to subproblems of size
‘n-1’ and ‘n-2’. Therefore, Fibonacci numbers have optimal substructure property.
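A minimal Python sketch of the DP idea for Fibonacci: memoize each subproblem so that
fib(2), fib(1), etc. are evaluated only once (the lru_cache decorator is one standard way
to do this):

from functools import lru_cache

@lru_cache(maxsize=None)            # cache every subproblem result
def fib(n):
    if n <= 1:
        return n                    # base cases: Fib(0) = 0, Fib(1) = 1
    return fib(n - 1) + fib(n - 2)  # Fib(n) = Fib(n-1) + Fib(n-2)

print([fib(i) for i in range(8)])   # [0, 1, 1, 2, 3, 5, 8, 13]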
Backtracking
Backtracking is an algorithmic technique for solving problems recursively by trying
to build a solution incrementally, one piece at a time, removing those solutions that
fail to satisfy the constraints of the problem at any point in time (here, "time" refers
to the time elapsed until any level of the search tree is reached).
According to the Wikipedia definition,
Backtracking can be defined as a general algorithmic technique that considers
searching every possible combination in order to solve a computational problem.
There are three types of problems in backtracking –
1. Decision Problem – In this, we search for a feasible solution.
2. Optimization Problem – In this, we search for the best solution.
3. Enumeration Problem – In this, we find all feasible solutions.
Consider a situation that you have three boxes in front of you and only one of them has a
gold coin in it but you do not know which one. So, in order to get the coin, you will have to
open all of the boxes one by one. You will first check the first box, if it does not contain the
coin, you will have to close it and check the second box and so on until you find the coin.
This is what backtracking is, that is solving all sub-problems one by one in order to reach
the best possible solution.
Consider the below example to understand the Backtracking approach more formally.
Given an instance of a computational problem and data D corresponding to the instance, let
C represent all the constraints that need to be satisfied for solving the problem.
A backtracking algorithm will then work as follows:
The algorithm begins to build up a solution, starting with an empty solution set S = {}.
1. Add to S the first move that is still left (all possible moves are added to S one by one).
This now creates a new sub-tree s in the search tree of the algorithm.
2. Check if S + s satisfies each of the constraints in C.
If yes, then the sub-tree s is "eligible" to add more "children".
Else, the entire sub-tree s is useless, so recur back to step 1 using argument S.
3. In the event of "eligibility" of the newly formed sub-tree s, recur back to step 1, using
argument S + s.
4. If the check for S + s returns that it is a solution for the entire data D, output it and
terminate the program.
If not, then return that no solution is possible with the current s and hence discard it.
Difference between Recursion and Backtracking:
In recursion, the function calls itself until it reaches a base case. In backtracking, we use
recursion to explore all the possibilities until we get the best result for the problem.
Pseudo Code for Backtracking (N-Queens):
Consider the 4-Queens problem: place 4 queens on a 4×4 chessboard so that no two queens
attack each other. The expected output is a binary matrix which has 1s for the blocks where
queens are placed. For example, the following is the output matrix for one 4-Queens solution.
{ 0, 1, 0, 0}
{ 0, 0, 0, 1}
{ 1, 0, 0, 0}
{ 0, 0, 1, 0}
Backtracking Algorithm: The idea is to place queens one by one in different columns,
starting from the leftmost column. When we place a queen in a column, we check for
clashes with already placed queens. In the current column, if we find a row for which there
is no clash, we mark this row and column as part of the solution. If we do not find such a
row due to clashes then we backtrack and return false.
1) Start in the leftmost column
2) If all queens are placed
return true
3) Try all rows in the current column. Do the following for every tried row.
a) If the queen can be placed safely in this row then mark this [row,
column] as part of the solution and recursively check if placing
the queen here leads to a solution.
b) If placing the queen in [row, column] leads to a solution then return
true.
c) If placing the queen doesn't lead to a solution then unmark this [row,
column] (backtrack) and go to step (a) to try other rows.
4) If all rows have been tried and nothing worked, return false to trigger
backtracking.
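A compact Python sketch of this backtracking scheme for the 4-Queens instance above (our
own illustration; it places one queen per column and records the chosen row for each):

# Backtracking N-Queens: place one queen per column, trying rows in order.
def solve_queens(n, cols=()):
    col = len(cols)                     # next column to fill
    if col == n:
        return cols                     # all queens placed: a solution
    for row in range(n):
        # Safe if no earlier queen shares this row or a diagonal.
        if all(row != r and abs(row - r) != col - c
               for c, r in enumerate(cols)):
            solution = solve_queens(n, cols + (row,))
            if solution is not None:
                return solution         # propagate the first solution found
            # otherwise: implicit backtrack, try the next row
    return None                         # no row worked: trigger backtracking

print(solve_queens(4))                  # (1, 3, 0, 2), matching the matrix above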
Knapsack Problem
Given a set of items, each with a weight and a value, determine a subset of items to include
in a collection so that the total weight is less than or equal to a given limit and the total value
is as large as possible.
The knapsack problem is a combinatorial optimization problem. It appears as a subproblem
in many, more complex mathematical models of real-world problems. One general approach
to difficult problems is to identify the most restrictive constraint, ignore the others, solve a
knapsack problem, and somehow adjust the solution to satisfy the ignored constraints.
Applications
In many cases of resource allocation along with some constraint, the problem can be derived
in a similar way to the Knapsack problem. Following is a set of examples.
Finding the least wasteful way to cut raw materials
Portfolio optimization
Cutting stock problems
Problem Scenario
A thief is robbing a store and can carry a maximal weight of W into his knapsack. There are
n items available in the store and weight of ith item is wi and its profit is pi. What items
should the thief take?
In this context, the items should be selected in such a way that the thief will carry those
items for which he will gain maximum profit. Hence, the objective of the thief is to
maximize the profit.
Based on the nature of the items, Knapsack problems are categorized as
Fractional Knapsack
0-1 Knapsack
In 0-1 Knapsack, items cannot be broken, which means the thief should take an item as a
whole or leave it. This is the reason behind calling it 0-1 Knapsack.
Hence, in case of 0-1 Knapsack, the value of x_i can be either 0 or 1, where the other
constraints remain the same.
The 0-1 Knapsack problem cannot be solved reliably by the Greedy approach: in some
instances the Greedy approach may happen to give an optimal solution, but it does not
ensure one. The following examples will establish our statement.
Example-1
Let us consider that the capacity of the knapsack is W = 25 and the items are as shown in the
following table.
Item A B C D
Profit 24 18 18 10
Weight 24 10 10 7
Without considering the profit per unit weight (pi/wi), if we apply Greedy approach to solve
this problem, first item A will be selected as it will contribute maximum profit among all the
elements.
After selecting item A, no more item will be selected. Hence, for this given set of items total
profit is 24. Whereas, the optimal solution can be achieved by selecting items, B and C,
where the total profit is 18 + 18 = 36.
Example-2
Instead of selecting the items based on the overall benefit, in this example the items are
selected based on ratio pi/wi. Let us consider that the capacity of the knapsack is W = 60 and
the items are as shown in the following table.
Item A B C
Profit 100 280 120
Weight 10 40 20
Ratio 10 7 6
Using the Greedy approach, first item A is selected. Then, the next item B is chosen. Hence,
the total profit is 100 + 280 = 380. However, the optimal solution of this instance can be
achieved by selecting items, B and C, where the total profit is 280 + 120 = 400.
Hence, it can be concluded that Greedy approach may not give an optimal solution.
To solve 0-1 Knapsack, Dynamic Programming approach is required.
Dynamic-Programming Approach
Let i be the highest-numbered item in an optimal solution S for knapsack capacity W. Then
S' = S - {i} is an optimal solution for capacity W - w_i, and the value of the solution S is
v_i plus the value of the sub-problem.
We can express this fact in the following formula: define c[i, w] to be the solution for
items 1, 2, …, i and the maximum weight w. Then
c[i, w] = 0 if i = 0 or w = 0
c[i, w] = c[i-1, w] if w_i > w
c[i, w] = max(v_i + c[i-1, w-w_i], c[i-1, w]) if i > 0 and w_i ≤ w
The algorithm takes the following inputs
The maximum weight W
The number of items n
The two sequences v = <v1, v2, …, vn> and w = <w1, w2, …, wn>
Dynamic-0-1-knapsack (v, w, n, W)
for w = 0 to W do
    c[0, w] = 0
for i = 1 to n do
    c[i, 0] = 0
    for w = 1 to W do
        if w[i] ≤ w then
            if v[i] + c[i-1, w-w[i]] > c[i-1, w] then
                c[i, w] = v[i] + c[i-1, w-w[i]]
            else c[i, w] = c[i-1, w]
        else
            c[i, w] = c[i-1, w]
The set of items to take can be deduced from the table, starting at c[n, w] and tracing
backwards where the optimal values came from.
If c[i, w] = c[i-1, w], then item i is not part of the solution, and we continue tracing with
c[i-1, w]. Otherwise, item i is part of the solution, and we continue tracing with c[i-1, w-w[i]].
Analysis
This algorithm takes θ(nW) time, as table c has (n + 1)·(W + 1) entries, where each entry
requires θ(1) time to compute.
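A runnable Python version of the table-filling procedure above, shown on the data of
Example-1 (W = 25); it recovers the optimal profit 36 that the greedy approach missed:

# Bottom-up 0-1 knapsack: c[i][w] = best profit using items 1..i with capacity w.
def knapsack(values, weights, W):
    n = len(values)
    c = [[0] * (W + 1) for _ in range(n + 1)]   # row 0 and column 0 stay 0
    for i in range(1, n + 1):
        for w in range(1, W + 1):
            best_without = c[i - 1][w]          # skip item i
            best_with = 0
            if weights[i - 1] <= w:             # item i fits: try taking it
                best_with = values[i - 1] + c[i - 1][w - weights[i - 1]]
            c[i][w] = max(best_without, best_with)
    return c[n][W]

# Example-1 data: profits and weights of items A, B, C, D; capacity W = 25.
print(knapsack([24, 18, 18, 10], [24, 10, 10, 7], 25))   # prints 36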
Fractional Knapsack
In this case, items can be broken into smaller pieces, hence the thief can select fractions of
items.
According to the problem statement,
There are n items in the store
Weight of the ith item is w_i > 0
Profit for the ith item is p_i > 0 and
Capacity of the Knapsack is W
In this version of the Knapsack problem, items can be broken into smaller pieces. So, the thief
may take only a fraction x_i of the ith item, where
0 ≤ x_i ≤ 1
The ith item contributes the weight x_i·w_i to the total weight in the knapsack and
profit x_i·p_i to the total profit.
Hence, the objective of this algorithm is to
maximize Σ (x_i·p_i), summed over i = 1 to n,
subject to the constraint
Σ (x_i·w_i) ≤ W
It is clear that an optimal solution must fill the knapsack exactly, for otherwise we could add a
fraction of one of the remaining items and increase the overall profit.
Thus, an optimal solution can be obtained by
Σ (x_i·w_i) = W
In this context, first we need to sort the items according to the value of p_i/w_i, so
that p_(i+1)/w_(i+1) ≤ p_i/w_i. Here, x is an array to store the fraction of items.
Algorithm: Greedy-Fractional-Knapsack (w[1..n], p[1..n], W)
// items assumed already sorted by decreasing p[i]/w[i]
for i = 1 to n
    do x[i] = 0
weight = 0
for i = 1 to n
    if weight + w[i] ≤ W then
        x[i] = 1
        weight = weight + w[i]
    else
        x[i] = (W - weight) / w[i]
        weight = W
        break
return x
Analysis
If the provided items are already sorted into a decreasing order of piwipiwi, then the
whileloop takes a time in O(n); Therefore, the total time including the sort is in O(n logn).
Example
Let us consider that the capacity of the knapsack W = 60 and the list of provided items are
shown in the following table −
Item A B C D
Weight 40 10 20 24
Ratio (p_i/w_i) 7 10 6 5
As the provided items are not sorted based on p_i/w_i, we sort them first. After sorting, the
items are as shown in the following table.
Item B A C D
Weight 10 40 20 24
Ratio (p_i/w_i) 10 7 6 5
Solution
After sorting all the items according to p_i/w_i, first all of B is chosen, as the weight of B is
less than the capacity of the knapsack. Next, item A is chosen, as the available capacity of
the knapsack is greater than the weight of A. Now, C is chosen as the next item. However,
the whole item cannot be chosen as the remaining capacity of the knapsack is less than the
weight of C.
Hence, fraction of C (i.e. (60 − 50)/20) is chosen.
Now, the capacity of the knapsack is exactly consumed by the selected items. Hence, no more
items can be selected.
The total weight of the selected items is 10 + 40 + 20·(10/20) = 60.
And the total profit is 100 + 280 + 120·(10/20) = 380 + 60 = 440.
This is the optimal solution. We cannot gain more profit selecting any different combination
of items.
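A Python sketch of Greedy-Fractional-Knapsack run on this example (profits reconstructed
from the ratio table as ratio × weight: A = 280, B = 100, C = 120, D = 120); it reproduces
the total profit of 440:

# Greedy fractional knapsack: sort by profit/weight, then fill greedily.
def fractional_knapsack(profits, weights, W):
    items = sorted(zip(profits, weights),
                   key=lambda pw: pw[0] / pw[1], reverse=True)
    total, capacity = 0.0, W
    for p, w in items:
        if w <= capacity:              # whole item fits
            total += p
            capacity -= w
        else:                          # take only the fraction that fits, then stop
            total += p * capacity / w
            break
    return total

# Items A, B, C, D with profits 280, 100, 120, 120 and weights 40, 10, 20, 24.
print(fractional_knapsack([280, 100, 120, 120], [40, 10, 20, 24], 60))  # 440.0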
Travelling Salesman Problem (Dynamic Programming):
Given a set of cities and the distance d(i, j) between every pair, the problem is to find the
shortest tour that starts at city 1, visits every city exactly once, and returns to city 1. For a
subset of cities S containing 1, let C(S, j) denote the length of the shortest path that visits
every city in S exactly once, starting at 1 and ending at j. Then
C(S, j) = min {C(S − {j}, i) + d(i, j)}, where i ∈ S and i ≠ j.
Algorithm: Traveling-Salesman-Problem
C({1}, 1) = 0
for s = 2 to n do
    for all subsets S ⊆ {1, 2, 3, …, n} of size s and containing 1
        C(S, 1) = ∞
        for all j ∈ S and j ≠ 1
            C(S, j) = min {C(S − {j}, i) + d(i, j) for i ∈ S and i ≠ j}
return min over j of C({1, 2, 3, …, n}, j) + d(j, 1)
Analysis
There are at most 2^n · n subproblems, and each one takes linear time to solve.
Therefore, the total running time is O(2^n · n^2).
Example
In the following example, we will illustrate the steps to solve the travelling salesman
problem.
From the given graph, the following distance table d(i, j) is prepared.
     1   2   3   4
1    0  10  15  20
2    5   0   9  10
3    6  13   0  12
4    8   8   9   0
S = Φ
Cost(2, Φ, 1) = d(2, 1) = 5
Cost(3, Φ, 1) = d(3, 1) = 6
Cost(4, Φ, 1) = d(4, 1) = 8
|S| = 1
Cost(i, S, 1) = min {Cost(j, S − {j}, 1) + d[i, j]}
Cost(2, {3}, 1) = d[2, 3] + Cost(3, Φ, 1) = 9 + 6 = 15
Cost(2, {4}, 1) = d[2, 4] + Cost(4, Φ, 1) = 10 + 8 = 18
Cost(3, {2}, 1) = d[3, 2] + Cost(2, Φ, 1) = 13 + 5 = 18
Cost(3, {4}, 1) = d[3, 4] + Cost(4, Φ, 1) = 12 + 8 = 20
Cost(4, {3}, 1) = d[4, 3] + Cost(3, Φ, 1) = 9 + 6 = 15
Cost(4, {2}, 1) = d[4, 2] + Cost(2, Φ, 1) = 8 + 5 = 13
|S| = 2
Cost(2, {3, 4}, 1) = min { d[2, 3] + Cost(3, {4}, 1) = 9 + 20 = 29,
                           d[2, 4] + Cost(4, {3}, 1) = 10 + 15 = 25 } = 25
Cost(3, {2, 4}, 1) = min { d[3, 2] + Cost(2, {4}, 1) = 13 + 18 = 31,
                           d[3, 4] + Cost(4, {2}, 1) = 12 + 13 = 25 } = 25
Cost(4, {2, 3}, 1) = min { d[4, 2] + Cost(2, {3}, 1) = 8 + 15 = 23,
                           d[4, 3] + Cost(3, {2}, 1) = 9 + 18 = 27 } = 23
|S| = 3
Cost(1, {2, 3, 4}, 1) = min { d[1, 2] + Cost(2, {3, 4}, 1) = 10 + 25 = 35,
                              d[1, 3] + Cost(3, {2, 4}, 1) = 15 + 25 = 40,
                              d[1, 4] + Cost(4, {2, 3}, 1) = 20 + 23 = 43 } = 35
Start from Cost(1, {2, 3, 4}, 1): the minimum value comes from d[1, 2]. When s = 3, select
the path from 1 to 2 (cost is 10), then go backwards. When s = 2, the minimum value comes
from d[2, 4]. Select the path from 2 to 4 (cost is 10), then go backwards.
When s = 1, the minimum value comes from d[4, 3]. Selecting the path from 4 to 3 (cost is 9),
we then go to the s = Φ step, where the value is d[3, 1] (cost is 6). The minimum-cost tour
is therefore 1 → 2 → 4 → 3 → 1 with total cost 10 + 10 + 9 + 6 = 35.
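A compact Python sketch of this dynamic-programming (Held-Karp) computation on the 4-city
distance table above; it reports the minimum tour cost 35 (cities renumbered 0-3, so city 1
becomes 0):

from itertools import combinations

# Held-Karp: C[(S, j)] = cheapest path from city 0 through all cities in S, ending at j.
def tsp(d):
    n = len(d)
    C = {(frozenset([j]), j): d[0][j] for j in range(1, n)}   # direct paths 0 -> j
    for size in range(2, n):
        for S in combinations(range(1, n), size):
            fs = frozenset(S)
            for j in S:
                C[(fs, j)] = min(C[(fs - {j}, i)] + d[i][j]
                                 for i in S if i != j)
    full = frozenset(range(1, n))
    return min(C[(full, j)] + d[j][0] for j in range(1, n))   # close the tour

d = [[0, 10, 15, 20],
     [5,  0,  9, 10],
     [6, 13,  0, 12],
     [8,  8,  9,  0]]
print(tsp(d))   # prints 35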
UNIT – III
Graph traversals
Graph traversal means visiting every vertex and edge exactly once in a well-defined order.
While using certain graph algorithms, you must ensure that each vertex of the graph is visited
exactly once. The order in which the vertices are visited is important and may depend upon
the algorithm or question that you are solving.
During a traversal, it is important that you track which vertices have been visited. The most
common way of tracking vertices is to mark them.
Breadth First Search (BFS)
There are many ways to traverse graphs. BFS is the most commonly used approach.
BFS is a traversing algorithm where you should start traversing from a selected node (source
or starting node) and traverse the graph layerwise thus exploring the neighbour nodes (nodes
which are directly connected to source node). You must then move towards the next-level
neighbour nodes.
As the name BFS suggests, you are required to traverse the graph breadthwise as follows:
1. First move horizontally and visit all the nodes of the current layer
2. Move to the next layer
Pseudo Code
BFS(G, s):
let Q be a queue
Q.enqueue(s)
mark s as visited
while (Q is not empty)
    //Removing that vertex from the queue, whose neighbours will be visited now
    v = Q.dequeue()
    //Processing all the neighbours of v
    for all neighbours w of v in Graph G
        if w is not visited
            Q.enqueue(w)
            mark w as visited
Algorithm
The traversing will start from the source node and push s in the queue. s will be marked as
'visited'.
First iteration
● s will be popped from the queue
● Neighbors of s i.e. 1 and 2 will be traversed
● 1 and 2, which have not been traversed earlier, are traversed. They will be:
○ Pushed in the queue
○ 1 and 2 will be marked as visited
Second iteration
● 1 is popped from the queue
● Neighbors of 1 i.e. s and 3 are traversed
● s is ignored because it is marked as 'visited'
● 3, which has not been traversed earlier, is traversed. It is:
○ Pushed in the queue
○ Marked as visited
Third iteration
● 2 is popped from the queue
● Neighbors of 2 i.e. s, 3, and 4 are traversed
● 3 and s are ignored because they are marked as 'visited'
● 4, which has not been traversed earlier, is traversed. It is:
○ Pushed in the queue
○ Marked as visited
Fourth iteration
● 3 is popped from the queue
● Neighbors of 3 i.e. 1, 2, and 5 are traversed
● 1 and 2 are ignored because they are marked as 'visited'
● 5, which has not been traversed earlier, is traversed. It is:
○ Pushed in the queue
○ Marked as visited
Fifth iteration
● 4 will be popped from the queue
● The neighbour of 4, i.e. 2, is traversed
● 2 is ignored because it is already marked as 'visited'
Sixth iteration
● 5 is popped from the queue
● The neighbour of 5, i.e. 3, is traversed
● 3 is ignored because it is already marked as 'visited'
The queue is empty and it comes out of the loop. All the nodes have been traversed by using
BFS.
If all the edges in a graph are of the same weight, then BFS can also be used to find the
minimum distance between the nodes in a graph.
[Figure: BFS traversing process]
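The same traversal in runnable Python (our own sketch; the adjacency lists below
reconstruct the worked example with source s and nodes 1-5):

from collections import deque

# BFS from source s; returns vertices in the order they are visited.
def bfs(graph, s):
    visited, order = {s}, []
    Q = deque([s])
    while Q:
        v = Q.popleft()                 # dequeue the next frontier vertex
        order.append(v)
        for w in graph[v]:              # explore v's neighbours layer by layer
            if w not in visited:
                visited.add(w)          # mark when enqueued, not when dequeued
                Q.append(w)
    return order

graph = {'s': [1, 2], 1: ['s', 3], 2: ['s', 3, 4], 3: [1, 2, 5], 4: [2], 5: [3]}
print(bfs(graph, 's'))   # ['s', 1, 2, 3, 4, 5]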
Depth First Search (DFS)
The DFS algorithm is a recursive algorithm that uses the idea of backtracking. It involves
exhaustive searches of all the nodes by going ahead, if possible, else by backtracking.
Here, the word backtrack means that when you are moving forward and there are no more
nodes along the current path, you move backwards on the same path to find nodes to traverse.
All the nodes on the current path will be visited until all the unvisited nodes have been
traversed, after which the next path will be selected.
This recursive nature of DFS can be implemented using stacks. The basic idea is as follows:
Pick a starting node and push all its adjacent nodes into a stack.
Pop a node from stack to select the next node to visit and push all its adjacent nodes into a
stack.
Repeat this process until the stack is empty. However, ensure that the nodes that are visited
are marked. This will prevent you from visiting the same node more than once. If you do not
mark the nodes that are visited and you visit the same node more than once, you may end up
in an infinite loop.
Pseudocode
DFS-recursive(G, s):
    mark s as visited
    for all neighbours w of s in Graph G:
        if w is not visited:
            DFS-recursive(G, w)
A graph is said to be disconnected if it is not connected, i.e. if two nodes exist in the graph
such that there is no edge in between those nodes. In an undirected graph, a connected
component is a set of vertices in a graph that are linked to each other by paths.
Consider the following example. Graph G is a disconnected graph and has the
following 3 connected components.
● First connected component is 1 -> 2 -> 3 as they are linked to each other
● Second connected component 4 -> 5
● Third connected component is vertex 6
In DFS, if we start from a start node it will mark all the nodes connected to the start node as
visited. Therefore, if we choose any node in a connected component and run DFS on that
node it will mark the whole connected component as visited.
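A short Python sketch (our own illustration) that lists the connected components by running
DFS from every still-unvisited vertex, using the 6-vertex example above:

# Connected components: run DFS from every vertex not yet visited.
def connected_components(graph):
    visited, components = set(), []
    def dfs(v, comp):
        visited.add(v)
        comp.append(v)
        for w in graph[v]:
            if w not in visited:
                dfs(w, comp)
    for v in graph:
        if v not in visited:            # a new component starts here
            comp = []
            dfs(v, comp)
            components.append(comp)
    return components

g = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
print(connected_components(g))   # [[1, 2, 3], [4, 5], [6]]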
SHORTEST PATH:
Shortest path problem is a problem of finding the shortest path(s) between vertices of
a given graph.
Shortest path between two vertices is a path that has the least cost as compared to all
other existing paths.
BELLMAN-FORD ALGORITHM:
The single source shortest path algorithm (for arbitrary edge weights, positive or negative),
also known as the Bellman-Ford algorithm, is used to find the minimum distance from the
source vertex to any other vertex. The main difference between this algorithm and Dijkstra's
algorithm is that Dijkstra's algorithm cannot handle negative weights, but here we can handle
them easily.
The Bellman-Ford algorithm finds the distances in a bottom-up manner. At first it finds those
distances which have only one edge in the path. After that, it increases the path length to
find all possible solutions.
For the single-source shortest path problem, you can use Dijkstra's algorithm. With
a normal binary heap, this gives you a time complexity of O((E + V) log V). With a
Fibonacci heap, this can be improved to O(E + V log V), which is faster for dense
graphs.
For the all-pairs shortest path problem, the Floyd-Warshall algorithm can be used. Its time
complexity is O(V^3), where V is the number of vertices in the graph. Input − The cost
matrix of the graph. Output − Matrix of all-pairs shortest paths.
MINIMUM SPANNING TREE:
A spanning tree of a connected graph is a subgraph that includes all the vertices and is a tree;
a minimum cost spanning tree (MST) of a weighted graph is a spanning tree whose total edge
weight is as small as possible. Even the simplest of graphs can contain many spanning trees.
PRIM'S ALGORITHM:
Prim's algorithm to find the minimum cost spanning tree, like Kruskal's algorithm, uses the
greedy approach. Prim's algorithm shares a similarity with the shortest path first algorithms.
Prim's algorithm, in contrast with Kruskal's algorithm, treats the nodes as a single tree and
keeps on adding new nodes to the spanning tree from the given graph.
To contrast with Kruskal's algorithm and to understand Prim's algorithm better, we shall use
the same example −
Remove all loops and parallel edges from the given graph. In case of parallel edges, keep the
one which has the least cost associated and remove all others.
After this step, S-7-A-3-C tree is formed. Now we'll again treat it as a node and will check
all the edges again. However, we will choose only the least cost edge. In this case, C-3-D is
the new edge, which is less than other edges' cost 8, 6, 4, etc.
After adding node D to the spanning tree, we now have two edges going out of it having the
same cost, i.e. D-2-T and D-2-B. Thus, we can add either one. But the next step will again
yield edge 2 as the least cost. Hence, we are showing a spanning tree with both edges
included.
We may find that the output spanning tree of the same graph produced by the two different
algorithms is the same.
KRUSKAL'S ALGORITHM:
Kruskal's algorithm to find the minimum cost spanning tree uses the greedy approach. This
algorithm treats the graph as a forest and every node it has as an individual tree. A tree
connects to another only if it has the least cost among all available options and does not
violate MST properties.
To understand Kruskal's algorithm let us consider the following example −
In case of parallel edges, keep the one which has the least cost associated and remove all
others.
The least cost is 2 and edges involved are B,D and D,T. We add them. Adding them does
not violate spanning tree properties, so we continue to our next edge selection.
Next cost is 3, and associated edges are A,C and C,D. We add them again −
Next cost in the table is 4, and we observe that adding it will create a circuit in the graph. −
We ignore it. In the process we shall ignore/avoid all edges that create a circuit.
We observe that edges with cost 5 and 6 also create circuits. We ignore them and move on.
Now we are left with only one node to be added. Between the two least cost edges available
7 and 8, we shall add the edge with cost 7.
By adding edge S,A we have included all the nodes of the graph and we now have minimum
cost spanning tree.
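A Python sketch of Kruskal's algorithm with union-find; since the original figure is missing,
the edges and weights below are a hypothetical example rather than the graph from the
walk-through above:

# Kruskal's MST: sort edges by weight, add an edge unless it forms a cycle.
def kruskal(n, edges):
    parent = list(range(n))
    def find(x):                        # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    for w, u, v in sorted(edges):       # consider edges in increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                    # different trees: no cycle is created
            parent[ru] = rv
            mst.append((u, v, w))
    return mst

# Hypothetical weighted graph on vertices 0..4: (weight, u, v) triples.
edges = [(2, 0, 1), (3, 0, 2), (1, 1, 2), (4, 1, 3), (5, 2, 3), (2, 3, 4)]
print(kruskal(5, edges))   # [(1, 2, 1), (0, 1, 2), (3, 4, 2), (1, 3, 4)]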
DIFFERENCE BETWEEN PRIM'S AND KRUSKAL'S ALGORITHM:
Prim's algorithm grows a single tree, at each step adding the cheapest edge that connects the
tree to a new vertex. Kruskal's algorithm maintains a forest, at each step adding the globally
cheapest edge that does not create a cycle. With a binary heap, Prim's algorithm runs in
O(E log V) time; Kruskal's algorithm runs in O(E log E) time, dominated by sorting the edges.
TOPOLOGICAL SORT:
Topological sorting for a Directed Acyclic Graph (DAG) is a linear ordering of vertices such
that for every directed edge u → v, vertex u comes before v in the ordering. Topological
sorting for a graph is not possible if the graph is not a DAG.
For example, a topological sorting of the following graph is “5 4 2 3 1 0”. There can be
more than one topological sorting for a graph. For example, another topological sorting of
the following graph is “4 5 2 3 1 0”. The first vertex in topological sorting is always a
vertex with in-degree as 0 (a vertex with no incoming edges).
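One standard way to compute a topological order is Kahn's algorithm, which repeatedly
removes a vertex of in-degree 0. The sketch below uses the edge set behind the orderings
quoted above (5→2, 5→0, 4→0, 4→1, 2→3, 3→1); this edge set is our reconstruction, since
the figure itself is missing:

from collections import deque

# Kahn's algorithm: repeatedly output a vertex whose in-degree has dropped to 0.
def topological_sort(graph):
    indegree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indegree[w] += 1
    Q = deque(v for v in graph if indegree[v] == 0)
    order = []
    while Q:
        v = Q.popleft()
        order.append(v)
        for w in graph[v]:              # removing v lowers its successors' in-degree
            indegree[w] -= 1
            if indegree[w] == 0:
                Q.append(w)
    return order                        # a valid linear order if the graph is a DAG

g = {5: [2, 0], 4: [0, 1], 2: [3], 3: [1], 0: [], 1: []}
print(topological_sort(g))   # [5, 4, 2, 0, 3, 1] -- one valid topological order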
MAXIMUM FLOW:
A flow network G = (V, E) is a directed graph in which each edge (u, v) has a nonnegative
capacity c(u, v), together with a distinguished source vertex s and sink vertex t. The quantity
f(u, v), which can be positive or negative, is known as the net flow from vertex u to vertex v.
In the maximum-flow problem, we are given a flow network G with source s and sink t, and
we wish to find a flow of maximum value from s to t.
The three properties can be described as follows:
1. Capacity Constraint makes sure that the flow through each edge is not greater than
the capacity.
2. Skew Symmetry means that the flow from u to v is the negative of the flow from v to
u.
3. The flow-conservation property says that the total net flow out of a vertex other than
the source or sink is 0. In other words, the amount of flow into a vertex v is the same as the
amount of flow out of v for every vertex v ∈ V - {s, t}.
The value of the flow is the net flow from the source: |f| = Σ f(s, v), summed over all v ∈ V.
The positive net flow leaving a vertex is described symmetrically. One interpretation of the
Flow-Conservation Property is that the positive net flow entering a vertex other than the
source or sink must equal the positive net flow leaving the vertex.
A flow f is said to be integer-valued if f(u, v) is an integer for all (u, v) ∈ E. Clearly, the
value of the flow is an integer if f is an integer-valued flow.
UNIT – IV
Course objective:
After completing this Unit, you will be able to:
Classify problems as tractable or intractable
Define decision problems
Define the class P
Define nondeterministic algorithms
Define the class NP
Define polynomial transformations
Define the class of NP-Complete
Intractability:
Dictionary Definition of intractable:
“difficult to treat or work.”
In computer science, a problem is intractable if a computer has difficulty solving it.
A problem is intractable if it is not tractable
Any algorithm with a growth rate not bounded by a polynomial: c^n, c^(0.01n), n^(log n), n!, etc.
Intractability is a property of the problem, not of the algorithm.
Some problems are intractable because their very definition demands an unrealistically large
output (e.g., list all permutations of n numbers); the Towers of Hanoi is another example.
Undecidable problems: the Halting Problem (proven undecidable by Alan Turing).
Decidable intractable problems: researchers have shown some problems from automata theory
and mathematical logic to be intractable.
Tractable
A problem is tractable if there exists a polynomial-bound algorithm that solves it, i.e., its
worst-case growth rate can be bounded by a polynomial function of its input size.
p(n) = a_k·n^k + … + a_1·n + a_0, where k is a constant
p(n) is θ(n^k)
n lg n is not a polynomial,
but n lg n < n^2, so it is bounded by a polynomial
Decision problem:
Problem where the output is a simple “yes” or “no”
Theory of NP-completeness is developed by restricting problems to decision problems
Optimization problems can be transformed into decision problems
Optimization problems are at least as hard as the associated decision problem
If a polynomial-time algorithm for the optimization problem is found, we would have a
polynomial-time algorithm for the corresponding decision problem
Traveling Salesperson - For a given positive number d, is there a tour having length <= d?
0-1 Knapsack - For a given profit P, is it possible to load the knapsack such that the total
profit is at least P and the total weight <= W?
Class P:
The set of all decision problems that can be solved by polynomial-time algorithms
Decision versions of searching, shortest path, spanning tree, etc. belong to P
Do problems such as Traveling Salesperson and 0-1 Knapsack (for which no polynomial-time
algorithm has been found), etc., belong to P?