DAA Notes
A Course Material on
By
Mr. S.ANBARASU
ASSISTANT PROFESSOR
QUALITY CERTIFICATE
This is to certify that the course material being prepared by me meets the knowledge requirement of the university curriculum.
Name: S.Anbarasu
Designation: AP
This is to certify that the course material being prepared by Mr. S. Anbarasu is of adequate quality.
He has referred to more than five books, at least one of which is by a foreign author.
Signature of HD
SEAL
1.1 Introduction
An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining
a required output for any legitimate input in a finite amount of time.
Definition
“Algorithmics is more than a branch of computer science. It is the core of computer
science, and, in all fairness, can be said to be relevant to most of science, business and
technology”
Understanding of the Algorithm
Figure: Algorithm design and analysis process — understand the problem; decide on the computational means, exact vs. approximate solving, the data structure(s) and the algorithm design technique(s); design an algorithm; prove its correctness.
♦ There are situations, of course, where the choice of a parameter indicating an input size
does matter.
♦ Example - computing the product of two n-by-n matrices.
♦ There are two natural measures of size for this problem.
♦ The matrix order n.
♦ The total number of elements N in the matrices being multiplied.
♦ Since there is a simple formula relating these two measures, we can easily switch from
one to the other, but the answer about an algorithm's efficiency will be qualitatively
different depending on which of the two measures we use.
♦ The choice of an appropriate size metric can be influenced by operations of the
algorithm in question. For example, how should we measure an input's size for a spell-
checking algorithm? If the algorithm examines individual characters of its input, then
we should measure the size by the number of characters; if it works by processing
words, we should count their number in the input.
♦ We should make a special note about measuring size of inputs for algorithms involving
properties of numbers (e.g., checking whether a given integer n is prime).
♦ For such algorithms, computer scientists prefer measuring size by the number b of bits
in n's binary representation:
b = ⌊log₂n⌋ + 1
♦ This metric usually gives a better idea about efficiency of algorithms in question.
♦ The worst-case efficiency of an algorithm is its efficiency for the worst-case input of
size n, which is an input (or inputs) of size n for which the algorithm runs the longest
among all possible inputs of that size.
♦ In the worst case, when there are no matching elements or the first matching element
happens to be the last one on the list, the algorithm makes the largest number of key
comparisons among all possible inputs of size n:
Cworst (n) = n.
♦ The way to determine the worst-case efficiency is quite straightforward:
♦ Analyze the algorithm to see what kind of inputs yield the largest value of the basic
operation's count C(n) among all possible inputs of size n, and then compute this worst-case
value Cworst(n).
♦ The worst-case analysis provides very important information about an algorithm's
efficiency by bounding its running time from above. In other words, it guarantees that
for any instance of size n, the running time will not exceed Cworst(n), its running time
on the worst-case inputs.
♦ The best-case efficiency of an algorithm is its efficiency for the best-case input of size
n, which is an input (or inputs) of size n for which the algorithm runs the fastest among
all possible inputs of that size.
♦ We can analyze the best case efficiency as follows.
♦ First, determine the kind of inputs for which the count C (n) will be the smallest among
all possible inputs of size n. (Note that the best case does not mean the smallest input; it
means the input of size n for which the algorithm runs the fastest.)
♦ Then ascertain the value of C (n) on these most convenient inputs.
♦ Example- for sequential search, best-case inputs will be lists of size n with their first
elements equal to a search key; accordingly, Cbest(n) = 1.
♦ The analysis of the best-case efficiency is not nearly as important as that of the worst-
case efficiency.
♦ But it is not completely useless. For example, there is a sorting algorithm (insertion
sort) for which the best-case inputs are already sorted arrays on which the algorithm
works very fast.
♦ Thus, such an algorithm might well be the method of choice for applications dealing
with almost sorted arrays. And, of course, if the best-case efficiency of an algorithm is
unsatisfactory, we can immediately discard it without further analysis.
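To make the best-case and worst-case counts above concrete, here is a minimal C sketch of sequential search (the function name and the use of -1 for an unsuccessful search are illustrative choices, not taken from the text):

/* Sequential search: compare the search key with each list element in turn. */
int sequentialSearch(int a[], int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i] == key)    /* best case: the first element matches, Cbest(n) = 1 */
            return i;
    return -1;              /* worst case: no match or a match in the last position, Cworst(n) = n */
}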
Average case efficiency
Cavg(n) = [1 · (p/n) + 2 · (p/n) + . . . + i · (p/n) + . . . + n · (p/n)] + n · (1 − p)
        = (p/n) [1 + 2 + 3 + . . . + i + . . . + n] + n(1 − p)
        = (p/n) · n(n + 1)/2 + n(1 − p)
        = p(n + 1)/2 + n(1 − p)
♦ Example, if p = 1 (i.e., the search must be successful), the average number of key
comparisons made by sequential search is (n + 1) /2.
♦ If p = 0 (i.e., the search must be unsuccessful), the average number of key comparisons
will be n because the algorithm will inspect all n elements on all such inputs.
The step count is used to compare the time complexity of two programs that compute the same function and
also to predict the growth in run time as the instance characteristics change. Determining
the exact step count is difficult and also unnecessary, because the values are not exact
quantities. We need only comparative statements like c1n² ≤ tp(n) ≤ c2n².
For example, consider two programs with complexities c1n² + c2n and c3n respectively. For
small values of n, the complexity depends upon the values of c1, c2 and c3. But there will also be an
n beyond which the complexity of c3n is better than that of c1n² + c2n. This value of n is called the
break-even point. If this point is zero, c3n is always faster (or at least as fast). Common
asymptotic functions are given below.
Function    Name
1           Constant
log n       Logarithmic
n           Linear
n log n     n log n
n²          Quadratic
n³          Cubic
2ⁿ          Exponential
n!          Factorial
Big 'Oh' Notation (O)
O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥
n0 }
It is the upper bound of any function. Hence it denotes the worst-case complexity of any
algorithm. We can represent it graphically as
Fig 1.1
Linear Functions
Example 1.6
f(n) = 3n + 2
When n ≥ 2, 3n + 2 ≤ 3n + n = 4n
Hence f(n) = O(n), here c = 4 and n0 = 2
When n ≥ 1, 3n + 2 ≤ 3n + 2n = 5n
Hence f(n) = O(n), here c = 5 and n0 = 1
Hence we can have different (c, n0) pairs satisfying the definition for a given function.
Example
f(n) = 3n + 3
When n ≥ 3, 3n + 3 ≤ 3n + n = 4n
Hence f(n) = O(n), here c = 4 and n0 = 3
Example
f(n) = 100n + 6
When n ≥ 6, 100n + 6 ≤ 100n + n = 101n
Hence f(n) = O(n), here c = 101 and n0 = 6
Quadratic Functions
Example 1.9
f(n) = 10n² + 4n + 2
When n ≥ 2, 10n² + 4n + 2 ≤ 10n² + 5n
When n ≥ 5, 5n ≤ n², so 10n² + 4n + 2 ≤ 10n² + n² = 11n²
Hence f(n) = O(n²), here c = 11 and n0 = 5
Example 1.10
f(n) = 1000n² + 100n − 6
f(n) ≤ 1000n² + 100n for all values of n.
When n ≥ 100, 100n ≤ n², so f(n) ≤ 1000n² + n² = 1001n²
Hence f(n) = O(n²), here c = 1001 and n0 = 100
Exponential Functions
Example 1.11
f(n) = 6*2ⁿ + n²
When n ≥ 4, n² ≤ 2ⁿ
So f(n) ≤ 6*2ⁿ + 2ⁿ = 7*2ⁿ
Hence f(n) = O(2ⁿ), here c = 7 and n0 = 4
Constant Functions
Example 1.12
f(n) = 10
f(n) = O(1), because f(n) ≤ 10*1
Omega Notation (Ω)
Ω (g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f(n) for all n
≥ n0 }
It is the lower bound of any function. Hence it denotes the best case complexity of any
algorithm. We can represent it graphically as
Fig 1.2
Example 1.13
f(n) = 3n + 2
3n + 2 > 3n for all n.
Hence f(n) = Ω(n)
Similarly we can solve all the examples specified under Big 'Oh'.
Theta Notation (Θ)
Θ(g(n)) = { f(n) : there exist positive constants c1, c2 and n0 such that c1g(n) ≤ f(n) ≤ c2g(n) for
all n ≥ n0 }
If f(n) = Θ(g(n)), then for all values of n to the right of n0, f(n) lies on or above c1g(n) and on or below
c2g(n). Hence it is an asymptotically tight bound for f(n).
Fig 1.3
Little-O Notation
For non-negative functions, f(n) and g(n), f(n) is little o of g(n) if and only if f(n) = O(g(n)),
but f(n) ≠ Θ(g(n)). This is denoted as "f(n) = o(g(n))".
This represents a loose bounding version of Big O. g(n) bounds from the top, but it does
not bound the bottom.
Little-Omega Notation (ω)
Much like Little-o, this is the loose-bound equivalent for Big Omega: g(n) is a loose lower bound of
the function f(n); it bounds from the bottom, but not from the top.
Regardless of how big or small the array is, every time we run find-min, we have to
initialize the i and j integer variables and return j at the end. Therefore, we can just think of
those parts of the function as constant and ignore them.
So, how can we use asymptotic notation to discuss the find-min function? If we search
through an array with 87 elements, then the for loop iterates 87 times, even if the very first
element we hit turns out to be the minimum. Likewise, for n elements, the for loop iterates
n times. Therefore we say the function runs in time O(n).
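The find-min and find-min-plus-max functions referred to in this passage are not listed in full; the following is a minimal C sketch under the assumption that find-min returns the index j of the smallest element and find-min-plus-max returns the sum of the smallest and largest values using one loop for each (names are illustrative):

int find_min(int a[], int n) {
    int j = 0;                          /* constant-time initialization */
    for (int i = 1; i < n; i++)         /* the loop runs through the whole array */
        if (a[i] < a[j])
            j = i;
    return j;                           /* constant-time return */
}

int find_min_plus_max(int a[], int n) {
    int minim = a[0], maxim = a[0];
    for (int i = 1; i < n; i++)         /* first loop: find the minimum */
        if (a[i] < minim) minim = a[i];
    for (int i = 1; i < n; i++)         /* second loop: find the maximum */
        if (a[i] > maxim) maxim = a[i];
    return minim + maxim;               /* two O(n) loops: O(2n) = O(n) overall */
}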
What's the running time for find-min-plus-max? There are two for loops, that each
iterate n times, so the running time is clearly O(2n). Because 2 is a constant, we throw it
away and write the running time as O(n). Why can you do this? If you recall the definition
of Big-O notation, the function whose bound you're testing can be multiplied by some
constant. If f(x)=2x, we can see that if g(x) = x, then the Big-O condition holds. Thus O(2n)
= O(n). This rule is general for the various asymptotic notations.
Recurrence
When an algorithm contains a recursive call to itself, its running time can often be
described by a recurrence
Recurrence Equation
A recurrence relation is an equation that recursively defines a sequence. Each term of the
sequence is defined as a function of the preceding terms. A difference equation is a specific
type of recurrence relation.
For example, the Fibonacci sequence, defined by the recurrence F(n) = F(n − 1) + F(n − 2), can be solved
by methods described below, yielding a closed-form expression which involves powers of the two roots of
the characteristic polynomial t² = t + 1; the generating function of the sequence is the rational
function t / (1 − t − t²).
Consider the recurrence T(n) = 2T(⌊n/2⌋) + n, which is similar to recurrences (4.2) and (4.3). We guess
that the solution is T(n) = O(n lg n). Our method is to prove that T(n) ≤ cn lg n for an appropriate
choice of the constant c > 0. We start by assuming that this bound holds for ⌊n/2⌋, that is, that
T(⌊n/2⌋) ≤ c⌊n/2⌋ lg(⌊n/2⌋).
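The inductive step itself (a standard derivation, reproduced here for completeness) follows by substituting the assumed bound into the recurrence:

\[
T(n) \le 2\,c\lfloor n/2\rfloor \lg(\lfloor n/2\rfloor) + n
     \le c\,n\lg(n/2) + n
     = c\,n\lg n - c\,n\lg 2 + n
     = c\,n\lg n - c\,n + n
     \le c\,n\lg n,
\]

where the last step holds as long as c ≥ 1.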
Mathematical induction now requires us to show that our solution holds for the boundary
conditions. Typically, we do so by showing that the boundary conditions are suitable as
base cases for the inductive proof. For the recurrence (4.4), we must show that we can
choose the constant c large enough so that the bound T(n) = cn lg n works for the boundary
conditions as well. This requirement can sometimes lead to problems. Let us assume, for
the sake of argument, that T (1) = 1 is the sole boundary condition of the recurrence. Then
for n = 1, the bound T(n) = cn lg n yields T(1) = c · 1 · lg 1 = 0, which is at odds with T(1) =
1. Consequently, the base case of our inductive proof fails to hold.
This difficulty in proving an inductive hypothesis for a specific boundary condition can be
easily overcome. For example, in the recurrence (4.4), we take advantage of asymptotic
notation only requiring us to prove T(n) ≤ cn lg n for n ≥ n0, where n0 is a constant of our
choosing. The idea is to remove the difficult boundary condition T (1) = 1 from
consideration
Observe that for n > 3, the recurrence does not depend directly on T
(1). Thus, we can replace T (1) by T (2) and T (3) as the base cases in the inductive proof,
letting n0 = 2. Note that we make a distinction between the base case of the recurrence (n =
1) and the base cases of the inductive proof (n = 2 and n = 3). We derive from the
recurrence that T(2) = 4 and T(3) = 5. The inductive proof that T(n) ≤ cn lg n for some
constant c ≥ 1 can now be completed by choosing c large enough so that T(2) ≤ c · 2 lg 2 and
T(3) ≤ c · 3 lg 3. As it turns out, any choice of c ≥ 2 suffices for the base cases of n = 2 and n
= 3 to hold. For most of the recurrences we shall examine, it is straightforward to extend
boundary conditions to make the inductive assumption work for small n.
The iteration method converts the recurrence into a summation of terms dependent only on n and the initial
conditions. Techniques for evaluating summations can then be used to provide bounds on
the solution. Consider, for example, the recurrence
T(n) = 3T(n/4) + n.
We iterate it as follows:
T(n) = n + 3T(n/4)
     = n + 3(n/4 + 3T(n/16))
     = n + 3n/4 + 9T(n/16)
     = n + 3n/4 + 9(n/16 + 3T(n/64))
     = n + 3n/4 + 9n/16 + 27T(n/64)
How far must we iterate the recurrence before we reach a boundary condition? The ith term
in the series is 3ⁱ⌊n/4ⁱ⌋. The iteration hits n = 1 when ⌊n/4ⁱ⌋ = 1 or, equivalently, when i
exceeds log₄n. By continuing the iteration until this point and using the bound ⌊n/4ⁱ⌋ ≤ n/4ⁱ,
we discover that the summation contains a decreasing geometric series:
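Carrying the iteration out to its end (a standard evaluation, shown here for completeness) gives

\[
T(n) \le n + \tfrac{3}{4}n + \left(\tfrac{3}{4}\right)^{2}n + \cdots + \left(\tfrac{3}{4}\right)^{\log_{4} n}n + \Theta\!\left(n^{\log_{4} 3}\right)
     \le n\sum_{i=0}^{\infty}\left(\tfrac{3}{4}\right)^{i} + \Theta\!\left(n^{\log_{4} 3}\right)
     = 4n + o(n) = O(n).
\]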
The master method provides a "cookbook" way of solving recurrences of the form
T(n) = aT(n/b) + f(n),    (4.5)
where a ≥ 1 and b > 1 are constants and f(n) is an asymptotically positive function.
The master method requires memorization of three cases, but then the solution of
many recurrences can be determined quite easily, often without pencil and paper.
The recurrence (4.5) describes the running time of an algorithm that divides a
problem of size
n into a subproblems, each of size n/b, where a and b are positive constants. The a
subproblems are solved recursively, each in time T (n/b). The cost of dividing the
problem and combining the results of the subproblems is described by the function
f (n). (That is, using the notation from Section 2.3.2, f(n) = D(n)+C(n).) For example,
the recurrence arising from the MERGE-SORT procedure has a = 2, b = 2, and f(n) = Θ(n).
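For reference, the three cases of the master theorem for T(n) = aT(n/b) + f(n), as usually stated, are:

\[
T(n)=\begin{cases}
\Theta\!\left(n^{\log_b a}\right) & \text{if } f(n)=O\!\left(n^{\log_b a-\varepsilon}\right) \text{ for some } \varepsilon>0,\\
\Theta\!\left(n^{\log_b a}\lg n\right) & \text{if } f(n)=\Theta\!\left(n^{\log_b a}\right),\\
\Theta\!\left(f(n)\right) & \text{if } f(n)=\Omega\!\left(n^{\log_b a+\varepsilon}\right) \text{ for some } \varepsilon>0 \text{ and } a\,f(n/b)\le c\,f(n) \text{ for some } c<1 \text{ and all sufficiently large } n.
\end{cases}
\]

For MERGE-SORT, a = 2, b = 2 and f(n) = Θ(n) = Θ(n^log₂2), so the second case applies and T(n) = Θ(n lg n).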
Linear Search, as the name implies is a searching algorithm which obtains its result by
traversing a list of data items in a linear fashion. It will start at the beginning of a list, and
mosey on through until the desired element is found, or in some cases is not found. The
aspect of Linear Search which makes it inefficient in this respect is that if the element is not
in the list it will have to go through the entire list. As you can imagine this can be quite
cumbersome for lists of very large magnitude, so keep this in mind as you contemplate how
and where to implement this algorithm. Conversely, the best case would be that the element
one is searching for is the first element of the list; this will be elaborated on in the
"Analysis & Conclusion" section of this tutorial.
Step 1 - Does the item match the value I’m looking for?
As always, visual representations are a bit more clear and concise so let me present one for
you now. Imagine you have a random assortment of integers for this list:
Legend:
-The key is blue
-The current item is green.
-Checked items are red
Ok, so here is our number set; my lucky number happens to be 7, so let's put this value as
the key, or the value we hope Linear Search can find. Notice the indexes of the
array above each of the elements, meaning this has a size or length of 6. I digress; let us
look at the first term at position 0. The value held here is 3, which is not equal to 7. We move
on.
--0 1 2 3 4 5
[ 3 2 5 1 7 0 ]
So we hit position 0, on to position 1. The value 2 is held here. Hmm, still not equal to 7.
We march on.
--0 1 2 3 4 5
[ 3 2 5 1 7 0 ]
Position 2 is next on the list, and sadly holds a 5, still not the number we're looking for.
Again we move up one.
--0 1 2 3 4 5
[ 3 2 5 1 7 0 ]
Now at index 3 we have value 1. Nice try but no cigar; let's move forward yet again.
--0 1 2 3 4 5
[ 3 2 5 1 7 0 ]
Ah ha! Position 4 is the one that has been harboring 7; we return the position in the array
which holds 7 and exit.
--0 1 2 3 4 5
[ 3 2 5 1 7 0 ]
As you can tell, the algorithm may work fine for small data sets, but for incredibly large
data sets I don't think I have to convince you any further that this would be downright
inefficient. Again, keep in mind that Linear Search has its place; it is not meant to be
perfect but to mold to the situation that requires a search.
Also note that if we were looking for, let's say, 4 in our list above (4 is not in the set), we
would traverse through the entire list and exit empty-handed. Binary Search, covered later,
gives a much better solution to what we have here; however, it requires a special case (the
list must be sorted).
//linearSearch Function
int linearSearch(int data[], int length, int val) {
    for (int i = 0; i < length; i++)    /* scan the list from the beginning */
        if (data[i] == val)
            return i;                   /* found: return the index of the match */
    return -1;                          /* not found: traversed the entire list */
}
As we have seen throughout this tutorial, Linear Search is certainly not the absolute
best method for searching, but do not let this taint your view of the algorithm itself. People
are always attempting to build better versions of current algorithms in an effort to make existing
ones more efficient. Not to mention that Linear Search, as shown, has its place and at the
very least is a great beginner's introduction to the world of searching algorithms. With
this in mind we progress to the asymptotic analysis of Linear Search:
Worst Case:
The worst case for Linear Search occurs if the element to be found is not in the list at
all. This requires the algorithm to traverse the entire list and return nothing. Thus the
worst-case running time is:
O(N).
Average Case:
The average case assumes that, on average, the element sought is somewhere in the middle of the
list, i.e., about N/2 comparisons are made. Since dividing by a constant factor does not change the
order of growth, the average case is again:
O(N).
Best Case:
The best case is reached if the element to be found is the first one in the list. This
requires no traversal beyond the first element, giving a constant time
complexity of:
O(1).
IMPORTANT QUESTIONS
PART-A
PART-B
Divide and conquer – General method – Binary search – Finding maximum and minimum
– Merge sort – Greedy algorithms – General method – Container loading – Knapsack
problem
Divide and Conquer is one of the best-known general algorithm design techniques. Divide-
and-conquer algorithms work according to the following general plan:
1. A problem's instance is divided into several smaller instances of the same problem,
ideally of about the same size.
2. The smaller instances are solved (typically recursively, though sometimes a different
algorithm is employed when instances become small enough).
3. If necessary, the solutions obtained for the smaller instances are combined
to get a solution to the original problem.
Figure: A problem of size n is divided into smaller subproblems, which are solved and combined.
Examples of the divide-and-conquer method are binary search, quick sort and merge sort.
♦ More generally, an instance of size n can be divided into several instances of size n/b,
with a of them needing to be solved. (Here, a and b are constants; a ≥ 1 and b > 1.)
♦ Assuming that size n is a power of b, to simplify our analysis, we get the following
recurrence for the running time T(n).
T(n)=aT(n/b)+f(n)
♦ This is called the general divide-and-conquer recurrence, where f(n) is a function that
accounts for the time spent on dividing the problem into smaller ones and on combining
their solutions. (For the summation example, a = b = 2 and f(n) = 1.)
♦ Obviously, the order of growth of its solution T(n) depends on the values of the constants
a and b and the order of growth of the function f (n). The efficiency analysis of many
divide-and-conquer algorithms is greatly simplified by the following theorem.
ADVANTAGES:
♦ The time spent on executing the problem using divide and conquer is smaller than with other
methods.
♦ The divide and conquer approach provides an efficient algorithm in computer science.
♦ The divide and conquer technique is ideally suited for parallel computation in which each
sub problem can be solved simultaneously by its own processor.
♦ Merge sort is a perfect example of a successful application of the divide-and conquer
technique.
♦ It sorts a given array A[0..n − 1] by dividing it into two halves A[0..⌊n/2⌋ − 1] and
A[⌊n/2⌋..n − 1], sorting each of them recursively, and then merging the two smaller sorted
arrays into a single sorted one.
STEPS TO BE FOLLOWED
♦ The first step of the merge sort is to chop the list into two.
♦ If the list has even length, split the list into two equal sub lists.
♦ If the list has odd length, divide the list into two by making the first sub list one entry
greater than the second sub list.
♦ Then split both the sublists into two and go on until each of the sublists is of size one.
♦ Finally, start merging the individual sublists to obtain a sorted list.
♦ The operation of the algorithm on the array of elements 8, 3, 2, 9, 7, 1, 5, 4 proceeds as
described above: the array is split into the halves 8, 3, 2, 9 and 7, 1, 5, 4, each half is sorted
recursively, and the sorted halves are merged.
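As a concrete illustration of the steps above, here is a minimal C sketch of merge sort applied to that array (the helper names and the temporary-buffer strategy are illustrative choices, not taken from the text):

#include <stdio.h>
#include <string.h>

/* Merge the two sorted halves a[lo..mid-1] and a[mid..hi-1] using a temporary buffer. */
static void merge(int a[], int tmp[], int lo, int mid, int hi) {
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof(int));
}

/* Sort a[lo..hi-1]: chop the list into two, sort each half recursively, then merge. */
static void mergeSort(int a[], int tmp[], int lo, int hi) {
    if (hi - lo < 2) return;             /* a sublist of size one is already sorted */
    int mid = lo + (hi - lo) / 2;
    mergeSort(a, tmp, lo, mid);
    mergeSort(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);
}

int main(void) {
    int a[] = {8, 3, 2, 9, 7, 1, 5, 4};  /* the example array from the text */
    int tmp[8];
    mergeSort(a, tmp, 0, 8);
    for (int i = 0; i < 8; i++)
        printf("%d ", a[i]);             /* prints 1 2 3 4 5 7 8 9 */
    return 0;
}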
♦ The binary search algorithm is one of the most efficient searching techniques; it requires the
list to be sorted in ascending order.
♦ To search for an element in the list, the binary search algorithm splits the list and locates
the middle element of the list.
EXAMPLE:
The list of elements is 3, 14, 27, 31, 39, 42, 55, 70, 74, 81, 85, 93, 98, and we search for k = 70 in
the list.
index   0   1   2   3   4   5   6   7   8   9   10  11  12
value   3   14  27  31  39  42  55  70  74  81  85  93  98
        l                       m                       r
m – middle element
m = (l + r) div 2 = (0 + 12) div 2
m = 6
Since k > a[m] (70 > 55), l = m + 1 = 7. So, the search element is present in the second half.
index   7   8   9   10  11  12
value   70  74  81  85  93  98
        l                   r
m = (l + r) div 2 = 19 div 2
m = 9
index   7   8   9   10  11  12
value   70  74  81  85  93  98
        l       m           r
Here k < a[m]
70 < 81
So, r = m − 1 = 8 and the element is present in the first half.
index   7   8
value   70  74
        l   r
m = (l + r) div 2 = 15 div 2 = 7
index   7   8
value   70  74
        l,m r
Now k = a[m]
70 = 70
Hence, the search key element 70 is found at position 7 and the search operation is
complete.
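A minimal C sketch of the procedure traced above (iterative version; the function and variable names are illustrative):

/* Binary search: returns the index of val in the sorted array a[0..n-1], or -1 if absent. */
int binarySearch(int a[], int n, int val) {
    int l = 0, r = n - 1;
    while (l <= r) {
        int m = (l + r) / 2;    /* locate the middle element */
        if (a[m] == val)
            return m;           /* search key found */
        else if (val > a[m])
            l = m + 1;          /* the key can only be in the second half */
        else
            r = m - 1;          /* the key can only be in the first half */
    }
    return -1;                  /* key not present in the list */
}

For the list above, binarySearch(a, 13, 70) inspects positions 6, 9 and 7 in turn and returns 7.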
ADVANTAGES
1. In this method, half of the remaining elements are eliminated at each step, so it is much faster
than sequential search.
2. It requires fewer comparisons than sequential search to locate the search key element.
DISADVANTAGES
1. An insertion or deletion of a record requires many records in the existing table to be
physically moved in order to maintain the records in sequential order.
2. The ratio between insertion/deletion time and search time is very high.
Figure: Graph and its spanning trees; T1 is the Minimum Spanning Tree
The nature of Prim's algorithm makes it necessary to provide each vertex not in the
current tree with the information about the shortest edge connecting the vertex to a tree
vertex. We can provide such information by attaching two labels to a vertex: the name of the
nearest tree vertex and the length (the weight) of the corresponding edge. Vertices that are
not adjacent to any of the tree vertices can be given a label indicating their "infinite"
distance to the tree vertices and a null label for the name of the nearest tree vertex. With such
labels, finding the next vertex to be added to the current tree T = (VT, ET) becomes a simple
task of finding a vertex with the smallest distance label in the set V − VT. Ties can be
broken arbitrarily.
After we have identified a vertex u* to be added to the tree, we need to perform two
operations:
• Move u* from the set V − VT to the set of tree vertices VT.
• For each remaining vertex u in V − VT that is connected to u* by a shorter edge than
u's current distance label, update its labels by u* and the weight of the edge
between u* and u, respectively.
The following figure demonstrates the application of Prim's algorithm to a specific graph.
Then, the original surface is deleted from the list of available surfaces and the newly
generated sub-surfaces are inserted into the list. Then, the algorithm selects the new lowest
usable surface and repeats the above procedures until no surface is available or all the
boxes have been packed into the container. The algorithm follows a similar basic
framework.
The pseudo-code of the greedy algorithm is given by the greedy heuristic procedure.
Given a layer of boxes of the same type arranged by the G4-heuristic, the layer is always
packed in the bottom-left corner of the loading surface.
As illustrated in above Figure, up to three sub-surfaces are to be created from the original
loading surface by the procedure divide surfaces, including the top surface, which is above
the layer just packed, and the possible spaces that might be left at the sides.
If l = L or w = W, the original surface is simply divided into one or two sub-surfaces, the
top surface and a possible side surface. Otherwise, two possible division variants exist, i.e.,
to divide into the top surface, the surface (B,C,E,F) and the surface (F,G,H, I), or to divide
into the top surface, the surface (B,C,D, I) and the surface (D,E,G,H).
The divisions are done according to the following criteria, which are similar to those in [2]
and [5]. The primary criterion is to minimize the total unusable area of the division variant.
If none of the remaining boxes can be packed onto a sub-surface, the area of the sub-
surface is unusable. The secondary criterion is to avoid the creation of long narrow strips.
"The underlying rationale is that narrow areas might be difficult to fill subsequently". More
specifically, if L−l ≥ W −w, the loading surface is divided into the top surface, the surface
(B,C,E,F) and the surface (F,G,H, I). Otherwise, it is divided into the top surface, the
surface (B,C,D, I) and the surface (D,E,G,H).
The problem often arises in resource allocation with financial constraints. A similar
problem also appears in combinatorics, complexity theory, cryptography and applied
mathematics.
The decision problem form of the knapsack problem is the question "can a value of at least
V be achieved without exceeding the weight W?"
E.g. (figure): with items A (2 pd), B (2 pd) and C (3 pd), the solution shown takes 2 pounds of
item A ($100) plus 2 pounds of item C ($80).
IMPORTANT QUESTIONS
PART-A
PART-B
For computing the binomial coefficient by dynamic programming, C(n, k) = C(n − 1, k − 1) + C(n − 1, k) for n > k > 0, and C(n, 0) = C(n, n) = 1.
Given a weighted connected graph (undirected or directed), the all-pair shortest paths
problem asks to find the distances (the lengths of the shortest paths) from each vertex to all
other vertices. It is convenient to record the lengths of shortest paths in an n-by-n matrix D
called the distance matrix: the element dij in the ith row and the jth column of this matrix
indicates the length of the shortest path from the ith vertex to the jth vertex (1≤ i,j ≤ n). We
can generate the distance matrix with an algorithm called Floyd’s algorithm. It is
applicable to both undirected and directed weighted graphs provided that they do not
contain a cycle of a negative length.
Fig: (a) Digraph. (b) Its weight matrix. (c) Its distance matrix.
Floyd's algorithm computes the distance matrix of a weighted graph with n vertices through
a series of n-by-n matrices:
Each of these matrices contains the lengths of shortest paths with certain constraints on the
paths considered for the matrix. Specifically, the element dij(k) in the ith row and the jth
column of matrix D(k) (k = 0, 1, . . . , n) is equal to the length of the shortest path among all paths
from the ith vertex to the jth vertex with each intermediate vertex, if any, numbered not
higher than k. In particular, the series starts with D(0), which does not allow any
intermediate vertices in its paths; hence, D(0) is nothing but the weight matrix of the graph.
The last matrix in the series, D(n), contains the lengths of the shortest paths among all paths
that can use all n vertices as intermediate and hence is nothing but the distance matrix
being sought.
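A minimal C sketch of Floyd's algorithm (the matrix size N and the INF sentinel standing in for ∞ are illustrative assumptions):

#define N 4
#define INF 1000000     /* a large value standing in for "no edge" (infinity) */

/* d starts as the weight matrix of the graph and ends as its distance matrix. */
void floyd(int d[N][N]) {
    for (int k = 0; k < N; k++)             /* allow vertex k as an intermediate vertex */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}

Applied to the 4-vertex digraph given in the exercises below (vertices a–d), the function fills in the all-pairs distances in place.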
In an optimal binary search tree, the average number of comparisons in a search is the
smallest possible.
Fig: Two out of 14 possible binary search trees with keys A, B, C. and D
As an example, consider four keys A, B, C, and D to be searched for with probabilities 0.1,
0.2, 0.4, and 0.3, respectively. The above figure depicts two out of 14 possible binary
search trees containing these keys. The average number of comparisons in a successful
search in the first of this trees is 0.1·1 + 0.2·2 + 0.4·3 + 0.3·4 =2.9 while for the second one
it is 0.1·2+0.2·1 +0.4·2+0.3·3 = 2.1. Neither of these two trees is, in fact, optimal.
For this example, we could find the optimal tree by generating all 14 binary search trees
with these keys. As a general algorithm, this exhaustive search approach is unrealistic: the
total number of binary search trees with n keys is equal to the nth Catalan number
which grows to infinity as fast as 4ⁿ/n^1.5.
If we count tree levels starting with 1 (to make the comparison numbers equal
the keys' levels), the following recurrence relation is obtained:
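The recurrence itself is not reproduced in the text above; in the standard formulation (with keys a_i < · · · < a_j searched for with probabilities p_s, and C[i, j] denoting the smallest average number of comparisons in a binary search tree built from keys a_i, . . . , a_j) it reads:

\[
C[i, j] = \min_{i \le k \le j}\{\,C[i, k-1] + C[k+1, j]\,\} + \sum_{s=i}^{j} p_s,
\qquad 1 \le i \le j \le n,
\]

with the conventions C[i, i − 1] = 0 and C[i, i] = p_i.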
Thus, out of two possible binary trees containing the first two keys, A and B, the root of the
optimal tree has index 2 (i.e., it contains B), and the average number of comparisons in a
successful search in this tree is 0.4.
Thus, the average number of key comparisons in the optimal tree is equal to 1.7. Since R[1,
4] = 3, the root of the optimal tree contains the third key, i.e., C. Its left subtree is made up
of keys A and B, and its right subtree contains just key D.
To find the specific structure of these subtrees, we first find their roots by consulting the
root table again as follows. Since R[1, 2] = 2, the root of the optimal tree containing A and
B is B, with A being its left child (and the root of the one-node tree: R[1, 1] = 1). Since
R[4, 4] = 4, the root of this one-node optimal tree is its only key D. The following figure
presents the optimal tree in its entirety.
We can express this fact in the following formula: define c[i, w] to be the solution for
items 1,2, . . . , i and maximum weight w. Then
c[i, w] = 0                                           if i = 0 or w = 0
c[i, w] = c[i − 1, w]                                 if i > 0 and wi > w
c[i, w] = max { vi + c[i − 1, w − wi], c[i − 1, w] }  if i > 0 and w ≥ wi
This says that the value of a solution for i items either includes the ith item, in which case it is vi
plus a subproblem solution for (i − 1) items and the weight excluding wi, or does not include the
ith item, in which case it is a subproblem solution for (i − 1) items and the same weight.
That is, if the thief picks item i, the thief takes vi value and can choose from items 1, 2, . . . , i − 1
up to the weight limit w − wi, getting c[i − 1, w − wi] additional value. On the other hand, if the thief
decides not to take item i, the thief can choose from items 1, 2, . . . , i − 1 up to the weight limit w,
getting c[i − 1, w] value. The better of these two choices should be made.
Although this is the 0-1 knapsack problem, the above formula for c is similar to the LCS formula:
boundary values are 0, and other values are computed from the input and "earlier" values of
c. So the 0-1 knapsack algorithm is like the LCS-length algorithm given in CLR for finding
a longest common subsequence of two sequences.
The algorithm takes as input the maximum weight W, the number of items n, and the two
sequences v = <v1, v2, . . . , vn> and w = <w1, w2, . . . , wn>. It stores the c[i, j] values in the
table, that is, a two dimensional array, c[0 . . n, 0 . . w] whose entries are computed in a
row-major order. That is, the first row of c is filled in from left to right, then the second
row, and so on. At the end of the computation, c[n, w] contains the maximum value that
can be picked into the knapsack.
The set of items to take can be deduced from the table, starting at c[n, w] and tracing
backwards where the optimal values came from. If c[i, w] = c[i − 1, w], item i is not part of
the solution, and we continue tracing with c[i − 1, w]. Otherwise item i is part of the
solution, and we continue tracing with c[i − 1, w − wi].
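A minimal C sketch of this table-filling and traceback, using a small made-up instance (the item values, weights and capacity below are illustrative, not from the text):

#include <stdio.h>

#define N 4     /* number of items */
#define W 5     /* knapsack capacity */

int main(void) {
    /* v[i] and w[i] are the value and weight of item i (1-indexed; index 0 unused). */
    int v[N + 1] = {0, 12, 10, 20, 15};
    int w[N + 1] = {0, 2, 1, 3, 2};
    int c[N + 1][W + 1];

    /* Fill the table in row-major order using the recurrence for c[i, w]. */
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= W; j++) {
            if (i == 0 || j == 0)
                c[i][j] = 0;
            else if (w[i] > j)
                c[i][j] = c[i - 1][j];
            else {
                int with = v[i] + c[i - 1][j - w[i]];
                int without = c[i - 1][j];
                c[i][j] = (with > without) ? with : without;
            }
        }
    printf("maximum value = %d\n", c[N][W]);

    /* Trace back from c[N][W] to recover the chosen items. */
    for (int i = N, j = W; i > 0; i--)
        if (c[i][j] != c[i - 1][j]) {   /* item i is part of the solution */
            printf("take item %d\n", i);
            j -= w[i];
        }
    return 0;
}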
Analysis
This dynamic-programming 0-1 knapsack algorithm takes Θ(nW) time, broken up as follows: Θ(nW) time
to fill the c-table, which has (n + 1)·(W + 1) entries, each requiring Θ(1) time to compute, and
O(n) time to trace the solution, because the tracing process starts in row n of the table and
moves up one row at each step.
One very simple lower bound can be obtained by finding the smallest element in the
intercity distance matrix D and multiplying it by the number of cities n. But there is a less
obvious and more informative lower bound, which does not require a lot of work to
compute.
It is not difficult to show that we can compute a lower bound on the length l of any tour as
follows.
For each city i, 1 ≤ i ≤ n, find the sum si of the distances from city i to the two nearest
cities; compute the sum s of these n numbers; divide the result by 2; and, if all the distances
are integers, round up the result to the nearest integer:
lb = ⌈s/2⌉
We now apply the branch and bound algorithm, with the bounding function given by
formula, to find the shortest Hamiltonian circuit for the graph of the above figure (a). To
reduce the amount of potential work, we take advantage of two observations.
First, without loss of generality, we can consider only tours that start at a.
Second, because our graph is undirected, we can generate only tours in which b is visited
before c. In addition, after visiting n-1 = 4 cities, a tour has no choice but to visit the
remaining unvisited city and return to the starting one. The state-space tree tracing the
algorithm's application is given in the above figure (b).
The comments we made at the end of the preceding section about the strengths and
weaknesses of backtracking are applicable to branch-and-bound as well. To reiterate the
main point: these state-space tree techniques enable us to solve many large instances of
difficult combinatorial problems.
The information might be obtainable—say, by exploiting specifics of the data or even, for
some problems, generated randomly—before we start developing a state-space tree. Then
we can use such a solution immediately as the best one seen so far rather than waiting for
the branch-and-bound processing to lead us to the first feasible solution.
Though the best-first rule we used above is a sensible approach, it may or may not lead to a
solution faster than other strategies.
IMPORTANT QUESTIONS
PART A
PART B
1. Solve the all-pairs shortest path problem for the digraph with the weight matrix given below:
        a   b   c   d
    a   0   ∞   3   ∞
    b   2   0   ∞   ∞
    c   ∞   7   0   1
    d   6   ∞   ∞   0
4.1. BACKTRACKING
Leaves represent either nonpromising dead ends or complete solutions found by the
algorithm.
The problem is to place n queens on an n-by-n chessboard so that no two queens attack
each other by being in the same row or in the same column or on the same diagonal. For n
= 1, the problem has a trivial solution, and it is easy to see that there is no solution for n = 2 and n = 3.
Steps to be followed
We start with the empty board and then place queen 1 in the first possible position of its
row, which is in column 1 of row 1.
Then we place queen 2, after trying unsuccessfully columns 1 and 2, in the first
acceptable position for it, which is square (2,3), the square in row 2 and column 3. This
proves to be a dead end because there is no acceptable position for queen 3. So, the
algorithm backtracks and puts queen 2 in the next possible position at (2,4).
Then queen 3 is placed at (3,2), which proves to be another dead end.
The algorithm then backtracks all the way to queen 1 and moves it to (1,2). Queen 2
then goes to (2,4), queen 3 to (3,1), and queen 4 to (4,3), which is a solution to the
problem.
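A minimal C sketch of this backtracking process for the 4-queens case (the board representation and function names are illustrative choices):

#include <stdio.h>
#include <stdlib.h>

#define N 4
int col[N];             /* col[r] = column (0-based) of the queen placed in row r */

/* A queen at (row, c) must not share a column or a diagonal with queens in rows 0..row-1. */
int isSafe(int row, int c) {
    for (int r = 0; r < row; r++)
        if (col[r] == c || abs(col[r] - c) == row - r)
            return 0;
    return 1;
}

/* Place queens row by row; backtrack when no acceptable column exists. */
int place(int row) {
    if (row == N) return 1;                 /* all queens placed: a solution */
    for (int c = 0; c < N; c++)
        if (isSafe(row, c)) {
            col[row] = c;
            if (place(row + 1)) return 1;   /* try to place the next queen */
        }
    return 0;                               /* dead end: backtrack to the previous row */
}

int main(void) {
    if (place(0))
        for (int r = 0; r < N; r++)
            printf("queen %d: row %d, column %d\n", r + 1, r + 1, col[r] + 1);
    return 0;
}

For N = 4 this finds the solution described above: queens in columns 2, 4, 1 and 3 of rows 1 to 4.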
Thus, a path from the root to a node on the ith level of the tree indicates which of the
first i numbers have been included in the subsets represented by that node.
We record the value of s′, the sum of these numbers, in the node. If s′ is equal to d, we
have a solution to the problem.
We can either report this result and stop or, if all the solutions need to be found,
continue by backtracking to the node's parent.
If s′ is not equal to d, we can terminate the node as nonpromising if either of the two
inequalities holds:
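The two inequalities themselves do not survive in the text; in the standard formulation (with the numbers sorted so that s1 ≤ s2 ≤ . . . ≤ sn, and s′ the sum of the numbers included so far) they are:

\[
s' + s_{i+1} > d \qquad \text{(adding even the smallest remaining number exceeds } d\text{), or}
\]
\[
s' + \sum_{j=i+1}^{n} s_j < d \qquad \text{(adding all the remaining numbers still falls short of } d\text{).}
\]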
(Applied to the instance S = {3, 5, 6, 7} and d = 15 of the subset-sum problem. The number
inside a node is the sum of the elements already included in the subsets represented by the
node. The inequality below a leaf indicates the reason for its termination.)
The problem of coloring graphs has been studied for many decades, and the theory of
algorithms tells us a lot about this problem. Unfortunately, coloring an arbitrary graph with
as few colors as possible is one of a large class of problems called "NP-complete
problems," for which all known solutions are essentially of the type "try all possibilities."
The set V consists of a vertex for each variable, a vertex for the negation of
each variable, 5 vertices for each clause, and 3 special vertices: TRUE,
FALSE, and RED. The edges of the graph are of two types: "literal" edges
that are independent of the clauses and "clause" edges that depend on the
clauses. The literal edges form a triangle on the special vertices and also
form a triangle on xi, ¬xi, and RED for i = 1, 2, . . . , n.
The widget is used to enforce the condition corresponding to a clause (x ∨ y ∨ z). Each clause
requires a unique copy of the 5 vertices that are heavily shaded; they connect as shown to
the literals of the clause and the special vertex TRUE.
The first component of our future solution, if it exists, is a first intermediate vertex of a
Hamiltonian cycle to be constructed. Using the alphabet order to break the three-way tie
among the vertices adjacent to a, we select vertex b. From b, the algorithm proceeds to c,
then to d, then to e, and finally to f, which proves to be a dead end. So the algorithm
backtracks from f to e, then to d. and then to c, which provides the first alternative for the
algorithm to pursue.
Going from c to e eventually proves useless, and the algorithm has to backtrack from e to c
and then to b. From there, it goes to the vertices f, e, c, and d, from which it can
legitimately return to a, yielding the Hamiltonian circuit a, b, f, e, c, d, a. If we wanted to
find another Hamiltonian circuit, we could continue this process by backtracking from the
leaf of the solution found.
Figure 11.3: (a) Graph. (b) State-space tree for finding a Hamiltonian circuit. The
numbers above the nodes of the tree indicate the order in which the nodes are
generated.
It derives its name from the problem faced by someone who is constrained by a fixed-size
knapsack and must fill it with the most useful items.
The problem often arises in resource allocation with financial constraints. A similar
problem also appears in combinatorics, complexity theory, cryptography and applied
mathematics.
The decision problem form of the knapsack problem is the question "can a value of at least
V be achieved without exceeding the weight W?"
Figure (knapsack examples): items A, B and C weigh 2, 2 and 3 pounds and are worth $100, $10 and $120
respectively; the illustrated packings take either all 3 pounds of C ($120) or 2 pounds of A ($100) plus
2 pounds of C ($80). A second instance in the figure lists item costs 200, 80, 70 and 30 with their
corresponding weights.
1. Define Backtracking.
2. What are the applications of backtracking?
3. What are the algorithm design techniques?
4. Define n-queens problem.
5. Define Hamiltonian Circuit problem.
6. Define sum of subset problem.
7. What is state space tree?
8. Define Branch & Bound method.
9. Define assignment problem.
10. What is promising & non-promising node?
11. Define the Knapsack problem.
12. Define Travelling salesman problem.
13. State principle of backtracking.
14. Compare Backtracking & Branch and Bound techniques with an example.
15. What are the applications of branch & bound?(or) What are the examples
of branch & bound?
16. In the backtracking method, how can the problem be categorized?
17. How do we determine the solution in a backtracking algorithm?
18. Obtain all possible solutions to the 4-Queens problem.
19. Generate at least 3 solutions for the 5-Queens problem.
20. Draw a pruned state space tree for the given sum-of-subsets problem:
S = {3, 4, 5, 6} and d = 13
PART-B:
1. Explain the n-Queens problem & discuss the possible solutions. (16)
2. Solve the following instance of the knapsack problem by the branch &
bound algorithm. (16)
3. Discuss the solution for the Travelling salesman problem using the branch &
bound technique. (16)
4. Apply the backtracking technique to solve the following instance of the subset
sum problem: S = {1, 3, 4, 5} and d = 11. (16)
5. Explain the subset sum problem & discuss the possible solution strategies using
backtracking. (16)
♦ The queue is initialized with the traversal's starting vertex, which is marked as visited.
♦ On each iteration, the algorithm identifies all unvisited vertices that are adjacent to the front
vertex, marks them as visited, and adds them to the queue; after that, the front vertex is
removed from the queue.
ALGORITHM BFS(G)
//Implements a breadth-first search traversal of a given graph
//Input: Graph G = (V, E)
//Output: Graph G with its vertices marked with consecutive integers in the order they
have been visited by the BFS traversal
mark each vertex in V with 0 as a mark of being "unvisited"
count ← 0
for each vertex v in V do
    if v is marked with 0
        bfs(v)
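A minimal C sketch of BFS using an adjacency matrix and a simple array-based queue (the graph representation and the MAXV limit are illustrative assumptions):

#define MAXV 100

int n;                      /* number of vertices */
int adj[MAXV][MAXV];        /* adjacency matrix: adj[u][w] = 1 if edge (u, w) exists */
int visited[MAXV];          /* visit order; 0 means "unvisited" */
int count = 0;

/* Breadth-first traversal of the component containing the start vertex. */
void bfs(int start) {
    int queue[MAXV], head = 0, tail = 0;
    visited[start] = ++count;           /* mark the starting vertex as visited */
    queue[tail++] = start;              /* and initialize the queue with it */
    while (head < tail) {
        int u = queue[head++];          /* remove the front vertex */
        for (int w = 0; w < n; w++)     /* all unvisited vertices adjacent to it */
            if (adj[u][w] && !visited[w]) {
                visited[w] = ++count;
                queue[tail++] = w;
            }
    }
}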
5.1.2. EFFICIENCY
♦ Breadth-first search has the same efficiency as depth-first search: Θ(|V|²) for the adjacency
matrix representation and Θ(|V| + |E|) for the adjacency linked list representation.
♦ BFS can be used to check the connectivity and acyclicity of a graph, in the same manner as DFS.
(a) Graph. (b) Part of its BFS tree that identifies the minimum-edge path from a to g.
♦ For example, path a-b-e-g in the graph has the fewest number of edges among all
the paths between vertices a and g.
ALGORITHM DFS(G)
//Implements a depth-first search traversal of a given graph
//Input: Graph G = (V, E)
//Output: Graph G with its vertices marked with consecutive integers in the
order they've been first encountered by the DFS traversal
mark each vertex in V with 0 as a mark of being "unvisited"
count ← 0
for each vertex v in V do
    if v is marked with 0
        dfs(v)

dfs(v)
//visits recursively all the unvisited vertices connected to vertex v and assigns
them the numbers in the order they are encountered via global variable count
count ← count + 1; mark v with count
for each vertex w in V adjacent to v do
    if w is marked with 0
        dfs(w)
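The corresponding C sketch of DFS, using the same kind of adjacency-matrix representation as the BFS sketch above (again an illustrative choice):

#define MAXV 100

int n;                          /* number of vertices */
int adj[MAXV][MAXV];            /* adjacency matrix */
int visited[MAXV];              /* order in which each vertex is first encountered; 0 = unvisited */
int count = 0;

void dfs(int v) {
    visited[v] = ++count;           /* mark v with count */
    for (int w = 0; w < n; w++)     /* for each vertex w adjacent to v */
        if (adj[v][w] && !visited[w])
            dfs(w);                 /* visit recursively all unvisited vertices connected to v */
}

void dfsTraversal(void) {
    for (int v = 0; v < n; v++)     /* restart from every still-unvisited vertex */
        if (!visited[v])
            dfs(v);
}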
♦ A DFS traversal itself and the forest-like representation of a graph it provides have
proved to be extremely helpful for the development of efficient algorithms for checking
many important properties of graphs.
These orders are qualitatively different, and various applications can take advantage of
either of them.
Articulation point
♦ A vertex of a connected graph is said to be its articulation point if its removal with all
edges incident to it breaks the graph into disjoint pieces.
The above Figure is a connected graph. It has only one connected component, namely
itself. Following figure is a graph with two connected components.
An unconnected graph.
An articulation point of a graph is a vertex v such that when we remove v and all edges
incident upon v , we break a connected component of the graph into two or more pieces.
The problem of finding articulation points is the simplest of many important problems
concerning the connectivity of graphs. As an example of applications of connectivity
algorithms, we may represent a communication network as a graph in which the vertices
are sites to be kept in communication with one another.
A graph has connectivity k if the deletion of any k-1 vertices fails to disconnect the graph.
For example, a graph has connectivity two or more if and only if it has no articulation
points, that is, if and only if it is biconnected.
The higher the connectivity of a graph, the more likely the graph is to survive the failure of
some of its vertices, whether by failure of the processing units at the vertices or external
attack.
We shall here give a simple depth-first search algorithm to find all the articulation points of
a connected graph, and thereby test by their absence whether the graph is biconnected.
1. Perform a depth-first search of the graph, computing dfnumber [v] for each vertex v. In
essence, dfnumber orders the vertices as in a preorder traversal of the depth-first spanning
tree.
2. For each vertex v, compute low [v], the smallest dfnumber of v or of any vertex reachable
from v by following zero or more tree edges downward and then at most one back edge.
Then determine the articulation points as follows:
a. The root is an articulation point if and only if it has two or more children. Since there
are no cross edges, deletion of the root must disconnect the subtrees rooted at its children,
as a disconnects {b, d, e} from {c, f, g} in the figure.
b. A vertex v other than the root is an articulation point if and only if there is some child w
of v such that low [w] ≥ dfnumber [v]. In this case, v disconnects w and its descendants
from the rest of the graph. Conversely, if low [w] < dfnumber [v], then there must be a way
to get from w down the tree and back to a proper ancestor of v (the vertex whose dfnumber
is low [w]), and therefore deletion of v does not disconnect w or its descendants from the
rest of the graph.
Let G = (V, E) be a connected graph with a cost function defined on the edges. Let U be
some proper subset of the set of vertices V. If (u, v) is an edge of lowest cost such that u is in U
and v is in V − U, then there is a minimum spanning tree that includes (u, v).
Definition:
A spanning tree of a connected graph is its connected acyclic subgraph (i.e., a tree) that
contains all the vertices of the graph. A minimum spanning tree of a weighted connected
graph is its spanning tree of the smallest weight, where the weight of a tree is defined as the
sum of the weights on all its edges. The minimum spanning tree problem is the problem of
finding a minimum spanning tree for a given weighted connected graph.
Figure: Graph and its spanning trees; T1 is the Minimum Spanning Tree
On each iteration, we expand the current tree in the greedy manner by simply attaching
to it the nearest vertex not in that tree. The algorithm stops after all the graph‘s vertices
have been included in the tree being constructed.
The nature of Prim's algorithm makes it necessary to provide each vertex not in the
current tree with the information about the shortest edge connecting the vertex to a tree
vertex.
We can provide such information by attaching two labels to a vertex: the name of the
nearest tree vertex and the length (the weight) of the corresponding edge.
Vertices that are not adjacent to any of the tree vertices can be given a label
indicating their "infinite" distance to the tree vertices and a null label for the name of the
nearest tree vertex.
With such labels, finding the next vertex to be added to the current tree T = (VT, ET)
becomes a simple task of finding a vertex with the smallest distance label in the set V −
VT. Ties can be broken arbitrarily.
After we have identified a vertex u* to be added to the tree, we need to perform two
operations:
o Move u* from the set V − VT to the set of tree vertices VT.
o For each remaining vertex u in V − VT that is connected to u* by a shorter edge than
u's current distance label, update its labels by u* and the weight of the edge between
u* and u, respectively.
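A minimal C sketch of Prim's algorithm using the two labels described above and an adjacency (weight) matrix (the INF sentinel and the representation are illustrative assumptions):

#define MAXV 100
#define INF 1000000             /* stands in for the "infinite" distance label */

int n;                          /* number of vertices */
int wgt[MAXV][MAXV];            /* weight matrix; INF where there is no edge */

/* Grows a minimum spanning tree from vertex 0 and returns its total weight. */
int prim(void) {
    int dist[MAXV];             /* distance label: weight of the shortest edge to a tree vertex */
    int nearest[MAXV];          /* name label: the nearest tree vertex */
    int inTree[MAXV];
    int total = 0;

    for (int v = 0; v < n; v++) {
        dist[v] = wgt[0][v];    /* labels relative to the initial tree {0} */
        nearest[v] = 0;
        inTree[v] = 0;
    }
    inTree[0] = 1;

    for (int k = 1; k < n; k++) {
        int u = -1;
        for (int v = 0; v < n; v++)         /* vertex with the smallest distance label */
            if (!inTree[v] && (u == -1 || dist[v] < dist[u]))
                u = v;
        inTree[u] = 1;                      /* move u* into the set of tree vertices */
        total += dist[u];                   /* edge (nearest[u], u) joins the tree */
        for (int v = 0; v < n; v++)         /* update labels of the remaining vertices */
            if (!inTree[v] && wgt[u][v] < dist[v]) {
                dist[v] = wgt[u][v];
                nearest[v] = u;
            }
    }
    return total;
}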
This is another greedy algorithm for the minimum spanning tree problem that also always
yields an optimal solution.
Kruskal's algorithm looks at a minimum spanning tree for a weighted connected graph G =
(V, E) as an acyclic subgraph with |V| − 1 edges for which the sum of the edge weights is
the smallest. Consequently, the algorithm constructs a minimum spanning tree as an
expanding sequence of subgraphs, which are always acyclic but are not necessarily
connected on the intermediate stages of the algorithm.
The correctness of Kruskal's algorithm can be proved by repeating the essential steps of
the proof of Prim's algorithm. The fact that ET is actually a tree in Prim's algorithm, but
generally just an acyclic subgraph in Kruskal's algorithm, turns out to be an obstacle
that can be overcome.
Applying Prim's and Kruskal's algorithms to the same small graph by hand may create
an impression that the latter is simpler than the former. This impression is wrong
because, on each of its iterations, Kruskal's algorithm has to check whether the addition
of the next edge to the edges already selected would create a cycle.
It is not difficult to see that a new cycle is created if and only if the new edge connects
two vertices already connected by a path, i.e., if and only if the two vertices belong to
the same connected component. Note also that each connected component of a
subgraph generated by Kruskal's algorithm is a tree because it has no cycles.
5.5. BRANCH-AND-BOUND
The idea of backtracking is to cut off a branch of the problem's state-space tree as soon as we can
deduce that it cannot lead to a solution.
This idea can be strengthened further if we deal with an optimization problem, one that
seeks to minimize or maximize an objective function, usually subject to some constraints.
Note that in the standard terminology of optimization problems, a feasible solution is a
point in the problem's search space that satisfies all the problem's constraints
(e.g., a Hamiltonian circuit in the traveling salesman problem, a subset of items
whose total weight does not exceed the knapsack's capacity), while an optimal
solution is a feasible solution with the best value of the objective function (e.g., the
shortest Hamiltonian circuit, the most valuable subset of items that fit the knapsack).
To find a lower bound on the cost of an optimal selection without actually solving the
problem, we can do several methods.
For example, it is clear that the cost of any solution, including an optimal one, cannot be
smaller than the sum of the smallest elements in each of the matrix's rows.
For the instance here, this sum is 2 + 3 + 1 + 4 = 10.
It is important to stress that this is not the cost of any legitimate selection (3 and 1 came
from the same column of the matrix); it is just a lower bound on the cost of any legitimate
selection.
We can and will apply the same thinking to partially constructed solutions. For example,
for any legitimate selection that selects 9 from the first row, the lower bound will be 9 + 3
+ 1 + 4 = 17.
Fig: Levels 0 and 1 of the state-space tree for the instance of the assignment problem
(being solved with the best-first branch-and-bound algorithm. The number above a node
shows the order in which the node was generated. A node's fields indicate the job number
assigned to person a and the lower bound value, lb, for this node.)
This problem deals with the order in which the tree's nodes will be generated. Rather than
generating a single child of the last promising node as we did in backtracking, we will
generate all the children of the most promising node among non-terminated leaves in the
current tree. (Non-terminated, i.e., still promising, leaves are also called live.)
This variation of the strategy is called best-first branch-and-bound. Returning to the
instance of the assignment problem given earlier, we start with the root that corresponds to
no elements selected from the cost matrix. The lower-bound value for the root, denoted
lb, is 10.
The nodes on the first level of the tree correspond to four elements (jobs) in the first row of
the matrix since they are each a potential selection for the first component of the solution.
So we have four live leaves (nodes 1 through 4) that may contain an optimal solution. The
most promising of them is node 2 because it has the smallest lower bound value.
Following our best-first search strategy, we branch out from that node first by considering
the three different ways of selecting an element from the second row and not in the second
column—the three different jobs that can be assigned to person b.
Figure: Levels 0, 1. and 2 of the state-space tree for the instance of the assignment
problem
(being solved with the best-first branch-and- bound algorithm)
Of the six live leaves (nodes 1, 3, 4, 5, 6, and 7) that may contain an optimal solution, we
again choose the one with the smallest lower bound, node 5.
First, we consider selecting the third column's element from c's row (i.e., assigning person
c to job 3); this leaves us with no choice but to select the element from the fourth column of
d's row (assigning person d to job 4).
Note that if its cost were smaller than 13, we would have to replace the information about
the best solution seen so far with the data provided by this node.
Now, as we inspect each of the live leaves of the last state-space tree (nodes 1, 3, 4, 6, and
7 in the following figure), we discover that their lower bound values are not smaller than 13,
the value of the best selection seen so far (leaf 8).
Hence we terminate all of them and recognize the solution represented by leaf 8 as the
optimal solution to the problem.
Figure: Complete state-space tree for the instance of the assignment problem
(Solved with the best-first branch-and-bound algorithm)
It is natural to structure the state-space tree for this problem as a binary tree constructed as
follows (following figure).
Each node on the ith level of this tree, 0 ≤ i ≤ n, represents all the subsets of n items that
include a particular selection made from the first i ordered items. This particular selection
is uniquely determined by a path from the root to the node: a branch going to the left
indicates the inclusion of the next item while the branch going to the right indicates its
exclusion.
We record the total weight w and the total value v of this selection in the node, along with
some upper bound ub on the value of any subset that can be obtained by adding zero or
more items to this selection.
A simple way to compute the upper bound ub is to add to v, the total value of the items
already selected, the product of the remaining capacity of the knapsack W - w and the best
per unit payoff among the remaining items, which is vi+1/wi+1 :
As a specific example, let us apply the branch-and-bound algorithm to the same instance of
the knapsack problem. At the root of the state-space tree (in the following figure), no items
have been selected as yet. Hence, both the total weight of the items already selected w and
their total value v are equal to 0.
The value of the upper bound computed by the formula ub = v + (W − w)(vi+1/wi+1) is $100. Node
1, the left child of the root, represents the subsets that include item 1.
The total weight and value of the items already included are 4 and $40, respectively; the
value of the upper bound is 40 + (10-4)*6 = $76.
Since the total weight w of every subset represented by node 3 exceeds the knapsack's
capacity, node 3 can be terminated immediately. Node 4 has the same values of w and v as
its parent; the upper bound ub is equal to 40 + (10 − 4) * 5 = $70.
Branching from node 5 yields node 7, which represents no feasible solutions, and node 8, which
represents just a single subset {1, 3}.
The remaining live nodes 2 and 6 have smaller upper-bound values than the value of the
solution represented by node 8. Hence, both can be terminated making the subset {1, 3} of
node 8 the optimal solution to the problem.
Typically, internal nodes of a state-space tree do not define a point of the problem's search
space, because some of the solution's components remain undefined. For the knapsack
problem, however, every node of the tree represents a subset of the items given.
We can use this fact to update the information about the best subset seen so far after
generating each new node in the tree. If we did this for the instance investigated above, we
could have terminated nodes 2 & 6 before node 8 was generated because they both are
inferior to the subset of value $65 of node 5.
Truth assignments:
A B C D E
0 0 0 0 0
. . .
1 1 1 1 1
Checking phase: O(n)
Fig: NP problems — within the class of NP problems, a known NP-complete problem is reduced to a candidate for NP-completeness.
Examples:
IMPORTANT QUESTIONS
PART-A
PART-B
Accessibility Testing: Verifying a product is accessible to the people having disabilities (deaf, blind,
mentally disabled etc.).
Ad Hoc Testing: A testing phase where the tester tries to 'break' the system by randomly trying the
system's functionality. Can include negative testing as well.
Agile Testing: Testing practice for projects using agile methodologies, treating development as the
customer of testing and emphasizing a test-first design paradigm.
Application Programming Interface (API): A formalized set of software calls and routines that can
be referenced by an application program in order to access supporting system or network services.
Automated Software Quality (ASQ): The use of software tools, such as automated testing tools, to
improve software quality.
Automated Testing:
Testing employing software tools which execute tests without manual intervention. Can be
applied in GUI, performance, API, etc. testing.
The use of software to control the execution of tests, the comparison of actual outcomes to
predicted outcomes, the setting up of test preconditions, and other test control and test
reporting functions.
Backus-Naur Form: A metalanguage used to formally describe the syntax of a language.
Basic Block: A sequence of one or more consecutive, executable statements containing no branches.
Basis Path Testing: A white box test case design technique that uses the algorithmic flow of the
program to design tests.
Basis Set: The set of tests derived using basis path testing.
Baseline: The point at which some deliverable produced during the software engineering process is
put under formal change control.
Binary Portability Testing: Testing an executable application for portability across system platforms
and environments, usually for conformation to an ABI specification.
Black Box Testing: Testing based on an analysis of the specification of a piece of software without
reference to its internal workings. The goal is to test how well the component conforms to the
published requirements for the component.
Bottom Up Testing: An approach to integration testing where the lowest level components are tested
first, then used to facilitate the testing of higher level components. The process is repeated until the
component at the top of the hierarchy is tested.
Boundary Testing: Test which focus on the boundary or limit conditions of the software being tested.
(Some of these tests are stress tests).
Boundary Value Analysis: In boundary value analysis, test cases are generated using the extremes of
the input domain, e.g. maximum, minimum, just inside/outside boundaries, typical values, and error
values. BVA is similar to Equivalence Partitioning but focuses on "corner cases".
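As a rough illustration, the sketch below exercises boundary values for a hypothetical accept_quantity function; the valid range [1, 100] and the function itself are assumed for the example.

# Boundary value analysis sketch for an assumed valid range [1, 100].
def accept_quantity(q):
    """Hypothetical function under test: valid quantities are 1 to 100."""
    return 1 <= q <= 100

# Cases just outside, on, and just inside each boundary, plus a typical value.
cases = [(0, False), (1, True), (2, True), (50, True), (99, True), (100, True), (101, False)]

for q, expected in cases:
    assert accept_quantity(q) == expected, f"boundary case failed for {q}"
print("all boundary-value cases passed")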
Branch Testing: Testing in which all branches in the program source code are tested at least once.
Breadth Testing: A test suite that exercises the full functionality of a product but does not test
features in detail.
Bug: A fault in a program which causes the program to perform in an unintended or unanticipated
manner.
Capture/Replay Tool: A test tool that records test input as it is sent to the software under test. The
input cases stored can then be used to reproduce the test at a later time. Most commonly applied to
GUI test tools.
CMM: The Capability Maturity Model for Software (CMM or SW-CMM) is a model for judging the
maturity of the software processes of an organization and for identifying the key practices that are
required to increase the maturity of these processes.
Cause Effect Graph: A graphical representation of inputs and their associated output effects, which
can be used to design test cases.
Code Complete: Phase of development where functionality is implemented in entirety; bug fixes are
all that are left. All functions found in the Functional Specifications have been implemented.
Code Inspection: A formal testing technique where the programmer reviews source code with a
group who ask questions analyzing the program logic, analyzing the code with respect to a checklist
of historically common programming errors, and analyzing its compliance with coding standards.
Code Walkthrough: A formal testing technique where source code is traced by a group with a small
set of test cases, while the state of program variables is manually monitored, to analyze the
programmer's logic and assumptions.
Compatibility Testing: Testing whether software is compatible with other elements of a system with
which it should operate, e.g. browsers, Operating Systems, or hardware.
Concurrency Testing: Multi-user testing geared towards determining the effects of accessing the
same application code, module or database records. Identifies and measures the level of locking,
deadlocking and use of single-threaded code and locking semaphores.
Conformance Testing: The process of testing that an implementation conforms to the specification
on which it is based. Usually applied to testing conformance to a formal standard.
Context Driven Testing: The context-driven school of software testing is a flavor of Agile Testing that
advocates continuous and creative evaluation of testing opportunities in light of the potential
information revealed and the value of that information to the organization right now.
Conversion Testing: Testing of programs or procedures used to convert data from existing systems
for use in replacement systems.
Data Dictionary: A database that contains definitions of all data items defined during analysis.
Data Flow Diagram: A modeling notation that represents a functional decomposition of a system.
Data Driven Testing: Testing in which the action of a test case is parameterized by externally
defined data values, maintained as a file or spreadsheet. A common technique in Automated Testing.
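A minimal data-driven sketch follows; the apply_discount function under test and the row data are assumed purely for illustration (in practice the rows would be maintained in a file or spreadsheet rather than an inline string).

# Data-driven testing sketch: one test routine is driven by externally defined rows.
import csv
import io

DATA = """price,percent,expected
100,10,90.0
80,25,60.0
50,0,50.0
"""

def apply_discount(price, percent):
    """Hypothetical function under test."""
    return round(price * (1 - percent / 100), 2)

def run_data_driven_tests(rows):
    failures = 0
    for row in csv.DictReader(rows):      # columns: price, percent, expected
        actual = apply_discount(float(row["price"]), float(row["percent"]))
        if actual != float(row["expected"]):
            failures += 1
            print("FAIL:", row, "-> got", actual)
    print("failures:", failures)

run_data_driven_tests(io.StringIO(DATA))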
Debugging: The process of finding and removing the causes of software failures.
Dependency Testing: Examines an application's requirements for pre-existing software, initial states
and configuration in order to maintain proper functionality.
Emulator: A device, computer program, or system that accepts the same inputs and produces the
same outputs as a given system.
Endurance Testing: Checks for memory leaks or other problems that may occur with prolonged
execution.
End-to-End testing: Testing a complete application environment in a situation that mimics real-
world use, such as interacting with a database, using network communications, or interacting with
other hardware, applications, or systems if appropriate.
Equivalence Class: A portion of a component's input or output domains for which, based on the
component's specification, the component's behaviour is assumed to be the same.
Equivalence Partitioning: A test case design technique for a component in which test cases are
designed to execute representatives from equivalence classes.
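A minimal sketch of the idea, assuming a hypothetical grade function whose input domain is partitioned into a valid-fail class, a valid-pass class, and two invalid classes, with one representative executed per class.

# Equivalence partitioning sketch: one representative value per equivalence class.
def grade(mark):
    """Hypothetical function under test: marks 0-100, pass mark 50."""
    if not 0 <= mark <= 100:
        raise ValueError("mark out of range")
    return "pass" if mark >= 50 else "fail"

# (class description, representative input, expected outcome)
cases = [
    ("valid, failing (0-49)",    30, "fail"),
    ("valid, passing (50-100)",  75, "pass"),
    ("invalid: below range",     -5, ValueError),
    ("invalid: above range",    120, ValueError),
]

for name, mark, expected in cases:
    try:
        outcome = grade(mark)
    except ValueError:
        outcome = ValueError
    assert outcome == expected, f"class '{name}' failed"
print("one representative per equivalence class executed successfully")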
Error: A mistake in the system under test; usually but not always a coding mistake on the part of the
developer.
Exhaustive Testing: Testing which covers all combinations of input values and preconditions for an
element of the software under test.
Functional Decomposition: A technique used during planning, analysis and design; creates a
functional hierarchy for the software.
Functional Specification: A document that describes in detail the characteristics of the product with
regard to its intended features.
Functional Testing:
Testing the features and operational behavior of a product to ensure they correspond to its
specifications.
Testing that ignores the internal mechanism of a system or component and focuses solely on
the outputs generated in response to selected inputs and execution conditions.
Glass Box Testing: A synonym for White Box Testing.
Gray Box Testing: A combination of Black Box and White Box testing methodologies: testing a
piece of software against its specification but using some knowledge of its internal workings.
High Order Tests: Black-box tests conducted once the software has been integrated.
Independent Test Group (ITG): A group of people whose primary responsibility is software testing.
Inspection: A group review quality-improvement process for written material. It consists of two
aspects: product improvement (of the document itself) and process improvement (of both document
production and inspection).
Installation Testing: Confirms that the application under test installs, configures, and uninstalls
correctly on its target platforms and environments, and that it is operational after installation.
Localization Testing: Testing that verifies software which has been adapted for a specific locality
(language, regional formats, and conventions) behaves correctly for that locality.
Loop Testing: A white box testing technique that exercises program loops.
Metric: A standard of measurement. Software metrics are the statistics describing the structure or
content of a program. A metric should be a real, objective measurement of something, such as the
number of bugs per line of code.
Monkey Testing: Testing a system or application on the fly, i.e. running just a few random tests here
and there to ensure that the system or application does not crash.
Mutation Testing: Testing in which faults are deliberately introduced into the application in order to
check whether the existing tests detect them.
Negative Testing: Testing aimed at showing software does not work. Also known as "test to fail". See
also Positive Testing.
N+1 Testing: A variation of Regression Testing. Testing conducted with multiple cycles in which
errors found in test cycle N are resolved and the solution is retested in test cycle N+1. The cycles are
typically repeated until the solution reaches a steady state and there are no errors.
Path Testing: Testing in which all paths in the program source code are tested at least once.
Performance Testing: Testing conducted to evaluate the compliance of a system or component with
specified performance requirements. Often this is performed using an automated test tool to simulate
a large number of users. Also known as "Load Testing".
Positive Testing: Testing aimed at showing software works. Also known as "test to pass".
Quality Assurance: All those planned or systematic actions necessary to provide adequate
confidence that a product or service is of the type and quality needed and expected by the customer.
Quality Audit: A systematic and independent examination to determine whether quality activities and
related results comply with planned arrangements and whether these arrangements are implemented
effectively and are suitable to achieve objectives.
Quality Control: The operational techniques and the activities used to fulfill and verify requirements
of quality.
Quality Management: That aspect of the overall management function that determines and
implements the quality policy.
Quality Policy: The overall intentions and direction of an organization as regards quality as formally
expressed by top management.
Quality System: The organizational structure, responsibilities, procedures, processes, and resources
for implementing quality management.
Race Condition: A cause of concurrency problems. Multiple accesses to a shared resource, at least
one of which is a write, with no mechanism used by any of the accessors to moderate simultaneous access.
Ramp Testing: Continuously raising an input signal until the system breaks down.
Recovery Testing: Confirms that the program recovers from expected or unexpected events without
loss of data or functionality. Events can include shortage of disk space, unexpected loss of
communication, or power out conditions.
Regression Testing: Retesting a previously tested program following modification to ensure that
faults have not been introduced or uncovered as a result of the changes made.
Release Candidate: A pre-release version, which contains the desired functionality of the final
version, but which needs to be tested for bugs (which ideally should be removed before the final
version is released).
Sanity Testing: Brief test of major functional elements of a piece of software to determine if it is
basically operational.
Scalability Testing: Performance testing focused on ensuring the application under test gracefully
handles increases in work load.
Security Testing: Testing which confirms that the program can restrict access to authorized personnel
and that the authorized personnel can access the functions available to their security level.
Smoke Testing: A quick-and-dirty test that the major functions of a piece of software work.
Originated in the hardware testing practice of turning on a new piece of hardware for the first time and
considering it a success if it does not catch on fire.
Soak Testing: Running a system at high load for a prolonged period of time. For example, running
several times more transactions in an entire day (or night) than would be expected in a busy day, to
identify any performance problems that appear after a large number of transactions have been
executed.
Software Requirements Specification: A deliverable that describes all data, functional and
behavioral requirements, all constraints, and all validation requirements for software.
Software Testing: A set of activities conducted with the intent of finding errors in software.
Static Analysis: Analysis of a program carried out without executing the program.
Static Testing: Analysis of a program carried out without executing the program.
Storage Testing: Testing that verifies the program under test stores data files in the correct directories
and that it reserves sufficient space to prevent unexpected termination resulting from lack of space.
This is external storage as opposed to internal storage.
Stress Testing: Testing conducted to evaluate a system or component at or beyond the limits of its
specified requirements to determine the load under which it fails and how. Often this is performance
testing using a very high level of simulated load.
Structural Testing: Testing based on an analysis of internal workings and structure of a piece of
software.
System Testing: Testing that attempts to discover defects that are properties of the entire system
rather than of its individual components.
Testability: The degree to which a system or component facilitates the establishment of test criteria
and the performance of tests to determine whether those criteria have been met.
Testing:
The process of exercising software to verify that it satisfies specified requirements and to
detect errors.
The process of analyzing a software item to detect the differences between existing and
required conditions (that is, bugs), and to evaluate the features of the software item (Ref.
IEEE Std 829).
Test Bed: An execution environment configured for testing. May consist of specific hardware, OS,
network topology, configuration of the product under test, other application or system software, etc.
The Test Plan for a project should enumerate the test bed(s) to be used.
Test Case:
A set of inputs, execution preconditions, and expected outcomes developed for a particular
objective, such as to exercise a particular program path or to verify compliance with a specific
requirement.
Test Driven Development: Testing methodology associated with Agile Programming in which every
chunk of code is covered by unit tests, which must all pass all the time, in an effort to eliminate unit-
level and regression bugs during development. Practitioners of TDD write a large number of tests,
often roughly as many lines of test code as there are lines of production code.
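A minimal test-first sketch using Python's standard unittest module; the slugify function and its behaviour are assumed purely for illustration (in TDD the tests below would be written, and would fail, before the function body existed).

# Test-driven development sketch: unit tests drive the production code.
import unittest

def slugify(title):
    """Production code written only after (and because) the tests existed."""
    return "-".join(title.lower().split())

class TestSlugify(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Design and Analysis"), "design-and-analysis")

    def test_result_is_lowercase(self):
        self.assertEqual(slugify("DAA Notes"), "daa-notes")

if __name__ == "__main__":
    unittest.main()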
Test Driver: A program or test tool used to execute tests. Also known as a Test Harness.
Test Environment: The hardware and software environment in which tests will be run, and any other
software with which the software under test interacts when under test including stubs and test drivers.
Test First Design: Test-first design is one of the mandatory practices of Extreme Programming
(XP). It requires that programmers do not write any production code until they have first written a
unit test.
Test Harness: A program or test tool used to execute tests. Also known as a Test Driver.
Test Plan: A document describing the scope, approach, resources, and schedule of intended testing
activities. It identifies test items, the features to be tested, the testing tasks, who will do each task, and
any risks requiring contingency planning. Ref IEEE Std 829.
Test Procedure: A document providing detailed instructions for the execution of one or more test
cases.
Test Scenario: Definition of a set of test cases or test scripts and the sequence in which they are to be
executed.
Test Script: Commonly used to refer to the instructions for a particular test that will be carried out by
an automated test tool.
Test Specification: A document specifying the test approach for a software feature or combination of
features and the inputs, predicted results and execution conditions for the associated tests.
Test Suite: A collection of tests used to validate the behavior of a product. The scope of a Test Suite
varies from organization to organization; there may be several Test Suites for a particular product, for
example. In most cases, however, a Test Suite is a high-level concept, grouping together hundreds or
thousands of tests related by what they are intended to test.
Test Tools: Computer programs used in the testing of a system, a component of the system, or its
documentation.
Top Down Testing: An approach to integration testing where the component at the top of the
component hierarchy is tested first, with lower level components being simulated by stubs. Tested
components are then used to test lower level components. The process is repeated until the lowest
level components have been tested.
Total Quality Management: A company commitment to develop a process that achieves high quality
product and customer satisfaction.
Traceability Matrix: A document showing the relationship between Test Requirements and Test
Cases.
Usability Testing: Testing the ease with which users can learn and use a product.
Use Case: The specification of tests that are conducted from the end-user perspective. Use cases tend
to focus on operating software as an end-user would conduct their day-to-day activities.
Validation: The process of evaluating software at the end of the software development process to
ensure compliance with software requirements. The techniques for validation are testing, inspection
and reviewing.
Verification: The process of determining whether or not the products of a given phase of the software
development cycle meet the implementation steps and can be traced to the incoming objectives
established during the previous phase. The techniques for verification are testing, inspection and
reviewing.
Volume Testing: Testing which confirms that any values that may become large over time (such as
accumulated counts, logs, and data files), can be accommodated by the program and will not cause the
program to stop working or degrade its operation in any manner.
Walkthrough: A review of requirements, designs or code characterized by the author of the material
under review guiding the progression of the review.
White Box Testing: Testing based on an analysis of internal workings and structure of a piece of
software. Includes techniques such as Branch Testing and Path Testing. Also known as Structural
Testing and Glass Box Testing. Contrast with Black Box Testing.
Workflow Testing: Scripted end-to-end testing which duplicates specific workflows which are
expected to be utilized by the end-user.