Module 1

Introduction

• What is an algorithm?
  • A clearly specified set of simple instructions to be followed to solve a problem
    ○ takes a set of values as input and
    ○ produces a value, or set of values, as output
  • May be specified
    ○ in English
    ○ as a computer program
    ○ as pseudo-code

• Data structures
  • Methods of organizing data
• Program = algorithms + data structures

Introduction
• An algorithm is a sequence of computational steps that transform input into output.
• It is a procedure or a mathematical entity, independent of any specific programming language, machine, or compiler.
• Why study algorithms?
  • Computing time is a bounded resource, and so is space in memory.
  • Computers may be fast, but they are not infinitely fast.
  • Memory may be cheap, but it is not free.
  • These resources should be used wisely; algorithms that are efficient in time or space help you do so.

Properties of Algorithm
1. It should take zero or more inputs.
2. It should produce at least one output.
3. It should terminate within finite time.
4. It should be deterministic (every step should be unambiguous and clear).
5. It should give correct output.
6. It is programming-language independent (generic to all programming languages).

Steps to construct Algorithm
• Problem definition (understand the problem clearly)
• Design the algorithm – divide and conquer, greedy, dynamic programming
• Flow chart
• Verification – by flowchart
• Coding or implementation in a language
• Analysis – CPU time, memory

Analysis
• Time and memory (space)
• CPU time is generally more costly than memory
• So CPU time is more important than memory
• In analysis, first check time, then space
• Check space only when the time of two algorithms is the same

Time complexity
• The time taken by the algorithm between receiving the input and producing the output
• T(A) = C(A) + R(A)
  • C(A) – compile time (software): depends on the programming language and compiler
  • R(A) – running time (hardware): depends on the type of processor

Algorithms analysis
• Why do we need algorithm analysis?
  • Writing a working program is not good enough
  • The program may be inefficient!
  • If the program is run on a large data set, the running time becomes an issue

Algorithms analysis
• Analyzing an algorithm means investigating its efficiency with respect to resources: running time and memory space.
• Given two algorithms for a task, how do we find out which one is better?
• Analysis framework:
  • Measuring input size
  • Measuring running time
  • Order of growth
  • Worst case, best case, average case

Example: Selection Problem
• Given a list of N numbers, determine the kth largest, where k ≤ N.
• Algorithm 1:
  (1) Read N numbers into an array
  (2) Sort the array in decreasing order by some simple algorithm
  (3) Return the element in position k

Example: Selection Problem…
• Algorithm 2:
  (1) Read the first k elements into an array and sort them in decreasing order
  (2) Read each remaining element one by one
    ○ if it is smaller than the kth element in the array, it is ignored
    ○ otherwise, it is placed in its correct spot in the array, bumping one element out
  (3) The element in the kth position is returned as the answer
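
A short Python sketch of Algorithm 2 (the function name and the use of the bisect module are my own; the slides only give the outline): keep a sorted buffer of the k largest elements seen so far.

    import bisect

    def kth_largest(numbers, k):
        # Step (1): sort the first k elements (kept in increasing order
        # here; the slide's decreasing order is equivalent).
        buffer = sorted(numbers[:k])
        # Step (2): read each remaining element one by one.
        for x in numbers[k:]:
            if x > buffer[0]:              # larger than the current kth largest?
                bisect.insort(buffer, x)   # place it in its correct spot
                buffer.pop(0)              # bump the smallest element out
        # Step (3): the kth largest is the smallest element kept.
        return buffer[0]

    print(kth_largest([3, 1, 4, 1, 5, 9, 2, 6], 3))   # prints 5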

Example: Selection Problem…
• Which algorithm is better when
  • N = 100 and k = 100?
  • N = 100 and k = 1?
• What happens when N = 1,000,000 and k = 500,000?
• There exist better algorithms

Algorithm Analysis
• We only analyze correct algorithms
• An algorithm is correct
  • if, for every input instance, it halts with the correct output
• Incorrect algorithms
  • might not halt at all on some input instances
  • might halt with an answer other than the desired one
• Analyzing an algorithm
  • predicting the resources that the algorithm requires
• Resources include
  ○ memory
  ○ communication bandwidth
  ○ computational time (usually most important)

Algorithm Analysis…
• Factors affecting the running time
  • computer
  • compiler
  • algorithm used
  • input to the algorithm
    ○ the content of the input affects the running time
    ○ typically, the input size (number of items in the input) is the main consideration
      • e.g. sorting problem → the number of items to be sorted
      • e.g. multiplying two matrices → the total number of elements in the two matrices
• Machine model assumed
  • Instructions are executed one after another, with no concurrent operations → not a parallel computer

Example
• Calculate Σ_{i=1}^{N} i³

    1   sum ← 0
    2   for i ← 1 to N
    3       sum ← sum + i · i · i
    4   return sum

• Lines 1 and 4 count for one unit each
• Line 3: executed N times, each time four units → 4N
• Line 2: 1 for initialization, N+1 for all the tests, N for all the increments → total 2N + 2
• Total cost: 1 + (2N + 2) + 4N + 1 = 6N + 4 → O(N)

Worst- / average- / best-case
• Worst-case running time of an algorithm
  • the longest running time for any input of size n
  • an upper bound on the running time for any input
  • guarantees that the algorithm will never take longer
  • example: sort a set of numbers in increasing order, and the data is in decreasing order
  • the worst case can occur fairly often
    ○ e.g. in searching a database for a particular piece of information
• Best-case running time
  • sort a set of numbers in increasing order, and the data is already in increasing order
• Average-case running time
  • may be difficult to define what “average” means

Running-time of algorithms
• Bounds are for the algorithms, rather than programs
  • programs are just implementations of an algorithm, and almost always the details of the program do not affect the bounds
• Bounds are for algorithms, rather than problems
  • a problem can be solved with several algorithms, some more efficient than others

Order of growth
• INSERTION SORT: worst-case time complexity an² + bn + c
• Consider only the leading term

Growth Rate
• The idea is to establish a relative order among functions for large N
• ∃ c, n₀ > 0 such that f(N) ≤ c·g(N) when N ≥ n₀
• f(N) grows no faster than g(N) for “large” N

Asymptotic notation: Big-Oh
• f(N) = O(g(N))
  • there are positive constants c and n₀ such that f(N) ≤ c·g(N) when N ≥ n₀
• The growth rate of f(N) is less than or equal to the growth rate of g(N)
• g(N) is an upper bound on f(N)

Big-Oh: example
• Let f(N) = 2N². Then
  • f(N) = O(N⁴)
  • f(N) = O(N³)
  • f(N) = O(N²) (best answer, asymptotically tight)
• O(N²) reads “order N-squared” or “Big-Oh N-squared”

Big-Oh: more examples
• N²/2 – 3N = O(N²)
• 1 + 4N = O(N)
• 7N² + 10N + 3 = O(N²) = O(N³)
• log₁₀ N = log₂ N / log₂ 10 = O(log₂ N) = O(log N)
• sin N = O(1); 10 = O(1); 10¹⁰ = O(1)
• Σ_{i=1}^{N} i ≤ N·N = O(N²)
• Σ_{i=1}^{N} i² ≤ N·N² = O(N³)
• log N + N = O(N)
• logᵏ N = O(N) for any constant k
• N = O(2ᴺ), but 2ᴺ is not O(N)
• 2¹⁰ᴺ is not O(2ᴺ)

Math Review: logarithmic functions
• xᵃ = b  iff  log_x b = a
• log(ab) = log a + log b
• log_a b = log_m b / log_m a
• log(aᵇ) = b·log a
• a^(log n) = n^(log a)
• logᵇ a = (log a)ᵇ ≠ log(aᵇ)
• d(ln x)/dx = 1/x

Some rules
When considering the growth rate of a function using Big-Oh
• Ignore the lower-order terms and the coefficient of the highest-order term
• No need to specify the base of the logarithm
  • changing the base from one constant to another changes the value of the logarithm by only a constant factor
• If T1(N) = O(f(N)) and T2(N) = O(g(N)), then
  • T1(N) + T2(N) = max(O(f(N)), O(g(N)))
  • T1(N) * T2(N) = O(f(N) * g(N))
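
A quick illustration of the last two rules: if T1(N) = O(N log N) and T2(N) = O(N²), then running the two pieces one after the other costs T1(N) + T2(N) = O(N²), while nesting one inside the other costs T1(N) * T2(N) = O(N³ log N).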

Big-Omega
• ∃ c, n₀ > 0 such that f(N) ≥ c·g(N) when N ≥ n₀
• f(N) grows no slower than g(N) for “large” N

Big-Omega
• f(N) = Ω(g(N))
  • there are positive constants c and n₀ such that f(N) ≥ c·g(N) when N ≥ n₀
• The growth rate of f(N) is greater than or equal to the growth rate of g(N)

Big-Omega: examples
• Let f(N) = 2N². Then
  • f(N) = Ω(N)
  • f(N) = Ω(N²) (best answer)

Big-Theta
• f(N) = Θ(g(N))
  • the growth rate of f(N) is the same as the growth rate of g(N)
• f(N) = Θ(g(N)) iff f(N) = O(g(N)) and f(N) = Ω(g(N))
• Example: let f(N) = N², g(N) = 2N²
  • since f(N) = O(g(N)) and f(N) = Ω(g(N)), thus f(N) = Θ(g(N))
• Big-Theta means the bound is the tightest possible

Some rules
• If T(N) is a polynomial of degree k, then T(N) = Θ(Nᵏ)
• For logarithmic functions, T(log_m N) = Θ(log N)

Typical Growth Rates

Growth rates …
• Doubling the input size
  • f(N) = c     → f(2N) = f(N) = c
  • f(N) = log N → f(2N) = f(N) + log 2
  • f(N) = N     → f(2N) = 2·f(N)
  • f(N) = N²    → f(2N) = 4·f(N)
  • f(N) = N³    → f(2N) = 8·f(N)
  • f(N) = 2ᴺ    → f(2N) = (f(N))²  (see the timing sketch below)
• Advantages of algorithm analysis
  • to eliminate bad algorithms early
  • pinpoints the bottlenecks, which are worth coding carefully
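
A small, hedged experiment (the function below is my own illustration, and absolute timings vary by machine) that checks the N² row of the doubling table:

    import time

    def quadratic_work(n):
        # A deliberately Theta(n^2) nested loop.
        total = 0
        for i in range(n):
            for j in range(n):
                total += 1
        return total

    for n in (500, 1000, 2000):
        start = time.perf_counter()
        quadratic_work(n)
        print(n, round(time.perf_counter() - start, 3), "seconds")
    # Doubling n should roughly quadruple the time: f(2N) = 4 f(N).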

Using L’Hôpital’s rule
• L’Hôpital’s rule
  • if lim_{N→∞} f(N) = ∞ and lim_{N→∞} g(N) = ∞,
    then lim_{N→∞} f(N)/g(N) = lim_{N→∞} f′(N)/g′(N)
• Determine the relative growth rates (using L’Hôpital’s rule if necessary):
  compute lim_{N→∞} f(N)/g(N)
  • if 0: f(N) = O(g(N)) and f(N) is not Θ(g(N))
  • if a constant ≠ 0: f(N) = Θ(g(N))
  • if ∞: f(N) = Ω(g(N)) and f(N) is not Θ(g(N))
  • if the limit oscillates: no relation
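
For example, to compare f(N) = log N with g(N) = N: both tend to ∞, and lim_{N→∞} (log N)/N = lim_{N→∞} (1/N)/1 = 0, so log N = O(N) but log N is not Θ(N).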

General Rules
• For loops
  • at most the running time of the statements inside the for loop (including the tests) times the number of iterations
• Nested for loops
  • the running time of the statement multiplied by the product of the sizes of all the for loops
  • two nested loops of size N → O(N²)

General rules (cont’d)
• Consecutive statements
  • these just add
  • O(N) + O(N²) = O(N²)
• If S1 else S2
  • never more than the running time of the test plus the larger of the running times of S1 and S2
(The sketch below puts these rules side by side.)
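
A minimal Python sketch (my own illustration of these rules, not code from the slides):

    def rules_demo(a):
        n = len(a)

        total = 0
        for x in a:                 # one loop over N items: O(N)
            total += x

        pairs = 0
        for i in range(n):          # nested loops: O(N) * O(N) = O(N^2)
            for j in range(n):
                pairs += a[i] * a[j]

        # Consecutive statements just add: O(N) + O(N^2) = O(N^2).
        if total > 0:               # if/else: the test plus the larger branch
            return pairs            # O(1) branch
        else:
            return sum(a)           # O(N) branch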

Another Example
• Maximum Subsequence Sum Problem
  • given (possibly negative) integers A₁, A₂, ..., A_N, find the maximum value of Σ_{k=i}^{j} A_k
  • for convenience, the maximum subsequence sum is 0 if all the integers are negative
• E.g. for input –2, 11, –4, 13, –5, –2
  • answer: 20 (A₂ through A₄)

Algorithm 1: Simple
• Exhaustively tries all possibilities (brute force)
• O(N³)
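
A hedged Python sketch of the brute-force idea (the function name is mine): try every pair of endpoints (i, j) and sum the elements between them, giving three nested loops and O(N³) time.

    def max_subseq_sum_cubic(a):
        best = 0                            # empty subsequence counts as 0
        n = len(a)
        for i in range(n):                  # start index
            for j in range(i, n):           # end index
                this_sum = 0
                for k in range(i, j + 1):   # sum A[i..j]
                    this_sum += a[k]
                best = max(best, this_sum)
        return best

    print(max_subseq_sum_cubic([-2, 11, -4, 13, -5, -2]))   # prints 20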

Algorithm 2: Divide-and-conquer
• Divide-and-conquer
  • split the problem into two roughly equal subproblems, which are then solved recursively
  • patch together the two solutions of the subproblems to arrive at a solution for the whole problem
• The maximum subsequence sum can be
  • entirely in the left half of the input
  • entirely in the right half of the input
  • crossing the middle, lying in both halves

Algorithm 2 (cont’d)
• The first two cases can be solved recursively
• For the last case:
  • find the largest sum in the first half that includes the last element of the first half
  • find the largest sum in the second half that includes the first element of the second half
  • add these two sums together
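
A compact Python sketch of this scheme (the structure follows the slides, but the names and details are mine):

    def max_subseq_sum_dc(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        if lo > hi:                          # empty range
            return 0
        if lo == hi:                         # base case: one element
            return max(0, a[lo])
        mid = (lo + hi) // 2

        left_best = max_subseq_sum_dc(a, lo, mid)        # case 1: left half
        right_best = max_subseq_sum_dc(a, mid + 1, hi)   # case 2: right half

        # Case 3: the largest sum crossing the middle.
        left_border = running = 0
        for i in range(mid, lo - 1, -1):     # sums ending at the middle
            running += a[i]
            left_border = max(left_border, running)
        right_border = running = 0
        for i in range(mid + 1, hi + 1):     # sums starting after the middle
            running += a[i]
            right_border = max(right_border, running)

        return max(left_best, right_best, left_border + right_border)

    print(max_subseq_sum_dc([-2, 11, -4, 13, -5, -2]))   # prints 20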

Algorithm 2 …
• Cost structure of the code (figure): the base case costs O(1), the two recursive calls cost T(N/2) each, computing the two border sums costs O(N), and combining the three candidates costs O(1)

Algorithm 2 (cont’d)
• Recurrence equation
    T(1) = 1
    T(N) = 2·T(N/2) + N
• 2·T(N/2): two subproblems, each of size N/2
• N: for “patching” the two solutions to find the solution to the whole problem

Algorithm 2 (cont’d)
• Solving the recurrence:
    T(N) = 2·T(N/2) + N
         = 4·T(N/4) + 2N
         = 8·T(N/8) + 3N
         = …
         = 2ᵏ·T(N/2ᵏ) + kN
• With k = log N (i.e. 2ᵏ = N), we have
    T(N) = N·T(1) + N·log N = N·log N + N
• Thus, the running time is O(N log N)
  • faster than Algorithm 1 for large data sets

Analysis of Algorithms
(Insertion Sort and Selection Sort)

The Sorting Problem
• Input:
  – a sequence of n numbers a₁, a₂, . . . , aₙ
• Output:
  – a permutation (reordering) a₁′, a₂′, . . . , aₙ′ of the input sequence such that a₁′ ≤ a₂′ ≤ · · · ≤ aₙ′
Structure of data

Why Study Sorting Algorithms?
• There are a variety of situations that we can encounter
  – do we have randomly ordered keys?
  – are all keys distinct?
  – how large is the set of keys to be ordered?
  – do we need guaranteed performance?
• Various algorithms are better suited to some of these situations

Some Definitions
• Internal Sort
  – the data to be sorted is all stored in the computer’s main memory
• External Sort
  – some of the data to be sorted might be stored in some external, slower device
• In-Place Sort
  – the amount of extra space required to sort the data is constant in the input size

Insertion Sort
• Idea: like sorting a hand of playing cards
  – start with an empty left hand and the cards facing down on the table
  – remove one card at a time from the table, and insert it into the correct position in the left hand
    • compare it with each of the cards already in the hand, from right to left
  – the cards held in the left hand are sorted
    • these cards were originally the top cards of the pile on the table

Insertion Sort
• Example (figure): to insert 12, we need to make room for it by moving first 36 and then 24

Insertion Sort
• Input array: 5 2 4 6 1 3
• At each iteration, the array is divided into two sub-arrays:
  – left sub-array: sorted
  – right sub-array: unsorted

INSERTION-SORT
Alg.: INSERTION-SORT(A)
    for j ← 2 to n
        do key ← A[j]
           insert A[j] into the sorted sequence A[1 . . j-1]
           i ← j - 1
           while i > 0 and A[i] > key
               do A[i + 1] ← A[i]
                  i ← i - 1
           A[i + 1] ← key
• Insertion sort sorts the elements in place
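
For reference, a direct Python transcription of the pseudocode (0-indexed; the adaptation is mine):

    def insertion_sort(a):
        for j in range(1, len(a)):        # for j <- 2 to n
            key = a[j]
            i = j - 1
            # Shift larger elements of the sorted prefix to the right.
            while i >= 0 and a[i] > key:
                a[i + 1] = a[i]
                i -= 1
            a[i + 1] = key                # insert key into its spot
        return a

    print(insertion_sort([5, 2, 4, 6, 1, 3]))   # prints [1, 2, 3, 4, 5, 6]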

Loop Invariant for Insertion Sort
Alg.: INSERTION-SORT(A)
    for j ← 2 to n
        do key ← A[j]
           insert A[j] into the sorted sequence A[1 . . j-1]
           i ← j - 1
           while i > 0 and A[i] > key
               do A[i + 1] ← A[i]
                  i ← i - 1
           A[i + 1] ← key

Invariant: at the start of each iteration of the for loop, the elements in A[1 . . j-1] are in sorted order

Proving Loop Invariants
• Proving loop invariants works like induction
• Initialization (base case):
  – it is true prior to the first iteration of the loop
• Maintenance (inductive step):
  – if it is true before an iteration of the loop, it remains true before the next iteration
• Termination:
  – when the loop terminates, the invariant gives us a useful property that helps show that the algorithm is correct
  – stop the induction when the loop terminates

Loop Invariant for Insertion Sort
• Initialization:
  – just before the first iteration, j = 2: the subarray A[1 . . j-1] = A[1] (the element originally in A[1]) is sorted

Loop Invariant for Insertion Sort
• Maintenance:
  – the inner while loop moves A[j-1], A[j-2], A[j-3], and so on, one position to the right until the proper position for key (which holds the value that started out in A[j]) is found
  – at that point, the value of key is placed into this position

Loop Invariant for Insertion Sort
• Termination:
  – the outer for loop ends when j = n + 1, i.e. j - 1 = n
  – substituting n for j - 1 in the loop invariant:
    • the subarray A[1 . . n] consists of the elements originally in A[1 . . n], but in sorted order
• The entire array is sorted!

Invariant: at the start of each iteration of the for loop, the elements in A[1 . . j-1] are in sorted order

Analysis of Insertion Sort
INSERTION-SORT(A)                                           cost   times
    for j ← 2 to n                                          c1     n
        do key ← A[j]                                       c2     n-1
           insert A[j] into the sorted sequence A[1..j-1]   0      n-1
           i ← j - 1                                        c4     n-1
           while i > 0 and A[i] > key                       c5     Σ_{j=2}^{n} t_j
               do A[i + 1] ← A[i]                           c6     Σ_{j=2}^{n} (t_j - 1)
                  i ← i - 1                                 c7     Σ_{j=2}^{n} (t_j - 1)
           A[i + 1] ← key                                   c8     n-1

t_j: number of times the while-loop test is executed at iteration j

T(n) = c1·n + c2·(n-1) + c4·(n-1) + c5·Σ_{j=2}^{n} t_j + c6·Σ_{j=2}^{n} (t_j - 1) + c7·Σ_{j=2}^{n} (t_j - 1) + c8·(n-1)

Best Case Analysis
• The array is already sorted: in “while i > 0 and A[i] > key”,
  A[i] ≤ key upon the first test of the while loop (when i = j - 1)
  – t_j = 1
• T(n) = c1·n + c2·(n-1) + c4·(n-1) + c5·(n-1) + c8·(n-1)
       = (c1 + c2 + c4 + c5 + c8)·n - (c2 + c4 + c5 + c8)
       = an + b = Θ(n)

Worst Case Analysis
• The array is in reverse sorted order: in “while i > 0 and A[i] > key”,
  always A[i] > key
  – key must be compared with all elements to the left of the j-th position → compare with j - 1 elements → t_j = j
• Using
    Σ_{j=1}^{n} j = n(n+1)/2,  Σ_{j=2}^{n} j = n(n+1)/2 - 1,  Σ_{j=2}^{n} (j-1) = n(n-1)/2
  we have:
    T(n) = c1·n + c2·(n-1) + c4·(n-1) + c5·(n(n+1)/2 - 1) + c6·(n(n-1)/2) + c7·(n(n-1)/2) + c8·(n-1)
         = an² + bn + c, a quadratic function of n
• T(n) = Θ(n²): order of growth in n²

Comparisons and Exchanges in Insertion Sort
INSERTION-SORT(A)                                           cost   times
    for j ← 2 to n                                          c1     n
        do key ← A[j]                                       c2     n-1
           insert A[j] into the sorted sequence A[1..j-1]   0      n-1
           i ← j - 1                                        c4     n-1
           while i > 0 and A[i] > key   ≈ n²/2 comparisons  c5     Σ_{j=2}^{n} t_j
               do A[i + 1] ← A[i]       ≈ n²/2 exchanges    c6     Σ_{j=2}^{n} (t_j - 1)
                  i ← i - 1                                 c7     Σ_{j=2}^{n} (t_j - 1)
           A[i + 1] ← key                                   c8     n-1

Insertion Sort - Summary
• Advantages
  – good running time for “almost sorted” arrays: Θ(n)
• Disadvantages
  – Θ(n²) running time in worst and average case
  – ≈ n²/2 comparisons and exchanges

Selection Sort
• Idea:
  – find the smallest element in the array
  – exchange it with the element in the first position
  – find the second smallest element and exchange it with the element in the second position
  – continue until the array is sorted
• Disadvantage:
  – running time depends only slightly on the amount of order already in the file

Example
8 4 6 9 2 3 1 → 1 4 6 9 2 3 8 → 1 2 6 9 4 3 8 → 1 2 3 9 4 6 8
→ 1 2 3 4 9 6 8 → 1 2 3 4 6 9 8 → 1 2 3 4 6 8 9 → 1 2 3 4 6 8 9

Selection Sort
Alg.: SELECTION-SORT(A)          e.g. A = 8 4 6 9 2 3 1
    n ← length[A]
    for j ← 1 to n - 1
        do smallest ← j
           for i ← j + 1 to n
               do if A[i] < A[smallest]
                      then smallest ← i
           exchange A[j] ↔ A[smallest]
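
A runnable Python counterpart of the pseudocode (0-indexed; the adaptation is mine):

    def selection_sort(a):
        n = len(a)
        for j in range(n - 1):
            smallest = j
            for i in range(j + 1, n):      # scan the unsorted suffix
                if a[i] < a[smallest]:
                    smallest = i
            a[j], a[smallest] = a[smallest], a[j]   # exchange
        return a

    print(selection_sort([8, 4, 6, 9, 2, 3, 1]))   # prints [1, 2, 3, 4, 6, 8, 9]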

Analysis of Selection Sort
Alg.: SELECTION-SORT(A)                        cost   times
    n ← length[A]                              c1     1
    for j ← 1 to n - 1                         c2     n
        do smallest ← j                        c3     n-1
           for i ← j + 1 to n                  c4     Σ_{j=1}^{n-1} (n-j+1)   ≈ n²/2 comparisons
               do if A[i] < A[smallest]        c5     Σ_{j=1}^{n-1} (n-j)
                      then smallest ← i        c6     Σ_{j=1}^{n-1} (n-j)
           exchange A[j] ↔ A[smallest]         c7     n-1                     ≈ n exchanges

T(n) = c1 + c2·n + c3·(n-1) + c4·Σ_{j=1}^{n-1} (n-j+1) + c5·Σ_{j=1}^{n-1} (n-j) + c6·Σ_{j=1}^{n-1} (n-j) + c7·(n-1) = Θ(n²)

Bubble Sort
• Idea:
  – repeatedly pass through the array
  – swap adjacent elements that are out of order
    (e.g. A = 8 4 6 9 2 3 1, with j scanning from the right end toward i)
• Easier to implement, but slower than insertion sort

Example
Pass i=1 bubbles the smallest element to the front:
8 4 6 9 2 3 1 → 8 4 6 9 2 1 3 → 8 4 6 9 1 2 3 → 8 4 6 1 9 2 3
→ 8 4 1 6 9 2 3 → 8 1 4 6 9 2 3 → 1 8 4 6 9 2 3

State at the start of each subsequent pass:
i=2: 1 8 4 6 9 2 3
i=3: 1 2 8 4 6 9 3
i=4: 1 2 3 8 4 6 9
i=5: 1 2 3 4 8 6 9
i=6: 1 2 3 4 6 8 9
i=7: 1 2 3 4 6 8 9

Bubble Sort
Alg.: BUBBLESORT(A)              e.g. A = 8 4 6 9 2 3 1
    for i ← 1 to length[A]
        do for j ← length[A] downto i + 1
               do if A[j] < A[j - 1]
                      then exchange A[j] ↔ A[j - 1]
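
The same pseudocode as runnable Python (0-indexed; the adaptation is mine):

    def bubble_sort(a):
        n = len(a)
        for i in range(n):
            # Walk right to left, bubbling the smallest remaining
            # element down to position i.
            for j in range(n - 1, i, -1):
                if a[j] < a[j - 1]:
                    a[j], a[j - 1] = a[j - 1], a[j]   # exchange
        return a

    print(bubble_sort([8, 4, 6, 9, 2, 3, 1]))   # prints [1, 2, 3, 4, 6, 8, 9]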

Bubble-Sort Running Time
Alg.: BUBBLESORT(A)                                     cost
    for i ← 1 to length[A]                              c1
        do for j ← length[A] downto i + 1               c2
               do if A[j] < A[j - 1]                    c3   ≈ n²/2 comparisons
                      then exchange A[j] ↔ A[j - 1]     c4   ≈ n²/2 exchanges

T(n) = c1·(n+1) + c2·Σ_{i=1}^{n} (n-i+1) + c3·Σ_{i=1}^{n} (n-i) + c4·Σ_{i=1}^{n} (n-i)
     = Θ(n) + (c2 + c3 + c4)·Σ_{i=1}^{n} (n-i)
where Σ_{i=1}^{n} (n-i) = n² - n(n+1)/2 = n²/2 - n/2
Thus, T(n) = Θ(n²)

Order of growth
• INSERTION SORT: worst-case time complexity an² + bn + c
• Consider only the leading term of the formula, since the lower-order terms are relatively insignificant for large values of n
• Also ignore the leading term’s constant coefficient