DSA Week2
Uploaded by ayesha batool

Lecture 2

Asymptotic Notation,
Worst-Case Analysis, and MergeSort

06/10/2024 1
Last time
Philosophy
• Algorithms are awesome!
• Our motivating questions:
• Does it work?
• Is it fast?
• Can I do better?

Technical content
• Karatsuba integer multiplication
• Example of “Divide and Conquer”
• Not-so-rigorous analysis
Today
• We are going to ask:
• Does it work?
• Is it fast?

• We’ll start to see how to answer these by looking at


some examples of sorting algorithms.
• InsertionSort
• MergeSort

The Plan
• Sorting!
• Worst-case analysis
• InsertionSort: Does it work?
• Asymptotic Analysis
• InsertionSort: Is it fast?
• MergeSort
• Does it work?
• Is it fast?

The Sorting Problem

• Input:

• A sequence of n numbers a1, a2, . . . , an

• Output:

• A permutation (reordering) a1’, a2’, . . . , an’ of the input

sequence such that a1’ ≤ a2’ ≤ · · · ≤ an’


Structure of data

Sorting
• Important primitive
• For today, we’ll pretend all elements are distinct.

6 4 3 8 1 5 2 7

1 2 3 4 5 6 7 8

Length of the list is n


Why Study Sorting Algorithms?
• There are a variety of situations that we can
encounter
• Do we have randomly ordered keys?
• Are all keys distinct?
• How large is the set of keys to be ordered?
• Need guaranteed performance?

• Various algorithms are better suited to some of


these situations

Some Definitions
• Internal Sort
• The data to be sorted is all stored in the computer’s
main memory.
• External Sort
• Some of the data to be sorted might be stored in some
external, slower, device.
• In Place Sort
• The amount of extra space required to sort the data is
constant with the input size.

Stability
• A STABLE sort preserves relative order of records with equal
keys
(Figure: a file sorted on its first key, then the whole file sorted on its
second key. After the second sort, the records with key value 3 are no
longer in order on the first key!)
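Python's built-in sort is stable, which makes this easy to see concretely. A small sketch (the records and names here are made up for illustration):

```python
# Python's sorted() is stable (Timsort): records with equal keys keep
# their original relative order. Hypothetical (name, key) records:
records = [("Alice", 3), ("Bob", 1), ("Carol", 3), ("Dave", 2)]

by_key = sorted(records, key=lambda r: r[1])
# Alice and Carol share key 3; a stable sort keeps Alice before Carol.
```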
Bubble Sort
• Idea:
• Repeatedly pass through the array
• Swaps adjacent elements that are out of order
8 4 6 9 2 3 1
(figure: index i marks the current pass, 1 … n; index j sweeps adjacent pairs)
• Easier to implement, but slower than Insertion sort

06/10/2024 11
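The idea above can be sketched in Python as follows (my own translation, not the lecture's code; the early-exit flag is an optional optimization):

```python
def bubble_sort(a):
    """Repeatedly pass through the list, swapping out-of-order neighbors."""
    a = list(a)  # sort a copy, leave the input alone
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):      # the last i elements are in place
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:                  # no swaps: already sorted
            break
    return a
```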
Selection Sort
• Idea:
• Find the smallest element in the array
• Exchange it with the element in the first position
• Find the second smallest element and exchange it with the
element in the second position
• Continue until the array is sorted
• Disadvantage:
• Running time depends only slightly on the amount of order
in the file

Example
8 4 6 9 2 3 1
1 4 6 9 2 3 8
1 2 6 9 4 3 8
1 2 3 9 4 6 8
1 2 3 4 9 6 8
1 2 3 4 6 9 8
1 2 3 4 6 8 9
1 2 3 4 6 8 9

Selection Sort
Alg.: SELECTION-SORT(A)                (example input: 8 4 6 9 2 3 1)
  n ← length[A]
  for j ← 1 to n - 1
    do smallest ← j
       for i ← j + 1 to n
         do if A[i] < A[smallest]
              then smallest ← i
       exchange A[j] ↔ A[smallest]

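A direct Python translation of SELECTION-SORT (a sketch using 0-based indexing, sorting a copy of the input):

```python
def selection_sort(a):
    """Repeatedly find the smallest remaining element and swap it into place."""
    a = list(a)
    n = len(a)
    for j in range(n - 1):
        smallest = j
        for i in range(j + 1, n):       # scan the unsorted suffix
            if a[i] < a[smallest]:
                smallest = i
        a[j], a[smallest] = a[smallest], a[j]   # one exchange per pass
    return a
```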
Analysis of Selection Sort

Alg.: SELECTION-SORT(A)                      cost   times
  n ← length[A]                              c1     1
  for j ← 1 to n - 1                         c2     n
    do smallest ← j                          c3     n - 1
       for i ← j + 1 to n                    c4     Σ_{j=1}^{n-1} (n - j + 1)
         do if A[i] < A[smallest]            c5     Σ_{j=1}^{n-1} (n - j)
              then smallest ← i              c6     Σ_{j=1}^{n-1} (n - j)
       exchange A[j] ↔ A[smallest]           c7     n - 1

≈ n²/2 comparisons and ≈ n exchanges.

T(n) = c1 + c2·n + c3(n - 1) + c4 Σ_{j=1}^{n-1} (n - j + 1)
     + c5 Σ_{j=1}^{n-1} (n - j) + c6 Σ_{j=1}^{n-1} (n - j) + c7(n - 1) = Θ(n²)
Insertion Sort
• Idea: like sorting a hand of playing cards
• Start with an empty left hand and the cards facing down
on the table.
• Remove one card at a time from the table, and insert it
into the correct position in the left hand
• compare it with each of the cards already in the hand, from
right to left
• The cards held in the left hand are sorted
• these cards were originally the top cards of the pile on the
table

Insertion Sort

To insert 12, we need to make


room for it by moving first 36
and then 24.
6 10 24 36

12

InsertionSort 6 4 3 8 5
example
Start by moving A[1] toward
the beginning of the list until
you find something smaller
(or can’t go any further): Then move A[3]:

6 4 3 8 5 3 4 6 8 5
4 6 3 8 5 3 4 6 8 5
Then move A[2]: Then move A[4]:
4 6 3 8 5 3 4 6 8 5
3 4 6 8 5 3 4 5 6 8
Then we are done!
Insertion Sort
input array

5 2 4 6 1 3

at each iteration, the array is divided in two sub-arrays:

left sub-array right sub-array

sorted unsorted

Insertion Sort

INSERTION-SORT
Alg.: INSERTION-SORT(A)                (figure: array cells A[1..8] holding
                                        a1 … a8, with key holding the element
                                        being inserted)
  for j ← 2 to n
    do key ← A[j]
       (Insert A[j] into the sorted sequence A[1 . . j-1])
       i ← j - 1
       while i > 0 and A[i] > key
         do A[i + 1] ← A[i]
            i ← i - 1
       A[i + 1] ← key
• Insertion sort – sorts the elements in place
Loop Invariant for Insertion Sort
Alg.: INSERTION-SORT(A)
  for j ← 2 to n
    do key ← A[j]
       (Insert A[j] into the sorted sequence A[1 . . j-1])
       i ← j - 1
       while i > 0 and A[i] > key
         do A[i + 1] ← A[i]
            i ← i - 1
       A[i + 1] ← key
Invariant: at the start of the for loop the elements in A[1 . . j-1] are in
sorted order
Proving Loop Invariants
• Proving loop invariants works like induction
• Initialization (base case):
• It is true prior to the first iteration of the loop

• Maintenance (inductive step):


• If it is true before an iteration of the loop, it remains true before the
next iteration

• Termination:
• When the loop terminates, the invariant gives us a useful property that
helps show that the algorithm is correct
• Stop the induction when the loop terminates

Loop Invariant for Insertion Sort
• Initialization:
• Just before the first iteration, j = 2:
the subarray A[1 . . j-1] = A[1], (the
element originally in A[1]) – is sorted

Loop Invariant for Insertion Sort
•Maintenance:
• the while inner loop moves A[j -1], A[j -2], A[j -3], and
so on, by one position to the right until the proper position
for key (which has the value that started out in A[j]) is
found
• At that point, the value of key is placed into this position.

Loop Invariant for Insertion Sort
•Termination:
• The outer for loop ends when j = n + 1 ⟹ j - 1 = n
• Replace n with j - 1 in the loop invariant:
  • the subarray A[1 . . n] consists of the elements originally in
    A[1 . . n], but in sorted order

• The entire array is sorted!

Invariant: at the start of the for loop the elements in A[1 . . j-1] are in
sorted order
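One way to see the invariant in action is to assert it inside a Python version of insertion sort; this is an illustrative sketch (my own code, not the lecture's):

```python
def insertion_sort_checked(a):
    """Insertion sort that asserts the loop invariant at each iteration."""
    a = list(a)
    for j in range(1, len(a)):
        # Invariant: a[0 .. j-1] is sorted at the start of each iteration.
        assert all(a[k] <= a[k + 1] for k in range(j - 1))
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]   # shift larger elements one slot to the right
            i -= 1
        a[i + 1] = key        # drop key into the hole that opened up
    return a
```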
Analysis of Insertion Sort

INSERTION-SORT(A)                                       cost   times
  for j ← 2 to n                                        c1     n
    do key ← A[j]                                       c2     n - 1
       (Insert A[j] into the sorted
        sequence A[1 . . j-1])                           0     n - 1
       i ← j - 1                                        c4     n - 1
       while i > 0 and A[i] > key                       c5     Σ_{j=2}^{n} t_j
         do A[i + 1] ← A[i]                             c6     Σ_{j=2}^{n} (t_j - 1)
            i ← i - 1                                   c7     Σ_{j=2}^{n} (t_j - 1)
       A[i + 1] ← key                                   c8     n - 1

t_j: # of times the while statement is executed at iteration j

T(n) = c1·n + c2(n - 1) + c4(n - 1) + c5 Σ_{j=2}^{n} t_j
     + c6 Σ_{j=2}^{n} (t_j - 1) + c7 Σ_{j=2}^{n} (t_j - 1) + c8(n - 1)
Best Case Analysis
• The array is already sorted            "while i > 0 and A[i] > key"
  • A[i] ≤ key upon the first time the while loop test is run (when i = j - 1)
  • t_j = 1

• T(n) = c1·n + c2(n - 1) + c4(n - 1) + c5(n - 1) + c8(n - 1)
       = (c1 + c2 + c4 + c5 + c8)·n - (c2 + c4 + c5 + c8)
       = a·n + b = Θ(n)
Worst Case Analysis
• The array is in reverse sorted order    "while i > 0 and A[i] > key"
  • Always A[i] > key in the while loop test
  • Have to compare key with all elements to the left of the j-th position
    ⟹ compare with j - 1 elements ⟹ t_j = j

Using
    Σ_{j=1}^{n} j = n(n + 1)/2,   Σ_{j=2}^{n} j = n(n + 1)/2 - 1,
    Σ_{j=2}^{n} (j - 1) = n(n - 1)/2
we have:
    T(n) = c1·n + c2(n - 1) + c4(n - 1) + c5·(n(n + 1)/2 - 1)
         + c6·n(n - 1)/2 + c7·n(n - 1)/2 + c8(n - 1)
         = a·n² + b·n + c    (a quadratic function of n)

• T(n) = Θ(n²): order of growth is n²
Comparisons and Exchanges in Insertion Sort

INSERTION-SORT(A)                                       cost   times
  for j ← 2 to n                                        c1     n
    do key ← A[j]                                       c2     n - 1
       (Insert A[j] into the sorted
        sequence A[1 . . j-1])                           0     n - 1
       i ← j - 1                                        c4     n - 1
       while i > 0 and A[i] > key                       c5     Σ_{j=2}^{n} t_j
         ≈ n²/2 comparisons
         do A[i + 1] ← A[i]                             c6     Σ_{j=2}^{n} (t_j - 1)
            ≈ n²/2 exchanges
            i ← i - 1                                   c7     Σ_{j=2}^{n} (t_j - 1)
       A[i + 1] ← key                                   c8     n - 1
Insertion Sort - Summary
• Advantages
• Good running time for “almost sorted” arrays: Θ(n)
• Disadvantages
• Θ(n²) running time in worst and average case
• ≈ n²/2 comparisons and exchanges

Insertion Sort
1. Does it work?
2. Is it fast?

What does that


mean???

Plucky the Pedantic Penguin
Claim: InsertionSort “works”
• “Proof:” It just worked in this example:

6 4 3 8 5

6 4 3 8 5 3 4 6 8 5
4 6 3 8 5 3 4 6 8 5

4 6 3 8 5 3 4 6 8 5
3 4 6 8 5 3 4 5 6 8 Sorted!
Claim: InsertionSort “works”
• “Proof:” I did it on a bunch of random lists and it
always worked:

What does it mean to “work”?
• Is it enough to be correct on only one input?
• Is it enough to be correct on most inputs?

• In this class, we will use worst-case analysis:


• An algorithm must be correct on all possible inputs.
• The running time of an algorithm is the worst possible
running time over all inputs.

Worst-case analysis
Think of it like a game between the algorithm designer and an adversary:

  Algorithm designer: “Here is my algorithm!
      Algorithm:
        Do the thing
        Do the stuff
        Return the answer”

  Adversary: “Here is an input! (Which I designed to be terrible for your
  algorithm!)”

Worst-case analysis guarantee: the algorithm should work (and be fast) on
that worst-case input.

• Pros: very strong guarantee


• Cons: very strong guarantee
Insertion Sort
1. Does it work?
2. Is it fast?

• Okay, so it’s pretty obvious that it works.

• HOWEVER! In the future it won’t be so


obvious, so let’s take some time now to
see how we would prove this rigorously.
Why does this work?

• Say you have a sorted list, 3 4 6 8 , and


another element 5 .

• Insert 5 right after the largest thing that’s still


smaller than 5 . (Aka, right after 4 ).

• Then you get a sorted list: 3 4 5 6 8


So just use this logic at every step.
6 4 3 8 5 The first element, [6], makes up a sorted list.

So correctly inserting 4 into the list [6] means


4 6 3 8 5 that [4,6] becomes a sorted list.

The first two elements, [4,6], make up a


4 6 3 8 5 sorted list.
So correctly inserting 3 into the list [4,6] means
3 4 6 8 5 that [3,4,6] becomes a sorted list.
The first three elements, [3,4,6], make up a
3 4 6 8 5 sorted list.
So correctly inserting 8 into the list [3,4,6] means
3 4 6 8 5 that [3,4,6,8] becomes a sorted list.
The first four elements, [3,4,6,8], make up a
3 4 6 8 5 sorted list.
So correctly inserting 5 into the list [3,4,6,8]
3 4 5 6 8 means that [3,4,5,6,8] becomes a sorted list.
YAY WE ARE DONE!
What have we learned?
• In this class we will use worst-case analysis:
• We assume that an adversary comes up with a worst-case input for our
algorithm, and we measure performance on that worst-case input.

The Plan
• InsertionSort recap
• Worst-case Analysis
• Back to InsertionSort: Does it work?
• Asymptotic Analysis
• Back to InsertionSort: Is it fast?
• MergeSort
• Does it work?
• Is it fast?

In this class we will use…
• Big-Oh notation!
• Gives us a meaningful way to talk about the
running time of an algorithm, independent of
programming language, computing platform, etc.,
without having to count all the operations.

Main idea:
Focus on how the runtime scales with n (the input size).

(Only pay attention to the largest function of n that appears.)

Some examples…
(table: number of operations vs. asymptotic running time for several
algorithms; the one whose operation count grows slowest is “asymptotically
faster” than the others)
Why is this a good idea?
• Suppose the running time of an algorithm is:

      10·n² + (lower-order terms)  ms

• This constant factor of 10 depends a lot on my computing platform…
• The lower-order terms don’t really matter as n gets large.
• We’re just left with the n² term! That’s what’s meaningful.
Pros and Cons of Asymptotic
Analysis
Pros: Cons:
• Abstracts away from • Only makes sense if n is
hardware- and language- large (compared to the
specific issues. constant factors).
• Makes algorithm analysis
much more tractable. 1000000000 n
• Allows us to meaningfully is “better” than n2 ?!?!
compare how algorithms will
perform on large inputs.

pronounced “big-oh of …” or sometimes “oh of …”

Informal definition for O(…)

• Let T(n), g(n) be functions of positive integers.
  • Think of T(n) as a runtime: positive and increasing in n.

• We say “T(n) is O(g(n))” if:
      for large enough n,
      T(n) is at most some constant multiple of g(n).

  Here, “constant” means “some number that doesn’t depend on n.”
Example            for large enough n, T(n) is at most some
                   constant multiple of g(n)

(figure: T(n) = 2n² + 10 and g(n) = n² plotted against n; the curve
3g(n) = 3n² lies above T(n) for every n past the crossover point n0 = 4)
Formal definition of O(…)
• Let T(n), g(n) be functions of positive integers.
  • Think of T(n) as a runtime: positive and increasing in n.

• Formally,
      T(n) = O(g(n))  ⟺  ∃ c, n0 > 0  s.t.  ∀ n ≥ n0,  T(n) ≤ c · g(n)

  (“⟺” means “if and only if”, “∃” means “there exists”,
   “∀” means “for all”, “s.t.” means “such that”)
Example

(figure: T(n) = 2n² + 10, g(n) = n², and 3g(n) = 3n², with n0 = 4 marked;
3g(n) ≥ T(n) for all n ≥ n0)

Formally:
• Choose c = 3
• Choose n0 = 4
• Then:
      for all n ≥ 4,  T(n) = 2n² + 10 ≤ 3n² = c·g(n),
  so T(n) = O(n²).
Same example

Formally:
• Choose c = 7
• Choose n0 = 2
• Then:
      for all n ≥ 2,  T(n) = 2n² + 10 ≤ 7n² = c·g(n).

(figure: 7g(n) = 7n² lies above T(n) = 2n² + 10 for all n ≥ n0 = 2)

There is no “correct” choice of c and n0!
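The two witness pairs from the example can be spot-checked numerically over a finite range (a sketch; this doesn't prove the "for all n", but it catches wrong choices, e.g. n0 = 3 with c = 3):

```python
def T(n): return 2 * n**2 + 10
def g(n): return n**2

# c = 3 works only from n0 = 4 onward; c = 7 already works from n0 = 2.
ok_c3 = all(T(n) <= 3 * g(n) for n in range(4, 10_000))
ok_c7 = all(T(n) <= 7 * g(n) for n in range(2, 10_000))
fails_before_n0 = T(3) <= 3 * g(3)   # 28 <= 27 is False: n0 = 4 really is needed
```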
Take-away from examples
• To prove T(n) = O(g(n)), you have to come up with c
and n0 so that the definition is satisfied.

• To prove T(n) is NOT O(g(n)), one way is proof by


contradiction:
• Suppose (to get a contradiction) that someone gives you
a c and an n0 so that the definition is satisfied.
• Show that this someone must be lying to you by deriving
a contradiction.

Recap: Asymptotic Notation

• This makes both Plucky and Lucky happy.    (This is my happy face!)
  • Plucky the Pedantic Penguin is happy because there is a precise
    definition.
  • Lucky the Lackadaisical Lemur is happy because we don’t have to pay
    close attention to all those pesky constant factors.

• But we should always be careful not to abuse it.

• In this course, (almost) every algorithm we see will be actually
  practical, without needing to take n to be absurdly large.
Insertion Sort: running time
As you get more used to this, you won’t have to count up operations anymore.
For example, just looking at the pseudocode below, you might think…

def InsertionSort(A):
    for i in range(1,len(A)):            # n-1 iterations of the outer loop
        current = A[i]
        j = i-1
        while j >= 0 and A[j] > current:
            A[j+1] = A[j]
            j -= 1
        A[j+1] = current

In the worst case, there are about n iterations of this inner loop per
outer-loop iteration.

“There’s O(1) stuff going on inside the inner loop, so each time the inner
loop runs, that’s O(n) work. Then the inner loop is executed O(n) times by
the outer loop, so that’s O(n²).”
What have we learned?

InsertionSort is an algorithm that correctly sorts an arbitrary n-element
array in time O(n²).

Can we do better?
The Plan
• InsertionSort recap
• Worst-case analysis
• Back to InsertionSort: Does it work?
• Asymptotic Analysis
• Back to InsertionSort: Is it fast?
• MergeSort
• Does it work?
• Is it fast?

Can we do better?
• MergeSort: a divide-and-conquer approach
• Recall from last time:

Divide and
Conquer: Big problem

Smaller Smaller
problem problem
Recurse! Recurse!

Yet smaller Yet smaller Yet smaller Yet smaller


problem problem problem problem
MergeSort
6 4 3 8 1 5 2 7

6 4 3 8 1 5 2 7
Recursive magic! Recursive magic!

3 4 6 8 1 2 5 7

MERGE! 1 2 3 4 5 6 7 8

MergeSort Pseudocode
MERGESORT(A):
• n = length(A)
• if n ≤ 1:                            If A has length 1,
  • return A                           it is already sorted!
• L = MERGESORT(A[ 0 : n/2 ])          Sort the left half
• R = MERGESORT(A[ n/2 : n ])          Sort the right half
• return MERGE(L, R)                   Merge the two halves

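The actual MERGE code lives in the Lecture2 notebook; as a sketch, a direct Python translation of the pseudocode might look like this (using `<=` in the merge keeps it stable):

```python
def merge(L, R):
    """Merge two sorted lists in one linear pass: O(len(L) + len(R))."""
    out, i, j = [], 0, 0
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            out.append(L[i]); i += 1
        else:
            out.append(R[j]); j += 1
    out.extend(L[i:])   # at most one of these two
    out.extend(R[j:])   # still has elements left
    return out

def merge_sort(A):
    n = len(A)
    if n <= 1:
        return A                        # already sorted
    L = merge_sort(A[: n // 2])         # sort the left half
    R = merge_sort(A[n // 2 :])        # sort the right half
    return merge(L, R)                  # merge the two halves
```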
Two questions

1. Does this work?


2. Is it fast?

Empirically:
1. Seems to work.
2. Seems fast.
Assume that n is a power of 2
for convenience.
It’s fast
CLAIM:
MergeSort runs in time O(n·log(n)).

• Proof coming soon.


• But first, how does this compare to InsertionSort?
• Recall InsertionSort ran in time O(n²).

O(n·log(n)) vs. O(n²)?

All logarithms in this course are base 2

Aside: Quick log refresher

• Def: log(n) is the number x so that 2^x = n.
• Intuition: log(n) is how many times you need to divide n by 2 in order to
  get down to 1.

  32, 16, 8, 4, 2, 1        halve 5 times  ⟹ log(32) = 5
  64, 32, 16, 8, 4, 2, 1    halve 6 times  ⟹ log(64) = 6

                            log(128) = 7
                            log(256) = 8
• log(n) grows              log(512) = 9
  very slowly!              …

  log(# particles in the universe) < 280
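The “how many times can you halve n” intuition is easy to check in Python (a throwaway sketch):

```python
def halvings(n):
    """How many times must n be halved to reach 1? Equals log2(n) for powers of 2."""
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count
```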
O(n·log(n)) vs. O(n²)?
• log(n) grows much more slowly than n.
• n·log(n) grows much more slowly than n².

Punchline: A running time of O(n log n) is a lot better than O(n²)!
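To see the gap numerically, compare n² against n·log₂(n); their ratio simplifies to n / log₂(n), which grows without bound (a quick sketch):

```python
import math

def ratio(n):
    """n^2 divided by n*log2(n): how much worse O(n^2) is than O(n log n)."""
    return n**2 / (n * math.log2(n))

# The gap keeps widening: at n = 2**10 the ratio is 102.4,
# and at n = 2**20 it is 52428.8.
```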
Assume that n is a power of 2 for convenience.

Now let’s prove the claim
CLAIM:
MergeSort runs in time O(n·log(n)).

Let’s prove the claim
      Size n                            Level 0

   n/2        n/2                       Level 1

 n/4   n/4   n/4   n/4                  Level 2
         …
 n/2^t  n/2^t  …  n/2^t                 Level t: 2^t subproblems
         …                              at level t, each of size n/2^t
 (Size 1)                               Level log(n)

Focus on just one of these sub-problems…
How much work in this sub-problem?

      n/2^t                   Time spent MERGE-ing the two subproblems
                              +
 n/2^(t+1)   n/2^(t+1)        Time spent within the two sub-problems
How much work in this sub-problem?
Let k = n/2^t…

      k               Time spent MERGE-ing the two subproblems
                      +
  k/2     k/2         Time spent within the two sub-problems
How long does it take to MERGE?        (Code for the MERGE step is given
                                        in the Lecture2 notebook.)
      k
  k/2     k/2

  3 4 6 8    1 2 5 7
  MERGE!  →  1 2 3 4 5 6 7 8
How long does it take to MERGE?

Question: in big-Oh notation, how long does it take to run MERGE on two
lists of size k/2?
Answer: It takes time O(k), since we just walk across the lists once.

Take-away: There are O(k) operations done at a size-k node
(not including work at the recursive calls).
Recursion tree
      Size n

   n/2        n/2

 n/4   n/4   n/4   n/4
         …
 n/2^t  n/2^t  …  n/2^t
         …
 (Size 1)

There are O(k) operations done at each size-k node.
Recursion tree                         Think, Pair, Share!

 Size n                How many operations are done at this level of the
                       tree? (Just MERGE-ing subproblems.)

 n/2     n/2           How about at this level of the tree?
                       (Just MERGE-ing, between both n/2-sized problems.)

 n/4  n/4  n/4  n/4    This level?
         …
 n/2^t  …  n/2^t       This level?
         …
 (Size 1)
Recursion tree

  Level  | # problems | Size of each problem | Amount of work at this level
  -------+------------+----------------------+-----------------------------
    0    |     1      |          n           |            O(n)
    1    |     2      |         n/2          |            O(n)
    2    |     4      |         n/4          |            O(n)
    …    |     …      |          …           |             …
    t    |    2^t     |        n/2^t         |            O(n)
    …    |     …      |          …           |             …
  log(n) |     n      |          1           |            O(n)

(Explanation for this table done on the board!)
Total runtime…

• O(n) steps per level, at every level

• log(n) + 1 levels

• O( n log(n) ) total!

That was the claim!


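The table's “O(n) work per level, log(n) levels of merging” pattern can be tallied directly (an illustrative sketch for n a power of 2):

```python
def work_per_level(n):
    """Total elements merged at each level of the MergeSort recursion tree."""
    levels = []
    size, count = n, 1          # subproblem size and number of subproblems
    while size > 1:
        levels.append(count * size)   # every level touches all n elements
        size //= 2
        count *= 2
    return levels

# For n = 8: every merging level does 8 units of work, over log2(8) = 3 levels.
```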
What have we learned?
• MergeSort correctly sorts a list of n integers in time
O(n log(n) ).
• That’s (asymptotically) better than InsertionSort!

The Plan
• InsertionSort recap
• Worst-case analysis
• Back to InsertionSort: Does it work?
• Asymptotic Analysis
• Back to InsertionSort: Is it fast?
• MergeSort
• Does it work?
• Is it fast?

Wrap-Up
Recap
• InsertionSort runs in time O(n2)
• MergeSort is a divide-and-conquer algorithm that
runs in time O(n log(n))

• How do we measure the runtime of an algorithm?


• Worst-case analysis
• Asymptotic analysis
• How do we analyze the running time of a recursive
algorithm?
• One way is to draw a recursion tree.
