DSA Week2
Asymptotic Notation,
Worst-Case Analysis, and MergeSort
06/10/2024 1
Last time
Philosophy
• Algorithms are awesome!
• Our motivating questions:
• Does it work?
• Is it fast?
• Can I do better?
Technical content
• Karatsuba integer multiplication
• Example of “Divide and Conquer”
• Not-so-rigorous analysis
Today
• We are going to ask:
• Does it work?
• Is it fast?
The Plan
• Sorting!
• Worst-case analysis
• InsertionSort: Does it work?
• Asymptotic Analysis
• InsertionSort: Is it fast?
• MergeSort
• Does it work?
• Is it fast?
The Sorting Problem
• Input: an array of n numbers
• Output: the same numbers, sorted in increasing order
Sorting
• Important primitive
• For today, we’ll pretend all elements are distinct.
6 4 3 8 1 5 2 7
1 2 3 4 5 6 7 8
Some Definitions
• Internal Sort
• The data to be sorted is all stored in the computer’s
main memory.
• External Sort
• Some of the data to be sorted might be stored in some
external, slower, device.
• In-Place Sort
• The amount of extra space required to sort the data is
constant in the input size.
Stability
• A STABLE sort preserves relative order of records with equal
keys
Sorted on first key:
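Stability is easy to see in Python, whose built-in sorted() is guaranteed stable. The record list below is my own illustration, not from the slides:

```python
# Records: (name, key). Python's built-in sorted() is stable, so
# records with equal keys keep their original relative order.
records = [("apple", 2), ("pear", 1), ("plum", 2), ("fig", 1)]
by_key = sorted(records, key=lambda r: r[1])
print(by_key)
# [('pear', 1), ('fig', 1), ('apple', 2), ('plum', 2)]
```

Note that ("pear", 1) stays before ("fig", 1), and ("apple", 2) before ("plum", 2), exactly as in the input.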
Bubble Sort
• Idea:
• Repeatedly pass through the array
• Swaps adjacent elements that are out of order
8 4 6 9 2 3 1
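The two bullets above translate directly into Python. This is a sketch; the function name and the early-exit flag are my additions, not from the slides:

```python
def bubble_sort(A):
    """Repeatedly pass through the array, swapping adjacent
    elements that are out of order."""
    n = len(A)
    for _ in range(n - 1):
        swapped = False
        for j in range(n - 1):
            if A[j] > A[j + 1]:
                A[j], A[j + 1] = A[j + 1], A[j]   # swap out-of-order neighbors
                swapped = True
        if not swapped:       # a full pass with no swaps: already sorted
            break
    return A

print(bubble_sort([8, 4, 6, 9, 2, 3, 1]))   # [1, 2, 3, 4, 6, 8, 9]
```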
Selection Sort
• Idea:
• Find the smallest element in the array
• Exchange it with the element in the first position
• Find the second smallest element and exchange it with the
element in the second position
• Continue until the array is sorted
• Disadvantage:
• Running time depends only slightly on the amount of order
in the file
Example
8 4 6 9 2 3 1
1 4 6 9 2 3 8
1 2 6 9 4 3 8
1 2 3 9 4 6 8
1 2 3 4 9 6 8
1 2 3 4 6 9 8
1 2 3 4 6 8 9
1 2 3 4 6 8 9
Selection Sort
Alg.: SELECTION-SORT(A)          ▹ example input: 8 4 6 9 2 3 1
  n ← length[A]
  for j ← 1 to n - 1
    do smallest ← j
       for i ← j + 1 to n
         do if A[i] < A[smallest]
              then smallest ← i
       exchange A[j] ↔ A[smallest]
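A direct Python translation of the pseudocode above (0-indexed, so the pseudocode's j = 1 .. n-1 becomes j = 0 .. n-2):

```python
def selection_sort(A):
    n = len(A)
    for j in range(n - 1):                      # for j ← 1 to n - 1
        smallest = j                            # do smallest ← j
        for i in range(j + 1, n):               # for i ← j + 1 to n
            if A[i] < A[smallest]:              # if A[i] < A[smallest]
                smallest = i                    # then smallest ← i
        A[j], A[smallest] = A[smallest], A[j]   # exchange A[j] ↔ A[smallest]
    return A

print(selection_sort([8, 4, 6, 9, 2, 3, 1]))   # [1, 2, 3, 4, 6, 8, 9]
```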
Analysis of Selection Sort
Alg.: SELECTION-SORT(A)                  cost   times
  n ← length[A]                          c1     1
  for j ← 1 to n - 1                     c2     n
    do smallest ← j                      c3     n - 1
       for i ← j + 1 to n                c4     Σ_{j=1..n-1} (n - j + 1)
         do if A[i] < A[smallest]        c5     Σ_{j=1..n-1} (n - j)     ≈ n²/2 comparisons
              then smallest ← i          c6     Σ_{j=1..n-1} (n - j)
       exchange A[j] ↔ A[smallest]       c7     n - 1

T(n) = c1 + c2·n + c3(n - 1) + c4 Σ_{j=1..n-1}(n - j + 1) + c5 Σ_{j=1..n-1}(n - j)
       + c6 Σ_{j=1..n-1}(n - j) + c7(n - 1) = Θ(n²)
Insertion Sort
• Idea: like sorting a hand of playing cards
• Start with an empty left hand and the cards facing down
on the table.
• Remove one card at a time from the table, and insert it
into the correct position in the left hand
• compare it with each of the cards already in the hand, from
right to left
• The cards held in the left hand are sorted
• these cards were originally the top cards of the pile on the
table
Insertion Sort
Example: inserting 12 into the sorted prefix 6 10 24 36
(compare right-to-left, shifting larger elements over):
6 10 24 36   ←  12
6 10 12 24 36
InsertionSort 6 4 3 8 5
example
Start by moving A[1] toward the beginning of the list until
you find something smaller (or can't go any further):
6 4 3 8 5
4 6 3 8 5
Then move A[2]:
4 6 3 8 5
3 4 6 8 5
Then move A[3]:
3 4 6 8 5
3 4 6 8 5
Then move A[4]:
3 4 6 8 5
3 4 5 6 8
Then we are done!
Insertion Sort
input array
5 2 4 6 1 3
sorted part | unsorted part
INSERTION-SORT
Alg.: INSERTION-SORT(A)
  for j ← 2 to n
    do key ← A[j]
       ▹ Insert A[j] into the sorted sequence A[1 . . j-1]
       i ← j - 1
       while i > 0 and A[i] > key
         do A[i + 1] ← A[i]
            i ← i - 1
       A[i + 1] ← key
• Insertion sort sorts the elements in place
Loop Invariant for Insertion Sort
Alg.: INSERTION-SORT(A)
  for j ← 2 to n
    do key ← A[j]
       ▹ Insert A[j] into the sorted sequence A[1 . . j-1]
       i ← j - 1
       while i > 0 and A[i] > key
         do A[i + 1] ← A[i]
            i ← i - 1
       A[i + 1] ← key
Invariant: at the start of each iteration of the for loop, the elements in A[1 . . j-1] are in sorted order
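One way to make the invariant concrete is an instrumented Python version (my own sketch, not from the slides) that asserts, at the start of each outer iteration, that the prefix sorted so far really is sorted:

```python
def insertion_sort_checked(A):
    for j in range(1, len(A)):
        # Loop invariant: A[0 : j] is sorted at the start of each iteration.
        assert all(A[k] <= A[k + 1] for k in range(j - 1)), "invariant broken"
        key = A[j]
        i = j - 1
        while i >= 0 and A[i] > key:   # shift larger elements right
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key                 # drop key into its place
    return A

print(insertion_sort_checked([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]
```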
Proving Loop Invariants
• Proving loop invariants works like induction
• Initialization (base case):
• It is true prior to the first iteration of the loop
• Maintenance (inductive step):
• If it is true before an iteration of the loop, it remains true before the
next iteration
• Termination:
• When the loop terminates, the invariant gives us a useful property that
helps show that the algorithm is correct
• Stop the induction when the loop terminates
Loop Invariant for Insertion Sort
• Initialization:
• Just before the first iteration, j = 2:
the subarray A[1 . . j-1] consists of just A[1]
(the element originally in A[1]), which is trivially sorted
Loop Invariant for Insertion Sort
•Maintenance:
• the while inner loop moves A[j -1], A[j -2], A[j -3], and
so on, by one position to the right until the proper position
for key (which has the value that started out in A[j]) is
found
• At that point, the value of key is placed into this position.
Loop Invariant for Insertion Sort
•Termination:
• The outer for loop ends when j = n + 1, i.e. j - 1 = n
• Substituting n for j - 1 in the loop invariant:
• the subarray A[1 . . n] consists of the elements originally in A[1 . .
n], but in sorted order
Analysis of Insertion Sort
INSERTION-SORT(A)                                           cost   times
  for j ← 2 to n                                            c1     n
    do key ← A[j]                                           c2     n - 1
       ▹ Insert A[j] into the sorted sequence A[1 . . j-1]  0      n - 1
       i ← j - 1                                            c4     n - 1
       while i > 0 and A[i] > key                           c5     Σ_{j=2..n} t_j
         do A[i + 1] ← A[i]                                 c6     Σ_{j=2..n} (t_j - 1)
            i ← i - 1                                       c7     Σ_{j=2..n} (t_j - 1)
       A[i + 1] ← key                                       c8     n - 1

t_j: number of times the while-loop test is executed at iteration j

T(n) = c1·n + c2(n - 1) + c4(n - 1) + c5 Σ_{j=2..n} t_j
       + c6 Σ_{j=2..n} (t_j - 1) + c7 Σ_{j=2..n} (t_j - 1) + c8(n - 1)
Best Case Analysis
• The array is already sorted “while i > 0 and A[i] > key”
• A[i] ≤ key upon the first time the while loop test is run
(when i = j -1)
• t_j = 1, so Σ_{j=2..n} t_j = n - 1 and Σ_{j=2..n} (t_j - 1) = 0
• T(n) = c1·n + (c2 + c4 + c5 + c8)(n - 1) = an + b = Θ(n)
Worst Case Analysis
• The array is in reverse sorted order: "while i > 0 and A[i] > key"
• Always A[i] > key in the while loop test
• Have to compare key with all elements to the left of the j-th position:
compare with j - 1 elements, plus the final failing test, so t_j = j

Using Σ_{j=1..n} j = n(n + 1)/2, so that Σ_{j=2..n} j = n(n + 1)/2 - 1
and Σ_{j=2..n} (j - 1) = n(n - 1)/2, we have:

T(n) = c1·n + c2(n - 1) + c4(n - 1) + c5(n(n + 1)/2 - 1)
       + c6(n(n - 1)/2) + c7(n(n - 1)/2) + c8(n - 1)
     = an² + bn + c, a quadratic function of n
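The best-case and worst-case values of t_j can be checked empirically. This Python sketch (names are mine) counts executions of the while-loop test; for n = 8 the totals should be Σ t_j = n - 1 = 7 on sorted input and Σ t_j = n(n + 1)/2 - 1 = 35 on reversed input:

```python
def insertion_sort_test_count(A):
    """Insertion sort; returns the number of times the
    while-loop test is executed (the sum of t_j over all j)."""
    tests = 0
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while True:
            tests += 1                 # one execution of the while test
            if i >= 0 and A[i] > key:
                A[i + 1] = A[i]        # shift right, keep scanning left
                i -= 1
            else:
                break
        A[i + 1] = key
    return tests

n = 8
best = insertion_sort_test_count(list(range(n)))           # sorted: t_j = 1
worst = insertion_sort_test_count(list(range(n, 0, -1)))   # reversed: t_j = j
print(best, worst)   # 7 35
```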
Insertion Sort
1. Does it work?
2. Is it fast?
Plucky the Pedantic Penguin
Claim: InsertionSort “works”
• “Proof:” It just worked in this example:
6 4 3 8 5
4 6 3 8 5
3 4 6 8 5
3 4 6 8 5
3 4 5 6 8   Sorted!
Claim: InsertionSort “works”
• “Proof:” I did it on a bunch of random lists and it
always worked:
What does it mean to “work”?
• Is it enough to be correct on only one input?
• Is it enough to be correct on most inputs?
Worst-case analysis
Think of it like a game between an algorithm designer and an adversary:

Algorithm designer: "Here is my algorithm!"
    Algorithm:
        Do the thing
        Do the stuff
        Return the answer

Adversary: "Here is an input! (Which I designed to be terrible
for your algorithm!)"

Worst-case analysis guarantee: the algorithm should work (and be
fast) even on that worst-case input.
The Plan
• InsertionSort recap
• Worst-case Analysis
• Back to InsertionSort: Does it work?
• Asymptotic Analysis
• Back to InsertionSort: Is it fast?
• MergeSort
• Does it work?
• Is it fast?
In this class we will use…
• Big-Oh notation!
• Gives us a meaningful way to talk about the
running time of an algorithm, independent of
programming language, computing platform, etc.,
without having to count all the operations.
Main idea:
Focus on how the runtime scales with n (the input size).
(Plot: number of operations / asymptotic running time in ms, as a function of n.)
pronounced “big-oh of …” or sometimes “oh of …”
• We say "T(n) is O(g(n))" if:
for large enough n,
T(n) is at most some constant multiple of g(n).
(Plot: T(n) = 2n² + 10 and g(n) = n².)
Example
For large enough n, T(n) is at most some constant multiple of g(n):
(Plot: T(n) = 2n² + 10, g(n) = n², and 3g(n) = 3n²;
for all n ≥ n0 = 4, T(n) lies below 3g(n).)
Formal definition of O(…)
• Let T(n), g(n) be functions of positive integers.
• Think of T(n) as a runtime: positive and increasing in n.
• Formally,
T(n) = O(g(n))  ⟺  there exists c > 0 and n0 > 0 ("there exists")
such that, for all n ≥ n0, T(n) ≤ c · g(n) ("such that")
Example
Formally:
• Choose c = 3
• Choose n0 = 4
• Then: for all n ≥ 4, T(n) = 2n² + 10 ≤ 3n² = c · g(n)
(Plot: T(n) = 2n² + 10, g(n) = n², and 3g(n) = 3n², with n0 = 4 marked.)
Same example
Formally:
• Choose c = 7
• Choose n0 = 2
• Then: for all n ≥ 2, T(n) = 2n² + 10 ≤ 7n² = c · g(n)
(Plot: T(n) = 2n² + 10, g(n) = n², and 7g(n) = 7n², with n0 = 2 marked.)
There is not a "correct" choice of c and n0!
Take-away from examples
• To prove T(n) = O(g(n)), you have to come up with c
and n0 so that the definition is satisfied.
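For the running example T(n) = 2n² + 10 and g(n) = n², both witness pairs from the slides can be checked numerically. This only spot-checks a finite range; the algebra 2n² + 10 ≤ 3n² ⟺ n² ≥ 10 covers all n ≥ 4:

```python
def T(n):
    return 2 * n**2 + 10

def g(n):
    return n**2

c, n0 = 3, 4
# The definition requires T(n) <= c * g(n) for every n >= n0;
# here we spot-check it up to n = 1000.
ok_3_4 = all(T(n) <= c * g(n) for n in range(n0, 1001))
# The other witness pair from the slides (c = 7, n0 = 2) works too:
ok_7_2 = all(T(n) <= 7 * g(n) for n in range(2, 1001))
print(ok_3_4, ok_7_2)   # True True
```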
def InsertionSort(A):
    for i in range(1, len(A)):            # n-1 iterations of the outer loop
        current = A[i]
        j = i - 1
        while j >= 0 and A[j] > current:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = current

Can we do better?
The Plan
• InsertionSort recap
• Worst-case analysis
• Back to InsertionSort: Does it work?
• Asymptotic Analysis
• Back to InsertionSort: Is it fast?
• MergeSort
• Does it work?
• Is it fast?
Can we do better?
• MergeSort: a divide-and-conquer approach
• Recall from last time:
Divide and Conquer: split the big problem into two smaller
problems, recurse on each, then combine the results.
6 4 3 8 1 5 2 7
Recursive magic!  →  3 4 6 8   and   1 2 5 7
MERGE!  →  1 2 3 4 5 6 7 8
MergeSort Pseudocode
MERGESORT(A):
• n = length(A)
• if n ≤ 1:                      If A has length at most 1,
                                 it is already sorted!
• return A
• L = MERGESORT(A[0 : n/2])      Sort the left half
• R = MERGESORT(A[n/2 : n])      Sort the right half
• return MERGE(L, R)             Merge the two halves
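In Python the pseudocode looks like this. The deck's own MERGE code is in the Lecture2 notebook; the merge below is a standard sketch of that step, not necessarily the course's exact version:

```python
def merge(L, R):
    """Merge two sorted lists into one sorted list, O(len(L) + len(R))."""
    out, i, j = [], 0, 0
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            out.append(L[i]); i += 1
        else:
            out.append(R[j]); j += 1
    out.extend(L[i:])    # at most one of these
    out.extend(R[j:])    # still has elements left
    return out

def merge_sort(A):
    n = len(A)
    if n <= 1:                       # length 0 or 1: already sorted
        return A
    L = merge_sort(A[: n // 2])      # sort the left half
    R = merge_sort(A[n // 2 :])      # sort the right half
    return merge(L, R)               # merge the two halves

print(merge_sort([6, 4, 3, 8, 1, 5, 2, 7]))   # [1, 2, 3, 4, 5, 6, 7, 8]
```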
Two questions
Empirically:
1. Seems to work.
2. Seems fast.
Assume that n is a power of 2
for convenience.
It’s fast
CLAIM:
MergeSort runs in time O(n log(n))
O(n²) vs. O(n log(n))?
All logarithms in this course are base 2
Aside:
Halve 5 times
⇒
64, 32, 16, 8, 4, 2, 1
log(64) = 6
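The aside is easy to check in code: counting how many times 64 can be halved before reaching 1 gives exactly log₂(64) = 6:

```python
import math

n, halvings = 64, 0
while n > 1:
    n //= 2          # 64 → 32 → 16 → 8 → 4 → 2 → 1
    halvings += 1
print(halvings, math.log2(64))   # 6 6.0
```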
Assume that n is a power of 2
for convenience.
Now let’s prove the claim
CLAIM:
MergeSort runs in time O(n log(n))
Let’s prove the claim
Size n          Level 0
…
(Size 1)        Level log(n)
How much work in this sub-problem?
A problem of size n/2^t splits into two sub-problems of size n/2^(t+1).
Work = time spent on the MERGE step
     + time spent within the two sub-problems.
How much work in this sub-problem?
Let k = n/2^t: a problem of size k splits into two sub-problems of size k/2.
Work = time spent on the MERGE step
     + time spent within the two sub-problems.
How long does it take to MERGE?
(a sub-problem of size k, with two sorted halves of size k/2)
Code for the MERGE step is given in the Lecture2 notebook.
3 4 6 8    1 2 5 7
MERGE!  →  1 2 3 4 5 6 7 8
Merging two sorted lists of length k/2 takes O(k) operations.
Recursion tree
Size n
n/2    n/2
…
n/2^t  n/2^t  n/2^t  n/2^t  n/2^t  n/2^t
…
(Size 1)
There are O(k) operations done at a node of size k
(which then recurses on two sub-problems of size k/2).
Recursion tree: Think, Pair, Share!
How much work is done at this level? And at every level?
(Recall: there are O(k) operations done at a node of size k.)
Recursion tree

Level     # problems    Size of each problem    Amount of work at this level
0         1             n                       O(n)
1         2             n/2                     O(n)
2         4             n/4                     O(n)
…         …             …                       …
t         2^t           n/2^t                   O(n)
…         …             …                       …
log(n)    n             1                       O(n)

Explanation for this table done on the board!
Total runtime…
• O(n) work at each level
• log(n) + 1 levels
• O( n log(n) ) total!
The Plan
• InsertionSort recap
• Worst-case analysis
• Back to InsertionSort: Does it work?
• Asymptotic Analysis
• Back to InsertionSort: Is it fast?
• MergeSort
• Does it work?
• Is it fast?
Wrap-Up
Recap
• InsertionSort runs in time O(n2)
• MergeSort is a divide-and-conquer algorithm that
runs in time O(n log(n))