Algorithms for Data Science
CSOR W4246
Eleni Drinea
Computer Science Department
Columbia University
Thursday, September 10, 2015
Outline
1 Asymptotic notation
2 The divide & conquer principle; application: mergesort
3 Solving recurrences and running time of mergesort
Review of the last lecture
- Introduced the problem of sorting.
- Analyzed insertion-sort.
  - Worst-case running time: T(n) = O(n^2)
  - Space: in-place algorithm
- Worst-case running time analysis: a reasonable measure of algorithmic efficiency.
- Defined polynomial-time algorithms as efficient.
- Argued that detailed characterizations of running times are not convenient for understanding scalability of algorithms.
Running time in terms of # primitive steps
We need a coarser classification of running times of algorithms; exact characterizations
- are too detailed;
- do not reveal similarities between running times in an immediate way as n grows large;
- are often meaningless: pseudocode steps will expand by a constant factor that depends on the hardware.
Today
1 Asymptotic notation
2 The divide & conquer principle; application: mergesort
3 Solving recurrences and running time of mergesort
Asymptotic analysis
A framework that will allow us to compare the rate of growth of different running times as the input size n grows.
- We will express the running time as a function of the number of primitive steps. The number of primitive steps is itself a function of the size of the input n. Hence the running time is a function of the size of the input n.
- To compare functions expressing running times, we will ignore their low-order terms and focus solely on the highest-order term.
Asymptotic upper bounds: Big-O notation
Definition 1 (O).
We say that T(n) = O(f(n)) if there exist constants c > 0 and n0 ≥ 0 s.t. for all n ≥ n0, we have T(n) ≤ c·f(n).
[Figure: T(n) = O(f(n)); for n ≥ n0, the curve T(n) lies below c·f(n).]
Examples:
- T(n) = an^2 + b, a, b > 0 constants, and f(n) = n^2
- T(n) = an^2 + b and f(n) = n^3
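The first example can be sanity-checked numerically: with witness constants c = a + b and n0 = 1, we have an^2 + b ≤ c·n^2 for all n ≥ 1. A minimal sketch (the values a = 3, b = 5 are arbitrary choices for illustration):

```python
# Witness constants for a*n^2 + b = O(n^2): take c = a + b and n0 = 1.
# Then for n >= 1: a*n^2 + b <= a*n^2 + b*n^2 = (a + b) * n^2.
a, b = 3, 5          # arbitrary positive constants, for illustration only
c, n0 = a + b, 1     # witness constants from the definition of O

for n in range(n0, 10**4):
    T = a * n * n + b
    assert T <= c * n * n, (n, T)

print("a*n^2 + b <= (a+b)*n^2 for all tested n >= 1")
```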
Asymptotic lower bounds: Big-Ω notation
Definition 3 (Ω).
We say that T(n) = Ω(f(n)) if there exist constants c > 0 and n0 ≥ 0 s.t. for all n ≥ n0, we have T(n) ≥ c·f(n).
[Figure: T(n) = Ω(f(n)); for n ≥ n0, the curve T(n) lies above c·f(n).]
Examples:
- T(n) = an^2 + b, a, b > 0 constants, and f(n) = n^2
- T(n) = an^2 + b, a, b > 0 constants, and f(n) = n
Asymptotic tight bounds: Θ notation
Definition 5 (Θ).
We say that T(n) = Θ(f(n)) if there exist constants c1, c2 > 0 and n0 ≥ 0 s.t. for all n ≥ n0, we have
c1·f(n) ≤ T(n) ≤ c2·f(n).
[Figure: T(n) = Θ(f(n)); for n ≥ n0, T(n) lies between c1·f(n) and c2·f(n).]
Equivalent definition: T(n) = Θ(f(n)) if T(n) = O(f(n)) and T(n) = Ω(f(n)).
Examples:
- T(n) = an^2 + b, a, b > 0 constants, and f(n) = n^2
- T(n) = n log n + n, and f(n) = n log n
Asymptotic upper bounds that are not tight: little-o
Definition 7 (o).
We say that T(n) = o(f(n)) if for any constant c > 0 there exists a constant n0 ≥ 0 s.t. for all n ≥ n0, we have T(n) < c·f(n).
- Intuitively, T(n) becomes insignificant relative to f(n) as n → ∞.
- Proof by showing that lim_{n→∞} T(n)/f(n) = 0 (if the limit exists).
Examples:
- T(n) = an^2 + b, a, b > 0 constants, and f(n) = n^3
- T(n) = n log n and f(n) = n^2
Asymptotic lower bounds that are not tight: little-ω
Definition 8 (ω).
We say that T(n) = ω(f(n)) if for any constant c > 0 there exists n0 ≥ 0 s.t. for all n ≥ n0, we have T(n) > c·f(n).
- Intuitively, T(n) becomes arbitrarily large relative to f(n) as n → ∞.
- T(n) = ω(f(n)) implies that lim_{n→∞} T(n)/f(n) = ∞, if the limit exists. Then f(n) = o(T(n)).
Examples:
- T(n) = n^2 and f(n) = n log n
- T(n) = 2^n and f(n) = n^5
Basic rules for omitting low order terms from functions
1. Ignore multiplicative factors: e.g., 10n^3 becomes n^3
2. n^a dominates n^b if a > b: e.g., n^2 dominates n
3. Exponentials dominate polynomials: e.g., 2^n dominates n^4
4. Polynomials dominate logarithms: e.g., n dominates log^3 n
For large enough n,
log n < n < n log n < n^2 < 2^n < 3^n < n^n
Notation: log n stands for log_2 n
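The chain above can be verified at a concrete point (a sketch; n = 32 is an arbitrary choice, and some inequalities in the chain require n to be large enough):

```python
import math

n = 32  # an arbitrary "large enough" n, for illustration
values = [
    ("log n",   math.log2(n)),
    ("n",       n),
    ("n log n", n * math.log2(n)),
    ("n^2",     n ** 2),
    ("2^n",     2 ** n),
    ("3^n",     3 ** n),
    ("n^n",     n ** n),
]

# Verify log n < n < n log n < n^2 < 2^n < 3^n < n^n at n = 32.
for (name1, v1), (name2, v2) in zip(values, values[1:]):
    assert v1 < v2, (name1, name2)
print("ordering holds at n =", n)
```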
Properties of asymptotic growth rates
Transitivity
1. If f = O(g) and g = O(h) then f = O(h).
2. If f = Ω(g) and g = Ω(h) then f = Ω(h).
3. If f = Θ(g) and g = Θ(h) then f = Θ(h).
Sums of (up to a constant number of) functions
1. If f = O(h) and g = O(h) then f + g = O(h).
2. Let k be a fixed constant, and let f1, f2, ..., fk, h be functions s.t. for all i, fi = O(h). Then f1 + f2 + ... + fk = O(h).
Transpose symmetry
- f = O(g) if and only if g = Ω(f).
- f = o(g) if and only if g = ω(f).
Today
1 Asymptotic notation
2 The divide & conquer principle; application: mergesort
3 Solving recurrences and running time of mergesort
The divide & conquer principle
Divide the problem into a number of subproblems that are
smaller instances of the same problem.
Conquer the subproblems by solving them recursively.
Combine the solutions to the subproblems into the
solution for the original problem.
Divide & Conquer applied to sorting
Divide the problem into a number of subproblems that are smaller instances of the same problem.
  Divide the input array into two lists of equal size.
Conquer the subproblems by solving them recursively.
  Sort each list recursively. (Stop when lists have size 1.)
Combine the solutions to the subproblems into the solution for the original problem.
  Merge the two sorted lists and output the sorted array.
Mergesort: pseudocode
Mergesort(A, left, right)
  if right == left then return
  end if
  middle = left + ⌊(right - left)/2⌋
  Mergesort(A, left, middle)
  Mergesort(A, middle + 1, right)
  Merge(A, left, middle, right)

Remarks
- Mergesort is a recursive procedure (why?)
- Initial call: Mergesort(A, 1, n)
- Subroutine Merge merges two sorted lists of sizes ⌊n/2⌋, ⌈n/2⌉ into one sorted list of size n. How can we accomplish this?
Merge: intuition
Intuition: To merge two sorted lists of size n/2, repeatedly
- compare the two items at the front of the two lists;
- extract the smaller item and append it to the output;
- update the front of the list from which the item was extracted.
Example: n = 8, L = {1, 3, 5, 7}, R = {2, 6, 8, 10}
Merge: pseudocode
Merge(A, left, mid, right)
  L = A[left..mid]
  R = A[mid+1..right]
  Maintain two pointers CurrentL, CurrentR initialized to point to the first element of L, R
  while both lists are nonempty do
    Let x, y be the elements pointed to by CurrentL, CurrentR
    Compare x, y and append the smaller to the output
    Advance the pointer in the list with the smaller of x, y
  end while
  Append the remainder of the non-empty list to the output.

Remark: the output is stored directly in A[left..right], thus the subarray A[left..right] is sorted after Merge(A, left, mid, right).
Merge: optional exercises
Exercise 1: write detailed pseudocode (or Python code) for Merge
Exercise 2: write a recursive Merge
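One way Exercise 1 could be solved in Python (a sketch, not the official solution; indices here are 0-based and inclusive, unlike the 1-based pseudocode):

```python
def merge(A, left, mid, right):
    """Merge sorted subarrays A[left..mid] and A[mid+1..right] in place.

    Indices are 0-based and inclusive. Uses extra O(n) space for the
    copies L and R; the merged output is written back into A[left..right].
    """
    L = A[left:mid + 1]        # copy of the left sorted half
    R = A[mid + 1:right + 1]   # copy of the right sorted half
    i = j = 0                  # the CurrentL, CurrentR pointers
    k = left                   # next write position in A
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:       # <= keeps the merge stable
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1
        k += 1
    # Append the remainder of the non-empty list to the output.
    A[k:right + 1] = L[i:] + R[j:]

A = [1, 3, 5, 7, 2, 6, 8, 10]  # the two sorted halves from the example
merge(A, 0, 3, 7)
print(A)  # [1, 2, 3, 5, 6, 7, 8, 10]
```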
Analysis of Merge
1. Correctness
2. Running time
3. Space
Analysis of Merge: correctness
1. Correctness: the smaller number in the input is L[1] or
R[1] and it will be the first number in the output. The rest
of the output is just the list obtained by Merge(L, R) after
deleting the smallest element.
2. Running time
3. Space
Merge: pseudocode, revisited
Note that the steps L = A[left..mid] and R = A[mid+1..right] are not primitive computational steps: each copies an entire subarray, one element at a time.
Analysis of Merge: running time
2. Running time:
   - Suppose L, R have n/2 elements each.
   - How many iterations before all elements from both lists have been appended to the output?
   - How much work within each iteration?
Analysis of Merge: space
2. Running time:
   - L, R have n/2 elements each.
   - At most n - 1 iterations before all elements from both lists have been appended to the output.
   - Constant work within each iteration.
   Hence Merge takes O(n) time to merge L, R (why?).
3. Space: extra Θ(n) space to store L, R (the sorted output is stored directly in A).
Example of Mergesort
Input: 1, 7, 4, 3, 5, 8, 6, 2
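The recursion on this input can be sketched in Python (0-based inclusive indices; the compact merge here buffers both halves and writes the result back into A):

```python
def mergesort(A, left, right):
    """Sort A[left..right] in place (0-based, inclusive indices)."""
    if right <= left:
        return
    middle = left + (right - left) // 2
    mergesort(A, left, middle)       # sort the left half
    mergesort(A, middle + 1, right)  # sort the right half
    merge(A, left, middle, right)    # combine the two sorted halves

def merge(A, left, mid, right):
    """Merge the sorted halves A[left..mid] and A[mid+1..right]."""
    L, R = A[left:mid + 1], A[mid + 1:right + 1]
    i = j = 0
    for k in range(left, right + 1):
        # Take from L while its front element exists and is no larger.
        if j >= len(R) or (i < len(L) and L[i] <= R[j]):
            A[k] = L[i]; i += 1
        else:
            A[k] = R[j]; j += 1

A = [1, 7, 4, 3, 5, 8, 6, 2]  # the example input above
mergesort(A, 0, len(A) - 1)
print(A)  # [1, 2, 3, 4, 5, 6, 7, 8]
```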
Analysis of Mergesort
1. Correctness
2. Running time
3. Space
Mergesort: correctness
For simplicity, assume n = 2^k for integer k ≥ 0. We will use induction on k.
- Base case: For k = 0, the input consists of n = 1 item; Mergesort returns the item.
- Induction Hypothesis: For k > 0, assume that Mergesort correctly sorts any list of size 2^k.
- Induction Step: We will show that Mergesort correctly sorts any list of size 2^(k+1).
  - The input list is split into two lists, each of size 2^k.
  - Mergesort recursively calls itself on each list. By the hypothesis, when the subroutines return, each list is sorted.
  - Since Merge is correct, it will merge these two sorted lists into one sorted output list of size 2·2^k.
- Thus Mergesort correctly sorts any input of size 2^(k+1).
Running time of Mergesort
The running time of Mergesort satisfies:
T(n) = 2T(n/2) + cn, for n ≥ 2, constant c > 0
T(1) = c
This structure is typical of recurrence relations:
- an inequality or equation bounds T(n) in terms of an expression involving T(m) for m < n
- a base case generally says that T(n) is constant for small constant n
Remarks
- We ignore floor and ceiling notations.
- A recurrence does not provide an asymptotic bound for T(n): to this end, we must solve the recurrence.
Today
1 Asymptotic notation
2 The divide & conquer principle; application: mergesort
3 Solving recurrences and running time of mergesort
Solving recurrences, method 1: recursion trees
The technique consists of three steps
1. Analyze the first few levels of the tree of recursive calls
2. Identify a pattern
3. Sum over all levels of recursion
Example: analysis of running time of Mergesort
T(n) = 2T(n/2) + cn, n ≥ 2
T(1) = c
A general recurrence and its solution
The running times of many recursive algorithms can be expressed by the following recurrence:
T(n) = aT(n/b) + cn^k, for a, c > 0, b > 1, k ≥ 0
What is the recursion tree for this recurrence?
- a is the branching factor
- b is the factor by which the size of each subproblem shrinks
- at level i, there are a^i subproblems, each of size n/b^i
- each subproblem at level i requires c(n/b^i)^k work
- the height of the tree is log_b n levels
Total work:
∑_{i=0}^{log_b n} a^i · c(n/b^i)^k = cn^k · ∑_{i=0}^{log_b n} (a/b^k)^i
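As a sanity check, the level-by-level total can be compared against the recurrence itself for the Mergesort parameters a = b = 2, k = 1 (a sketch with c = 1, an arbitrary choice):

```python
import math

def T(n, a=2, b=2, c=1, k=1):
    """Unroll T(n) = a*T(n/b) + c*n^k with T(1) = c, for n a power of b."""
    if n == 1:
        return c
    return a * T(n // b) + c * n ** k

def level_sum(n, a=2, b=2, c=1, k=1):
    """Sum the work over the recursion-tree levels: sum_i a^i * c * (n/b^i)^k."""
    levels = round(math.log(n, b))  # exact when n is a power of b
    return sum(a ** i * c * (n // b ** i) ** k for i in range(levels + 1))

# The recurrence and the recursion-tree sum agree for powers of 2.
for m in range(1, 12):
    n = 2 ** m
    assert T(n) == level_sum(n), n
print("recurrence total matches the recursion-tree sum up to n = 2^11")
```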
Solving recurrences, method 2: Master theorem
Theorem 9 (Master theorem).
If T(n) = aT(⌈n/b⌉) + O(n^k) for some constants a > 0, b > 1, k ≥ 0, then
         O(n^(log_b a)),  if a > b^k
T(n) =   O(n^k log n),    if a = b^k
         O(n^k),          if a < b^k
Example: running time of Mergesort
- T(n) = 2T(n/2) + cn: a = 2, b = 2, k = 1, b^k = 2 = a, hence T(n) = O(n log n)
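The three cases can be packaged into a small helper and checked against the Mergesort parameters (a sketch; the function name and string output are illustrative choices, not course notation):

```python
import math

def master_bound(a, b, k):
    """Return the Master-theorem bound for T(n) = a*T(n/b) + O(n^k) as a string."""
    if a > b ** k:
        return f"O(n^{math.log(a, b):.3g})"  # case a > b^k: O(n^(log_b a))
    if a == b ** k:
        return f"O(n^{k} log n)"             # case a = b^k
    return f"O(n^{k})"                       # case a < b^k

print(master_bound(2, 2, 1))  # Mergesort: a = b^k, so O(n^1 log n)
print(master_bound(4, 2, 1))  # a > b^k: O(n^2), i.e., n^(log_2 4)
print(master_bound(3, 2, 2))  # a < b^k: O(n^2)
```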
Solving recurrences, method 3: the substitution method
The technique consists of two steps
1. Guess a bound
2. Use (strong) induction to prove that the guess is correct
Remark 1 (simple vs strong induction).
1. Simple induction: the induction step at n requires that the inductive hypothesis holds at step n - 1.
2. Strong induction: the induction step at n requires that the inductive hypothesis holds at all steps 1, 2, ..., n - 1.
Exercise: show inductively that Mergesort runs in time
O(n log n).
What about...
1. T(n) = 2T(n - 1) + 1, T(1) = 2
2. T(n) = 2T^2(n - 1), T(1) = 4
3. T(n) = T(2n/3) + T(n/3) + cn