Average-Case Analysis of Algorithms + Randomized Algorithms
Ruzzo
insertion sort
Array A[0]..A[n-1]; invariant: A[0..i-1] sorted, A[i..n-1] unsorted

for i = 1..n-1 {
    T = A[i]                      // next element to insert
    j = i-1
    while j >= 0 && T < A[j] {    // "compare"
        A[j+1] = A[j]             // shift right (or full "swap": also A[j] = T)
        j = j-1
    }
    A[j+1] = T                    // place T (redundant if swapping above)
}
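
The same algorithm in runnable form (a minimal Python sketch; the function name insertion_sort is ours, not from the slides):

    def insertion_sort(A):
        """Sort list A in place; A[0..i-1] stays sorted as i advances."""
        for i in range(1, len(A)):
            T = A[i]              # next element to insert
            j = i - 1
            while j >= 0 and T < A[j]:
                A[j + 1] = A[j]   # shift larger element right
                j -= 1
            A[j + 1] = T          # drop T into the opened slot

    A = [3, 5, 1, 4, 2]
    insertion_sort(A)
    print(A)  # [1, 2, 3, 4, 5]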
insertion sort
Run Time
Worst Case: O(n²)  (~n² swaps; #compares = #swaps + n - 1)
"Average Case"
What's an "average" input? One idea (and about the only one that is analytically tractable): assume all n! permutations of the input are equally likely.
permutations & inversions
A permutation π = (π_1, π_2, ..., π_n) of 1, ..., n is simply a list of the numbers between 1 and n, in some order.
(i,j) is an inversion in π if i < j but π_i > π_j (G. Cramer, 1750).
E.g., π = (3,5,1,4,2) has six inversions: (1,3), (1,5), (2,3), (2,4), (2,5), and (4,5)
Min possible: 0, e.g. π = (1,2,3,4,5)
Max possible: (n choose 2) = n(n-1)/2, e.g. π = (5,4,3,2,1)
Obviously, the goal of sorting is to remove inversions.
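
A brute-force counter straight from the definition (a Python sketch, O(n²); pairs are reported 1-based to match the example above):

    from itertools import combinations

    def inversions(pi):
        """All pairs (i, j), 1-based, with i < j but pi[i] > pi[j]."""
        return [(i + 1, j + 1)
                for i, j in combinations(range(len(pi)), 2)
                if pi[i] > pi[j]]

    print(inversions([3, 5, 1, 4, 2]))
    # [(1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (4, 5)] -- six inversions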
inversions & insertion sort
Swapping an adjacent out-of-order pair decreases the number of inversions by exactly 1: the swap changes the relative order of that one pair and of no other pair.
So the number of swaps performed by insertion sort is exactly the number of inversions present in the input (quick check below).
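
A self-contained empirical check of the swaps-equal-inversions claim (a sketch; random trials, not a proof):

    import random
    from itertools import combinations

    def swap_count(A):
        """Insertion-sort a copy of A, returning the number of shifts ("swaps")."""
        A, swaps = list(A), 0
        for i in range(1, len(A)):
            T, j = A[i], i - 1
            while j >= 0 and T < A[j]:
                A[j + 1] = A[j]
                j -= 1
                swaps += 1
            A[j + 1] = T
        return swaps

    for _ in range(1000):
        pi = random.sample(range(10), 10)
        inv = sum(1 for i, j in combinations(range(10), 2) if pi[i] > pi[j])
        assert swap_count(pi) == inv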
Counting them:
counting inversions
There is a 1-1 correspondence between permutations having inversion (i,j) versus not: pair each permutation with the one obtained by exchanging the values in positions i and j, which toggles whether (i,j) is an inversion.
So: exactly half of the n! permutations have any fixed inversion (i,j), and by linearity of expectation a random permutation has (1/2)·(n choose 2) = n(n-1)/4 inversions on average.
Hence insertion sort does ~n²/4 swaps on average: half the worst case, but still Θ(n²).
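
A small simulation of that expectation (a sketch; n and the trial count are arbitrary):

    import random
    from itertools import combinations

    n, trials = 8, 20000
    total = 0
    for _ in range(trials):
        pi = random.sample(range(n), n)
        total += sum(1 for i, j in combinations(range(n), 2) if pi[i] > pi[j])

    print(total / trials)   # empirical average, close to 14.0
    print(n * (n - 1) / 4)  # predicted n(n-1)/4 = 14.0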
quicksort run-time
Worst case: already-sorted input (among others):
    T(n) = n + T(n-1) = n + (n-1) + (n-2) + ... + 1 = n(n+1)/2
Best case: pivot is always the median: ~n log₂ n
Average case: analyzed below; will turn out to be ~40% slower than best.
Why? Random pivots are "near the middle on average."
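
To see the worst case concretely: a quicksort sketch pivoting on the first element (our own out-of-place partition, for brevity), counting comparisons on already-sorted input:

    def quicksort_first_pivot(A):
        """Quicksort, pivoting on A[0].
        Returns (sorted copy, number of comparisons)."""
        if len(A) <= 1:
            return list(A), 0
        pivot, rest = A[0], A[1:]
        left  = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        ls, lc = quicksort_first_pivot(left)
        rs, rc = quicksort_first_pivot(right)
        return ls + [pivot] + rs, len(rest) + lc + rc

    _, comps = quicksort_first_pivot(list(range(100)))
    print(comps)  # 4950 = 100*99/2: sorted input triggers quadratic behavior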
average-case analysis
Assume the input is a random permutation of 1, ..., n, i.e., that all n! permutations are equally likely.
Important subtlety: the pivots at all recursive levels will then be random, too (unless you do something funky in the partition phase).
Let C(N) be the expected number of comparisons on a random permutation of size N, with C(0) = C(1) = 0. Partitioning costs N comparisons; then we recurse on the two sides, whose sizes are determined by the pivot's rank k:

    C(N) = N + (1/N) Σ_{k=1}^{N} [ C(k-1) + C(N-k) ]

1/N because all values 1 ≤ k ≤ N for the pivot are equally likely. The two terms in the sum run over the same values, so

    C(N) = N + (2/N) Σ_{k=0}^{N-1} C(k)

Multiply by N; subtract the same equation for N-1 to kill the sum:

    N·C(N) - (N-1)·C(N-1) = N² - (N-1)² + 2·C(N-1) = 2N - 1 + 2·C(N-1)

Rearrange:

    N·C(N) = (N+1)·C(N-1) + 2N - 1

Divide by N(N+1) and telescope (dropping lower-order terms):

    C(N)/(N+1) = C(N-1)/N + (2N-1)/(N(N+1)) ≈ C(N-1)/N + 2/(N+1)
    C(N)/(N+1) ≈ 2 Σ_{k=3}^{N+1} 1/k ≈ 2 ln N

So C(N) ≈ 2N ln N ≈ 1.39 N log₂ N.
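
The recurrence is easy to evaluate exactly, which gives a numeric check on the estimate (a sketch; the gap to 2N ln N is a lower-order Θ(N) term):

    import math

    N = 1000
    C = [0.0] * (N + 1)  # C[0] = C[1] = 0
    prefix = 0.0         # running sum C[0] + ... + C[n-1]
    for n in range(2, N + 1):
        prefix += C[n - 1]
        C[n] = n + 2.0 * prefix / n  # C(n) = n + (2/n)·Σ_{k<n} C(k)

    print(C[N])                 # exact expectation, about 11650
    print(2 * N * math.log(N))  # 2N ln N estimate, about 13816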
Notes
So the average run time, averaging over randomly ordered inputs, is Θ(n log n).
another idea: randomize the algorithm
Algorithm as before, except the pivot is a randomly selected element of A[1]..A[n] (at the top level; of A[i]..A[j] for subproblem i..j).
The analysis is the same, but the conclusion is different:
on any fixed input, the average run time is Θ(n log n), averaged over repeated (random) runs of the algorithm.
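
A sketch of that change (random pivot via Python's random.choice; out-of-place partition for brevity, not necessarily the in-place version meant above):

    import random

    def rquicksort(A):
        """Quicksort with a uniformly random pivot; returns a sorted copy."""
        if len(A) <= 1:
            return list(A)
        pivot = random.choice(A)
        left   = [x for x in A if x < pivot]
        middle = [x for x in A if x == pivot]
        right  = [x for x in A if x > pivot]
        return rquicksort(left) + middle + rquicksort(right)

    print(rquicksort([3, 5, 1, 4, 2]))  # [1, 2, 3, 4, 5]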
summary
Average Case Analysis:
1. for algorithm A, choose a sample space S and probability distribution P from which inputs are drawn
2. for x ∈ S, let T(x) be the time taken by A on input x
3. calculate, as a function of the "size," n, of inputs,
       Σ_{x∈S} T(x)·P(x)
   which is the expected or average run time of A
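
For insertion sort this sum can be computed exhaustively for small n, using swaps as the time proxy (a sketch; each permutation has P(x) = 1/n!):

    from itertools import permutations
    from math import factorial

    def swaps(A):
        """Number of shifts insertion sort makes on a copy of A."""
        A, count = list(A), 0
        for i in range(1, len(A)):
            T, j = A[i], i - 1
            while j >= 0 and T < A[j]:
                A[j + 1] = A[j]
                j -= 1
                count += 1
            A[j + 1] = T
        return count

    n = 6
    avg = sum(swaps(p) for p in permutations(range(n))) / factorial(n)
    print(avg)              # 7.5
    print(n * (n - 1) / 4)  # n(n-1)/4 = 7.5, matching the earlier derivation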
summary
Randomized Algorithms:
1. for a randomized algorithm A, input x is fixed, just as usual, from some space I of possible inputs, but the algorithm may draw (and use) random samples y = (y₁, y₂, ...) from a given sample space S and probability distribution P
2. for any x ∈ I and any y ∈ S, let T(x,y) be the time taken by A on input x when y is sampled from S
3. calculate, as a function of the "size," n, of inputs,
       max_{x∈I} Σ_{y∈S} T(x,y)·P(y)
   which is the expected or average run time of A on a worst-case input
Randomized Quicksort: choosing pivots at random, E[time] = Θ(n log n) for any input.
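
A quick empirical check of that claim on one fixed "bad" input, sorted order (a worst case for deterministic first-element pivoting), using a comparison-counting random-pivot quicksort (our own sketch):

    import math
    import random

    def rqs_comps(A):
        """Random-pivot quicksort; returns the number of comparisons used."""
        if len(A) <= 1:
            return 0
        pivot = random.choice(A)
        left  = [x for x in A if x < pivot]
        right = [x for x in A if x > pivot]
        return len(A) - 1 + rqs_comps(left) + rqs_comps(right)

    n, runs = 1000, 200
    avg = sum(rqs_comps(list(range(n))) for _ in range(runs)) / runs
    print(avg)                  # about 11000: Θ(n log n), not ~n²/2
    print(2 * n * math.log(n))  # leading-order estimate, about 13816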
critique
Worst-case analysis is much more common than average-case analysis because:
- it's often easier
- to get meaningful average-case results, a reasonable probability model for "typical inputs" is critical, but may be unavailable or difficult to analyze
- as with insertion sort, the results are often similar anyway
But in some important examples, such as quicksort, the average case is sharply better.
Randomized algorithms are very important in many areas; it is sometimes easier to argue that bad stuff is rare than to deterministically circumvent it. (Fascinating open problem: is this intrinsic?)