Heap Sort
Algorithms
(2.3,4.1,4.2,6.1,6.2)
Introduction to heapsort
Sorting Revisited
● So far we’ve talked about two algorithms to
sort an array of numbers
■ What is the advantage of merge sort?
■ What is the advantage of insertion sort?
● Next on the agenda: Heapsort
■ Combines advantages of both previous algorithms
● Like merge sort, but unlike insertion sort, heapsort’s
running time is O(n lg n).
● Like insertion sort, but unlike merge sort, heapsort
sorts in place: only a constant number of array
elements are stored outside the input array at any
time.
● Thus, heapsort combines the better attributes of the
two sorting algorithms we have already discussed.
Heaps
● A heap can be seen as a complete binary tree:
[Figure: a max-heap drawn as a complete binary tree:
root 16; children 14, 10; then 8, 7, 9, 3; leaves 2, 4, 1]
A = 16 14 10 8 7 9 3 2 4 1
Heaps
● To represent a complete binary tree as an array:
■ The root node is A[1]
■ Node i is A[i]
■ The parent of node i is A[i/2] (note: integer divide)
■ The left child of node i is A[2i]
■ The right child of node i is A[2i + 1]
[Figure: the same heap with each node labeled by its array
index, illustrating the mapping above]
A = 16 14 10 8 7 9 3 2 4 1
Referencing Heap Elements
● So…
Parent(i) { return i/2; }
Left(i) { return 2*i; }
Right(i) { return 2*i + 1; }
● There are two kinds of binary heaps:
max-heaps and min-heaps.
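These index rules compile directly to C; a minimal sketch, assuming the 1-based indexing used above (the parent rule relies on integer division):

```c
/* 1-based heap indexing, as in the slides: the root is A[1]. */
static int parent(int i) { return i / 2; }     /* integer divide */
static int left(int i)   { return 2 * i; }
static int right(int i)  { return 2 * i + 1; }
```

With 1-based indexing the arithmetic is the cleanest; 0-based variants shift the formulas to (i-1)/2, 2i+1, and 2i+2.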
The Heap Property
● Heaps also satisfy the heap property:
● In a max-heap, the max-heap property is that
for every node i other than the root,
A[Parent(i)] ≥ A[i]
■ In other words, the value of a node is at most the
value of its parent
■ Where is the largest element in a heap stored?
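The property can also be checked mechanically. Below is a hypothetical helper (my own, not from the slides) that verifies the max-heap property over a 1-based array, comparing each non-root node against its parent:

```c
/* Check the max-heap property on a 1-based array A[1..n]
 * (A[0] is unused). Returns 1 if every non-root node is
 * bounded above by its parent, 0 otherwise. */
static int is_max_heap(const int *A, int n)
{
    for (int i = 2; i <= n; i++)   /* every node except the root */
        if (A[i / 2] < A[i])       /* parent must be >= child */
            return 0;
    return 1;
}
```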
The Heap Property
● A min-heap is organized in the opposite way; the
min-heap property is that for every node i other than
the root,
● A[Parent(i)] ≤ A[i].
● The smallest element in a min-heap is at the root.
● Definitions:
■ The height of a node in the tree = the number of
edges on the longest downward path to a leaf
■ The height of a tree = the height of its root
Heap Height
● What is the height of an n-element heap?
Why?
● We define the height of the heap to be the
height of its root.
● Since a heap of n elements is based on a
complete binary tree, its height is Θ(lg n)
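The Θ(lg n) claim follows from counting nodes: a complete binary tree of height h has at least 2^h and at most 2^(h+1) − 1 nodes, so

```latex
2^{h} \le n \le 2^{h+1} - 1
\;\Longrightarrow\;
h \le \lg n < h + 1
\;\Longrightarrow\;
h = \lfloor \lg n \rfloor = \Theta(\lg n)
```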
Heap Operations: MAX_Heapify()
● MAX_Heapify(): maintain the heap property
■ Given: a node i in the heap with children l and r
■ Given: two subtrees rooted at l and r, assumed to be
heaps
■ Problem: The subtree rooted at i may violate the
heap property (How?)
■ Action: let the value of the parent node “float
down” so subtree at i satisfies the heap property
○ What do you suppose will be the basic operation between
i, l, and r?
Heap Operations: MAX_Heapify()
MAX_Heapify(A, i)
{
l = Left(i); r = Right(i);
if (l <= heap_size(A) && A[l] > A[i])
largest = l;
else
largest = i;
if (r <= heap_size(A) && A[r] > A[largest])
largest = r;
if (largest != i) {
Swap(A, i, largest);
MAX_Heapify(A, largest);
}
}
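The pseudocode above translates directly into compilable C. This sketch uses 0-based arrays (so the children of node i sit at 2i+1 and 2i+2) and an explicit heap_size parameter; the function names are illustrative, not from any library:

```c
/* MAX-Heapify sketch in C, 0-based indexing. Assumes the subtrees
 * rooted at the children of i are already max-heaps; lets a[i]
 * "float down" until the subtree rooted at i is a max-heap. */
static void max_heapify(int *a, int heap_size, int i)
{
    int l = 2 * i + 1, r = 2 * i + 2, largest = i;
    if (l < heap_size && a[l] > a[i])
        largest = l;
    if (r < heap_size && a[r] > a[largest])
        largest = r;
    if (largest != i) {
        int t = a[i]; a[i] = a[largest]; a[largest] = t;  /* swap */
        max_heapify(a, heap_size, largest);               /* recurse down */
    }
}
```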
Heapify() Example
● MAX_Heapify(A, 2) on A = 16 4 10 14 7 9 3 2 8 1:
the subtree rooted at node 2 (value 4) violates the
max-heap property
● Node 2's larger child is node 4 (value 14), so
4 and 14 are swapped:
A = 16 14 10 4 7 9 3 2 8 1
● Recursing at node 4, the larger child is node 9
(value 8), so 4 and 8 are swapped:
A = 16 14 10 8 7 9 3 2 4 1
● Node 9 has no children, so the recursion stops
and the heap property is restored
Analyzing MAX_Heapify(): Informal
● Aside from the recursive call, what is the
running time of MAX_Heapify()?
● How many times can MAX-Heapify()
recursively call itself?
● What is the worst-case running time of
MAX_Heapify() on a heap of size n?
Analyzing MAX_Heapify(): Formal
● Fixing up relationships between i, l, and r
takes Θ(1) time
● If the heap at i has n elements, how many
elements can the subtrees at l or r have?
■ Draw it
● Answer: 2n/3
● So the time taken by MAX_Heapify() is given
by T(n) ≤ T(2n/3) + Θ(1)
Analyzing MAX_Heapify(): Formal
● So we have
T(n) ≤ T(2n/3) + Θ(1)
● By the Master Theorem,
T(n) = O(lg n)
● Thus, MAX_Heapify() takes logarithmic time
Heap Operations: BuildHeap()
● We can build a heap in a bottom-up manner by
running MAX-Heapify() on successive
subarrays
■ Fact: for array of length n, all elements in range
A[(n/2 + 1) .. n] are heaps (Why?)
○ Elements in the subarray A[(n/2 + 1) .. n] are all leaves
of the tree, and so each is a 1-element heap to begin
with.
Heap Operations: BuildHeap()
[Figure: the unsorted array drawn as a complete binary tree:
root 16; children 4, 10; then 14, 7, 9, 3; leaves 2, 8, 1]
A = 16 4 10 14 7 9 3 2 8 1
Heap Operations: BuildHeap()
■ So:
○ Walk backwards through the array from n/2 to 1, calling
MAX_Heapify() on each node.
○ i.e. The procedure BuildHeap goes through the
remaining nodes of the tree and runs MAX_HEAPIFY
on each one
○ Order of processing guarantees that the children of node
i are heaps when i is processed
BuildHeap()
// given an unsorted array A, make A a heap
BuildHeap(A)
{
heap_size(A) = length(A);
for (i = length(A)/2 downto 1)
MAX_Heapify(A, i);
}
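A compilable C sketch of BuildHeap, again with 0-based indexing (so the leaves occupy indices n/2 .. n−1, and the loop runs from n/2 − 1 down to 0); the names are illustrative:

```c
/* MAX-Heapify, 0-based, as in the earlier pseudocode. */
static void max_heapify(int *a, int heap_size, int i)
{
    int l = 2 * i + 1, r = 2 * i + 2, largest = i;
    if (l < heap_size && a[l] > a[i]) largest = l;
    if (r < heap_size && a[r] > a[largest]) largest = r;
    if (largest != i) {
        int t = a[i]; a[i] = a[largest]; a[largest] = t;
        max_heapify(a, heap_size, largest);
    }
}

/* BuildHeap: every index >= n/2 is a leaf (a 1-element heap),
 * so heapifying backwards from n/2 - 1 builds the heap bottom-up. */
static void build_max_heap(int *a, int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)
        max_heapify(a, n, i);
}
```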
BuildHeap() Example
● Work through example
A = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}
[Figure: A drawn as a complete binary tree: root 4;
children 1, 3; then 2, 16, 9, 10; leaves 14, 8, 7]
Analyzing BuildHeap()
● Each call to MAX_Heapify() takes O(lg n)
time
● There are O(n) such calls (specifically, n/2)
● Thus the running time is O(n lg n)
■ Is this a correct asymptotic upper bound?
■ Is this an asymptotically tight bound?
● A tighter bound is O(n)
■ How can this be? Is there a flaw in the above
reasoning?
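The O(n lg n) bound is correct but not tight. A heap has at most ⌈n/2^(h+1)⌉ nodes of height h, and MAX_Heapify costs O(h) when called on a node of height h, so the total work is

```latex
\sum_{h=0}^{\lfloor \lg n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h)
= O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right)
= O(n),
\qquad \text{since } \sum_{h=0}^{\infty} \frac{h}{2^{h}} = 2.
```

Most calls operate near the leaves, where subtrees are short, which is why the naive per-call O(lg n) estimate overcounts.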
Heapsort
● Given BuildHeap(), an in-place sorting
algorithm is easily constructed:
■ Maximum element is at A[1]
■ Discard by swapping with element at A[n]
○ Decrement heap_size[A]
○ A[n] now contains correct value
■ Restore heap property at A[1] by calling
MAX_Heapify()
■ Repeat, always swapping A[1] for A[heap_size(A)]
Heapsort
● The heapsort algorithm starts by using BUILD-HEAP to build
a max-heap on the input array A[1 . . n], where n = length[A].
● Since the maximum element of the array is stored at the root
A[1], it can be put into its correct final position by exchanging
it with A[n].
● If we now “discard” node n from the heap (by decrementing
heap-size[A]), we observe that A[1 . . (n − 1)] can easily be
made into a max-heap.
● The children of the root remain max-heaps, but the new root
element may violate the max-heap property.
Heapsort
● All that is needed to restore the max-heap property, however,
is one call to MAX-HEAPIFY(A, 1), which leaves a max-heap
in A[1 . . (n − 1)].
● The heapsort algorithm then repeats this process for the
max-heap of size n − 1 down to a heap of size 2.
Heapsort
Heapsort(A)
{
BuildHeap(A);
for (i = length(A) downto 2)
{
Swap(A[1], A[i]);
heap_size(A) -= 1;
MAX_Heapify(A, 1);
}
}
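Putting the pieces together, here is a self-contained C sketch of heapsort (0-based indexing throughout; function names are illustrative, not from any library):

```c
/* MAX-Heapify, 0-based, as in the earlier pseudocode. */
static void max_heapify(int *a, int heap_size, int i)
{
    int l = 2 * i + 1, r = 2 * i + 2, largest = i;
    if (l < heap_size && a[l] > a[i]) largest = l;
    if (r < heap_size && a[r] > a[largest]) largest = r;
    if (largest != i) {
        int t = a[i]; a[i] = a[largest]; a[largest] = t;
        max_heapify(a, heap_size, largest);
    }
}

/* In-place heapsort: build a max-heap, then repeatedly move the
 * root (the maximum) to the end and shrink the heap by one. */
static void heap_sort(int *a, int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)   /* BuildHeap: O(n) */
        max_heapify(a, n, i);
    for (int i = n - 1; i >= 1; i--) {
        int t = a[0]; a[0] = a[i]; a[i] = t;  /* max to final slot */
        max_heapify(a, i, 0);                 /* re-heapify a[0..i-1] */
    }
}
```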
Analyzing Heapsort
● The call to BuildHeap() takes O(n) time
● Each of the n - 1 calls to MAX_Heapify()
takes O(lg n) time
● Thus the total time taken by HeapSort()
= O(n) + (n - 1) O(lg n)
= O(n) + O(n lg n)
= O(n lg n)
Priority Queues
● Heapsort is a nice algorithm, but in practice
Quicksort (coming up) usually wins
● Application: the heap data structure is
incredibly useful for implementing priority
queues
● As with heaps, there are two kinds of priority
queues: max-priority queues and min-priority
queues.
○ We will focus here on how to implement max-priority
queues, which are in turn based on max-heaps
Priority Queues
■ A Priority Queue is a data structure for
maintaining a set S of elements, each with an
associated value or key
■ A max-priority queue supports the operations
Insert(), Maximum(), and ExtractMax()
■ What might a priority queue be useful for?
Priority Queues
● One application of max-priority queues is to
schedule jobs on a shared computer. The max-
priority queue keeps track of the jobs to be
performed and their relative priorities. When a
job is finished or interrupted, the highest-
priority job is selected from those pending
using EXTRACT-MAX. A new job can be
added to the queue at any time using INSERT.
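As a sketch of how such a queue might look in C: below is a minimal fixed-capacity max-priority queue built on a 0-based binary heap. The names pq_insert, pq_maximum, and pq_extract_max (and the capacity PQ_CAP) are my own illustrative choices, not from the slides or any library:

```c
#define PQ_CAP 64   /* fixed capacity for this sketch */

typedef struct { int a[PQ_CAP]; int n; } pq;

/* Maximum(): the largest key is always at the root. */
static int pq_maximum(const pq *q) { return q->a[0]; }

/* Insert(): append the key, then let it float up past
 * smaller parents. */
static void pq_insert(pq *q, int key)
{
    int i = q->n++;
    q->a[i] = key;
    while (i > 0 && q->a[(i - 1) / 2] < q->a[i]) {
        int p = (i - 1) / 2;
        int t = q->a[p]; q->a[p] = q->a[i]; q->a[i] = t;
        i = p;
    }
}

/* ExtractMax(): remove the root, move the last element up,
 * then float it down to restore the heap property. */
static int pq_extract_max(pq *q)
{
    int max = q->a[0];
    q->a[0] = q->a[--q->n];
    int i = 0;
    for (;;) {
        int l = 2 * i + 1, r = 2 * i + 2, largest = i;
        if (l < q->n && q->a[l] > q->a[largest]) largest = l;
        if (r < q->n && q->a[r] > q->a[largest]) largest = r;
        if (largest == i) break;
        int t = q->a[i]; q->a[i] = q->a[largest]; q->a[largest] = t;
        i = largest;
    }
    return max;
}
```

Both Insert() and ExtractMax() walk one root-to-leaf path, so each runs in O(lg n) on a queue of n elements.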