
Week 6

The document discusses the Union-Find data structure and its application in Kruskal's Algorithm for finding the Minimum Cost Spanning Tree (MCST). It details the operations of MakeUnionFind, Find, and Union, along with their complexities, and introduces improvements to the naive implementation, leading to an amortized complexity of O(log m) for Union operations. Additionally, it covers priority queues, heaps, and their implementation in algorithms like Dijkstra's, highlighting the efficiency of heaps for dynamic sorted data.


Union-Find Data Structure

Kruskal's Algorithm for Minimum Cost Spanning Tree (MCST)


Process the edges in ascending order of cost
If edge (u, v) does not create a cycle, add it

(u, v) can be added if u and v are in different components


Adding edge (u, v) merges these components
How can we keep track of components and merge them efficiently?
Components partition vertices

Collection of disjoint sets


Need data structure to maintain collection of disjoint sets

find(v) - return set containing v


union(u, v) - merge sets of u, v

Union-Find Data Structure


A set S partitioned into components {C1, C2, ..., Ck}

Each s ∈ S belongs to exactly one Cj


Support the following operations

MakeUnionFind(S) - set up initial singleton components {s} , for each s ∈ S

Find(s) - return the component containing s


Union(s, s') - merges components containing s, s'

Naive Implementation
Assume S = {0, 1, . . . , n − 1}

Set up an array/dictionary Component

MakeUnionFind(S)

Set Component[i] = i for each i

Find(i)

Return Component[i]

Union(i, j)

    c_old = Component[i]
    c_new = Component[j]
    for k in range(n):
        if Component[k] == c_old:
            Component[k] = c_new

Complexity

MakeUnionFind(S) - O(n)
Find(i) - O(1)
Union(i, j) - O(n)
Sequence of m Union() operations takes time O(mn)
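The naive scheme above can be collected into a small Python class (a sketch; the class name and method names are illustrative):

```python
class NaiveUnionFind:
    def __init__(self, n):                  # MakeUnionFind(S), S = {0, ..., n-1}: O(n)
        self.n = n
        self.component = list(range(n))     # Component[i] = i

    def find(self, i):                      # O(1)
        return self.component[i]

    def union(self, i, j):                  # O(n): scan every element
        c_old = self.component[i]
        c_new = self.component[j]
        if c_old == c_new:
            return
        for k in range(self.n):
            if self.component[k] == c_old:
                self.component[k] = c_new
```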
Improved Implementation
Another array/dictionary Members

For each component c , Members[c] is a list of its members

Size[c] = length(Members[c]) is the number of members

MakeUnionFind(S)

Set Component[i] = i for all i


Set Members[i] = [i], Size[i] = 1 for all i

Find(i)

Return Component[i]

Union(i, j)

    c_old = Component[i]
    c_new = Component[j]
    for k in Members[c_old]:
        Component[k] = c_new
        Members[c_new].append(k)
        Size[c_new] += 1

Why does this help?


MakeUnionFind(S)

Set Component[i] = i for all i


Set Members[i] = [i], Size[i] = 1 for all i
Find(i)

Return Component[i]

Union(i, j)

    c_old = Component[i]
    c_new = Component[j]
    for k in Members[c_old]:
        Component[k] = c_new
        Members[c_new].append(k)
        Size[c_new] += 1

Members[c_old] allows us to merge Component[i] into Component[j] in time
O(Size[c_old]) rather than O(n)

How can we make use of Size[c]?

Always merge smaller component into the larger one


If Size[c] < Size[c'] re-label c as c' , else re-label c' as c

Individual merge operations can still take time O(n)

Both Size[c], Size[c'] could be about n/2


More careful accounting

Always merge smaller component into the larger one

For each i , size of Component[i] at least doubles each time it is re-labelled

After m Union() operations, at most 2m elements have been "touched"

Size of Component[i] is at most 2m

Size of Component[i] grows as 1, 2, 4, ..., so i changes component at most log m times

Over m updates

At most 2m elements are re-labelled


Each one is re-labelled at most O(log m) times

Overall, m Union() operations take time O(m log m)

Works out to time O(log m) per Union() operation

Amortized complexity of Union() is O(log m)
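The improved implementation with the "merge smaller into larger" rule can be sketched as follows (class and variable names are illustrative):

```python
class UnionFind:
    def __init__(self, S):                  # MakeUnionFind(S): O(n)
        self.component = {s: s for s in S}
        self.members = {s: [s] for s in S}  # Members[c]: list of elements in c
        self.size = {s: 1 for s in S}

    def find(self, s):                      # O(1)
        return self.component[s]

    def union(self, s, t):                  # amortized O(log m) over m operations
        c, c2 = self.component[s], self.component[t]
        if c == c2:
            return
        # always merge the smaller component into the larger one
        if self.size[c] > self.size[c2]:
            c, c2 = c2, c
        for k in self.members[c]:
            self.component[k] = c2
            self.members[c2].append(k)
            self.size[c2] += 1
        self.members[c] = []
        self.size[c] = 0
```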

Back to Kruskal's Algorithm


Sort E = {e0 , e1 , . . . , em−1 } in ascending order
MakeUnionFind(V) - each vertex j is in component j
Adding an edge ek = (u, v) to the tree

Check that Find(u) != Find(v)


Merge components: Union(Component[u], Component[v])
Tree has n − 1 edges, so O(n) Union() operations

O(n log n) amortized cost overall


Sorting E takes O(m log m)

Equivalently, O(m log n), since m ≤ n²


Overall time, O((m + n) log n)
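The steps above can be sketched end to end in Python, with the Union-Find bookkeeping (Component, Members) inlined; the edge-list input format `(cost, u, v)` is an assumption for illustration:

```python
def kruskal(vertices, edges):
    """edges: list of (cost, u, v). Returns the list of MCST edges."""
    component = {v: v for v in vertices}   # MakeUnionFind(V)
    members = {v: [v] for v in vertices}
    tree = []
    for cost, u, v in sorted(edges):       # process edges in ascending order of cost
        cu, cv = component[u], component[v]
        if cu != cv:                       # Find(u) != Find(v): no cycle
            tree.append((u, v, cost))
            # merge smaller component into the larger one
            if len(members[cu]) > len(members[cv]):
                cu, cv = cv, cu
            for k in members[cu]:
                component[k] = cv
                members[cv].append(k)
            members[cu] = []
    return tree
```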

Summary
Implement Union-Find using arrays/dictionaries Component, Members, Size

MakeUnionFind(S) is O(n)
Find(i) is O(1)
Across m operations, amortized complexity of each Union() operation is O(log m)

Can also maintain Members[k] as a tree rather than as a list

Union() becomes O(1)
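The tree representation mentioned above is usually implemented with parent pointers: Union() just links one root under another, and Find() walks up to the root. A sketch with union by size, plus path compression (an extra optimization beyond what this document describes):

```python
class TreeUnionFind:
    def __init__(self, n):
        self.parent = list(range(n))   # each element starts as its own root
        self.size = [1] * n

    def find(self, i):                 # follow parent pointers to the root
        while self.parent[i] != i:
            # path compression: point i at its grandparent as we walk up
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):             # O(1) beyond the two find() calls
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return
        if self.size[ri] < self.size[rj]:
            ri, rj = rj, ri
        self.parent[rj] = ri           # link smaller root under larger root
        self.size[ri] += self.size[rj]
```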


Priority Queues

Dealing with Priorities


Job Scheduler

A job scheduler maintains a list of pending jobs with their priorities


When the processor is free, the scheduler picks out the job with maximum priority in the
list and schedules it
New jobs may join the list at any time
How should the scheduler maintain the list of pending jobs and their priorities?

Priority Queue

Need to maintain a collection of items with priorities to optimize the following operations
delete_max()

Identify and remove item with the highest priority


Need not be unique
insert()

Add a new item to the collection

Implementing Priority Queues with one dimensional structures


delete_max()

Identify and remove item with highest priority


Need not be unique
insert()

Add a new item to the list

Unsorted list

insert() is $O(1)$
delete_max() is $O(n)$

Sorted list

delete_max() is $O(1)$
insert() is $O(n)$

Processing $n$ items requires $O(n^2)$

Moving to 2 dimensions
First Attempt

Assume $N$ processes enter/leave the queue


Maintain a $\sqrt{N} \times \sqrt{N}$ array
Each row is in sorted order

N = 25

 3 19 23 35 58
12 17 25 43 67
10 13 20
11 16 28 49
 6 14

(rows partially filled, each in sorted order)

Summary
2D $\sqrt{N} \times \sqrt{N}$ array with sorted rows

insert() is $O(\sqrt{N})$
delete_max() is $O(\sqrt{N})$
Processing $N$ items is $O(N \sqrt{N})$
Can we do better than this?
Maintain a special binary tree - heap

Height $O(\log N)$


insert() is $O(\log N)$
delete_max() is $O(\log N)$
Processing $N$ items is $O(N \log N)$
Flexible - need not fix $N$ in advance
Heaps

Priority Queue
Need to maintain a collection of items with priorities to optimize the following operations
delete_max()

Identify and remove item with highest priority


Need not be unique
insert()

Add a new item to the list


Maintaining as a list incurs cost O(N²) across N inserts and deletions

Using a √N × √N array reduces the cost to O(√N) per operation


O(N√N) across N inserts and deletions

Binary Trees
Values are stored as nodes in a rooted tree
Each node has up to two children

Left child and Right child


Order is important
Other than the root, each node has a unique parent
Leaf node - no children
Size - number of nodes
Height - number of levels
Heap
Binary tree filled level-by-level, left-to-right
The value at each node is at least as big as the values of its children

max-heap
Binary tree on the right is an example of a heap
Root always has the largest value

By induction, because of the max-heap property


Non-Examples
Complexity of insert()
Need to walk up from the leaf to the root

Height of the tree


Number of nodes at level $0$ is $2^0 = 1$
If we fill $k$ levels, $2^0 + 2^1 + ... + 2^{k - 1} = 2^k - 1$ nodes
If we have $N$ nodes, at most $1 + \log N$ levels
insert() is $O(\log N)$

delete_max()

Maximum value is always at the root


After we delete one value, tree shrinks

Node to delete is right-most at lowest level


Move "homeless" value to the root
Restore the heap property downwards
Only need to follow a single path down

Again $O(\log N)$

Implementation
Number the nodes top to bottom, left to right
Store as a list H = [h0, h1, h2, ..., h9]
Children of H[i] are at H[2 * i + 1], H[2 * i + 2]
Parent of H[i] is at H[(i - 1)//2] , for i > 0
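The index arithmetic above is enough to implement a heap over a Python list. A sketch of both operations (the class name is illustrative):

```python
class MaxHeap:
    def __init__(self):
        self.H = []

    def insert(self, v):                       # O(log N): walk up towards the root
        self.H.append(v)
        i = len(self.H) - 1
        while i > 0 and self.H[i] > self.H[(i - 1) // 2]:
            self.H[i], self.H[(i - 1) // 2] = self.H[(i - 1) // 2], self.H[i]
            i = (i - 1) // 2                   # move to the parent

    def delete_max(self):                      # O(log N): walk down a single path
        maxval = self.H[0]                     # maximum is always at the root
        self.H[0] = self.H[-1]                 # move "homeless" value to the root
        self.H.pop()
        i, n = 0, len(self.H)
        while True:                            # restore the heap property downwards
            largest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self.H[c] > self.H[largest]:
                    largest = c
            if largest == i:
                break
            self.H[i], self.H[largest] = self.H[largest], self.H[i]
            i = largest
        return maxval
```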

Building a heap - heapify()


Convert a list [v0, v1, ..., vN] into a heap
Simple strategy

Start with an empty heap


Repeatedly apply insert(vj)
Total time is $O(N \log N)$

Better heapify()
List L = [v0, v1, ..., vN]
mid = len(L)//2 , Slice L[mid:] has only leaf nodes

Already satisfy the heap condition


Fix heap property downwards for second last level
Fix heap property downwards for third last level
...
Fix heap property at level 1
Fix heap property at the root
Each time we go up one level, one extra step per node to fix the heap property
However, number of nodes to fix halves
Second last level, $n/4 \times 1$ steps
Third last level, $n/8 \times 2$ steps
Fourth last level, $n/16 \times 3$ steps
...
Cost turns out to be $O(n)$
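The bottom-up strategy can be sketched as follows (the helper's name and structure are illustrative):

```python
def heapify(L):
    """Rearrange L in place into a max-heap in O(n)."""
    n = len(L)

    def fix_down(i):
        # restore the heap property downwards from node i
        while True:
            largest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and L[c] > L[largest]:
                    largest = c
            if largest == i:
                return
            L[i], L[largest] = L[largest], L[i]
            i = largest

    # leaves L[n//2:] already satisfy the heap condition;
    # fix levels bottom-up, ending at the root
    for i in range(n // 2 - 1, -1, -1):
        fix_down(i)
```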

Summary
Heaps are a tree implementation of priority queues
insert() is $O(\log N)$
delete_max() is $O(\log N)$
heapify() builds a heap in $O(N)$

Can invert the heap condition

Each node is smaller than its children


min-heap
delete_min() rather than delete_max()
Using Heaps in Algorithms

Priority Queues and Heaps


Priority Queues support the following operations

insert()
delete_max() or delete_min()

Heaps are tree based implementation of priority queues

insert(), delete_max()/delete_min() are both O(log n)

heapify() builds a heap from a list/array in time O(n)

Heap can be represented as a list/array

Simple index arithmetic to find parent and children of a node


What more do we need to use a heap in an algorithm?

Dijkstra's Algorithm
Maintain 2 dictionaries with vertices as keys

visited , initially False for all v


distance , initially infinity for all v

Set distance[v] to 0
Repeat, until all the reachable vertices are visited

Find unvisited vertex nextv with minimum distance


Set visited[nextv] to True
Re-compute distance[v] for every neighbour v of nextv

import numpy as np

def dijkstra(WMat, s):
    # WMat is an n x n x 2 adjacency matrix: WMat[i, j] = (edge present?, weight)
    (rows, cols, x) = WMat.shape
    infinity = np.max(WMat) * rows + 1
    (visited, distance) = ({}, {})

    for v in range(rows):
        (visited[v], distance[v]) = (False, infinity)

    distance[s] = 0

    for u in range(rows):
        nextd = min([distance[v] for v in range(rows)
                     if not visited[v]])
        nextvlist = [v for v in range(rows)
                     if (not visited[v]) and distance[v] == nextd]
        if nextvlist == []:
            break

        nextv = min(nextvlist)
        visited[nextv] = True

        for v in range(cols):
            if WMat[nextv, v, 0] == 1 and (not visited[v]):
                distance[v] = min(distance[v],
                                  distance[nextv] + WMat[nextv, v, 1])

    return distance

Bottleneck

Find unvisited vertex j with minimum distance

Naive implementation requires an O(n) scan


Maintain unvisited vertices as a min-heap

delete_min() in O(log n) time


But, also need to update distances of the neighbours

Unvisited neighbour's distances are inside the min-heap


Updating a value is not a basic heap operation
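One practical workaround, sketched here with Python's heapq module (a min-heap), is "lazy deletion": instead of updating a neighbour's distance inside the heap, push a fresh entry and skip stale ones when they are popped. The adjacency-list input format is an assumption for illustration, not the matrix representation used above:

```python
import heapq

def dijkstra_heap(adj, s):
    """adj: {u: [(v, w), ...]}. Returns shortest distances from s."""
    distance = {v: float('inf') for v in adj}
    distance[s] = 0
    heap = [(0, s)]                     # entries are (distance, vertex)
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)      # delete_min() in O(log n)
        if u in visited:
            continue                    # stale entry: skip (lazy deletion)
        visited.add(u)
        for v, w in adj[u]:
            if d + w < distance[v]:
                distance[v] = d + w
                heapq.heappush(heap, (d + w, v))   # push instead of update
    return distance
```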

Heap sort
Start with an un-ordered list
Build a heap - O(n)
Call delete_max() n times to extract elements in descending order - O(n log n)

After each delete_max() , heap shrinks by 1


Store maximum value at the end of current heap
In-place O(n log n) sort
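The two phases can be sketched together (function structure illustrative): build a heap in place, then repeatedly swap the maximum to the end of the shrinking heap.

```python
def heapsort(L):
    """Sort L in place in O(n log n) using a max-heap."""
    n = len(L)

    def fix_down(i, size):
        # restore the heap property downwards within L[:size]
        while True:
            largest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < size and L[c] > L[largest]:
                    largest = c
            if largest == i:
                return
            L[i], L[largest] = L[largest], L[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):   # build a heap: O(n)
        fix_down(i, n)
    for end in range(n - 1, 0, -1):       # n delete_max() calls: O(n log n)
        L[0], L[end] = L[end], L[0]       # store maximum at end of current heap
        fix_down(0, end)
```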

Summary
Updating a value in a heap takes O(log n)

Need to maintain additional pointers to map values to heap positions and vice versa
With this extended notion of heap, Dijkstra's algorithm complexity improves from O(n²) to
O((m + n) log n)

Heaps can also be used to sort a list in place in O(n log n)


Search Trees

Dynamic Sorted Data


Sorting is useful for efficient searching
What if the data is changing dynamically?

Items are periodically inserted and deleted


Insert/delete in a sorted list takes time O(n)
Move to a tree structure, like heaps for priority queues

Binary Search Tree


For each node with the value v

All values in the left sub-tree are < v

All values in the right sub-tree are > v

No duplicate values
Implementing a Binary Search Tree
Each node has a value and pointers to its children
Add a frontier with empty nodes, all fields None

Empty tree is single empty node


Leaf node points to empty nodes
Easier to implement operations recursively

The class Tree


Three local fields: value, left, right
Value is None for an empty node
Empty tree has all fields None
Leaf has a non-empty value and empty left and right children

class Tree:
    # Constructor
    def __init__(self, init_val=None):
        self.value = init_val

        if self.value:
            self.left = Tree()
            self.right = Tree()
        else:
            self.left = None
            self.right = None

        return

    # Only empty node has value None
    def is_empty(self):
        return self.value == None

    # Leaf nodes have both children empty
    def is_leaf(self):
        return self.value != None and self.left.is_empty() and self.right.is_empty()

In-order traversal
List the left sub-tree, then the current node, then the right sub-tree
Lists values in sorted order
Use to print the tree

class Tree:
    ...
    # In-order traversal
    def in_order(self):
        if self.is_empty():
            return []
        else:
            return self.left.in_order() + [self.value] + self.right.in_order()

    # Display the tree as a string
    def __str__(self):
        return str(self.in_order())

Find a value v
Check value at current node
If v is smaller than the current node, go left
If v is greater than the current node, go right
Natural generalization of binary search

class Tree:
    ...
    # Check if the value v occurs in the tree
    def find(self, v):
        if self.is_empty():
            return False

        if self.value == v:
            return True

        if v < self.value:
            return self.left.find(v)

        if v > self.value:
            return self.right.find(v)

Minimum and Maximum


Minimum is the left most node in the tree
Maximum is the right most node in the tree

class Tree:
    ...
    def min_val(self):
        if self.left.is_empty():
            return self.value
        else:
            return self.left.min_val()

    def max_val(self):
        if self.right.is_empty():
            return self.value
        else:
            return self.right.max_val()

Insert a value v
Try to find v
Insert at the position where find fails

class Tree:
    ...
    def insert(self, v):
        if self.is_empty():
            self.value = v
            self.left = Tree()
            self.right = Tree()
            return

        if self.value == v:
            return

        if v < self.value:
            self.left.insert(v)
            return

        if v > self.value:
            self.right.insert(v)
            return

Delete a value v
If v is present, delete
Leaf node? No problem
If only one child, promote that sub-tree
Otherwise, replace v with self.left.max_val() and delete self.left.max_val()

self.left.max_val() has no right child

class Tree:
    ...
    def delete(self, v):
        if self.is_empty():
            return

        if v < self.value:
            self.left.delete(v)
            return

        if v > self.value:
            self.right.delete(v)
            return

        if v == self.value:
            if self.is_leaf():
                self.make_empty()
            elif self.left.is_empty():
                self.copy_right()
            elif self.right.is_empty():
                self.copy_left()
            else:
                self.value = self.left.max_val()
                self.left.delete(self.left.max_val())
            return

    # Convert leaf node to empty node
    def make_empty(self):
        self.value = None
        self.left = None
        self.right = None
        return

    # Promote left child
    def copy_left(self):
        self.value = self.left.value
        self.right = self.left.right
        self.left = self.left.left
        return

    # Promote right child
    def copy_right(self):
        self.value = self.right.value
        self.left = self.right.left
        self.right = self.right.right
        return

Summary
find(), insert() and delete() all walk down a single path
Worst-case: height of the tree
An un-balanced tree with n nodes may have height O(n)
Balanced trees have height O(log n)

We will see how to keep a tree balanced to ensure all operations remain O(log n)
