CS 240 Tutorial 10 Notes
Range Trees: Used for range searching multi-dimensional data, e.g., find all points (x, y) which satisfy
$x_{lo} \le x \le x_{hi}$ and $y_{lo} \le y \le y_{hi}$.
[Figure: example trees with node labels 6 and 3 over the leaf values 1 2 3 5 and 4 6 7 8, illustrating the search paths.]
In general, you want to output all the leaves which lie strictly between the two search paths, and possibly the
two leaves at the ends of the search paths; check those two leaves against the range explicitly. (Explain how
pseudocode works.)
A range tree on two dimensions x and y is a range tree on x whose leaves contain the points (x, y) and
whose internal nodes each contain a pointer to a range tree on y (built only from the points which appear as
descendants of that internal node).
Example: Draw the perfectly balanced 2-dimensional range tree using the previous points.
Answer:
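x-tree (each internal node is listed with its associated y-tree):
2, associated y-tree: 6 with children 4 [leaves (3, 4), (2, 6)] and 7 [leaves (5, 7), (1, 8)]
    1, associated y-tree: 6 [leaves (2, 6), (1, 8)]
        leaves (1, 8), (2, 6)
    3, associated y-tree: 4 [leaves (3, 4), (5, 7)]
        leaves (3, 4), (5, 7)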
Range search works the same way as in the 1-dimensional case, except that instead of just printing the leaves
which appear between the search paths, you search the associated trees of the nodes which lie between the
search paths, using the next pair of range values.
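The 2-dimensional search described above can be sketched in Python as follows. This is only an illustrative sketch, not the course's reference code: the names `Node`, `search`, and `all_points` are made up here, each internal node is assumed to store the largest key of its left subtree, and all x values (and all y values) are assumed to be distinct.

```python
from typing import List, Tuple

Point = Tuple[int, int]


class Node:
    """Tree over points sorted by coordinate `dim` (0 = x, 1 = y). Hypothetical helper."""

    def __init__(self, pts: List[Point], dim: int):
        self.dim = dim
        self.left = self.right = self.assoc = None
        self.point = None
        if len(pts) == 1:                      # leaf: stores the point itself
            self.point = pts[0]
            self.value = pts[0][dim]
            return
        mid = (len(pts) + 1) // 2
        self.value = pts[mid - 1][dim]         # assumption: largest key in the left subtree
        self.left = Node(pts[:mid], dim)
        self.right = Node(pts[mid:], dim)
        if dim == 0:                           # x-node: associated tree on y over its descendants
            self.assoc = Node(sorted(pts, key=lambda p: p[1]), 1)


def all_points(node: Node, out: List[Point]) -> None:
    """Append every point stored in the leaves below node."""
    if node.left is None:
        out.append(node.point)
    else:
        all_points(node.left, out)
        all_points(node.right, out)


def search(node: Node, box, out: List[Point]) -> None:
    """box = ((xlo, xhi), (ylo, yhi)); appends the points inside the box to out."""
    dim = node.dim
    lo, hi = box[dim]

    def in_box(p: Point) -> bool:
        return all(l <= c <= h for c, (l, h) in zip(p, box))

    def report(sub: Node) -> None:
        # sub lies between the two search paths in dimension `dim`.
        if sub.left is None:
            if in_box(sub.point):
                out.append(sub.point)
        elif dim == 0:
            search(sub.assoc, box, out)        # continue with the next pair of range values
        else:
            all_points(sub, out)

    # Walk down while both search paths agree.
    while node.left is not None:
        if lo <= node.value and hi <= node.value:
            node = node.left
        elif lo > node.value and hi > node.value:
            node = node.right
        else:
            break
    if node.left is None:
        if in_box(node.point):
            out.append(node.point)
        return

    n = node.left                              # follow the lo path
    while n.left is not None:
        if lo <= n.value:
            report(n.right)
            n = n.left
        else:
            n = n.right
    if in_box(n.point):
        out.append(n.point)

    n = node.right                             # follow the hi path
    while n.left is not None:
        if hi <= n.value:
            n = n.left
        else:
            report(n.left)
            n = n.right
    if in_box(n.point):
        out.append(n.point)
```

For the four example points, `Node(sorted(pts), 0)` builds the x-tree, and `search(tree, ((2, 5), (5, 8)), out)` collects the points with 2 <= x <= 5 and 5 <= y <= 8 into `out`.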
Pseudocode for 1-dimensional range search:
rangesearch(node, lo, hi)
    loop
        if node is a leaf
            print node.value if lo <= node.value <= hi
            return
        else if lo <= node.value and hi <= node.value
            node = node.left
        else if lo > node.value and hi > node.value
            node = node.right
        else
            nodelo = node.left
            nodehi = node.right
            exit loop
    loop
        if nodelo is a leaf
            print nodelo.value if lo <= nodelo.value <= hi
            exit loop
        else if lo <= nodelo.value
            print all leaves in nodelo.right
            nodelo = nodelo.left
        else
            nodelo = nodelo.right
    loop
        if nodehi is a leaf
            print nodehi.value if lo <= nodehi.value <= hi
            exit loop
        else if hi <= nodehi.value
            nodehi = nodehi.left
        else
            print all leaves in nodehi.left
            nodehi = nodehi.right
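A runnable Python rendering of the pseudocode above may help when tracing it by hand. It is a sketch under assumptions: the `Node` class, the `build` helper, and the convention that an internal node stores the largest key of its left subtree are not stated in the notes, and the function returns a list (in increasing order) instead of printing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: int                          # leaf: the key; internal: largest key in left subtree (assumed)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build(keys: List[int]) -> Node:
    """Hypothetical helper: balanced tree over sorted, distinct keys."""
    if len(keys) == 1:
        return Node(keys[0])
    mid = (len(keys) + 1) // 2
    return Node(keys[mid - 1], build(keys[:mid]), build(keys[mid:]))


def leaves(node: Node, out: List[int]) -> None:
    """Append every leaf value below node, left to right."""
    if node.is_leaf():
        out.append(node.value)
    else:
        leaves(node.left, out)
        leaves(node.right, out)


def range_search(node: Node, lo: int, hi: int) -> List[int]:
    # First loop: walk down while both search paths agree.
    while not node.is_leaf():
        if lo <= node.value and hi <= node.value:
            node = node.left
        elif lo > node.value and hi > node.value:
            node = node.right
        else:
            break
    if node.is_leaf():
        return [node.value] if lo <= node.value <= hi else []

    # Second loop: the lo path. Blocks are found from right to left,
    # so they are reversed afterwards to keep the output sorted.
    lo_blocks: List[List[int]] = []
    n = node.left
    while not n.is_leaf():
        if lo <= n.value:
            block: List[int] = []
            leaves(n.right, block)
            lo_blocks.append(block)
            n = n.left
        else:
            n = n.right
    if lo <= n.value <= hi:
        lo_blocks.append([n.value])
    out = [v for block in reversed(lo_blocks) for v in block]

    # Third loop: the hi path, found left to right.
    n = node.right
    while not n.is_leaf():
        if hi <= n.value:
            n = n.left
        else:
            leaves(n.left, out)
            n = n.right
    if lo <= n.value <= hi:
        out.append(n.value)
    return out
```

For instance, with `tree = build([1, 2, 3, 5, 7, 8, 11, 13])`, the call `range_search(tree, 2, 8)` returns `[2, 3, 5, 7, 8]`.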
Note: Part of the definition of a range tree is that it is balanced, e.g., an AVL tree. This complicates adding
new points into a range tree. For example:
[Figure: a range tree with internal nodes 2 and 3 and leaves (2, 3), (3, 4), (4, 5), shown before and after adding (5, 6) and rotating.]
And now the tree associated with 3 has to be set to the tree previously associated with 2, and the tree
associated with 2 has to contain (2, 3) and (3, 4).
In general,
[Figure: the general rotation, drawn with node labels 2, 3, 4 and subtrees A, B, C.]
and the tree associated with 2 has to be set to the tree associated with A with the points in B added.
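The bookkeeping during a rotation can be sketched as below. This is a simplified illustration, not a full range tree: the associated y-tree is stood in for by a list of points sorted by y, the names `XNode`, `leaf`, `internal`, `points`, and `rotate_left` are hypothetical, and the starting tree shape is an assumption chosen to be consistent with the text (2 at the root, 3 as its right child, and (5, 6) just added).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class XNode:
    key: int
    left: Optional["XNode"] = None
    right: Optional["XNode"] = None
    point: Optional[Point] = None                      # set only on leaves
    assoc: List[Point] = field(default_factory=list)   # stand-in for the y-tree: points sorted by y


def leaf(p: Point) -> XNode:
    return XNode(key=p[0], point=p)


def points(n: XNode) -> List[Point]:
    """All points stored below n."""
    return [n.point] if n.point is not None else n.assoc


def internal(key: int, left: XNode, right: XNode) -> XNode:
    pts = sorted(points(left) + points(right), key=lambda p: p[1])
    return XNode(key=key, left=left, right=right, assoc=pts)


def rotate_left(two: XNode) -> XNode:
    """Rotate so that two.right (call it three) becomes the parent of two."""
    three = two.right
    a, b = two.left, three.left
    # three now covers every point that two used to cover.
    three.assoc = two.assoc
    # two now covers only A and B, so its associated structure is rebuilt.
    two.right = b
    two.assoc = sorted(points(a) + points(b), key=lambda p: p[1])
    three.left = two
    return three


# Assumed shape after inserting (5, 6) and before rebalancing.
root = internal(2, leaf((2, 3)),
                internal(3, leaf((3, 4)),
                         internal(4, leaf((4, 5)), leaf((5, 6)))))
root = rotate_left(root)
print(root.key, root.left.assoc)   # node 3 is the new root; node 2 holds (2, 3) and (3, 4)
```

Rebuilding the structure associated with 2 touches every point in A and B, which is why rotations are more expensive in a range tree than in a plain AVL tree.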
A3Q1. (a) Find a way to sort an array A[1..n] with O(log n) distinct elements in O(n log log n) time.
Idea: If one could find the frequency with which each element appears, then one could just sort the O(log n)
distinct elements and add enough copies of the elements to achieve the proper frequency.
How to find the correct frequencies? The easiest approach is to store them in an associative array: loop
through A and, for each item encountered, increment its corresponding frequency.
Using an unordered array, finding the counter to increment costs O(log n) (the length of the associative array)
and there are O(n) increments, so this costs O(n log n).
Using an ordered array, finding the counter to increment costs O(log log n) by binary search, so the total
incrementing cost is O(n log log n). If the item hasn't been added to the array yet, inserting it costs
O(log n), but that only happens O(log n) times.
Sorting the distinct elements costs O(log n log log n), and outputting the final sorted answer with correct
frequency costs O(n). Total cost: O(n log log n).
Note: An AVL tree can also be used to implement the associative array.
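The ordered-array idea can be sketched in Python as follows. The function name `sort_few_distinct` is made up for illustration, and the standard-library `bisect_left` plays the role of the binary search over the sorted array of distinct values.

```python
from bisect import bisect_left
from typing import List


def sort_few_distinct(a: List[int]) -> List[int]:
    keys: List[int] = []     # distinct values seen so far, kept sorted
    counts: List[int] = []   # counts[i] = number of occurrences of keys[i]
    for x in a:
        i = bisect_left(keys, x)            # binary search over the k distinct values: O(log k)
        if i < len(keys) and keys[i] == x:
            counts[i] += 1
        else:
            keys.insert(i, x)               # O(k) shift, but only once per distinct value
            counts.insert(i, 1)
    out: List[int] = []
    for k, c in zip(keys, counts):          # output each value with its frequency: O(n)
        out.extend([k] * c)
    return out
```

Because `keys` is kept in sorted order throughout, no separate sorting step for the distinct elements is needed in this variant.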
Alternate Idea: Use Quicksort with a 3-way partition
< pivot | = pivot | > pivot
and pivoting on the median.
However, this is harder to analyze, and you need to know how to compute medians in linear time.
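A rough sketch of the alternate idea is given below. Note the substitution: the median is found with randomized quickselect, which runs in expected (not worst-case) linear time, rather than with a deterministic linear-time median algorithm. The names `select` and `quicksort3` are hypothetical, and the partition builds new lists instead of working in place.

```python
import random
from typing import List


def select(a: List[int], k: int) -> int:
    """Return the k-th smallest element (0-based) by randomized quickselect."""
    while True:
        p = random.choice(a)
        lt = [x for x in a if x < p]
        eq = [x for x in a if x == p]
        if k < len(lt):
            a = lt
        elif k < len(lt) + len(eq):
            return p
        else:
            k -= len(lt) + len(eq)
            a = [x for x in a if x > p]


def quicksort3(a: List[int]) -> List[int]:
    if len(a) <= 1:
        return a
    pivot = select(a, len(a) // 2)          # pivot on the median
    lt = [x for x in a if x < pivot]        # < pivot
    eq = [x for x in a if x == pivot]       # = pivot, never recursed on
    gt = [x for x in a if x > pivot]        # > pivot
    return quicksort3(lt) + eq + quicksort3(gt)
```

Every copy of the pivot value is removed from further recursion in one step, which is what makes the 3-way partition attractive when there are many duplicates.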
(b) Sorting arrays with many duplicates is a special case of the sorting problem, and the Ω(n log n) bound
only applies to sorting in the general case. If you have extra information about what you are trying to sort
you can possibly beat the lower bound, as in this case.