15 BST PDF
1 Introduction
In this lecture, we will continue considering ways to implement the set
(or associative array) interface. This time, we will implement this inter-
face with binary search trees. We will eventually be able to achieve O(log n)
worst-case asymptotic complexity for insert and lookup. This also extends
to delete, although we won’t discuss that operation in lecture.
LECTURE NOTES
Binary Search Trees L15.2
[Figure: a sorted array with indices 0 through 18.]
on the array above, all we need is constant time access from array index 9
(containing 11) to array indices 4 and 14 (containing 1 and 29, respectively),
constant time access from array index 4 to array indices 2 and 7, and so on.
At each point in binary search, we know that our search will proceed in one
of at most two ways, so we will explicitly represent those choices with a
pointer structure, giving us the structure of a binary tree. The tree structure
that we got from running binary search on this array...
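[Figure: the array (indices 0 through 18) redrawn as a binary tree. The root
is 11 (index 9), with children 1 (index 4) and 29 (index 14). The remaining
keys shown are -9, -6, -1, 0, 5, 7, 8, 9, 15, 16, 19, 20, 42, 55, 88, and 90.]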
Rather than the fully generic data implementation that we used for hash
tables, we’ll assume for the sake of simplicity that the client is provid-
ing us with a type of elem that is known to be a pointer, and a single
elem_compare.
/* Client-side interface */
typedef ______* elem;
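To make the client side concrete, here is a minimal sketch of one way a client might fill in this interface. The choice of int* for elem, the parameter names, and the convention that elem_compare returns -1, 0, or 1 are assumptions made for illustration; they are not taken from the interface above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client choice: an element is a pointer to an int key. */
typedef int* elem;

/* Assumed contract: both arguments are non-NULL, and the result is
 * -1, 0, or 1 for "less than", "equal", and "greater than". */
int elem_compare(elem a, elem b) {
    assert(a != NULL && b != NULL);
    if (*a < *b) return -1;
    if (*a > *b) return 1;
    return 0;
}
```

Because elem is a pointer type, the library can use NULL to signal "no such element" without confusing it with real data.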
We’ll create a similar picture for trees: the tree containing no elements
is NULL, and a non-empty tree is a struct with three fields: the data and the
left and right pointers, which are themselves trees.
[Figure: a tree root IS either NULL OR a struct holding data x together with
left and right subtrees.]
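The recursive description of trees can be written down as a struct along the following lines. This is only a sketch: the struct name tree_node, the int* element type, and the helper leaf are assumptions introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int* elem;              /* hypothetical client element type */

typedef struct tree_node tree;  /* the name tree_node is assumed */
struct tree_node {
    elem data;    /* never NULL in a valid tree */
    tree* left;   /* NULL represents the empty tree */
    tree* right;
};

/* Hypothetical helper: allocate a one-node tree holding e. */
tree* leaf(elem e) {
    assert(e != NULL);
    tree* T = malloc(sizeof(tree));
    assert(T != NULL);
    T->data = e;
    T->left = NULL;
    T->right = NULL;
    return T;
}
```

The left and right fields have the same type as a pointer to the whole tree, which is what lets functions on trees recurse directly on the subtrees.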
Rather than drawing out the tree node struct with its three fields explicitly,
we'll usually use a more graph-like way of presenting trees:
[Figure: the same picture in graph form: root IS either the empty tree OR a
node x with two subtrees below it.]
This recursive definition can be directly encoded into a very simple data
structure invariant is_tree. It checks very little: just that all the data fields
are non-NULL, as the client interface requires. If it terminates, it also en-
sures that there are no cycles; a cycle would cause nontermination, just as
it would with is_segment.
bool is_tree(tree* root) {
  if (root == NULL) return true;
  return root->data != NULL
    && is_tree(root->left) && is_tree(root->right);
}
6 Complexity
If our binary search tree were perfectly balanced, that is, had the same num-
ber of nodes on the left as on the right for every subtree, then the ordering
invariant would ensure that search for an element with a given key has
asymptotic complexity O(log n), where n is the number of elements in the
tree. Every time we compare the element x with the root of a perfectly
balanced tree, we either stop or throw out half the elements of the tree.
In general we can say that the cost of lookup is O(h), where h is the
height of the tree. We will define height to be the maximum number of
nodes on any path that follows pointers downward from the root.
An empty tree has height 0, and a non-empty tree has the maximum
height of its two children, plus 1.
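The definition of height translates directly into a recursive function. The following is a minimal sketch; the struct layout and the function name height are assumptions made here, and the assert stands in for a postcondition.

```c
#include <assert.h>
#include <stddef.h>

typedef int* elem;              /* hypothetical client element type */
typedef struct tree_node tree;  /* assumed node layout */
struct tree_node { elem data; tree* left; tree* right; };

/* Height as defined above: 0 for the empty tree, otherwise one more
 * than the larger of the two subtree heights. */
int height(tree* T) {
    if (T == NULL) return 0;
    int hl = height(T->left);
    int hr = height(T->right);
    int h = 1 + (hl > hr ? hl : hr);
    assert(h >= 1);   /* a non-empty tree has height at least 1 */
    return h;
}
```

A chain of three nodes linked only through left pointers has height 3 under this definition, which is the worst case for a three-element tree.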
7 The Interface
Before we talk about insertion into a binary search tree, we should specify
the interface and discuss how we will implement it. Remember that we’re
assuming a single client definition of elem and a single client definition
of elem_compare, rather than the fully generic version using void pointers
and function pointers.
/* Library interface */
typedef ______* bst_t;
bst_t bst_new()
/*@ensures \result != NULL; @*/ ;
We can’t define bst_t to be tree*, for two reasons. One reason is that a
new tree should be empty, but an empty tree is represented by the pointer
NULL, which would violate the bst_new postcondition. More fundamentally,
if NULL were the representation of an empty tree, there would be no
way to imperatively insert additional elements into the tree, since a
function that is handed NULL has no node it could modify.
The usual solution here is one we have already used for stacks, queues,
and hash tables: we have a header which in this case just consists of a pointer
to the root of the tree. We often keep other information associated with the
data structure in these headers, such as the size.
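A header of this kind might look as follows. This is a sketch under assumptions: the node layout and the definition of bst_t as a pointer to the header are guesses at how the blank in the interface would be filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int* elem;              /* hypothetical client element type */
typedef struct tree_node tree;  /* assumed node layout */
struct tree_node { elem data; tree* left; tree* right; };

typedef struct bst_header bst;
struct bst_header {
    tree* root;   /* NULL while the tree is empty */
};
typedef bst* bst_t;   /* assumed definition of the abstract type */

bst_t bst_new(void) {
    bst* B = malloc(sizeof(bst));
    assert(B != NULL);
    B->root = NULL;   /* the tree is empty ... */
    return B;         /* ... but the header itself is not NULL */
}
```

The header gives every client a stable, non-NULL pointer, so an insertion can change B->root without the client's pointer having to change.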
bool is_bst(bst* B) {
  return B != NULL && is_tree(B->root);
}
Lookup in a binary search tree then just calls the recursive function we’ve
already defined:
The relationship between is_bst and is_tree, and likewise between bst_lookup
and tree_lookup, is a common one. The non-recursive function is_bst
is given the non-recursive struct bst_header, and then calls the recursive
helper function is_tree on the recursive structure of tree nodes.
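The same wrapper pattern for lookup can be sketched as below. The bodies are a reconstruction under assumptions, not the lecture's code: elem is taken to be int*, elem_compare is assumed to return a negative, zero, or positive result, and lookup is assumed to return NULL when nothing matches.

```c
#include <assert.h>
#include <stddef.h>

typedef int* elem;              /* hypothetical client element type */
typedef struct tree_node tree;  /* assumed node layout */
struct tree_node { elem data; tree* left; tree* right; };
typedef struct bst_header bst;
struct bst_header { tree* root; };

int elem_compare(elem a, elem b) {   /* assumed client function */
    assert(a != NULL && b != NULL);
    return (*a > *b) - (*a < *b);
}

/* Recursive helper on the node structure. */
elem tree_lookup(tree* T, elem x) {
    assert(x != NULL);
    if (T == NULL) return NULL;                    /* not found */
    int cmp = elem_compare(x, T->data);
    if (cmp == 0) return T->data;
    if (cmp < 0) return tree_lookup(T->left, x);
    return tree_lookup(T->right, x);
}

/* Non-recursive wrapper on the header. */
elem bst_lookup(bst* B, elem x) {
    assert(B != NULL && x != NULL);
    return tree_lookup(B->root, x);
}
```

Each recursive call descends one level, so the number of comparisons is bounded by the height of the tree.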
8 Inserting an Element
With the header structure, it is straightforward to implement bst_insert.
We just proceed as if we are looking for the given element. If we find a node
with an equivalent element, we just overwrite its data field. Otherwise, we
insert the new key in the place where it would have been, had it been there
in the first place. This last clause, however, creates a small difficulty. When
we hit a null pointer (which indicates the key was not already in the tree),
we cannot replace what it points to (it doesn’t point to anything!). Instead,
we return the new tree so that the parent can modify itself.
return T;
}
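The insertion strategy described in this section, where the recursive helper returns the (possibly new) subtree and the caller stores it back, can be sketched as follows. The function bodies, the int* element type, and the comparison convention are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int* elem;              /* hypothetical client element type */
typedef struct tree_node tree;  /* assumed node layout */
struct tree_node { elem data; tree* left; tree* right; };
typedef struct bst_header bst;
struct bst_header { tree* root; };

int elem_compare(elem a, elem b) {   /* assumed client function */
    assert(a != NULL && b != NULL);
    return (*a > *b) - (*a < *b);
}

/* Returns the root of the subtree after inserting x into it. */
tree* tree_insert(tree* T, elem x) {
    assert(x != NULL);
    if (T == NULL) {                 /* x was not in the tree: new node */
        tree* N = malloc(sizeof(tree));
        assert(N != NULL);
        N->data = x;
        N->left = NULL;
        N->right = NULL;
        return N;
    }
    int cmp = elem_compare(x, T->data);
    if (cmp == 0) T->data = x;       /* equivalent element: overwrite */
    else if (cmp < 0) T->left = tree_insert(T->left, x);
    else T->right = tree_insert(T->right, x);
    return T;
}

void bst_insert(bst* B, elem x) {
    assert(B != NULL && x != NULL);
    B->root = tree_insert(B->root, x);
}
```

Every parent reassigns its child pointer with the returned subtree, which is how a NULL child gets replaced by the freshly allocated node.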
1. T is empty, or
2. T has key k at the root, T_L as left subtree and T_R as right subtree, and
While this should always be true for a binary search tree, it is far weaker
than the ordering invariant stated at the beginning of lecture. Before reading
on, you should check your understanding of that invariant by exhibiting a
tree that would satisfy the above, but violate the ordering invariant.
There is actually more than one problem with this. The most glaring
one is that the following tree would pass this test:
        7
       / \
      5   11
     / \
    1   9
Even though, locally, the key of the left node is always smaller and on the
right is always bigger, the node with key 9 is in the wrong place and we
would not find it with our search algorithm since we would look in the
right subtree of the root.
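To see the problem concretely, here is a sketch of a check that only compares each node with its immediate children, together with a lookup in the style used in this lecture. The name locally_ordered and both function bodies are assumptions written for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int* elem;              /* hypothetical client element type */
typedef struct tree_node tree;  /* assumed node layout */
struct tree_node { elem data; tree* left; tree* right; };

int elem_compare(elem a, elem b) {   /* assumed client function */
    assert(a != NULL && b != NULL);
    return (*a > *b) - (*a < *b);
}

/* Too-weak check: each node is compared only with its two children. */
bool locally_ordered(tree* T) {
    if (T == NULL) return true;
    if (T->left != NULL && elem_compare(T->left->data, T->data) >= 0)
        return false;
    if (T->right != NULL && elem_compare(T->right->data, T->data) <= 0)
        return false;
    return locally_ordered(T->left) && locally_ordered(T->right);
}

/* Ordinary search, which relies on the full ordering invariant. */
elem tree_lookup(tree* T, elem x) {
    if (T == NULL) return NULL;
    int cmp = elem_compare(x, T->data);
    if (cmp == 0) return T->data;
    return tree_lookup(cmp < 0 ? T->left : T->right, x);
}
```

On the tree drawn above, locally_ordered returns true, yet tree_lookup for key 9 goes right at the root, reaches 11, finds an empty left subtree, and returns NULL.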
An alternative way of thinking about the invariant is as follows. As-
sume we are at a node with key k.
The general idea then is to traverse the tree recursively, and pass down
an interval with lower and upper bounds for all the keys in the tree. The
following diagram illustrates this idea. We start at the root with an unre-
stricted interval, allowing any key, which is written as (−∞, +∞). As usual
in mathematics we write intervals as (x, z) = {y | x < y and y < z}. At
the leaves we write the interval for the subtree. For example, if there were
a left subtree of the node with key 7, all of its keys would have to be in the
interval (5, 7).
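[Figure: a tree annotated with intervals. The root 9 is checked against
(−∞, +∞). Its left child 5 is checked against (−∞, 9), and its empty right
subtree has interval (9, +∞). The empty left subtree of 5 has interval
(−∞, 5). The right child of 5 is 7, checked against (5, 9), and its two empty
subtrees have intervals (5, 7) and (7, 9).]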
bool is_tree(tree* T) {
  return is_ordered(T, NULL, NULL);
}
bool is_bst(bst* B) {
  return B != NULL && is_tree(B->root);
}
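The helper is_ordered that is_tree calls could be written along the following lines. This is a sketch under assumptions: the bounds are taken to be exclusive, with NULL standing for −∞ as the lower bound and +∞ as the upper bound, which is consistent with the call is_ordered(T, NULL, NULL) but is not spelled out in the code above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int* elem;              /* hypothetical client element type */
typedef struct tree_node tree;  /* assumed node layout */
struct tree_node { elem data; tree* left; tree* right; };

int elem_compare(elem a, elem b) {   /* assumed client function */
    assert(a != NULL && b != NULL);
    return (*a > *b) - (*a < *b);
}

/* Checks that every key in T lies strictly between lo and hi.
 * A NULL bound means "unbounded on that side" (an assumption). */
bool is_ordered(tree* T, elem lo, elem hi) {
    if (T == NULL) return true;
    if (T->data == NULL) return false;
    if (lo != NULL && elem_compare(lo, T->data) >= 0) return false;
    if (hi != NULL && elem_compare(T->data, hi) >= 0) return false;
    /* Going left tightens the upper bound; going right, the lower. */
    return is_ordered(T->left, lo, T->data)
        && is_ordered(T->right, T->data, hi);
}
```

With this version, a node holding 9 as the right child of 5 under a root of 7 is rejected, because it is checked against the interval (5, 7).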
[Figure: the trees produced by inserting the same keys in different orders;
the last sequence of trees has 3 at the root.]
Clearly, the last tree is much more balanced. In the extreme, if we insert
elements with their keys in order, or reverse order, the tree will be linear,
and search time will be O(n) for n items.
These observations mean that it is extremely important to pay attention
to the balance of the tree. We will discuss ways to keep binary search trees
balanced in a later lecture.
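The effect of insertion order on the shape of the tree is easy to observe experimentally. The sketch below uses plain int keys instead of the elem interface to stay short; all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified node with an int key (an assumption made for brevity). */
typedef struct tree_node tree;
struct tree_node { int key; tree* left; tree* right; };

tree* insert(tree* T, int k) {
    if (T == NULL) {
        tree* N = malloc(sizeof(tree));
        assert(N != NULL);
        N->key = k;
        N->left = NULL;
        N->right = NULL;
        return N;
    }
    if (k < T->key) T->left = insert(T->left, k);
    else if (k > T->key) T->right = insert(T->right, k);
    return T;   /* an equal key leaves the tree unchanged */
}

int height(tree* T) {
    if (T == NULL) return 0;
    int hl = height(T->left);
    int hr = height(T->right);
    return 1 + (hl > hr ? hl : hr);
}

void free_tree(tree* T) {
    if (T == NULL) return;
    free_tree(T->left);
    free_tree(T->right);
    free(T);
}

/* Insert keys[0..n) in the given order; return the resulting height. */
int height_after_inserts(const int* keys, int n) {
    tree* T = NULL;
    for (int i = 0; i < n; i++) T = insert(T, keys[i]);
    int h = height(T);
    free_tree(T);
    return h;
}
```

Inserting 1, 2, 3, 4 in that order gives a tree of height 4, a linked list in disguise, while inserting 3, 1, 4, 2 gives height 3 for the same set of keys.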
Exercises
Exercise 1 Rewrite tree_lookup to be iterative rather than recursive.
Exercise 3 The binary search tree interface only expected a single function for key
comparison to be provided by the client:
An alternative design would have been to, instead, expect the client to provide a
set of elem comparison functions, one for each outcome: