Binary Search Trees
Binary Search Trees
In the introduction we used the binary search algorithm to find data stored in an array. This method is very effective, as each iteration reduced the number of items to search by one-half. However, since data was stored in an array, insertions and deletions were not efficient. Binary search trees store data in nodes that are linked in a tree-like fashion. For randomly inserted data, search time is O(lg n). Worst-case behavior occurs when ordered data is inserted. In this case the search time is O(n). See Cormen [1990] for a more detailed description.
Theory
A binary search tree is a tree where each node has a left and right child. Either child, or both children, may be missing. Figure 3-2 illustrates a binary search tree. Assuming k represents the value of a given node, then a binary search tree also has the following property: all children to the left of the node have values smaller than k, and all children to the right of the node have values larger than k. The top of a tree is known as the root, and the exposed nodes at the bottom are known as leaves. In Figure 3-2, the root is node 20 and the leaves are nodes 4, 16, 37, and 43. The height of a tree is the length of the longest path from root to leaf. For this example the tree height is 2.
Contents
[hide]
1 Definitions for rooted trees 2 Types of binary trees 3 Properties of binary trees 4 Common operations o 4.1 Insertion 4.1.1 External nodes 4.1.2 Internal nodes o 4.2 Deletion 4.2.1 Node with zero or one children 4.2.2 Node with two children o 4.3 Iteration 4.3.1 Pre-order, in-order, and post-order traversal 4.3.2 Depth-first order 4.3.3 Breadth-first order 5 Type theory 6 Definition in graph theory 7 Combinatorics 8 Methods for storing binary trees o 8.1 Nodes and references o 8.2 Arrays 9 Encodings o 9.1 Succinct encodings o 9.2 Encoding general trees as binary trees 10 See also 11 Notes 12 References 13 External links
A directed edge refers to the link from the parent to the child (the arrows in the picture of the tree). The root node of a tree is the node with no parents. There is at most one root node in a rooted tree. A leaf node has no children. The depth of a node n is the length of the path from the root to the node. The set of all nodes at a given depth is sometimes called a level of the tree. The root node is at depth zero. The depth of a tree is the length of the path from the root to the deepest node in the tree. A (rooted) tree with only one node (the root) has a depth of zero. Siblings are nodes that share the same parent node. A node p is an ancestor of a node q if it exists on the path from the root to node q. The node q is then termed as a descendant of p. The size of a node is the number of descendants it has including itself. In-degree of a node is the number of edges arriving at that node. Out-degree of a node is the number of edges leaving that node. The root is the only node in the tree with In-degree = 0. All the leaf nodes have Out-degree = 0.
A rooted binary tree is a tree with a root node in which every node has at most two children.
A full binary tree (sometimes proper binary tree or 2-tree or strictly binary tree) is a tree in which every node other than the leaves has two children. Sometimes a full tree is ambiguously defined as a perfect tree. A perfect binary tree is a full binary tree in which all leaves are at the same depth or same level, and in which every parent has two children.[1] (This is ambiguously also called a complete binary tree.) A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.[2] An infinite complete binary tree is a tree with a countably infinite number of levels, in which every node has two children, so that there are 2d nodes at level d. The set of all nodes is countably infinite, but the set of all infinite paths from the root is uncountable: it has the cardinality of the continuum. These paths corresponding by an order preserving bijection to the points of the Cantor set, or (through the example of the SternBrocot tree) to the set of positive irrational numbers. A balanced binary tree is commonly defined as a binary tree in which the depth of the two subtrees of every node never differ by more than 1,[3] although in general it is a binary tree where no leaf is much farther away from the root than any other leaf. (Different balancing schemes allow different definitions of "much farther"[4]). Binary trees that are balanced according to this definition have a predictable depth (how many nodes are traversed from the root to a leaf, root counting as node 0 and subsequent as 1, 2, ..., depth). This depth is equal to the integer part of number of nodes on the balanced tree. Example 1: balanced tree with 1 node, where is the (depth = 0). Example 2:
balanced tree with 3 nodes, (depth=1). Example 3: balanced tree with 5 nodes, (depth of tree is 2 nodes). A rooted complete binary tree can be identified with a free magma. A degenerate tree is a tree where for each parent node, there is only one associated child node. This means that in a performance measurement, the tree will behave like a linked list data structure.
Note that this terminology often varies in the literature, especially with respect to the meaning "complete" and "full".
where
is the depth
The number of nodes in a complete binary tree is at least and at most where is the depth of the tree. The number of leaf nodes in a perfect binary tree can be found using this formula: where is the depth of the tree. The number of nodes in a perfect binary tree can also be found using this formula: where is the number of leaf nodes in the tree. The number of null links (absent children of nodes) in a complete binary tree of n nodes is (n+1). The number nL of internal nodes (non-leaf nodes) in a Complete Binary Tree of n nodes is For any non-empty binary tree with n0 leaf nodes and n2 nodes of degree 2, n0 = n2 + 1.[5] Proof: Let n = the total number of nodes B = number of branches n0, n1, n2 represent the number of nodes with no children, a single child, and two children respectively. B = n - 1 (since all nodes except the root node come from a single branch) B = n1 + 2*n2 n = n1+ 2*n2 + 1 n = n0 + n1 + n2 .
[edit] Insertion
Nodes can be inserted into binary trees in between two other nodes or added after an external node. In binary trees, a node that is inserted is specified as to which child it is.
[edit] External nodes
Say that the external node being added on to is node A. To add a new node after node A, A assigns the new node as one of its children and the new node assigns node A as its parent.
[edit] Internal nodes
Insertion on internal nodes is slightly more complex than on external nodes. Say that the internal node is node A and that node B is the child of A. (If the insertion is to insert a right child, then B is the right child of A, and similarly with a left child insertion.) A assigns its child to the new node and the new node assigns its parent to A. Then the new node assigns its child to B and B assigns its parent as the new node.
[edit] Deletion
Deletion is the process whereby a node is removed from the tree. Only certain nodes in a binary tree can be removed unambiguously.[6]
[edit] Node with zero or one children
Say that the node to delete is node A. If a node has no children (external node), deletion is accomplished by setting the child of A's parent to null. If it has one child, set the parent of A's child to A's parent and set the child of A's parent to A's child.
In a binary tree, a node with two children cannot be deleted unambiguously.[6] However, in certain binary trees these nodes can be deleted, including binary search trees.
[edit] Iteration
Often, one wishes to visit each of the nodes in a tree and examine the value there, a process called iteration or enumeration. There are several common orders in which the nodes can be visited, and each has useful properties that are exploited in algorithms based on binary trees:
Pre-Order: Root first, Left child, Right child Post-Order: Left Child, Right child, root In-Order: Left child, root, right child.
[edit] Pre-order, in-order, and post-order traversal Main article: Tree traversal
Pre-order, in-order, and post-order traversal visit each node in a tree by recursively visiting each node in the left and right subtrees of the root.
[edit] Depth-first order
, there is no need to remember all the nodes we have visited, because a tree cannot contain cycles. Pre-order is a special case of this. See depth-first search for more information.
[edit] Breadth-first order
Contrasting with depth-first order is breadth-first order, which always attempts to visit the node closest to the root that it has not already visited. See breadth-first search for more information. Also called a level-order traversal.
A single vertex. A graph formed by taking two binary trees, adding a vertex, and adding an edge directed from the new vertex to the root of each binary tree.
This also does not establish the order of children, but does fix a specific root node.
[edit] Combinatorics
In combinatorics one considers the problem of counting the number of full binary trees of a given size. Here the trees have no values attached to their nodes (this would just multiply the number of possible trees by an easily determined factor), and trees are distinguished only by their structure; however the left and right child of any node are distinguished (if they are different trees, then interchanging them will produce a tree distinct from the original one). The size of the tree is taken to be the number n of internal nodes (those with two children); the other nodes are leaf nodes and there are n + 1 of them. The number of such binary trees of size n is equal to the number of ways of fully parenthesizing a string of n + 1 symbols (representing leaves) separated by n binary operators (representing internal nodes), so as to determine the argument subexpressions of each operator. For instance for n = 3 one has to parenthesize a string like , which is possible in five ways:
The correspondence to binary trees should be obvious, and the addition of redundant parentheses (around an already parenthesized expression or around the full expression) is disallowed (or at least not counted as producing a new possibility). There is a unique binary tree of size 0 (consisting of a single leaf), and any other binary tree is characterized by the pair of its left and right children; if these have sizes i and j respectively, the full tree has size i + j + 1. Therefore the number of binary trees of size n has the following recursive description positive integer n. It follows that is the Catalan number of index n. , and for any
The above parenthesized strings should not be confused with the set of words of length 2n in the Dyck language, which consist only of parentheses in such a way that they are properly balanced. The number of such strings satisfies the same recursive description (each Dyck word of length 2n is determined by the Dyck subword enclosed by the initial '(' and its matching ')' together with the Dyck subword remaining after that closing parenthesis, whose lengths 2i and 2j satisfy i + j + 1 = n); this number is therefore also the Catalan number . So there are also five Dyck words of length 10:
.
These Dyck words do not correspond in an obvious way to binary trees. A bijective correspondence can nevertheless be defined as follows: enclose the Dyck word in a extra pair of parentheses, so that the result can be interpreted as a Lisp list expression (with the empty list () as only occurring atom); then the dotted-pair expression for that proper list is a fully parenthesized expression (with NIL as symbol and '.' as operator) describing the corresponding binary tree (which is in fact the internal representation of the proper list). The ability to represent binary trees as strings of symbols and parentheses implies that binary trees can represent the elements of a free magma on a singleton set.
In languages with tagged unions such as ML, a tree node is often a tagged union of two types of nodes, one of which is a 3-tuple of data, left child, and right child, and the other of which is a "leaf" node, which contains no data and functions much like the null value in a language with pointers.
[edit] Arrays
Binary trees can also be stored in breadth-first order as an implicit data structure in arrays, and if the tree is a complete binary tree, this method wastes no space. In this compact arrangement, if a node has an index i, its children are found at indices (for the left child) and (for the right), while its parent (if any) is found at index (assuming the root has index zero). This method benefits from more compact storage and better locality of reference, particularly during a preorder traversal. However, it is expensive to grow and wastes space proportional to 2h - n for a tree of depth h with n nodes. This method of storage is often used for binary heaps. No space is wasted because nodes are added in breadth-first order.
A prominent data structure used in many systems programming applications for representing and managing dynamic sets. Average case complexity of Search, Insert, and Delete Operations is O(log n), where n is the number of nodes in the tree. DEF: A binary tree in which the nodes are labeled with elements of an ordered dynamic set and the following BST property is satisfied: all elements stored in the left subtree of any node x are less than the element stored at x and all elements stored in the right subtree of x are greater than the element at x. An Example: Figure 4.14 shows a binary search tree. Notice that this tree is obtained by inserting the values 13, 3, 4, 12, 14, 10, 5, 1, 8, 2, 7, 9, 11, 6, 18 in that order, starting from an empty tree. Note that inorder traversal of a binary search tree always gives a sorted sequence of the values. This is a direct consequence of the BST property. This provides a way of sorting a given sequence of keys: first, create a BST with these keys and then do an inorder traversal of the BST so created. Note that the highest valued element in a BST can be found by traversing from the root in the right direction all along until a node with no right link is found (we can call that the rightmost element in the BST). The lowest valued element in a BST can be found by traversing from the root in the left direction all along until a node with no left link is found (we can call that the leftmost element in the BST). Search is straightforward in a BST. Start with the root and keep moving left or right using the BST property. If the key we are seeking is present, this search procedure will lead us to the key. If the key is not present, we end up in a null link. Insertion in a BST is also a straightforward operation. If we need to insert an element x, we first search for x. If x is present, there is nothing to do. If x is not present, then our search procedure ends in a null link. It is at this position of this null link that x will be included. If we repeatedly insert a sorted sequence of values to form a BST, we obtain a completely skewed BST. The height of such a tree is n - 1 if the tree has n nodes. Thus, the worst case complexity of searching or inserting an element into a BST having n nodes is O(n).
Deletion in BST Let x be a value to be deleted from the BST and let X denote the node containing the value x. Deletion of an element in a BST again uses the BST property in a critical way. When we delete the node X containing x, it would create a "void" that should be filled by a suitable existing node of the BST. There are two possible candidate nodes that can fill this void, in a way that the BST property is not violated: (1). Node containing highest valued element among all descendants of left child of X. (2). Node containing the lowest valued element among all the descendants of the right child of X. In case (1), the selected node will necessarily have a null right link which can be conveniently used in patching up the tree. In case (2), the selected node will necessarily have a null left link which can be used in patching up the tree. Figure 4.15 illustrates several scenarios for deletion in BSTs.