CSE 326: Data Structures Binary Search Trees
• Queue
– Enqueue
– Dequeue
(and then there is decreaseKey…)
The Dictionary ADT
• Data: a set of (key, value) pairs
• Operations:
– insert(key, value), e.g. insert(jfogarty, "James Fogarty, CSE 666")
– find(key)
– delete(key)
• Sets
• Dictionaries
• Networks : Router tables
• Operating systems : Page tables
• Compilers : Symbol tables
Probably the most widely used ADT!
Implementations
                        insert    find      delete
• Unsorted linked list  O(1)      O(n)      O(n)
• Unsorted array        O(1)      O(n)      O(n)
• Sorted array          O(n)      O(log n)  O(n)
Tree Calculations
Recall: height is the max number of edges from the root to a leaf.
Height can be computed recursively: the height of a node is 1 + the max of its children's heights.
runtime: O(n), since each node is visited once
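The recursive height computation can be sketched as follows (a minimal sketch, not the course's code; the `Node` layout is assumed):

```python
# A minimal sketch: height = max number of edges from the root to a leaf,
# computed recursively; an empty tree has height -1.

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def height(t):
    """Height of the tree rooted at t; None (empty tree) has height -1."""
    if t is None:
        return -1
    return 1 + max(height(t.left), height(t.right))

# A root with a child and a grandchild: height 2.
leaf = Node()
root = Node(left=Node(left=leaf))
print(height(root))   # 2
```

Every node is visited exactly once, which is where the O(n) runtime comes from.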
Tree Calculations Example
How high is this tree?
[diagram: example tree with root A, children B and C, and further descendants D–N]
More Recursive Tree Calculations:
Tree Traversals
Three types:
• Pre-order: root, left subtree, right subtree
• In-order: left subtree, root, right subtree
• Post-order: left subtree, right subtree, root
[diagram: expression tree with * at the root]
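The three traversals can be sketched as follows (a minimal sketch; the expression tree below is assumed to be the slide's example, * over (+ 2 4) and 5):

```python
# The three recursive traversals, run on an assumed expression tree (* (+ 2 4) 5).

class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def pre_order(t):   # root, left subtree, right subtree
    return [] if t is None else [t.data] + pre_order(t.left) + pre_order(t.right)

def in_order(t):    # left subtree, root, right subtree
    return [] if t is None else in_order(t.left) + [t.data] + in_order(t.right)

def post_order(t):  # left subtree, right subtree, root
    return [] if t is None else post_order(t.left) + post_order(t.right) + [t.data]

expr = Node('*', Node('+', Node(2), Node(4)), Node(5))
print(pre_order(expr))   # ['*', '+', 2, 4, 5]
print(in_order(expr))    # [2, '+', 4, '*', 5]
print(post_order(expr))  # [2, 4, '+', 5, '*']
```

Note that in-order traversal reads the expression in its usual infix form, and post-order gives the order a stack machine would evaluate it.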
Binary Tree: Representation
Each node stores its data plus a left pointer and a right pointer.
[diagram: nodes A–F, each drawn as (left pointer | data | right pointer), linked into the tree A(B, C), B(D, E), C(F, –)]
Binary Tree: Special Cases
[diagram: three example trees over nodes A–G; the one with every level completely filled is labeled "Full Tree"]
Binary Tree: Some Numbers!
For a binary tree of height h:
– max # of leaves: 2^h
– max # of nodes: 2^(h+1) − 1
– min # of leaves: 1
– min # of nodes: h + 1
[diagram: example binary search trees]
Insertions happen only
at the leaves – easy!
Runtime: O(height), which can be O(n) for an unbalanced tree
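BST insert and find can be sketched as follows (a minimal sketch, not the course's code), including a demonstration of how insertion order affects the height:

```python
# BST insert and find: new keys always land at a leaf, so both cost O(height).

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(t, key):
    if t is None:
        return Node(key)          # hang the new node at a leaf position
    if key < t.key:
        t.left = insert(t.left, key)
    elif key > t.key:
        t.right = insert(t.right, key)
    return t                      # duplicates are ignored

def find(t, key):
    while t is not None and t.key != key:
        t = t.left if key < t.key else t.right
    return t

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

root = None
for k in [10, 5, 15, 2, 9, 20, 7, 17, 30]:
    root = insert(root, k)
print(find(root, 9) is not None)  # True
print(height(root))               # 3 for this insertion order

# Sorted-order insertion degenerates into a linked list:
chain = None
for k in range(1, 10):
    chain = insert(chain, k)
print(height(chain))              # 8: O(n) height, hence O(n^2) total build time
```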
BuildTree for BST
• Suppose keys 1, 2, 3, 4, 5, 6, 7, 8, 9 are
inserted into an initially empty BST.
Runtime depends on the order!
– in given (sorted) order: each insert walks the whole right spine, so Θ(n²)
– in reverse order: same cost, Θ(n²), along the left spine
• Find minimum: follow left pointers all the way down
• Find maximum: follow right pointers all the way down
[diagram: BST with root 10, children 5 and 15, and descendants 2, 9, 20, 7, 17, 30]
Deletion in BST
[diagram: BST with root 10, children 5 and 15, descendants 2, 9, 20, 7, 17, 30; Delete(17) simply removes the leaf 17]
Deletion – The One Child Case
[diagram: Delete(15) removes 15 and splices its only child 20 into its place]
Deletion – The Two Child Case
[diagram: Delete(5) in the BST with root 10; 5 has two children, 2 and 9]
Options:
• succ from right subtree: findMin(t.right)
• pred from left subtree: findMax(t.left)
Here findMin(t.right) = 7, so 7 replaces 5:
[diagram: resulting BST with root 10; 7 takes 5's place, keeping children 2 and 9]
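The three deletion cases can be sketched as follows (a minimal sketch with assumed helper names, using the successor option from the slide):

```python
# BST deletion: leaf, one child, or two children (replace with the successor,
# i.e. findMin of the right subtree).

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(t, key):
    if t is None: return Node(key)
    if key < t.key: t.left = insert(t.left, key)
    elif key > t.key: t.right = insert(t.right, key)
    return t

def find_min(t):
    while t.left is not None:
        t = t.left
    return t

def delete(t, key):
    if t is None:
        return None
    if key < t.key:
        t.left = delete(t.left, key)
    elif key > t.key:
        t.right = delete(t.right, key)
    elif t.left is None:              # leaf, or only a right child
        return t.right
    elif t.right is None:             # only a left child
        return t.left
    else:                             # two children: copy successor, delete it
        succ = find_min(t.right)
        t.key = succ.key
        t.right = delete(t.right, succ.key)
    return t

def in_order(t):
    return [] if t is None else in_order(t.left) + [t.key] + in_order(t.right)

root = None
for k in [10, 5, 20, 2, 9, 30, 7]:
    root = insert(root, k)
root = delete(root, 5)                # two-child case: 7 replaces 5
print(in_order(root))                 # [2, 7, 9, 10, 20, 30]
print(root.left.key)                  # 7
```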
AVL Trees
• Ordering property
– Same as for BST
• Balance property
– Left and right subtrees of every node have heights differing by at most 1
AVL trees or not?
[diagram: two candidate trees over the keys 1–12; check the balance condition at every node]
Proving Shallowness Bound
Let S(h) be the min # of nodes in an AVL tree of height h.
Then S(h) = S(h−1) + S(h−2) + 1, which grows like the Fibonacci numbers, so the height of an AVL tree with n nodes is O(log n).
[diagram: an AVL tree of height h = 4 with the minimum number of nodes (12)]
Testing the Balance Property
1. Track balance
2. Detect imbalance
3. Restore balance
To measure balance at every node, define: NULLs have height −1.
[diagram: BST with root 10 and descendants 5, 15, 2, 9, 20, 7, 17, 30]
An AVL Tree
[diagram: AVL tree with root 10 (height 3), children 5 and 20 (height 2), then 2, 9, 15, 30, and leaves 7 and 17; each node is annotated with its data, its height, and its child pointers]
AVL trees: find, insert
• AVL find:
– same as BST find.
• AVL insert:
– same as BST insert, except may need to
“fix” the AVL tree after inserting new value.
AVL tree insert
Let x be the node where an imbalance occurs.
Single Rotation:
1. Rotate between x and child
Single rotation in general
[diagram: a is unbalanced because an insertion into subtree X raised its height to h+1, while Y and Z have height h; the keys satisfy X < b < Y < a < Z. Rotating b above a makes b the root of the subtree with children X and a; a keeps Y and Z, and the order X < b < Y < a < Z is preserved]
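The single rotation can be sketched as follows (a minimal sketch; the function names are assumed, not the course's):

```python
# Single rotation: node a is unbalanced because its left child b's subtree grew;
# promoting b above a restores balance while keeping X < b < Y < a < Z.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_with_left_child(a):
    """Right rotation: promote a.left to the subtree root."""
    b = a.left
    a.left = b.right      # subtree Y moves across: still b < Y < a
    b.right = a
    return b              # b is the new subtree root

def rotate_with_right_child(a):
    """Left rotation: the mirror image."""
    b = a.right
    a.right = b.left
    b.left = a
    return b

# Inserting 3, 2, 1 in order makes 3 unbalanced; one rotation fixes it.
t = Node(3, left=Node(2, left=Node(1)))
t = rotate_with_left_child(t)
print(t.key, t.left.key, t.right.key)   # 2 1 3
```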
Double Rotation
1. Rotate between x’s child and grandchild
2. Rotate between x and x’s new child
Double rotation in general
[diagram: a is unbalanced because of an insertion into its child b's inner subtree, rooted at grandchild c; the keys satisfy W < b < X < c < Y < a < Z. After rotating c above b and then above a, c is the root of the subtree with children b (over W and X) and a (over Y and Z)]
Double rotation example
[diagram: an insertion unbalances the tree rooted at 15; step 1 rotates between the unbalanced node's child and grandchild]
Double rotation, step 2
[diagram: step 2 rotates between the unbalanced node and its new child, yielding the balanced tree 15 with children 6 (over 4(3, 5) and 8(–, 10)) and 17 (over 16)]
Imbalance at node x
Single Rotation
1. Rotate between x and child
Double Rotation
1. Rotate between x’s child and grandchild
2. Rotate between x and x’s new child
Single and Double Rotations:
[diagram: AVL tree with root 9, children 5 and 11, then 2, 7, 13, and leaves 0 and 3]
Inserting what integer values
would cause the tree to need a:
1. single rotation?
2. double rotation?
3. no rotation?
Insertion into AVL tree
1. Find spot for new key
2. Hang new node there with this key
3. Search back up the path for imbalance
4. If there is an imbalance:
case #1: Perform single rotation and exit
case #2: Perform double rotation and exit
[diagram: AVL tree annotated with node heights; unbalanced?]
Hard Insert (Bad Case #1)
Insert(33) into the AVL tree 10 with subtrees 5 (over 2(3), 9) and 15 (over 12, 20(17, 30)).
Unbalanced? Yes, at 15, once 33 hangs below 30. How to fix?
Single Rotation
[diagram: rotating between 15 and its child 20 gives 10 with subtrees 5 (over 2(3), 9) and 20 (over 15(12, 17), 30(–, 33)); the heights are restored]
Hard Insert (Bad Case #2)
Insert(18) into the same AVL tree: 10 with subtrees 5 (over 2(3), 9) and 15 (over 12, 20(17, 30)).
Unbalanced? Yes, again at 15.
How to fix?
Single Rotation (oops!)
[diagram: rotating between 15 and 20 moves 17 (with its new child 18) under 15, but the subtree is still unbalanced; the inserted key went into the inner subtree, so a single rotation does not help]
Double Rotation (Step #1)
[diagram: first rotate between 15's child 20 and grandchild 17; 17 comes up, with 20 (over 18 and 30) as its right child]
Double Rotation (Step #2)
[diagram: then rotate between 15 and its new child 17; the final balanced tree is 10 with subtrees 5 (over 2(3), 9) and 17 (over 15(12, –), 20(18, 30))]
Insert into an AVL tree: 5, 8, 9, 4, 2, 7, 3, 1
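The whole insertion procedure, including both rotation cases, can be sketched as follows (a minimal sketch, not the course's code; it runs the exercise sequence above):

```python
# AVL insertion: store a height in each node, walk back up after a BST
# insert, and apply a single or double rotation at the first unbalanced node.

class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 0

def h(t):                          # NULLs have height -1
    return -1 if t is None else t.height

def update(t):
    t.height = 1 + max(h(t.left), h(t.right))
    return t

def rotate_right(a):               # single rotation with the left child
    b = a.left
    a.left = b.right
    b.right = a
    update(a)
    return update(b)

def rotate_left(a):                # single rotation with the right child
    b = a.right
    a.right = b.left
    b.left = a
    update(a)
    return update(b)

def insert(t, key):
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
    elif key > t.key:
        t.right = insert(t.right, key)
    else:
        return t
    update(t)
    balance = h(t.left) - h(t.right)
    if balance > 1:                            # left-heavy
        if h(t.left.left) >= h(t.left.right):
            t = rotate_right(t)                # case #1: single rotation
        else:
            t.left = rotate_left(t.left)       # case #2: double rotation
            t = rotate_right(t)
    elif balance < -1:                         # right-heavy (mirror cases)
        if h(t.right.right) >= h(t.right.left):
            t = rotate_left(t)
        else:
            t.right = rotate_right(t.right)
            t = rotate_left(t)
    return t

root = None
for k in [5, 8, 9, 4, 2, 7, 3, 1]:             # the exercise sequence
    root = insert(root, k)
print(root.key, root.height)                   # 5 3
```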
CSE 326: Data Structures
Splay Trees
AVL Trees Revisited
• Balance condition:
Left and right subtrees of every node
have heights differing by at most 1
– Strong enough : Worst case depth is O(log n)
– Easy to maintain : one single or double rotation
AVL Trees Revisited
• What extra info did we maintain in each node? (its height)
Splay trees: on every Insert/Find, always rotate the accessed node to the root!
SAT/GRE analogy question:
AVL is to Splay trees as ___________ is to __________
Recall: Amortized Complexity
If a sequence of M operations takes O(M f(n)) time,
we say the amortized runtime is O(f(n)).
• Worst case time per operation can still be large, say O(n)
Find/Insert in Splay Trees
1. Find or insert a node k
2. Splay k to the root using:
zig-zag, zig-zig, or plain old zig rotation
Splaying node k to the root:
Need to be careful!
What’s bad about this process?
[diagrams: bringing k up with one single rotation at a time leaves the rest of the access path (p, q, r, s) as deep as it was, so a later access can be just as expensive]
Splay: Zig-Zag
[diagram: k is the inner grandchild of g (the path g–p–k zig-zags); one compound rotation brings k up with children p and g, and the subtrees W, X, Y, Z hang in order]
Just like an AVL double rotation! Which nodes improve depth?
Splay: Zig-Zig
[diagram: k is the outer grandchild of g (the path g–p–k goes the same direction twice); rotate p above g first, then k above p]
Splay: Zig
[diagram: when k is a child of the root p, a single rotation brings k to the root]
Why not drop zig-zig and just zig all the way?
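The three cases can be sketched with a compact recursive splay (a common formulation, not necessarily the course's; it may produce a slightly different tree shape than the bottom-up pictures, but it applies the same zig, zig-zig, and zig-zag cases):

```python
# Recursive splay: bring the node with `key` (or the last node touched on
# the search path) to the root using zig, zig-zig, and zig-zag rotations.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(x):
    y = x.left; x.left = y.right; y.right = x; return y

def rotate_left(x):
    y = x.right; x.right = y.left; y.left = x; return y

def splay(t, key):
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t                          # key not present: stop here
        if key < t.left.key:                  # zig-zig: two right rotations
            t.left.left = splay(t.left.left, key)
            t = rotate_right(t)
        elif key > t.left.key:                # zig-zag: left, then right
            t.left.right = splay(t.left.right, key)
            if t.left.right is not None:
                t.left = rotate_left(t.left)
        return rotate_right(t) if t.left is not None else t   # final zig
    else:
        if t.right is None:
            return t
        if key > t.right.key:                 # zig-zig (mirror)
            t.right.right = splay(t.right.right, key)
            t = rotate_left(t)
        elif key < t.right.key:               # zig-zag (mirror)
            t.right.left = splay(t.right.left, key)
            if t.right.left is not None:
                t.right = rotate_right(t.right)
        return rotate_left(t) if t.right is not None else t

# A degenerate right chain 1-2-3-4-5-6, then Find(6):
t = Node(1); cur = t
for k in range(2, 7):
    cur.right = Node(k); cur = cur.right
t = splay(t, 6)
print(t.key)   # 6: the accessed node is now the root
```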
Splaying Example: Find(6)
[diagram: a chain 1–2–3–4–5–6; finding 6 starts with a zig-zig rotation at the bottom of the path]
Still Splaying 6
[diagram: a second zig-zig moves 6 further up]
Finally…
[diagram: a last zig puts 6 at the root; the depth of the nodes on the access path has roughly halved]
Another Splay: Find(4)
[diagram: finding 4 in the resulting tree splays 4 toward the root]
Example Splayed Out
[diagram: 4 is now the root; recently accessed nodes stay near the top]
But Wait…
What happened here? Deletion:
• find(k) splays k to the root; delete the root, leaving subtrees L (keys < k) and R (keys > k). Now what?
Join(L, R):
given two trees such that (stuff in L) < (stuff in R), merge them:
• splay the max of L to its root; it then has no right child, so hang R there
[diagram: example with keys 1, 2, 4, 6, 7, 9: find(4) splays 4 up; removing it leaves L and R; splaying L's max to its root and attaching R completes the delete]
Splay Tree Summary
• All operations are in amortized O(log n) time
[diagram: a degenerate nine-node tree A–I; Splay E brings E to the root and roughly halves the depth of the other nodes on the access path]
CSE 326: Data Structures
B-Trees
The Memory Hierarchy
Cache (SRAM): 8 KB – 4 MB, 2–10 ns
Main Memory (DRAM): up to 10 GB, 40–100 ns
Disk: many GB, a few milliseconds (5–10 million ns)
Trees so far
• BST
• AVL
• Splay
AVL trees
Suppose we have 100 million items (100,000,000):
Runtime of find: log₂ 100,000,000 ≈ 27 comparisons; if each node lives on disk, that can mean up to 27 disk accesses!
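A quick back-of-the-envelope sketch of why a large branching factor M helps when each level costs roughly one disk access (the particular values of M here are illustrative assumptions):

```python
# Depth of an M-ary search tree over n items is about log_M(n);
# with one disk access per level, bigger M means far fewer accesses.
import math

n = 100_000_000
for m in (2, 128, 256):
    depth = math.ceil(math.log(n, m))
    print(f"M = {m:3d}: about {depth} levels (disk accesses)")
# M =   2: about 27 levels
# M = 128: about 4 levels
# M = 256: about 4 levels
```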
Solution: B-Trees
• specialized M-ary search trees
• each node sized to fit in one page (disk block) of memory
B-Trees
What makes them disk-friendly?
[diagram: B-tree with signpost keys in the internal nodes and the actual items in sorted leaves: 1 2 | 3 5 6 9 | 10 11 12 | 15 17 | 20 25 26 | 30 32 33 36 | 40 42 | 50 60 70]
Data objects, which I’ll ignore in slides, hang off the leaves.
Note: All leaves at the same depth!
B-Tree Properties ‡
‡
These are technically B+-Trees
Example, Again
B-Tree with M = 4 and L = 4
[diagram: root with signposts 10, 40; second level 3 | 15 20 30 | 50; leaves 1 2 | 3 5 6 9 | 10 11 12 | 15 17 | 20 25 26 | 30 32 33 36 | 40 42 | 50 60 70]
B-Tree Insertion (M = 3, L = 2)
Insert(3), Insert(14): the empty B-Tree becomes a single leaf [3 14].
Now, Insert(1)? The leaf [1 3 14] overflows: split it into [1 3] and [14] and create a new root with signpost 14.
Insert(59): joins the leaf [14], giving [14 59].
Insert(26): the leaf [14 26 59] overflows: split it and add a new child, giving leaves [1 3], [14 26], [59] under signposts 14, 59.
Propagating Splits
Insert(5): split the leaf [1 3 5], but no space in the parent! Split the parent too and create a new root: signpost 14 at the root, children with signposts 5 and 59, leaves [1 3], [5], [14 26], [59].
Insert(89), then Insert(79): the leaf [59 79 89] overflows and splits, adding signpost 89; the leaves become [59] and [79 89].
Deletion (M = 3, L = 2)
1. Delete item from leaf
2. Update keys of ancestors if necessary
Delete(59): remove 59 from its leaf; the signpost 59 becomes 79.
Delete(5): the leaf becomes empty, so adopt from a sibling: [1 3] gives up its 3, and the signpost becomes 3.
Does Adoption Always Work?
• What if the sibling doesn’t have enough for
you to borrow from?
Delete(3): the leaf becomes empty and its sibling [1] has no surplus.
So, delete the leaf.
But now an internal node has too few subtrees!
Adopt a neighbor
[diagram: the internal node with too few subtrees adopts a subtree from its neighboring internal node, and the signposts are updated]
Delete(1) (adopt a sibling)
[diagram: the empty leaf adopts from its sibling [14 26]; the leaves become [14] and [26], with an updated signpost]
Pulling out the Root
A leaf has too few keys! And no sibling with surplus!
Delete the leaf and merge; the root is then left with a single child, so pull out the root and let that child become the new root.
[diagram: the tree shrinks by one level, ending with leaves [14] and [79 89]]
Deletion Algorithm
1. Remove the key from its leaf
2. If the leaf ends up with too few items: adopt from a neighbor; if no neighbor has surplus, delete the leaf and merge
3. If an internal node ends up with too few children: adopt or merge, as for leaves; if the root ends up with a single child, pull out the root
Nearest Neighbor Search
[diagram: points a–i in the x–y plane with a marked query point]
Nearest neighbor is e.
k-d Tree Construction
• If there is just one point, form a leaf with that point.
• Otherwise, divide the points in half by a line
perpendicular to one of the axes.
• Recursively construct k-d trees for the two sets of
points.
• Division strategies
– divide points perpendicular to the axis with widest spread.
– divide in a round-robin fashion (book does it this way)
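The construction above can be sketched in Python (a minimal sketch using the round-robin strategy; the dict-based node layout is an assumption, not the course's representation):

```python
# k-d tree construction, round-robin splitting: depth mod k picks the axis,
# the median coordinate splits the points, and each leaf holds one point.

def build_kd(points, depth=0):
    """points: list of (x, y) tuples; returns a nested-dict node or None."""
    if not points:
        return None
    if len(points) == 1:
        return {"point": points[0]}            # leaf: exactly one point
    axis = depth % 2                           # round-robin: x, then y, ...
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "axis": axis,
        "value": pts[mid][axis],               # split coordinate
        "left": build_kd(pts[:mid], depth + 1),
        "right": build_kd(pts[mid:], depth + 1),
    }

tree = build_kd([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree["axis"], tree["value"])   # 0 7: the first split is on x at 7
```

The widest-spread strategy would differ only in how `axis` is chosen: compare the coordinate ranges in each dimension and split perpendicular to the larger one.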
k-d Tree Construction
[diagram: the points a–i in the plane; divide perpendicular to the widest spread]
[diagram: the full construction: splits s1–s8 alternate between x and y; each internal node of the k-d tree tests one coordinate against a split value, each leaf holds one point (a–i), and each leaf corresponds to a rectangular k-d tree cell]
2-d Tree Decomposition
[diagram: the plane recursively decomposed into rectangular cells]
k-d Tree Splitting
sorted points in each dimension:
    1  2  3  4  5  6  7  8  9
x:  a  d  g  b  e  i  c  h  f
y:  a  c  b  d  f  e  h  g  i
• max spread is the max of (f_x − a_x) and (i_y − a_y).
• In the selected dimension, the middle point in the list splits the data.
[diagram: the points a–i plotted in the plane]
Rectangular Range Query
[diagram: the k-d tree cells for points a–i with a query rectangle; only subtrees whose cells intersect the rectangle are visited]
Rectangular Range Query
print_range(xlow, xhigh, ylow, yhigh : integer, root : node pointer) {
Case {
root = null: return;
root.left = null:   // leaf: report the point if it lies in the rectangle
if xlow < root.point.x and root.point.x < xhigh
and ylow < root.point.y and root.point.y < yhigh
then print(root);
else                // internal: recurse into each side the rectangle overlaps
if (root.axis = “x” and xlow < root.value ) or
(root.axis = “y” and ylow < root.value ) then
print_range(xlow, xhigh, ylow, yhigh, root.left);
if (root.axis = “x” and xhigh > root.value ) or
(root.axis = “y” and yhigh > root.value ) then
print_range(xlow, xhigh, ylow, yhigh, root.right);
}}
k-d Tree Nearest Neighbor
Search
• Search recursively to find the point in the
same cell as the query.
• On the return search each subtree where
a closer point than the one you already
know about might be found.
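This search can be sketched as follows (a minimal sketch with assumed names; this variant stores one point per node rather than points only at leaves, a common simplification):

```python
# k-d nearest neighbor: descend to the query's cell first, then on the way
# back up visit a subtree only if it could contain a closer point.

def build(points, depth=0):
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"point": pts[mid], "axis": axis,
            "left": build(pts[:mid], depth + 1),
            "right": build(pts[mid + 1:], depth + 1)}

def dist2(p, q):                   # squared Euclidean distance
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def nearest(node, query, best=None):
    if node is None:
        return best
    if best is None or dist2(node["point"], query) < dist2(best, query):
        best = node["point"]
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)      # search the query's own cell first
    if diff ** 2 < dist2(best, query):     # could the far side hold a closer point?
        best = nearest(far, query, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(nearest(tree, (9, 2)))   # (8, 1)
```

The pruning test compares the squared distance to the splitting plane against the best squared distance found so far; subtrees that fail it are skipped entirely, which is where the average O(log n) behavior comes from.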
k-d Tree NNS
[diagram: the search descends to the cell containing the query point, then on the way back up visits only those subtrees whose cells could contain a point closer than the best found so far]
Notes on k-d NNS
• Has been shown to run in O(log n)
average time per search in a reasonable
model.
• Storage for the k-d tree is O(n).
• Preprocessing time is O(n log n) assuming
d is a constant.
Worst-Case for Nearest Neighbor Search
[diagram: a worst-case configuration of points around the query point]
• Half of the points visited for a query
• Worst case O(n)
• But: on average (and in practice) nearest neighbor queries are O(log n)
Quad Trees
• Space Partitioning
[diagram: the plane split recursively into quadrants; the quad tree has one child per quadrant, with the points a–g in the leaves]
A Bad Case
[diagram: two points a and b very close together force many levels of subdivision before they separate]
Notes on Quad Trees
• Number of nodes is O(n(1 + log(Δ/n)))
where n is the number of points and Δ is
the ratio of the width (or height) of the key
space and the smallest distance between
two points
• Height of the tree is O(log n + log Δ)
K-D vs Quad
• k-D Trees
– Density balanced trees
– Height of the tree is O(log n) with batch insertion
– Good choice for high dimension
– Supports insert, find, nearest neighbor, range queries
• Quad Trees
– Space partitioning tree
– May not be balanced
– Not a good choice for high dimension
– Supports insert, delete, find, nearest neighbor, range queries
Geometric Data Structures
• Geometric data structures are common.
• The k-d tree is one of the simplest.
– Nearest neighbor search
– Range queries
• Other data structures used for
– 3-d graphics models
– Physical simulations