CSE 326: Data Structures Binary Search Trees
• Queue
– Enqueue
– Dequeue
(and then there is decreaseKey…)
The Dictionary ADT
• Data: a set of (key, value) pairs
• Operations:
– insert(key, value), e.g. insert(jfogarty, "James Fogarty, CSE 666")
– find(key)
– delete(key)
• Sets
• Dictionaries
• Networks : Router tables
• Operating systems : Page tables
• Compilers : Symbol tables
Probably the most widely used ADT!
Implementations
                        insert    find      delete
• Unsorted linked list  O(1)      O(n)      O(n)
• Unsorted array        O(1)      O(n)      O(n)
• Sorted array          O(n)      O(log n)  O(n)
Tree Calculations
Recall: height is the max number of edges from the root to a leaf.
Height can be computed recursively: the height of a node is 1 + the max of its children's heights.
runtime: O(n), since each node is visited once
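The recursive height computation can be sketched as follows (a minimal sketch, not the course's code; the `Node` layout is assumed):

```python
# A minimal sketch: height = max number of edges from the root to a leaf,
# computed recursively; an empty tree has height -1.

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def height(t):
    """Height of the tree rooted at t; None (empty tree) has height -1."""
    if t is None:
        return -1
    return 1 + max(height(t.left), height(t.right))

# A root with a child and a grandchild: height 2.
leaf = Node()
root = Node(left=Node(left=leaf))
print(height(root))   # 2
```

Every node is visited exactly once, which is where the O(n) runtime comes from.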
Tree Calculations Example
How high is this tree?
[diagram: example tree with root A, children B and C, and further descendants D–N]
More Recursive Tree Calculations:
Tree Traversals
Three types:
• Pre-order: root, left subtree, right subtree
• In-order: left subtree, root, right subtree
• Post-order: left subtree, right subtree, root
[diagram: expression tree with * at the root]
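The three traversals can be sketched as follows (a minimal sketch; the expression tree below is assumed to be the slide's example, * over (+ 2 4) and 5):

```python
# The three recursive traversals, run on an assumed expression tree (* (+ 2 4) 5).

class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def pre_order(t):   # root, left subtree, right subtree
    return [] if t is None else [t.data] + pre_order(t.left) + pre_order(t.right)

def in_order(t):    # left subtree, root, right subtree
    return [] if t is None else in_order(t.left) + [t.data] + in_order(t.right)

def post_order(t):  # left subtree, right subtree, root
    return [] if t is None else post_order(t.left) + post_order(t.right) + [t.data]

expr = Node('*', Node('+', Node(2), Node(4)), Node(5))
print(pre_order(expr))   # ['*', '+', 2, 4, 5]
print(in_order(expr))    # [2, '+', 4, '*', 5]
print(post_order(expr))  # [2, 4, '+', 5, '*']
```

Note that in-order traversal reads the expression in its usual infix form, and post-order gives the order a stack machine would evaluate it.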
Binary Tree: Representation
Each node stores its data plus a left pointer and a right pointer.
[diagram: nodes A–F, each drawn as (left pointer | data | right pointer), linked into the tree A(B, C), B(D, E), C(F, –)]
Binary Tree: Special Cases
[diagram: three example trees over nodes A–G; the one with every level completely filled is labeled "Full Tree"]
Binary Tree: Some Numbers!
For a binary tree of height h:
– max # of leaves: 2^h
– max # of nodes: 2^(h+1) − 1
– min # of leaves: 1
– min # of nodes: h + 1
[diagram: example binary search trees]
Insertions happen only
at the leaves – easy!
Runtime: O(height), which can be O(n) for an unbalanced tree
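BST insert and find can be sketched as follows (a minimal sketch, not the course's code), including a demonstration of how insertion order affects the height:

```python
# BST insert and find: new keys always land at a leaf, so both cost O(height).

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(t, key):
    if t is None:
        return Node(key)          # hang the new node at a leaf position
    if key < t.key:
        t.left = insert(t.left, key)
    elif key > t.key:
        t.right = insert(t.right, key)
    return t                      # duplicates are ignored

def find(t, key):
    while t is not None and t.key != key:
        t = t.left if key < t.key else t.right
    return t

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

root = None
for k in [10, 5, 15, 2, 9, 20, 7, 17, 30]:
    root = insert(root, k)
print(find(root, 9) is not None)  # True
print(height(root))               # 3 for this insertion order

# Sorted-order insertion degenerates into a linked list:
chain = None
for k in range(1, 10):
    chain = insert(chain, k)
print(height(chain))              # 8: O(n) height, hence O(n^2) total build time
```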
BuildTree for BST
• Suppose keys 1, 2, 3, 4, 5, 6, 7, 8, 9 are
inserted into an initially empty BST.
Runtime depends on the order!
– in given (sorted) order: each insert walks the whole right spine, so Θ(n²)
– in reverse order: same cost, Θ(n²), along the left spine
• Find minimum: follow left pointers all the way down
• Find maximum: follow right pointers all the way down
[diagram: BST with root 10, children 5 and 15, and descendants 2, 9, 20, 7, 17, 30]
Deletion in BST
[diagram: BST with root 10, children 5 and 15, descendants 2, 9, 20, 7, 17, 30; Delete(17) simply removes the leaf 17]
Deletion – The One Child Case
[diagram: Delete(15) removes 15 and splices its only child 20 into its place]
Deletion – The Two Child Case
[diagram: Delete(5) in the BST with root 10; 5 has two children, 2 and 9]
Options:
• succ from right subtree: findMin(t.right)
• pred from left subtree: findMax(t.left)
Here findMin(t.right) = 7, so 7 replaces 5:
[diagram: resulting BST with root 10; 7 takes 5's place, keeping children 2 and 9]
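The three deletion cases can be sketched as follows (a minimal sketch with assumed helper names, using the successor option from the slide):

```python
# BST deletion: leaf, one child, or two children (replace with the successor,
# i.e. findMin of the right subtree).

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(t, key):
    if t is None: return Node(key)
    if key < t.key: t.left = insert(t.left, key)
    elif key > t.key: t.right = insert(t.right, key)
    return t

def find_min(t):
    while t.left is not None:
        t = t.left
    return t

def delete(t, key):
    if t is None:
        return None
    if key < t.key:
        t.left = delete(t.left, key)
    elif key > t.key:
        t.right = delete(t.right, key)
    elif t.left is None:              # leaf, or only a right child
        return t.right
    elif t.right is None:             # only a left child
        return t.left
    else:                             # two children: copy successor, delete it
        succ = find_min(t.right)
        t.key = succ.key
        t.right = delete(t.right, succ.key)
    return t

def in_order(t):
    return [] if t is None else in_order(t.left) + [t.key] + in_order(t.right)

root = None
for k in [10, 5, 20, 2, 9, 30, 7]:
    root = insert(root, k)
root = delete(root, 5)                # two-child case: 7 replaces 5
print(in_order(root))                 # [2, 7, 9, 10, 20, 30]
print(root.left.key)                  # 7
```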
AVL Trees
• Ordering property
– Same as for BST
• Balance property
– Left and right subtrees of every node have heights differing by at most 1
AVL trees or not?
[diagram: two candidate trees over the keys 1–12; check the balance condition at every node]
Proving Shallowness Bound
Let S(h) be the min # of nodes in an AVL tree of height h.
Then S(h) = S(h−1) + S(h−2) + 1, which grows like the Fibonacci numbers, so the height of an AVL tree with n nodes is O(log n).
[diagram: an AVL tree of height h = 4 with the minimum number of nodes (12)]
Testing the Balance Property
1. Track balance
2. Detect imbalance
3. Restore balance
To measure balance at every node, define: NULLs have height −1.
[diagram: BST with root 10 and descendants 5, 15, 2, 9, 20, 7, 17, 30]
An AVL Tree
[diagram: AVL tree with root 10 (height 3), children 5 and 20 (height 2), then 2, 9, 15, 30, and leaves 7 and 17; each node is annotated with its data, its height, and its child pointers]
AVL trees: find, insert
• AVL find:
– same as BST find.
• AVL insert:
– same as BST insert, except may need to
“fix” the AVL tree after inserting new value.
AVL tree insert
Let x be the node where an imbalance occurs.
Single Rotation:
1. Rotate between x and child
Single rotation in general
[diagram: a is unbalanced because an insertion into subtree X raised its height to h+1, while Y and Z have height h; the keys satisfy X < b < Y < a < Z. Rotating b above a makes b the root of the subtree with children X and a; a keeps Y and Z, and the order X < b < Y < a < Z is preserved]
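The single rotation can be sketched as follows (a minimal sketch; the function names are assumed, not the course's):

```python
# Single rotation: node a is unbalanced because its left child b's subtree grew;
# promoting b above a restores balance while keeping X < b < Y < a < Z.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_with_left_child(a):
    """Right rotation: promote a.left to the subtree root."""
    b = a.left
    a.left = b.right      # subtree Y moves across: still b < Y < a
    b.right = a
    return b              # b is the new subtree root

def rotate_with_right_child(a):
    """Left rotation: the mirror image."""
    b = a.right
    a.right = b.left
    b.left = a
    return b

# Inserting 3, 2, 1 in order makes 3 unbalanced; one rotation fixes it.
t = Node(3, left=Node(2, left=Node(1)))
t = rotate_with_left_child(t)
print(t.key, t.left.key, t.right.key)   # 2 1 3
```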
Double Rotation
1. Rotate between x’s child and grandchild
2. Rotate between x and x’s new child
Double rotation in general
[diagram: a is unbalanced because of an insertion into its child b's inner subtree, rooted at grandchild c; the keys satisfy W < b < X < c < Y < a < Z. After rotating c above b and then above a, c is the root of the subtree with children b (over W and X) and a (over Y and Z)]
Double rotation example
[diagram: an insertion unbalances the tree rooted at 15; step 1 rotates between the unbalanced node's child and grandchild]
Double rotation, step 2
[diagram: step 2 rotates between the unbalanced node and its new child, yielding the balanced tree 15 with children 6 (over 4(3, 5) and 8(–, 10)) and 17 (over 16)]
Imbalance at node x
Single Rotation
1. Rotate between x and child
Double Rotation
1. Rotate between x’s child and grandchild
2. Rotate between x and x’s new child
Single and Double Rotations:
[diagram: AVL tree with root 9, children 5 and 11, then 2, 7, 13, and leaves 0 and 3]
Inserting what integer values
would cause the tree to need a:
1. single rotation?
2. double rotation?
3. no rotation?
Insertion into AVL tree
1. Find spot for new key
2. Hang new node there with this key
3. Search back up the path for imbalance
4. If there is an imbalance:
case #1: Perform single rotation and exit
case #2: Perform double rotation and exit
[diagram: AVL tree annotated with node heights; unbalanced?]
Hard Insert (Bad Case #1)
Insert(33) into the AVL tree 10 with subtrees 5 (over 2(3), 9) and 15 (over 12, 20(17, 30)).
Unbalanced? Yes, at 15, once 33 hangs below 30. How to fix?
Single Rotation
[diagram: rotating between 15 and its child 20 gives 10 with subtrees 5 (over 2(3), 9) and 20 (over 15(12, 17), 30(–, 33)); the heights are restored]
Hard Insert (Bad Case #2)
Insert(18) into the same AVL tree: 10 with subtrees 5 (over 2(3), 9) and 15 (over 12, 20(17, 30)).
Unbalanced? Yes, again at 15.
How to fix?
Single Rotation (oops!)
[diagram: rotating between 15 and 20 moves 17 (with its new child 18) under 15, but the subtree is still unbalanced; the inserted key went into the inner subtree, so a single rotation does not help]
Double Rotation (Step #1)
[diagram: first rotate between 15's child 20 and grandchild 17; 17 comes up, with 20 (over 18 and 30) as its right child]
Double Rotation (Step #2)
[diagram: then rotate between 15 and its new child 17; the final balanced tree is 10 with subtrees 5 (over 2(3), 9) and 17 (over 15(12, –), 20(18, 30))]
Insert into an AVL tree: 5, 8, 9, 4, 2, 7, 3, 1
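The whole insertion procedure, including both rotation cases, can be sketched as follows (a minimal sketch, not the course's code; it runs the exercise sequence above):

```python
# AVL insertion: store a height in each node, walk back up after a BST
# insert, and apply a single or double rotation at the first unbalanced node.

class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 0

def h(t):                          # NULLs have height -1
    return -1 if t is None else t.height

def update(t):
    t.height = 1 + max(h(t.left), h(t.right))
    return t

def rotate_right(a):               # single rotation with the left child
    b = a.left
    a.left = b.right
    b.right = a
    update(a)
    return update(b)

def rotate_left(a):                # single rotation with the right child
    b = a.right
    a.right = b.left
    b.left = a
    update(a)
    return update(b)

def insert(t, key):
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
    elif key > t.key:
        t.right = insert(t.right, key)
    else:
        return t
    update(t)
    balance = h(t.left) - h(t.right)
    if balance > 1:                            # left-heavy
        if h(t.left.left) >= h(t.left.right):
            t = rotate_right(t)                # case #1: single rotation
        else:
            t.left = rotate_left(t.left)       # case #2: double rotation
            t = rotate_right(t)
    elif balance < -1:                         # right-heavy (mirror cases)
        if h(t.right.right) >= h(t.right.left):
            t = rotate_left(t)
        else:
            t.right = rotate_right(t.right)
            t = rotate_left(t)
    return t

root = None
for k in [5, 8, 9, 4, 2, 7, 3, 1]:             # the exercise sequence
    root = insert(root, k)
print(root.key, root.height)                   # 5 3
```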
CSE 326: Data Structures
Splay Trees
AVL Trees Revisited
• Balance condition:
Left and right subtrees of every node
have heights differing by at most 1
– Strong enough : Worst case depth is O(log n)
– Easy to maintain : one single or double rotation
AVL Trees Revisited
• What extra info did we maintain in each node? (its height)
Splay trees: on every Insert/Find, always rotate the accessed node to the root!
SAT/GRE analogy question:
AVL is to Splay trees as ___________ is to __________
Recall: Amortized Complexity
If a sequence of M operations takes O(M f(n)) time,
we say the amortized runtime is O(f(n)).
• Worst case time per operation can still be large, say O(n)
Find/Insert in Splay Trees
1. Find or insert a node k
2. Splay k to the root using:
zig-zag, zig-zig, or plain old zig rotation
Splaying node k to the root:
Need to be careful!
What’s bad about this process?
[diagrams: bringing k up with one single rotation at a time leaves the rest of the access path (p, q, r, s) as deep as it was, so a later access can be just as expensive]
Splay: Zig-Zag
[diagram: k is the inner grandchild of g (the path g–p–k zig-zags); one compound rotation brings k up with children p and g, and the subtrees W, X, Y, Z hang in order]
Just like an AVL double rotation! Which nodes improve depth?
Splay: Zig-Zig
[diagram: k is the outer grandchild of g (the path g–p–k goes the same direction twice); rotate p above g first, then k above p]
Splay: Zig
[diagram: when k is a child of the root p, a single rotation brings k to the root]
Why not drop zig-zig and just zig all the way?
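The three cases can be sketched with a compact recursive splay (a common formulation, not necessarily the course's; it may produce a slightly different tree shape than the bottom-up pictures, but it applies the same zig, zig-zig, and zig-zag cases):

```python
# Recursive splay: bring the node with `key` (or the last node touched on
# the search path) to the root using zig, zig-zig, and zig-zag rotations.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(x):
    y = x.left; x.left = y.right; y.right = x; return y

def rotate_left(x):
    y = x.right; x.right = y.left; y.left = x; return y

def splay(t, key):
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t                          # key not present: stop here
        if key < t.left.key:                  # zig-zig: two right rotations
            t.left.left = splay(t.left.left, key)
            t = rotate_right(t)
        elif key > t.left.key:                # zig-zag: left, then right
            t.left.right = splay(t.left.right, key)
            if t.left.right is not None:
                t.left = rotate_left(t.left)
        return rotate_right(t) if t.left is not None else t   # final zig
    else:
        if t.right is None:
            return t
        if key > t.right.key:                 # zig-zig (mirror)
            t.right.right = splay(t.right.right, key)
            t = rotate_left(t)
        elif key < t.right.key:               # zig-zag (mirror)
            t.right.left = splay(t.right.left, key)
            if t.right.left is not None:
                t.right = rotate_right(t.right)
        return rotate_left(t) if t.right is not None else t

# A degenerate right chain 1-2-3-4-5-6, then Find(6):
t = Node(1); cur = t
for k in range(2, 7):
    cur.right = Node(k); cur = cur.right
t = splay(t, 6)
print(t.key)   # 6: the accessed node is now the root
```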
Splaying Example: Find(6)
[diagram: a chain 1–2–3–4–5–6; finding 6 starts with a zig-zig rotation at the bottom of the path]
Still Splaying 6
[diagram: a second zig-zig moves 6 further up]
Finally…
[diagram: a last zig puts 6 at the root; the depth of the nodes on the access path has roughly halved]
Another Splay: Find(4)
[diagram: finding 4 in the resulting tree splays 4 toward the root]
Example Splayed Out
[diagram: 4 is now the root; recently accessed nodes stay near the top]
But Wait…
What happened here? Deletion:
• find(k) splays k to the root; delete the root, leaving subtrees L (keys < k) and R (keys > k). Now what?
Join(L, R):
given two trees such that (stuff in L) < (stuff in R), merge them:
• splay the max of L to its root; it then has no right child, so hang R there
[diagram: example with keys 1, 2, 4, 6, 7, 9: find(4) splays 4 up; removing it leaves L and R; splaying L's max to its root and attaching R completes the delete]
Splay Tree Summary
• All operations are in amortized O(log n) time
[diagram: a degenerate nine-node tree A–I; Splay E brings E to the root and roughly halves the depth of the other nodes on the access path]
CSE 326: Data Structures
B-Trees
The Memory Hierarchy
Cache (SRAM): 8 KB – 4 MB, 2–10 ns
Main Memory (DRAM): up to 10 GB, 40–100 ns
Disk: many GB, a few milliseconds (5–10 million ns)
Trees so far
• BST
• AVL
• Splay
AVL trees
Suppose we have 100 million items (100,000,000):
Runtime of find: log₂ 100,000,000 ≈ 27 comparisons; if each node lives on disk, that can mean up to 27 disk accesses!
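A quick back-of-the-envelope sketch of why a large branching factor M helps when each level costs roughly one disk access (the particular values of M here are illustrative assumptions):

```python
# Depth of an M-ary search tree over n items is about log_M(n);
# with one disk access per level, bigger M means far fewer accesses.
import math

n = 100_000_000
for m in (2, 128, 256):
    depth = math.ceil(math.log(n, m))
    print(f"M = {m:3d}: about {depth} levels (disk accesses)")
# M =   2: about 27 levels
# M = 128: about 4 levels
# M = 256: about 4 levels
```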
Solution: B-Trees
• specialized M-ary search trees
• each node sized to fit in one page (disk block) of memory
B-Trees
What makes them disk-friendly?
[diagram: B-tree with signpost keys in the internal nodes and the actual items in sorted leaves: 1 2 | 3 5 6 9 | 10 11 12 | 15 17 | 20 25 26 | 30 32 33 36 | 40 42 | 50 60 70]
Data objects, which I’ll ignore in slides, hang off the leaves.
Note: All leaves at the same depth!
B-Tree Properties ‡
‡
These are technically B+-Trees
Example, Again
B-Tree with M = 4 and L = 4
[diagram: root with signposts 10, 40; second level 3 | 15 20 30 | 50; leaves 1 2 | 3 5 6 9 | 10 11 12 | 15 17 | 20 25 26 | 30 32 33 36 | 40 42 | 50 60 70]
B-Tree Insertion (M = 3, L = 2)
Insert(3), Insert(14): the empty B-Tree becomes a single leaf [3 14].
Now, Insert(1)? The leaf [1 3 14] overflows: split it into [1 3] and [14] and create a new root with signpost 14.
Insert(59): joins the leaf [14], giving [14 59].
Insert(26): the leaf [14 26 59] overflows: split it and add a new child, giving leaves [1 3], [14 26], [59] under signposts 14, 59.
Propagating Splits
Insert(5): split the leaf [1 3 5], but no space in the parent! Split the parent too and create a new root: signpost 14 at the root, children with signposts 5 and 59, leaves [1 3], [5], [14 26], [59].
Insert(89), then Insert(79): the leaf [59 79 89] overflows and splits, adding signpost 89; the leaves become [59] and [79 89].
Deletion (M = 3, L = 2)
1. Delete item from leaf
2. Update keys of ancestors if necessary
Delete(59): remove 59 from its leaf; the signpost 59 becomes 79.
Delete(5): the leaf becomes empty, so adopt from a sibling: [1 3] gives up its 3, and the signpost becomes 3.
Does Adoption Always Work?
• What if the sibling doesn’t have enough for
you to borrow from?
Delete(3): the leaf becomes empty and its sibling [1] has no surplus.
So, delete the leaf.
But now an internal node has too few subtrees!
Adopt a neighbor
[diagram: the internal node with too few subtrees adopts a subtree from its neighboring internal node, and the signposts are updated]
Delete(1) (adopt a sibling)
[diagram: the empty leaf adopts from its sibling [14 26]; the leaves become [14] and [26], with an updated signpost]
Pulling out the Root
A leaf has too few keys! And no sibling with surplus!
Delete the leaf and merge; the root is then left with a single child, so pull out the root and let that child become the new root.
[diagram: the tree shrinks by one level, ending with leaves [14] and [79 89]]
Deletion Algorithm
1. Remove the key from its leaf
2. If the leaf ends up with too few items: adopt from a neighbor; if no neighbor has surplus, delete the leaf and merge
3. If an internal node ends up with too few children: adopt or merge, as for leaves; if the root ends up with a single child, pull out the root
Nearest Neighbor Search
[diagram: points a–i in the x–y plane with a marked query point]
Nearest neighbor is e.
k-d Tree Construction
• If there is just one point, form a leaf with that point.
• Otherwise, divide the points in half by a line
perpendicular to one of the axes.
• Recursively construct k-d trees for the two sets of
points.
• Division strategies
– divide points perpendicular to the axis with widest spread.
– divide in a round-robin fashion (book does it this way)
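The construction above can be sketched in Python (a minimal sketch using the round-robin strategy; the dict-based node layout is an assumption, not the course's representation):

```python
# k-d tree construction, round-robin splitting: depth mod k picks the axis,
# the median coordinate splits the points, and each leaf holds one point.

def build_kd(points, depth=0):
    """points: list of (x, y) tuples; returns a nested-dict node or None."""
    if not points:
        return None
    if len(points) == 1:
        return {"point": points[0]}            # leaf: exactly one point
    axis = depth % 2                           # round-robin: x, then y, ...
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "axis": axis,
        "value": pts[mid][axis],               # split coordinate
        "left": build_kd(pts[:mid], depth + 1),
        "right": build_kd(pts[mid:], depth + 1),
    }

tree = build_kd([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree["axis"], tree["value"])   # 0 7: the first split is on x at 7
```

The widest-spread strategy would differ only in how `axis` is chosen: compare the coordinate ranges in each dimension and split perpendicular to the larger one.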
k-d Tree Construction
[diagram: the points a–i in the plane; divide perpendicular to the widest spread]
[diagram: the full construction: splits s1–s8 alternate between x and y; each internal node of the k-d tree tests one coordinate against a split value, each leaf holds one point (a–i), and each leaf corresponds to a rectangular k-d tree cell]
2-d Tree Decomposition
[diagram: the plane recursively decomposed into rectangular cells]
k-d Tree Splitting
sorted points in each dimension:
    1  2  3  4  5  6  7  8  9
x:  a  d  g  b  e  i  c  h  f
y:  a  c  b  d  f  e  h  g  i
• max spread is the max of (f_x − a_x) and (i_y − a_y).
• In the selected dimension, the middle point in the list splits the data.
[diagram: the points a–i plotted in the plane]
Rectangular Range Query
[diagram: the k-d tree cells for points a–i with a query rectangle; only subtrees whose cells intersect the rectangle are visited]
Rectangular Range Query
print_range(xlow, xhigh, ylow, yhigh : integer, root : node pointer) {
Case {
root = null: return;
root.left = null:   // leaf: report the point if it lies in the rectangle
if xlow < root.point.x and root.point.x < xhigh
and ylow < root.point.y and root.point.y < yhigh
then print(root);
else                // internal: recurse into each side the rectangle overlaps
if (root.axis = “x” and xlow < root.value ) or
(root.axis = “y” and ylow < root.value ) then
print_range(xlow, xhigh, ylow, yhigh, root.left);
if (root.axis = “x” and xhigh > root.value ) or
(root.axis = “y” and yhigh > root.value ) then
print_range(xlow, xhigh, ylow, yhigh, root.right);
}}
k-d Tree Nearest Neighbor
Search
• Search recursively to find the point in the
same cell as the query.
• On the return search each subtree where
a closer point than the one you already
know about might be found.
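This search can be sketched as follows (a minimal sketch with assumed names; this variant stores one point per node rather than points only at leaves, a common simplification):

```python
# k-d nearest neighbor: descend to the query's cell first, then on the way
# back up visit a subtree only if it could contain a closer point.

def build(points, depth=0):
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"point": pts[mid], "axis": axis,
            "left": build(pts[:mid], depth + 1),
            "right": build(pts[mid + 1:], depth + 1)}

def dist2(p, q):                   # squared Euclidean distance
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def nearest(node, query, best=None):
    if node is None:
        return best
    if best is None or dist2(node["point"], query) < dist2(best, query):
        best = node["point"]
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)      # search the query's own cell first
    if diff ** 2 < dist2(best, query):     # could the far side hold a closer point?
        best = nearest(far, query, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(nearest(tree, (9, 2)))   # (8, 1)
```

The pruning test compares the squared distance to the splitting plane against the best squared distance found so far; subtrees that fail it are skipped entirely, which is where the average O(log n) behavior comes from.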
k-d Tree NNS
[diagram: the search descends to the cell containing the query point, then on the way back up visits only those subtrees whose cells could contain a point closer than the best found so far]
Notes on k-d NNS
• Has been shown to run in O(log n)
average time per search in a reasonable
model.
• Storage for the k-d tree is O(n).
• Preprocessing time is O(n log n) assuming
d is a constant.
Worst-Case for Nearest Neighbor Search
[diagram: a worst-case configuration of points around the query point]
• Half of the points visited for a query
• Worst case O(n)
• But: on average (and in practice) nearest neighbor queries are O(log n)
Quad Trees
• Space Partitioning
[diagram: the plane split recursively into quadrants; the quad tree has one child per quadrant, with the points a–g in the leaves]
A Bad Case
[diagram: two points a and b very close together force many levels of subdivision before they separate]
Notes on Quad Trees
• Number of nodes is O(n(1 + log(Δ/n)))
where n is the number of points and Δ is
the ratio of the width (or height) of the key
space and the smallest distance between
two points
• Height of the tree is O(log n + log Δ)
K-D vs Quad
• k-D Trees
– Density balanced trees
– Height of the tree is O(log n) with batch insertion
– Good choice for high dimension
– Supports insert, find, nearest neighbor, range queries
• Quad Trees
– Space partitioning tree
– May not be balanced
– Not a good choice for high dimension
– Supports insert, delete, find, nearest neighbor, range queries
Geometric Data Structures
• Geometric data structures are common.
• The k-d tree is one of the simplest.
– Nearest neighbor search
– Range queries
• Other data structures used for
– 3-d graphics models
– Physical simulations