index4

Uploaded by

Mx A
Dynamic Multi-Level Index (B+ trees)

• Because of the insertion/deletion problem, most multi-level indexes use B+-trees:
  - In a B+-tree, each node corresponds to a disk block
• Leave space in each tree node
  - Each node is kept between half-full and completely full
• Allows efficient insertion and deletion of search values

Search Trees

Binary Search Tree
[Figure: a binary search tree drawn over four levels. Labels mark the root (level 1), the internal nodes, the leaf nodes, two subtrees, and one parent/child pair: node 3 is the parent of node 1.]

In this figure, each node has 2 pointers (left child and right child), so fan-out = 2 and tree order = 2.

Search Tree - Example
[Figure: a search tree in which each node has 3 pointers.]
Fan-out = 3, tree order = 3

Tree-Based Indexing

• Tree-based index: a hierarchical search data structure that directs searches to the correct page
  - Data entries are arranged in sorted order by search key
• Each node in the tree represents a physical page, so:
  - Retrieving a node involves a disk I/O
• The leaf level is the lowest level of the tree, which contains the data entries k* of the index
• Tree-based indexing allows us to efficiently locate all data entries with search key values in a desired range

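Tree-Based Indexing Cont.

[Figure: Tree-Structure Index. The search starts at the root, which holds the keys 12 and 78 and routes to three subtrees: age < 12, 12 <= age < 78, and age >= 78. Lower-level nodes hold the keys (3, 9), (19, 56), (86, 94), and (33, 44). The labels A and B mark two of the nodes.]

Leaf level (each record shown as name, age, and a third numeric field):
L1: Daniels, 22, 6003; Ashby, 25, 3000; Bristow, 29, 2007
L2: Basu, 33, 4003; Jones, 40, 6003
L3: Smith, 44, 3000; Tracy, 44, 5004; Cass, 50, 5004
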
Tree-Based Indexing Cont.

• So how does it work?
  - All searches begin at the topmost node (the root)
  - The contents of non-leaf (internal) nodes direct searches to the correct leaf page
  - Non-leaf pages contain node pointers separated by search key values
  - The node pointer to the left of a key value k points to a subtree that contains only data entries less than k; the pointer to the right points to a subtree that contains only data entries greater than or equal to k
• The number of disk I/Os = the length of a path from the root to a leaf + the number of leaf pages with qualifying data entries

Internal vs. Leaf Nodes
• Internal nodes
  - Guide the search to leaf nodes: their pointers are tree pointers
• Leaf nodes
  - Have an entry for every value of the search field
  - Their pointers are data pointers
  - Data pointers are stored only at leaf nodes

The structure of leaf nodes differs from the structure of internal nodes.

B+ Tree

• B+ tree: an index structure that ensures that all paths from the root to a leaf are of the same length
  - The structure is always balanced in height
• It is very fast to find the correct leaf page (faster than binary search in a sorted file) because:
  - Each non-leaf node can hold a very large number of node pointers (fan-out)
  - The height of the tree is rarely more than 3 or 4
• The height of a balanced tree is the length of a path from root to leaf
• The root is typically in the buffer pool because it is frequently accessed

B+ Tree Cont.

• If every non-leaf node has n children (on average), then a tree of height h has n^h leaf pages
  - In practice, n is at least 100, which means a tree of height 4 contains 100 million leaf pages!
• It is possible to search a file with 100 million leaf pages and find the required page using only 4 I/Os
• In contrast, binary search of the same file would take log2(100,000,000) (over 25) I/Os

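The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the fan-out of 100 and the height of 4 are the slide's own figures, and the variable names are illustrative.

```python
import math

fan_out = 100   # n: average number of children per non-leaf node
height = 4      # h: number of levels on a root-to-leaf path

leaf_pages = fan_out ** height                         # n^h leaf pages
binary_search_ios = math.ceil(math.log2(leaf_pages))   # halving steps needed

print(leaf_pages)          # 100000000
print(binary_search_ios)   # 27
```

The tree reaches a leaf page in 4 I/Os, while binary search over the same number of pages needs about 27.
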
Example B+ Tree
Search for Holden

[Figure: a B+ tree over car makes. Root: Suzuki. Internal nodes: Mazda, Toyota. Leaf entries: BMW*, Holden*, Mazda*, Suzuki*, Toyota*, Volvo*, where each * is a data pointer into the data blocks that hold the records.]

Example B+ Tree
• Search begins at the root, and key comparisons direct the search to a leaf
• Example: search for 5, 14

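B+ Tree Index: Internal Node
[Figure: an internal node and the key ranges of the subtrees under its pointers: X < K_1 under the first pointer, K_{i-1} ≤ X < K_i under pointer i, and K_{q-1} ≤ X under the last pointer.]
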
B+ Tree Index: Internal Node
An internal node is of the form:
[p_1, k_1, ..., k_{i-1}, p_i, k_i, ..., k_{q-1}, p_q]
where p_i is a pointer and k_i is a field value (key)

1. Keys
• k_1 < k_2 < ... < k_i < ... < k_{q-1}
• Some search values from the leaf nodes are repeated in the internal nodes to guide the search
• For all search values X in the subtree pointed at by p_i, we have:
  - k_{i-1} ≤ X < k_i (only X < k_1 applies for p_1, and only X ≥ k_{q-1} for p_q)

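The key rule for internal nodes amounts to a binary search over the node's keys. The sketch below is one possible way to pick the pointer to follow; the function name `child_index` is hypothetical, and the keys 12 and 78 are borrowed from the age example earlier in the slides.

```python
from bisect import bisect_right

def child_index(keys, x):
    """Return the 0-based position of the tree pointer to follow for x.

    keys is the sorted key list [k_1, ..., k_{q-1}] of one internal node.
    Position 0 covers x < k_1, position i covers k_i <= x < k_{i+1},
    and the last position covers x >= k_{q-1}.
    """
    return bisect_right(keys, x)

keys = [12, 78]                 # root of the age index
print(child_index(keys, 5))     # 0  (age < 12)
print(child_index(keys, 12))    # 1  (12 <= age < 78)
print(child_index(keys, 44))    # 1
print(child_index(keys, 78))    # 2  (age >= 78)
```

`bisect_right` is used rather than `bisect_left` so that a search value equal to a key goes to the right of that key, matching the "greater than or equal to" side of the rule.
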
B+ Tree Index: Internal Node
An internal node is of the form:
[p_1, k_1, ..., k_{i-1}, p_i, k_i, ..., k_{q-1}, p_q]
where p_i is a pointer and k_i is a field value (key)

2. Pointers
• Pointers are tree pointers to blocks that are tree nodes
• For a tree of order q:
  - Each internal node has at most q tree pointers
  - Each internal node has at least ⌈q/2⌉ tree pointers, except the root, which needs only 2

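As a quick illustration of these occupancy limits, the sketch below computes the allowed range of tree pointers for a given order. It assumes the lower bound q/2 is rounded up, as is common in textbook definitions; the function name `pointer_bounds` is hypothetical.

```python
import math

def pointer_bounds(q, is_root=False):
    """Return (minimum, maximum) tree pointers for an internal node of order q."""
    # Assumption: the lower bound q/2 is rounded up to a whole pointer.
    minimum = 2 if is_root else math.ceil(q / 2)
    return minimum, q

print(pointer_bounds(3))                  # (2, 3)
print(pointer_bounds(5))                  # (3, 5)
print(pointer_bounds(100, is_root=True))  # (2, 100)
```
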
B+ Tree Index: Leaf Node
A leaf node is of the form:
[<k_1, pr_1>, ..., <k_{q-1}, pr_{q-1}>, p_next]

1. Pointers pr_i
• Each pr_i is a data pointer to the record(s) for k_i:
  - If k_i is a key field (i.e., unique), pr_i points to a data block that contains the record
  - If k_i is a non-key field (i.e., repeated), pr_i points to a block containing pointers to the data file records (as in a secondary index)

B+ Tree Index: Leaf Node
A leaf node is of the form:
[<k_1, pr_1>, ..., <k_{q-1}, pr_{q-1}>, p_next]

2. Pointer p_next
• p_next points to the next leaf node
• Leaf nodes are linked to provide ordered access on the indexing field
  - Traverse the leaf nodes as a linked list
  - Very useful for range searches (e.g., cars between BMW and Mazda)

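A range search over the linked leaves might look like the sketch below. The `Leaf` class and `range_scan` function are hypothetical names, data pointers are left out, and the grouping of the car makes into leaves is an assumption made for illustration. A real search would first descend from the root to the leaf that contains the lower bound; here the scan simply starts at the first leaf.

```python
class Leaf:
    def __init__(self, keys, next_leaf=None):
        self.keys = keys        # sorted search-key values held by this leaf
        self.next = next_leaf   # p_next: the following leaf, or None

def range_scan(start_leaf, low, high):
    """Collect all keys in [low, high] by following p_next from start_leaf."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result   # keys are sorted, so the scan can stop here
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result

# Assumed leaf grouping for the car-make example.
l4 = Leaf(["Toyota", "Volvo"])
l3 = Leaf(["Suzuki"], l4)
l2 = Leaf(["Mazda"], l3)
l1 = Leaf(["BMW", "Holden"], l2)

print(range_scan(l1, "BMW", "Mazda"))   # ['BMW', 'Holden', 'Mazda']
```
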
Example B+ Tree
Search for all car makes ≥ Mazda (alphabetically)

[Figure: the car-make B+ tree. Root: Suzuki. Internal nodes: Mazda, Toyota. Leaf entries: BMW*, Holden*, Mazda*, Suzuki*, Toyota*, Volvo*, with data pointers (*) into the data blocks.]

Dynamic Multi-Level Index (B+ trees)
• Insertion:
  - into a node that is not full is quite efficient
  - if a node is full, it is split into two nodes
  - splitting may propagate to other tree levels
• Deletion:
  - efficient if a node remains more than half full
  - otherwise, it is merged with neighboring nodes

Example B+ Tree
Insert Nissan

[Figure, before: root Suzuki; internal nodes Mazda, Toyota; leaf entries shown: BMW*, Holden*, Mazda*. The data blocks now also contain a Nissan record.]
[Figure, after: Nissan* is added next to Mazda* in its leaf (leaf entries shown: BMW*, Holden*, Mazda*, Nissan*).]

Example B+ Tree
Insert Skoda

[Figure, step 1: root Suzuki; internal nodes Mazda, Toyota; leaf entries shown: BMW*, Holden*, Mazda*, Nissan*. The data blocks now also contain a Skoda record.]
[Figure, step 2: node overflow. Adding Skoda* gives one leaf the entries Mazda*, Nissan*, Skoda*.]
[Figure, step 3: the overflowing leaf must be split, which raises the question of what to insert into its parent.]

Split leaf node L:
1. Redistribute entries: if L is of order p, then (p+1)/2 entries stay in L and the rest move to Lnew (e.g., (3+1)/2 = 2)
2. Copy up the first key in Lnew into L's parent (e.g., Skoda)
3. Insert a pointer to Lnew into L's parent

[Figure, step 4: after the split, L holds Mazda* and Nissan*, Lnew holds Skoda*, and the internal nodes are Mazda, Skoda, Toyota under the root Suzuki.]

Example B+ Tree
Insert Audi

[Figure, step 1: root Suzuki; internal nodes Mazda, Skoda, Toyota; leaf entries shown: BMW*, Holden*, Mazda*, Nissan*, Skoda*. The data blocks now also contain an Audi record.]
[Figure, step 2: Audi* is added at the leaf level (entries shown: Audi*, BMW*, Holden*, Mazda*, Nissan*, Skoda*), and Holden is inserted into the parent.]
[Figure, step 3: node overflow in the parent, which now holds Holden, Mazda, Skoda.]
[Figure, step 4: the internal node is split. Root: Mazda, Suzuki. Internal nodes: Holden, Skoda, Toyota. Leaf entries shown: Audi*, BMW*, Holden*, Mazda*, Nissan*, Skoda*.]

Split internal node L:
1. Redistribute entries evenly
2. Push up the middle key (e.g., Mazda)

"Mazda" is pushed up and appears only once in the index (compare to the leaf node split?)

Inserting a new entry into a B+ tree
• Find the correct leaf L and put the new entry onto L
  - If L has enough space, done
  - Else, L must be split (into L and a new node Lnew):
    - redistribute entries evenly: if L is of order p, then (p+1)/2 entries stay in L and the rest move to Lnew (or some other similar rule)
    - copy up the first key in Lnew into L's parent
    - insert a pointer to Lnew into L's parent
• This can happen recursively
  - To split an internal node:
    - redistribute entries evenly
    - push up the middle key (vs. copy up)

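The insertion procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a full implementation: the class and method names are hypothetical, data pointers and deletion are omitted, a node of order p is taken to overflow when it reaches p keys, and the starting tree uses an assumed grouping of the car makes into leaves.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, leaf, keys=None, children=None):
        self.leaf = leaf                  # True for leaf nodes
        self.keys = keys or []            # sorted search-key values
        self.children = children or []    # tree pointers (internal nodes only)
        self.next = None                  # p_next (leaf nodes only)

class BPlusTree:
    def __init__(self, order, root=None):
        self.order = order                # p: a node overflows at p keys
        self.root = root or Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                         # root split: the tree grows one level
            separator, right = split
            self.root = Node(False, [separator], [self.root, right])

    def _insert(self, node, key):
        """Insert key under node; return (separator, new_node) if node split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) < self.order:
                return None               # L had enough space: done
            keep = (self.order + 1) // 2  # (p+1)/2 entries stay in L
            new = Node(True, node.keys[keep:])
            node.keys = node.keys[:keep]
            new.next, node.next = node.next, new
            return new.keys[0], new       # copy up: key also stays in Lnew
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        separator, right = split
        node.keys.insert(i, separator)
        node.children.insert(i + 1, right)
        if len(node.keys) < self.order:
            return None
        mid = len(node.keys) // 2
        up = node.keys[mid]               # push up: key leaves this level
        new = Node(False, node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return up, new

# Starting tree of order 3, with an assumed grouping of entries into leaves.
leaves = [Node(True, ks) for ks in
          (["BMW", "Holden"], ["Mazda"], ["Suzuki"], ["Toyota", "Volvo"])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node(False, ["Suzuki"], [Node(False, ["Mazda"], leaves[:2]),
                                Node(False, ["Toyota"], leaves[2:])])
tree = BPlusTree(order=3, root=root)
for make in ("Nissan", "Skoda", "Audi"):
    tree.insert(make)

print(tree.root.keys)                         # ['Mazda', 'Suzuki']
print([c.keys for c in tree.root.children])   # [['Holden'], ['Skoda'], ['Toyota']]
```

Inserting Nissan, Skoda, and Audi in that order reproduces the outcome of the worked example: Skoda and Holden are copied up from leaf splits, while Mazda is pushed up from the internal split and so appears only once among the index keys.
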
Example B+ Tree
Insert "8"

[Figure: the B+ tree before and during the insertion of 8, annotated with three steps.]
Step 1: leaf node split. Alternative rule to split leaf node L: (p-1)/2 entries stay in L and the rest move to Lnew (e.g., (5-1)/2 = 2)
Step 2: copy up "5" into the parent
Step 3: push up "17" into the parent

Example B+ Tree After Inserting 8

Notice that the root split led to an increase in height!

Is that a problem?

Search in a B+ tree index
• Searching for a record using a B+ tree:
  1) Read one block at each level in the index
     - Number of accessed index blocks = tree height
  2) Read the data block containing the searched record
• Total number of accessed blocks = tree height + 1

Try to minimize tree height!

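The block count can be traced with a small sketch that walks from the root to a leaf and counts one read per level, plus one read for the data block. The dictionary layout, the function name `search`, and the grouping of car makes into leaves are assumptions made for illustration.

```python
from bisect import bisect_right

def search(root, key):
    """Return (found, blocks_read) for an equality search.

    Internal nodes are dicts {"keys": [...], "children": [...]};
    leaf nodes are dicts {"keys": [...]} (data pointers omitted).
    """
    node, blocks_read = root, 1          # reading the root is one block access
    while "children" in node:
        node = node["children"][bisect_right(node["keys"], key)]
        blocks_read += 1                 # one index block per level
    found = key in node["keys"]
    if found:
        blocks_read += 1                 # plus the data block with the record
    return found, blocks_read

# Car-make tree of height 3, with an assumed grouping of entries into leaves.
tree = {"keys": ["Suzuki"], "children": [
    {"keys": ["Mazda"], "children": [{"keys": ["BMW", "Holden"]},
                                     {"keys": ["Mazda"]}]},
    {"keys": ["Toyota"], "children": [{"keys": ["Suzuki"]},
                                      {"keys": ["Toyota", "Volvo"]}]},
]}

print(search(tree, "Holden"))   # (True, 4)
print(search(tree, "Kia"))      # (False, 3)
```

For a key that is present, the count is the tree height (3) plus 1; an unsuccessful search stops after the index blocks.
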
Impact of tree order
[Figure: two trees compared, one with tree order = 3 and one with tree order = 5.]

Search in a B+ tree index
• Tree height depends on:
  1. The number of entries in leaf nodes (e), and
  2. Tree order (i.e., fan-out (fo))
• Tree height is proportional to log_fo(e)
• Fan-out = index blocking factor = block size / entry size
  Try to minimize entry size!
  - Index entry = <key value, pointer>
  - The smaller the key value, the more efficient (i.e., shorter) the B+ tree

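The effect of entry size on height can be estimated with a short calculation. The sketch below is a rough estimate that assumes completely full nodes; the function name `estimated_height` and the block, key, and pointer sizes are hypothetical values chosen only for illustration.

```python
def estimated_height(num_entries, block_size, key_size, pointer_size):
    """Return (fan_out, height) for a tree whose nodes are completely full."""
    entry_size = key_size + pointer_size   # index entry = <key value, pointer>
    fan_out = block_size // entry_size     # entries that fit in one block
    height, capacity = 1, fan_out
    while capacity < num_entries:          # smallest h with fan_out^h >= e
        capacity *= fan_out
        height += 1
    return fan_out, height

# One million entries in 4096-byte blocks with 8-byte pointers (assumed sizes).
print(estimated_height(1_000_000, 4096, key_size=8, pointer_size=8))    # (256, 3)
print(estimated_height(1_000_000, 4096, key_size=56, pointer_size=8))   # (64, 4)
```

With these assumed sizes, growing the key from 8 to 56 bytes cuts the fan-out from 256 to 64 and adds one level to the tree.
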
Dynamic Multi-Level Index (B+ trees)
• Insertion:
  - into a node that is not full is quite efficient
  - if a node is full, it is split into two nodes
  - splitting may propagate to other tree levels
• Deletion:
  - efficient if a node remains more than half full
  - otherwise, it is merged with neighboring nodes
