0% found this document useful (0 votes)
10 views

Unit V

Data Structures

Uploaded by

Vaishnavi Kaware
Copyright
© © All Rights Reserved
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views

Unit V

Data Structures

Uploaded by

Vaishnavi Kaware
Copyright
© © All Rights Reserved
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 55

Data Indexing

Herbert A. Evans
Unit V : Indexing and Multiway
Trees
Indexing and Multiway Trees- Indexing, indexing
techniques-primary, secondary, dense, sparse, Multiway
search trees, B-Tree- insertion, deletion, B+Tree -
insertion, deletion, use of B+ tree inIndexing, Trie Tree.
Purposes of Data Indexing

• What is Data Indexing?

• Why it is important?
Concept of File Systems

• Stores and organizes data into computer


files.

• Makes it easier to find and access data at


any given time.
Database Management Systems

• The file system that manages a database.

• Database - is an organized collection of


logically related data.
How DBMS Accesses Data?
• The operations read, modify, update, and
delete are used to access data from
database.

• DBMS must first transfer the data


temporarily to a buffer in main memory.

• Data is then transferred between disk and


main memory into units called blocks.
Purpose of Data Indexing

• It is a data structure that is added to a file


to provide faster access to the data.

• It reduces the number of blocks that the


DBMS has to check.
Properties of Data Index

• It contains a search key and a pointer.

• Search key - an attribute or set of attributes that is


used to look up the records in a file.

• Pointer - contains the address of where the data


is stored in memory.

• It can be compared to the card catalog system


used in public libraries of the past.
Two Types of Indices

• Ordered index (Primary index or clustering


index) – which is used to access data
sorted by order of values.

• Hash index (secondary index or non-


clustering index ) - used to access data
that is distributed uniformly across a range
of buckets.
Ordered Index
Hash Index
Choosing Indexing Technique
• Five Factors involved when choosing the
indexing technique:
• access type
• access time
• insertion time
• deletion time
• space overhead
Indexing Definitions
• Access type- is the type of access being used.
• Access time - time required to locate the data.
• Insertion time - time required to insert the new
data.
• Deletion time - time required to delete the data.
• Space overhead - the additional space occupied
by the added data structure.
Types of Ordered Indices

• Dense index - an index record appears for


every search-key value in the file.

• Sparse index - an index record that


appears for only some of the values in the
file.
Dense Index
Sparse Index
Choosing Multi-Level Index
• In some cases an index may be too large for efficient
processing.

• In that case use multi-level indexing.

• In multi-level indexing, the primary index is treated as a


sequence file and sparse index is created on it.

• The outer index is a sparse index of the primary index


whereas the inner index is the primary index.
Multi-Level Index
M -Way Tree
M- Way search trees are generalized versions of binary
search trees. The goal of m-way search tree is to minimize
the accesses while retrieving a key from a file.

However an m-way search tree of height h calls for O(h)


number of accesses for an insert/delete/retrieval operation.
B-Tree
A B-tree of order m is an m-way tree (i.e., a tree
where each node may have up to m children) in
which:

Every node has at most m children.


Every non-leaf node (except root) has at least ⌈m/2⌉ child
nodes. The root has at least two children if it is not a leaf node.
A non-leaf node with k children contains k − 1 keys.
B-Tree(Cont.)
All leaves appear at the same level.

It generalizes Binary Search Tree.


An example B-Tree
26 A B-tree of order 5
containing 26 items
6 12

42 51 62
1 2 4 7 8 13 15 18 25

27 29 45 46 48 53 55 60 64 70 90

Note that all the leaves are at the same level


B-Trees
Construction of a B-Tree
• Attempt to insert the new key into a leaf
• If this would result in that leaf becoming too big, split
the leaf into two, promoting the middle key to the leaf’s
parent
• If this would result in the parent becoming too big,
split the parent into two, promoting the middle key
• This strategy might have to be repeated all the way to
the top
• If necessary, the root is split in two and the middle key
is promoted to a new root, making the tree one level
higher
Example of B-Tree
 Construct a B-tree of order 3 where keys arrive in the
following order:

20 10 50 30 40 70 60 80 90
Solution
20 10 50 30 40 70 60 80 90

20
• 1

• 2.
10,20

• 3.

10,20
20 ,50 20

10 50
Solution

• 4.
20

10 30,50

• 5. 20 20,40

10 30,40
40 ,50 10 30 50
Solution

20,40
• 6.

10 30 50,70

• 7.
20,40 40
20, 40, 60

10 30 60 70
50,60, 10 30 50 70
Solution

40

20 60

10 30 50 70
Solution

• 8.
40

20 60

10 30 50 70,80
Solution

• 9.
40

20 60

10 30 50 70, 80
80, 90
Solution

40

20 60,80

10 30 50 70 90
Constructing a B-tree
• Suppose we start with an empty B-tree and keys
arrive in the following order:1 12 8 2 25 6 14
28 17 7 52 16 48 68 3 26 29 53 55 45
• We want to construct a B-tree of order 5
• The first four items go into the root:

1 2 8 12

• To put the fifth item in the root would violate


condition 5
• Therefore, when 25 arrives, pick the middle key to
make a new root
B-Trees
Constructing a B-tree (contd.)

1 2 12 25

6, 14, 28 get added to the leaf nodes:


8

1 2 6 12 14 25 28

B-Trees
Constructing a B-tree (contd.)
Adding 17 to the right leaf node would over-fill it, so we take the
middle key, promote it (to the root) and split the leaf
8 17

1 2 6 12 14 25 28

7, 52, 16, 48 get added to the leaf nodes


8 17

1 2 6 7 12 14 16 25 28 48 52

B-Trees
Constructing a B-tree (contd.)
Adding 68 causes us to split the right most leaf, promoting 48 to the
root, and adding 3 causes us to split the left most leaf, promoting 3
to the root; 26, 29, 53, 55 then go into the leaves
3 8 17 48

1 2 6 7 12 14 16 25 26 28 29 52 53 55 68

Adding 45 causes a split of 25 26 28 29

and promoting 28 to the root then causes the root to split

B-Trees
Constructing a B-tree (contd.)

17

3 8 28 48

1 2 6 7 12 14 16 25 26 29 45 52 53 55 68

B-Trees
Inserting into a B-Tree
• Attempt to insert the new key into a leaf
• If this would result in that leaf becoming too big, split the
leaf into two, promoting the middle key to the leaf’s
parent
• If this would result in the parent becoming too big, split
the parent into two, promoting the middle key
• This strategy might have to be repeated all the way to
the top
• If necessary, the root is split in two and the middle key is
promoted to a new root, making the tree one level higher

B-Trees
Exercise in Inserting a B-Tree
• Insert the following keys to a 5-way B-tree:
• 3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8,
19, 4, 31, 35, 56

B-Trees
Removal from a B-tree
• During insertion, the key always goes into a leaf. For
deletion we wish to remove from a leaf. There are three
possible ways we can do this:
• 1 - If the key is already in a leaf node, and removing it
doesn’t cause that leaf node to have too few keys, then
simply remove the key to be deleted.
• 2 - If the key is not in a leaf then it is guaranteed (by the
nature of a B-tree) that its predecessor or successor will
be in a leaf -- in this case we can delete the key and
promote the predecessor or successor key to the non-
leaf deleted key’s position.

B-Trees
Removal from a B-tree (2)
• If (1) or (2) lead to a leaf node containing less than the minimum
number of keys then we have to look at the siblings immediately
adjacent to the leaf in question:
– 3: if one of them has more than the min. number of keys then we
can promote one of its keys to the parent and take the parent
key into our lacking leaf
– 4: if neither of them has more than the min. number of keys then
the lacking leaf and one of its neighbours can be combined with
their shared parent (the opposite of promoting a key) and the
new leaf will have the correct number of keys; if this step leave
the parent with too few keys then we repeat the process up to
the root itself, if required

B-Trees
Type #1: Simple leaf deletion

Assuming a 5-way
B-Tree, as before... 12 29 52

2 7 9 15 22 31 43 56 69 72

Delete 2: Since there are enough


keys in the node, just delete it
Note when printed: this slide is animated

B-Trees
Type #2: Simple non-leaf
deletion

Delete 52
12 29 56
52

7 9 15 22 31 43 56 69 72

Borrow the predecessor


or (in this case) successor

Note when printed: this slide is animated

B-Trees
Type #4: Too few keys in node
and its siblings

12 29 56

Join back together

7 9 15 22 31 43 69 72
Too few keys!
Delete 72

Note when printed: this slide is animated

B-Trees
Type #4: Too few keys in node
and its siblings

12 29

7 9 15 22 31 43 56 69

Note when printed: this slide is animated

B-Trees
Type #3: Enough siblings

12 29
Demote root key and
promote leaf key

7 9 15 22 31 43 56 69

Delete 22

Note when printed: this slide is animated

B-Trees
Type #3: Enough siblings

12 31

7 9 15 29 43 56 69

Note when printed: this slide is animated

B-Trees
B+ Tree
• B+ Tree is an extension of B Tree which allows efficient
insertion, deletion and search operations.
• In B Tree, Keys and records both can be stored in the
internal as well as leaf nodes. Whereas, in B+ tree,
records (data) can only be stored on the leaf nodes while
internal nodes can only store the key values.
• The leaf nodes of a B+ tree are linked together in the
form of a singly linked lists to make the search queries
more efficient.
B+Tree Advantages
Comparison of B Tree and B+Tree
B+Tree Example
Trie Tree
• What is Trie-
• The word "Trie" is an excerpt from the word
"retrieval". Trie is a sorted tree-based data-structure
that stores the set of strings.
• A trie (derived from retrieval) is a multiway tree data
structure used for storing strings over an alphabet. It
is used to store a large amount of strings. The
pattern matching can be done efficiently using tries.
• The trie shows words like allot, alone, ant, and, are,
bat, bad. The idea is that all strings sharing common
prefix should come from a common node. The tries
are used in spell checking programs.
Trie Tree
Properties of the Trie for a set of the string:

• The root node of the trie always represents the null


node.
• Each child of nodes is sorted alphabetically.
• Each node can have a maximum of 26 children (A to
Z).
• Each node (except the root) can store one letter of
the alphabet.
Trie Tree
• Create Trie tree for string and, ant, dad, do
Trie Tree
• The diagram below depicts a trie representation for the bell, bear,
bore, bat, ball, stop, stock, and stack.
Thank You

You might also like