CO3 Notes Indexing

Indexing allows for quick retrieval of records from a database. There are different types of indexes like primary indexes, secondary indexes, clustered indexes, and multi-level indexes. Primary indexes contain a key field and pointer to each data block. Secondary indexes provide additional ways to access data through non-key fields. Clustered indexes group similar records together. Multi-level indexes use B-tree or B+-tree structures to allow efficient insertion and deletion while maintaining a balanced tree.

Uploaded by

Nani Yagneshwar
Copyright
© © All Rights Reserved

CO3

Indexing

An index for a file in a database system works in much the same way as the index in a
textbook. Database-system indices play the same role as book indices in libraries. For example,
to retrieve a student record given an ID, the database system looks up an index to find the
disk block on which the corresponding record resides, and then fetches that disk block to get
the student record.
Indexing is a data structure technique that allows records to be retrieved quickly from a
database file. In its simplest form, an index is a small table with two columns. The first column
holds a copy of the primary or candidate key of a table. The second column holds a pointer to
the disk block where the record with that key value is stored.

For a file with a given record structure consisting of several fields (or attributes), an index
access structure is usually defined on a single field of a file, called an indexing field (or
indexing attribute). The index typically stores each value of the index field along with a list
of pointers to all disk blocks that contain records with that field value. The values in the index
are ordered so that we can do a binary search on the index. Because the index file is
typically much smaller than the data file, a binary search on the index touches fewer blocks
than a binary search on the ordered data file itself.
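
The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not a storage-engine implementation: the `index` contents, the block numbers, and the `lookup` helper are all hypothetical, and the index is held in memory as a sorted list.

```python
from bisect import bisect_left

# Hypothetical index: (index field value, block numbers holding that value),
# kept sorted by field value so that binary search applies.
index = [
    (101, [0]),
    (105, [0]),
    (110, [1]),
    (120, [1, 2]),
    (131, [2]),
]
keys = [value for value, _ in index]

def lookup(value):
    """Return the block numbers for value, or [] if the value is not indexed."""
    pos = bisect_left(keys, value)          # binary search on the ordered keys
    if pos < len(keys) and keys[pos] == value:
        return index[pos][1]
    return []
```

Here `lookup(120)` returns both blocks that hold records with that field value, and a value that is absent from the index yields an empty list.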
PRIMARY INDEXES
A primary index is an ordered file whose records are of fixed length with two fields, and it
acts like an access structure to efficiently search for and access the data records in a data file.
The first field is of the same data type as the ordering key field—called the primary key—of
the data file, and the second field is a pointer to a disk block (a block address). There is one
index entry (or index record) in the index file for each block in the data file. Each index entry
has the value of the primary key field for the first record in a block and a pointer to that block
as its two field values.
The primary index can be classified into two types: dense index and sparse index.
Dense index:
A dense index contains an index record for every search-key value in the data file, so a
lookup goes straight from the index entry to the record.
The number of records in the index table is therefore the same as the number of records in
the main table.
It needs more space to store the index records themselves. Each index record holds the search
key and a pointer to the actual record on the disk.

No. of records in IT (index table) = No. of records in HD (data file on disk)


Sparse Index: The total number of entries in the index is the same as the number of disk blocks
in the ordered data file. The first record in each block of the data file is called the anchor record
of the block, or simply the block anchor.
Each index entry has the value of the primary key field for the first record in a block and a
pointer to that block as its two field values. A binary search on the index file requires fewer
block accesses than a binary search on the data file.
A sparse index has index records for only some of the search-key values in the file, which
addresses the space cost of dense indexing. Each entry covers a range of key values that share
one data block address; to retrieve a record, the system finds the entry with the largest key
not exceeding the search key, fetches that block, and scans it. A sparse index needs less space
and less maintenance overhead for insertions and deletions.

No. of records in IT = No. of blocks in HD
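
The two entry counts, and the way a sparse lookup uses block anchors, can be illustrated with a small sketch. The `blocks` data, the payload letters, and the `sparse_lookup` helper are assumptions made for the example; a real system would read blocks from disk rather than from a Python list.

```python
from bisect import bisect_right

# Hypothetical ordered data file: each inner list is one disk block of
# (primary key, payload) records, sorted by primary key.
blocks = [
    [(1, "a"), (3, "b"), (5, "c")],
    [(8, "d"), (9, "e"), (12, "f")],
    [(15, "g"), (20, "h")],
]

# Dense index: one entry per record -> (key, block number, slot in block).
dense = [(key, b, s) for b, blk in enumerate(blocks) for s, (key, _) in enumerate(blk)]

# Sparse index: one entry per block -> (anchor key, block number).
sparse = [(blk[0][0], b) for b, blk in enumerate(blocks)]
anchors = [key for key, _ in sparse]

def sparse_lookup(key):
    """Find a record via the sparse index, then scan inside that one block."""
    pos = bisect_right(anchors, key) - 1    # last anchor <= key
    if pos < 0:
        return None                         # key is smaller than every anchor
    for k, payload in blocks[sparse[pos][1]]:
        if k == key:
            return payload
    return None
```

With this data the dense index has 8 entries (one per record) and the sparse index has 3 (one per block), matching the two formulas above.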

CLUSTERED INDEXES
A clustering index is defined on an ordered data file whose records are ordered on a non-key
field. The index is created on non-primary-key columns, which may not be unique for each
record. In such cases, to identify the records faster, two or more columns can be grouped
together to obtain unique values, and the index is built on that combination. This method is
known as the clustering index.
Basically, records with similar characteristics are grouped together and indexes are created for
these groups. Cluster indexing reduces the cost of searching because multiple records related
to the same thing are stored in one place. It also helps when two or more tables are joined
frequently.
Suppose a company contains several employees in each department. Suppose we use a
clustering index, where all employees which belong to the same Dept_ID are considered within
a single cluster, and index pointers point to the cluster as a whole. Here Dept_Id is a non-unique
key.
This scheme can be confusing because one disk block may be shared by records that belong to
different clusters. Using a separate disk block (or chain of blocks) for each cluster is a
better technique.
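
As a rough sketch of the department example, assume each cluster starts in its own block and may continue into following blocks. The employee names, the `blocks` layout, and the `employees_in` helper are hypothetical.

```python
# Hypothetical employee file, ordered on the non-unique field dept_id.
# Each cluster starts in its own block.
blocks = [
    [("D1", "Asha"), ("D1", "Ravi")],
    [("D2", "Meena")],
    [("D3", "John"), ("D3", "Sara")],
    [("D3", "Vik")],             # cluster D3 continues into a second block
]

# Clustering index: one entry per distinct dept_id -> first block of its cluster.
cluster_index = {}
for block_no, block in enumerate(blocks):
    for dept_id, _ in block:
        cluster_index.setdefault(dept_id, block_no)

def employees_in(dept_id):
    """Follow the index to the first block, then read blocks while they match."""
    start = cluster_index.get(dept_id)
    if start is None:
        return []
    names = []
    for block in blocks[start:]:
        if block[0][0] != dept_id:
            break                # reached the next cluster
        names.extend(name for d, name in block if d == dept_id)
    return names
```

The index holds one entry per distinct Dept_ID rather than one per employee, which is what distinguishes it from a dense index on the same field.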

SECONDARY INDEXES:
A secondary index provides a secondary means of accessing a data file for which some primary
access already exists. The data file records could be ordered, unordered, or hashed. The
secondary index may be created on a field that is a candidate key and has a unique value in
every record, or on a non-key field with duplicate values.
The index is again an ordered file with two fields. The first field is of the same data type as
some non-ordering field of the data file that is an indexing field. The second field is either a
block pointer or a record pointer. Many secondary indexes (and hence, indexing fields) can be
created for the same file—each represents an additional means of accessing that file based on
some specific field.
Unordered file with Key:

• The file already has a primary index on Eid (primary key).
• Now suppose a search has to be done using Pno.
• The file is not ordered on Pno, and it cannot be reordered because it is already ordered on Eid.
• So the index table keeps Pno as its key, always in sorted order.
• Because Pno is stored in order in the index, binary search can be applied for
faster searching.
• It is a type of dense indexing: there is one index entry per record.
Unordered file with Non-Key:

• Now the search has to be done by Ename (non-key).
• The index file contains Ename as its key and is ordered.
• An intermediate layer is maintained, consisting of blocks of record pointers.
• A pointer in IT points to a particular block of record pointers, and the record pointers in
that block point to the records in HD.
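
The extra level of indirection for a non-key field can be sketched as follows. The sample records, the `pointer_blocks` dictionary standing in for the intermediate layer, and the `find_by_name` helper are assumptions for illustration only.

```python
from bisect import bisect_left

# Hypothetical unordered data file: record id -> (Eid, Ename).
records = {0: (7, "Ravi"), 1: (2, "Asha"), 2: (9, "Ravi"), 3: (4, "Meena")}

# Intermediate layer: one block of record pointers per distinct Ename.
pointer_blocks = {}
for rid, (_, ename) in records.items():
    pointer_blocks.setdefault(ename, []).append(rid)

# Index table: the distinct Ename values in sorted order. Each entry leads
# to its block of record pointers in the intermediate layer.
index_table = sorted(pointer_blocks)

def find_by_name(ename):
    """Binary-search the index, then follow the record pointers."""
    pos = bisect_left(index_table, ename)
    if pos == len(index_table) or index_table[pos] != ename:
        return []
    return [records[rid] for rid in pointer_blocks[ename]]
```

The index itself has one entry per distinct name, so it stays ordered and fixed-width even though "Ravi" matches two records.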
MULTI-LEVEL INDEXES

The idea behind a multilevel index is to reduce the part of the index that we continue to search.
Because a single-level index is an ordered file, we can create a primary index to the index itself.
In this case, the original index file is called the first-level index and the index to the index is
called the second-level index. We can repeat the process, creating a third, fourth, ..., top level
until all entries of the top level fit in one disk block.
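
The repetition described above amounts to dividing the entry count by the number of index entries that fit in one block until a single block remains. The sketch below assumes a hypothetical `fan_out` (entries per index block) and a hypothetical first-level entry count; neither figure comes from these notes.

```python
def index_levels(first_level_entries, fan_out):
    """Blocks needed at each index level, from the first level up to the top."""
    levels = []
    entries = first_level_entries
    while True:
        blocks = (entries + fan_out - 1) // fan_out   # ceiling division
        levels.append(blocks)
        if blocks == 1:
            return levels
        entries = blocks    # the next level has one entry per block of this level
```

For example, with 30,000 first-level entries and 68 entries per block, `index_levels(30000, 68)` gives three levels of 442, 7, and 1 blocks. A search then reads one block per level plus one data block, instead of a binary search over all 442 first-level blocks.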

Dynamic Multilevel Indexes Using B-Trees and B+ Trees

As we have seen, a multilevel index reduces the number of blocks accessed when searching for
a record, given its indexing field value. We are still faced with the problems of dealing with
index insertions and deletions.

To retain the benefits of using multilevel indexing while reducing index insertion and deletion
problems, designers adopted a multilevel index called a dynamic multilevel index that leaves
some space in each of its blocks for inserting new entries and uses appropriate
insertion/deletion algorithms for creating and deleting new index blocks when the data file
grows and shrinks. It is implemented by using data structures called B-trees and B+-trees.

Trees: A tree is formed of nodes. Each node in the tree, except for a special node called the
root, has one parent node and zero or more child nodes. The root node has no parent. A node
that does not have any child nodes is called a leaf node; a nonleaf node is called an internal
node.
The level of a node is always one more than the level of its parent, with the level of the root
node being zero. The figure illustrates a tree data structure. In this figure the root node is A, and its
child nodes are B, C, and D. Nodes E, J, C, G, H, and K are leaf nodes. Since the leaf nodes
are at different levels of the tree, this tree is called unbalanced.
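
A short sketch can make the level rule and the balance check concrete. The child lists below are an assumption: they describe one tree that is consistent with the text (root A, children B, C, D, and leaves E, J, C, G, H, K), but the exact shape of the original figure is not reproduced here.

```python
# Child lists for a tree consistent with the description above.
# The shape below the root's children is assumed, not taken from the figure.
children = {
    "A": ["B", "C", "D"],
    "B": ["E", "F"],
    "D": ["G", "H", "I"],
    "F": ["J"],
    "I": ["K"],
}

def leaf_levels(node="A", level=0):
    """Map each leaf to its level, with the root at level 0."""
    kids = children.get(node, [])
    if not kids:
        return {node: level}
    result = {}
    for kid in kids:
        result.update(leaf_levels(kid, level + 1))   # child level = parent level + 1
    return result

levels = leaf_levels()
balanced = len(set(levels.values())) == 1   # balanced only if all leaves share a level
```

In this assumed tree the leaves sit at levels 1, 2, and 3, so `balanced` is false.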

• Most multi-level indexes use B-tree or B+-tree data structures.
• These data structures are variations of search trees that allow efficient insertion and
deletion of new search values.
• Both B-Tree and B+-Tree are balanced.
• Elements are in sorted order.

B-TREE

The B-tree has additional constraints that ensure that the tree is always balanced and that the
space wasted by deletion, if any, never becomes excessive. The algorithms for insertion and
deletion, though, become more complex in order to maintain these constraints. Nonetheless,
most insertions and deletions are simple processes.
A B-tree of order p has the following properties:

• The order is the maximum number of children a node can have.
• The root node can have at most p children and, unless it is a leaf, at least 2 children.
• Every internal node except the root can have at most p children and at least ⌈p/2⌉
children.
• Every node contains at most p − 1 keys.
• All leaf nodes must be at the same level.

A B-tree starts with a single root node (which is also a leaf node) at level 0 (zero). Once the
root node is full with p − 1 search key values and we attempt to insert another entry in the tree,
the root node splits into two nodes at level 1. Only the middle value is kept in the root node,
and the rest of the values are split evenly between the other two nodes. When a nonroot node
is full and a new entry is inserted into it, that node is split into two nodes at the same level, and
the middle entry is moved to the parent node along with two pointers to the new split nodes. If
the parent node is full, it is also split. Splitting can propagate all the way to the root node,
creating a new level if the root is split.

Example:

1) Insert the values in order 8, 5, 1, 7, 3, 12, 9, 6 in a B-tree of order p = 3


2) The elements to be inserted are 8, 9, 10, 11, 15, 20, 17 in a B-tree of order p = 3
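
The two examples can be traced with a small insertion sketch. It is a simplified illustration rather than a full B-tree: it stores keys only (no data pointers), does not handle deletion, and assumes that a node is split after it overflows, with `keys[len(keys) // 2]` taken as the middle value. The names `Node`, `insert`, `build`, and `levels` are hypothetical.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # empty list for a leaf

def _insert(node, key, p):
    """Insert below node; return (middle key, new right sibling) if node split."""
    i = bisect_right(node.keys, key)
    if not node.children:                 # leaf: place the key here
        node.keys.insert(i, key)
    else:                                 # internal: descend into child i
        split = _insert(node.children[i], key, p)
        if split is None:
            return None
        middle, right = split
        node.keys.insert(i, middle)       # middle entry moves up to the parent
        node.children.insert(i + 1, right)
    if len(node.keys) <= p - 1:           # still within capacity
        return None
    mid = len(node.keys) // 2
    middle = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return middle, right

def insert(root, key, p):
    """Insert key into a B-tree of order p; return the (possibly new) root."""
    split = _insert(root, key, p)
    if split is None:
        return root
    middle, right = split
    return Node([middle], [root, right])  # root split: the tree grows one level

def build(keys, p):
    root = Node()
    for key in keys:
        root = insert(root, key, p)
    return root

def levels(root):
    """Keys of every node, level by level from the root."""
    out, frontier = [], [root]
    while frontier:
        out.append([n.keys for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return out
```

Under these assumptions, `build([8, 5, 1, 7, 3, 12, 9, 6], 3)` ends with root [5, 8] over the leaves [1, 3], [6, 7], [9, 12]. The second sequence, `build([8, 9, 10, 11, 15, 20, 17], 3)`, splits the root twice and ends with root [11], internal nodes [9] and [17], and leaves [8], [10], [15], [20].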

B+ TREE

Most implementations of a dynamic multilevel index use a variation of the B-tree data structure
called a B+-tree. In a B+-tree, data pointers are stored only at the leaf nodes of the tree; hence,
the structure of leaf nodes differs from the structure of internal nodes. The leaf nodes have an
entry for every value of the search field, along with a data pointer to the record (or to the block
that contains this record) if the search field is a key field.

• A B+ tree is an extension of the B-tree that allows efficient insertion, deletion and search
operations.
• In a B+ tree, records (data pointers) are stored only in the leaf nodes, while internal nodes
store only key values.
• The leaf nodes of a B+ tree are linked together as a singly linked list to
make range and sequential search queries more efficient.
The pointers in internal nodes are tree pointers to blocks that are tree nodes, whereas the
pointers in leaf nodes are data pointers to the data file records or blocks—except for the Pnext
pointer, which is a tree pointer to the next leaf node. By starting at the leftmost leaf node, it is
possible to traverse leaf nodes as a linked list, using the Pnext pointers. This provides ordered
access to the data records on the indexing field. A Pprevious pointer can also be included.
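
The ordered access provided by the Pnext pointers can be sketched with a few hand-linked leaves. The `Leaf` class, the sample keys, and the `range_scan` helper are hypothetical; a real leaf would also carry data pointers next to each key.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted search-key values in this leaf
        self.next = None      # Pnext: the following leaf, or None at the end

# Three hand-linked leaves standing in for the bottom level of a B+-tree.
first, second, third = Leaf([5, 15]), Leaf([25, 35]), Leaf([45])
first.next, second.next = second, third

def range_scan(leaf, low, high):
    """Collect keys in [low, high] by following Pnext from the given leaf."""
    found = []
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return found  # keys are ordered, so nothing later can match
            if key >= low:
                found.append(key)
        leaf = leaf.next
    return found
```

Starting from the leftmost leaf, `range_scan(first, 10, 40)` visits the leaves in order and stops as soon as a key exceeds the upper bound, without going back through the internal nodes.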

Example:

1) Insert the following key values 6, 16, 26, 36, 46 on a B+ tree with order = 3
2) The elements to be inserted are 5, 15, 25, 35, 45 on a B+ tree with order = 3
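
These examples can be traced with the sketch below. Split conventions for B+ trees differ between textbooks, so the resulting shape depends on assumptions that are made here and not stated in the notes: a leaf holds at most order − 1 keys, an overflowing leaf keeps its lower half, and the first key of the new right leaf is copied up as the separator. The names `Node`, `build`, and `scan` are hypothetical, and data pointers and deletion are omitted.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys=None, children=None, leaf=True):
        self.keys = keys or []
        self.children = children or []    # used by internal nodes only
        self.leaf = leaf
        self.next = None                  # Pnext: used by leaves only

def _insert(node, key, order):
    """Insert below node; return (separator key, new right node) on a split."""
    i = bisect_right(node.keys, key)
    if node.leaf:
        node.keys.insert(i, key)
        if len(node.keys) < order:        # a leaf holds at most order - 1 keys
            return None
        mid = len(node.keys) // 2
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        right.next, node.next = node.next, right   # keep the leaf chain linked
        return right.keys[0], right       # separator is COPIED up
    split = _insert(node.children[i], key, order)
    if split is None:
        return None
    sep, right_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right_child)
    if len(node.keys) < order:
        return None
    mid = len(node.keys) // 2
    sep = node.keys[mid]                  # separator is MOVED up
    right = Node(node.keys[mid + 1:], node.children[mid + 1:], leaf=False)
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return sep, right

def build(keys, order):
    root = Node()
    for key in keys:
        split = _insert(root, key, order)
        if split:                         # root split: add a new level on top
            root = Node([split[0]], [root, split[1]], leaf=False)
    return root

def scan(root):
    """Walk to the leftmost leaf, then follow the Pnext pointers."""
    node = root
    while not node.leaf:
        node = node.children[0]
    out = []
    while node:
        out.extend(node.keys)
        node = node.next
    return out
```

Under these assumptions, inserting 6, 16, 26, 36, 46 with order 3 ends with root [26], internal nodes [16] and [36], and leaves [6], [16], [26], [36, 46]. Whatever the convention, every key appears in a leaf, and `scan` returns the keys in sorted order.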
