BCSE302L Database Systems
Module -4
Indexing
Dr. S. RENUKA DEVI
Professor
SCSE
VIT Chennai Campus
Index
An index for a file in a database system works in much the
same way as the index in this textbook
The contents in the index are in sorted order, making it
easy to find the word we want
There are two basic kinds of indices:
Ordered indices - Based on a sorted ordering of the values
Hash indices - Based on a uniform distribution of values
across a range of buckets. The bucket to which a value is
assigned is determined by a function, called a hash
function
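As a rough illustration of how a hash function assigns values to buckets, the sketch below uses a fixed bucket count and a CRC-based hash. The names `NUM_BUCKETS`, `bucket_for`, `build_hash_index` and `hash_lookup` are assumptions made for this example, not part of any particular database system.

```python
import zlib

NUM_BUCKETS = 8  # assumed bucket count, chosen only for illustration


def bucket_for(search_key, num_buckets=NUM_BUCKETS):
    """Map a search-key value to a bucket number with a deterministic hash."""
    # zlib.crc32 is used because Python's built-in hash() for strings
    # is randomized per process.
    return zlib.crc32(search_key.encode("utf-8")) % num_buckets


def build_hash_index(records, key_of, num_buckets=NUM_BUCKETS):
    """Store each record's position in the bucket chosen by the hash function."""
    buckets = [[] for _ in range(num_buckets)]
    for position, record in enumerate(records):
        buckets[bucket_for(key_of(record), num_buckets)].append(position)
    return buckets


def hash_lookup(buckets, records, key_of, search_key):
    """Examine only the single bucket that the search key hashes to."""
    candidates = buckets[bucket_for(search_key, len(buckets))]
    return [records[p] for p in candidates if key_of(records[p]) == search_key]
```

Different keys may land in the same bucket, which is why the lookup still compares the search key of each candidate record.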
An attribute or set of attributes used to look up records in
a file is called a search key
Each index structure is associated with a particular search key
Ordered Indices
Just like the index of a book or a library catalog, an ordered index
stores the values of the search keys in sorted order, and
associates with each search key the records that contain it.
A file may have several indices, on different search keys
If the file containing the records is sequentially ordered, then a
clustering index is an index whose search key also defines the
sequential order of the file
Clustering indices are also called primary indices
The search key of a clustering index is often the primary key
Indices whose search key specifies an order different from the
sequential order of the file are called nonclustering indices, or
secondary indices.
Dense and Sparse Indices
An index entry, or index record, consists of a
search-key value and pointers to one or more records
with that value as their search-key value
The pointer to a record consists of the identifier of a
disk block and an offset within the disk block to
identify the record within the block
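The structure of an index entry can be sketched as two small record types. This is a minimal model under assumed names (`RecordPointer`, `IndexEntry`, `fetch`), with disk blocks represented as in-memory lists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class RecordPointer:
    block_id: int  # identifier of the disk block
    offset: int    # position of the record within that block


@dataclass
class IndexEntry:
    search_key: str
    # One or more pointers to records that carry this search-key value.
    pointers: List[RecordPointer] = field(default_factory=list)


def fetch(blocks, pointer):
    """Follow a pointer: select the block, then the record at the offset."""
    return blocks[pointer.block_id][pointer.offset]
```

For example, `fetch(blocks, entry.pointers[0])` returns the first record that an entry refers to.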
There are two types of ordered indices that we can use:
Dense Index
Sparse Index
Dense Index
In a dense index, an index entry appears for every search-
key value in the file
In a dense clustering index, the index record contains
the search-key value and a pointer to the first data record
with that search-key value
The rest of the records with the same search-key value
are stored sequentially after the first record, because in
a clustering index the records are sorted on the same
search key
In a dense nonclustering index, the index must store a
list of pointers to all records with the same search-key value
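A dense clustering index can be sketched as a map from every search-key value to the position of the first record with that value. The function names are assumptions for this example, and the file is modeled as a Python list already sorted on the search key.

```python
def build_dense_clustering_index(sorted_records, key_of):
    """One entry per search-key value, pointing to the first matching record."""
    index = {}
    for position, record in enumerate(sorted_records):
        # setdefault keeps the position of the first record seen for each key.
        index.setdefault(key_of(record), position)
    return index


def dense_lookup(index, sorted_records, key_of, search_key):
    """Jump to the first record, then read sequentially while the key matches."""
    start = index.get(search_key)
    if start is None:
        return []  # a dense index has an entry for every key, so no match exists
    matches = []
    for record in sorted_records[start:]:
        if key_of(record) != search_key:
            break
        matches.append(record)
    return matches
```

The sequential scan works only because the file is sorted on the same search key; a nonclustering index would need the full pointer list instead.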
Sparse Index
In a sparse index, an index entry appears for only some of the
search-key values
Sparse indices can be used only if the relation is stored in sorted
order of the search key, that is, if the index is a clustering index
Each index entry contains a search-key value and a pointer to the
first data record with that search-key value
To locate a record, we find the index entry with the largest
search-key value that is less than or equal to the search-key value
for which we are looking.
We start at the record pointed to by that index entry, and follow
the pointers in the file until we find the desired record
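The lookup procedure for a sparse index can be sketched with a binary search followed by a sequential scan. This sketch assumes one index entry per block and models blocks as lists of records; `build_sparse_index` and `sparse_lookup` are hypothetical names.

```python
from bisect import bisect_right


def build_sparse_index(blocks, key_of):
    """One entry per non-empty block: (first search key in block, block number)."""
    return [(key_of(block[0]), n) for n, block in enumerate(blocks) if block]


def sparse_lookup(index, blocks, key_of, search_key):
    """Find the largest indexed key <= search_key, then scan forward in the file."""
    keys = [k for k, _ in index]
    i = bisect_right(keys, search_key) - 1
    if i < 0:
        return None  # search key is smaller than every indexed key
    start_block = index[i][1]
    for block in blocks[start_block:]:
        for record in block:
            k = key_of(record)
            if k == search_key:
                return record
            if k > search_key:
                return None  # passed the place where the key would be stored
    return None
```

The index holds fewer entries than a dense index, at the cost of a short scan inside the file for each lookup.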
Multilevel Indices
If an index is small enough to be kept entirely in main memory, the
search time to find an entry is low.
However, if the index is so large that not all of it can be kept in memory,
index blocks must be fetched from disk when required.
The search for an entry in the index then requires several disk-block
reads.
To deal with this problem, we treat the index just as we would treat any
other sequential file, and construct a sparse outer index on the original
index.
To locate a record, we first use binary search on the outer index to find
the record for the largest search-key value less than or equal to the one
that we desire.
The pointer points to a block of the inner index. We scan this block until
we find the record that has the largest search-key value less than or
equal to the one that we desire.
The pointer in this record points to the block of the file that contains the
record for which we are looking.
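The two-level lookup described above can be sketched as follows. The sketch assumes the inner index is itself a sparse index with one entry per data block, split into index blocks; all function and parameter names are illustrative.

```python
from bisect import bisect_right


def build_outer_index(inner_blocks):
    """Sparse outer index: (first key in inner index block, inner block number)."""
    return [(block[0][0], n) for n, block in enumerate(inner_blocks)]


def multilevel_lookup(outer, inner_blocks, data_blocks, key_of, search_key):
    # Step 1: binary search on the outer index.
    outer_keys = [k for k, _ in outer]
    i = bisect_right(outer_keys, search_key) - 1
    if i < 0:
        return None  # smaller than every key in the file
    inner_block = inner_blocks[outer[i][1]]

    # Step 2: scan the inner index block for the largest key <= search_key.
    # The first entry always qualifies because it equals the outer key.
    data_block_no = None
    for key, block_no in inner_block:
        if key > search_key:
            break
        data_block_no = block_no

    # Step 3: scan the data block that the inner entry points to.
    for record in data_blocks[data_block_no]:
        if key_of(record) == search_key:
            return record
    return None
```

Only one inner index block and one data block are read per lookup, regardless of how many inner index blocks exist.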
Index Update
Every index must be updated whenever a record is
either inserted into or deleted from the file.
Algorithms for updating single-level indices
Insertion
First, the system performs a lookup using the
search-key value that appears in the record to be
inserted. The actions the system takes next depend on
whether the index is dense or sparse:
Dense indices:
1. If the search-key value does not appear in the index,
the system inserts an index entry with the search-key
value in the index at the appropriate position.
2. Otherwise the following actions are taken:
a) If the index entry stores pointers to all records with the
same search-key value, the system adds a pointer to the
new record in the index entry.
b) Otherwise, the index entry stores a pointer to only the
first record with the search-key value. The system then
places the record being inserted after the other records
with the same search-key values.
Sparse indices: We assume that the index stores an entry
for each block.
If the system creates a new block, it inserts the first
search-key value (in search-key order) appearing in
the new block into the index.
On the other hand, if the new record has the least
search-key value in its block, the system updates the
index entry pointing to the block; if not, the system
makes no change to the index.
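The insertion rules above can be sketched for two of the cases: a dense index that stores all pointers per key (cases 1 and 2a), and a sparse index with one entry per block. The data layout and the names `dense_insert` and `sparse_insert` are assumptions for this example; a new block is assumed to contain only the newly inserted record.

```python
from bisect import insort


def dense_insert(keys, pointers, search_key, record_pointer):
    """keys: sorted list of search-key values.
    pointers: dict mapping each key to the list of its record pointers."""
    if search_key not in pointers:
        # Case 1: key not in the index, so add an entry at the sorted position.
        insort(keys, search_key)
        pointers[search_key] = [record_pointer]
    else:
        # Case 2(a): the entry stores all pointers, so append the new one.
        pointers[search_key].append(record_pointer)


def sparse_insert(index, block_no, new_key):
    """index: dict mapping block number to the least search key in that block.
    Returns True if the index was changed."""
    current_least = index.get(block_no)
    if current_least is None or new_key < current_least:
        # New block, or the new record now has the least key in its block.
        index[block_no] = new_key
        return True
    return False  # otherwise the index is left unchanged
```

Most insertions into a sparse index leave it untouched, since only a change to the least key of a block matters.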
Deletion
To delete a record, the system first looks up the record
to be deleted. The actions the system takes next depend on
whether the index is dense or sparse:
Dense indices:
1. If the deleted record was the only record with its particular
search-key value, then the system deletes the corresponding
index entry from the index.
2. Otherwise the following actions are taken:
a) If the index entry stores pointers to all records with the same
search-key value, the system deletes the pointer to the deleted
record from the index entry.
b) Otherwise, the index entry stores a pointer to only the first
record with the search-key value. In this case, if the deleted
record was the first record with the search-key value, the system
updates the index entry to point to the next record.
Sparse indices:
1. If the index does not contain an index entry with the
search-key value of the deleted record, nothing needs to
be done to the index.
2. Otherwise the system takes the following actions:
a) If the deleted record was the only record with its search
key, the system replaces the corresponding index record
with an index record for the next search-key value (in
search-key order). If the next search-key value already has
an index entry, the entry is deleted instead of being
replaced.
b) Otherwise, if the index entry for the search-key value
points to the record being deleted, the system updates the
index entry to point to the next record with the same
search-key value.
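The deletion rules above can be sketched in the same style. The dense version assumes the index stores all pointers per key; the sparse version models the file as a list of (search key, record id) pairs in search-key order. The names and the data layout are assumptions, and if no later search-key value exists the sparse entry is simply removed.

```python
def dense_delete(pointers, search_key, record_id):
    """pointers: dict mapping each key to the list of its record ids."""
    pointers[search_key].remove(record_id)  # case 2(a): drop one pointer
    if not pointers[search_key]:
        del pointers[search_key]            # case 1: it was the only record


def sparse_delete(index, file_records, search_key, record_id):
    """index: dict mapping a search key to the id of its first record.
    file_records: list of (search_key, record_id) in search-key order."""
    file_records.remove((search_key, record_id))
    if search_key not in index:
        return  # case 1: no entry for this key, nothing to do
    same_key = [r for k, r in file_records if k == search_key]
    if not same_key:
        # Case 2(a): replace the entry with one for the next search-key value,
        # unless that value already has an entry (then only delete).
        del index[search_key]
        following = [(k, r) for k, r in file_records if k > search_key]
        if following and following[0][0] not in index:
            index[following[0][0]] = following[0][1]
    elif index[search_key] == record_id:
        # Case 2(b): point to the next record with the same search-key value.
        index[search_key] = same_key[0]
```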
Secondary Indices
Secondary indices must be dense, with an index entry for every
search-key value, and a pointer to every record in the file.
A secondary index on a candidate key looks just like a dense
clustering index, except that the records pointed to by successive
values in the index are not stored sequentially
If the search key of a secondary index is not a candidate key, it is not
enough to point to just the first record with each search-key value.
The remaining records with the same search-key value could be
anywhere in the file, since the records are ordered by the search key
of the clustering index, rather than by the search key of the
secondary index.
An extra level of indirection is used to implement secondary indices
on search keys that are not candidate keys.
The pointers in such a secondary index do not point directly to the
file. Instead, each points to a bucket that contains pointers to the
file.
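The extra level of indirection can be sketched as a map from each search-key value to a bucket of record positions. This is a minimal in-memory model with assumed names; the file is a list ordered by some other (clustering) key.

```python
def build_secondary_index(records, key_of):
    """Dense secondary index: each search-key value maps to a bucket of pointers."""
    buckets = {}
    for position, record in enumerate(records):
        buckets.setdefault(key_of(record), []).append(position)
    # Keep the entries in search-key order, as an ordered index would.
    return dict(sorted(buckets.items()))


def secondary_lookup(index, records, search_key):
    """Follow the index entry to its bucket, then each pointer to the file."""
    return [records[p] for p in index.get(search_key, [])]
```

Because the matching records can be anywhere in the file, each pointer in a bucket may lead to a different block.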
Any Queries?