Unit 4 Notes
Indexing is used to quickly retrieve particular data from the database. Formally, we can define
indexing as a technique that uses data structures to reduce the search time of a database
query in a DBMS. Indexing reduces the number of disk accesses required to retrieve a particular
piece of data by internally creating an index table.
An index usually consists of two columns that form a key-value pair: the Search Key and the
Data Reference. The two columns of the index table (i.e., the key-value pair) contain copies of
selected columns of the tabular data of the database.
Here, the Search Key contains a copy of the Primary Key or a Candidate Key of the database
table. Generally, we store the selected Primary or Candidate keys in sorted order so that a
lookup can use binary search instead of a linear scan, which reduces the overall query time.
The Data Reference contains a set of pointers that hold the addresses of disk blocks. The
pointed disk block contains the actual data referred to by the Search Key. The Data Reference
is also called a Block Pointer because it uses block-based addressing.
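The lookup described above can be sketched in a few lines of Python. This is only an illustration, assuming a small in-memory index table; the keys and block numbers are made up, and a real DBMS keeps the index on disk pages.

```python
from bisect import bisect_left

# Hypothetical index table: (search_key, block_pointer) pairs sorted by key.
# The keys and block numbers are invented for illustration.
index_table = [(101, 0), (105, 0), (110, 1), (120, 1), (135, 2)]
keys = [key for key, _ in index_table]


def lookup(search_key):
    """Return the block pointer for search_key, or None if it is not indexed."""
    pos = bisect_left(keys, search_key)  # binary search over the sorted keys
    if pos < len(keys) and keys[pos] == search_key:
        return index_table[pos][1]
    return None
```

Calling lookup(110) returns block 1 after a binary search over the keys, instead of a scan of every record.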
Indexing Attributes
Let's discuss the various indexing attributes:
Standard (B-tree) and Bitmap
B-tree indexing is one of the most popular and commonly used indexing techniques. A B-tree in
a DBMS is a tree data structure whose entries contain two things: an Index Key and its
corresponding disk address. The Index Key refers to a certain disk address, and the disk block
at that address contains the rows or tuples of data.
On the other hand, bitmap indexing uses bit strings to record which tuples or rows hold a
given value. A bitmap is a mapping from one domain to another, such as from integers to bits.
Bitmaps have an advantage over B-trees for columns with few distinct values: each bitmap is
built for one particular value, so the rows holding that value are retrieved faster. For such
columns, bitmaps are also more compact than B-trees.
The drawback of bitmap indexing is that it requires more overhead during tuple operations
(inserts, updates, deletes) on the table. Hence, bitmaps are mainly used in data warehouse
environments.
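A minimal sketch of the bitmap idea, assuming two hypothetical low-cardinality columns (gender and grade) and using Python integers as bit strings. It shows one possible layout, not how any particular DBMS stores bitmaps.

```python
def build_bitmap_index(values):
    """Map each distinct value to an integer whose bit i is set when row i holds it."""
    bitmaps = {}
    for row_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps


def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical column values for six rows (row numbers 0 to 5).
gender = ["M", "F", "F", "M", "F", "M"]
grade = ["A", "B", "A", "A", "B", "B"]

gender_idx = build_bitmap_index(gender)
grade_idx = build_bitmap_index(grade)

# WHERE gender = 'F' AND grade = 'A' becomes a single bitwise AND.
matches = rows_of(gender_idx["F"] & grade_idx["A"])
```

Here matches is [2], the only row that is both 'F' and 'A'. Every insert or update has to touch the bitmaps of the affected values, which is the overhead mentioned above.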
Syntax
Example - We can convert all values in a column to uppercase and store the results in the
index.
Syntax:
CREATE INDEX index_name
ON members(UPPER(target_column));
Note: An index table built from column values is also termed a Column Index or Column
Index-table.
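To try an uppercase index end to end, here is a small sketch using SQLite through Python's sqlite3 module. SQLite is an assumption made only for this illustration (it supports indexes on expressions from version 3.9), and the last_name column, the sample rows, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO members (last_name) VALUES (?)",
    [("Smith",), ("smith",), ("Jones",)],
)

# The index stores UPPER(last_name), not the raw column value.
conn.execute("CREATE INDEX idx_members_upper_name ON members (UPPER(last_name))")

# A query that filters on the same expression is a candidate for the index.
rows = conn.execute(
    "SELECT id FROM members WHERE UPPER(last_name) = ? ORDER BY id", ("SMITH",)
).fetchall()

index_names = [
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    )
]
```

The query matches both "Smith" and "smith" because the comparison is made on the uppercase form.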
Single-Column and Concatenated
We can create a single-column index table or multi-column index table. Concatenated indexes
are made according to certain WHERE clauses(WHERE clause related to the most frequent SQL
Queries), hence making the searching or data retrieval faster.
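A concatenated index can be pictured as entries sorted on the combined key. The sketch below assumes a hypothetical index on (year, model) with invented row ids, and answers a WHERE clause on both columns with two binary searches.

```python
from bisect import bisect_left, bisect_right

# Hypothetical concatenated index on (year, model): sorted tuples of
# (year, model, row_id). Sorting on the pair keeps matching entries adjacent.
index = sorted([
    (2019, "civic", 4), (2020, "accord", 1), (2020, "civic", 7),
    (2021, "civic", 2), (2020, "civic", 9),
])


def find(year, model):
    """Row ids for WHERE year = ? AND model = ?, using two binary searches."""
    lo = bisect_left(index, (year, model, float("-inf")))
    hi = bisect_right(index, (year, model, float("inf")))
    return [row_id for _, _, row_id in index[lo:hi]]
```

For example, find(2020, "civic") returns [7, 9] without scanning the other entries.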
We can use the primary key to create multiple index tables such as indexing based on year
(grouping years) or indexing based on model-name etc. This multi-table indexing will help in
getting specific query results faster.
Example - Suppose we have a student table. If the student table is partitioned according to
roll number (the primary key), then the index table of the student table should be partitioned
according to roll number as well. This type of partitioning helps in grouping similar data and
gives faster query results.
Types of Indexes
According to the attributes defined above, we divide indexing into three types:
1. Primary Indexing: An index table created using primary keys is known as Primary Indexing.
It is defined on ordered data. Because the index is composed of primary keys, its entries
are unique, not null, and have a one-to-one relationship with the data blocks.
Example:
Characteristics of Primary Indexing:
Example:
Characteristics of Secondary Indexing:
3. Cluster Indexing: Clustered indexing is used when multiple related records are
stored in one place. It is defined on ordered data. The important thing to note here is
that the index table of clustered indexing is created using non-key values, which may or
may not be unique. To achieve faster retrieval, we group records having similar
characteristics. The indexes are created using these groups, and this process is known
as clustering indexing.
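A rough sketch of a clustering index, assuming hypothetical employee records kept in order of a non-key column (dept_id). The index holds one entry per distinct value, pointing at the first record of that group.

```python
# Hypothetical records (dept_id, name), stored in order of the non-key dept_id.
records = [
    (10, "Asha"), (10, "Ben"), (20, "Chen"),
    (20, "Dana"), (20, "Eli"), (30, "Farah"),
]

# Clustering index: one entry per distinct dept_id, pointing at the position
# of the first record of that group in the ordered file.
cluster_index = {}
for position, (dept_id, _) in enumerate(records):
    cluster_index.setdefault(dept_id, position)


def employees_in(dept_id):
    """Jump to the first record of the group, then scan until the value changes."""
    start = cluster_index.get(dept_id)
    if start is None:
        return []
    names = []
    for current_dept, name in records[start:]:
        if current_dept != dept_id:
            break
        names.append(name)
    return names
```

Because records with the same dept_id sit next to each other, one index entry is enough to reach the whole group.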
Characteristics of Clustered Indexing:
Ordered Indexing:
Ordered indexing is the traditional way of storing indexes that gives fast retrieval. The indices
are stored in sorted order, hence they are also known as ordered indices.
1. Dense Indexing: In dense indexing, the index table contains records for every search key value of
the database. This makes searching faster but requires a lot more space. It is like primary
indexing but contains a record for every search key.
2. Sparse Indexing: Sparse indexing consumes less space than dense indexing, but it is a
bit slower as well. We do not store a search key for every record; instead, we store
a search key that points to a block. The pointed block contains a group of records.
Sometimes we have to search twice (first the index, then the block), which makes
sparse indexing a bit slower.
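The difference between the two ordered indexes can be sketched as follows, assuming a hypothetical data file of sorted records grouped into blocks of three. The block size and record values are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical data file: sorted (key, value) records grouped into blocks of three.
blocks = [
    [(1, "a"), (3, "b"), (5, "c")],
    [(7, "d"), (9, "e"), (11, "f")],
    [(13, "g"), (15, "h"), (17, "i")],
]

# Dense index: one entry per search key -> (block number, slot inside the block).
dense = {
    key: (b, s)
    for b, block in enumerate(blocks)
    for s, (key, _) in enumerate(block)
}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(key):
    """One index probe leads straight to the record."""
    location = dense.get(key)
    if location is None:
        return None
    block_no, slot = location
    return blocks[block_no][slot][1]


def sparse_lookup(key):
    """First search the index for the block, then search inside the block."""
    b = bisect_right(sparse_keys, key) - 1  # last block whose first key <= key
    if b < 0:
        return None
    for k, value in blocks[b]:  # second search, within the block
        if k == key:
            return value
    return None
```

The dense index here has nine entries and the sparse index only three, which is the space saving; the extra scan inside the block is the price paid for it.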
Multi-Level Indexing
Since the index table is stored in main memory, single-level indexing for a huge amount of
data requires a lot of memory space. Hence, multilevel indexing was introduced, in which we
divide the index into smaller blocks and build an outer index over them. This makes the outer
block of the index table small enough to be stored in main memory.
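A two-level version of this idea can be sketched as below, assuming a hypothetical inner index split into blocks of three entries. The pointer names "B0" to "B8" are placeholders for data block addresses.

```python
from bisect import bisect_right

# Hypothetical inner index, split into blocks of (key, data_block_pointer) entries.
inner_blocks = [
    [(10, "B0"), (20, "B1"), (30, "B2")],
    [(40, "B3"), (50, "B4"), (60, "B5")],
    [(70, "B6"), (80, "B7"), (90, "B8")],
]

# Outer index: the first key of each inner block. It is small, so it is the
# only level that has to stay in main memory.
outer_keys = [block[0][0] for block in inner_blocks]


def find_block(key):
    """Return the data block pointer for key, or None if key is not indexed."""
    i = bisect_right(outer_keys, key) - 1  # pick the inner block via the outer index
    if i < 0:
        return None
    for k, pointer in inner_blocks[i]:  # then search only that inner block
        if k == key:
            return pointer
    return None
```

Only outer_keys needs to be resident; each lookup then reads a single inner block rather than the whole index.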
We use the B+ Tree data structure for multilevel indexing. The leaf nodes of the B+ tree contain
the actual data pointers. The leaf nodes are themselves in the form of a linked list. This linked
list representation helps in both sequential and random access.
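The linked leaf level can be sketched on its own, leaving out the internal nodes. The Leaf class, the grouping of keys into leaves, and the pointer names such as "r10" are assumptions made for illustration; in a full B+ tree the internal nodes would locate the starting leaf.

```python
class Leaf:
    """A B+ tree leaf: sorted keys, one data pointer per key, link to the next leaf."""

    def __init__(self, keys, pointers):
        self.keys = keys
        self.pointers = pointers
        self.next = None


# Hypothetical leaf level; "r10" etc. stand in for data pointers.
leaves = [
    Leaf([10, 30], ["r10", "r30"]),
    Leaf([40, 50], ["r40", "r50"]),
    Leaf([60, 70, 90], ["r60", "r70", "r90"]),
]
for left, right in zip(leaves, leaves[1:]):
    left.next = right  # chain the leaves into a linked list


def range_scan(start_leaf, low, high):
    """Sequential access: follow the leaf chain, collecting pointers in [low, high]."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key, pointer in zip(leaf.keys, leaf.pointers):
            if key > high:
                return result
            if key >= low:
                result.append(pointer)
        leaf = leaf.next
    return result
```

A range query such as range_scan(leaves[0], 30, 60) walks the chain once and never returns to the upper levels of the tree.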
Example-2: Construct a B+ tree for the following search key values, where n = 4.
{10, 30, 40, 50, 60, 70, 90 }
Now, let's insert some elements into this tree and delete some from it.
Insert 25, 75
When we insert an element, we place it in the leaf node immediately to the right of the
largest value that is smaller than the element being inserted.
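The leaf-level part of an insert can be sketched as below. It assumes that n = 4 means a leaf holds at most n - 1 = 3 keys and that an overflowing leaf is split in half with the first key of the right half copied up to the parent; this is a common textbook convention, and the example leaves passed in are hypothetical, not taken from the tree in these notes. Updating the parent nodes is left out.

```python
from bisect import insort

MAX_KEYS = 3  # assumption: n = 4 allows at most n - 1 = 3 keys per leaf


def insert_into_leaf(keys, key):
    """Insert key into a sorted leaf, splitting the leaf if it overflows.

    Returns (left, right, separator). right and separator are None when the
    leaf did not overflow; otherwise separator is the key copied up to the parent.
    """
    keys = list(keys)      # work on a copy of the leaf
    insort(keys, key)      # place the key in sorted position
    if len(keys) <= MAX_KEYS:
        return keys, None, None
    mid = (len(keys) + 1) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]
```

For a hypothetical leaf [10, 30], inserting 25 gives [10, 25, 30] with no split. For a full leaf [60, 70, 90], inserting 75 splits it into [60, 70] and [75, 90], with 75 copied up.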
Delete 70
When we delete an element, its place is taken by the element to its right.
Static Hashing
Bucket Overflow:
Bucket overflow can occur for two reasons:
1. Insufficient buckets.
2. Skew in distribution of records. Some buckets are given more records than others, so a bucket
can overflow even though the other buckets still have space. This situation is called ‘bucket
skew’.
Overflow Chaining:
The overflow buckets of a given bucket are chained together in a linked list. This is called
‘Closed Hashing’.
In ‘Open Hashing’, the set of buckets is fixed, and there are no overflow chains. Here, if a
bucket is full, the system inserts records in some other bucket in the initial set of buckets.
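Overflow chaining can be sketched as follows, assuming a fixed set of four buckets that hold two records each and a simple "key mod bucket count" hash function. All of these values are chosen only for illustration.

```python
BUCKET_COUNT = 4     # fixed number of buckets (static hashing)
BUCKET_CAPACITY = 2  # records per bucket; both values are illustrative


class Bucket:
    def __init__(self):
        self.records = []
        self.overflow = None  # next bucket in the overflow chain


buckets = [Bucket() for _ in range(BUCKET_COUNT)]


def insert(key, value):
    bucket = buckets[key % BUCKET_COUNT]  # hash function: key mod bucket count
    while len(bucket.records) >= BUCKET_CAPACITY:
        if bucket.overflow is None:
            bucket.overflow = Bucket()    # chain a new overflow bucket
        bucket = bucket.overflow
    bucket.records.append((key, value))


def search(key):
    bucket = buckets[key % BUCKET_COUNT]
    while bucket is not None:             # walk the overflow chain
        for k, v in bucket.records:
            if k == key:
                return v
        bucket = bucket.overflow
    return None


def chain_length(bucket_no):
    """Number of overflow buckets chained behind a primary bucket."""
    length, bucket = 0, buckets[bucket_no].overflow
    while bucket is not None:
        length += 1
        bucket = bucket.overflow
    return length


# Keys 4, 8, 12 and 16 all hash to bucket 0, so that bucket overflows
# while buckets 2 and 3 stay empty (bucket skew).
for k in (4, 8, 12, 16, 1):
    insert(k, f"rec{k}")
```

After these inserts, bucket 0 has one overflow bucket chained to it, and a search for key 16 has to follow that chain.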
A hash index organizes the search keys, with their associated pointers, into a hash file
structure. We apply a hash function to a search key to identify a bucket, and store the
key and its associated pointers in that bucket.
Example-10: Hash file organization of DEPT file using DName as key, where there are eight
departments.
Note: An ideal hash function distributes search keys over the buckets with two properties:
1. The distribution is uniform: The hash function assigns each bucket the same number of
search-key values from the set of all possible search-key values.
2. The distribution is random: In the average case, each bucket will have nearly the same number
of values assigned to it, regardless of the actual distribution of search-key values.
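One way to check how evenly a hash function spreads a given set of keys is to count the keys per bucket. The sketch below assumes eight buckets and eight made-up department names, and compares two hypothetical hash functions; how well either one behaves depends on the actual keys.

```python
from collections import Counter

BUCKETS = 8


def first_letter_hash(name):
    """Hash on the first letter only: names sharing a first letter collide."""
    return (ord(name[0].upper()) - ord("A")) % BUCKETS


def char_sum_hash(name):
    """Hash on the sum of all character codes, modulo the bucket count."""
    return sum(ord(c) for c in name) % BUCKETS


def bucket_loads(hash_fn, keys):
    """Number of keys that land in each bucket, listed by bucket number."""
    counts = Counter(hash_fn(k) for k in keys)
    return [counts.get(b, 0) for b in range(BUCKETS)]


# Hypothetical department names, four of them starting with the same letter.
names = ["Sales", "Support", "Security", "Shipping",
         "Finance", "Legal", "Research", "Marketing"]

first_letter_loads = bucket_loads(first_letter_hash, names)
char_sum_loads = bucket_loads(char_sum_hash, names)
```

With these names, first_letter_hash sends four of the eight keys to one bucket, which is the kind of skew that leads to bucket overflow. Comparing the two load lists shows how the choice of hash function changes the spread for the same keys.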