
MODULE 2

2.1 Sorting
Sorting refers to arranging data in a specific order, typically ascending or descending. It is a fundamental operation used to organize data and improve search efficiency, and it often reduces the complexity of a problem.

Types of sorting algorithms:


1. Comparison-based sorting: elements are compared with one another and arranged in the required order.
Ex: Bubble sort, Merge sort.
2. Non-comparison-based sorting: sorting is based on indexing or counting rather than on comparing elements.
Ex: Radix sort, Counting sort.

Common sorting algorithms:


1.​ Bubble sort
2.​ Selection sort
3.​ Quick sort
4.​ Insertion sort
5.​ Shell sort
6.​ Merge sort

2.1.1 Bubble sort


It is a simple sorting technique that repeatedly swaps adjacent elements if they are in the wrong order. It is not suitable for large data sets because its average and worst-case time complexities are high.

Algorithm
1​ Start from the first element and compare it with the next.
2​ If the first element is greater than the second, swap them.
3​ Move to the next element and repeat step 2.
4​ Repeat the process for all elements in the list.
5​ The largest element will settle at the last position in one complete pass.
6​ Repeat the process for the remaining unsorted elements until the entire list is
sorted.

C code for Bubble sort


#include <stdio.h>

// Repeatedly bubble the largest remaining element to the end of the array
void bubbleSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        for (int j = 0; j < n - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {      // adjacent pair out of order: swap
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
}

void printArray(int arr[], int size) {
    for (int i = 0; i < size; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}

int main() {
    int arr[] = {4, 2, 7, 1};
    int n = sizeof(arr) / sizeof(arr[0]);
    bubbleSort(arr, n);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}
Output:
Sorted array:
1 2 4 7

Characteristics of Bubble Sort


●​ Stability: Bubble Sort is stable; it maintains the relative order of equal
elements in the sorted output.
●​ In-place Sorting: Bubble Sort is an in-place algorithm; it requires only a
constant amount of additional memory.

Advantages and Disadvantages of Bubble Sort


Advantages
Simple to Implement: Easy to understand and code.
Stable: Maintains the relative order of equal elements.
In-place Sorting: Requires only a constant amount of additional memory.
Adaptive: Efficient for nearly sorted lists, with a best-case time complexity of O(n) when an early-exit flag is used (see the sketch below).
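
The code above always performs n - 1 passes, even when the array becomes sorted early, so by itself it is not adaptive. A minimal sketch of the common early-exit variant that gives the O(n) best case on already-sorted input (the flag name swapped is an illustrative choice):

// Adaptive bubble sort: stop as soon as one full pass makes no swaps
void bubbleSortAdaptive(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int swapped = 0;                    // no swaps seen in this pass yet
        for (int j = 0; j < n - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
                swapped = 1;
            }
        }
        if (!swapped) break;                // already sorted: O(n) on sorted input
    }
}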

Disadvantages
Inefficient: Poor performance on large lists, with an average and worst-case time complexity of O(n²).
High Number of Comparisons: Even if the list is partially sorted, Bubble Sort makes unnecessary comparisons.
Not Suitable for Large Datasets: Due to its quadratic time complexity, it is not recommended for large datasets.

2.1.2 Selection sort


Selection Sort is a simple sorting algorithm that repeatedly selects the smallest
element from the unsorted part of the array and swaps it with the first unsorted
element. The algorithm is simple and easy to understand. It works well for small
arrays. It requires fewer swaps than Bubble Sort.

Algorithm
1​ Start from the first element.
2​ Find the smallest element in the unsorted part of the array.
3​ Swap it with the first unsorted element.
4​ Move to the next position and repeat until the array is fully sorted.

Example
Let's sort the array [64, 25, 12, 22, 11] using selection sort:
1.​ Start with the first element 64 and find the smallest element in the array. The
smallest is 11, so swap 11 with 64.
[11, 25, 12, 22, 64]
2.​ Move to the next element 25 and find the smallest element in the remaining
part. The smallest is 12, so swap 12 with 25.
[11, 12, 25, 22, 64]
3.​ Move to the next element 25 and find the smallest element in the remaining
part. The smallest is 22, so swap 22 with 25.
[11, 12, 22, 25, 64]
4. The next element 25 is already in the correct position, and the last element 64 is already in the correct position.
5. The array is now sorted: [11, 12, 22, 25, 64].

C program for selection sort


#include <stdio.h>

void swap(int *xp, int *yp) {
    int temp = *xp;
    *xp = *yp;
    *yp = temp;
}

// Repeatedly select the minimum of the unsorted suffix and move it to the front
void selectionSort(int arr[], int n) {
    int i, j, min_idx;
    for (i = 0; i < n - 1; i++) {
        min_idx = i;                        // index of the smallest element seen so far
        for (j = i + 1; j < n; j++)
            if (arr[j] < arr[min_idx])
                min_idx = j;
        swap(&arr[min_idx], &arr[i]);
    }
}

void printArray(int arr[], int size) {
    int i;
    for (i = 0; i < size; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

int main() {
    int arr[] = {64, 25, 12, 22, 11};
    int n = sizeof(arr) / sizeof(arr[0]);
    selectionSort(arr, n);
    printf("Sorted array is: \n");
    printArray(arr, n);
    return 0;
}
Output:
Sorted array is:
11 12 22 25 64
Advantages of Selection Sort:
●​ Simple Implementation: Easy to understand and implement.
●​ In-Place Sorting: Requires only a constant amount of additional memory
(O(1) space complexity).
●​ Performance on Small Arrays: Works efficiently for small arrays or lists.
●​ Predictable Performance: Always performs the same number of comparisons
regardless of the initial order of the elements.
●​ Works Well on Lists with Few Unique Elements: Performs relatively well
when sorting lists with a small number of unique elements.
Disadvantages of Selection Sort:
●​ Inefficient for Large Data Sets: Time complexity of O(n^2) makes it
impractical for large arrays or lists.
● Not Stable: Does not preserve the relative order of equal elements (see the sketch after this list).
●​ High Number of Comparisons: Always makes n*(n-1)/2 comparisons, even
if the array is already sorted.
●​ Poor Performance on Nearly Sorted Data: Unlike insertion sort, it does not
take advantage of the fact that the array might be partially sorted.
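
To see why Selection Sort is not stable, consider sorting records by key: the long-distance swap can carry a record past an equal key. A minimal self-contained sketch (the Rec struct and its values are illustrative):

#include <stdio.h>

struct Rec { int key; char id; };

int main(void) {
    // Two records share key 2; 'a' appears before 'b' initially
    struct Rec a[] = {{2, 'a'}, {2, 'b'}, {1, 'c'}};
    int n = 3, i, j, min_idx;
    for (i = 0; i < n - 1; i++) {           // selection sort on .key
        min_idx = i;
        for (j = i + 1; j < n; j++)
            if (a[j].key < a[min_idx].key) min_idx = j;
        struct Rec t = a[min_idx]; a[min_idx] = a[i]; a[i] = t;
    }
    // The first pass swaps {2,'a'} with {1,'c'}, so the output is
    // (1,c) (2,b) (2,a): the equal keys 2 now appear as 'b' before 'a'.
    for (i = 0; i < n; i++) printf("(%d,%c) ", a[i].key, a[i].id);
    printf("\n");
    return 0;
}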
Applications of selection sort:
●​ Educational Purposes: Used to teach basic concepts of sorting algorithms
due to its simplicity.
●​ Small Data Sets: Suitable for sorting small arrays or lists where the overhead
of more complex algorithms is not justified.
●​ Memory-Constrained Environments: Works well in environments with
limited memory due to its O(1) space complexity.
●​ Sorting Arrays with Few Unique Elements: Efficient when dealing with
arrays that contain a small number of unique elements.
●​ Initial Steps in More Complex Algorithms: Sometimes used as a preliminary
step in more complex sorting algorithms or hybrid algorithms for small data
sets.

2.1.3 Quick sort


Quick Sort is a divide-and-conquer algorithm that works by selecting a pivot,
partitioning the array into two subarrays (one with elements less than the pivot and
one with elements greater than the pivot), and recursively sorting each subarray. It
is efficient for large datasets and faster than Merge Sort in practice and does
in-place sorting (no extra space needed).
Algorithm:
1. Choose a pivot (commonly the last, first, or middle element).
2. Partition the array: elements smaller than the pivot go to its left, and elements greater than the pivot go to its right.
3. Recursively apply Quick Sort on the left and right subarrays.
Quick sort example:
Imagine you have a list of numbers like [8, 3, 7, 1, 9, 2].
Here’s how quick sort would work:
1. Pick a Pivot: Let’s choose 7 as the pivot.
2. Divide: Compare each number to 7:
Numbers smaller than 7: [3, 1, 2]
Numbers larger than 7: [8, 9]
Now, your list looks like this: [3, 1, 2], [7], [8, 9].
3. Repeat: Sort the smaller groups:
For [3, 1, 2], pick 3 as the pivot.
Compare:
Numbers smaller than 3: [1, 2]
Numbers larger than 3: []
Now, you have: [1, 2], [3], [7], [8, 9].
4. Combine: Put it all together: [1, 2, 3, 7, 8, 9].
And that’s your sorted list using quick sort!

C program for quick sort:


#include <stdio.h>

void swap(int* a, int* b) {
    int t = *a;
    *a = *b;
    *b = t;
}

// Lomuto partition: place the pivot (last element) at its final position
int partition(int arr[], int low, int high) {
    int pivot = arr[high];
    int i = (low - 1);                      // boundary of the "less than pivot" region
    for (int j = low; j <= high - 1; j++) {
        if (arr[j] < pivot) {
            i++;
            swap(&arr[i], &arr[j]);
        }
    }
    swap(&arr[i + 1], &arr[high]);
    return (i + 1);
}

void quick_sort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high); // pivot index after partitioning
        quick_sort(arr, low, pi - 1);
        quick_sort(arr, pi + 1, high);
    }
}

// Example usage
int main() {
    int arr[] = {10, 7, 8, 9, 1, 5};
    int n = sizeof(arr) / sizeof(arr[0]);
    quick_sort(arr, 0, n - 1);
    printf("Sorted array: \n");
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
    return 0;
}
Output:
Sorted array:
1 5 7 8 9 10
Advantages of Quick Sort
●​ Efficient Average Case: O(n log n) time complexity on average, making it
faster for most inputs.
●​ In-Place Sorting: Requires minimal additional memory, typically O(log n)
space.
●​ Widely Used: Commonly used in practice due to its efficiency and
simplicity.
●​ Divide and Conquer: Effectively handles large datasets by breaking them
into smaller subproblems.
Disadvantages of Quick Sort
●​ Worst-Case Performance: Can degrade to O(n²) time complexity if the pivot
choices are poor (e.g., when the array is already sorted).
●​ Not Stable: Does not preserve the relative order of equal elements unless
modified.
●​ Recursive Overhead: Recursive nature can lead to stack overflow for large
arrays or deep recursion, especially in the worst case.
●​ Performance Depends on Pivot Selection: Choosing an optimal pivot is
crucial; poor choices can significantly impact performance.
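
One common mitigation for the worst case described above is to pick the pivot at random and swap it to the end before partitioning. A minimal sketch reusing the swap and partition functions from the program above (the name randomizedPartition is illustrative):

#include <stdlib.h>   // for rand()

// Pick a random pivot in [low, high], move it to arr[high], then partition as usual
int randomizedPartition(int arr[], int low, int high) {
    int r = low + rand() % (high - low + 1);
    swap(&arr[r], &arr[high]);
    return partition(arr, low, high);
}

void quick_sort_randomized(int arr[], int low, int high) {
    if (low < high) {
        int pi = randomizedPartition(arr, low, high);
        quick_sort_randomized(arr, low, pi - 1);
        quick_sort_randomized(arr, pi + 1, high);
    }
}

With a random pivot, no fixed input (such as an already-sorted array) can reliably trigger the O(n²) behavior, so the expected running time is O(n log n) for every input.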

2.1.4​ Insertion sort


Insertion Sort is a simple comparison-based sorting algorithm that builds the sorted
array one element at a time by inserting each element into its correct position.
Algorithm:
1.​ Assume the first element is already sorted.
2.​ Pick an element: Start from the second element because the first element is
already sorted.
3.​ Find the correct position: Compare the picked element with each element in
the sorted part of the array.
4.​ Shift elements: Move all elements larger than the picked element one
position to the right.
5.​ Insert the element: Place the picked element into the correct position.
6.​ Repeat: Continue this process until all elements are sorted.
Insertion Sort Example
Let's sort the array [4, 3, 2, 10, 12, 1, 5, 6] using the insertion sort algorithm.
Start with the first element:
[4, 3, 2, 10, 12, 1, 5, 6]
Pick 3, compare with 4, and insert before 4:
[3, 4, 2, 10, 12, 1, 5, 6]
Pick 2, compare with 4 and 3, insert before 3:
[2, 3, 4, 10, 12, 1, 5, 6]
Pick 10, compare with 4, stays in place:
[2, 3, 4, 10, 12, 1, 5, 6]
Pick 12, compare with 10, stays in place:
[2, 3, 4, 10, 12, 1, 5, 6]
Pick 1, compare with 12, 10, 4, 3, 2, insert before 2:
[1, 2, 3, 4, 10, 12, 5, 6]
Pick 5, compare with 12, 10, 4, insert after 4:
[1, 2, 3, 4, 5, 10, 12, 6]
Pick 6, compare with 12, 10, insert after 5:
[1, 2, 3, 4, 5, 6, 10, 12]
After these steps, the array is sorted: [1, 2, 3, 4, 5, 6, 10, 12].

C Program for insertion sort:


#include <stdio.h>

// Grow a sorted prefix one element at a time
void insertionSort(int arr[], int n) {
    int i, key, j;
    for (i = 1; i < n; i++) {
        key = arr[i];                       // element to insert into the sorted prefix
        j = i - 1;
        while (j >= 0 && arr[j] > key) {    // shift larger elements one slot right
            arr[j + 1] = arr[j];
            j = j - 1;
        }
        arr[j + 1] = key;                   // drop the key into its correct slot
    }
}

void printArray(int arr[], int n) {
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

int main() {
    int arr[] = {12, 11, 13, 5, 6};
    int n = sizeof(arr) / sizeof(arr[0]);
    insertionSort(arr, n);
    printf("Sorted array is: ");
    printArray(arr, n);
    return 0;
}
Output:
Sorted array is: 5 6 11 12 13
Insertion Sort Advantages
●​ Easy to understand and implement compared to other complex sorting
algorithms.
●​ Performs well for small arrays or lists due to low overhead.
●​ Efficient for data that is already substantially sorted. In the best case (when
the array is already sorted), the time complexity is O(n).
●​ Maintains the relative order of records with equal keys (i.e., it does not
change the order of equal elements).
●​ Requires only a constant amount (O(1)) of additional memory space, as it
sorts the array in place.
●​ Can sort a list as it receives it. This means it can sort data dynamically as
new data comes in.
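
The "online" property in the last point can be sketched as a single-element insert into an already-sorted buffer; the function name insertSorted and the capacity assumption are illustrative:

// Insert x into arr[0..n-1] (already sorted ascending); returns the new length.
// Assumes the caller guarantees capacity for at least n + 1 elements.
int insertSorted(int arr[], int n, int x) {
    int j = n - 1;
    while (j >= 0 && arr[j] > x) {  // shift larger elements right, exactly as
        arr[j + 1] = arr[j];        // in one pass of insertion sort
        j--;
    }
    arr[j + 1] = x;
    return n + 1;
}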
Insertion Sort Disadvantages
●​ The time complexity of O(n^2) makes it inefficient for large lists or arrays
compared to more advanced algorithms like Quick Sort, Merge Sort, or
Heap Sort.
●​ Requires more comparisons and shifts in the worst case (when the array is
sorted in reverse order), which can be time-consuming.
●​ For large datasets or when performance is critical, other sorting algorithms
are generally more suitable.
●​ Due to the frequent shifting of elements, it may not perform well with cache
memory in comparison to algorithms like Quick Sort.

2.1.5​ Shell sort


Shell sort is a fundamental sorting algorithm that plays a crucial role in computer
science. Known for its efficiency, the shell sort algorithm is an extension of
insertion sort that allows the exchange of far-apart elements, making it faster for
larger datasets.
Shell sort in data structure is a way to sort a list of numbers by comparing elements
that are far apart and then gradually reducing the gap between them. It’s an
improved version of insertion sort, which sorts numbers one by one but can be
slow if the numbers are far from their correct positions.
Algorithm:
●​ Start with a Big Gap: Instead of comparing numbers next to each other, Shell
sort starts by comparing numbers that are several positions apart. This gap
can be half the size of the list, or some other value.
●​ Compare and Swap: For each pair of numbers that are far apart, compare
them. If they are in the wrong order, swap them. This step helps move larger
numbers toward the end of the list and smaller numbers toward the
beginning, even if they are far apart.
●​ Reduce the Gap: After going through the list with the big gap, reduce the
gap and repeat the process. This time, the numbers being compared are
closer together.
●​ Repeat Until Gap is 1: Keep reducing the gap until it’s just 1, which means
you’re doing a final pass like a regular insertion sort. By this point, the list is
mostly sorted, so this last step is quick.
Example:
Let’s say you have a list of numbers: [22, 7, 9, 13, 16].
Step 1: Start with a big gap (let’s use a gap of 2)
Compare 22 and 9 (two positions apart): swap them since 22 is larger → [9, 7, 22, 13, 16].
Compare 7 and 13: no swap needed.
Compare 22 and 16: swap them → [9, 7, 16, 13, 22].
Step 2: Reduce the gap to 1 (now it’s like a simple insertion sort)
Compare 9 and 7: swap them → [7, 9, 16, 13, 22].
Compare 9 and 16: no swap needed.
Compare 16 and 13: swap them → [7, 9, 13, 16, 22].
Compare 16 and 22: no swap needed.
Now the list is fully sorted: [7, 9, 13, 16, 22].
Shell sort data structure is effective because it moves numbers closer to their
correct positions early on, which makes the final sorting step faster.

C program for shell sort:


#include <stdio.h>

// Gapped insertion sort: start with gap n/2 and halve it each pass
void shellSort(int arr[], int n) {
    for (int gap = n / 2; gap > 0; gap /= 2) {
        for (int i = gap; i < n; i++) {
            int temp = arr[i];
            int j;
            // shift earlier gap-sorted elements right until temp's slot is found
            for (j = i; j >= gap && arr[j - gap] > temp; j -= gap) {
                arr[j] = arr[j - gap];
            }
            arr[j] = temp;
        }
    }
}

// Example usage
int main() {
    int arr[] = {12, 34, 54, 2, 3};
    int n = sizeof(arr) / sizeof(arr[0]);
    shellSort(arr, n);
    printf("Sorted array is: ");
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
    return 0;
}
Output:
Sorted array is: 2 3 12 34 54

Advantages of Shell Sort


●​ In-Place Sorting: Requires only O(1) extra space, making it
memory-efficient.
●​ Improved Insertion Sort: Faster than traditional insertion sort, especially for
larger arrays, due to reduced comparisons and shifts.
●​ Versatile Performance: With the right gap sequence, Shell Sort can perform
well in various scenarios, including nearly sorted data.
●​ Simple Implementation: Easier to understand and implement compared to
more complex algorithms like Quick Sort or Merge Sort.
● No Recursion: Shell Sort is iterative, so it avoids the function-call and stack overhead of recursive algorithms.

Disadvantages of Shell Sort


● Complex Time Complexity: The exact time complexity is difficult to analyze, and it heavily depends on the choice of gap sequence (see the sketch after this list).
●​ Not Optimal: Generally slower than more advanced algorithms like Quick
Sort, Merge Sort, or Heap Sort for very large datasets.
● Non-Uniform Performance: Performance can vary significantly with different gap sequences, making it less predictable.
● Not Stable: Equal elements can be exchanged across large gaps, so their relative order is not preserved.
● Less Used in Practice: Due to the availability of more efficient algorithms, Shell Sort is less commonly used in modern applications.
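
The gap sequence mentioned above can be changed without touching the rest of the code. A minimal sketch using Knuth's sequence (1, 4, 13, 40, ..., generated by h = 3h + 1), one of the better-studied alternatives to simple halving:

// Shell sort with Knuth's gap sequence: 1, 4, 13, 40, ... (h = 3h + 1)
void shellSortKnuth(int arr[], int n) {
    int gap = 1;
    while (gap < n / 3) gap = 3 * gap + 1;  // largest sequence value below n/3
    for (; gap > 0; gap = (gap - 1) / 3) {  // walk the sequence back down to 1
        for (int i = gap; i < n; i++) {
            int temp = arr[i], j;
            for (j = i; j >= gap && arr[j - gap] > temp; j -= gap)
                arr[j] = arr[j - gap];
            arr[j] = temp;
        }
    }
}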
2.1.6​ Merge sort
Merge sort is a way to sort a list of items, like numbers or names, in order. Imagine
you have a big pile of mixed-up playing cards, and you want to sort them. You can
break the pile into smaller groups, sort each group, and then put the groups back
together in order.
Example
We will sort the array [38, 27, 43, 3, 9, 82, 10] using merge sort.
1. Divide the Array
The array is divided into two halves: [38, 27, 43, 3] and [9, 82, 10].
2. Divide Each Half
Continue dividing each half into smaller subarrays until each subarray has only one element:
[38, 27] [43, 3] and [9, 82] [10], and then [38] [27] [43] [3] [9] [82] [10].
3. Merge Each Pair of Subarrays
Now, start merging the subarrays back together in sorted order.
Merge [38] and [27] to get [27, 38].
Merge [43] and [3] to get [3, 43].
Merge [9] and [82] to get [9, 82].
[27, 38] [3, 43] [9, 82] [10]
4. Merge the Sorted Subarrays
Continue merging the sorted subarrays.
Merge [27, 38] and [3, 43] to get [3, 27, 38, 43].
Merge [9, 82] and [10] to get [9, 10, 82].
[3, 27, 38, 43] [9, 10, 82]
5. Merge the Final Two Halves
Finally, merge the last two halves to get the fully sorted array.
Merge [3, 27, 38, 43] and [9, 10, 82].
Start by comparing the first elements of each half: 3 (left) and 9 (right). 3 is
smaller, so add 3 to the new array.
Next, compare 27 (left) and 9 (right). 9 is smaller, so add 9 to the new array.
Then, compare 27 (left) and 10 (right). 10 is smaller, so add 10 to the new array.
Next, compare 27 (left) and 82 (right). 27 is smaller, so add 27 to the new array.
Continue comparing and adding the smallest elements until all elements are
merged.
[3, 9, 10, 27, 38, 43, 82]
6. Final Sorted Array
The array is now fully sorted: [3, 9, 10, 27, 38, 43, 82].

C program
#include <stdio.h>
#include <stdlib.h>

// Merge the two sorted runs arr[left..mid] and arr[mid+1..right]
void merge(int arr[], int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;
    int L[n1], R[n2];                       // temporary copies (C99 VLAs)
    for (int i = 0; i < n1; i++)
        L[i] = arr[left + i];
    for (int j = 0; j < n2; j++)
        R[j] = arr[mid + 1 + j];
    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {              // take the smaller head each time
        if (L[i] <= R[j]) {
            arr[k] = L[i];
            i++;
        } else {
            arr[k] = R[j];
            j++;
        }
        k++;
    }
    while (i < n1) {                        // copy any leftovers from L
        arr[k] = L[i];
        i++;
        k++;
    }
    while (j < n2) {                        // copy any leftovers from R
        arr[k] = R[j];
        j++;
        k++;
    }
}

void mergeSort(int arr[], int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        merge(arr, left, mid, right);
    }
}

void printArray(int A[], int size) {
    for (int i = 0; i < size; i++)
        printf("%d ", A[i]);
    printf("\n");
}

int main() {
    int arr[] = {12, 11, 13, 5, 6, 7};
    int arr_size = sizeof(arr) / sizeof(arr[0]);
    printf("Given array is \n");
    printArray(arr, arr_size);
    mergeSort(arr, 0, arr_size - 1);
    printf("\nSorted array is \n");
    printArray(arr, arr_size);
    return 0;
}
Output:
Given array is
12 11 13 5 6 7
Sorted array is
5 6 7 11 12 13
Advantages
●​ Consistent Time Complexity: O(n log n) time complexity in all cases (best,
average, worst).
●​ Stable Sorting: Maintains the relative order of equal elements.
●​ Efficient for Large Data Sets: Handles large arrays or lists efficiently.
●​ Parallelizable: Can be easily parallelized due to its divide-and-conquer
nature.
●​ Predictable Performance: Performance does not degrade based on input data
characteristics.
Disadvantages
●​ High Space Complexity: Requires O(n) additional space for merging.
●​ Complex Implementation: More complex to implement compared to simpler
algorithms like insertion sort or selection sort.
●​ Not In-Place: Uses extra space for temporary subarrays, which can be a
limitation for memory-constrained environments.
●​ Overhead for Small Arrays: For small arrays, the overhead of recursive calls
and merging can make it slower than simpler algorithms like insertion sort.
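
The small-array overhead in the last point is often addressed with a hybrid: below a small cutoff, sort the run with insertion sort instead of recursing further. A minimal sketch reusing merge from the program above (the cutoff value 16 is an illustrative choice to be tuned empirically):

#define CUTOFF 16   // illustrative threshold

// Insertion-sort the slice arr[left..right] in place
static void insertionSortRange(int arr[], int left, int right) {
    for (int i = left + 1; i <= right; i++) {
        int key = arr[i], j = i - 1;
        while (j >= left && arr[j] > key) { arr[j + 1] = arr[j]; j--; }
        arr[j + 1] = key;
    }
}

void mergeSortHybrid(int arr[], int left, int right) {
    if (right - left + 1 <= CUTOFF) {       // small run: skip recursion entirely
        insertionSortRange(arr, left, right);
        return;
    }
    int mid = left + (right - left) / 2;
    mergeSortHybrid(arr, left, mid);
    mergeSortHybrid(arr, mid + 1, right);
    merge(arr, left, mid, right);           // merge from the program above
}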
2.2​ Hashing
Hashing is a technique used in data structures that efficiently stores and retrieves
data in a way that allows for quick access. Hashing involves mapping data to a
specific index in a hash table (an array of items) using a hash function that enables
fast retrieval of information based on its key. The great thing about hashing is, we
can achieve all three operations (search, insert and delete) in O(1) time on average.
Hashing is mainly used to implement a set of distinct items and dictionaries (key
value pairs).
Components of Hashing
There are majorly three components of hashing:
1. Key: A key can be anything, such as a string or an integer, that is fed as input to the hash function, the technique that determines an index or location for storing an item in a data structure.
2. Hash Function: Receives the input key and returns the index of an element in an array called a hash table. The index is known as the hash index.
3. Hash Table: A hash table is typically an array of lists. It stores values corresponding to the keys. It stores the data in an associative manner in an array, where each data value has its own unique index.
How does Hashing work?
Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store it
in a table.
Step 1: We know that hash functions (which is some mathematical formula) are
used to calculate the hash value which acts as the index of the data structure where
the value will be stored.
Step 2: So, let’s assign
o​ “a” = 1,
o​ “b”=2, .. etc, to all alphabetical characters.
Step 3: Therefore, the numerical value by summation of all characters of the
string:
●​ “ab” = 1 + 2 = 3,
●​ “cd” = 3 + 4 = 7 ,
●​ “efg” = 5 + 6 + 7 = 18
Step 4: Now, assume that we have a table of size 7 to store these strings. The hash function used here is the sum of the characters in the key mod the table size. We can compute the location of the string in the array by taking sum(string) mod 7.
Step 5: So we will then store
o​ “ab” in 3 mod 7 = 3,
o​ “cd” in 7 mod 7 = 0, and
o​ “efg” in 18 mod 7 = 4.

The above technique enables us to calculate the location of a given string by using
a simple hash function and rapidly find the value that is stored in that location.
Therefore, the idea of hashing seems like a great way to store (key, value) pairs of
the data in a table.
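
A minimal C sketch of the toy hash just described, mapping "a" = 1, "b" = 2, ... and taking the sum mod the table size (the function name toyHash is illustrative):

#include <stdio.h>

#define TABLE_SIZE 7

// Sum of letter values ('a' = 1, 'b' = 2, ...) mod the table size
int toyHash(const char *s) {
    int sum = 0;
    for (; *s; s++)
        sum += *s - 'a' + 1;
    return sum % TABLE_SIZE;
}

int main(void) {
    printf("%d %d %d\n", toyHash("ab"), toyHash("cd"), toyHash("efg"));
    // prints: 3 0 4, matching the worked example above
    return 0;
}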

Hash Functions
Hashing refers to the process of generating a fixed-size output from an input of
variable size using the mathematical formulas known as hash functions. This
technique determines an index or location for the storage of an item in a data
structure. We use hashing for dictionaries, frequency counting, maintaining data for
quick access by key, etc. Real World Applications include Database Indexing,
Cryptography, Caches, Symbol Table and Dictionaries.
A hash function creates a mapping from an input key to an index in the hash table.
For example: Consider phone numbers as keys and a hash table of size 100. A
simple example hash function can be to consider the last two digits of phone
numbers so that we have valid array indexes as output.

Types of Hash Functions:


●​ Division method: The division method involves dividing the key by a
prime number and using the remainder as the hash value.
h(k)=k (mod m)
k(mod m) denotes the remainder when k is divided by m.
● Mid square method: In the mid-square method, the key is squared, and the middle digits of the result are taken as the hash value.
Steps: Square the key.
Extract the middle digits of the squared value.
H(k) = l, where l is obtained by deleting digits from both ends of k².
● Folding method: The key k is partitioned into a number of parts k1, k2, ..., kr, where each part, except possibly the last, has the same number of digits as the required address. Then the parts are added together, ignoring the final carry: H(k) = k1 + k2 + ... + kr.

● Subtraction method: In some cases, keys are consecutive but do not start from 1. In such cases, we subtract a fixed number from the key to determine the address.
● Digit extraction method: In this method, selected digits are extracted from the key k and used as its address. For example, to hash a 6-digit employee number, say 123456, to a three-digit address, we could select the first, third, and fourth digits from the left and use them as the address.
● Rotation hashing method: This method is not generally used by itself but in combination with other hashing methods. It is especially useful when keys are assigned serially, as with employee numbers: simple hashing of keys that are identical except for the last character tends to produce synonyms, and rotating the key before hashing spreads such keys apart.
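
Minimal C sketches of the first three methods above, under small illustrative parameters (the table size, digit positions, and part sizes are all assumptions chosen for the example):

// Division method: h(k) = k mod m (m is ideally prime)
int divisionHash(int k, int m) {
    return k % m;
}

// Mid-square method: square the key and keep middle digits
// (here: two middle digits of the square; positions are illustrative)
int midSquareHash(int k) {
    long sq = (long)k * k;
    return (int)((sq / 100) % 100);   // drop 2 low digits, keep the next 2
}

// Folding method: split a 6-digit key into three 2-digit parts and add them,
// ignoring any carry out of the 2-digit address range
int foldingHash(int k) {
    int p1 = k / 10000;         // leading 2 digits
    int p2 = (k / 100) % 100;   // middle 2 digits
    int p3 = k % 100;           // trailing 2 digits
    return (p1 + p2 + p3) % 100;
}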

Hash table:
The data structure used for storing records is called a hash table. It enables us to search for a record rapidly using a given key value, and it also facilitates easy insertion and deletion of records.

Building a hash table:


Suppose there are 18 records, each with a 3-digit numeric key value. The records and their key values are given in some order that cannot be predicted easily. The requirement is to build a table in the form of an array containing all 18 key values, so that searching can be performed efficiently.
To achieve this, we need to build a hash table by implementing the following:
●​ A hashing function
●​ Appropriate collision resolution policy
A hashing function is nothing but a method of calculating the table address pertaining to a key. This function is useful only in cases where the address can be calculated quickly. Further, an adequate collision resolution policy is also implemented along with the hashing function to resolve situations where duplicate table addresses are generated.
The hash function used here assigns to any key the remainder after division of the key by the table size: if the table size is m and the key value is k, then the address is k mod m.

Collision resolution:
Suppose we want to add a new record R with key k to our file F, but suppose the memory location with hash address H(k) is already occupied. This situation is called a collision. There are two general ways of resolving collisions. The particular procedure one chooses depends on many factors. One important factor is the ratio of the number n of keys in K (which is the number of records in F) to the number m of hash addresses in L. This ratio, λ = n/m, is called the load factor.
First, we show that collisions are almost impossible to avoid. Specifically, suppose
a student class has 24 students and suppose the table has space for 365 records.
One random hash function is to choose the student's birthday as the hash address.
Although the load factor λ = 24/365 = 7% is very small, it can be shown that there
is a better than fifty-fifty chance that two of the students have the same birthday.
The efficiency of a hash function with a collision resolution procedure is measured
by the average number of probes (key comparisons) needed to find the location of
the record with a given key k.
Following two quantities:
S(λ) = average number of probes for a successful search
U(λ) = average number of probes for an unsuccessful search
These quantities will be discussed for our collision procedures.

Collision Resolution Techniques:


There are two broad ways of collision resolution:
(i) Open Addressing, which uses an array-based implementation.
(ii) Separate Chaining, which uses an array of linked lists.

Open Addressing
Open Addressing includes:
Linear probing (linear search)
Quadratic probing (nonlinear search), and
Double hashing (uses two hash functions)

Linear probing (linear search)


Suppose that a new record R with key k is to be added to the memory table T, but that the memory location with hash address H(k) = h is already filled. A natural way to resolve the collision is to assign R to the first available location following T[h]. (We assume that the table T with m locations is circular, so that T[1] comes after T[m].) Accordingly, with such a collision procedure, we will search for the record R in the table T by linearly searching the locations T[h], T[h + 1], T[h + 2], ... until finding R or meeting an empty location, which indicates an unsuccessful search.
The above collision resolution is called linear probing. The average numbers of probes for a successful search and for an unsuccessful search are known to be the following respective quantities:

S(λ) = (1/2) (1 + 1/(1 − λ))   and   U(λ) = (1/2) (1 + 1/(1 − λ)²)
(Here λ = n/m is the load factor.)
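For example, at a load factor of λ = 0.5 these formulas give S(0.5) = (1/2)(1 + 1/0.5) = 1.5 probes for a successful search and U(0.5) = (1/2)(1 + 1/0.25) = 2.5 probes for an unsuccessful one. At λ = 0.9, U(0.9) = (1/2)(1 + 1/0.01) = 50.5 probes, which shows how quickly performance degrades as the table fills.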

One main disadvantage of linear probing is that records tend to cluster, that is, appear next to one another, when the load factor is greater than 50 percent. Such clustering substantially increases the average search time for a record. Two techniques to minimize clustering are as follows:
1. Quadratic probing: Suppose a record R with key k has the hash address H(k) = h. Then, instead of searching the locations with addresses h, h + 1, h + 2, ..., we search the locations with addresses
h, h + 1, h + 4, h + 9, h + 16, ..., h + i², ...
2. Double hashing: Here a second hash function H' is used for resolving a collision. Suppose a record R with key k has the hash addresses H(k) = h and H'(k) = h', where h' is nonzero. Then we linearly search the locations with addresses
h, h + h', h + 2h', h + 3h', ...
If m is a prime number, then the above sequence will access all the locations in the table.

Write a C program to implement Open Addressing (Linear Probing, Quadratic Probing, and Double Hashing).

#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 10
#define EMPTY -1

int linearTable[TABLE_SIZE];
int quadraticTable[TABLE_SIZE];
int doubleHashTable[TABLE_SIZE];

// Initialize all tables
void initializeTables() {
    for (int i = 0; i < TABLE_SIZE; i++) {
        linearTable[i] = quadraticTable[i] = doubleHashTable[i] = EMPTY;
    }
}

// Hash Functions
int hash1(int key) {
    return key % TABLE_SIZE;
}

int hash2(int key) {
    return 7 - (key % 7); // must be non-zero
}

// Insert with Linear Probing
void insertLinear(int key) {
    int index = hash1(key);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int probeIndex = (index + i) % TABLE_SIZE;
        if (linearTable[probeIndex] == EMPTY) {
            linearTable[probeIndex] = key;
            printf("Inserted at index %d using Linear Probing.\n", probeIndex);
            return;
        }
    }
    printf("Hash table is full (Linear Probing).\n");
}

// Insert with Quadratic Probing
void insertQuadratic(int key) {
    int index = hash1(key);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int probeIndex = (index + i * i) % TABLE_SIZE;
        if (quadraticTable[probeIndex] == EMPTY) {
            quadraticTable[probeIndex] = key;
            printf("Inserted at index %d using Quadratic Probing.\n", probeIndex);
            return;
        }
    }
    printf("Hash table is full (Quadratic Probing).\n");
}

// Insert with Double Hashing
void insertDoubleHash(int key) {
    int index1 = hash1(key);
    int index2 = hash2(key);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int probeIndex = (index1 + i * index2) % TABLE_SIZE;
        if (doubleHashTable[probeIndex] == EMPTY) {
            doubleHashTable[probeIndex] = key;
            printf("Inserted at index %d using Double Hashing.\n", probeIndex);
            return;
        }
    }
    printf("Hash table is full (Double Hashing).\n");
}

// Display function
void displayTable(int table[]) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (table[i] != EMPTY)
            printf("Index %d: %d\n", i, table[i]);
        else
            printf("Index %d: EMPTY\n", i);
    }
}

// Main function
int main() {
    int choice, key;
    initializeTables();
    while (1) {
        printf("\nOpen Addressing Menu:\n");
        printf("1. Insert (Linear Probing)\n");
        printf("2. Insert (Quadratic Probing)\n");
        printf("3. Insert (Double Hashing)\n");
        printf("4. Display Linear Table\n");
        printf("5. Display Quadratic Table\n");
        printf("6. Display Double Hash Table\n");
        printf("7. Exit\n");
        printf("Enter your choice: ");
        scanf("%d", &choice);

        switch (choice) {
            case 1:
                printf("Enter key to insert: ");
                scanf("%d", &key);
                insertLinear(key);
                break;
            case 2:
                printf("Enter key to insert: ");
                scanf("%d", &key);
                insertQuadratic(key);
                break;
            case 3:
                printf("Enter key to insert: ");
                scanf("%d", &key);
                insertDoubleHash(key);
                break;
            case 4:
                printf("\nLinear Probing Table:\n");
                displayTable(linearTable);
                break;
            case 5:
                printf("\nQuadratic Probing Table:\n");
                displayTable(quadraticTable);
                break;
            case 6:
                printf("\nDouble Hashing Table:\n");
                displayTable(doubleHashTable);
                break;
            case 7:
                exit(0);
            default:
                printf("Invalid choice. Try again.\n");
        }
    }
    return 0;
}
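
The menu program above only inserts and displays; searching under open addressing follows the same probe sequence and stops at the first EMPTY slot. A minimal sketch for the linear-probing table (the name searchLinear is illustrative, and it assumes no deletions have occurred, since deletions would require tombstone markers):

// Returns the index of key in linearTable, or -1 if absent.
// Follows the same probe sequence as insertLinear; hitting an EMPTY slot
// proves the key was never inserted (valid only when nothing was deleted).
int searchLinear(int key) {
    int index = hash1(key);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int probeIndex = (index + i) % TABLE_SIZE;
        if (linearTable[probeIndex] == key)
            return probeIndex;
        if (linearTable[probeIndex] == EMPTY)
            return -1;   // hole in the probe chain: unsuccessful search
    }
    return -1;           // table full and key not found
}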

Separate chaining: Chaining involves maintaining two tables in memory. First, there is a table T in memory which contains the records in F, except that T now has an additional field LINK, which is used so that all records in T with the same hash address h may be linked together to form a linked list. Second, there is a hash address table LIST which contains pointers to the linked lists in T.
Suppose a new record R with key k is added to the file F. We place R in the first
available location in the table T and then add R to the linked list with pointer
LIST[H(k)]. If the linked lists of records are not sorted, then R is simply inserted at
the beginning of its linked list. Searching for a record or deleting a record is
nothing more than searching for a node or deleting a node from a linked list.
The average numbers of probes, using chaining, for a successful search and for an unsuccessful search are known to be the following approximate values:
S(λ) ≈ 1 + λ/2   and   U(λ) ≈ e^(−λ) + λ
Here the load factor λ = n/m may be greater than 1, since the number m of hash addresses in L (not the number of locations in T) may be less than the number n of records in F.
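For instance, at λ = 1 (as many records as chains) these give S ≈ 1 + 1/2 = 1.5 probes and U ≈ e^(−1) + 1 ≈ 1.37 probes, noticeably better than linear probing at high load factors.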

C program to implement Separate Chaining using an array of linked lists.
#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 10

// Node structure for linked list
struct Node {
    int data;
    struct Node* next;
};

// Hash table as an array of linked list pointers
struct Node* hashTable[TABLE_SIZE];

// Hash function
int hashFunction(int key) {
    return key % TABLE_SIZE;
}

// Insert a key into the hash table (at the head of its chain)
void insert(int key) {
    int index = hashFunction(key);
    struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
    newNode->data = key;
    newNode->next = hashTable[index];
    hashTable[index] = newNode;
}

// Search a key in the hash table
int search(int key) {
    int index = hashFunction(key);
    struct Node* temp = hashTable[index];
    while (temp != NULL) {
        if (temp->data == key)
            return 1;
        temp = temp->next;
    }
    return 0;
}

// Delete a key from the hash table
void delete(int key) {
    int index = hashFunction(key);
    struct Node *temp = hashTable[index], *prev = NULL;
    while (temp != NULL && temp->data != key) {
        prev = temp;
        temp = temp->next;
    }
    if (temp == NULL) {
        printf("Key %d not found.\n", key);
        return;
    }
    if (prev == NULL) {
        hashTable[index] = temp->next;   // key was at the head of the chain
    } else {
        prev->next = temp->next;
    }
    free(temp);
    printf("Key %d deleted.\n", key);
}

// Display the hash table
void display() {
    for (int i = 0; i < TABLE_SIZE; i++) {
        printf("Index %d: ", i);
        struct Node* temp = hashTable[i];
        while (temp != NULL) {
            printf("%d -> ", temp->data);
            temp = temp->next;
        }
        printf("NULL\n");
    }
}

// Main function
int main() {
    int choice, key;
    while (1) {
        printf("\nHash Table Operations:\n");
        printf("1. Insert\n2. Search\n3. Delete\n4. Display\n5. Exit\n");
        printf("Enter your choice: ");
        scanf("%d", &choice);

        switch (choice) {
            case 1:
                printf("Enter key to insert: ");
                scanf("%d", &key);
                insert(key);
                break;
            case 2:
                printf("Enter key to search: ");
                scanf("%d", &key);
                if (search(key))
                    printf("Key %d found.\n", key);
                else
                    printf("Key %d not found.\n", key);
                break;
            case 3:
                printf("Enter key to delete: ");
                scanf("%d", &key);
                delete(key);
                break;
            case 4:
                display();
                break;
            case 5:
                printf("Exiting...\n");
                return 0;
            default:
                printf("Invalid choice.\n");
        }
    }
    return 0;
}
Rehashing
Rehashing is the process of increasing the size of a hash map and redistributing the
elements to new buckets based on their new hash values. It is done to improve the
performance of the hash map and to prevent collisions caused by a high load factor.
When a hash map becomes full, the load factor (i.e., the ratio of the number of
elements to the number of buckets) increases. As the load factor increases, the
number of collisions also increases, which can lead to poor performance. To avoid
this, the hash map can be resized and the elements can be rehashed to new buckets,
which decreases the load factor and reduces the number of collisions.
During rehashing, all elements of the hash map are iterated and their new bucket
positions are calculated using the new hash function that corresponds to the new
size of the hash map. This process can be time-consuming but it is necessary to
maintain the efficiency of the hash map.
Rehashing can be done as follows:
● For each addition of a new entry to the map, check the load factor.
● If it is greater than its pre-defined threshold (or a default value of 0.75 if none is given), then rehash.
● To rehash, make a new array of double the previous size and make it the new bucket array.
● Then traverse each element in the old bucket array and call insert() for each, so as to insert it into the new, larger bucket array.

C program for Rehashing.
#include <stdio.h>
#include <stdlib.h>

#define INITIAL_SIZE 5
#define LOAD_FACTOR 0.7
#define EMPTY -1

// Function to check if a number is prime
int isPrime(int n) {
    if (n <= 1) return 0;
    for (int i = 2; i * i <= n; i++)
        if (n % i == 0) return 0;
    return 1;
}

// Function to find the next prime number greater than n
int nextPrime(int n) {
    while (!isPrime(++n));
    return n;
}

// Hash function
int hash(int key, int size) {
    return key % size;
}

// Insert function (defined below)
void insert(int **table, int *size, int *count, int key);

// Rehash: allocate a larger prime-sized table and re-insert every key
void rehash(int **table, int *size, int *count) {
    int oldSize = *size;
    int newSize = nextPrime(oldSize * 2);
    int *newTable = (int *)malloc(newSize * sizeof(int));

    for (int i = 0; i < newSize; i++)
        newTable[i] = EMPTY;

    for (int i = 0; i < oldSize; i++) {
        if ((*table)[i] != EMPTY) {
            int key = (*table)[i];
            int index = hash(key, newSize);
            while (newTable[index] != EMPTY)    // linear probing in the new table
                index = (index + 1) % newSize;
            newTable[index] = key;
        }
    }

    free(*table);
    *table = newTable;
    *size = newSize;
    printf("Rehashed! New table size: %d\n", newSize);
}

// Insert with linear probing and rehashing
void insert(int **table, int *size, int *count, int key) {
    float lf = (float)(*count + 1) / *size;     // load factor after this insert
    if (lf > LOAD_FACTOR) {
        rehash(table, size, count);
    }

    int index = hash(key, *size);
    while ((*table)[index] != EMPTY)
        index = (index + 1) % *size;

    (*table)[index] = key;
    (*count)++;
    printf("Inserted %d at index %d\n", key, index);
}

// Display table
void display(int *table, int size) {
    printf("\nHash Table:\n");
    for (int i = 0; i < size; i++) {
        if (table[i] != EMPTY)
            printf("Index %d: %d\n", i, table[i]);
        else
            printf("Index %d: EMPTY\n", i);
    }
}

int main() {
    int *hashTable;
    int size = INITIAL_SIZE;
    int count = 0;
    int choice, key;

    hashTable = (int *)malloc(size * sizeof(int));
    for (int i = 0; i < size; i++)
        hashTable[i] = EMPTY;

    while (1) {
        printf("\n1. Insert\n2. Display\n3. Exit\nEnter choice: ");
        scanf("%d", &choice);

        switch (choice) {
            case 1:
                printf("Enter key to insert: ");
                scanf("%d", &key);
                insert(&hashTable, &size, &count, key);
                break;
            case 2:
                display(hashTable, size);
                break;
            case 3:
                free(hashTable);
                exit(0);
            default:
                printf("Invalid choice.\n");
        }
    }
    return 0;
}

Extendible hashing:
Extendible Hashing is a dynamic hashing method wherein directories and buckets are used to hash data. It is an aggressively flexible method in which the hash function also experiences dynamic changes.
Main features of Extendible Hashing:
The main features in this hashing technique are:
• Directories: The directories store addresses of the buckets in pointers. An id is
assigned to each directory which may change each time when directory Expansion
takes place.
• Buckets: The buckets are used to hash the actual data.

Key terms used in Extendible hashing:


●​ Directories: These containers store pointers to buckets. Each directory is
given a unique id which may change each time when expansion takes place.
The hash function returns this directory id which is used to navigate to the
appropriate bucket. Number of Directories = 2^Global Depth.
● Buckets: They store the hashed keys. Directories point to buckets. A bucket may have more than one pointer to it if its local depth is less than the global depth.
●​ Global Depth: It is associated with the Directories. They denote the number
of bits which are used by the hash function to categorize the keys. Global
Depth = Number of bits in directory id.
●​ Local Depth: It is the same as that of Global Depth except for the fact that
Local Depth is associated with the buckets and not the directories. Local
depth in accordance with the global depth is used to decide the action that to
be performed in case an overflow occurs. Local Depth is always less than or
equal to the Global Depth.
●​ Bucket Splitting: When the number of elements in a bucket exceeds a
particular size, then the bucket is split into two parts.
●​ Directory Expansion: Directory Expansion Takes place when a bucket
overflows. Directory Expansion is performed when the local depth of the
overflowing bucket is equal to the global depth.

Basic working of Extendible hashing:


Step 1 – Analyze Data Elements: Data elements may exist in various forms, e.g., integer, string, float, etc. Here, let us consider data elements of type integer, e.g., 49.
Step 2 – Convert into binary format: Convert the data element in Binary form. For
string elements, consider the ASCII equivalent integer of the starting character and
then convert the integer into binary form. Since we have 49 as our data element, its
binary form is 110001.
Step 3 – Check Global Depth of the directory. Suppose the global depth of the
Hash-directory is 3.
Step 4 – Identify the Directory: Consider the ‘Global-Depth’ number of LSBs in
the binary number and match it to the directory id. Eg. The binary obtained is:
110001 and the global-depth is 3. So, the hash function will return 3 LSBs of
110001 viz. 001.
Step 5 – Navigation: Now, navigate to the bucket pointed by the directory with
directory-id 001.
Step 6 – Insertion and Overflow Check: Insert the element and check if the bucket
overflows. If an overflow is encountered, go to step 7 followed by Step 8,
otherwise, go to step 9.
Step 7 – Tackling Over Flow Condition during Data Insertion: Many times, while
inserting data in the buckets, it might happen that the Bucket overflows. In such
cases, we need to follow an appropriate procedure to avoid mishandling of data.
First, Check if the local depth is less than or equal to the global depth. Then choose
one of the cases below.
Case1: If the local depth of the overflowing Bucket is equal to the global
depth, then Directory Expansion, as well as Bucket Split, needs to be
performed. Then increment the global depth and the local depth value by 1.
And, assign appropriate pointers. Directory expansion will double the
number of directories present in the hash structure.
Case2: In case the local depth is less than the global depth, then only Bucket
Split takes place. Then increment only the local depth value by 1. And,
assign appropriate pointers.
Step 8 – Rehashing of Split Bucket Elements: The Elements present in the
overflowing bucket that is split are rehashed w.r.t the new global depth of the
directory.
Step 9 – The element is successfully hashed.
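
A minimal C sketch of Steps 2-5 above, extracting the "Global Depth" least significant bits of the key's binary form to obtain the directory id (the function name directoryIndex is illustrative; a full implementation would also manage buckets, bucket splitting, and directory expansion):

#include <stdio.h>

// Directory id = the globalDepth least significant bits of the key
int directoryIndex(int key, int globalDepth) {
    return key & ((1 << globalDepth) - 1);  // mask off all but the low bits
}

int main(void) {
    // 49 is 110001 in binary; with global depth 3 its 3 LSBs are 001 = 1,
    // matching Step 4 of the walkthrough above
    printf("%d\n", directoryIndex(49, 3));  // prints 1
    return 0;
}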

Advantages:
1.​ Data retrieval is less expensive (in terms of computing).
2.​ No problem of Data-loss since the storage capacity increases dynamically.
3.​ With dynamic changes in hashing function, associated old values are
rehashed w.r.t the new hash function.

Limitations Of Extendible Hashing:


● The directory size may increase significantly if several records hash to the same directory entry when the record distribution is non-uniform.
●​ Size of every bucket is fixed.
●​ Memory is wasted in pointers when the global depth and local depth
difference becomes drastic.
●​ This method is complicated to code.
Question bank:
1.​ Explain the working of Bubble Sort with an example.
2.​ Write a C program to implement Selection Sort.
3.​ Explain the steps involved in Insertion Sort with an example.
4.​ What is Shell Sort? How is it different from Insertion Sort?
5.​ Write the algorithm for Quick Sort and mention its time complexity.
6.​ Explain the Merge Sort algorithm and mention its best, worst, and
average-case time complexity.
7.​ What is Hashing? Explain its advantages.
8.​ What are Hash Functions? Give two examples.
9.​ Explain the concept of Separate Chaining in Hashing.
10.​What is Open Addressing in Hashing? Mention its different types.
11.​Write a C program to implement Bubble Sort and analyze its time
complexity.
12.​Implement Selection Sort in C and explain how it works with an example.
13.​Explain Insertion Sort and analyze its performance in best, worst, and
average cases.
14.​Write a C program to implement Shell Sort and explain its working with an
example.
15.​Explain the Quick Sort algorithm and write a C program to implement it.
16.​Write a C program to implement Merge Sort and explain its working with an
example.
17.​What is Rehashing? Explain when and why it is required.
18.​Compare Separate Chaining and Open Addressing in hashing.
19.​Explain Extendible Hashing with a suitable example.
20.​What are collisions in hashing? Explain different techniques to handle
collisions.
21.​Compare Bubble Sort, Selection Sort, Insertion Sort, Shell Sort, Quick Sort,
and Merge Sort based on time complexity and space complexity.
22.​Implement Quick Sort in C and explain how partitioning works with an
example.
23.​Implement Merge Sort in C and explain its divide-and-conquer approach.
24.​Write a C program to perform insertion, deletion, and searching using
Hashing (Separate Chaining).
25.​Implement Open Addressing in C using Linear Probing and explain how
collisions are resolved.
26.​Write a C program to implement Double Hashing and explain its
significance.
27.​Explain Extendible Hashing and its advantages over other hashing
techniques.
28.​Write a C program to perform Rehashing and explain when it is required.
29.​Implement a Hash Table using Chaining and perform insertion, deletion, and
search operations.
30.​Explain the importance of Hashing in real-world applications and discuss
different hashing techniques.

C-Programs:-
●​ Write a C program to implement Bubble Sort and display the sorted array.
●​ Implement Selection Sort in C and explain how it works with an example.
●​ Write a C program to perform Insertion Sort on an array of numbers.
●​ Implement Shell Sort in C and display the sorted array.
●​ Write a C program to perform Quick Sort using recursion.
●​ Implement Merge Sort in C and explain how it divides and merges the array.
●​ Write a C program to implement a simple hash function and demonstrate its
working.
●​ Implement Linear Probing (Open Addressing) in hashing and demonstrate
collision handling.
●​ Write a C program to implement Separate Chaining using an array of linked
lists.
●​ Implement Rehashing in C and demonstrate how it works when the load
factor increases.
●​ Implement Bubble Sort in C and analyze its best, worst, and average case
time complexities.
● Write a C program to implement Selection Sort and count the number of swaps performed.
●​ Implement Insertion Sort in C and display the number of comparisons made.
●​ Write a C program to implement Shell Sort and compare its performance
with Insertion Sort.
●​ Implement Quick Sort in C and demonstrate how partitioning works step by
step.
●​ Write a C program to implement Merge Sort and display the sorting process.
●​ Implement a hash table using Separate Chaining and allow insertion,
deletion, and searching of elements.
●​ Write a C program to implement Open Addressing (Linear Probing,
Quadratic Probing, and Double Hashing).
●​ Implement Rehashing in C and demonstrate how it improves performance
when the hash table is full.
●​ Write a C program to implement Extendible Hashing and demonstrate
dynamic resizing.
●​ Implement all six sorting algorithms (Bubble Sort, Selection Sort, Insertion
Sort, Shell Sort, Quick Sort, and Merge Sort) and compare their execution
times for different input sizes.
● Write a C program to implement Quick Sort using recursion and non-recursion (stack-based approach).
●​ Implement Merge Sort in C and optimize it for better space complexity.
●​ Write a C program to implement a Hash Table using Separate Chaining and
perform insertion, deletion, and searching operations.
●​ Implement Linear Probing, Quadratic Probing, and Double Hashing in Open
Addressing and compare their performance.
●​ Write a C program to perform Rehashing dynamically when the load factor
reaches a threshold.
●​ Implement Extendible Hashing in C and allow dynamic expansion and
contraction of the hash table.
●​ Write a C program to implement a dictionary using hashing, allowing users
to insert, delete, and search words.
●​ Implement a student database system using hashing, where student records
are stored based on their roll number using Separate Chaining.
●​ Write a C program to read a large dataset of numbers, hash them using
Extendible Hashing, and search for specific values efficiently.
