ADS Module2.Docx
2.1 Sorting
Sorting means arranging data in a specific order, typically ascending or
descending. It is a fundamental operation that makes searching more efficient and
keeps data organized, and many problems become simpler once the input is sorted.
Algorithm (Bubble Sort)
1 Start from the first element and compare it with the next.
2 If the first element is greater than the second, swap them.
3 Move to the next element and repeat step 2.
4 Repeat the process for all elements in the list.
5 The largest element will settle at the last position in one complete pass.
6 Repeat the process for the remaining unsorted elements until the entire list is
sorted.
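#include <stdio.h>

/* bubbleSort() and printArray() are assumed to be defined elsewhere
   in the program; only the driver is shown here. */
void bubbleSort(int arr[], int n);
void printArray(int arr[], int n);

int main() {
    int arr[] = {4, 2, 7, 1};
    int n = sizeof(arr)/sizeof(arr[0]);
    bubbleSort(arr, n);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}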
Output:
Sorted array:
1 2 4 7
Disadvantages
Inefficient: Poor performance on large lists, with an average and worst-case time
complexity of O(n²).
High Number of Comparisons: Even if the list is partially sorted, the basic Bubble
Sort still makes unnecessary comparisons.
Not Suitable for Large Datasets: Due to its quadratic time complexity, it is not
recommended for large datasets.
Algorithm (Selection Sort)
1 Start from the first element.
2 Find the smallest element in the unsorted part of the array.
3 Swap it with the first unsorted element.
4 Move to the next position and repeat until the array is fully sorted.
Example
Let's sort the array [64, 25, 12, 22, 11] using selection sort:
1. Start with the first element 64 and find the smallest element in the array. The
smallest is 11, so swap 11 with 64.
[11, 25, 12, 22, 64]
2. Move to the next element 25 and find the smallest element in the remaining
part. The smallest is 12, so swap 12 with 25.
[11, 12, 25, 22, 64]
3. Move to the next element 25 and find the smallest element in the remaining
part. The smallest is 22, so swap 22 with 25.
[11, 12, 22, 25, 64]
4. The next element 25 is already in the correct position, and so is the last
element 64.
5. The array is now sorted: [11, 12, 22, 25, 64].
C program (Merge Sort)
#include <stdio.h>
#include <stdlib.h>

void merge(int arr[], int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;
    int L[n1], R[n2];
    for (int i = 0; i < n1; i++)
        L[i] = arr[left + i];
    for (int j = 0; j < n2; j++)
        R[j] = arr[mid + 1 + j];
    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k] = L[i];
            i++;
        } else {
            arr[k] = R[j];
            j++;
        }
        k++;
    }
    while (i < n1) {
        arr[k] = L[i];
        i++;
        k++;
    }
    while (j < n2) {
        arr[k] = R[j];
        j++;
        k++;
    }
}

void mergeSort(int arr[], int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        merge(arr, left, mid, right);
    }
}

void printArray(int A[], int size) {
    for (int i = 0; i < size; i++)
        printf("%d ", A[i]);
    printf("\n");
}

int main() {
    int arr[] = {12, 11, 13, 5, 6, 7};
    int arr_size = sizeof(arr) / sizeof(arr[0]);
    printf("Given array is \n");
    printArray(arr, arr_size);
    mergeSort(arr, 0, arr_size - 1);
    printf("\nSorted array is \n");
    printArray(arr, arr_size);
    return 0;
}
Output:
Given array is
12 11 13 5 6 7
Sorted array is
5 6 7 11 12 13
Advantages
● Consistent Time Complexity: O(n log n) time complexity in all cases (best,
average, worst).
● Stable Sorting: Maintains the relative order of equal elements.
● Efficient for Large Data Sets: Handles large arrays or lists efficiently.
● Parallelizable: Can be easily parallelized due to its divide-and-conquer
nature.
● Predictable Performance: Performance does not degrade based on input data
characteristics.
Disadvantages
● High Space Complexity: Requires O(n) additional space for merging.
● Complex Implementation: More complex to implement compared to simpler
algorithms like insertion sort or selection sort.
● Not In-Place: Uses extra space for temporary subarrays, which can be a
limitation for memory-constrained environments.
● Overhead for Small Arrays: For small arrays, the overhead of recursive calls
and merging can make it slower than simpler algorithms like insertion sort.
2.2 Hashing
Hashing is a technique for storing and retrieving data so that it can be accessed
quickly. It maps each key to an index in a hash table (an array of items) using a
hash function, which allows fast retrieval of information based on its key. With
hashing, all three basic operations (search, insert and delete) take O(1) time on
average. Hashing is mainly used to implement sets of distinct items and dictionaries
(key-value pairs).
Components of Hashing
Hashing has three main components:
1. Key: A key can be any value, such as a string or an integer, that is fed as
input to the hash function, which determines an index or location for storing
the item in a data structure.
2. Hash Function: Receives the input key and returns the index of an element
in an array called a hash table. The index is known as the hash index.
3. Hash Table: A hash table is typically an array, often an array of lists. It
stores the values corresponding to the keys in an associative manner, with each
value placed at the index computed from its key.
How does Hashing work?
Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store it
in a table.
Step 1: A hash function (some mathematical formula) is used to calculate the hash
value, which acts as the index in the data structure where the value will be
stored.
Step 2: Assign
o “a” = 1,
o “b” = 2, and so on, to all alphabetical characters.
Step 3: The numerical value of each string is the sum of the values of its
characters:
● “ab” = 1 + 2 = 3,
● “cd” = 3 + 4 = 7,
● “efg” = 5 + 6 + 7 = 18
Step 4: Now, assume that we have a table of size 7 to store these strings. The hash
function used here is the sum of the characters in the key mod the table size, so
the location of a string in the array is sum(string) mod 7.
Step 5: So we will then store
o “ab” in 3 mod 7 = 3,
o “cd” in 7 mod 7 = 0, and
o “efg” in 18 mod 7 = 4.
The above technique enables us to calculate the location of a given string by using
a simple hash function and rapidly find the value that is stored in that location.
Therefore, the idea of hashing seems like a great way to store (key, value) pairs of
the data in a table.
Hash Functions
Hashing refers to the process of generating a fixed-size output from an input of
variable size using mathematical formulas known as hash functions. This
technique determines an index or location for the storage of an item in a data
structure. We use hashing for dictionaries, frequency counting, maintaining data for
quick access by key, etc. Real-world applications include database indexing,
cryptography, caches, symbol tables and dictionaries.
A hash function maps an input key to an index in the hash table.
For example, consider phone numbers as keys and a hash table of size 100. A
simple hash function is to take the last two digits of the phone number, so that
every output is a valid array index.
Hash table:
The data structure used for storing records is called a hash table. It lets us
find a record quickly from a given key value and also facilitates easy insertion
and deletion of records.
Collision resolution:
Suppose we want to add a new record R with key k to our file F, but the
memory location address H(k) is already occupied. This situation is called a
collision. There are two general ways of resolving collisions. The particular
procedure that one chooses depends on many factors. One important factor is the
ratio of the number n of keys in K (which is the number of records in F) to the
number m of hash addresses in L. This ratio, λ = n/m, is called the load factor.
First, we show that collisions are almost impossible to avoid. Specifically, suppose
a class has 24 students and the table has space for 365 records.
One random hash function is to choose the student's birthday as the hash address.
Although the load factor λ = 24/365 ≈ 7% is very small, it can be shown that there
is a better than fifty-fifty chance that two of the students have the same birthday.
The efficiency of a hash function with a collision resolution procedure is measured
by the average number of probes (key comparisons) needed to find the location of
the record with a given key k. It is described by the following two quantities:
S(λ) = average number of probes for a successful search
U(λ) = average number of probes for an unsuccessful search
These quantities will be discussed for our collision procedures.
Open Addressing
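In open addressing, every record is stored in the table itself; when a collision
occurs, other locations are probed until a free one is found. Open addressing includes: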
Linear probing (linear search)
Quadratic probing (nonlinear search), and
Double hashing (uses two hash functions)
$$S(\lambda) = \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right) \quad \text{and} \quad U(\lambda) = \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$$
(Here λ = n/m is the load factor.)
One main disadvantage of linear probing is that records tend to cluster, that is,
appear next to one another, when the load factor is greater than 50 percent. Such
clustering substantially increases the average search time for a record. Two
techniques to minimize clustering are as follows:
1. Quadratic probing: Suppose a record R with key k has the hash address H(k) = h.
Then, instead of searching the locations with addresses h, h + 1, h + 2, ..., we
linearly search the locations with addresses
h, h + 1, h + 4, h + 9, h + 16, ..., h + i², ...
2. Double hashing: Here a second hash function H' is used for resolving a
collision. Suppose a record R with key k has the hash addresses H(k) = h and
H'(k) = h' ≠ m. Then we linearly search the locations with addresses
h, h + h', h + 2h', h + 3h', ...
If m is a prime number, then the above sequence will access all the locations in the
table.
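#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 10
#define EMPTY -1

int linearTable[TABLE_SIZE];
int quadraticTable[TABLE_SIZE];
int doubleHashTable[TABLE_SIZE];

// Hash Functions (keys are assumed to be non-negative)
int hash1(int key) {
    return key % TABLE_SIZE;
}

// Second hash function for double hashing; the step is never 0.
// The constant 7 is an illustrative choice, not taken from the original notes.
int hash2(int key) {
    return 7 - (key % 7);
}

// Mark every slot in all three tables as empty
void initializeTables(void) {
    for (int i = 0; i < TABLE_SIZE; i++)
        linearTable[i] = quadraticTable[i] = doubleHashTable[i] = EMPTY;
}

// Linear probing: try h, h + 1, h + 2, ... (mod TABLE_SIZE)
void insertLinear(int key) {
    int h = hash1(key);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int index = (h + i) % TABLE_SIZE;
        if (linearTable[index] == EMPTY) {
            linearTable[index] = key;
            return;
        }
    }
    printf("Table is full; %d not inserted.\n", key);
}

// Quadratic probing: try h, h + 1, h + 4, h + 9, ... (mod TABLE_SIZE).
// Since TABLE_SIZE is not prime, the sequence may miss some free slots.
void insertQuadratic(int key) {
    int h = hash1(key);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int index = (h + i * i) % TABLE_SIZE;
        if (quadraticTable[index] == EMPTY) {
            quadraticTable[index] = key;
            return;
        }
    }
    printf("No free slot found for %d.\n", key);
}

// Double hashing: try h, h + h', h + 2h', ... (mod TABLE_SIZE).
// Since TABLE_SIZE is not prime, the sequence may miss some free slots.
void insertDoubleHash(int key) {
    int h = hash1(key);
    int step = hash2(key);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int index = (h + i * step) % TABLE_SIZE;
        if (doubleHashTable[index] == EMPTY) {
            doubleHashTable[index] = key;
            return;
        }
    }
    printf("No free slot found for %d.\n", key);
}

// Display function
void displayTable(int table[]) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (table[i] != EMPTY)
            printf("Index %d: %d\n", i, table[i]);
        else
            printf("Index %d: EMPTY\n", i);
    }
}

// Main function
int main() {
    int choice, key;
    initializeTables();
    while (1) {
        printf("\nOpen Addressing Menu:\n");
        printf("1. Insert (Linear Probing)\n");
        printf("2. Insert (Quadratic Probing)\n");
        printf("3. Insert (Double Hashing)\n");
        printf("4. Display Linear Table\n");
        printf("5. Display Quadratic Table\n");
        printf("6. Display Double Hash Table\n");
        printf("7. Exit\n");
        printf("Enter your choice: ");
        if (scanf("%d", &choice) != 1)   // stop on end of input or bad input
            break;
        switch (choice) {
            case 1:
                printf("Enter key to insert: ");
                scanf("%d", &key);
                insertLinear(key);
                break;
            case 2:
                printf("Enter key to insert: ");
                scanf("%d", &key);
                insertQuadratic(key);
                break;
            case 3:
                printf("Enter key to insert: ");
                scanf("%d", &key);
                insertDoubleHash(key);
                break;
            case 4:
                printf("\nLinear Probing Table:\n");
                displayTable(linearTable);
                break;
            case 5:
                printf("\nQuadratic Probing Table:\n");
                displayTable(quadraticTable);
                break;
            case 6:
                printf("\nDouble Hashing Table:\n");
                displayTable(doubleHashTable);
                break;
            case 7:
                exit(0);
            default:
                printf("Invalid choice. Try again.\n");
        }
    }
    return 0;
}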
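// Separate chaining: each table slot holds a linked list of keys
#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 10

struct Node {
    int key;
    struct Node *next;
};

// Global array, so every chain starts out as NULL
struct Node *hashTable[TABLE_SIZE];

// Hash function (keys are assumed to be non-negative)
int hashFunction(int key) {
    return key % TABLE_SIZE;
}

// Insert a key into the hash table (at the head of its chain)
void insert(int key) {
    int index = hashFunction(key);
    struct Node *newNode = malloc(sizeof(struct Node));
    if (newNode == NULL) {
        printf("Memory allocation failed.\n");
        return;
    }
    newNode->key = key;
    newNode->next = hashTable[index];
    hashTable[index] = newNode;
}

// Return 1 if the key is present, 0 otherwise
int search(int key) {
    struct Node *temp = hashTable[hashFunction(key)];
    while (temp != NULL) {
        if (temp->key == key)
            return 1;
        temp = temp->next;
    }
    return 0;
}

// Delete the first node that holds the key, if any
void delete(int key) {
    int index = hashFunction(key);
    struct Node *temp = hashTable[index];
    struct Node *prev = NULL;
    while (temp != NULL && temp->key != key) {
        prev = temp;
        temp = temp->next;
    }
    if (temp == NULL) {
        printf("Key %d not found.\n", key);
        return;
    }
    if (prev == NULL) {
        hashTable[index] = temp->next;
    } else {
        prev->next = temp->next;
    }
    free(temp);
    printf("Key %d deleted.\n", key);
}

// Print every chain in the table
void display(void) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        printf("Index %d:", i);
        for (struct Node *temp = hashTable[i]; temp != NULL; temp = temp->next)
            printf(" %d ->", temp->key);
        printf(" NULL\n");
    }
}

// Main function
int main() {
    int choice, key;
    while (1) {
        printf("\nHash Table Operations:\n");
        printf("1. Insert\n2. Search\n3. Delete\n4. Display\n5. Exit\n");
        printf("Enter your choice: ");
        if (scanf("%d", &choice) != 1)   // stop on end of input or bad input
            break;
        switch (choice) {
            case 1:
                printf("Enter key to insert: ");
                scanf("%d", &key);
                insert(key);
                break;
            case 2:
                printf("Enter key to search: ");
                scanf("%d", &key);
                if (search(key))
                    printf("Key %d found.\n", key);
                else
                    printf("Key %d not found.\n", key);
                break;
            case 3:
                printf("Enter key to delete: ");
                scanf("%d", &key);
                delete(key);
                break;
            case 4:
                display();
                break;
            case 5:
                printf("Exiting...\n");
                return 0;
            default:
                printf("Invalid choice.\n");
        }
    }
    return 0;
}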
Rehashing
Rehashing is the process of increasing the size of a hash map and redistributing the
elements to new buckets based on their new hash values. It is done to improve the
performance of the hash map and to prevent collisions caused by a high load factor.
As a hash map fills up, the load factor (i.e., the ratio of the number of
elements to the number of buckets) increases. As the load factor increases, the
number of collisions also increases, which can lead to poor performance. To avoid
this, the hash map can be resized and the elements can be rehashed to new buckets,
which decreases the load factor and reduces the number of collisions.
During rehashing, all elements of the hash map are iterated over and their new
bucket positions are calculated using the hash function that corresponds to the new
size of the hash map. This process can be time-consuming but it is necessary to
maintain the efficiency of the hash map.
Rehashing can be done as follows:
● For each addition of a new entry to the map, check the load factor.
● If it is greater than its pre-defined threshold (or a default such as 0.75 if
none is given), then rehash.
● To rehash, make a new array of double the previous size and make it the
new bucket array.
● Then traverse each element in the old bucket array and call insert() for
each one to insert it into the new, larger bucket array.
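#include <stdio.h>
#include <stdlib.h>

#define INITIAL_SIZE 5
#define LOAD_FACTOR 0.7
#define EMPTY -1

// Function to check if a number is prime
int isPrime(int n) {
    if (n <= 1) return 0;
    for (int i = 2; i * i <= n; i++)
        if (n % i == 0) return 0;
    return 1;
}

// Smallest prime greater than or equal to n
int nextPrime(int n) {
    while (!isPrime(n))
        n++;
    return n;
}

// Hash function (keys are assumed to be non-negative)
int hash(int key, int size) {
    return key % size;
}

// Insert function
void insert(int **table, int *size, int *count, int key);

// Rehash function: move every key into a table of roughly double the size.
// Collisions are resolved with linear probing (an assumption; the original
// listing does not show which scheme was used).
void rehash(int **table, int *size, int *count) {
    int oldSize = *size;
    int newSize = nextPrime(oldSize * 2);
    int *newTable = (int *)malloc(newSize * sizeof(int));
    if (newTable == NULL) {
        printf("Memory allocation failed.\n");
        exit(1);
    }
    for (int i = 0; i < newSize; i++)
        newTable[i] = EMPTY;
    for (int i = 0; i < oldSize; i++) {
        if ((*table)[i] == EMPTY)
            continue;
        int index = hash((*table)[i], newSize);
        while (newTable[index] != EMPTY)
            index = (index + 1) % newSize;
        newTable[index] = (*table)[i];
    }
    (void)count;   // the number of stored keys does not change
    free(*table);
    *table = newTable;
    *size = newSize;
    printf("Rehashed! New table size: %d\n", newSize);
}

void insert(int **table, int *size, int *count, int key) {
    // Grow the table first if this insertion would exceed the load factor
    if ((double)(*count + 1) / *size > LOAD_FACTOR)
        rehash(table, size, count);
    int index = hash(key, *size);
    while ((*table)[index] != EMPTY)   // linear probing
        index = (index + 1) % *size;
    (*table)[index] = key;
    (*count)++;
    printf("Inserted %d at index %d\n", key, index);
}

// Display table
void display(int *table, int size) {
    printf("\nHash Table:\n");
    for (int i = 0; i < size; i++) {
        if (table[i] != EMPTY)
            printf("Index %d: %d\n", i, table[i]);
        else
            printf("Index %d: EMPTY\n", i);
    }
}

int main() {
    int size = INITIAL_SIZE;
    int count = 0;
    int choice, key;
    int *hashTable = (int *)malloc(size * sizeof(int));
    if (hashTable == NULL) {
        printf("Memory allocation failed.\n");
        return 1;
    }
    for (int i = 0; i < size; i++)
        hashTable[i] = EMPTY;
    while (1) {
        printf("\n1. Insert\n2. Display\n3. Exit\nEnter choice: ");
        if (scanf("%d", &choice) != 1)   // stop on end of input or bad input
            break;
        switch (choice) {
            case 1:
                printf("Enter key to insert: ");
                scanf("%d", &key);
                insert(&hashTable, &size, &count, key);
                break;
            case 2:
                display(hashTable, size);
                break;
            case 3:
                free(hashTable);
                exit(0);
            default:
                printf("Invalid choice.\n");
        }
    }
    free(hashTable);
    return 0;
}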
Extendible hashing:
Extendible Hashing is a dynamic hashing method in which directories and buckets
are used to hash data. It is a highly flexible method in which the hash
function also changes dynamically.
Main features of Extendible Hashing:
The main features of this hashing technique are:
• Directories: The directories store the addresses of the buckets as pointers. An id
is assigned to each directory, and it may change each time directory expansion
takes place.
• Buckets: The buckets store the actual hashed data.
Advantages:
1. Data retrieval is less expensive (in terms of computing).
2. No data is lost, since the storage capacity increases dynamically.
3. When the hash function changes dynamically, the associated old values are
rehashed with respect to the new hash function.
C Programs:
● Write a C program to implement Bubble Sort and display the sorted array.
● Implement Selection Sort in C and explain how it works with an example.
● Write a C program to perform Insertion Sort on an array of numbers.
● Implement Shell Sort in C and display the sorted array.
● Write a C program to perform Quick Sort using recursion.
● Implement Merge Sort in C and explain how it divides and merges the array.
● Write a C program to implement a simple hash function and demonstrate its
working.
● Implement Linear Probing (Open Addressing) in hashing and demonstrate
collision handling.
● Write a C program to implement Separate Chaining using an array of linked
lists.
● Implement Rehashing in C and demonstrate how it works when the load
factor increases.
● Implement Bubble Sort in C and analyze its best, worst, and average case
time complexities.
● Write a C program to implement Selection Sort and count the number of
swaps performed.
● Implement Insertion Sort in C and display the number of comparisons made.
● Write a C program to implement Shell Sort and compare its performance
with Insertion Sort.
● Implement Quick Sort in C and demonstrate how partitioning works step by
step.
● Write a C program to implement Merge Sort and display the sorting process.
● Implement a hash table using Separate Chaining and allow insertion,
deletion, and searching of elements.
● Write a C program to implement Open Addressing (Linear Probing,
Quadratic Probing, and Double Hashing).
● Implement Rehashing in C and demonstrate how it improves performance
when the hash table is full.
● Write a C program to implement Extendible Hashing and demonstrate
dynamic resizing.
● Implement all six sorting algorithms (Bubble Sort, Selection Sort, Insertion
Sort, Shell Sort, Quick Sort, and Merge Sort) and compare their execution
times for different input sizes.
● Write a C program to implement Quick Sort using recursion and
non-recursion (stack-based approach).
● Implement Merge Sort in C and optimize it for better space complexity.
● Write a C program to implement a Hash Table using Separate Chaining and
perform insertion, deletion, and searching operations.
● Implement Linear Probing, Quadratic Probing, and Double Hashing in Open
Addressing and compare their performance.
● Write a C program to perform Rehashing dynamically when the load factor
reaches a threshold.
● Implement Extendible Hashing in C and allow dynamic expansion and
contraction of the hash table.
● Write a C program to implement a dictionary using hashing, allowing users
to insert, delete, and search words.
● Implement a student database system using hashing, where student records
are stored based on their roll number using Separate Chaining.
● Write a C program to read a large dataset of numbers, hash them using
Extendible Hashing, and search for specific values efficiently.