
SEARCHING AND SORTING

Why is sorting needed:-


• Importance of Ordered Sets: Sorting enhances efficiency in both manual and machine data
retrieval (e.g., phone directories, library books).
• Terminology:
• A file of size n is a sequence of n items.
• Each item is a record, with an associated key that usually determines sort order.
• Internal vs. External Sorting:
• Internal: Records are sorted within main memory.
• External: Records may exist in auxiliary storage.
• Stable Sorts: A stable sort keeps the relative order of records with identical keys as they
appeared in the original file.
• Direct vs. Pointer-Based Sorting:
• Direct Sort: Records themselves are rearranged.
• Pointer-Based Sorting: When records are large, sorting is done via pointers (indices) in an
auxiliary table so the records themselves never have to be moved; a short sketch follows this list.
Selection of Sorting Method:
• No one method is universally optimal.
• Programmers must consider application-specific needs to choose the appropriate sorting
technique.
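
As an illustration of the pointer-based approach above, here is a minimal C sketch, assuming a
hypothetical record type and sample data: an auxiliary table of indices is sorted with a comparator
that looks at the records' keys, while the (possibly large) records themselves never move.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record type; only the key matters for ordering. */
struct record { int key; char payload[64]; };

static struct record file[] = { {57, "b"}, {25, "a"}, {92, "c"}, {12, "d"} };

/* Compare two entries of the index table by the keys of the records they point to. */
static int cmpIndex(const void *p, const void *q) {
    int i = *(const int *)p, j = *(const int *)q;
    return (file[i].key > file[j].key) - (file[i].key < file[j].key);
}

int main(void) {
    int n = sizeof(file) / sizeof(file[0]);
    int idx[4]; /* the auxiliary pointer (index) table */

    for (int i = 0; i < n; i++) idx[i] = i;

    /* Sort the small index table; the records stay where they are. */
    qsort(idx, n, sizeof(int), cmpIndex);

    for (int i = 0; i < n; i++)
        printf("%d ", file[idx[i]].key); /* prints 12 25 57 92 */
    printf("\n");
    return 0;
}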

Sorting Efficiency
Sorting efficiency depends on balancing coding time, machine time, and memory usage. If a file is
small, simpler sorting methods may be preferable over complex ones designed to save time and
space. However, frequent or extensive file sorting demands more efficient methods to avoid
overwhelming processing time. Sorting time is often measured by critical operations (e.g., key
comparisons and record movements), not in time units. This chapter introduces mathematical
analysis for sorting times, highlighting the significance of file size (n) on sorting performance. For
small files, sorting time increases linearly with file size, but for large files, it increases quadratically,
making efficiency crucial for large datasets.

Main Points:
1. Balancing Considerations:
• Sorting method choice should weigh coding time, execution time, and memory
requirements, adjusting based on file size and task frequency.
• For smaller or one-time tasks, simpler sorting methods may suffice.
2. Critical Operations:
• Sorting time efficiency is often gauged by the count of key comparisons and record
movements rather than direct time.
• For large, complex key comparisons, focusing on reducing these operations is
essential.
3. Mathematical Analysis:
• Sorting time can vary based on the best, worst, and average cases and is influenced
heavily by file size (n).
• For small values of n, time is nearly proportional to n; for larger n, time becomes
almost proportional to n^2, significantly impacting performance.
4. Efficiency Impact for Large Datasets:
• As n grows, quadratic terms (e.g., 0.01n^2) dominate, so multiplying n by some factor
multiplies sorting time by roughly the square of that factor (a worked example follows this list).
• For very large files, choosing an efficient sorting algorithm becomes critical to
manage time and space requirements effectively.
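
As a worked illustration (the coefficients are invented purely for the example), suppose a sort takes
roughly 0.01n^2 + 10n time units. At n = 1,000 this is 0.01(1,000,000) + 10,000 = 20,000 units; at
n = 10,000 it is 0.01(100,000,000) + 100,000 = 1,100,000 units. A tenfold increase in n increased the
time by a factor of 55, because the quadratic term dominates for large n.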

Big O Notation
Big-O notation (O) is a way to describe the asymptotic behavior of functions, which measures
how the runtime or space requirement of an algorithm grows relative to its input size n. When
we say f(n)=O(g(n)), we mean that for large enough n, f(n) grows at most as fast as g(n), up to
a constant factor.
1. Definition of Big-O:
• A function f(n) is O(g(n)) if there are constants a and b such that f(n)≤a⋅g(n) for all
n≥b.
• This notation captures that f(n) grows at most as fast as g(n), ignoring constant
factors and lower-order terms.
2. Asymptotic Bounds:
• If f(n) grows slower than g(n) as n increases, f(n) is said to be bounded by g(n) or to
be of a smaller order.
• Functions may be bounded by many others, but we usually use the "closest fit,"
ignoring constants and focusing on the dominant term.
3. Transitivity of Big-O:
• If f(n)=O(g(n)) and g(n)=O(h(n)), then f(n)=O(h(n)), allowing us to extend bounds
across multiple functions.
4. Hierarchy of Functions:
• Constant functions O(1) are the smallest, followed by logarithmic O(log n), linear
O(n), polynomial O(n^k), and exponential O(d^n), where d > 1.
• Polynomial functions grow slower than exponential functions, making exponential-
time algorithms impractical for large inputs.
5. Significance of Polynomial vs. Exponential Growth:
• Polynomial algorithms (e.g., O(n^k)) are generally feasible, while exponential
algorithms (e.g., O(2^n)) grow too rapidly to solve large problems effectively on
current computing hardware.
6. Logarithmic Growth:
• All logarithmic functions (e.g., log2(n) and log10(n)) are of the same order, denoted
O(log n), as they differ only by a constant factor.

Summary of Function Order Examples


• f(n) = 100n is O(n), f(n) = 10n^2 + 37n + 153 is O(n^2), and logarithmic functions such as log n
are O(log n).
• Exponential functions (e.g., 2^n) outpace polynomial ones and grow unsustainably fast,
making exponential algorithms infeasible for most practical applications.
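
The second example above can be checked directly against the definition of Big-O: for n ≥ 1,
10n^2 + 37n + 153 ≤ 10n^2 + 37n^2 + 153n^2 = 200n^2, so f(n) is O(n^2) with a = 200 and b = 1
(these constants are one valid choice, not the only one).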

Efficiency of Sorting Algorithms: Key Concepts


Sorting algorithms are evaluated primarily based on their time complexity, but other factors, like
memory usage and stability, are also considered when selecting a sorting method.
1. Order of Growth:
• Sorting algorithms typically fall within the time complexities of O(n log n) and O(n^2).
• Optimal Sorting: A comparison-based sort that is O(n) in every case cannot exist; information
theory shows that sorting an arbitrarily ordered file requires on the order of n log n comparisons.
• For large n, algorithms that are O(n log n) are much more efficient than those that are
O(n^2). For example, increasing n by a factor of 100 in an O(n log n) algorithm
increases runtime by a factor of less than 200, whereas an O(n^2) algorithm would see
a 10,000-fold increase.
2. Empirical Testing:
• Running tests on actual data provides real-world insights into performance, though
this can vary with specific data characteristics. Some algorithms perform better on
nearly-sorted data, while others handle randomized or reverse-ordered data more
uniformly.
• Performance on various datasets can help determine an algorithm's suitability for
different scenarios, as real-world efficiency can deviate from theoretical complexity
depending on the dataset's attributes.
3. Data Characteristics:
• Knowledge about the initial order of the dataset can inform sorting choices. For
example:
• Best-case data (almost sorted): Some algorithms, like insertion sort, can
achieve O(n) time complexity on nearly-sorted inputs.
• Worst-case data (e.g., reverse order): Algorithms like quicksort with a naive pivot
choice degrade to O(n^2) on such inputs without optimizations.
4. Space Considerations:
• In-Place Sorting: Ideal sorting algorithms require O(1) additional space, meaning
they sort within the array or list containing the original data, with few extra variables.
In-place sorts, like quicksort, require minimal auxiliary storage.
• Time-Space Trade-Off: Generally, faster algorithms may use more memory. In-
place sorts, however, manage to balance both time and space but may have higher
constant factors, which affects their speed on small datasets.
5. Program Optimization:
• Since sorting can be a performance bottleneck, programmers often optimize sorting
code for speed. Removing function calls in inner loops and inlining code can reduce
overhead, although it may reduce readability.
• Minimizing function calls and storage operations (especially calls to the operating
system) in the sorting logic is crucial for performance gains.

Choosing a Sorting Algorithm


Ultimately, the choice of sorting algorithm depends on factors like data characteristics, performance
on average vs. worst-case scenarios, space constraints, and specific application needs. Sorting is an
area where trade-offs between speed, memory, and programmer effort are weighed carefully to find
the best fit for the situation.

Sorting Techniques

Bubble Sort: Overview and Efficiency


The Bubble Sort is a simple sorting algorithm where each element in an array is compared with its
neighbor, and they are swapped if they are in the wrong order. This process is repeated until the
entire array is sorted. Though Bubble Sort is easy to understand and implement, it is generally one
of the least efficient sorting algorithms, especially for large datasets.

How Bubble Sort Works


1. Sequential Passes:
• The algorithm iterates through the list multiple times.
• During each pass, adjacent elements are compared. If the current element is larger
than the next, they are swapped.
• At the end of each pass, the largest unsorted element "bubbles" to its correct position
at the end of the list.
2. Example: Given an array:
25 57 48 37 12 92 86 33

• Pass 1: After comparisons and swaps, 92 is positioned at the end.


• Pass 2: The second-largest value, 86, is moved to its correct position.
• This continues until the array is sorted.
3. Termination:
• After n−1 passes, where n is the number of elements, the array will be sorted.
• This is because each pass places at least one element in its correct final position.
Efficiency of Bubble Sort
Bubble Sort has a worst-case time complexity of O(n^2), making it inefficient for large lists. Here’s a
breakdown of the performance:
1. Basic Analysis:
• Without improvements, Bubble Sort makes (n−1)×(n−1) comparisons, or O(n^2).
• The worst case (when the array is sorted in reverse order) involves maximum
comparisons and swaps.
2. Optimizations:
• Reduced Comparisons: Each pass places the largest unsorted element in its correct
position, so the number of comparisons per pass decreases as sorting progresses.
• Early Termination: If a pass completes with no swaps, the array is sorted, allowing
the algorithm to terminate early.
• Improved Time Complexity in Best Case: If the list is already sorted, Bubble Sort
completes in O(n) time, with just one pass through the array.
3. Improvement Using Bidirectional Passes:
• An alternate version, called Bidirectional Bubble Sort, sorts by alternating
directions for each pass, with the largest unsorted element moving to the end in one
pass and the smallest unsorted element moving to the beginning in the next. This
modification can reduce the number of passes further.

Space Complexity
Bubble Sort requires minimal additional space—only a few integer variables and one temporary
variable for swaps—making its space complexity O(1). This low memory usage can be
advantageous in environments with limited space.

Summary
While Bubble Sort has a straightforward implementation and minimal memory needs, it is usually
avoided for large datasets due to its O(n^2) time complexity in average and worst-case scenarios. Its
primary redeeming feature is the O(n) efficiency in the best case (already sorted arrays) and its
simplicity, making it useful for educational purposes and small datasets.
Code:-
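
A minimal C sketch of Bubble Sort with the early-termination optimization described above, using
the sample array from the example:

#include <stdio.h>

void bubbleSort(int arr[], int n) {
    for (int pass = 0; pass < n - 1; pass++) {
        int swapped = 0; // Early termination: a pass with no swaps means the array is sorted

        // After each pass one more element is in its final place at the end,
        // so the inner loop can stop one position earlier each time.
        for (int i = 0; i < n - 1 - pass; i++) {
            if (arr[i] > arr[i + 1]) {
                int temp = arr[i];
                arr[i] = arr[i + 1];
                arr[i + 1] = temp;
                swapped = 1;
            }
        }
        if (!swapped) {
            break;
        }
    }
}

void printArray(int arr[], int n) {
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}

int main() {
    int arr[] = {25, 57, 48, 37, 12, 92, 86, 33};
    int n = sizeof(arr) / sizeof(arr[0]);

    printf("Original array:\n");
    printArray(arr, n);

    bubbleSort(arr, n);

    printf("Sorted array:\n");
    printArray(arr, n);

    return 0;
}
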
Quicksort
Quicksort Algorithm Overview
Quicksort is a widely used sorting algorithm based on the partition-exchange method. It follows a
divide-and-conquer strategy, dividing the array into smaller subarrays that can be sorted
independently. Here's a comprehensive breakdown of its key components:

1. Basic Concept
• The primary idea behind Quicksort is to select a 'pivot' element from the array. The elements
are then rearranged so that those less than the pivot are on its left, and those greater than the
pivot are on its right. This process continues recursively for the resulting subarrays until the
entire array is sorted.
2. Choosing the Pivot
• The pivot can be chosen in various ways (e.g., first element, last element, median of three)
depending on the implementation. The choice of pivot can significantly impact the
algorithm's performance. A well-chosen pivot can lead to more balanced partitions, which is
crucial for efficiency.

3. Partitioning Process
• The partitioning involves several steps:
• Initialization: Two pointers are established: down starts at the lower bound of the
array, and up starts at the upper bound.
• Moving Pointers:
• Increment the down pointer until an element greater than the pivot is found.
• Decrement the up pointer until an element less than or equal to the pivot is found.
• If up is greater than down, swap the elements at these pointers.
• This process continues until the pointers cross. Finally, the pivot is swapped with the
element at the up pointer, which indicates the final position of the pivot.

4. Recursive Sorting
• After the partitioning, the pivot is in its final position. The algorithm recursively sorts the
left subarray (elements less than the pivot) and the right subarray (elements greater than the
pivot).
• The recursion stops when subarrays are reduced to sizes of 0 or 1, which are inherently
sorted.

5. Algorithm Implementation
• The pseudocode provided outlines a recursive implementation of the Quicksort algorithm. It
includes a partition function that rearranges the elements based on the pivot and two
recursive calls for the left and right subarrays.
• A non-recursive version can also be implemented using a stack to keep track of subarray
indices to minimize function call overhead; a short sketch of this appears after the listing below.

6. Optimizations
• To enhance performance, several strategies can be employed:
• Median-of-Three Pivoting: Choosing the pivot as the median of the first, last, and
middle elements to reduce the likelihood of unbalanced partitions.
• Mean Sort: Using the mean of the subarrays as the pivot in subsequent partitions,
leading to better-balanced subarrays over time.
• Bsort Technique: This optimization ensures that the smallest and largest elements in
subarrays are correctly placed without the need for further sorting. Small subarrays
of sizes 2 or 3 can be sorted directly, bypassing the need for partitioning altogether.

7. Efficiency Considerations
• Quicksort has an average time complexity of O(n log n), but its worst-case time complexity is
O(n^2), typically occurring with poorly chosen pivots (e.g., already sorted data).
• Despite this, it is often faster in practice than other O(n log n) algorithms like Merge Sort or
Heap Sort due to its smaller constant factors and locality of reference.

Conclusion
Quicksort is favored for its efficiency and speed in average cases, along with its recursive nature
that simplifies implementation. It remains a fundamental algorithm in computer science,
particularly for sorting tasks, with numerous variations and optimizations to suit different scenarios.
Understanding its mechanics and strategies for enhancing performance is crucial for effective
application in software development and data processing.
CODE:-
#include <stdio.h>

// Function to perform partitioning of the array


int partition(int arr[], int lb, int ub) {
int pivot = arr[lb]; // Choosing the first element as the pivot
int down = lb + 1; // Starting point for down pointer
int up = ub; // Starting point for up pointer

while (down <= up) {


// Move down pointer to the right until an element > pivot is found
while (down <= ub && arr[down] <= pivot) {
down++;
}
// Move up pointer to the left until an element < pivot is found
while (up >= lb && arr[up] > pivot) {
up--;
}
// If pointers have not crossed, swap elements at down and up
if (down < up) {
int temp = arr[down];
arr[down] = arr[up];
arr[up] = temp;
}
}
// Swap the pivot with the element at the up pointer
arr[lb] = arr[up];
arr[up] = pivot;

return up; // Return the final position of the pivot


}

// Quicksort function
void quicksort(int arr[], int lb, int ub) {
if (lb < ub) {
// Partition the array and get the pivot index
int pivotIndex = partition(arr, lb, ub);
// Recursively sort the subarrays
quicksort(arr, lb, pivotIndex - 1); // Left subarray
quicksort(arr, pivotIndex + 1, ub); // Right subarray
}
}

// Function to print the array


void printArray(int arr[], int size) {
for (int i = 0; i < size; i++) {
printf("%d ", arr[i]);
}
printf("\n");
}

// Main function
int main() {
int arr[] = {25, 57, 48, 37, 12, 92, 86, 33};
int n = sizeof(arr) / sizeof(arr[0]);

printf("Original array:\n");
printArray(arr, n);

quicksort(arr, 0, n - 1);

printf("Sorted array:\n");
printArray(arr, n);

return 0;
}
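
As noted in the overview, the recursion can be replaced by an explicit stack of subarray bounds. A
minimal sketch that reuses the partition function from the listing above (the function name and the
fixed stack capacity are illustrative simplifications):

// Non-recursive quicksort: subarray bounds waiting to be sorted are kept on an explicit stack.
void quicksortIterative(int arr[], int lb, int ub) {
    int stack[64]; // holds pairs of bounds; capacity chosen only for illustration
    int top = -1;

    stack[++top] = lb;
    stack[++top] = ub;

    while (top >= 0) {
        ub = stack[top--];
        lb = stack[top--];

        if (lb < ub) {
            int p = partition(arr, lb, ub);

            // Push both subarrays; each is partitioned in a later iteration
            stack[++top] = lb;
            stack[++top] = p - 1;
            stack[++top] = p + 1;
            stack[++top] = ub;
        }
    }
}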

The efficiency of Quicksort can be summarized through its time complexity, average performance,
and how it behaves under different conditions. Here’s an organized overview:

Time Complexity of Quicksort


1. Best Case:
• Time Complexity: O(n log n)
• Condition: This occurs when the pivot divides the array into two equal halves at
each step. The total work then satisfies the recurrence T(n) = T(n/2) + T(n/2) + O(n), i.e.
T(n) = 2T(n/2) + O(n).
• This recursive relationship leads to a solution of O(n log n).
2. Average Case:
• Time Complexity: O(n log n)
• Condition: On average, if the pivot divides the array reasonably well (not
necessarily perfectly), the algorithm still maintains an average performance of
O(n log n). Statistically, it is shown that Quicksort performs about 1.386·n·log2(n)
comparisons for random input.
3. Worst Case:
• Time Complexity: O(n^2)
• Condition: The worst-case scenario arises when the pivot chosen is the smallest or
largest element consistently. This can happen in the following cases:
• The original array is already sorted or nearly sorted.
• The pivot is always chosen as the first or last element without any additional
strategies (like randomization).
• In this scenario, the recursive breakdown leads to highly unbalanced partitions, so the
comparison count is n + (n−1) + (n−2) + … + 1, which sums to O(n^2).

Characteristics and Behavior


• Optimal Condition: Quicksort works best with completely unsorted data where the pivot
selections yield balanced partitions.
• Locality of Reference: Quicksort exhibits good locality of reference, accessing a small
portion of the array at any time, which can lead to efficient use of the CPU cache and
reduced page faults in virtual memory systems.

Space Complexity
• The space complexity is O(logn) due to the stack space used by recursive calls in the
average case.
• In the worst case, it can go up to O(n) if the recursion stack becomes as deep as the number
of elements (when partitions are unbalanced).

Mitigating Worst Case Scenarios


1. Randomized Quicksort: By randomly selecting the pivot, the chance of consistently hitting
the worst case is greatly reduced.
2. Median-of-Three Method: This technique involves choosing the pivot as the median of the
first, middle, and last elements, which can help avoid poor pivot choices in sorted or nearly
sorted arrays.
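
A minimal sketch of the median-of-three idea in C: order a three-element sample and move its
median into the pivot position before partitioning. The helper name and the move-to-front
convention are illustrative choices that match the partition function shown earlier, which takes its
pivot from arr[lb].

// Place the median of arr[lb], arr[mid], arr[ub] at arr[lb], where partition() expects the pivot.
void medianOfThree(int arr[], int lb, int ub) {
    int mid = lb + (ub - lb) / 2;
    int temp;

    // Order the three sampled elements so that arr[lb] <= arr[mid] <= arr[ub]
    if (arr[mid] < arr[lb]) { temp = arr[lb]; arr[lb] = arr[mid]; arr[mid] = temp; }
    if (arr[ub] < arr[lb]) { temp = arr[lb]; arr[lb] = arr[ub]; arr[ub] = temp; }
    if (arr[ub] < arr[mid]) { temp = arr[mid]; arr[mid] = arr[ub]; arr[ub] = temp; }

    // The median now sits at mid; move it to lb so it is used as the pivot.
    temp = arr[lb]; arr[lb] = arr[mid]; arr[mid] = temp;
}

Calling medianOfThree(arr, lb, ub) immediately before partition(arr, lb, ub) in the quicksort code
above would apply this strategy.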

Comparisons with Other Sorting Algorithms


• Bubble Sort: Opposite behavior: best on already sorted input and worst on reverse-ordered input.
• Merge Sort: Has guaranteed O(nlogn) performance but requires additional space for
merging, while Quicksort is in-place.
• Heapsort: Also O(nlogn) but generally slower than Quicksort due to poorer locality of
reference and overhead.

Practical Considerations
• Quicksort is often the fastest sorting algorithm in practice due to its low overhead and
average-case performance.
• For nearly sorted data, strategies like the median-of-three or switching to insertion sort for
small subarrays can further improve performance.

Conclusion
Quicksort is a highly efficient sorting algorithm with an average time complexity of O(nlogn),
making it suitable for a wide variety of data sets. It excels particularly in cases of random or
unsorted data, but care must be taken with sorted or nearly sorted data to avoid worst-case
performance. Proper pivot selection strategies are crucial in mitigating these risks and enhancing
performance.

Learn Heap From The Link


https://www.youtube.com/watch?v=0wPlzMU-k00

Heapsort

Heapsort is an efficient sorting algorithm that relies on the structure of a binary heap to achieve an
optimal time complexity of O(nlogn), making it an attractive choice for scenarios where a
guaranteed performance is needed, regardless of the initial order of input data. Here's a breakdown
of the key concepts, efficiency, and steps involved in Heapsort:

Key Concepts
1. Heap Definition:
• A max heap (descending heap) is an almost complete binary tree where each node
has a value greater than or equal to its children, ensuring that the largest element is
always at the root.
• A min heap (ascending heap) has each node value smaller than or equal to its
children, placing the smallest element at the root.
2. Binary Heap as a Priority Queue:
• Binary heaps allow for efficient priority queue operations. Insertion (pqinsert)
and deletion of the maximum/minimum element (pqmaxdelete or
pqmindelete) can both be done in O(log n) time, compared with the roughly n/2
operations required on average for a sequentially ordered list.
3. Sequential Representation:
• The binary heap can be efficiently represented in an array. Given an element at index
j:
• The parent element is located at index (j−1)/2.
• The left child is at 2j+1.
• The right child is at 2j+2.
• This array-based representation enables Heapsort to operate in-place, meaning it
sorts the array without needing additional storage except for some program variables.
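
As a small illustration (the values are chosen only for the example), the array 92 57 86 25 12 48
read with these rules is a max heap: the root 92 (index 0) has children 57 (index 1) and 86 (index 2),
57 has children 25 (index 3) and 12 (index 4), and 86 has a single left child 48 (index 5). Every
parent is greater than or equal to its children.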

Efficiency of Heapsort
1. Time Complexity:
• Best, Average, and Worst Case: Heapsort is O(n log n) in all cases, since each of the n
extraction steps in the sorting phase costs O(log n).
• Heap Construction: Building a heap from an unsorted array bottom-up takes O(n).
• Heap Maintenance During Sorting: After extracting the root (maximum or
minimum) and placing it at the end of the array, re-heaping the reduced heap takes
O(log n) operations for each element, resulting in O(n log n) for the full sorting
process.
2. Space Complexity:
• Heapsort is an in-place sorting algorithm, requiring only a constant amount of
additional space, O(1).

Steps Involved in Heapsort


1. Heap Construction:
• Starting from an unsorted array, build a max heap (or min heap for ascending order)
using the "heapify" process. This ensures that each parent node is larger than its
children.
2. Heap Sort Phase:
• Swap the root element (largest in max heap) with the last element in the array. This
places the maximum element in its correct sorted position.
• Reduce the heap size by one (ignoring the last sorted element) and call heapify on
the root to restore the heap property.
• Repeat this process until the entire array is sorted.

Advantages and Drawbacks


• Advantages:
• Guaranteed O(nlogn) Time Complexity: Heapsort is reliable with no worst-case
degradation, making it suitable for time-sensitive applications.
• In-Place: Requires minimal additional memory beyond the input array itself.
• Predictable Behavior: Heapsort’s performance is unaffected by the initial order of
the input data, unlike quicksort, which can degrade to O(n^2) on already sorted data.
• Drawbacks:
• Cache Inefficiency: Heapsort may suffer from cache inefficiency due to the non-
locality of memory access, especially when compared to quicksort, which has better
locality of reference.
• Comparative Overhead: In practice, Heapsort may run slower than quicksort on
random data due to the extra overhead from maintaining the heap structure.

Practical Applications
Heapsort is ideal when:
• Consistent performance is essential, especially if input data characteristics are unknown.
• Additional memory allocation is constrained, as it sorts in-place.
In summary, Heapsort offers an efficient, predictable sorting solution with a worst-case time
complexity of O(n log n), making it advantageous for applications where predictable
performance matters more than the best possible average-case speed.

CODE:-
#include <stdio.h>

// Function to swap two elements


void swap(int *a, int *b) {
int temp = *a;
*a = *b;
*b = temp;
}

// Function to heapify a subtree rooted at index i


// n is the size of the heap
void heapify(int arr[], int n, int i) {
int largest = i; // Initialize largest as root
int left = 2 * i + 1; // Left child index
int right = 2 * i + 2; // Right child index

// If left child is larger than root


if (left < n && arr[left] > arr[largest]) {
largest = left;
}

// If right child is larger than largest so far


if (right < n && arr[right] > arr[largest]) {
largest = right;
}

// If largest is not root


if (largest != i) {
swap(&arr[i], &arr[largest]);
// Recursively heapify the affected sub-tree
heapify(arr, n, largest);
}
}

// Main function to perform heapsort


void heapSort(int arr[], int n) {
// Build heap (rearrange array)
for (int i = n / 2 - 1; i >= 0; i--) {
heapify(arr, n, i);
}

// One by one extract elements from the heap


for (int i = n - 1; i > 0; i--) {
// Move current root to end
swap(&arr[0], &arr[i]);

// Call max heapify on the reduced heap


heapify(arr, i, 0);
}
}

// Function to print the array


void printArray(int arr[], int n) {
for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\n");
}

int main() {
int arr[] = {12, 11, 13, 5, 6, 7};
int n = sizeof(arr) / sizeof(arr[0]);

printf("Original array:\n");
printArray(arr, n);

heapSort(arr, n);

printf("Sorted array:\n");
printArray(arr, n);

return 0;
}

C Code For descending priority queue


#include <stdio.h>

#define MAX 100 // Maximum size of the priority queue array

// Function to swap two elements


void swap(int *a, int *b) {
int temp = *a;
*a = *b;
*b = temp;
}

// Function to perform sift-up during insertion to maintain the max-heap property


void pqInsert(int heap[], int *k, int elt) {
int s = *k; // Position of the new element
int f = (s - 1) / 2; // Parent of the new element

// Traverse up to place `elt` in the correct position


while (s > 0 && heap[f] < elt) {
heap[s] = heap[f];
s = f;
f = (s - 1) / 2;
}
heap[s] = elt; // Place the new element in its proper position
(*k)++; // Increase the heap size
}

// Function to get the larger child index


int largeSon(int heap[], int p, int m) {
int s = 2 * p + 1; // Left child
if (s + 1 <= m && heap[s] < heap[s + 1]) {
s = s + 1; // Right child if larger
}
return s <= m ? s : -1; // Return -1 if no children
}

// Function to perform sift-down during deletion to maintain the max-heap property


void adjustHeap(int heap[], int k) {
int f = 0; // Start at the root
int s = largeSon(heap, f, k - 1); // Find the larger child (last valid index is k - 1)
int kvalue = heap[0]; // Element to sift down (the old last element, already moved to the root)

// Traverse down to place `kvalue` in the correct position


while (s >= 0 && kvalue < heap[s]) {
heap[f] = heap[s];
f = s;
s = largeSon(heap, f, k - 1);
}
heap[f] = kvalue; // Place the adjusted element in its proper position
}

// Function to delete and return the maximum element from the priority queue
int pqMaxDelete(int heap[], int *k) {
int maxElement = heap[0]; // Max element is at the root
heap[0] = heap[--(*k)]; // Replace root with last element and decrease size
adjustHeap(heap, *k); // Re-adjust the heap to maintain max-heap property
return maxElement;
}

// Function to print the priority queue


void printHeap(int heap[], int k) {
for (int i = 0; i < k; i++) {
printf("%d ", heap[i]);
}
printf("\n");
}

int main() {
int heap[MAX]; // Array representing the priority queue as a max heap
int k = 0; // Current size of the heap

// Insert elements
pqInsert(heap, &k, 20);
pqInsert(heap, &k, 15);
pqInsert(heap, &k, 30);
pqInsert(heap, &k, 10);
pqInsert(heap, &k, 25);

printf("Heap after insertions:\n");


printHeap(heap, k);
// Delete the maximum element
int maxElem = pqMaxDelete(heap, &k);
printf("Deleted max element: %d\n", maxElem);
printf("Heap after deletion:\n");
printHeap(heap, k);

return 0;
}

Insertion Sort
Simple Insertion Sort:
• Sorts an array by inserting each element into an already sorted subarray.
• For sorted data, it has a time complexity of O(n); for reverse-ordered data, it’s O(n^2).
• Better than bubble sort and effective if the input is almost sorted.
• Optimizations:
• Binary Search: Reduces comparisons to O(n log n) by finding the insertion position quickly,
but doesn’t improve overall time, as element shifting still takes O(n^2).
• List Insertion: Uses a linked list, allowing insertion without shifting array elements. This
reduces replacement operations but not comparisons. Extra space is needed for the link
array.
• Selection of Sorts:
• Small Files: Use selection sort for large records and simple keys (less shifting), and
insertion sort if comparisons are more expensive.
• Larger Data: Use heapsort or quicksort. Quicksort is optimal for arrays with over 30
elements, while heapsort is more efficient than insertion for sizes above 60-70.
• Hybrid Approach:
• In quicksort, insertion sort can speed up small subarrays (fewer than 20 elements).

CODE:-
Here is the implementation of Simple Insertion Sort, Binary Insertion Sort, and List Insertion
Sort in C.

1. Simple Insertion Sort

#include <stdio.h>

void insertionSort(int arr[], int n) {


for (int i = 1; i < n; i++) {
int key = arr[i];
int j = i - 1;
while (j >= 0 && arr[j] > key) {
arr[j + 1] = arr[j];
j = j - 1;
}
arr[j + 1] = key;
}
}

void printArray(int arr[], int n) {


for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\n");
}

int main() {
int arr[] = {12, 11, 13, 5, 6};
int n = sizeof(arr) / sizeof(arr[0]);

insertionSort(arr, n);
printf("Sorted array: \n");
printArray(arr, n);

return 0;
}

2. Binary Insertion Sort


Binary Insertion Sort finds the correct position of the element using binary search instead of a linear
search.

#include <stdio.h>

int binarySearch(int arr[], int item, int low, int high) {


while (low <= high) {
int mid = low + (high - low) / 2;
if (item == arr[mid])
return mid + 1;
else if (item > arr[mid])
low = mid + 1;
else
high = mid - 1;
}
return low;
}

void binaryInsertionSort(int arr[], int n) {


for (int i = 1; i < n; i++) {
int key = arr[i];
int j = i - 1;
int loc = binarySearch(arr, key, 0, j);

while (j >= loc) {


arr[j + 1] = arr[j];
j--;
}
arr[j + 1] = key;
}
}

// Function to print the array (used by main below)
void printArray(int arr[], int n) {
for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\n");
}

int main() {
int arr[] = {12, 11, 13, 5, 6};
int n = sizeof(arr) / sizeof(arr[0]);

binaryInsertionSort(arr, n);
printf("Sorted array: \n");
printArray(arr, n);

return 0;
}

3. List Insertion Sort


This version uses a linked list for insertion, which avoids shifting elements in the array.

#include <stdio.h>
#include <stdlib.h>

struct Node {
int data;
struct Node* next;
};

void sortedInsert(struct Node** head_ref, struct Node* new_node) {


struct Node* current;
if (*head_ref == NULL || (*head_ref)->data >= new_node->data) {
new_node->next = *head_ref;
*head_ref = new_node;
} else {
current = *head_ref;
while (current->next != NULL && current->next->data < new_node->data) {
current = current->next;
}
new_node->next = current->next;
current->next = new_node;
}
}

void insertionSortLinkedList(int arr[], int n) {


struct Node* sorted = NULL;

for (int i = 0; i < n; i++) {


struct Node* new_node = (struct Node*)malloc(sizeof(struct Node));
new_node->data = arr[i];
new_node->next = NULL;
sortedInsert(&sorted, new_node);
}

struct Node* current = sorted;


int i = 0;
while (current != NULL) {
arr[i++] = current->data;
struct Node* next = current->next;
free(current); // Release the node after copying its value back into the array
current = next;
}
}

// Function to print the array (used by main below)
void printArray(int arr[], int n) {
for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\n");
}

int main() {
int arr[] = {12, 11, 13, 5, 6};
int n = sizeof(arr) / sizeof(arr[0]);

insertionSortLinkedList(arr, n);
printf("Sorted array: \n");
printArray(arr, n);

return 0;
}
These implementations cover:
• Simple Insertion Sort: Direct insertion into a sorted part of the array.
• Binary Insertion Sort: Finds position with binary search for efficiency in comparison.
• List Insertion Sort: Uses a linked list to avoid shifting elements.

MergeSort

Merge sort is built around the merge operation: given two arrays that are already sorted, repeatedly
compare their front elements and copy the smaller one into a third array until both inputs are
exhausted. The code below implements this merge step for two sorted arrays.

Code:-

#include <stdio.h>
#include <stdlib.h>

void mergeArrays(int a[], int b[], int c[], int n1, int n2) {
int i = 0, j = 0, k = 0;

// Traverse both arrays and store the smaller element in c[]


while (i < n1 && j < n2) {
if (a[i] < b[j]) {
c[k++] = a[i++];
} else {
c[k++] = b[j++];
}
}

// Store remaining elements of a[], if any


while (i < n1) {
c[k++] = a[i++];
}

// Store remaining elements of b[], if any


while (j < n2) {
c[k++] = b[j++];
}
}

// Function to print an array


void printArray(int arr[], int size) {
for (int i = 0; i < size; i++) {
printf("%d ", arr[i]);
}
printf("\n");
}

int main() {
int a[] = {1, 3, 5, 7};
int b[] = {2, 4, 6, 8};
int n1 = sizeof(a) / sizeof(a[0]);
int n2 = sizeof(b) / sizeof(b[0]);
int c[n1 + n2];

mergeArrays(a, b, c, n1, n2);

printf("Merged array: \n");


printArray(c, n1 + n2);

return 0;
}

COOK-KIM ALGO

The Cook-Kim algorithm is a sorting method optimized for nearly sorted files. Here’s a summary of
the key steps and principles:
1. Applicability: It’s particularly efficient for nearly sorted data or smaller sorted files, where
simple insertion sort is typically fast. For larger, more sorted files, it outperforms even
middle-element quicksort.
2. Process:
• The algorithm scans the input for unordered pairs (where an element is greater than
the following one).
• These unordered pairs are removed and placed at the end of a new array.
• After removing an unordered pair, the algorithm resumes scanning the input from the
immediate neighboring elements, leaving the original array sorted once all unordered
pairs are removed (a small worked example follows this list).
3. Sorting Unordered Elements:
• The newly created array of unordered pairs is sorted. Middle-element quicksort is
used if the array has over 30 elements; otherwise, simple insertion sort is used.
4. Final Merge:
• The two arrays (the originally sorted array and the newly sorted unordered pairs
array) are merged to produce the fully sorted output.
5. Advantages: The Cook-Kim algorithm leverages the pre-sorted state of input more
effectively than other sorting methods, making it faster than quicksort, insertion sort, and
mergesort for nearly sorted data.
6. Limitations: For randomly ordered data, Cook-Kim is less efficient than standard sorts like
mergesort and quicksort, which remain preferable for general cases.
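
As a small worked example (the values are chosen only for illustration), consider the input
1 3 2 5 4 6. The scan finds 3 > 2, so the pair (3, 2) is removed to the new array; scanning resumes
with the neighbouring elements 1 and 5, finds 5 > 4, and removes (5, 4) as well. The original array
is left as 1 6, which is sorted. The removed-pairs array 3 2 5 4 is sorted (by insertion sort, since it
is small) to 2 3 4 5, and merging 1 6 with 2 3 4 5 produces the final output 1 2 3 4 5 6.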

SEARCHING
Theory of Linear Search:
Linear search, also known as sequential search, is a straightforward search algorithm used to find
the position of a target value within a list. In linear search, we start from the beginning of the array
or list and check each element one by one until we find the target value. If the target is found, the
algorithm stops and returns the index of the target. If it reaches the end without finding the target, it
returns an indication that the target is not present in the list.
Characteristics of Linear Search:
• Time Complexity: O(n), where n is the number of elements in the list. This is because, in
the worst case, we might have to check every element.
• Best Case: O(1) if the target is found at the first position.
• Worst Case: O(n) if the target is found at the last position or is not present in the list at all.
• Use Case: Linear search is typically used for small or unsorted lists. It’s simple and doesn’t
require any additional memory or complex data structures.

CODE:-
#include <stdio.h>

int linearSearch(int arr[], int size, int target) {


for (int i = 0; i < size; i++) {
if (arr[i] == target) {
return i; // Return the index if the element is found
}
}
return -1; // Return -1 if the element is not found
}

int main() {
int arr[] = {34, 67, 23, 89, 1, 90};
int size = sizeof(arr) / sizeof(arr[0]);
int target = 23;

int result = linearSearch(arr, size, target);


if (result != -1) {
printf("Element found at index %d\n", result);
} else {
printf("Element not found in the array\n");
}

return 0;
}

Binary Search Overview


• Binary search is an efficient algorithm for searching a sorted array.
• It compares a target value (key) with the middle element of the array.
• If the key matches the middle element, the search ends. If not, the search continues in the
upper or lower half of the array.
Key Features
• The binary search is efficient: each comparison reduces the search space by half, resulting in
a time complexity of O(log n).
• In practice, two comparisons occur for each iteration in C (checking both equality and less
than), leading to approximately 2 * log2(n) comparisons.
Implementation
• The algorithm can be implemented both recursively and non-recursively. A non-recursive
version is typically more efficient in practical applications.
Limitations
• Binary search requires the data to be stored in a sorted array, because it relies on being able
to jump directly to the middle element of any interval by index.
• It is less effective in scenarios with frequent insertions or deletions, as maintaining a sorted
array becomes cumbersome.
Padded List Method
• A solution for using binary search with insertions and deletions is the padded list approach,
which consists of:
• Element Array: Holds sorted keys with "empty" slots for future growth.
• Flag Array: Indicates whether a slot in the element array is full (1) or empty (0).
• Searching: Perform a binary search on the element array. If the key is found, check the flag
to confirm its existence.
• Insertion: Locate the position for the new element. If the position is empty, insert the
element there; if full, shift elements as necessary.
• Deletion: To delete, find the key and set its flag value to 0, indicating it is empty.
Drawbacks
• Insertions can involve significant shifting of elements, leading to performance issues.
• Limited growth space can require redistribution of empty slots to maintain efficiency.
CODE:-

#include <stdio.h>

// Function to perform binary search


int binarySearch(int arr[], int size, int key) {
int low = 0;
int high = size - 1;

while (low <= high) {


int mid = low + (high - low) / 2; // Calculate mid index to avoid overflow

// Check if key is present at mid


if (arr[mid] == key) {
return mid; // Key found, return its index
}

// If key is greater, ignore left half


if (arr[mid] < key) {
low = mid + 1;
}
// If key is smaller, ignore right half
else {
high = mid - 1;
}
}

return -1; // Key not found


}

int main() {
int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; // Sorted array
int size = sizeof(arr) / sizeof(arr[0]);
int key = 5; // Element to search for

// Perform binary search


int result = binarySearch(arr, size, key);

// Output the result


if (result != -1) {
printf("Element found at index: %d\n", result);
} else {
printf("Element not found in the array.\n");
}

return 0;
}

Interpolation Search
Overview
• Interpolation search is a searching technique for ordered arrays, especially effective when
keys are uniformly distributed.
• It can outperform binary search by estimating the position of the key based on its value
relative to the values of the endpoints of the search interval.
Algorithm Steps
1. Set initial boundaries: low = 0 and high = n - 1.
2. Calculate the estimated position (mid) using the formula:
mid = low + ((high − low) × (key − k(low))) / (k(high) − k(low)),
where k(i) denotes the key stored at index i.
3. Compare the key with the value at the mid position:
• If equal, the search is successful.
• If the key is lower, adjust high = mid - 1.
• If the key is higher, adjust low = mid + 1.
4. Repeat until the key is found or low > high.

Efficiency
• When keys are uniformly distributed, interpolation search can require an average of
O(log log n) comparisons, which is more efficient than the O(log n) required for binary search.
• However, if keys are not uniformly distributed, performance can degrade significantly,
potentially leading to a worst-case scenario similar to sequential search.
Robust Interpolation Search
• A variation called robust interpolation search seeks to improve performance with non-
uniform key distributions.
• It introduces a gap to ensure that the estimated position (mid) is a minimum distance from
the boundaries, preventing clustering issues.
• The algorithm dynamically adjusts the gap based on the size of the current interval:
• If the key is found in the smaller interval, reset the gap to the square root of the new
interval size.
• If found in the larger interval, double the gap but keep it within half the interval size.
Performance Comparison
• Robust interpolation search has an expected number of comparisons of O(log log n) for
random distributions, outperforming both binary and standard interpolation search.
• In experiments, robust interpolation search was shown to require fewer comparisons than
binary search in practical scenarios (e.g., 12.5 comparisons versus 36 for binary search in a
list of 40,000 names).
Limitations
• The overhead of managing the gap can be substantial.
• In the worst case, robust interpolation search can require O(n) comparisons, which is worse
than binary search's worst-case O(log n).
• The arithmetic involved in interpolation search can be computationally expensive compared
to the simpler comparisons in binary search.

CODE:-

#include <stdio.h>

// Function to perform interpolation search


int interpolationSearch(int arr[], int size, int key) {
int low = 0;
int high = size - 1;

while (low <= high && key >= arr[low] && key <= arr[high]) {
// Guard against division by zero when the remaining keys are all equal
if (arr[high] == arr[low]) {
return (arr[low] == key) ? low : -1;
}
// Estimate the position of the key
int pos = low + ((high - low) * (key - arr[low])) / (arr[high] - arr[low]);

// Check if the key is present at pos


if (arr[pos] == key) {
return pos; // Key found, return its index
}

// If key is larger, move the low marker up


if (arr[pos] < key) {
low = pos + 1;
}
// If key is smaller, move the high marker down
else {
high = pos - 1;
}
}

return -1; // Key not found


}

int main() {
int arr[] = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}; // Sorted array
int size = sizeof(arr) / sizeof(arr[0]);
int key = 70; // Element to search for

// Perform interpolation search


int result = interpolationSearch(arr, size, key);

// Output the result


if (result != -1) {
printf("Element found at index: %d\n", result);
} else {
printf("Element not found in the array.\n");
}

return 0;
}

SORTING WITH DIFFERENT KEYS


Sorting on different keys refers to the practice of sorting data based on one or more specific
attributes (or "keys") of the data entries, rather than on the data entries themselves. This is
particularly useful in cases where you have complex data structures, such as records or objects,
where each record contains multiple fields.

Key Concepts of Sorting on Different Keys


1. Primary Key:
• The primary key is the main attribute used for sorting the data. For example, if you
have a list of employees, you might sort primarily by their last names.
2. Secondary Key:
• If two or more records have the same value for the primary key, a secondary key can
be used to break the tie. For instance, if two employees have the same last name, you
might then sort them by their first names.
3. Tertiary Key:
• This is used when there are still ties after applying the primary and secondary keys.
Continuing the previous example, if employees have the same last and first names,
you might sort them by their hire dates.

Sorting Algorithms and Multiple Keys


Different sorting algorithms can handle sorting based on different keys. Here are a few common
approaches:
1. Custom Comparison Functions:
• Many sorting algorithms (like quicksort, mergesort, etc.) allow the use of custom
comparison functions. These functions can be designed to sort based on multiple
keys. In languages like Python, you can use tuples to define the sort order; a C sketch
using qsort follows this list.
2. Stable Sort:
• A stable sorting algorithm preserves the relative order of records with equal keys.
This is particularly important when sorting on multiple keys, as it ensures that if two
records have the same primary key, they remain in their original order based on
secondary keys.
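
In C, the same effect can be sketched by giving qsort a comparison function that falls back to the
secondary key only when the primary keys compare equal. The employee structure and sample data
below are illustrative and mirror the Python example that follows.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct employee {
    char first_name[20];
    char last_name[20];
};

// Compare by last name (primary key), then by first name (secondary key).
int cmpEmployee(const void *a, const void *b) {
    const struct employee *x = a;
    const struct employee *y = b;
    int byLast = strcmp(x->last_name, y->last_name);
    if (byLast != 0) {
        return byLast;
    }
    return strcmp(x->first_name, y->first_name);
}

int main(void) {
    struct employee staff[] = {
        {"John", "Doe"}, {"Jane", "Smith"}, {"Alice", "Doe"}, {"Bob", "Smith"}
    };
    int n = sizeof(staff) / sizeof(staff[0]);

    qsort(staff, n, sizeof(struct employee), cmpEmployee);

    for (int i = 0; i < n; i++) {
        printf("%s %s\n", staff[i].first_name, staff[i].last_name);
    }
    // Prints: Alice Doe, John Doe, Bob Smith, Jane Smith
    return 0;
}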

Example in Python
Here’s an example of sorting a list of dictionaries (representing employees) in Python, first by last
name and then by first name:
employees = [
{'first_name': 'John', 'last_name': 'Doe'},
{'first_name': 'Jane', 'last_name': 'Smith'},
{'first_name': 'Alice', 'last_name': 'Doe'},
{'first_name': 'Bob', 'last_name': 'Smith'},
]

# Sort by last name, then by first name


sorted_employees = sorted(employees, key=lambda x: (x['last_name'],
x['first_name']))

for employee in sorted_employees:


print(f"{employee['first_name']} {employee['last_name']}")

Output:
Alice Doe
John Doe
Bob Smith
Jane Smith

Application in Databases
In databases, sorting on different keys is a common operation performed through SQL queries. For
instance:
SELECT * FROM Employees
ORDER BY last_name ASC, first_name ASC;

This SQL query sorts the employees first by last name in ascending order and then by first name in
ascending order.

EXTERNAL SORTING
External sorting is a class of algorithms used for sorting large data sets that do not fit into the main
memory (RAM) of a computer. It is particularly useful when dealing with very large files or
databases that exceed the available memory, necessitating the use of external storage (like hard
disks) to perform the sorting.

Key Characteristics of External Sorting


1. I/O Bound: External sorting algorithms are designed to minimize the number of
input/output operations because reading from and writing to disk is much slower than
accessing data in RAM.
2. Data Chunking: The data is divided into smaller chunks that can fit into memory. These
chunks are then sorted individually and written back to disk.
3. Merging: After sorting the individual chunks, the algorithm merges them together into a
single sorted output. This merging process is typically done in a way that optimizes disk
access.

Steps in External Sorting


1. Divide the Data: Split the large dataset into smaller chunks that can be loaded into memory.
Each chunk should be small enough to be processed independently.
2. Sort Each Chunk: Load each chunk into memory, sort it using an efficient internal sorting
algorithm (like quicksort or mergesort), and then write the sorted chunk back to disk.
3. Merge the Sorted Chunks: Once all chunks are sorted, perform a merging operation to
combine them into a single sorted dataset. This step can be done using a k-way merge
algorithm, where multiple sorted chunks are merged simultaneously.

Common External Sorting Algorithms


1. External Merge Sort:
• The most widely used external sorting algorithm.
• It follows the divide-and-conquer paradigm: sort the chunks, then merge them.
• It is efficient for large datasets, making it the standard for external sorting.
2. Replacement Selection Sort:
• An optimization of the run-creation step of the basic external sort.
• It uses a priority queue to select the next smallest eligible element to output, which
produces longer initial runs and therefore fewer runs for the merge phase to combine.
3. Polyphase Merge Sort:
• A more complex merge algorithm that uses multiple input and output streams to
minimize the total number of passes over the data.

Example: External Merge Sort


Here’s a high-level overview of how the external merge sort works:
1. Create Sorted Runs:
• Read data from the file in chunks.
• Sort each chunk in memory.
• Write the sorted chunks back to the disk.
2. Merge Runs:
• Read the first element of each sorted chunk into a min-heap.
• Continuously extract the smallest element from the heap and write it to the output.
• When an element is extracted from a chunk, replace it with the next element from
that chunk.

Example Pseudocode for External Merge Sort


function externalMergeSort(file):
// Step 1: Divide the file into sorted runs
sortedRuns = []
while not end of file:
chunk = readNextChunk(file)
sortedChunk = sort(chunk)
writeToDisk(sortedChunk)
sortedRuns.append(filePathOf(sortedChunk))

// Step 2: Merge sorted runs


outputFile = createOutputFile()
minHeap = new MinHeap()

// Initialize the heap with the first element of each sorted run
for each run in sortedRuns:
minHeap.insert(run.firstElement())

while not minHeap.isEmpty():


smallest = minHeap.extractMin()
outputFile.write(smallest)

// If there is a next element in the same run, insert it into the heap
if nextElementExists(smallest.run):
nextElement = smallest.run.getNextElement()
minHeap.insert(nextElement)

return outputFile

Applications of External Sorting


• Database Management Systems: External sorting is commonly used in databases when
executing queries that require sorting large datasets.
• Big Data Processing: Frameworks like Hadoop and Spark use external sorting for
processing large volumes of data across distributed systems.
• File Sorting: When organizing large files or datasets on disk, external sorting ensures that
data can be processed efficiently without loading everything into memory.
