Chapter 7 - Sorting
Sorting
The concept of an ordered set of elements has considerable impact on our daily lives, and
sorting is therefore one of the most common ingredients of programming systems. The
process of rearranging the items in a list according to some linear order is termed sorting.
Types of sorting
Internal sorting: the records to be sorted are held in main memory.
External sorting: the records to be sorted, or some of them, are kept in auxiliary storage (disk/tape).
Stable Sort
• It is possible for two records in a list to have the same key
• A sorting algorithm is stable if for all records i and j such that k[i] equals k[j], if r[i]
preceded r[j] in the original list, r[i] precedes r[j] in the sorted list
– i.e., a stable sort keeps records with the same key in the same relative order that
they were in before the sort
Such a sort is known as a stable sort. For example, if two records with the same key appear
in the order 1A, 1B before sorting, a stable sort always outputs 1A before 1B, whereas an
unstable sort may output them in either order.
Sorting algorithms:
Exchange sort:
Comparison based. The basic idea is to compare two elements; if they are out of order, swap
them or move one of the elements. Examples: bubble sort, quick sort.
Selection sort:
An element is selected and placed in its correct position. Examples: selection sort, heap
sort.
Insertion sort:
Sorts by inserting each element into an already sorted list. Examples: insertion sort, merge sort.
Bubble sort:
Basic idea: Pass through the list sequentially several times. In each pass, every element in the
list is compared to its successor, and the two are interchanged if they are not in proper order.
Each pass places one element in its proper position; in general, a[N-i] is in its place
after pass i. Since each pass places a new element in its proper position, a total of N-1 passes
are required for a list of N elements. Moreover, all the elements in positions greater than or equal
to N-i are already in their proper positions after pass i, so they need not be considered in
succeeding passes.
Procedure:
void swap(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

void bubblesort(int a[], int N)
{
    int pass, j;
    for (pass = 0; pass < N - 1; pass++)
        for (j = 0; j < N - pass - 1; j++)
            if (a[j] > a[j + 1])
                swap(&a[j], &a[j + 1]);
}
25 57 48 37 12 92 86 33 Interchange
25 57 48 37 12 92 86 33 No
25 57 48 37 12 92 86 33 Yes
25 48 57 37 12 92 86 33 Yes
25 48 37 57 12 92 86 33 Yes 1st Pass
25 48 37 12 57 92 86 33 No
25 48 37 12 57 92 86 33 Yes
25 48 37 12 57 86 92 33 Yes
25 48 37 12 57 86 33 92
25 48 37 12 57 86 33 |92 No
25 48 37 12 57 86 33 |92 Yes
25 37 48 12 57 86 33 |92 Yes
25 37 12 48 57 86 33 |92 No 2nd Pass
25 37 12 48 57 86 33 |92 No
25 37 12 48 57 86 33 |92 Yes
25 37 12 48 57 33 86 |92
25 37 12 48 57 33 86 92 Interchange
25 37 12 48 57 33 |86 92 No
25 37 12 48 57 33 |86 92 Yes
25 12 37 48 57 33 |86 92 No 3rd Pass
25 12 37 48 57 33 |86 92 No
25 12 37 48 57 33 |86 92 Yes
25 12 37 48 33 57 |86 92
25 12 37 48 33 |57 86 92 Interchange
25 12 37 48 33 |57 86 92 Yes
12 25 37 48 33 |57 86 92 No
12 25 37 48 33 |57 86 92 No 4th Pass
12 25 37 48 33 |57 86 92 Yes
12 25 37 33 48 |57 86 92
12 25 37 33 |48 57 86 92 Interchange
12 25 37 33 |48 57 86 92 No
12 25 37 33 |48 57 86 92 No 5th Pass
12 25 37 33 |48 57 86 92 Yes
12 25 33 37 |48 57 86 92
12 25 33 |37 48 57 86 92 Interchange
12 25 33 |37 48 57 86 92 No
12 25 33 |37 48 57 86 92 No 6th Pass
12 25 33 |37 48 57 86 92
12 25 33 37 48 57 86 92 Interchange
12 25 |33 37 48 57 86 92 No 7th Pass
12 25 |33 37 48 57 86 92
Algorithm:
• Given a list A of size N, the following algorithm uses bubble sort to sort the list
– For pass = 0 To N – 2
• For j = 0 To N – pass – 2
– If A[j] > A[j + 1]
Swap the elements A[j] and A[j + 1]
End If
• End For
– End For
Efficiency:
This algorithm is good for small n, usually fewer than 100 elements.
No. of comparisons = (n-1) + (n-2) + … +2 + 1
= (n-1)((n-1) + 1)/2
= n(n-1)/2
= O(n²)
No. of interchanges:
• This cannot be greater than the number of comparisons
• In the best case, there are no interchanges
• In the worst case, this equals the number of comparisons
The average and worst case running time of bubble sort is O(n²).
It is actually the interchanges, rather than the comparisons, that take up most of the
program's execution time.
When the elements are large and the interchange operation is expensive, it is better to maintain an
array of pointers to the elements. One can then interchange the pointers rather than the elements
themselves.
Insertion Sort
Basic idea: Sorts a list of records by inserting each new element into an existing sorted list. An
initial list with only one item is considered sorted. For a list of size N, N-1 passes
are made, and before pass i the elements a[0] through a[i-1] are sorted.
Take the element a[i], find the proper place to insert a[i] within positions 0, 1, ..., i-1, and
insert a[i] at that place.
To insert the new item into the list:
– Search for its position in the sorted sublist from last toward first
– While searching, move elements one position right to make room for a[i]
– Place a[i] in its proper place
Initially 25 57 48 37 12 92 86 33
Pass 1 25 57 48 37 12 92 86 33 Insert 57
Pass 2 25 48 57 37 12 92 86 33 Insert 48
Pass 3 25 37 48 57 12 92 86 33 Insert 37
Pass 4 12 25 37 48 57 92 86 33 Insert 12
Pass 5 12 25 37 48 57 92 86 33 Insert 92
Pass 6 12 25 37 48 57 86 92 33 Insert 86
Pass 7 12 25 33 37 48 57 86 92 Insert 33
C-Procedure
void InsertionSort(int a[], int N)
{
    int i, j;
    int hold;               /* the current element to insert */
    for (i = 1; i < N; i++)
    {
        hold = a[i];
        /* shift larger elements of the sorted sublist one position right */
        for (j = i - 1; j >= 0 && a[j] > hold; j--)
            a[j + 1] = a[j];
        a[j + 1] = hold;    /* place a[i] in its proper position */
    }
}
Efficiency:
No. of comparisons:
Best case: n – 1
Worst case: n²/2 + O(n)
Average case: n²/4 + O(n)
No. of assignments (movements):
Best case: 2(n – 1) // moving from a[i] to hold and back
Worst case: n²/2 + O(n)
Average case: n²/4 + O(n)
Hence the running time of insertion sort is O(n²) in the worst and average cases and O(n) in the best case,
and the space requirement is O(1).
Advantages:
It is an excellent method whenever a list is nearly in the correct order and few items are
far from their correct locations.
Since it shifts elements rather than swapping them, it is roughly twice as fast as bubble sort.
Disadvantage:
It does a large amount of shifting of already sorted elements when inserting later elements.
Selection Sort:
The selection sort algorithm sorts a list by selecting successive elements in order and placing
them into their proper sorted positions.
A list of size N requires N-1 passes:
For each pass i,
• Find the position of the ith largest (or smallest) element.
• To place the ith largest (or smallest) element in its proper position, swap it with the
element currently occupying that position.
C – Procedure
void SelectionSort(int a[], int N)
{
int i, j;
int maxpos;
for (i = N-1; i > 0; i--) //Find the position of largest element from 0 to i
{
maxpos = 0;
for (j = 1; j <= i; j++)
if (a[j] > a[maxpos])
maxpos = j;
if(maxpos != i)
swap(&a[maxpos], &a[i]); //Place the ith largest element
} // in its place
}
Tracing: Initially 25 57 48 37 12 92 86 33
Find largest between a[0] and a[7] -> 92, swap 92 with the last element 33
Pass 1 25 57 48 37 12 33 86 92
Find largest between a[0] and a[6] -> 86, since 86 is in 6th position and so is i, no
interchange.
Pass 2 25 57 48 37 12 33 86 92
Find largest between a[0] and a[5] -> 57, swap with 33
Pass 3 25 33 48 37 12 57 86 92
Find largest between a[0] and a[4] -> 48, swap 48 with 12
Pass 4 25 33 12 37 48 57 86 92
Find largest between a[0] and a[3] -> 37, No swap since i = maxpos
Pass 5 25 33 12 37 48 57 86 92
Find largest between a[0] and a[2] -> 33, swap 33 with 12
Pass 6 25 12 33 37 48 57 86 92
Find largest between a[0] and a[1] -> 25, swap 12 with 25
Pass 7 12 25 33 37 48 57 86 92
Efficiency:
No of comparisons:
Best, average and worst case: n(n – 1)/2
No of assignments (movements)
Best, average and worst case: 3(n – 1), (total n – 1 swaps)
If we include a test, to prevent interchanging an element with itself, the number of
interchanges in the best case would be 0.
Hence the running time of selection sort is O(n²) and the additional space requirement is O(1).
• Advantages:
– It is the best algorithm in regard to data movement
– An element that is in its correct final position will never be moved and only
one swap is needed to place an element in its proper position
• Disadvantages
– In terms of the number of comparisons, it pays no attention to the original ordering
of the list. For a list that is nearly in order to begin with, selection sort is slower
than insertion sort
Quick Sort:
It is one of the fastest known sorting algorithms used in practice.
Basic idea
Divide the list into two sublists such that all elements in the first sublist are less than some
pivot key and all elements in the second sublist are greater than the pivot key,
then sort the sublists independently and combine them.
Algorithm:
If size of list A is greater than 1
• Pick any element v from A. This is called the pivot
• Partition the list A by placing v in some position j, such that
o all elements before position j are less than or equal to v
o all elements after position j are greater than or equal to v
• Recursively sort the sublists A[0] through A[j-1] and A[j+1] through A[N-1]
• Return A[0] through A[j-1] followed by A[j] (the pivot) followed by A[j+1] through
A[N-1]
25 57 48 37 12 92 86 33
Choose the first element 25 as the pivot and partition the array
12 25 (57 48 37 92 86 33)
Choose the first element 57 of the second subarray as the pivot and partition the
subarray
12 25 ( 48 37 33) 57 (92 86 )
Tracing Example:
Sorting the following data using Quick Sort.
25 57 48 37 12 92 86 33
25 57 48 37 12 92 86 33
Choose the first element 25 as the pivot. The pointer down starts at the pivot and moves
right, stopping at an element greater than the pivot; the pointer up starts at the right end
and moves left, stopping at an element less than or equal to the pivot.
down stops at 57, up stops at 12; down < up, so swap them:
25 12 48 37 57 92 86 33
down stops at 48, up stops at 12; down >= up, so swap the pivot with a[up]:
(12) 25 (48 37 57 92 86 33)
The list is divided into two sublists and the pivot is at its proper position. The first sublist
contains only one element, so it is automatically sorted. Now choose 48 as the pivot for the
second sublist:
12 25 48 37 57 92 86 33
down stops at 57, up stops at 33; down < up, so swap them:
12 25 48 37 33 92 86 57
down stops at 92, up stops at 33; down >= up, so swap the pivot with a[up]:
12 25 33 37 48 (92 86 57)
Partitioning (92 86 57) with pivot 92 moves 92 to the end of that sublist:
12 25 33 37 48 (57 86) 92
Pivot 57: down stops at 86, up stops at 57; down >= up, so the pivot stays where it is:
12 25 33 37 48 57 (86) 92
The sublist (86) is automatically sorted, and the whole list is now in order:
12 25 33 37 48 57 86 92
Efficiency:
No. of comparisons:
Average case: O(n log n)
Worst case: O(n²)
No. of interchanges (swaps):
Average case: O(n log n)
Worst case: O(n²)
Hence, the time complexity of quick sort is O(n log n) in the average case and O(n²) in the worst
case.
Merge Sort:
Merge sort also uses the divide and conquer approach. It divides the list into sublists, sorts
them, and then merges the two sorted sublists into a single sorted list.
Algorithm outline
If size of list is greater than 1
• divide the list into two sublists of sizes as nearly equal as possible
• recursively sort each sublist separately
• merge the two sorted sublists into a single sorted list
End if
• First Phase: Partition the list into two halves repeatedly, until each sublist has size 1
25 57 48 37 12 92 86 33
25 57 48 37 | 12 92 86 33
25 57 | 48 37 | 12 92 | 86 33
25 | 57 | 48 | 37 | 12 | 92 | 86 | 33
• Second Phase: Merge pairs of sorted sublists into larger sorted sublists
25 57 | 37 48 | 12 92 | 33 86
25 37 48 57 | 12 33 86 92
12 25 33 37 48 57 86 92
Merging description:
To simplify, the algorithm merges the two sorted sublists into a third list.
When finished, we copy the third list back over the original two halves to get the sorted list.
The basic merging algorithm takes two sorted input arrays A and B, and an output array C.
We initialize the pointers Aptr, Bptr, and Cptr to point to the beginnings of their respective
arrays. The smaller of A[Aptr] and B[Bptr] is copied to the next entry in C and the appropriate
pointers are advanced. When one of the lists is exhausted, the remainder of the other
list is copied to C.
A: 25 37 48 57 (Aptr at 25)
B: 12 33 86 92 (Bptr at 12)
C: (empty)
Compare 25 and 12, copy the minimum (12) into array C, and advance Bptr and Cptr:
A: 25 37 48 57 (Aptr at 25)
B: 12 33 86 92 (Bptr at 33)
C: 12
Compare 25 and 33, copy the minimum (25) into array C, and advance Aptr and Cptr:
A: 25 37 48 57 (Aptr at 37)
B: 12 33 86 92 (Bptr at 33)
C: 12 25
Compare 37 and 33, copy the minimum (33) into array C, and advance Bptr and Cptr.
In this way the two lists are merged.
C-Code:
#define N 100

void msort(int x[], int temp[], int left, int right);
void merge(int x[], int temp[], int left, int mid, int right);

void main()
{
    int n, i;
    int x[N];
    int temp[N];
    clrscr();
    printf("\nEnter no. of elements to sort: ");
    scanf("%d", &n);
    printf("\nEnter elements to sort:\n");
    for (i = 0; i < n; i++)
        scanf("%d", &x[i]);
    msort(x, temp, 0, n - 1);     /* perform merge sort on the array */
    printf("Sorted List\n");
    for (i = 0; i < n; i++)
        printf("%d\n", x[i]);
    getch();
}

void msort(int x[], int temp[], int left, int right)
{
    int mid;
    if (left < right)
    {
        mid = (left + right) / 2;
        msort(x, temp, left, mid);
        msort(x, temp, mid + 1, right);
        merge(x, temp, left, mid + 1, right);
    }
}
void merge(int x[], int temp[], int left, int mid, int right)
{
    /* merge the sorted halves x[left..mid-1] and x[mid..right] through temp */
    int i, lend, tmpos, no_element;
    lend = mid - 1;
    tmpos = left;
    no_element = right - left + 1;

    while (left <= lend && mid <= right)
        if (x[left] <= x[mid])
            temp[tmpos++] = x[left++];
        else
            temp[tmpos++] = x[mid++];
    while (left <= lend)                 /* copy any remaining left half  */
        temp[tmpos++] = x[left++];
    while (mid <= right)                 /* copy any remaining right half */
        temp[tmpos++] = x[mid++];
    for (i = 0; i < no_element; i++, right--)   /* copy back to x */
        x[right] = temp[right];
}
Efficiency:
No. of comparisons:
In all cases the number of comparisons is O(n log n); only the constant term differs between
cases. On average, it requires fewer than n log n – n + 1 comparisons.
No. of assignments:
For our implementation, it is twice the number of comparisons (merging into the temporary array
and copying back to the original array), which is still O(n log n).
Space Complexity:
In contrast to the other sorting algorithms we have studied, merge sort requires O(n) extra space
for the temporary array used while merging. Algorithms have been developed for
performing an in-place merge in O(n) time, but this would increase the number of assignments.
If the recursive version is used, additional space is required for the implicit stack, which is
O(log n). Hence, the space complexity of merge sort is O(n).
A binary search tree can also be used for sorting, but for an imbalanced tree (right skewed or
left skewed) the total time approaches n². To reduce this, an AVL tree can be maintained,
which improves performance to about n log n. Still, a BST requires time to search and retrieve
the data, and after deletion of elements there is some burden in maintaining the BST property;
in effect, the tree is accessed twice. To minimize the retrieval time, a heap is created instead:
in heap sort, creating the heap takes time, but retrieving the maximum takes essentially none.
Heap Sort
• The heap sort algorithm sorts by representing its input as a heap in the array
• Two phases in sorting
1. Converts the array representation of the tree into a heap
2. Repeatedly moves the largest element to the last position by swapping the first
element with the last element and adjusts the heap property at each stage in the
remaining elements
First Phase:
1. The entries in the array being sorted are interpreted as a binary tree in array
implementation
2. Tree with only one node automatically satisfies the heap property. So, we don’t need
to worry about any of the leaves.
3. Start from the level above the leaf nodes, and work backward towards the root.
Let's trace the following elements:
37 33 26 92 57 18 48 25 12 86 42 22
This array can be represented as the following binary tree (one level per row):
37
33 26
92 57 18 48
25 12 86 42 22
Adjust the heap property at 18 (the last non-leaf node): its child 22 is larger, so they are swapped.
37
33 26
92 57 22 48
25 12 86 42 18
37 33 26 92 57 22 48 25 12 86 42 18
Adjust property at 57
37
33 26
92 86 22 48
25 12 57 42 18
37 33 26 92 86 22 48 25 12 57 42 18
At 92 the subtree already satisfies the heap property, so it remains as it is.
Adjust the property at 26: here 48 should come up.
37
33 48
92 86 22 26
25 12 57 42 18
37 33 48 92 86 22 26 25 12 57 42 18
Adjust at 33
37
92 48
33 86 22 26
25 12 57 42 18
37 92 48 33 86 22 26 25 12 57 42 18
Adjust at 37
92
86 48
33 57 22 26
25 12 37 42 18
92 86 48 33 57 22 26 25 12 37 42 18
Second Phase:
• Note that the root (the first element of the array) has the largest key
• Repeat these steps until the size of heap becomes 1
1. Move the largest key at root to the last position of the heap, replacing an entry
x currently at the last position
2. Decrease a counter i that keeps track of the size of the heap, thereby excluding
the largest key from further sorting
3. The element x may not belong to the root of the heap, so insert x into the
proper position to restore the heap property
Move the largest element, 92, to the last position of the heap by swapping it with the entry
18 currently in the last position, and decrease the heap size to 11. Now we must put 18 into
its proper position to restore the heap property: 18 sinks below 86, then 57, then 42.
Each subsequent "delete" step does the same: swap the root with the last entry of the current
heap, shrink the heap by one, and sift the new root down until the heap property is restored.
In the arrays below, "|" separates the heap from the sorted tail.

92 86 48 33 57 22 26 25 12 37 42 18 (initial heap)
After deleting 92: 86 57 48 33 42 22 26 25 12 37 18 | 92
After deleting 86: 57 42 48 33 37 22 26 25 12 18 | 86 92
After deleting 57: 48 42 26 33 37 22 18 25 12 | 57 86 92
After deleting 48: 42 37 26 33 12 22 18 25 | 48 57 86 92
After deleting 42: 37 33 26 25 12 22 18 | 42 48 57 86 92
After deleting 37: 33 25 26 18 12 22 | 37 42 48 57 86 92
After deleting 33: 26 25 22 18 12 | 33 37 42 48 57 86 92
After deleting 26: 25 18 22 12 | 26 33 37 42 48 57 86 92
After deleting 25: 22 18 12 | 25 26 33 37 42 48 57 86 92
After deleting 22: 18 12 | 22 25 26 33 37 42 48 57 86 92
After deleting 18: 12 | 18 22 25 26 33 37 42 48 57 86 92
The array is now sorted: 12 18 22 25 26 33 37 42 48 57 86 92
• Hence the time complexity of heap sort is O(n log n) for both the worst case and the average
case, and the space complexity is O(1). In the average case it is not as efficient as quick sort;
however, it is far superior to quick sort in the worst case. Generally, heap sort is used for
large amounts of data.
Shell Sort
A significant improvement on simple insertion sort can be achieved by using shell sort (or
diminishing increment sort). This method separates the original file into subfiles that
contain every kth element of the original file. The value of k is called an increment. E.g., if
k = 5, then the subfile consisting of x[0], x[5], x[10], ... is sorted first.
After the first k subfiles are sorted (usually by simple insertion), a new smaller value of k is
chosen and the file is again partitioned into a new set of subfiles. Each of these larger subfiles
is sorted and the process is repeated again, until eventually the value of k is set to 1.
81, 94, 11, 96, 12, 35, 17, 95, 28, 58, 41, 75, 15
Let hk = 13/2 = 6, so the first increment is 6 and the file is divided into 6 subfiles, each
sorted by insertion. Next, hk = 6/2 = 3, and after the increment-3 pass the list becomes the
first row below. Finally, hk = 3/2 = 1, and the last pass is an ordinary insertion sort:
15 12 11 17 41 28 58 94 35 81 95 75 96
15 12 11 17 41 28 58 94 35 81 95 75 96
12 15 11 17 41 28 58 94 35 81 95 75 96
11 12 15 17 41 28 58 94 35 81 95 75 96
11 12 15 17 41 28 58 94 35 81 95 75 96
11 12 15 17 41 28 58 94 35 81 95 75 96
11 12 15 17 28 41 58 94 35 81 95 75 96
11 12 15 17 28 41 58 94 35 81 95 75 96
11 12 15 17 28 41 58 94 35 81 95 75 96
11 12 15 17 28 35 41 58 94 81 95 75 96
11 12 15 17 28 35 41 58 81 94 95 75 96
11 12 15 17 28 35 41 58 81 94 95 75 96
11 12 15 17 28 35 41 58 75 81 94 95 96
11 12 15 17 28 35 41 58 75 81 94 95 96
Efficiency
Worst case: O(n²)
Average case: O(n(log n)²) (if an appropriate increment sequence is used)
Radix Sort
The sorting is based on the values of the actual digits in the positional representations of the
numbers being sorted.
Process
Beginning with the least-significant digit and ending with the most-significant digit, perform
the following action,
Take each number in the order in which it appears in the file and place it into one of the ten
queues, depending on the value of the digit currently being processed.
Then restore each queue to the original file starting with the queue of numbers with a 0 digit
and ending with the queue of numbers with a 9 digit.
When these actions have been performed for each digit, starting with the least significant
digit and ending with the most significant, the file is sorted.
Tracing example:
We have 64, 8, 216, 512, 27, 729, 0, 1, 343, 125
First pass (queue by no % 10):
digit 0: 0    digit 1: 1    digit 2: 512    digit 3: 343    digit 4: 64
digit 5: 125  digit 6: 216  digit 7: 27     digit 8: 8      digit 9: 729
After collecting the queues: 0 1 512 343 64 125 216 27 8 729
Second pass (queue by (no / 10) % 10):
digit 0: 0, 1, 8    digit 1: 512, 216    digit 2: 125, 27, 729    digit 4: 343    digit 6: 64
After collecting the queues: 0 1 8 512 216 125 27 729 343 64
Third pass (queue by (no / 100) % 10):
digit 0: 0, 1, 8, 27, 64    digit 1: 125    digit 2: 216    digit 3: 343    digit 5: 512    digit 7: 729
After collecting the queues: 0 1 8 27 64 125 216 343 512 729
Comparison table:
Algorithm        Comments
Bubble Sort      Good for small n, usually less than 10
Quick Sort       Excellent for a virtual memory environment
Insertion Sort   Good for almost sorted records
Selection Sort   Good for partially sorted data and small n
Merge Sort       Good for external file sorting
Heap Sort        As efficient as quick sort in the average case and far superior to
                 quick sort in the worst case
Radix Sort       Good when the number of digits (letters) is small