PRACTICAL CONSIDERATION OF INTERNAL AND EXTERNAL SORTING
Dr B R Ambedkar National Institute of Technology, Jalandhar
Contents for Today’s Lecture
• Concept of Sorting
• Concept of Internal Sorting
• Practical consideration of Internal Sorting
• Concept of External Sorting
• Practical consideration of External Sorting
• Conclusion
SORTING
In computer science, arranging items in an ordered
sequence is called "sorting".
Sorting is a common operation in many
applications, and efficient algorithms to perform it
have been developed.
The most common uses of sorted sequences are:
•making merging of sequences efficient;
•making lookup or search efficient;
•enabling processing of data in a defined order.
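The first two uses can be illustrated with a small C++ sketch built on the standard library: std::binary_search looks up a value in a sorted range in O(log n) comparisons, and std::merge combines two sorted ranges in a single linear pass. The helper names contains and mergeSorted are hypothetical and only wrap those standard calls.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Lookup: binary search is only valid because the input is already sorted.
bool contains(const std::vector<int>& sorted, int value) {
    return std::binary_search(sorted.begin(), sorted.end(), value);
}

// Merging: two sorted sequences are combined in one left-to-right pass.
std::vector<int> mergeSorted(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

For example, mergeSorted({1, 4, 9}, {2, 3, 10}) yields {1, 2, 3, 4, 9, 10}.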
INTERNAL SORTING
•Internal sorting is a sorting technique in which
the entire sorting takes place inside the main
memory of the computer.
•No external (secondary) memory is needed while
the sorting program executes.
•It is used when the input is small enough to fit in main memory.
Internal sorting algorithms include bubble sort, insertion
sort, selection sort, etc.
INTERNAL SORTING(continued)
Practical consideration
Bubble sort
•Bubble sort is one of the simplest sorting algorithms; it
works by repeatedly swapping adjacent
elements that are in the wrong order.
•Several passes are performed; in each pass the smallest
remaining element bubbles up to the top (front) of the unsorted part of the array.
•Function for bubble sort:
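#include <utility>  // std::swap

// ELEM is any element type; key(e) returns the sort key of element e.
template <typename ELEM, typename KeyFn>
void bubsort(ELEM* array, int n, KeyFn key) {
  for (int i = 0; i < n - 1; i++)            // pass i places the i-th smallest at array[i]
    for (int j = n - 1; j > i; j--)          // scan from the end toward position i
      if (key(array[j]) < key(array[j - 1]))
        std::swap(array[j], array[j - 1]);   // the smaller key moves one step up
}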
INTERNAL SORTING(continued)
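Practical consideration
Insertion sort
•Insertion sort is a simple sorting algorithm that
builds the final sorted array (or list) one item at a
time.
•In step N, the Nth element is inserted into its correct
position within the already sorted list of the first (N-1) elements.
•It is convenient to use with a linked list, because inserting
a node requires no shifting, whereas in an array every larger
element must be shifted one position to make room.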
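A minimal C++ sketch of insertion sort on an array (here a std::vector<int>, an assumption for illustration). The inner while loop is the shifting work that an array requires and that a linked-list version would avoid; the function name insertionSort is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Before step i, a[0..i-1] is already sorted; a[i] is inserted into that
// sorted prefix by shifting every larger element one slot to the right.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); i++) {
        int item = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > item) {
            a[j] = a[j - 1];  // shift the larger element right
            j--;
        }
        a[j] = item;          // drop the item into the gap
    }
}
```

Because only strictly larger elements are shifted, equal keys keep their original relative order, so this version is stable.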
EXTERNAL SORTING
•External sorting is a technique used when the data is stored
in secondary memory: the data is loaded into main memory
part by part, and each part is sorted there.
•Each sorted part is then written to an intermediate
file. Finally, these files are merged to produce the fully sorted data.
•Thus, by using the external sorting technique, a huge
amount of data that does not fit in main memory can be sorted.
•In external sorting, the data cannot all be
accommodated in main memory, so part of it must be kept
on secondary storage such as a hard disk or a compact disc.
EXTERNAL SORTING(continued)
External sorting is required when the data to be sorted
does not fit into main memory. It consists of two phases:
•Sorting phase: chunks of the data, each small enough to
fit in main memory, are sorted and written to intermediate files.
•Merge phase: the sorted files are
combined into a single larger file.
One of the best-known examples of external sorting is
external merge sort.
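The two phases can be simulated in memory with a short C++ sketch. Here a std::vector<int> named Run stands in for one intermediate file, and the parameter memory (records that fit in main memory, assumed to be at least 1) is a hypothetical name; a real implementation would read and write disk files instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using Run = std::vector<int>;  // stands in for one intermediate file on disk

// Sorting phase: take at most `memory` records at a time, sort, emit a run.
std::vector<Run> makeRuns(const std::vector<int>& input, std::size_t memory) {
    std::vector<Run> runs;
    for (std::size_t start = 0; start < input.size(); start += memory) {
        std::size_t stop = std::min(start + memory, input.size());
        Run chunk(input.begin() + start, input.begin() + stop);
        std::sort(chunk.begin(), chunk.end());
        runs.push_back(std::move(chunk));
    }
    return runs;
}

// Merge phase: merge runs pairwise, pass after pass, until one run remains.
Run mergePhase(std::vector<Run> runs) {
    if (runs.empty()) return {};
    while (runs.size() > 1) {
        std::vector<Run> next;
        for (std::size_t i = 0; i + 1 < runs.size(); i += 2) {
            Run merged;
            std::merge(runs[i].begin(), runs[i].end(),
                       runs[i + 1].begin(), runs[i + 1].end(),
                       std::back_inserter(merged));
            next.push_back(std::move(merged));
        }
        if (runs.size() % 2 == 1) next.push_back(std::move(runs.back()));  // odd run carries over
        runs = std::move(next);
    }
    return runs.front();
}
```

For instance, makeRuns on 10 values with memory = 3 produces 4 runs, and mergePhase turns them into one sorted sequence.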
EXTERNAL SORTING(continued)
Practical consideration
•External merge sort divides the data into chunks that
fit in main memory, sorts each chunk independently,
writes it to an intermediate file, and then merges
these files to obtain the sorted data.
•EXAMPLE:
Suppose 10,000 records have
to be sorted using the external
merge sort method, and the main memory can hold
500 records, i.e. 5 blocks of 100 records
each.
[Figure: External Merge Sort Practical Example]
In this example, 5 blocks (500 records) at a time are sorted
and written to an intermediate file. This process is repeated 20
times (10,000 / 500 = 20) to cover all the records. Then pairs of
intermediate files are merged in main memory, pass after pass,
until a single sorted output remains.
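The arithmetic of this example can be checked with a small C++ sketch. The helper names initialRuns and mergePasses are hypothetical; the sketch assumes every initial run holds exactly one memory load of records and that each merge pass combines groups of fanIn runs (fanIn = 2 for pairwise merging).

```cpp
#include <cstddef>

// Number of initial runs when each run holds one memory load of records.
std::size_t initialRuns(std::size_t records, std::size_t memory) {
    return (records + memory - 1) / memory;  // ceiling division
}

// Number of merge passes when `fanIn` runs are merged at a time.
int mergePasses(std::size_t runs, std::size_t fanIn) {
    int passes = 0;
    while (runs > 1) {
        runs = (runs + fanIn - 1) / fanIn;  // one pass merges groups of fanIn runs
        passes++;
    }
    return passes;
}
```

With 10,000 records and a 500-record memory there are 20 runs, and pairwise merging shrinks them 20 → 10 → 5 → 3 → 2 → 1, i.e. 5 merge passes.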
TWO WAY MERGE SORT
•Like QuickSort, Merge Sort is a Divide and Conquer
algorithm.
•It divides the input array into two halves, calls itself for
the two halves, and then merges the two sorted halves.
•The merge() function is used for merging the two
halves. merge(arr, l, m, r) is the key process: it
assumes that arr[l..m] and arr[m+1..r] are sorted and
merges the two sorted sub-arrays into one.
TWO WAY MERGE SORT(cont.)
ALGORITHM
MergeSort(arr[], l, r)
If r > l
1. Find the middle point to divide the array into two halves:
   middle m = (l+r)/2
2. Call mergeSort for the first half: mergeSort(arr, l, m)
3. Call mergeSort for the second half: mergeSort(arr, m+1, r)
4. Merge the two halves sorted in steps 2 and 3: merge(arr, l, m, r)
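A minimal C++ sketch of the algorithm above, using std::vector<int> (an assumption) and inclusive indices l..r as in the pseudocode. It computes the middle as l + (r - l)/2, which equals (l+r)/2 but cannot overflow for large indices.

```cpp
#include <cstddef>
#include <vector>

// merge(arr, l, m, r): assumes arr[l..m] and arr[m+1..r] are sorted and
// merges them into one sorted range arr[l..r] (inclusive bounds).
void merge(std::vector<int>& arr, int l, int m, int r) {
    std::vector<int> left(arr.begin() + l, arr.begin() + m + 1);
    std::vector<int> right(arr.begin() + m + 1, arr.begin() + r + 1);
    std::size_t i = 0, j = 0;
    int k = l;
    while (i < left.size() && j < right.size())
        arr[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];  // <= keeps it stable
    while (i < left.size()) arr[k++] = left[i++];    // copy any leftovers
    while (j < right.size()) arr[k++] = right[j++];
}

// mergeSort(arr, l, r): sorts arr[l..r] following steps 1-4.
void mergeSort(std::vector<int>& arr, int l, int r) {
    if (r > l) {
        int m = l + (r - l) / 2;   // step 1: middle point
        mergeSort(arr, l, m);      // step 2: first half
        mergeSort(arr, m + 1, r);  // step 3: second half
        merge(arr, l, m, r);       // step 4: merge the sorted halves
    }
}
```

Usage: mergeSort(v, 0, static_cast<int>(v.size()) - 1) sorts the whole vector v.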
TWO WAY MERGE SORT(cont.)
Multi-way Mergesort
Idea: Do a K-way merge instead of a 2-way merge.
Find the smallest of K elements at each
merge step.
Multi-way Mergesort Algorithm
Algorithm:
1. As before, read M values at a time into internal
memory, sort them, and write them to disk as runs.
2. Merge K runs:
• Read the first value of each of the K runs into an
internal array and build a min-heap.
• Remove the minimum from the heap and write it to disk.
• Read the next value from the same run as the removed
minimum and insert it into the heap.
Repeat these steps until the first K runs are fully merged, then merge the next group of K runs.
Repeat the merge on larger and larger runs until there is
just one large run: the sorted list.
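Step 2 can be sketched in C++ with std::priority_queue used as a min-heap (via std::greater). In this sketch the runs are in-memory vectors and the output vector stands in for the file on disk; both are simplifying assumptions, and the function name kWayMerge is hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// K-way merge of sorted runs with a min-heap.
// Heap entry: (value, index of its run, position of the value in that run).
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& runs) {
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // Read the first value of each run and build the heap.
    for (std::size_t r = 0; r < runs.size(); r++)
        if (!runs[r].empty()) heap.emplace(runs[r][0], r, 0);

    std::vector<int> output;  // stands in for the output file on disk
    while (!heap.empty()) {
        Entry top = heap.top();  // the current minimum of the K front values
        heap.pop();
        int value = std::get<0>(top);
        std::size_t r = std::get<1>(top);
        std::size_t pos = std::get<2>(top);
        output.push_back(value);               // "write to disk"
        if (pos + 1 < runs[r].size())          // refill from the same run
            heap.emplace(runs[r][pos + 1], r, pos + 1);
    }
    return output;
}
```

Each output value costs one heap removal and at most one insertion, so choosing the minimum takes O(log K) comparisons instead of scanning all K run fronts.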
Multi-way Mergesort (cont.)
Let N = Number of records
B = Size of a Block (in records)
M = Size of internal memory (in records)
K = Number of runs to merge at once
Specific Example:
M = 80 records
B = 10 records
N = 16,000,000 records
So, K = ½ (M/B) = ½ (80/10) = 4
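Assuming, as in step 1 of the algorithm, that each initial run holds M records, the specific example works out as follows (a worked calculation, not a figure taken from the slides):

```latex
\text{initial runs} = \left\lceil \frac{N}{M} \right\rceil = \frac{16{,}000{,}000}{80} = 200{,}000

\text{merge passes} = \left\lceil \log_{K} 200{,}000 \right\rceil = \left\lceil \log_{4} 200{,}000 \right\rceil = 9,
\quad \text{since } 4^{8} = 65{,}536 < 200{,}000 \le 4^{9} = 262{,}144
```

Together with the run-creation pass, each record is therefore read and written about 10 times.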
EXTERNAL SORTING
(PRACTICAL CONSIDERATION)
The Sort Benchmark, created by computer scientist
Jim Gray, compares external sorting algorithms
implemented using finely tuned hardware and
software.
Winning implementations use several techniques:
•Using parallelism
•Increasing hardware speed
•Increasing software speed
EXTERNAL SORTING
PRACTICAL CONSIDERATION(cont.)
Using parallelism
•Multiple disk drives can be used in parallel in order to
improve sequential read and write speed.
•Multiple machines connected by fast network links can
each sort part of a huge dataset in parallel.
•Spreading the work over many inexpensive machines can be a very cost-efficient improvement.
•Sorting software can use multiple threads to speed up
the process on modern multicore computers. Software
can use asynchronous I/O so that one run of data can be
sorted or merged while other runs are being read from or
written to disk.
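The multithreading idea can be sketched in C++ with std::thread: several chunks that are already in memory are sorted concurrently, one thread per chunk, before being merged. This is a simplified assumption (one thread per chunk, no asynchronous I/O), and the function name sortChunksInParallel is hypothetical.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Sort each in-memory chunk on its own thread. Each chunk plays the role
// of one run in the sorting phase; the runs can then be merged as before.
void sortChunksInParallel(std::vector<std::vector<int>>& chunks) {
    std::vector<std::thread> workers;
    workers.reserve(chunks.size());
    for (auto& chunk : chunks)
        // Each thread touches only its own chunk, so no locking is needed.
        workers.emplace_back([&chunk] { std::sort(chunk.begin(), chunk.end()); });
    for (auto& t : workers) t.join();  // wait until every run is sorted
}
```

In practice the number of threads would usually be capped near std::thread::hardware_concurrency() rather than created per chunk.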
EXTERNAL SORTING
PRACTICAL CONSIDERATION(cont.)
Increasing hardware speed
•Using more RAM for sorting can reduce the number of disk
seeks and avoid the need for more passes.
•Fast external memory such as solid-state drives can speed up sorts,
either when the data is small enough to fit entirely on SSDs or, more
rarely, by accelerating the sorting of SSD-sized chunks in a three-pass
sort.
•Many other factors can affect hardware's maximum sorting
speed: CPU speed and number of cores, RAM access latency,
input/output bandwidth, disk read/write speed, disk seek time,
and others.
Conclusion
Internal sorting is used when the data is small enough
to fit in main memory, and external sorting is used when the
sorting process requires a hard disk or other
secondary memory in addition to main memory to perform
the required operations.