COS 212
Sorting: Radix Sort & Counting Sort
Bucket Sort
How would you sort a pile of books?
You could sort the books by the name of the author
Create 26 separate piles, one for each letter of the alphabet
Books by authors whose names start with ‘A’ go in the 1st pile,
books by authors whose names start with ‘B’ go in the 2nd pile, and
so on…
Now you have your books organised by the first letter
But there are still a lot of books by different authors in each pile…
Sort each of the 26 piles in the same way
Within each original pile, create 26 new piles
Books by authors whose names have a second letter of ‘A’ go in the 1st pile,
books by authors whose names have a second letter of ‘B’ go in the 2nd pile,
and so on…
Repeat this until each pile contains only one author’s books
This is a legitimate sorting algorithm used in the real world
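The pile-by-letter procedure can be sketched in Java as a recursive bucket sort. This is a minimal sketch, assuming author names contain only the uppercase letters 'A' to 'Z', and assuming that a name which runs out of letters (a prefix of a longer name) goes before the longer names in its pile. The class name LetterBucketSort is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LetterBucketSort {
    // Sorts names that (by assumption) contain only the letters 'A' to 'Z'
    public static List<String> sort(List<String> names) {
        return sort(names, 0);
    }

    // Distributes names into 26 piles by the letter at position pos,
    // then sorts each pile the same way using the next letter
    private static List<String> sort(List<String> names, int pos) {
        if (names.size() <= 1) return names;           // one book (or none): done
        List<String> finished = new ArrayList<>();     // names with no letter at pos
        List<List<String>> piles = new ArrayList<>();
        for (int i = 0; i < 26; i++) piles.add(new ArrayList<>());
        for (String name : names) {
            if (name.length() <= pos) finished.add(name);      // shorter name goes first
            else piles.get(name.charAt(pos) - 'A').add(name);  // 'A' -> pile 0, 'B' -> pile 1, ...
        }
        List<String> result = new ArrayList<>(finished);
        for (List<String> pile : piles)
            result.addAll(sort(pile, pos + 1));        // recurse on each pile with the next letter
        return result;
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("KNUTH", "KERNIGHAN", "AHO", "ADELSON", "KNUTH", "BENTLEY");
        System.out.println(sort(authors)); // [ADELSON, AHO, BENTLEY, KERNIGHAN, KNUTH, KNUTH]
    }
}
```

Identical names (two books by KNUTH) keep landing in the same pile until the letters run out, at which point they are all moved to the "finished" list, so the recursion always terminates.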
Radix Sort
Can we do the same thing with numbers?
We can sort by each digit, creating 10 buckets (for digits 0 to 9)
Let’s see how this works if we start with the first (leftmost) digit and move right
[Figure: data = [12 23 2 17 100] distributed among buckets 0 to 9, starting with the leftmost digit]
But we immediately have a problem
The values starting with 1 end up in the wrong order
Clearly, sorting from left to right doesn’t work
Radix Sort
How could we solve this problem?
Sort right to left, starting with the rightmost digit
Pad numbers with leading zeros to make them the same length
To store the multiple values that map to a digit
We’ll use one queue per digit
Our previous example also wastes a lot of memory
A separate destination array for each digit
The process is repeated as many times as there are digits in the longest number
This leads to a proliferation of arrays
Instead, we’ll move numbers back and forth between the array we’re sorting and a single array of queues
radixSort(data[])
    n = number of digits in largest number
    add leading zeros to numbers until all have n digits
    for d = n down to 1
        distribute numbers in data[] among queues 0 to 9 according to digit d
        for i = 0 up to 9
            dequeue each number in queue i and add it to the next position in data[]
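A minimal Java sketch of the pseudocode above, assuming non-negative int values. It uses java.util.ArrayDeque as the queue, and instead of physically adding leading zeros it relies on (x / factor) % 10 returning 0 for a missing leading digit, which has the same effect. Digits are processed rightmost first, matching "for d = n down to 1" when digit 1 is the leftmost. The class name QueueRadixSort is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

public class QueueRadixSort {
    // Sorts non-negative ints (assumption) using one FIFO queue per decimal digit
    public static void radixSort(int[] data) {
        if (data.length == 0) return;
        int max = Arrays.stream(data).max().getAsInt();
        int n = Integer.toString(max).length();    // number of digits in largest number
        @SuppressWarnings("unchecked")
        ArrayDeque<Integer>[] queues = new ArrayDeque[10];
        for (int q = 0; q < 10; q++) queues[q] = new ArrayDeque<>();
        int factor = 1;                             // 1, 10, 100, ... selects the current digit
        for (int d = 0; d < n; d++, factor *= 10) {
            // distribute: a missing leading digit reads as 0, acting like zero padding
            for (int x : data) queues[(x / factor) % 10].addLast(x);
            // collect queues 0..9 back into data[] in FIFO order
            int k = 0;
            for (ArrayDeque<Integer> queue : queues)
                while (!queue.isEmpty()) data[k++] = queue.removeFirst();
        }
    }

    public static void main(String[] args) {
        int[] data = {25, 5, 20, 315, 11, 77};
        radixSort(data);
        System.out.println(Arrays.toString(data)); // [5, 11, 20, 25, 77, 315]
    }
}
```

The FIFO order of each queue is what makes the algorithm correct: numbers that share the current digit leave the queue in the order produced by the previous pass.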
Radix Sort
Let’s apply radix sort to the array [25 5 20 315 11 77]
After padding: data = [025 005 020 315 011 077]
Pass 1 (rightmost digit):
    queue 0: 020    queue 1: 011    queue 5: 025 005 315    queue 7: 077
    data = [020 011 025 005 315 077]
Pass 2 (middle digit):
    queue 0: 005    queue 1: 011 315    queue 2: 020 025    queue 7: 077
    data = [005 011 315 020 025 077]
Pass 3 (leftmost digit):
    queue 0: 005 011 020 025 077    queue 3: 315
    data = [005 011 020 025 077 315] = [5 11 20 25 77 315]
Do you think it’s possible to implement radix sort recursively?
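One possible answer, sketched in Java under the assumption of non-negative ints: recursion fits naturally if we return to the book-sorting idea and start from the leftmost digit, sorting each bucket separately by the next digit. Unlike the failed left-to-right attempt earlier, buckets are never merged back together before their contents are sorted, so the result is correct. The names RecursiveRadixSort and msd are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecursiveRadixSort {
    // Sorts non-negative ints (assumption), most significant digit first
    public static void sort(int[] data) {
        int max = 0;
        for (int x : data) max = Math.max(max, x);
        int factor = 1;
        while (max / factor >= 10) factor *= 10;   // factor now selects the leftmost digit
        msd(data, 0, data.length, factor);
    }

    // Distributes data[lo..hi) into 10 buckets by the digit selected by factor,
    // writes the buckets back in order, then recurses on each bucket
    private static void msd(int[] data, int lo, int hi, int factor) {
        if (hi - lo <= 1 || factor == 0) return;   // one element, or no digits left
        List<List<Integer>> buckets = new ArrayList<>();
        for (int b = 0; b < 10; b++) buckets.add(new ArrayList<>());
        for (int i = lo; i < hi; i++) buckets.get((data[i] / factor) % 10).add(data[i]);
        int k = lo;
        for (List<Integer> bucket : buckets) {
            int start = k;
            for (int x : bucket) data[k++] = x;
            msd(data, start, k, factor / 10);      // sort this bucket by the next digit
        }
    }

    public static void main(String[] args) {
        int[] data = {25, 5, 20, 315, 11, 77};
        sort(data);
        System.out.println(Arrays.toString(data)); // [5, 11, 20, 25, 77, 315]
    }
}
```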
Radix Sort
Radix sort can be applied to binary representations
At each pass, sort according to b bits, starting from the right
How many queues?
Determined by the number of different bit strings of size b
For b = 2 there are 2² = 4 bit strings, and therefore 4 queues: 00, 01, 10, 11
A larger value for b results in
More queues (wastes space) but fewer passes (saves time)
A smaller value for b results in
Fewer queues (saves space) but more passes (wastes time)
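To make the space/time trade-off concrete, here is a minimal Java sketch of binary radix sort that extracts b bits per pass with a shift and a mask. It assumes non-negative ints (31 value bits), so it uses 2^b queues and ceil(31 / b) passes. The class name BinaryRadixSort and the limit 1 ≤ b ≤ 16 are assumptions for illustration.

```java
import java.util.ArrayDeque;

public class BinaryRadixSort {
    // Sorts non-negative ints, b bits per pass (assumes 1 <= b <= 16)
    public static void sort(int[] data, int b) {
        int numQueues = 1 << b;                    // 2^b distinct b-bit strings
        int mask = numQueues - 1;                  // selects the lowest b bits
        @SuppressWarnings("unchecked")
        ArrayDeque<Integer>[] queues = new ArrayDeque[numQueues];
        for (int q = 0; q < numQueues; q++) queues[q] = new ArrayDeque<>();
        // a non-negative int has 31 value bits, so this runs ceil(31 / b) passes
        for (int shift = 0; shift < 31; shift += b) {
            for (int x : data) queues[(x >>> shift) & mask].addLast(x);
            int k = 0;
            for (ArrayDeque<Integer> queue : queues)
                while (!queue.isEmpty()) data[k++] = queue.removeFirst();
        }
    }
}
```

With b = 1 the sketch uses 2 queues but 31 passes; with b = 8 it uses 256 queues but only 4 passes.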
Radix Sort: Efficiency
import java.util.ArrayDeque;

public class RadixSorter {
    private final int radix = 10;   // number of digits and buckets
    private final int digits = 10;  // max number of digits in a number (an int has at most 10)

    // assumes all values in data[] are non-negative; java.util.ArrayDeque serves as the queue
    public void radixsort(int[] data) {
        int d, j, k, factor;
        @SuppressWarnings("unchecked")
        ArrayDeque<Integer>[] queues = new ArrayDeque[radix];  // 1 queue per bucket: O(1)
        for (d = 0; d < radix; d++)
            queues[d] = new ArrayDeque<>();                     // radix is constant: O(1)
        // run through every digit, starting from the rightmost: body runs `digits` times
        for (d = 1, factor = 1; d <= digits; factor *= radix, d++) {
            for (j = 0; j < data.length; j++)                   // enqueue data[j]: O(n)
                queues[(data[j] / factor) % radix].addLast(data[j]);
            for (j = k = 0; j < radix; j++)                     // rebuild data[]: O(n)
                while (!queues[j].isEmpty())
                    data[k++] = queues[j].removeFirst();
        }
    }
}
Total complexity? O(dn), where d is the number of digits processed
What about the complexity of using queues?
Counting Sort
Array indices are always sorted (0 up to size-1)
[Figure: data = [6 7 2 5 1 0 3 4] stored at indices 0 to 7]
What if we treat each element as an index?
Create array tmp where the last index is the largest data value
Populate tmp array: tmp[array[i]] = array[i]
What if there are duplicate values?
What if there are gaps between numbers?
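The "element as index" idea can be sketched in Java as follows, assuming non-negative, distinct values. A sentinel of -1 marks the gaps so they can be skipped when reading tmp[] back, but duplicates still overwrite each other, which is the problem the count[] array of counting sort addresses. The names IndexSort and sortDistinct are hypothetical.

```java
import java.util.Arrays;

public class IndexSort {
    // Naive idea: treat each value as an index into tmp[].
    // Assumes non-negative, DISTINCT values; duplicates would overwrite each other.
    public static int[] sortDistinct(int[] data) {
        int max = 0;
        for (int x : data) max = Math.max(max, x);
        int[] tmp = new int[max + 1];             // last index is the largest data value
        Arrays.fill(tmp, -1);                     // -1 marks a gap (no such value)
        for (int x : data) tmp[x] = x;            // tmp[array[i]] = array[i]
        int[] out = new int[data.length];
        int k = 0;
        for (int v : tmp)
            if (v != -1) out[k++] = v;            // skip the gaps
        return out;
    }
}
```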
Counting Sort
count occurrences of each number in data[]; store occurrences in count[] indexed with numbers in data[];
    (the last index in count[] is equal to the largest number in data[])
for i = 1 up to count.length-1
    count[i] = the number of elements <= i;
// transfer numbers from data[] to tmp[]
for i = n-1 down to 0
    tmp[count[data[i]] - 1] = data[i];
    decrement count[data[i]];
transfer numbers from tmp[] to data[];
Example: data[] = [7 2 9 3 7 3 4 1] (indices 0 to 7)
Occurrences: count[] = [0 1 1 2 1 0 0 2 0 1] (indices 0 to 9)
After count[i] = count[i – 1] + count[i]: count[] = [0 1 2 4 5 5 5 7 7 8]
Counting Sort
for i = n-1 down to 0
    tmp[count[data[i]]-1] = data[i];
    decrement count[data[i]];
data[] = [7 2 9 3 7 3 4 1], starting count[] = [0 1 2 4 5 5 5 7 7 8]
i   data[i]   tmp[] (indices 0 to 7)   count[] (indices 0 to 9)
7   1         1 _ _ _ _ _ _ _          0 0 2 4 5 5 5 7 7 8
6   4         1 _ _ _ 4 _ _ _          0 0 2 4 4 5 5 7 7 8
5   3         1 _ _ 3 4 _ _ _          0 0 2 3 4 5 5 7 7 8
4   7         1 _ _ 3 4 _ 7 _          0 0 2 3 4 5 5 6 7 8
3   3         1 _ 3 3 4 _ 7 _          0 0 2 2 4 5 5 6 7 8
2   9         1 _ 3 3 4 _ 7 9          0 0 2 2 4 5 5 6 7 7
1   2         1 2 3 3 4 _ 7 9          0 0 1 2 4 5 5 6 7 7
0   7         1 2 3 3 4 7 7 9          0 0 1 2 4 5 5 5 7 7
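A direct Java translation of the counting sort pseudocode, assuming non-negative ints; on the array from the trace it fills tmp[] in the same order. The class name CountingSort is hypothetical.

```java
import java.util.Arrays;

public class CountingSort {
    // Sorts non-negative ints (assumption) following the counting sort pseudocode
    public static void countingSort(int[] data) {
        int n = data.length;
        if (n == 0) return;
        int max = Arrays.stream(data).max().getAsInt();
        int[] count = new int[max + 1];           // last index equals the largest value
        for (int x : data) count[x]++;            // count occurrences of each number
        for (int i = 1; i < count.length; i++)
            count[i] += count[i - 1];             // count[i] = number of elements <= i
        int[] tmp = new int[n];
        for (int i = n - 1; i >= 0; i--) {        // right to left keeps equal values stable
            tmp[count[data[i]] - 1] = data[i];
            count[data[i]]--;
        }
        System.arraycopy(tmp, 0, data, 0, n);     // transfer numbers from tmp[] to data[]
    }

    public static void main(String[] args) {
        int[] data = {7, 2, 9, 3, 7, 3, 4, 1};
        countingSort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 3, 4, 7, 7, 9]
    }
}
```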
Counting Sort: Complexity
count occurrences of each number in data[]; store occurrences in count[] indexed with numbers in data[];   O(n)
for i = 1 up to count.length-1   (How many times will this loop execute?)   O(max)
    count[i] = the number of elements <= i;
// transfer numbers from data[] to tmp[]
for i = n-1 down to 0   O(n)
    tmp[count[data[i]] - 1] = data[i];
    decrement count[data[i]];
transfer numbers from tmp[] to data[];   O(n)
Total complexity? O(n + max)
Imagine data looks like this: 3 5 100000000 (this largest number determines max)
Counting sort is very efficient, but only if max is not much larger than n
Can counting sort work on non-integers, e.g. strings?
Counting Radix Sort
public static void countingRadixSort(String[] data, int stringLen) {
    final int BUCKETS = 128;                 // 7-bit ASCII
    int N = data.length;                     // number of strings
    String[] tmp = new String[N];
    String[] in = data; String[] out = tmp;
    // repeat for each character in the string, starting with the last
    // (assumes every string has exactly stringLen ASCII characters)
    for (int pos = stringLen - 1; pos >= 0; pos--) {
        int[] count = new int[BUCKETS];      // bucket for each ASCII character
        for (int i = 0; i < N; i++)
            count[in[i].charAt(pos)]++;      // count ASCII characters per bucket
        for (int b = 1; b < BUCKETS; b++)
            count[b] += count[b - 1];        // count ASCII characters <= b
        // place strings in their correct position in out[]; scanning from the
        // right keeps strings with equal characters in their previous order
        // (stability), which the later passes rely on
        for (int i = N - 1; i >= 0; i--) {
            out[count[in[i].charAt(pos)] - 1] = in[i];
            count[in[i].charAt(pos)]--;
        }
        // swap in and out roles
        String[] backup = in; in = out; out = backup;
    }
    // if odd number of passes, in is tmp, out is data; so copy back
    if (stringLen % 2 == 1) for (int i = 0; i < N; i++) out[i] = in[i];
}
Can counting sort work on non-integers, e.g. strings?
Only efficiently in combination with radix sort.
Which Algorithm Should You Use?
20,000 integers (m = minutes)
                  Ascending    Random       Descending
Insertion sort    .06          1 m 2.73     1 m 40.57
Selection sort    3 m 13.5     3 m 21.31    3 m 23.56
Bubble sort       2 m 58.51    4 m 47.81    5 m 3.90
Comb sort         .15          1.05         .67
Shell sort        .22          .49          .33
Heap sort         .72          .88          .72
Merge sort        .50          .66          .49
Quick sort        .16          .44          .22
Radix sort        2.31         1.59         2.20
Counting sort     .05          .08          .05

80,000 integers (m = minutes)
                  Ascending    Random       Descending
Insertion sort    .11          29 m 2.73    29 m 36.13
Selection sort    56 m 8.09    67 m 21.31   56 m 49.94
Bubble sort       52 m 6.90    87 m 9.62    83 m 6.68
Comb sort         .67          6.52         3.10
Shell sort        1.32         2.75         1.59
Heap sort         3.63         4.56         3.35
Merge sort        2.19         3.35         2.20
Quick sort        .93          2.04         .99
Radix sort        10.98        10.82        10.00
Counting sort     .22          .57          .20
The right choice always depends on the application
Just be sure to avoid the simple O(n²) sorting algorithms (insertion, selection, and bubble sort) on large inputs!