7. Sorting and Order-Statistics

• 7.1 Introduction.
• 7.2 Sorting methods & analysis.
  – Insertion sort.
  – Heapsort.
  – Mergesort.
  – Quicksort.
  – Bucketsort and radix sort.
• 7.3 A general lower bound for sorting.
• 7.4 External sorting.
• 7.5 Order statistics.

Malek Mouhoub, CS340 Fall 2002 1


7.1 Introduction

The sorting problem is the following:

Input: a sequence of n elements $\langle a_1, a_2, \ldots, a_n \rangle$.

Output: a permutation $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of the initial sequence, sorted with respect to an ordering relation $\le$: $a'_1 \le a'_2 \le \cdots \le a'_n$.

Example:

(8,1,6,3,6,4) → Sorting Algorithm → (1,3,4,6,6,8)


7.2 Sorting methods

For an input of $N$ elements:

Insertion sort: $O(N^2)$ in the worst case.

Heapsort: $O(N \log N)$ in the worst case.

Divide-and-conquer algorithms:

Mergesort: $O(N \log N)$, but does not sort in place.

Quicksort: $O(N^2)$ in the worst case but $O(N \log N)$ in the average case.

When extra information is available:

• Bucketsort: the elements are positive integers smaller than $M$: $O(M + N)$.


Insertion Sort

• Efficient for a small number of values.
• The intuition behind this algorithm is the way card players sort a hand of cards (in Bridge or Tarot).
  – We generally start with an empty left hand; each time we take a card, we place it in the correct position by comparing it with the cards already in the hand.
• Consists of $N - 1$ passes. For each pass $p$ ($1 \le p \le N - 1$), insertion sort ensures that the elements in positions 0 through $p$ are in sorted order.
• Best case: presorted elements, $O(N)$.
• Worst case: elements in reverse order, $O(N^2)$.
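The passes described above can be sketched in C++ as follows. This is a minimal illustration, not the course's own code: the names and the use of `std::move` are assumptions, and the three-argument form is written to match the call `insertionSort(a, left, right)` that the quicksort routine later in these notes makes.

```cpp
#include <utility>
#include <vector>

// Sorts a[left..right] into nondecreasing order.
// Pass p inserts a[p] into the already sorted prefix a[left..p-1].
template <class Comp>
void insertionSort(std::vector<Comp>& a, int left, int right) {
    for (int p = left + 1; p <= right; ++p) {
        Comp tmp = std::move(a[p]);
        int j = p;
        // Shift elements larger than tmp one slot to the right.
        for (; j > left && tmp < a[j - 1]; --j)
            a[j] = std::move(a[j - 1]);
        a[j] = std::move(tmp);
    }
}

// Convenience overload: sorts the whole vector (N - 1 passes).
template <class Comp>
void insertionSort(std::vector<Comp>& a) {
    insertionSort(a, 0, static_cast<int>(a.size()) - 1);
}
```

On presorted input the inner loop exits immediately in every pass, which gives the linear best case; on reversed input every pass shifts the whole prefix, which gives the quadratic worst case.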
Heapsort: 1st method

1. Build a binary heap ($O(N)$).
2. Perform $N$ deleteMin operations, copy the deleted elements into a second array, and then copy that array back ($O(N \log N)$).

⇒ Waste of space: an extra array is needed.


Heapsort: 2nd method

• Avoid using a second array: after each deleteMin the heap shrinks by one, so the cell that was last in the heap can be used to store the element that was just deleted.
• After the last deleteMin the array will contain the elements in decreasing order.
• We can change the ordering property (max heap) if we want the elements in increasing order.
• $O(N \log N)$ time complexity. Why?
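Heapsort: example

Max heap before the first deleteMax:

          97
      53      59
    26  41  58  31

Array: 97 53 59 26 41 58 31

After the first deleteMax:

          59
      53      58
    26  41  31  (97)

Array: 59 53 58 26 41 31 97

The deleted element 97 now occupies the cell that was last in the heap and is no longer part of the heap.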
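The second method can be sketched as below. This is an illustrative version under stated assumptions: a 0-based max heap stored in the vector itself, with the hypothetical helper name `percolateDown`; the slides do not give this code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restore the max-heap property for the subtree rooted at index i,
// looking only at the first n elements of a (0-based heap layout).
template <class Comp>
void percolateDown(std::vector<Comp>& a, std::size_t i, std::size_t n) {
    Comp tmp = std::move(a[i]);
    for (std::size_t child; 2 * i + 1 < n; i = child) {
        child = 2 * i + 1;                          // left child
        if (child + 1 < n && a[child] < a[child + 1])
            ++child;                                // right child is larger
        if (tmp < a[child])
            a[i] = std::move(a[child]);             // move the larger child up
        else
            break;
    }
    a[i] = std::move(tmp);
}

template <class Comp>
void heapsort(std::vector<Comp>& a) {
    const std::size_t n = a.size();
    for (std::size_t i = n / 2; i-- > 0; )          // buildHeap: O(N)
        percolateDown(a, i, n);
    for (std::size_t j = n; j-- > 1; ) {            // N - 1 deleteMax operations
        std::swap(a[0], a[j]);                      // max goes into the freed cell
        percolateDown(a, 0, j);                     // heap now has j elements
    }
}
```

Because a max heap is used, each deleted maximum lands at the end of the shrinking heap and the array ends up in increasing order.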


Mergesort

Recursive algorithm:

• If $N = 1$, there is only one element to sort.
• Otherwise, recursively mergesort the first half and the second half, then merge the two sorted halves using the merging algorithm.
• Merging two sorted lists can be done in one pass through the input, if the output is put in a third list. At most $N - 1$ comparisons are made, where $N$ is the total number of elements.
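A minimal sketch of the recursive scheme and the one-pass merge, assuming a temporary vector `tmp` plays the role of the "third list". The function names `mergeRuns` and `mergeSort` are illustrative, not taken from the slides.

```cpp
#include <cstddef>
#include <vector>

// Merge the sorted ranges a[left..center] and a[center+1..right] through tmp.
template <class Comp>
void mergeRuns(std::vector<Comp>& a, std::vector<Comp>& tmp,
               std::size_t left, std::size_t center, std::size_t right) {
    std::size_t i = left, j = center + 1, k = left;
    while (i <= center && j <= right)             // one comparison per output
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i <= center) tmp[k++] = a[i++];        // rest of the first half
    while (j <= right)  tmp[k++] = a[j++];        // rest of the second half
    for (k = left; k <= right; ++k) a[k] = tmp[k];
}

template <class Comp>
void mergeSort(std::vector<Comp>& a, std::vector<Comp>& tmp,
               std::size_t left, std::size_t right) {
    if (left >= right) return;                    // one element: nothing to do
    std::size_t center = left + (right - left) / 2;
    mergeSort(a, tmp, left, center);
    mergeSort(a, tmp, center + 1, right);
    mergeRuns(a, tmp, left, center, right);
}

template <class Comp>
void mergeSort(std::vector<Comp>& a) {
    if (a.empty()) return;
    std::vector<Comp> tmp(a.size());              // the extra space mergesort needs
    mergeSort(a, tmp, 0, a.size() - 1);
}
```

The temporary vector is what prevents mergesort from sorting in place.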


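Analysis of Mergesort

Recursion tree: one problem of size $N$, then two of size $N/2$, four of size $N/4$, and so on down to $N$ problems of size 1. The tree has $\log N$ levels.

$$T(N) = 2T(N/2) + cN$$

Dividing by $N$ and writing the same equation for $N/2, N/4, \ldots, 2$:

$$\frac{T(N)}{N} = \frac{T(N/2)}{N/2} + c$$
$$\frac{T(N/2)}{N/2} = \frac{T(N/4)}{N/4} + c$$
$$\frac{T(N/4)}{N/4} = \frac{T(N/8)}{N/8} + c$$
$$\vdots$$
$$\frac{T(2)}{2} = \frac{T(1)}{1} + c$$

Adding up the $\log N$ equations (with $T(1) = 1$):

$$\frac{T(N)}{N} = \frac{T(1)}{1} + c \log N$$
$$T(N) = cN \log N + N = O(N \log N)$$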


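The master method

The master method provides a "cookbook" method for solving recurrences of the form

$$T(n) = a\,T(n/b) + f(n)$$

where $a \ge 1$ and $b > 1$ are constants and $f(n)$ is an asymptotically positive function.

The master theorem

1. If $f(n) = O(n^{\log_b a - \epsilon})$ and $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.
2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$.
3. If $f(n) = \Omega(n^{\log_b a + \epsilon})$ and $\epsilon > 0$, and if $a\,f(n/b) \le c\,f(n)$ for some $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$.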
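As a quick check of the theorem, here is a worked instance on the mergesort recurrence. The choice of example is ours, not part of the original slide.

```latex
T(n) = 2\,T(n/2) + cn
\quad\Longrightarrow\quad
a = 2,\quad b = 2,\quad f(n) = cn,\quad n^{\log_b a} = n^{\log_2 2} = n .

% f(n) = \Theta(n^{\log_b a}), so case 2 applies:
T(n) = \Theta\!\left(n^{\log_b a} \log n\right) = \Theta(n \log n).
```

This agrees with the $O(N \log N)$ bound obtained by telescoping.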


Quicksort

• The basic algorithm.
• Quicksort implementation.
• Quicksort routines.
• Analysis of quicksort.


Quicksort: The Basic Algorithm

Given an array $A[p..r]$, quicksort works as follows:

Divide: the array $A[p..r]$ is divided into two non-empty subarrays $A[p..q]$ and $A[q+1..r]$, such that every element of $A[p..q]$ is less than or equal to every element of $A[q+1..r]$.

Conquer: the two subarrays are recursively sorted.

Quicksort(A, p, r)
1  if p < r
2      then q ← Partition(A, p, r)
3           Quicksort(A, p, q)
4           Quicksort(A, q + 1, r)
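The divide and conquer steps can be sketched as below. The slide does not spell out Partition, so this sketch assumes a Hoare-style partition around the first element, which produces the two subarrays A[p..q] and A[q+1..r]. The names `hoarePartition` and `basicQuicksort` are hypothetical.

```cpp
#include <utility>
#include <vector>

// Hoare-style partition of a[p..r] (requires p < r) around x = a[p].
// Returns q with p <= q < r such that every element of a[p..q]
// is <= every element of a[q+1..r].
template <class Comp>
int hoarePartition(std::vector<Comp>& a, int p, int r) {
    Comp x = a[p];
    int i = p - 1, j = r + 1;
    for (;;) {
        do { --j; } while (x < a[j]);   // skip elements larger than x
        do { ++i; } while (a[i] < x);   // skip elements smaller than x
        if (i < j) std::swap(a[i], a[j]);
        else return j;
    }
}

template <class Comp>
void basicQuicksort(std::vector<Comp>& a, int p, int r) {
    if (p < r) {
        int q = hoarePartition(a, p, r);  // divide
        basicQuicksort(a, p, q);          // conquer the low side
        basicQuicksort(a, q + 1, r);      // conquer the high side
    }
}
```

Using the first element as pivot keeps the sketch short, but it is the "wrong way" discussed under pivot selection: on presorted input it degrades to quadratic time.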


Input:                      13 81 92 43 31 57 26 65 75 0
Select pivot (65):          13 81 92 43 31 57 26 65 75 0
Partition:                  13 0 43 26 31 57 | 65 | 81 75 92
Quicksort small and large:  0 13 26 31 43 57 | 65 | 75 81 92
Result:                     0 13 26 31 43 57 65 75 81 92

Figure 1: Quicksort steps illustrated by example


Quicksort Implementation

• Picking the pivot.
• Partitioning strategy.


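Quicksort Implementation: Picking the Pivot

• A wrong way: choose the first element as the pivot. If the input is presorted or in reverse order, every partition is as unbalanced as possible and the running time is quadratic.
• A safe maneuver: choose the pivot randomly.
• Median-of-three partitioning: use the median of the left, center, and right elements as the pivot.

Example:

8 1 4 9 6 3 5 2 7 0

The left element is 8, the right element is 0, and the center element is 6, so the pivot is 6.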


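Partitioning Strategy

The pivot (6) has been swapped into the last position of A[p..r]. i moves right past elements smaller than the pivot, and j moves left past elements larger than the pivot.

1st step:   8 1 4 9 0 3 5 2 7 6    i starts at 8, j starts at 7, the pivot 6 is last
1st swap:   8 1 4 9 0 3 5 2 7 6    i stops at 8, j stops at 2; they are swapped
2nd swap:   2 1 4 9 0 3 5 8 7 6    i stops at 9, j stops at 5; they are swapped
Last swap:  2 1 4 5 0 3 9 8 7 6    i stops at 9, j stops at 3; i and j have crossed, so A[i] is swapped with the pivot
Result:     2 1 4 5 0 3 6 8 7 9    A[p..i-1] holds the small elements, A[i+1..r] the large ones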


Quicksort Implementation: Quicksort Routines

• Use median-of-three partitioning.
• Cut off to insertion sort for small subarrays (N = 10).


Quicksort Implementation: median3

#include <utility>   // std::swap
#include <vector>
using namespace std;

// Return the median of a[left], a[center], and a[right].
// Orders these three elements and hides the pivot at position right - 1.
template <class Comp>
const Comp & median3(vector<Comp> & a, int left, int right)
{
    int center = (left + right) / 2;
    if (a[center] < a[left])
        swap(a[left], a[center]);
    if (a[right] < a[left])
        swap(a[left], a[right]);
    if (a[right] < a[center])
        swap(a[center], a[right]);

    swap(a[center], a[right - 1]);  // Place pivot at position right - 1
    return a[right - 1];
}


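median3 applied to A[p..r] = 8 1 4 9 6 3 5 2 7 0 (left = 8, center = 6, right = 0)

Before 1st swap (left, center):       8 1 4 9 6 3 5 2 7 0
Before 2nd swap (left, right):        6 1 4 9 8 3 5 2 7 0
Before 3rd swap (center, right):      0 1 4 9 8 3 5 2 7 6
Before last swap (center, right - 1): 0 1 4 9 6 3 5 2 7 8
Result:                               0 1 4 9 7 3 5 2 6 8

In the result the pivot 6 sits at position right - 1. Partitioning then starts with i = left and j = right - 1.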


// Sorts a[left..right] using median-of-three partitioning
// and a cutoff of 10 to insertion sort.
template <class Comp>
void quicksort(vector<Comp> & a, int left, int right)
{
    if (left + 10 <= right)
    {
        Comp pivot = median3(a, left, right);

        // Begin partitioning
        int i = left, j = right - 1;
        for (;;)
        {
            while (a[++i] < pivot) { }
            while (pivot < a[--j]) { }
            if (i < j)
                swap(a[i], a[j]);
            else
                break;
        }

        swap(a[i], a[right - 1]);    // Restore pivot
        quicksort(a, left, i - 1);   // Sort small elements
        quicksort(a, i + 1, right);  // Sort large elements
    }
    else  // Do an insertion sort on the subarray
        insertionSort(a, left, right);
}


Wrong way of coding. Why?

    int i = left + 1, j = right - 2;
    for (;;)
    {
        while (a[i] < pivot) i++;
        while (pivot < a[j]) j--;
        if (i < j)
            swap(a[i], a[j]);
        else
            break;
    }

If a[i] = a[j] = pivot, neither while loop advances, the swap changes nothing, and the for loop never terminates.


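Analysis of Quicksort

$$T(N) = T(i) + T(N - i - 1) + cN$$

where $i$ is the number of elements in the small subarray, $N - i - 1$ is the number of elements in the large subarray, and the remaining element is the pivot.

Assumptions:

• Random pivot.
• No cutoff for small arrays.
• $T(0) = T(1) = 1$.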
Analysis of Quicksort: Worst-Case Analysis

The pivot is always the smallest element, so one subarray is empty and the other holds $N - 1$ elements (the term $T(0)$ is ignored):

$$T(N) = T(N-1) + cN$$
$$T(N-1) = T(N-2) + c(N-1)$$
$$T(N-2) = T(N-3) + c(N-2)$$
$$\vdots$$
$$T(2) = T(1) + c(2)$$

Adding up all the equations:

$$T(N) = T(1) + c\sum_{i=2}^{N} i = 1 + c\,\frac{(N-1)(N+2)}{2} = O(N^2)$$


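Analysis of Quicksort: Best-Case Analysis

The pivot is always in the middle, so the recursion tree has one problem of size $N$, two of size $N/2$, four of size $N/4$, and so on down to problems of size 1, over $\log N$ levels.

$$T(N) = 2T(N/2) + cN$$

Dividing by $N$ and writing the same equation for $N/2, N/4, \ldots, 2$:

$$\frac{T(N)}{N} = \frac{T(N/2)}{N/2} + c$$
$$\frac{T(N/2)}{N/2} = \frac{T(N/4)}{N/4} + c$$
$$\frac{T(N/4)}{N/4} = \frac{T(N/8)}{N/8} + c$$
$$\vdots$$
$$\frac{T(2)}{2} = \frac{T(1)}{1} + c$$

Adding up all the equations:

$$\frac{T(N)}{N} = \frac{T(1)}{1} + c \log N$$
$$T(N) = cN \log N + N = O(N \log N)$$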


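Average-Case Analysis

Assumptions:

• The possible sizes of the subarrays have the same probability (1/N, where N is the number of elements of the array).

$$T(N) = T(i) + T(N - i - 1) + cN \qquad (1)$$

The average value of $T(i)$, and of $T(N - i - 1)$, is

$$\frac{1}{N}\sum_{j=0}^{N-1} T(j) \qquad (2)$$

so that

$$T(N) = \frac{2}{N}\left[\sum_{j=0}^{N-1} T(j)\right] + cN \qquad (3)$$

$$N\,T(N) = 2\left[\sum_{j=0}^{N-1} T(j)\right] + cN^2 \qquad (4)$$

To remove the summation, we write the same equation for $N - 1$:

$$(N-1)\,T(N-1) = 2\left[\sum_{j=0}^{N-2} T(j)\right] + c(N-1)^2 \qquad (5)$$

(4) - (5) yields:

$$N\,T(N) - (N-1)\,T(N-1) = 2T(N-1) + 2cN - c$$

Dropping the insignificant $-c$ and rearranging:

$$N\,T(N) = (N+1)\,T(N-1) + 2cN$$

Dividing by $N(N+1)$ and telescoping:

$$\frac{T(N)}{N+1} = \frac{T(N-1)}{N} + \frac{2c}{N+1}$$
$$\frac{T(N-1)}{N} = \frac{T(N-2)}{N-1} + \frac{2c}{N}$$
$$\frac{T(N-2)}{N-1} = \frac{T(N-3)}{N-2} + \frac{2c}{N-1}$$
$$\vdots$$
$$\frac{T(2)}{3} = \frac{T(1)}{2} + \frac{2c}{3}$$

Adding up all the equations:

$$\frac{T(N)}{N+1} = \frac{T(1)}{2} + 2c\sum_{i=3}^{N+1}\frac{1}{i}$$
$$\frac{T(N)}{N+1} = O(\log N)$$
$$T(N) = O(N \log N)$$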


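Figure: recursion tree for quicksort when every partition produces a 9-to-1 split. The root costs $N$; its children have sizes $N/10$ and $9N/10$; the next level has sizes $N/100$, $9N/100$, $9N/100$, $81N/100$; and so on. Each level costs at most $N$. The shallowest leaf is at depth $\log_{10} N$ and the deepest at depth $\log_{10/9} N$, so the total is $O(N \log N)$.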
Bucketsort

• General sorting algorithms using only comparisons require $\Omega(N \log N)$ time in the worst case.
• In some special cases it is possible to sort in linear time.
• If the input $A_1, A_2, \ldots, A_N$ consists of only positive integers smaller than $M$, bucket sort can be applied.

1. Keep an array called count, of size $M$ ($M$ buckets), which is initialized to all 0s.
2. When $A_i$ is read, increment count[$A_i$] by 1.
3. After all the input is read, scan the count array, printing out a representation of the sorted list.
4. The algorithm takes $O(M + N)$. If $M$ is $O(N)$, then the total is $O(N)$.
5. Useful algorithm when the input is only small integers.
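The five steps above can be sketched as follows. The function name `bucketSort` and the choice to return a new vector instead of printing are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Sorts integers x with 0 < x < m (assumed, not checked) in O(M + N) time.
std::vector<int> bucketSort(const std::vector<int>& input, int m) {
    // Step 1: M counters, all initialized to 0.
    std::vector<std::size_t> count(static_cast<std::size_t>(m), 0);
    // Step 2: increment count[A_i] for each element read.
    for (int x : input)
        ++count[static_cast<std::size_t>(x)];
    // Step 3: scan the counters and emit each value count[v] times.
    std::vector<int> sorted;
    sorted.reserve(input.size());
    for (int v = 0; v < m; ++v)
        sorted.insert(sorted.end(), count[static_cast<std::size_t>(v)], v);
    return sorted;
}
```

No element is ever compared with another, which is why the comparison lower bound does not apply here.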


Radix sort

• Input: the keys are all nonnegative integers in base 10 with the same number of digits.
• There are 2 ways to sort the keys:
  – Method 1: sort on the most significant digit first (leftmost digit first). The ith step of the method distributes the keys into distinct piles based on the value of the ith digit from the left.
    ⇒ A variable number of piles is required.
  – Method 2: sort on the least significant digit first. We can use 10 piles (one for each decimal digit).
    ⇒ Each pass over the $N$ keys is linear, so keys of $d$ digits take $\Theta(d\,N)$: $\Theta(N \log N)$ in the best case ($d$ about $\log N$, the minimum for distinct keys) but $\Theta(N^2)$ in the worst case ($d$ about $N$).
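Method 2 can be sketched as below, assuming the keys are stored as `unsigned` values with at most `numDigits` decimal digits. The name `radixSort` and the use of one `std::vector` per pile are illustrative choices.

```cpp
#include <array>
#include <vector>

// Least-significant-digit-first radix sort for nonnegative base-10 keys
// with at most numDigits digits. Uses 10 piles per pass.
void radixSort(std::vector<unsigned>& keys, int numDigits) {
    unsigned divisor = 1;
    for (int pass = 0; pass < numDigits; ++pass) {
        std::array<std::vector<unsigned>, 10> piles;
        for (unsigned k : keys)
            piles[(k / divisor) % 10].push_back(k);   // distribute by current digit
        keys.clear();
        for (const auto& pile : piles)                // collect piles 0..9 in order
            keys.insert(keys.end(), pile.begin(), pile.end());
        divisor *= 10;
    }
}
```

Each pass must keep keys with equal digits in their previous relative order; appending to the back of a pile and collecting the piles front to back guarantees this.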


7.3 A general lower bound for sorting

We prove that any sorting algorithm that uses only comparisons requires

• $\Omega(N \log N)$ comparisons in the worst case
  ⇒ merge sort and heapsort are optimal to within a constant factor
• and $\Omega(N \log N)$ comparisons in the average case
  ⇒ quicksort is optimal on average to within a constant factor.


Decision Trees

• A decision tree is an abstraction used to prove lower bounds.
• Every algorithm that sorts by using only comparisons can be represented by a decision tree.
• The number of comparisons used by the sorting algorithm in the worst case is equal to the depth of the deepest leaf.

Lemma 1: Let T be a binary tree of depth d. Then T has at most $2^d$ leaves.

Lemma 2: A binary tree with L leaves must have depth at least $\lceil \log L \rceil$.

Theorem 1: Any sorting algorithm that uses only comparisons between elements requires at least $\lceil \log(N!) \rceil$ comparisons in the worst case.

Theorem 2: Any sorting algorithm that uses only comparisons between elements requires $\Omega(N \log N)$ comparisons.
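The slides state Theorem 2 without the connecting step. One standard way to get from $\log(N!)$ to $\Omega(N \log N)$, given here as a sketch with logarithms in base 2, is to keep only the larger half of the terms:

```latex
\begin{align*}
\log(N!) &= \sum_{i=1}^{N} \log i
          \;\ge\; \sum_{i=\lceil N/2 \rceil}^{N} \log i \\
         &\ge\; \frac{N}{2}\,\log\frac{N}{2}
          \;=\; \frac{N}{2}\log N - \frac{N}{2}
          \;=\; \Omega(N \log N).
\end{align*}
```

The second inequality holds because the remaining sum has at least $N/2$ terms, each at least $\log(N/2)$.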


7.4 External Sorting

• Most internal sorting algorithms take advantage of the fact that memory is directly addressable
  ⇒ comparing two elements takes a constant number of time units.
• This is not the case if the data is on tape or on a disk.


Model for external sorting

• Sort data stored on tape.
• We assume that at least 3 tape drives are available (otherwise any sorting algorithm will require $\Omega(N^2)$ tape accesses).


The simple algorithm

• Algorithm based on the merge sort principle.
• 4 tapes are used: 2 input and 2 output tapes.
• First step: read M records at a time from the input tape (M is the number of records the main memory can hold), sort the records internally, and write the sorted records on one of the output tapes. Read M more records, sort them, and write the sorted records on the other tape. Repeat the process, alternating between the two output tapes, until all records are processed.
• Each set of sorted records is called a run.
• Following steps: merge pairs of runs, one from each tape, writing the longer runs alternately onto the other two tapes, until a single run remains.
• The algorithm will require $\lceil \log_2(N/M) \rceil$ passes, plus the initial run-constructing pass.
Multi-way Merge

• Use 2k tapes: k input tapes and k output tapes.
• The algorithm will require $\lceil \log_k(N/M) \rceil$ passes, plus the initial run-constructing pass.
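To get a feel for the pass counts, the following sketch computes the number of merge passes with integer arithmetic. The function name `mergePasses` and the sample sizes mentioned after it are illustrative assumptions, not data from the slides.

```cpp
#include <cstdint>

// Number of merge passes after the initial run-constructing pass,
// i.e. ceil(log_k(N / M)), for N records, memory size M, and a k-way merge.
// Assumes m >= 1 and k >= 2.
int mergePasses(std::uint64_t n, std::uint64_t m, std::uint64_t k) {
    std::uint64_t runs = (n + m - 1) / m;   // runs produced by the first step
    int passes = 0;
    while (runs > 1) {
        runs = (runs + k - 1) / k;          // each pass merges k runs into one
        ++passes;
    }
    return passes;
}
```

For example, with 13 records and room for 3 in memory, the first step produces 5 runs; a 2-way merge then needs 3 passes while a 3-way merge needs 2.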


7.5 Order Statistics

• The ith order statistic of a set of n elements is the ith smallest element.
  – The minimum of a set of elements is the first order statistic.
  – The maximum is the nth order statistic.
  – The median is the element in the middle of a sorted list of elements.
• The selection problem consists in selecting the ith order statistic from a set of n distinct numbers.


The Selection Problem

• Algorithm 1A: read the elements into an array and sort them, returning the appropriate element.
  ⇒ Assuming a simple sorting algorithm, the running time is $O(N^2)$ ($O(N \log N)$ if merge sort or heapsort is used).
• Algorithm 1B: find the kth largest element.
  1. Read $k$ elements into an array and sort them in decreasing order. The smallest of these is in the kth position.
  2. Process the remaining elements one by one. As an element arrives, it is compared with the kth element in the array. If it is larger, then the kth element is removed, and the new element is placed in the correct place among the remaining $k - 1$ elements.
  ⇒ $O(N \cdot k)$ running time. Why?
• If $k = \lceil N/2 \rceil$ then both algorithms are $O(N^2)$. The kth element is known as the median in this case.
• The following algorithms run in $O(N \log N)$ in the extreme case of $k = \lceil N/2 \rceil$.


Algorithm 6A

• Algorithm for finding the kth smallest element:
  1. Read the $N$ elements into an array.
  2. Apply the buildHeap algorithm to this array.
  3. Perform $k$ deleteMin operations. The last element extracted from the heap is the answer.
• Complexity: $O(N + k \log N)$ in the worst case.
  – buildHeap takes $O(N)$ and the $k$ deleteMin operations take $O(k \log N)$.
  – If $k = O(N / \log N)$: $O(N)$.
  – For large values of $k$: $O(k \log N)$.
  – If $k = N$: $O(N \log N)$ (idea of the heapsort).
• By changing the heap-order property, we solve the problem of finding the kth largest element.
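A compact sketch of Algorithm 6A, assuming `std::priority_queue` with `std::greater` stands in for the binary min heap. The function name `kthSmallest` is hypothetical. Since the element on top after $k - 1$ deletions is the kth minimum, the sketch reads it with `top()` instead of performing the kth deletion.

```cpp
#include <functional>
#include <queue>
#include <vector>

// k-th smallest element of a; assumes 1 <= k <= a.size().
int kthSmallest(const std::vector<int>& a, int k) {
    // Constructing the queue from a range heapifies it in O(N) (buildHeap).
    std::priority_queue<int, std::vector<int>, std::greater<int>>
        heap(a.begin(), a.end());
    for (int i = 1; i < k; ++i)   // k - 1 deleteMin operations, O(log N) each
        heap.pop();
    return heap.top();            // the k-th minimum
}
```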


Algorithm 6B

• Find the kth largest element.
  1. Same idea as algorithm 1B.
  2. At any point in time, maintain a set $S$ of the $k$ largest elements.
  3. After the first $k$ elements are read, when a new element is read it is compared with the kth largest element, which we denote by $S_k$ ($S_k$ is the smallest element in $S$).
     – If the new element is larger, then it replaces $S_k$ in $S$.
  4. At the end of the input, we find the smallest element in $S$ and return it as the answer.
• $O(k + (N - k) \log k) = O(N \log k)$ in the worst case. Why?
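Algorithm 6B can be sketched with a min heap of size $k$ playing the role of $S$, so that $S_k$ is always on top. The use of `std::priority_queue` and the name `kthLargest` are assumptions for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// k-th largest element of a; assumes 1 <= k <= a.size().
// S is a min heap holding the k largest elements seen so far; its top is S_k.
int kthLargest(const std::vector<int>& a, std::size_t k) {
    std::priority_queue<int, std::vector<int>, std::greater<int>>
        S(a.begin(), a.begin() + static_cast<std::ptrdiff_t>(k));  // buildHeap, O(k)
    for (std::size_t i = k; i < a.size(); ++i) {
        if (a[i] > S.top()) {   // O(1) test against S_k
            S.pop();            // O(log k): remove S_k ...
            S.push(a[i]);       // ... and insert the new element
        }
    }
    return S.top();             // smallest element of S
}
```

The heap is built once in $O(k)$, and each of the remaining $N - k$ elements costs at most $O(\log k)$, which is where the stated bound comes from.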


Using quicksort for Selection

QuickSelect(A, p, r, k)
1  if p < r
2      then q ← Partition(A, p, r)
3           If k ≤ q then
4               QuickSelect(A, p, q, k)
5           Else If (k > q) QuickSelect(A, q + 1, r, k)
6  Else return

$O(N^2)$ in the worst case but $O(N)$ in the average case.


// Returns the kth smallest element of a[left..right], where k is counted
// from 1 over the whole array (left <= k - 1 <= right).
// On return, that element is in position k - 1.
template <class Comp>
Comp quickSelect(vector<Comp> & a, int left, int right, int k)
{
    if (left + 10 <= right)
    {
        Comp pivot = median3(a, left, right);

        // Begin partitioning
        int i = left, j = right - 1;
        for (;;)
        {
            while (a[++i] < pivot) { }
            while (pivot < a[--j]) { }
            if (i < j)
                swap(a[i], a[j]);
            else
                break;
        }

        swap(a[i], a[right - 1]);  // Restore pivot

        // Recurse into the one side that contains position k - 1
        if (k <= i)
            return quickSelect(a, left, i - 1, k);
        else if (k > i + 1)
            return quickSelect(a, i + 1, right, k);
        else
            return a[i];           // k == i + 1: the pivot is the answer
    }
    else  // Do an insertion sort on the subarray
    {
        insertionSort(a, left, right);
        return a[k - 1];
    }
}


Selection in expected linear time

Randomized-Select(A, p, r, i)
1  if p = r
2      then return A[p]
3  q ← Randomized-Partition(A, p, r)
4  k ← q - p + 1
5  If i ≤ k
6      then return Randomized-Select(A, p, q, i)
7      else return Randomized-Select(A, q + 1, r, i - k)
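A C++ sketch of the pseudocode above. The slide does not define the partition routine, so this assumes a Hoare-style partition around a randomly chosen pivot, which yields the subarrays A[p..q] and A[q+1..r] used in lines 6 and 7. Passing the random engine explicitly is also an assumption.

```cpp
#include <random>
#include <utility>
#include <vector>

// Partition a[p..r] (requires p < r) around a randomly chosen pivot.
// Returns q with p <= q < r such that every element of a[p..q]
// is <= every element of a[q+1..r].
int randomizedPartition(std::vector<int>& a, int p, int r, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(p, r);
    std::swap(a[p], a[pick(rng)]);
    int x = a[p], i = p - 1, j = r + 1;
    for (;;) {
        do { --j; } while (a[j] > x);
        do { ++i; } while (a[i] < x);
        if (i < j) std::swap(a[i], a[j]);
        else return j;
    }
}

// Returns the i-th smallest element (counting from 1) of a[p..r]; rearranges a.
int randomizedSelect(std::vector<int>& a, int p, int r, int i, std::mt19937& rng) {
    if (p == r) return a[p];
    int q = randomizedPartition(a, p, r, rng);
    int k = q - p + 1;                            // size of the low side
    if (i <= k) return randomizedSelect(a, p, q, i, rng);
    return randomizedSelect(a, q + 1, r, i - k, rng);
}
```

Unlike quicksort, only one side of the partition is explored, which is what brings the expected cost down to linear.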


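Selection in average-case linear time

Randomized-Partition produces a partition whose low side has 1 element with probability $2/n$ and $i$ elements with probability $1/n$ for $i = 2, 3, \ldots, n - 1$.

$$T(n) \le \frac{1}{n}\left(T(\max(1, n-1)) + \sum_{k=1}^{n-1} T(\max(k, n-k))\right) + O(n)$$
$$\le \frac{1}{n}\left(T(n-1) + 2\sum_{k=\lceil n/2 \rceil}^{n-1} T(k)\right) + O(n)$$
$$= \frac{2}{n}\sum_{k=\lceil n/2 \rceil}^{n-1} T(k) + O(n)$$

The recurrence can be solved by substitution (assuming that $T(n) \le cn$ for some constant $c$), which gives $T(n) = O(n)$.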


Selection in worst-case linear time

Idea of the Select algorithm: guarantee a good split when the array is partitioned.

1. Divide the $n$ elements of the input array into $\lfloor n/5 \rfloor$ groups of 5 elements each and at most one group made up of the remaining $n \bmod 5$ elements.
2. Find the median of each of the $\lceil n/5 \rceil$ groups by insertion sorting the elements of each group and taking its middle element.
3. Use Select recursively to find the median $x$ of the $\lceil n/5 \rceil$ medians found in step 2.
4. Partition the input array around the median-of-medians $x$ using a modified version of the Partition procedure. Let $k$ be the number of elements on the low side of the partition, so that $n - k$ is the number of elements on the high side.
5. Use Select recursively to find the ith smallest element on the low side if $i \le k$, or the $(i - k)$th smallest element on the high side if $i > k$.
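The five steps can be sketched as below. For clarity this version departs from the slide in several labeled ways: it works on copies instead of partitioning in place, it uses `std::sort` on each group of 5 where the slide uses insertion sort, and it splits the input three ways (smaller, equal, larger) so that duplicates are handled. The name `selectKth` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the i-th smallest element (counting from 1) of v;
// assumes 1 <= i <= v.size().
int selectKth(std::vector<int> v, std::size_t i) {
    if (v.size() <= 5) {                       // small input: sort directly
        std::sort(v.begin(), v.end());
        return v[i - 1];
    }
    // Steps 1-2: median of each group of 5 (the last group may be smaller).
    std::vector<int> medians;
    for (auto it = v.begin(); it != v.end(); ) {
        auto last = (v.end() - it > 5) ? it + 5 : v.end();
        std::sort(it, last);
        medians.push_back(*(it + (last - it - 1) / 2));
        it = last;
    }
    // Step 3: median of the medians, found recursively.
    int x = selectKth(medians, (medians.size() + 1) / 2);
    // Step 4: partition around x.
    std::vector<int> low, high;
    std::size_t equal = 0;
    for (int e : v) {
        if (e < x) low.push_back(e);
        else if (x < e) high.push_back(e);
        else ++equal;
    }
    // Step 5: recurse into the side that contains the answer.
    if (i <= low.size()) return selectKth(low, i);
    if (i <= low.size() + equal) return x;
    return selectKth(high, i - low.size() - equal);
}
```

Because $x$ itself is never passed on, both recursive calls receive strictly smaller inputs, so the recursion terminates.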


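Analysis of the Select algorithm

The number of elements greater than $x$ is at least:

$$3\left(\left\lceil \frac{1}{2}\left\lceil \frac{n}{5} \right\rceil \right\rceil - 2\right) \ge \frac{3n}{10} - 6$$

so step 5 recurses on at most $7n/10 + 6$ elements. For a sufficiently large constant threshold $n_0$:

• if $n \le n_0$ then $T(n) = \Theta(1)$
• if $n > n_0$ then $T(n) \le T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n)$

The recurrence can be solved by substitution (assuming that $T(n) \le cn$ for some constant $c$):

$$T(n) \le c\lceil n/5 \rceil + c(7n/10 + 6) + O(n) \le \frac{9cn}{10} + 7c + O(n) \le cn$$

for $c$ large enough, so $T(n) = O(n)$.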
