
FOUNDATION UNIVERSITY ISLAMABAD

PARALLEL MERGE SORT USING MPI

Submitted by

Irsa Waheed

BACHELOR OF SCIENCE IN SOFTWARE ENGINEERING

YEAR

2020-2024

Table of Contents
1. Introduction
   1.1 Merge Sort
   1.2 Message Passing Interface (MPI)
2. Overview of MPI
   2.1 Goals of MPI
   2.2 MPI Offers
3. MPI and Merge Sort Relation
4. Implementation of Parallel Merge Sort using MPI
   4.1 Code
   4.2 Implementation Details
5. Performance Analysis
   5.1 Execution Time
   5.2 Speedup
   5.3 Efficiency
   5.4 Scalability
   5.5 Load Balancing
   5.6 Communication Overhead
   5.7 Optimization Potential
   5.8 System Characteristics
6. Conclusion

1. Introduction
Parallel merge sort is a parallelised version of the traditional merge sort algorithm designed to
distribute the workload across multiple processors using the Message Passing Interface (MPI).
MPI enables communication and coordination among processes running on different nodes,
allowing them to work together to sort a large dataset efficiently.
1.1 Merge Sort
Merge sort is a divide-and-conquer algorithm that efficiently sorts arrays. It divides the
array into smaller halves until each part is trivially sorted, then merges them back
together in sorted order. Because the two halves are handled independently, the
algorithm lends itself naturally to parallel processing.
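For example, [12, 11, 13, 5] splits into [12, 11] and [13, 5]; sorting each half gives
[11, 12] and [5, 13], and merging those halves yields [5, 11, 12, 13].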

1.2 Message Passing Interface (MPI)


MPI (Message Passing Interface) is used to implement merge sort across multiple
processors or nodes. MPI allows communication between these processors, enabling
efficient distribution of workloads, division of data, and merging of sorted subarrays,
ultimately accelerating the sorting process.

2. Overview of MPI
MPI is intended as a standard implementation of the "message passing" model of parallel
computing. It enables multiple processes or nodes to exchange data and coordinate their tasks
in distributed computing environments. MPI provides a set of functions for point-to-point
communication and collective operations, allowing efficient coordination and synchronisation
among parallel processes.

▪ A parallel computation consists of a number of processes, each working on some
local data. Each process has purely local variables, and there is no mechanism for
any process to directly access the memory of another.
▪ Sharing of data between processes takes place by message passing, that is, by
explicitly sending and receiving data between processes (a minimal sketch follows
this list).
▪ It is a library of functions (in C) or subroutines (in Fortran) that you insert into
source code to perform data communication between processes.
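As a minimal illustration of this model, the sketch below (which assumes a standard
MPI installation and at least two processes) sends a single integer from process 0 to
process 1:

#include <iostream>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;  // purely local to process 0
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value;       // filled in by the matching receive
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::cout << "Process 1 received " << value << std::endl;
    }

    MPI_Finalize();
    return 0;
}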

2.1 Goals of MPI

▪ Provide source code portability. MPI programs should compile and run as-is on any
platform.
▪ Allow efficient implementations across a range of architectures.

2.2 MPI Offers

▪ A great deal of functionality, including several different types of communication,
special routines for common "collective" operations, and the ability to handle
user-defined data types and topologies.
▪ Support for heterogeneous parallel architectures.

3. MPI and Merge Sort Relation


The connection between MPI and merge sort lies in their association with parallel computing.
Merge sort can be parallelized using MPI to achieve faster sorting on large datasets. The
basic idea is to divide the input array into smaller chunks, and each chunk is sorted
independently by a separate process. Then, the sorted chunks are merged. This parallelization
can significantly improve the efficiency of the sorting process, especially when dealing with
large datasets that can be distributed across multiple processors or nodes in a cluster.
MPI can be employed to parallelize the execution of merge sort, making it a scalable solution
for sorting large datasets in parallel computing environments. It also optimizes the sorting
process, enhancing performance by leveraging multiple processors' capabilities for sorting
and merging.

4. Implementation of Parallel Merge Sort using MPI

4.1 Code

#include <algorithm>
#include <iostream>
#include <mpi.h>

void merge(int arr[], int l, int m, int r) {
    int n1 = m - l + 1;
    int n2 = r - m;

    // Create temporary arrays
    int *L = new int[n1];
    int *R = new int[n2];

    // Copy data to temporary arrays L[] and R[]
    for (int i = 0; i < n1; i++)
        L[i] = arr[l + i];
    for (int j = 0; j < n2; j++)
        R[j] = arr[m + 1 + j];

    // Merge the temporary arrays back into arr[l..r]
    int i = 0, j = 0, k = l;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k] = L[i];
            i++;
        } else {
            arr[k] = R[j];
            j++;
        }
        k++;
    }

    // Copy the remaining elements of L[], if there are any
    while (i < n1) {
        arr[k] = L[i];
        i++;
        k++;
    }

    // Copy the remaining elements of R[], if there are any
    while (j < n2) {
        arr[k] = R[j];
        j++;
        k++;
    }

    delete[] L;
    delete[] R;
}

void mergeSort(int arr[], int l, int r) {
    if (l < r) {
        int m = l + (r - l) / 2;

        // Sort first and second halves
        mergeSort(arr, l, m);
        mergeSort(arr, m + 1, r);

        // Merge the sorted halves
        merge(arr, l, m, r);
    }
}

// Overload: merge two sorted arrays a[la..ra] and b[lb..rb] into out[0..]
void merge(int a[], int la, int ra, int b[], int lb, int rb, int out[]) {
    int i = la, j = lb, k = 0;
    while (i <= ra && j <= rb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i <= ra)
        out[k++] = a[i++];
    while (j <= rb)
        out[k++] = b[j++];
}

void parallelMergeSort(int arr[], int size, int rank, int num_procs) {
    // Determine the size of each chunk (this program assumes size is
    // divisible by num_procs and that num_procs is a power of two)
    int chunk_size = size / num_procs;

    // Allocate room for the full array so the local data can grow as
    // sorted runs are combined during recursive doubling
    int *local_arr = new int[size];
    int local_size = chunk_size;

    // Scatter the array to all processes
    MPI_Scatter(arr, chunk_size, MPI_INT, local_arr, chunk_size, MPI_INT, 0,
                MPI_COMM_WORLD);

    // Perform local merge sort on each process
    mergeSort(local_arr, 0, chunk_size - 1);

    // Merge sorted subarrays using recursive doubling: at each step, half
    // of the still-active processes send their sorted run to a partner,
    // which merges it with its own run
    for (int step = 1; step < num_procs; step *= 2) {
        if (rank % (2 * step) == 0) {
            int partner = rank + step;
            if (partner < num_procs) {
                // Receive the subarray from the partner (same length as ours)
                int *received_arr = new int[local_size];
                MPI_Recv(received_arr, local_size, MPI_INT, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

                // Merge the received subarray with the local sorted subarray
                int *merged_arr = new int[2 * local_size];
                merge(local_arr, 0, local_size - 1, received_arr, 0,
                      local_size - 1, merged_arr);

                // Copy the merged subarray back to local_arr and record
                // its doubled length
                std::copy(merged_arr, merged_arr + 2 * local_size, local_arr);
                local_size *= 2;

                delete[] received_arr;
                delete[] merged_arr;
            }
        } else {
            // Send the local sorted subarray to the partner; this process
            // then takes no further part in the merge phase
            int partner = rank - step;
            MPI_Send(local_arr, local_size, MPI_INT, partner, 0,
                     MPI_COMM_WORLD);
            break;
        }
    }

    // After the doubling phase, the fully sorted array resides on rank 0
    if (rank == 0) {
        // Display the sorted array
        std::cout << "Sorted array: ";
        for (int i = 0; i < size; i++) {
            std::cout << local_arr[i] << " ";
        }
        std::cout << std::endl;
    }

    delete[] local_arr;
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, num_procs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);

    const int size = 8;
    int arr[size] = {12, 11, 13, 5, 6, 7, 1, 10};

    parallelMergeSort(arr, size, rank, num_procs);

    MPI_Finalize();
    return 0;
}
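The listing can be compiled and launched with a standard MPI toolchain; the compiler
wrapper and launcher names below are the ones shipped with Open MPI and MPICH,
and the source file name is illustrative:

mpicxx merge_sort.cpp -o merge_sort
mpirun -np 4 ./merge_sort

With four processes, each rank sorts a chunk of two elements locally before the
recursive doubling phase combines the sorted runs on rank 0.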

4.2 Implementation Details


• Data Distribution:
The input array is divided into smaller chunks, and each chunk is distributed
among different MPI processes. The distribution is achieved using
MPI_Scatter, which sends portions of the array to each process.

• Local Sorting:
Each process performs a local sorting algorithm, in this case, the merge sort
algorithm, on its allocated portion of the array. The mergeSort function is
applied locally to sort the subarray assigned to each process.

• Merge Phase:
The merge phase involves combining sorted subarrays from different processes
to create larger sorted arrays. This is accomplished using a recursive doubling
approach. Processes exchange their sorted subarrays with their partners using
MPI_Send and MPI_Recv. The merge function is then used to merge the
received subarray with the local sorted subarray.

• Final Merge:
After the recursive doubling phase completes, the fully sorted array resides
on the root process (rank 0), which displays it. An alternative design
gathers the sorted chunks back to the root using MPI_Gather and performs
one final merge there.

• MPI_Send:
Transmits data from one process to another in MPI. In this merge sort, it is
used to send a locally sorted subarray to the partner process during the
merge phase.

• MPI_Recv:
Receives data sent by MPI_Send. In this merge sort, it is used to receive
the partner's sorted subarray so that it can be merged with the local one.

• MPI_Gather:
Gathers data from multiple processors to a single processor. In merge sort, it
can collect sorted subarrays from all processors to perform the final merging.

• MPI_Scatter:
Divides the data among different processors. In merge sort, it distributes
portions of the array to different processors for parallel sorting.
• MPI_Barrier:
Synchronizes processors, ensuring they reach a particular point before moving
forward. In merge sort, it can ensure that all processors complete their sorting
before merging.

5. Performance Analysis
5.1 Execution Time:
Measure the total execution time of the parallel merge sort algorithm using MPI. This
is the time taken from the start of the program to its completion. Compare this with the
execution time of the sequential merge sort on the same dataset.
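A minimal timing sketch uses the standard MPI_Wtime wall-clock routine; the
variables are those from the listing in Section 4.1, and the lines would wrap the
sort call in main:

double t_start = MPI_Wtime();   // wall-clock time before sorting
parallelMergeSort(arr, size, rank, num_procs);
double t_end = MPI_Wtime();     // wall-clock time after sorting
if (rank == 0)
    std::cout << "Elapsed: " << (t_end - t_start) << " s" << std::endl;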
5.2 Speedup:
Speedup is a measure of how much faster the parallel algorithm is compared to the
sequential algorithm.
Formula: Speedup = Sequential Execution Time / Parallel Execution Time
A speedup greater than 1 indicates improvement, and the higher the speedup, the better.
5.3 Efficiency:
Efficiency provides a normalized measure of how well the parallel algorithm utilizes
the available resources.
Formula: Efficiency = Speedup / Number of Processes
Efficiency close to 1 indicates good utilization of resources; values less than 1 suggest
overhead.
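As an illustrative example with hypothetical timings: if the sequential sort takes
8.0 s and the parallel version takes 2.5 s on 4 processes, then
Speedup = 8.0 / 2.5 = 3.2 and Efficiency = 3.2 / 4 = 0.8, meaning each process
performs useful work about 80% of the time.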
5.4 Scalability:
Scalability assesses the performance of the parallel algorithm as the problem size or the
number of processors increases.
Ideally, as the problem size increases or more processors are added, the execution time
should decrease or remain relatively constant.
5.5 Load Balancing:
Assess the load balancing among MPI processes. Uneven distribution of workloads can
lead to idle processors, reducing overall efficiency.
Evaluate if the workload is evenly distributed among processes during the parallel
sorting and merging phases.
5.6 Communication Overhead:
MPI communication introduces overhead. Evaluate the impact of communication
patterns (e.g., scatter, gather, send, and receive) on overall performance.
Minimize unnecessary communication and explore optimization strategies.
5.7 Optimization Potential:
Identify potential areas for optimization. This could include tuning parameters such as
chunk sizes, and communication strategies, or exploring alternative parallelization
approaches.
5.8 System Characteristics:
Consider the specific characteristics of the computing environment, such as the
network topology, memory architecture, and processor speed.
Adapt the parallel algorithm to leverage these characteristics for improved
performance.

6. Conclusion
The MPI implementation of merge sort showcases the potential for efficient parallel
sorting by leveraging distributed computing. Despite communication overhead, it
demonstrates improved scalability and reduced sorting time with the effective
utilization of multiple processors. Proper load balancing and synchronization
mechanisms are crucial for maximizing the performance benefits of parallel merge sort
in MPI.
