Message Passing Interface (MPI)
MPI provides an efficient means of parallel
communication among a distributed collection of
machines; however, not all MPI implementations take
advantage of shared memory when it is available between
processors.
OpenMP (Open Multiprocessing)
was developed to provide a means of implementing
shared-memory parallelism.
The idea of parallel processing is to partition the
work among the processors so that all processors are busy
and none remain idle.
Message Passing Interface (MPI)
Almost everything in MPI can be summed up in the
single idea of “Message sent – Message received”.
MPI is a library of functions designed to handle all
the details of message passing on the architecture on
which you want to run.
We include mpi.h, which provides the declarations
for all MPI functions.
Message Passing Interface (MPI)
We must have a beginning and an
ending.
The beginning is in the form of an
MPI_Init() call, which indicates to the
operating system that this is an MPI
program and allows the OS to do any
necessary initialization.
Message Passing Interface (MPI)
The ending is in the form of an MPI_Finalize()
call, which indicates to the OS that MPI “clean-up”
can commence.
• If the program is embarrassingly parallel, then
the operations done between the MPI
initialization and finalization involve no
communication.
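A minimal sketch of that overall structure (nothing application-specific here; the real work goes between the two calls):

#include <mpi.h>

int main(int argc, char* argv[])
{
MPI_Init(&argc, &argv); // tell the MPI runtime that this is an MPI program
// ... computation (and, if needed, communication) goes here ...
MPI_Finalize(); // MPI clean-up; no MPI calls are allowed after this
return 0;
}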
Message Passing Interface (MPI)
• How does a process know which
rank it is?
There are two important commands very
commonly used in MPI:
• MPI_Comm_rank
int MPI_Comm_rank(MPI_Comm comm /* in */, int* result /* out */)
• MPI_Comm_size
int MPI_Comm_size(MPI_Comm comm /* in */, int* size /* out */)
Message Passing Interface (MPI)
MPI_Comm_rank provides you with your process
identification, or rank, which is an integer ranging from 0
to P − 1, where P is the number of processes on which we
are running.
MPI_Comm_size provides you with the total
number of processes that have been allocated.
The argument comm is called the communicator,
and it is essentially a designation for a collection of
processes which can communicate with each other.
Message Passing Interface (MPI)
Write an MPI program that prints “Hello World!” but also tells us
which processor the message is coming from, and how many total
processors it is running with.
#include <iostream>
#include "mpi.h"
using namespace std;

int main(int argc, char* argv[])
{
int mynode, totalnodes;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
cout << "Hello world from processor " << mynode;
cout << " of " << totalnodes << endl;
MPI_Finalize();
return 0;
}
Message Passing Interface (MPI)
When run on four processors, the screen output
may look like the following (the ordering of the lines
varies from run to run):
Hello world from processor 0 of 4
Hello world from processor 3 of 4
Hello world from processor 2 of 4
Hello world from processor 1 of 4
Message Passing Interface (MPI)
int MPI_Send(
void* message /* in */,
int count /* in */,
MPI_Datatype datatype /* in */,
int dest /* in */,
int tag /* in */,
MPI_Comm comm /* in */)
Message Passing Interface (MPI)
int MPI_Recv(
void* message /* out */,
int count /* in */,
MPI_Datatype datatype /* in */,
int source /* in */,
int tag /* in */,
MPI_Comm comm /* in */,
MPI_Status* status /* out */)
Message Passing Interface (MPI)
Understanding the Argument Lists
• message - starting address of the send/recv buffer.
• count - number of elements in the send/recv buffer.
• datatype - data type of the elements in the send/recv buffer.
• source - rank of the process sending the data.
• dest - rank of the process receiving the data.
• tag - message tag.
• comm - communicator.
• status - status object.
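As a small illustration of these argument lists, here is a sketch in which process 0 sends a single integer to process 1 (it assumes the program is started with at least two processes; the value 42 and tag 0 are arbitrary choices):

#include <iostream>
#include <mpi.h>
using namespace std;

int main(int argc, char* argv[])
{
int mynode, totalnodes, value = 0;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
if(mynode == 0){
value = 42; // data to be sent
MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD); // dest = 1, tag = 0
}
else if(mynode == 1){
MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status); // source = 0, tag = 0
cout << "Processor 1 received " << value << " from processor 0" << endl;
}
MPI_Finalize();
return 0;
}

Note that the count, datatype, and tag used in the receive match those used in the send.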
Message Passing Interface (MPI)
#include <iostream>
using namespace std;

int main()
{
int sum = 0;
for(int i=1;i<=1000;i=i+1)
sum = sum + i;
cout << "The sum from 1 to 1000 is: " << sum << endl;
return 0;
}
Message Passing Interface (MPI)
#include <iostream>
#include <mpi.h>
using namespace std;

int main(int argc, char* argv[])
{
int mynode, totalnodes;
int sum, startval, endval, result;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
// get totalnodes
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
// get mynode
Message Passing Interface (MPI)
sum = 0; // zero sum for accumulation
startval = 1000*mynode/totalnodes+1;
endval = 1000*(mynode+1)/totalnodes;
for(int i=startval;i<=endval;i=i+1)
sum = sum + i;
if(mynode!=0)
MPI_Send(&sum,1,MPI_INT,0,1,MPI_COMM_WORLD);
else
for(int j=1;j<totalnodes;j=j+1){
MPI_Recv(&result,1,MPI_INT,j,1,MPI_COMM_WORLD,
&status);
sum = sum + result;
}
Message Passing Interface (MPI)
if(mynode == 0)
cout << "The sum from 1 to 1000 is: " << sum << endl;
MPI_Finalize();
return 0;
}
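For example, with totalnodes = 4 the integer division above assigns each process a contiguous block: process 0 sums 1-250, process 1 sums 251-500, process 2 sums 501-750, and process 3 sums 751-1000, so every integer from 1 to 1000 is counted exactly once.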
Message Passing Interface (MPI)
int mynode, totalnodes;
int datasize; // number of data units to be sent/recv
int sender; // process number of the sending process
int receiver; // process number of the receiving process
int tag; // integer message tag
MPI_Status status; // variable to contain status information
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
// Determine datasize
double * databuffer = new double[datasize];
Message Passing Interface (MPI)
// Fill in sender, receiver, tag on sender/receiver processes,
// and fill in databuffer on the sender process.
if(mynode==sender)
MPI_Send(databuffer,datasize,MPI_DOUBLE,receiver,
tag,MPI_COMM_WORLD);
if(mynode==receiver)
MPI_Recv(databuffer,datasize,MPI_DOUBLE,sender,tag,
MPI_COMM_WORLD,&status);
// Send/Recv complete
We want to create an array on each process, but
only initialize it on process 0. Once the array has
been initialized on process 0, it is sent out to
each process.
#include <iostream>
#include <mpi.h>
using namespace std;

int main(int argc, char* argv[])
{
int i;
int nitems = 10;
int mynode, totalnodes;
MPI_Status status;
Message Passing Interface (MPI)
double * array;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
array = new double[nitems];
if(mynode == 0){
for(i=0;i<nitems;i++)
array[i] = (double) i;
Message Passing Interface (MPI)
}
if(mynode==0)
for(i=1;i<totalnodes;i++)
MPI_Send(array,nitems,MPI_DOUBLE,i,1,MPI_COMM_WORLD);
else
MPI_Recv(array,nitems,MPI_DOUBLE,0,1,MPI_COMM_WORLD,
&status);
for(i=0;i<nitems;i++){
cout << "Processor " << mynode;
cout << ": array[" << i << "] = " << array[i] << endl;
}
MPI_Finalize();
return 0;
}
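The same one-to-all distribution can be expressed with the collective call MPI_Bcast; as a sketch, the send/receive loop above could be replaced by a single line that every process executes:

// Every process calls MPI_Bcast; process 0 (the root) supplies the data.
MPI_Bcast(array, nitems, MPI_DOUBLE, 0, MPI_COMM_WORLD);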
MPI_Reduce:
int mynode, totalnodes;
int datasize; // number of data units over which
// reduction should occur
int root; // process to which reduction will occur
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
// Determine datasize and root
double * senddata = new double[datasize];
double * recvdata = NULL;
if(mynode == root)
recvdata = new double[datasize];
// Fill in senddata on all processes
MPI_Reduce(senddata,recvdata,datasize,MPI_DOUBLE,MPI_SUM,
root,MPI_COMM_WORLD);
// At this stage, the process root contains the result
// of the reduction (in this case MPI_SUM) in the
// recvdata array
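Tying this back to the earlier example, here is a sketch of the sum from 1 to 1000 written with MPI_Reduce instead of explicit sends and receives (the partitioning is the same as before):

#include <iostream>
#include <mpi.h>
using namespace std;

int main(int argc, char* argv[])
{
int mynode, totalnodes;
int sum = 0, total = 0;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
int startval = 1000*mynode/totalnodes + 1;
int endval = 1000*(mynode+1)/totalnodes;
for(int i=startval;i<=endval;i=i+1)
sum = sum + i;
// Add every process's partial sum; the result is placed on process 0.
MPI_Reduce(&sum, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if(mynode == 0)
cout << "The sum from 1 to 1000 is: " << total << endl;
MPI_Finalize();
return 0;
}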
MPI_Allreduce:
Understanding the Argument List
• operand - starting address of the send buffer.
• result - starting address of the receive buffer.
• count - number of elements in the send/receive
buffer.
• datatype - data type of the elements in the
send/receive buffer.
• operator - reduction operation to be executed.
• comm - communicator.
int mynode, totalnodes;
int datasize; // number of data units over which
// reduction should occur
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
// Determine datasize
double * senddata = new double[datasize];
double * recvdata = new double[datasize];
// Fill in senddata on all processes
MPI_Allreduce(senddata,recvdata,datasize,MPI_DOUBLE,
MPI_SUM,MPI_COMM_WORLD);
// At this stage, all processes contain the result
// of the reduction (in this case MPI_SUM) in the
// recvdata array
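In effect, MPI_Allreduce behaves like an MPI_Reduce followed by a broadcast of the result, so it is the natural choice whenever every process (not just the root) needs the reduced value, for example a global norm used in a convergence test.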
#include <iostream>
#include <math.h>
#include <mpi.h>
using namespace std;

double func(double x);

int main(int argc, char* argv[])
{
int mynode, totalnodes;
double global_a = -50.0;
double global_b = 50.0;
int levels = 10;
double local_a,local_b,local_sum,answer;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
MPI_Gather:
MPI_Allgather:
MPI_Scatter:
int mynode, totalnodes;
int datasize; // number of data units to be scattered to each process
int root; // process from which the data is scattered
MPI_Allgather / MPI_Alltoall

MPI_Allgather: each process contributes its value m, and afterwards every
process holds all of the contributed values in b.

              Before             After MPI_Allgather
  Processor      m          b[0]  b[1]  b[2]  b[3]    m
      0         20            20    22    24    26   20
      1         22            20    22    24    26   22
      2         24            20    22    24    26   24
      3         26            20    22    24    26   26

MPI_Allgather(&m,1,MPI_INT,b,1,MPI_INT,MPI_COMM_WORLD);
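A minimal sketch that reproduces the table above (it assumes the program is run on 4 processes, with each rank initializing m to 20 + 2*rank as in the table):

#include <iostream>
#include <mpi.h>
using namespace std;

int main(int argc, char* argv[])
{
int mynode, totalnodes;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
int m = 20 + 2*mynode; // each process starts with its own value
int* b = new int[totalnodes]; // room for one value from every process
// After the call, b[i] on every process holds the m contributed by rank i.
MPI_Allgather(&m, 1, MPI_INT, b, 1, MPI_INT, MPI_COMM_WORLD);
cout << "Processor " << mynode << ": b =";
for(int i=0;i<totalnodes;i++)
cout << " " << b[i];
cout << endl;
delete[] b;
MPI_Finalize();
return 0;
}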
MPI_Alltoall: process i sends a[j] to process j, which stores it in b[i].

              Before                       After MPI_Alltoall
  Processor  a[0] a[1] a[2] a[3]       b[0]  b[1]  b[2]  b[3]
      0        1    2    3    4          1     5     9    13
      1        5    6    7    8          2     6    10    14
      2        9   10   11   12          3     7    11    15
      3       13   14   15   16          4     8    12    16

MPI_Alltoall(a,1,MPI_INT,b,1,MPI_INT,MPI_COMM_WORLD);
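A corresponding sketch for MPI_Alltoall (again assuming 4 processes; rank i fills a with the values i*4+1 through i*4+4 as in the table):

#include <iostream>
#include <mpi.h>
using namespace std;

int main(int argc, char* argv[])
{
int mynode, totalnodes;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
int* a = new int[totalnodes];
int* b = new int[totalnodes];
for(int i=0;i<totalnodes;i++)
a[i] = mynode*totalnodes + i + 1; // rank 0: 1..4, rank 1: 5..8, ...
// a[j] on rank i is delivered into b[i] on rank j.
MPI_Alltoall(a, 1, MPI_INT, b, 1, MPI_INT, MPI_COMM_WORLD);
cout << "Processor " << mynode << ": b =";
for(int i=0;i<totalnodes;i++)
cout << " " << b[i];
cout << endl;
delete[] a;
delete[] b;
MPI_Finalize();
return 0;
}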
(Monte Carlo Simulation)
Problem: Monte Carlo simulation.
Use random numbers to estimate PI = 4 × Ac/As, where Ac is the number of
points that fall inside the quarter circle and As is the total number of
points sampled in the unit square.
Requirement: distribute the sampling across N processes (ranks).
(Monte Carlo Simulation)
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <mpi.h>
#define SCOPE 100000000
/*
** Estimate PI (3.14...) with the Monte Carlo Method
*/
int main(int argc, char *argv[])
{
int nProcs, nRank, proc, ROOT = 0;
MPI_Status status;
int nTag = 55;
int i, nCount = 0, nMyCount = 0;
double x, y, z, pi, z1;
(Monte Carlo Simulation)
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
MPI_Comm_rank(MPI_COMM_WORLD, &nRank);
srand(time(NULL) + nRank); // seed each process differently so ranks sample different points
for (i = 0; i < SCOPE; i++) {
x = (rand() % 100) / 100.0;
y = (rand() % 100) / 100.0;
z = x*x + y*y;
z1 = sqrt(z);
if (z1 <= 1) nMyCount++;
}
if (nRank == ROOT)
{
nCount = nMyCount;
for (proc=1; proc<nProcs; proc++) {
MPI_Recv(&nMyCount, 1, MPI_INT, proc, nTag, MPI_COMM_WORLD, &status);
nCount += nMyCount;
}
pi = (double)4*nCount/((double)SCOPE * nProcs);
(Monte Carlo Simulation)
printf("\n estimate of pi is %f\n", pi);
}
else {
printf("Processor %d sending results = %d to ROOT processor\n",
nRank, nMyCount);
MPI_Send(&nMyCount, 1, MPI_INT, ROOT, nTag,
MPI_COMM_WORLD);
}
printf("\n");
MPI_Finalize();
return 0;
}
(Monte Carlo Simulation)
Processor 2 sending results = 78541693 to ROOT processor
Processor 1 sending results = 78532183 to ROOT processor
Processor 3 sending results = 78540877 to ROOT processor
Processor 0 sending results = 78540877 to ROOT processor
estimate of pi is 3.141516
Calculate pi (π) using numerical integration
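A sketch of one common approach (it assumes the midpoint rule applied to the integral of 4/(1+x^2) over [0,1], whose exact value is π; the number of subintervals nsteps is an arbitrary choice): each process sums every totalnodes-th subinterval, and MPI_Reduce combines the partial sums on process 0.

#include <iostream>
#include <mpi.h>
using namespace std;

int main(int argc, char* argv[])
{
int mynode, totalnodes;
long nsteps = 100000000; // number of subintervals (arbitrary)
double x, local_sum = 0.0, pi = 0.0;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
double step = 1.0/(double)nsteps;
// Midpoint rule: each process handles every totalnodes-th subinterval.
for(long i = mynode; i < nsteps; i += totalnodes){
x = (i + 0.5)*step;
local_sum += 4.0/(1.0 + x*x);
}
local_sum *= step;
// Combine the partial sums on process 0.
MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if(mynode == 0)
cout << "estimate of pi is " << pi << endl;
MPI_Finalize();
return 0;
}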