Point-to-point Communication
Lecture 4
Jan 17, 2023
MPI Program Execution
[Figure: 8 processes distributed across host1, host2, host3, host4; each process has its own memory; communication is intranode within a host and internode across hosts]
mpiexec -n 8 -hosts host1,host2,host3,host4 ./exe
MPI Program Execution
[Figure: the same 8-process layout, with the hosts taken from a hostfile]
hostfile:
    cn023
    cn024
    cn025
    cn026
mpiexec -n 8 -f hostfile ./exe
MPI Reference Material
• Marc Snir, Steve W. Otto, Steven Huss-Lederman, David W. Walker, and Jack Dongarra, MPI - The Complete Reference, Volume 1: The MPI Core, 2nd ed., MIT Press, 1998.
• William Gropp, Ewing Lusk, and Anthony Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, 3rd ed., MIT Press, Cambridge, MA, 2014.
• https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf
Point-to-point Communication
Blocking send and receive
MPI_Send (called by the sender):
    int MPI_Send (const void *buf, int count, MPI_Datatype datatype,
                  int dest, int tag, MPI_Comm comm)
MPI_Recv (called by the receiver):
    int MPI_Recv (void *buf, int count, MPI_Datatype datatype,
                  int source, int tag, MPI_Comm comm, MPI_Status *status)
The tag passed to MPI_Send must match the tag expected by the corresponding MPI_Recv.
Simple Send/Recv Code (sendmessage.c)
[Code slide: rank 0 sends the string "Hello, there" to rank 1; the program produces no runtime or compile-time error]
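The code itself is not reproduced in the text. The sketch below is a minimal stand-in consistent with the details that do appear (the string "Hello, there", the 13-byte message, and tag 99 from the error trace on a later slide); the variable names and buffer size are assumptions.

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int rank;
    char msg[64];
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        strcpy (msg, "Hello, there");   /* 12 characters + '\0' = 13 bytes */
        MPI_Send (msg, (int) strlen (msg) + 1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv (msg, 64, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);
        printf ("received : %s\n", msg);
    }

    MPI_Finalize ();
    return 0;
}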
[Code slide: a variant of the same program fails with a runtime error; the next slide shows why]
Message Size
The sender sends a 13-byte message, but the receiver posts only a 10-byte buffer:
Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(200)...........................: MPI_Recv(buf=0x7ffccc37c610, count=10,
MPI_CHAR, src=0, tag=99, MPI_COMM_WORLD, status=0x7ffccc37c5d0) failed
MPIDI_CH3_PktHandler_EagerShortSend(363): Message from rank 0 and tag 99
truncated; 13 bytes received but buffer size is 10
If the receive buffer is at least as large as the incoming message, there is no runtime or compile-time error: the count passed to MPI_Recv is an upper bound on the message length, not an exact size.
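Sketched as code, continuing the send/receive sketch above (the failing count of 10 is from the error trace; the oversized count of 64 is an illustrative choice):

/* Receive buffer too small for the 13-byte message: fatal truncation error */
MPI_Recv (msg, 10, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);

/* Receive buffer larger than the message: no error, since count is only an upper bound */
MPI_Recv (msg, 64, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);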
Simple Send/Recv Code (sendmessage.c)
Output: received : Hello, there
Output (from a code slide not reproduced here)
0 7 0
Received: Welcome
1 0 7
Multiple Sends and Receives
if (myrank == 0)
    MPI_Send (buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD);
else if (myrank == 1)
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
printf ("%d %d\n", myrank, count);

$ mpirun -np 2 ./send 10
0 10
1 10
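The fragments above omit the program setup. The following is a minimal complete version consistent with them and with the ./send 10 command line (the count argument, the buffer allocation, and the absence of error checking are assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int myrank, count;
    int *buf;
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

    count = (argc > 1) ? atoi (argv[1]) : 10;   /* e.g., ./send 10 */
    buf = malloc (count * sizeof (int));

    if (myrank == 0)
        MPI_Send (buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD);
    else if (myrank == 1)
        MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
    printf ("%d %d\n", myrank, count);

    free (buf);
    MPI_Finalize ();
    return 0;
}

The variants on the next slides only change the bodies of the two branches.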
Multiple Sends and Receives
if (myrank == 0) {
    MPI_Send (buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Send (buf, count, MPI_INT, 1, 2, MPI_COMM_WORLD);
} else if (myrank == 1) {
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
    MPI_Recv (buf, count, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
}
printf ("%d %d\n", myrank, count);

$ mpirun -np 2 ./send 10
0 10
1 10
Multiple Sends and Receives
if (myrank == 0) {
    MPI_Send (buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Send (buf, count, MPI_INT, 1, 2, MPI_COMM_WORLD);
} else if (myrank == 1) {
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
}
printf ("%d %d\n", myrank, count);

$ mpirun -np 2 ./send 10
0 10
Only rank 0 prints: rank 1 blocks forever in its second receive, because no second message with tag 1 is ever sent.
Multiple Sends and Receives
if (myrank == 0) {
    MPI_Send (buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Send (buf, count, MPI_INT, 1, 2, MPI_COMM_WORLD);
} else if (myrank == 1) {
    MPI_Recv (buf, count, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
}
printf ("%d %d\n", myrank, count);

$ mpirun -np 2 ./send 10
0 10
1 10
The receives are posted in the opposite order of the sends' tags, yet both ranks still complete; the next slides explain why (the small messages are buffered).
MPI_Send (Blocking, Standard Mode)
• Does not return until the send buffer can be reused
• Message buffering can affect when this happens
• Buffering behavior is implementation-dependent
Buffering
[Figure: buffering of a message between sender and receiver. Source: Cray presentation]
Multiple Sends and Receives
if (myrank == 0) {
    MPI_Send (buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Send (buf, count, MPI_INT, 1, 2, MPI_COMM_WORLD);
} else if (myrank == 1) {
    MPI_Recv (buf, count, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
}
printf ("%d %d\n", myrank, count);

$ mpirun -np 2 ./send 1000000
With 1,000,000 elements the messages are too large to be buffered: each MPI_Send blocks until its matching receive is posted, so rank 0 waits in the tag-1 send while rank 1 waits in the tag-2 receive. The program deadlocks and prints nothing.
Eager vs. Rendezvous Protocol
• Eager
  • The send completes without an acknowledgement from the destination
  • Used for small messages; the threshold is typically 128 KB (at least in MPICH)
  • Controlled by MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE (check the output of mpivars)
  • It is not safe to rely on this buffering: a program that needs it (like the reversed-order receives above) works only while its messages stay small
• Rendezvous
  • Requires an acknowledgement from a matching receive
  • Used for large messages
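As a quick experiment (a sketch: the CVAR name is the one given above, but whether it can be set as an environment variable, and its exact effect, depend on the MPICH version and device), shrinking the eager limit should push even a 10-element message onto the rendezvous path, so the reversed-order receive version above hangs even for small counts:

$ MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE=16 mpirun -np 2 ./send 10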
MPI_Status
int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source,
              int tag, MPI_Comm comm, MPI_Status *status)
The status object filled in by MPI_Recv records:
• Source rank
• Message tag
• Message length (retrieved with MPI_Get_count)

typedef struct _MPI_Status {
    int count;
    int cancelled;
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} MPI_Status, *PMPI_Status;
MPI_Get_count (status.c)
[Code slide: the receiver reads status.MPI_SOURCE and status.MPI_TAG and calls MPI_Get_count]
Output: Rank 1 of 2 received 1000 elements
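status.c is not reproduced in the text; the sketch below matches the output shown (rank 0 sending 1000 ints to rank 1 is inferred from it, and the tag value and data are arbitrary choices):

#include <stdio.h>
#include <mpi.h>

#define N 1000

int main (int argc, char *argv[])
{
    int i, rank, size, nrecv;
    int buf[N];
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);

    for (i = 0; i < N; i++)
        buf[i] = i;

    if (rank == 0) {
        MPI_Send (buf, N, MPI_INT, 1, 7, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv (buf, N, MPI_INT, 0, 7, MPI_COMM_WORLD, &status);
        /* status.MPI_SOURCE and status.MPI_TAG identify the sender and tag;
           MPI_Get_count reports how many elements actually arrived */
        MPI_Get_count (&status, MPI_INT, &nrecv);
        printf ("Rank %d of %d received %d elements\n", rank, size, nrecv);
    }

    MPI_Finalize ();
    return 0;
}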
Timing Send/Recv (timingSend.c)
Timing Output
What is the total time?
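timingSend.c is not in the text; the following sketch shows one way to time a blocking send/receive pair with MPI_Wtime (the message size, the barrier, and the per-rank print are assumptions):

#include <stdio.h>
#include <mpi.h>

#define N 1000000

int main (int argc, char *argv[])
{
    int rank;
    static int buf[N];
    double t0, t1;
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    MPI_Barrier (MPI_COMM_WORLD);   /* start both ranks at roughly the same time */
    t0 = MPI_Wtime ();
    if (rank == 0)
        MPI_Send (buf, N, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv (buf, N, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    t1 = MPI_Wtime ();

    /* Each rank only measures its own call; the total time of the transfer
       is better estimated by the maximum of the per-rank times. */
    printf ("Rank %d: %f seconds\n", rank, t1 - t0);

    MPI_Finalize ();
    return 0;
}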
One-to-many Sends
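The code for this slide is not in the text; a minimal sketch of the one-to-many pattern built from point-to-point calls (the payload and tag are illustrative):

#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int i, rank, size, value = 0;
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);

    if (rank == 0) {
        value = 42;
        /* one sender, many receivers: one MPI_Send per destination */
        for (i = 1; i < size; i++)
            MPI_Send (&value, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv (&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf ("Rank %d received %d\n", rank, value);
    }

    MPI_Finalize ();
    return 0;
}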
Many-to-one Sends
A send operation must specify a unique receiver.
• MPI_Send (blocking send)
• MPI_Recv (blocking receive)
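Again, the slide's code is not in the text; a sketch of the many-to-one pattern (the payload is illustrative): every rank sends to rank 0, and rank 0 posts one receive per sender, since each send must name a unique receiver.

#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int i, rank, size, value, sum;
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);

    value = rank;                     /* each rank contributes its own rank */
    if (rank == 0) {
        sum = value;
        /* many senders, one receiver: one MPI_Recv per source */
        for (i = 1; i < size; i++) {
            MPI_Recv (&value, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &status);
            sum += value;
        }
        printf ("Sum of ranks: %d\n", sum);
    } else {
        MPI_Send (&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize ();
    return 0;
}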
MPI_ANY_*
int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source,
int tag, MPI_Comm comm, MPI_Status *status)
• MPI_ANY_SOURCE: the receiver may specify a wildcard value for the source rank
• MPI_ANY_TAG: the receiver may specify a wildcard value for the tag
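For example, the receive loop in the many-to-one sketch above can accept the messages in whatever order they arrive (a sketch; the print format is arbitrary):

for (i = 1; i < size; i++) {
    MPI_Recv (&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
              MPI_COMM_WORLD, &status);
    /* status records which sender and tag actually matched */
    printf ("got %d from rank %d (tag %d)\n",
            value, status.MPI_SOURCE, status.MPI_TAG);
}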
Receive Out-of-order
[Code slide: because the receives use wildcards, there is no specific receive buffer location tied to a particular sender, so the code must rectify the order of the arriving messages]
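A sketch of how the order can be rectified (the array size, payload, and the cap on the number of ranks are assumptions): rank 0 receives from any source and uses status.MPI_SOURCE to file each message in its sender's slot.

#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int i, rank, size, value;
    int data[64];                     /* one slot per sender; assumes <= 64 ranks */
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);

    if (rank == 0) {
        for (i = 1; i < size; i++) {
            /* messages may arrive in any order */
            MPI_Recv (&value, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                      MPI_COMM_WORLD, &status);
            /* rectify the order: index by the actual sender */
            data[status.MPI_SOURCE] = value;
        }
        for (i = 1; i < size; i++)
            printf ("from rank %d: %d\n", i, data[i]);
    } else {
        value = rank * rank;          /* illustrative payload */
        MPI_Send (&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize ();
    return 0;
}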
Sum of Squares of N numbers
Serial:
    for i = 1 to N
        sum += a[i] * a[i]

Parallel (on each of P processes):
    for i = 1 to N/P
        sum += a[i] * a[i]
    collate result
Sum of Squares
[Figure: four processes, Rank = 0 through Rank = 3, one per core, each with its own memory, and each running: for i = 1 to N/P, sum += a[i] * a[i]]
SIMD Parallelism
for (i = N/P * rank; i < N/P * (rank+1); i++)
    localsum += a[i] * a[i];
// Collect localsum and add it up at one of the ranks
if (rank == 0)
    recv localsum from all other ranks and accumulate (…)
else
    send localsum to rank 0
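A minimal complete C version of this pattern (N, the array contents, and the assumption that P divides N are illustrative; following the pseudocode, every rank holds the whole array and sums only its own block, and a real code would normally use MPI_Reduce for the collation):

#include <stdio.h>
#include <mpi.h>

#define N 1024                        /* assumed divisible by the number of ranks */

int main (int argc, char *argv[])
{
    int i, rank, size, lo, hi;
    double a[N], localsum = 0.0, sum, remote;
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);

    for (i = 0; i < N; i++)           /* illustrative data */
        a[i] = (double) i;

    /* each rank squares and sums its own N/P block */
    lo = (N / size) * rank;
    hi = (N / size) * (rank + 1);
    for (i = lo; i < hi; i++)
        localsum += a[i] * a[i];

    /* collate: rank 0 receives every partial sum, the others send theirs */
    if (rank == 0) {
        sum = localsum;
        for (i = 1; i < size; i++) {
            MPI_Recv (&remote, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &status);
            sum += remote;
        }
        printf ("sum of squares = %f\n", sum);
    } else {
        MPI_Send (&localsum, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize ();
    return 0;
}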