Floyd's Algorithm
Introduction
Used to find the shortest paths between all pairs of vertices in a weighted graph
Travel maps containing driving distance from one point to another
Represented by tables
Shortest distance from point A to point B given by intersection of row and column
Route may pass through other cities represented in the table
Navigation systems
[Figure: example weighted graph]
Algorithm
Input: n -- Number of vertices
       a[0..n-1][0..n-1] -- Adjacency matrix
Output: Transformed a that contains the shortest path lengths
for ( k = 0; k < n; k++ )
    for ( i = 0; i < n; i++ )
        for ( j = 0; j < n; j++ )
            a[i][j] = min ( a[i][j], a[i][k] + a[k][j] );
Each of the three nested loops runs n times, so the algorithm is $\Theta(n^3)$
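The triple loop above can be sketched in C as follows. This is a minimal illustration, not the original course code: the fixed size `N`, the `INF` marker for a missing edge, and the function name `floyd` are assumptions made for the example. The explicit `INF` check is there only so that adding a weight to "no edge" cannot overflow an `int`.

```c
#include <limits.h>

#define N 4            /* assumed matrix size for this sketch */
#define INF INT_MAX    /* assumed marker for "no edge"        */

/* In-place Floyd's algorithm on an N x N adjacency matrix.
   On return, a[i][j] holds the length of the shortest path from i to j,
   or INF if j cannot be reached from i. */
void floyd(int a[N][N])
{
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                /* Skip missing edges so INF + weight cannot overflow. */
                if (a[i][k] == INF || a[k][j] == INF)
                    continue;
                if (a[i][k] + a[k][j] < a[i][j])
                    a[i][j] = a[i][k] + a[k][j];
            }
}
```

A caller fills the matrix with 0 on the diagonal, edge weights where edges exist, and `INF` elsewhere, then calls `floyd(a)`.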
Solution
       0    1    2    3    4    5
  0    0    ∞    ∞    ∞    ∞    ∞
  1    2    0   15   11    8    5
  2    5    6    0    5    2    6
  3    3    1    4    0    5    2
  4    6    4    7    3    0    4
  5    9    7   10    6    3    0
Update to a[i][k] is
a[i][k] = min ( a[i][k], a[i][k] + a[k][k] );
Update to a[k][j] is
a[k][j] = min ( a[k][j], a[k][k] + a[k][j] );
In both of the above updates, a[i][k] and a[k][j] cannot decrease, because a[k][k] is 0 and no edge weight is negative
Hence, during iteration k the updates of a[i][j] do not depend on the updates of a[i][k] and a[k][j]
For each iteration k of the outer loop, broadcast row k and then update every element of the matrix in parallel
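The independence argument can be illustrated without MPI by simulating the processes inside one program. In this sketch each simulated process owns a contiguous block of rows, and a plain `memcpy` of row k into `tmp` stands in for the broadcast. The names `update_rows` and `floyd_row_blocks`, the fixed size `N`, and the `INF` marker are assumptions made for the example.

```c
#include <limits.h>
#include <string.h>

#define N 4            /* assumed matrix size for this sketch */
#define INF INT_MAX    /* assumed marker for "no edge"        */

/* Update rows lo..hi-1, the block owned by one simulated process, for
   iteration k. row_k is that process's copy of row k. */
static void update_rows(int a[N][N], int lo, int hi, int k, const int row_k[N])
{
    for (int i = lo; i < hi; i++)
        for (int j = 0; j < N; j++) {
            if (a[i][k] == INF || row_k[j] == INF)
                continue;                     /* avoid overflow on INF */
            if (a[i][k] + row_k[j] < a[i][j])
                a[i][j] = a[i][k] + row_k[j];
        }
}

/* Floyd's algorithm with the rows split among p simulated processes. */
void floyd_row_blocks(int a[N][N], int p)
{
    int tmp[N];
    for (int k = 0; k < N; k++) {
        memcpy(tmp, a[k], sizeof tmp);        /* stands in for the broadcast */
        for (int r = 0; r < p; r++)           /* blocks can run in any order */
            update_rows(a, r * N / p, (r + 1) * N / p, k, tmp);
    }
}
```

Because each block reads only its own rows and the copy of row k, the blocks can be processed in any order within one iteration, which is what allows them to run on separate processes.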
Matrix I / O
One choice is to let each process open the file, seek to its own rows, and read them
Another method is to read the matrix rows in the last process and send them to the appropriate process
Process i is responsible for rows $\lfloor in/p \rfloor$ through $\lfloor (i+1)n/p \rfloor - 1$
Last process is responsible for $\lceil n/p \rceil$ rows (largest buffer)
No extra space needed for file input buffering
Last process reads the rows for each process in a loop and sends the data to the appropriate process
All the printing is done by process 0, which gets the data from the other processes
Print all the output in correct order
Processes 1, 2, ..., $p - 1$ simply wait for a message from process 0, then send process 0 their portion of the matrix
Process 0 never receives more than one submatrix at a time
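The row ranges used for matrix I/O can be computed with a few integer helpers. The sketch below assumes the floor-based block decomposition described above; the function names are hypothetical, and `block_owner` (which process holds a given row) is an addition that the text does not spell out.

```c
/* First row owned by process id, for n rows split among p processes. */
int block_low(int id, int p, int n)
{
    return id * n / p;                 /* floor(id * n / p) */
}

/* Last row owned by process id. */
int block_high(int id, int p, int n)
{
    return block_low(id + 1, p, n) - 1;
}

/* Number of rows owned by process id. */
int block_size(int id, int p, int n)
{
    return block_low(id + 1, p, n) - block_low(id, p, n);
}

/* Rank of the process that owns row j. */
int block_owner(int j, int p, int n)
{
    return (p * (j + 1) - 1) / n;
}
```

With this decomposition the last process always holds the ceiling of n/p rows, so a buffer sized for its block is large enough to stage any other process's rows while they are read and sent.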
Point-to-point communication
Function to read matrix from file
Executed by process $p - 1$
Reads a contiguous group of matrix rows
Sends a message containing these rows directly to the process responsible to manage them
Function to print the matrix
Each process sends the group of matrix rows to process 0
Process 0 receives the message and prints the rows to standard output
Communication involves a pair of processes
One process sends a message
Other process receives the message
Function MPI_Send
Perform a blocking send
int MPI_Send ( void * buffer, int count, MPI_Datatype datatype, int dest,
               int tag, MPI_Comm comm );
buffer    Starting address of the array of data items to send
count     Number of data items in array (nonnegative integer)
datatype  Data type of each item (uniform since it is an array); defined by an MPI constant
dest      Rank of destination (integer)
tag       Message tag, or integer label; allows identification of message purpose
comm      Communicator; group of processes participating in this communication function
Function blocks until the message buffer is again available
Message buffer is free when
Message copied to system buffer, or
Message transmitted (may overlap computation)
Function MPI_Recv
Blocking receive for a message
int MPI_Recv ( void * buffer, int count, MPI_Datatype datatype, int src,
               int tag, MPI_Comm comm, MPI_Status * status );
buffer    Starting address of receive buffer
count     Maximum number of data items in receive buffer
datatype  Data type of each item (uniform since it is an array); defined by an MPI constant
src       Rank of source (integer)
          Can be specified as MPI_ANY_SOURCE to receive the message from any source in the communicator
          Process rank in this case can be determined through status
tag       Desired tag value (integer)
          Can be specified as MPI_ANY_TAG
          Received tag can be determined through status
comm      Communicator; group of processes participating in this communication function
status    Status object; must be allocated before call to MPI_Recv
Blocks until the message has been received, or until an error condition causes the function to return
count contains the maximum length of message
Actual length of received message can be determined with MPI_Get_count
status contains information about the just-completed function
status->MPI_SOURCE  Rank of the process sending the message
status->MPI_TAG     Message's tag value
status->MPI_ERROR   Error condition
Function blocks until message arrives in buffer
If message never arrives, function never returns
Deadlock
Process blocked on a condition that will never become true
Easy to write send/receive code that deadlocks
Two processes with rank 0 and 1, each wanting to compute the average of two numbers in an array a[2]
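A minimal sketch of this exchange follows. It assumes that rank 0 initially holds a[0] and rank 1 holds a[1], and it uses made-up sample values. If both ranks called MPI_Recv first, each would block waiting for a message the other has not sent yet, which is the deadlock. Ordering the calls so that one rank sends first while the other receives first avoids it.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    double a[2] = {0.0, 0.0};
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2 && rank < 2) {
        int other = 1 - rank;
        a[rank] = (rank == 0) ? 3.0 : 5.0;   /* assumed sample values */

        /* Deadlock-prone ordering: both ranks call MPI_Recv before
           MPI_Send. Here rank 0 sends first and rank 1 receives first,
           so every receive has a matching send already on its way. */
        if (rank == 0) {
            MPI_Send(&a[0], 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
            MPI_Recv(&a[1], 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);
        } else {
            MPI_Recv(&a[0], 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&a[1], 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
        }
        printf("rank %d: average = %f\n", rank, (a[0] + a[1]) / 2.0);
    }

    MPI_Finalize();
    return 0;
}
```

The program needs an MPI installation and at least two processes to exercise the exchange. Mismatched tags or a wrong destination rank cause the same kind of hang, because the blocked receive never finds a matching message.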
Passing a single message of length n from one PE to another has time complexity $\Theta(n)$
Broadcasting to p PEs requires $\lceil \log p \rceil$ message-passing steps
Complexity of broadcasting: $\Theta(n \log p)$
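The step count can be checked with a small simulation. This sketch assumes a tree-style broadcast in which, at every step, each PE that already holds the message forwards it to one PE that does not, so the number of holders doubles. The function name `broadcast_steps` is hypothetical.

```c
/* Number of message-passing steps needed to broadcast to p PEs when the
   set of PEs holding the message doubles at every step.
   Equals ceil(log2(p)) for p >= 1. */
int broadcast_steps(int p)
{
    int steps = 0;
    for (int holders = 1; holders < p; holders *= 2)
        steps++;
    return steps;
}
```

Each step moves a message of length n, so the total broadcast cost grows as n times the number of steps.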
Outermost loop
For every iteration of the outermost loop, the parallel algorithm must compute the root PE, taking constant time
Root PE copies the correct row of A to array tmp, taking $\Theta(n)$ time
The loop itself executes n times
Overall time complexity
$\Theta(n(1 + n + n \log p + n^2/p)) = \Theta(n^3/p + n^2 \log p)$
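The simplification can be spelled out term by term. Inside the parentheses, 1 is the cost of computing the root PE, n is the copy of the row into tmp, n log p is the broadcast, and n^2/p is read here as the update of each PE's share of the matrix (roughly n/p rows of n cells each, an interpretation the text does not state explicitly). Assuming p is at least 2:

```latex
\begin{align*}
\Theta\!\left(n\left(1 + n + n\log p + \frac{n^2}{p}\right)\right)
  &= \Theta\!\left(n + n^2 + n^2\log p + \frac{n^3}{p}\right) \\
  &= \Theta\!\left(\frac{n^3}{p} + n^2\log p\right)
\end{align*}
```

The terms n and n^2 drop out because both are dominated by n^2 log p.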
Prediction for the execution time on a commodity cluster
n broadcasts, with $\lceil \log p \rceil$ steps each
Each step passes messages of 4n bytes
Expected communication time of parallel program: $n \lceil \log p \rceil (\lambda + 4n/\beta)$, where $\lambda$ is the message latency and $\beta$ is the bandwidth in bytes per second
Average time to update a single cell: $\chi$
Expected computation time for parallel program: $n^2 \lceil n/p \rceil \chi$
Execution time
$n^2 \lceil n/p \rceil \chi + n \lceil \log p \rceil (\lambda + 4n/\beta)$
This expression overestimates the parallel execution time because it ignores the fact that there can be considerable overlap between computation and communication
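The prediction can be evaluated with a short helper. In this sketch the parameter names are assumptions: `chi` is the average time to update one cell, `lambda` is the message latency, and `beta` is the bandwidth in bytes per second, all of which would have to be measured on the target cluster. The function names are hypothetical.

```c
/* Smallest s with 2^s >= p, i.e. ceil(log2(p)) for p >= 1. */
static int ceil_log2(int p)
{
    int s = 0;
    for (int reach = 1; reach < p; reach *= 2)
        s++;
    return s;
}

/* Predicted execution time in seconds for an n x n matrix on p processes:
   computation n^2 * ceil(n/p) * chi plus
   communication n * ceil(log2 p) * (lambda + 4n / beta). */
double predicted_time(int n, int p, double chi, double lambda, double beta)
{
    int rows = (n + p - 1) / p;                        /* ceil(n / p) */
    double compute = (double)n * n * rows * chi;
    double comm = (double)n * ceil_log2(p) * (lambda + 4.0 * n / beta);
    return compute + comm;
}
```

Since the model ignores overlap between computation and communication, the returned value is best read as an upper estimate rather than an exact forecast.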