Floyd's Algorithm: Input N: Number of Vertices A (0..n-1) (0..n-1) - Adjacency Matrix

Floyd's algorithm is used to find the shortest paths in a weighted graph represented by an adjacency matrix. It works by iterating through the matrix and updating each element to reflect the shortest known path between those two vertices that passes through each possible intermediate vertex. The parallel version of the algorithm partitions the matrix among processes and uses broadcasts to share updated path lengths during each iteration. Processes communicate directly through point-to-point sends and receives to distribute the matrix and collect outputs. Analysis shows the parallel algorithm has O(n^2/p) computational complexity and logarithmic communication complexity per iteration.

Uploaded by

ManojSudarshan

Floyd's Algorithm

Introduction
Used to find shortest paths in a weighted graph
Travel maps containing driving distance from one point to another
Represented by tables
Shortest distance from point A to point B given by intersection of row and column
Route may pass through other cities represented in the table
Navigation systems

All-pairs shortest-path problem

Graph G = (V, E)
Weighted and directed graph
Problem: Find the length of the shortest path between every pair of vertices
Length of the path is strictly determined by the weight of its edges
It is not based on the number of edges traversed
Representation of weighted directed graph by adjacency matrix
n × n adjacency matrix for a graph with n vertices
Chosen for constant time access to every edge
Nonexistent edges may be assigned the value ∞
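Adjacency matrix of the example graph (row = source vertex, column = destination vertex)

         0    1    2    3    4    5
    0    0    2    5    ∞    ∞    ∞
    1    ∞    0    7    1    ∞    8
    2    ∞    ∞    0    4    ∞    ∞
    3    ∞    ∞    ∞    0    3    ∞
    4    ∞    ∞    2    ∞    0    3
    5    ∞    5    ∞    2    4    0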

Algorithm
Input: n -- Number of vertices
       a[0..n-1][0..n-1] -- Adjacency matrix
Output: Transformed a that contains shortest path lengths
    for ( k = 0; k < n; k++ )
        for ( i = 0; i < n; i++ )
            for ( j = 0; j < n; j++ )
                a[i][j] = min ( a[i][j], a[i][k] + a[k][j] );
Easy to see that the algorithm is Θ(n^3)
Solution

         0    1    2    3    4    5
    0    0    2    5    3    6    9
    1    ∞    0    6    1    4    7
    2    ∞   15    0    4    7   10
    3    ∞   11    5    0    3    6
    4    ∞    8    2    5    0    3
    5    ∞    5    6    2    4    0
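
The triple loop can be tried directly on the six-vertex example. The following is a minimal sketch rather than the notes' own program: the function name floyd, the flat row-major layout, and the INF sentinel (INT_MAX, which is skipped instead of added so that the sum cannot overflow) are assumptions made for illustration, and the hard-coded matrix is assumed to match the example graph.

```c
#include <assert.h>
#include <limits.h>

#define INF INT_MAX   /* stands in for the "infinity" of a nonexistent edge */

/* Assumed adjacency matrix of the six-vertex example, stored row by row. */
static const int EXAMPLE_GRAPH[36] = {
      0,   2,   5, INF, INF, INF,
    INF,   0,   7,   1, INF,   8,
    INF, INF,   0,   4, INF, INF,
    INF, INF, INF,   0,   3, INF,
    INF, INF,   2, INF,   0,   3,
    INF,   5, INF,   2,   4,   0
};

/* Floyd's algorithm, in place, on an n x n matrix stored row-major in a. */
void floyd(int n, int *a)
{
    assert(n > 0 && a != 0);
    for (int k = 0; k < n; k++) {
        for (int i = 0; i < n; i++) {
            if (a[i * n + k] == INF)
                continue;                 /* no known path from i to k */
            for (int j = 0; j < n; j++) {
                if (a[k * n + j] == INF)
                    continue;             /* no known path from k to j */
                int via_k = a[i * n + k] + a[k * n + j];
                if (via_k < a[i * n + j])
                    a[i * n + j] = via_k;
            }
        }
    }
}
```

To use it, copy EXAMPLE_GRAPH into a writable array of 36 ints and call floyd(6, copy); entry copy[i * 6 + j] then holds the length of the shortest path from vertex i to vertex j, or INF if there is none.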

Creating arrays at run time

Allocate arrays on the heap at run time to make the program more useful, such as
    int * arr;
    arr = ( int * ) malloc ( n * sizeof ( int ) );
A dynamically allocated 2D array in C is commonly represented by a 1D array of row pointers, with each row allocated by malloc
This is a problem in transmission, because the memory of a message buffer should be contiguous and separately allocated rows need not be
Allocate such arrays by first allocating contiguous space for all elements, and then putting the row pointers in place
    int ** arr;       // The main array
    int * storage;    // Contiguous storage for array
    storage = ( int * ) malloc ( m * n * sizeof ( int ) );
    arr = ( int ** ) malloc ( m * sizeof ( int * ) );
    for ( i = 0; i < m; i++ )
        arr[i] = storage + ( i * n );
Initialize the array elements either by using arr[i][j] notation, or by using storage if initialized en masse
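
The allocation steps above can be wrapped in a pair of helper functions. This is a sketch under assumptions: the names alloc_matrix and free_matrix are hypothetical, the element type is fixed to int, and error handling is limited to returning NULL when either malloc fails.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate an m x n int matrix whose elements occupy one contiguous block.
   Returns an array of row pointers (usable as arr[i][j]), or NULL on failure. */
int **alloc_matrix(int m, int n)
{
    assert(m > 0 && n > 0);
    int *storage = malloc((size_t)m * (size_t)n * sizeof *storage);
    int **arr = malloc((size_t)m * sizeof *arr);
    if (storage == NULL || arr == NULL) {
        free(storage);   /* free(NULL) is a no-op, so this is safe */
        free(arr);
        return NULL;
    }
    for (int i = 0; i < m; i++)
        arr[i] = storage + (size_t)i * (size_t)n;   /* row i starts i*n elements in */
    return arr;
}

/* Release a matrix obtained from alloc_matrix. */
void free_matrix(int **arr)
{
    if (arr != NULL) {
        free(arr[0]);    /* arr[0] points at the start of the contiguous storage */
        free(arr);
    }
}
```

Because arr[0] is the start of the whole block, a group of r consecutive rows beginning at row i can be handed to a send or receive call as the single buffer arr[i] with r * n elements.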

Designing parallel algorithm

Partitioning
Choose either domain decomposition or functional decomposition
Same assignment statement executed n^3 times
No functional parallelism
Easy to do domain decomposition
Divide matrix A into n^2 elements
Associate a primitive task with each element
Communication
Each update of element a[i][j] requires access to elements a[i][k] and a[k][j]
For a given value of k
Element a[k][m] is needed by every task associated with elements in column m
Element a[m][k] is needed by every task associated with elements in row m
In iteration k, each element in row k gets broadcast to the tasks in the same column
Each element in column k gets broadcast to tasks in the same row
Do we need to update every element of matrix concurrently?
Values of a[i][k] and a[k][j] do not change during iteration k

Update to a[i][k] is
    a[i][k] = min ( a[i][k], a[i][k] + a[k][k] );
Update to a[k][j] is
    a[k][j] = min ( a[k][j], a[k][k] + a[k][j] );
In both the above updates, a[i][k] and a[k][j] cannot decrease (edge weights are positive, so a[k][k] remains 0 and the second argument of min equals the first)
Hence, no dependence between updates of a[i][j] and the updates of a[i][k] and a[k][j]
For each iteration k of the outer loop, perform broadcasts and update every element of matrix in parallel

Agglomeration and mapping

Use the decision tree to determine agglomeration and mapping strategy
Number of tasks: static
Communication among tasks: structured
Computation time per task: constant
Agglomerate tasks to minimize communication
One task per MPI process
Agglomerate n^2 primitive tasks into p tasks using either of the following two methods
Row-wise block striped
Agglomerate tasks in the same row
Broadcast within rows eliminated; data values local to task
During every iteration of outer loop, one task broadcasts n elements to all other tasks
Time for each broadcast: ⌈log p⌉(λ + n/β), where λ is the message latency and β is the bandwidth
Column-wise block striped
Agglomerate tasks within same column
Broadcast within columns eliminated
Time for each broadcast: ⌈log p⌉(λ + n/β)
Both methods give the same performance, so look outside the computational kernel to choose between them
Final choice: Simpler to read matrix from file with row-wise block striped if the file stores data in row-major order
Internally, C stores the matrices as row major as well
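
The row-wise block-striped scheme can be checked without MPI by simulating it sequentially. In the sketch below, which is an illustration and not the notes' floyd.c, p virtual processes each own a block of consecutive rows, and copying row k into a buffer tmp stands in for the broadcast that every iteration k requires. The names first_row and floyd_row_striped, and the INF sentinel, are assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define INF INT_MAX   /* stands in for the "infinity" of a nonexistent edge */

/* First row owned by virtual process id when n rows are split among p. */
static int first_row(int id, int p, int n)
{
    return (int)((long long)id * n / p);
}

/* Sequential simulation of row-wise block-striped Floyd.
   a is n x n, row-major, updated in place. Returns 0 on success, -1 on
   allocation failure. */
int floyd_row_striped(int n, int p, int *a)
{
    assert(n > 0 && p > 0 && a != NULL);
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (int k = 0; k < n; k++) {
        /* Stand-in for the broadcast: the owner of row k copies it to tmp,
           and every process reads row k only through tmp. */
        memcpy(tmp, a + (size_t)k * n, (size_t)n * sizeof *tmp);
        for (int id = 0; id < p; id++) {          /* each virtual process */
            int lo = first_row(id, p, n);
            int hi = first_row(id + 1, p, n);     /* one past its last row */
            for (int i = lo; i < hi; i++) {
                int aik = a[i * n + k];           /* local: same row as a[i][j] */
                if (aik == INF)
                    continue;
                for (int j = 0; j < n; j++) {
                    if (tmp[j] != INF && aik + tmp[j] < a[i * n + j])
                        a[i * n + j] = aik + tmp[j];
                }
            }
        }
    }
    free(tmp);
    return 0;
}
```

Each virtual process touches only its own rows plus tmp, which is why a single broadcast of n elements per iteration is the only communication the real parallel version needs.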

Matrix I / O
Choice to let each process open the file, access the appropriate row (seek to it), and read the row
Another method: the last process reads the matrix rows and sends them to the appropriate process
Process i responsible for rows ⌊in/p⌋ through ⌊(i + 1)n/p⌋ − 1
Last process responsible for ⌈n/p⌉ rows (largest buffer)
No extra space needed for file input buffering
Last process reads the rows for each process in a loop and sends the data to appropriate task
All the printing done by process 0, by getting data from other processes
Print all the output in correct order
Processes 1, 2, . . . , p − 1 simply wait for a message from process 0, then send process 0 their portion of the matrix
Process 0 never receives more than one submatrix at a time
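
The row ranges above translate into a few small integer helpers. This is a sketch; the function names block_low, block_high, block_size and block_owner are hypothetical, and the owner formula ⌊(p(row + 1) − 1)/n⌋ is the inverse of the ⌊in/p⌋ rule, stated here as an assumption rather than taken from the notes.

```c
#include <assert.h>

/* Rows are numbered 0..n-1 and processes 0..p-1. */

/* First row owned by process id: floor(id * n / p). */
int block_low(int id, int p, int n)
{
    return (int)((long long)id * n / p);
}

/* Last row owned by process id: floor((id + 1) * n / p) - 1. */
int block_high(int id, int p, int n)
{
    return block_low(id + 1, p, n) - 1;
}

/* Number of rows owned by process id. */
int block_size(int id, int p, int n)
{
    return block_low(id + 1, p, n) - block_low(id, p, n);
}

/* Process that owns a given row: floor((p * (row + 1) - 1) / n). */
int block_owner(int row, int p, int n)
{
    assert(row >= 0 && row < n && p > 0);
    return (int)(((long long)p * (row + 1) - 1) / n);
}
```

With n = 10 and p = 4, for example, the processes own rows 0-1, 2-4, 5-6 and 7-9, so the last process holds ⌈10/4⌉ = 3 rows and its buffer is large enough for any other process's block.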

Point-to-point communication
Function to read matrix from file
Executed by process p − 1
Reads a contiguous group of matrix rows
Sends a message containing these rows directly to the process responsible to manage them
Function to print the matrix
Each process sends the group of matrix rows to process 0
Process 0 receives the message and prints the rows to standard output
Communication involves a pair of processes
One process sends a message
Other process receives the message

Process h is not involved in communication

Both send and receive are blocking
Both send and receive need to be executed conditionally, depending on the process rank
    ...
    if ( id == i )
    {
        ...
        /* Send message to process j */
        ...
    }
    else if ( id == j )
    {
        ...
        /* Receive message from process i */
        ...
    }
Communication calls must be in conditionally executed code

Function MPI_Send
Perform a blocking send
int MPI_Send ( void * buffer, int count, MPI_Datatype datatype, int dest,
int tag, MPI_Comm comm );
buffer Starting address of the array of data items to send
count Number of data items in array (nonnegative integer)
datatype Data type of each item (uniform since it is an array); defined by an MPI constant
dest Rank of destination (integer)
tag Message tag, or integer label; allows identification of message purpose
comm Communicator; group of processes participating in this communication function
Function blocks until the message buffer is again available
Message buffer is free when
Message copied to system buffer, or
Message transmitted (may overlap computation)
Function MPI_Recv
Blocking receive for a message
int MPI_Recv ( void * buffer, int count, MPI_Datatype datatype, int src,
int tag, MPI_Comm comm, MPI_Status * status );
buffer Starting address of receive buffer
count Maximum number of data items in receive buffer
datatype Data type of each item (uniform since it is an array); defined by an MPI constant
src Rank of source (integer)
Can be specified as MPI_ANY_SOURCE to receive the message from any source in the communicator
Process rank in this case can be determined through status
tag Desired tag value (integer)
Can be specified as MPI_ANY_TAG
Received tag can be determined through status
comm Communicator; group of processes participating in this communication function
status Status objects; must be allocated before call to MPI_Recv
Blocks until the message has been received, or until an error condition causes the function to return
count contains the maximum length of message
Actual length of received message can be determined with MPI_Get_count
status contains information about the just-completed function
status->MPI_SOURCE Rank of the process sending message
status->MPI_TAG Message's tag value
status->MPI_ERROR Error condition
Function blocks until message arrives in buffer
If message never arrives, function never returns
Deadlock
Process blocked on a condition that will never become true
Easy to write send/receive code that deadlocks
Two processes with rank 0 and 1, each wanting to compute the average of two numbers in an array a[2]

Process i has the updated value of a[i]
Both receive before send
    float a[2], avg;
    int me;               // Process rank
    MPI_Status status;
    ...
    int other = 1 - me;
    MPI_Recv ( a + other, 1, MPI_FLOAT, other, 0, MPI_COMM_WORLD, &status );
    MPI_Send ( a + me, 1, MPI_FLOAT, other, 0, MPI_COMM_WORLD );
    avg = ( a[0] + a[1] ) / 2.0;
Process 0 blocks inside MPI_Recv waiting for message from process 1 to arrive
Process 1 blocks inside MPI_Recv waiting for message from process 0 to arrive
Both are deadlocked
Send tag does not match receive tag
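    float a[2], avg;
    int me;               // Process rank
    MPI_Status status;
    MPI_Request request;  // Handle for the nonblocking send
    ...
    int other = 1 - me;
    MPI_Isend ( a + me, 1, MPI_FLOAT, other, me, MPI_COMM_WORLD, &request );
    MPI_Recv ( a + other, 1, MPI_FLOAT, other, me, MPI_COMM_WORLD, &status );
    avg = ( a[0] + a[1] ) / 2.0;
Processes block in MPI_Recv because the tag is not correct: each message is sent with the sender's rank as its tag, but the receiver waits for a message tagged with its own rank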
Other problems could be wrong destination in send or wrong source in receive

Documenting the parallel program

Housekeeping, starting in floyd.c
Use a typedef and a #define macro to indicate matrix data types
floyd.c:main() is responsible for reading and printing the original matrix
It verifies that the matrix is square
All work done by compute_shortest_paths with four parameters

Analysis and benchmarking

Sequential version performance: Θ(n^3)
Analysis of parallel algorithm
Innermost loop has complexity Θ(n)
Middle loop executed at most ⌈n/p⌉ times due to row-wise block-striped decomposition of matrix
Computational complexity of the inner two loops: Θ(n^2/p)
Communication complexity
No communication in inner loop
No communication in middle loop
Broadcast in outer loop

Passing a single message of length n from one PE to another has time complexity Θ(n)
Broadcasting to p PEs requires ⌈log p⌉ message-passing steps
Complexity of broadcasting: Θ(n log p)
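
The ⌈log p⌉ step count follows from a doubling argument: in each step, every PE that already holds the row can pass it to one PE that does not. The sketch below counts those steps and compares them with ⌈log2 p⌉. It assumes this simple doubling scheme; the algorithm an actual MPI library uses for its broadcast is implementation-dependent, and the function names are hypothetical.

```c
#include <assert.h>

/* Smallest s with 2^s >= p, i.e. ceil(log2(p)), for p >= 1. */
int ceil_log2(int p)
{
    assert(p >= 1);
    int steps = 0;
    for (long long reach = 1; reach < p; reach *= 2)
        steps++;
    return steps;
}

/* Count the message-passing steps of a doubling broadcast among p PEs. */
int broadcast_steps(int p)
{
    assert(p >= 1);
    int have = 1;      /* initially only the root holds the data */
    int steps = 0;
    while (have < p) {
        int senders = have;             /* every holder sends once per step */
        int missing = p - have;
        have += (missing < senders) ? missing : senders;
        steps++;
    }
    return steps;
}
```

Since each step moves a message of length n, the total cost of one broadcast under this scheme is ⌈log p⌉ steps of Θ(n) each, which gives the Θ(n log p) stated above.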
Outermost loop
For every iteration of outermost loop, parallel algorithm must compute the root PE taking constant time
Root PE copies the correct row of A to array tmp, taking Θ(n) time
The loop itself executes n times
Overall time complexity
Θ(n(1 + n + n log p + n^2/p)) = Θ(n^3/p + n^2 log p)
Prediction for the execution time on commodity cluster
n broadcasts, with ⌈log p⌉ steps each
Each step passes messages of 4n bytes
Expected communication time of parallel program: n⌈log p⌉(λ + 4n/β), where λ is the message latency and β is the bandwidth in bytes per unit time
Average time to update a single cell: χ
Expected computation time for parallel program: n^2 ⌈n/p⌉ χ
Execution time
n^2 ⌈n/p⌉ χ + n⌈log p⌉(λ + 4n/β)
This expression overestimates the parallel execution time because it ignores the fact that there can be considerable overlap between computation and communication
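
The execution-time expression is easy to evaluate for concrete machine parameters. The following sketch computes it; the function names are hypothetical, the logarithm is taken base 2, and the per-cell update time, latency and bandwidth passed in are whatever the user measures on their own cluster (none are given in these notes).

```c
#include <assert.h>

/* ceil(a / b) for positive integers. */
static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* ceil(log2(p)) for p >= 1: number of steps in one broadcast. */
static int bcast_steps(int p)
{
    int steps = 0;
    for (long long reach = 1; reach < p; reach *= 2)
        steps++;
    return steps;
}

/* Predicted execution time in seconds:
     n^2 * ceil(n/p) * chi  +  n * ceil(log2 p) * (lambda + 4n / beta)
   chi    : average time to update a single cell, in seconds
   lambda : message latency, in seconds
   beta   : bandwidth, in bytes per second
   The factor 4 assumes 4-byte matrix elements, as in the notes. */
double predicted_time(int n, int p, double chi, double lambda, double beta)
{
    assert(n > 0 && p > 0 && beta > 0.0);
    double computation = (double)n * n * ceil_div(n, p) * chi;
    double communication = (double)n * bcast_steps(p) * (lambda + 4.0 * n / beta);
    return computation + communication;
}
```

With p = 1 the communication term vanishes and the prediction reduces to n^3 χ, the sequential running time; comparing predicted_time for increasing p against measured times shows how much the model overestimates because of overlapped communication.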
