Parallel Gaussian Elimination
Major Steps in Gaussian Elimination
1. Transform Coefficient Matrix to Upper Triangular Form
2. Back-Substitution
Transforming to Upper Triangular
for i ← n+1 to M do
    x[i] ← b[i] / a[i,i]
    for j ← i+1 to M do
        b[j] ← b[j] − x[i] × a[j,i]
    endfor
endfor
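The reduction to upper triangular form can be sketched in Python as follows. This is a row-oriented version written from the standard algorithm, not a transcription of the loop above. The function name `forward_eliminate` and the 3x3 example system are illustrative assumptions, and the sketch assumes every pivot is nonzero (no row exchanges).

```python
def forward_eliminate(a, b):
    """Reduce the n x n matrix a to upper triangular form in place.

    The right-hand side b receives the same row operations.
    Assumption: no pivot is zero, so no row exchanges are performed.
    """
    n = len(a)
    for i in range(n):                      # pivot row / pivot column
        pivot = a[i][i]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; a row exchange would be needed")
        for j in range(i + 1, n):           # every row below the pivot
            factor = a[j][i] / pivot
            for k in range(i, n):           # subtract a multiple of the pivot row
                a[j][k] -= factor * a[i][k]
            b[j] -= factor * b[i]
    return a, b


# Illustrative system (hypothetical data)
a = [[2.0, 1.0, -1.0],
     [-3.0, -1.0, 2.0],
     [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
forward_eliminate(a, b)
for row, rhs in zip(a, b):
    print(row, rhs)
```

All entries below the diagonal of `a` are zero afterwards, so the system is ready for back-substitution.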
Back-Substitution
for i ← n-1 down to 0 do
    sum = 0
    for j ← i+1 to n-1 do
        sum = sum + x[j] * a[i,j]
    endfor
    x[i] = (b[i] - sum) / a[i,i]
endfor
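A minimal Python sketch of back-substitution, assuming `u` is already upper triangular with a nonzero diagonal. The names `back_substitute`, `u`, and `c` are illustrative. The sketch accumulates the sum over the already-known unknowns and divides by the diagonal entry, which is a no-op when the pivot rows have been scaled to 1.

```python
def back_substitute(u, c):
    """Solve u x = c for x, where u is upper triangular."""
    n = len(u)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):          # last row first
        # contribution of the unknowns that are already solved
        s = sum(u[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / u[i][i]
    return x


# Illustrative upper triangular system (hypothetical data)
u = [[2.0, 1.0, -1.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, -1.0]]
c = [8.0, 1.0, 1.0]
x = back_substitute(u, c)
print(x)  # → [2.0, 3.0, -1.0]
```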
Data-Dependency Graph
[Link]
Partition (PCAM)
Rank 0 (rank idle)   1  0   2   2   0    6
                     0  1   2   0  -3    0
Rank 1               0  0  22  -1   0    4
                     0  0   2  13   0   -5
Rank 2               0  0   0   1  16    0
                     0  0   5   3   5  -32
Static Partition Method
Steps – Static Partition Method
1. n ← number of rows in each stripe
2. Scatter matrix to ranks
3. for i ← 0 to rows-1
   1. Adjust pivot row
   2. Broadcast pivot row
   3. Sweep matrix
4. Gather matrix
5. Solve via back-substitution
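The steps above can be simulated sequentially in Python to see how the stripes behave. This is only a sketch under several assumptions: "adjust pivot row" is taken to mean scaling the pivot row so the pivot becomes 1, "sweep" is taken to mean eliminating the pivot column in the rows below the pivot, scatter, broadcast, and gather are simulated on one shared list rather than performed with MPI calls, and the row count is assumed to divide evenly by the rank count. The function name and the 4x4 example are hypothetical.

```python
def static_partition_eliminate(aug, ranks):
    """Simulate static striping on an augmented matrix [A | b]."""
    rows = len(aug)
    n = rows // ranks                                   # 1. rows per stripe
    # 2. "scatter": rank r owns rows r*n .. (r+1)*n - 1 for the whole run
    stripes = [range(r * n, (r + 1) * n) for r in range(ranks)]
    busy = []                                           # ranks with work per step
    for i in range(rows):                               # 3. one step per pivot
        pivot = aug[i][i]
        aug[i] = [v / pivot for v in aug[i]]            # 3.1 adjust pivot row
        pivot_row = aug[i]                              # 3.2 broadcast (simulated)
        active = 0
        for stripe in stripes:                          # 3.3 each rank sweeps its rows
            mine = [j for j in stripe if j > i]
            if mine:
                active += 1
            for j in mine:
                f = aug[j][i]
                aug[j] = [vj - f * vp for vj, vp in zip(aug[j], pivot_row)]
        busy.append(active)
    return aug, busy                                    # 4. gather is implicit here


# Illustrative system whose exact solution is x = [1, 1, 1, 1]
aug = [[4.0, 1.0, 0.0, 0.0, 5.0],
       [1.0, 4.0, 1.0, 0.0, 6.0],
       [0.0, 1.0, 4.0, 1.0, 6.0],
       [0.0, 0.0, 1.0, 4.0, 5.0]]
reduced, busy = static_partition_eliminate(aug, ranks=2)
print(busy)  # → [2, 1, 1, 0]
```

The `busy` list shows the drawback of fixed stripes: once the pivot has moved past a rank's last row, that rank has nothing left to sweep.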
Partition (PCAM)
Completed rows stored   1  0   2   2   0    6
                        0  1   2   0  -3    0
Rank 0                  0  0  22  -1   0    4
Rank 1                  0  0   2  13   0   -5
Rank 2                  0  0   0   1  16    0
                        0  0   5   3   5  -32
Dynamic Partition Method
Steps – Dynamic Partition Method
1. for i ← 0 to rows-1
   1. n ← number of rows per stripe ((rows-i)/ranks)
   2. Scatter matrix to ranks
   3. Adjust pivot row
   4. Broadcast pivot row
   5. Sweep matrix
   6. Gather matrix
2. Solve via back-substitution
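The re-striping in step 1.1 can be sketched as a small Python helper. The function name `dynamic_stripes` is hypothetical, and giving the leftover rows to the last rank is an assumption chosen to match the six-row, three-rank example on the preceding Partition slide. The slides do not say how the remainder of (rows-i)/ranks is distributed.

```python
def dynamic_stripes(rows, ranks, i):
    """Rows owned by each rank at elimination step i.

    Only rows i .. rows-1 are still in play; each rank gets
    (rows - i) // ranks of them, and the last rank also takes the
    remainder (an assumption).
    """
    n = (rows - i) // ranks
    stripes = []
    start = i
    for r in range(ranks):
        stop = rows if r == ranks - 1 else start + n
        stripes.append(list(range(start, stop)))
        start = stop
    return stripes


for i in range(6):
    print(i, dynamic_stripes(6, 3, i))
# at i = 2 this prints: 2 [[2], [3], [4, 5]]
```

One consequence of this remainder rule is that when fewer rows than ranks remain, all of them land on the last rank. Spreading the remainder across ranks would balance the tail of the computation better.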
Static vs Dynamic
Static                                 Dynamic
• Broadcast once                       • Broadcast for each row
• Gather once                          • Gather for each row
• Ranks idle after stripe completed    • Load re-balanced at each step
• Simpler code?                        • More complex code?
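The load-balance rows of this comparison can be made concrete with a small counting sketch. It ignores communication cost entirely and only counts how many rows the busiest rank sweeps at each step, which approximates the compute time per step. The helper names, the 12-row and 3-rank sizes, and the rule that the last rank takes any remainder are all illustrative assumptions.

```python
def static_stripes(rows, ranks):
    """Fixed stripes: rank r keeps the same rows for the whole run."""
    n = rows // ranks
    return [list(range(r * n, rows if r == ranks - 1 else (r + 1) * n))
            for r in range(ranks)]


def dynamic_stripes(rows, ranks, i):
    """Stripes recomputed at step i over the remaining rows i .. rows-1."""
    n = (rows - i) // ranks
    return [list(range(i + r * n, rows if r == ranks - 1 else i + (r + 1) * n))
            for r in range(ranks)]


def busiest_rank_rows(stripes, i):
    """Rows below pivot i swept by the most heavily loaded rank."""
    return max(sum(1 for j in s if j > i) for s in stripes)


def idle_ranks(stripes, i):
    """Ranks that own no row below pivot i."""
    return sum(1 for s in stripes if not any(j > i for j in s))


rows, ranks = 12, 3
static_cost = sum(busiest_rank_rows(static_stripes(rows, ranks), i)
                  for i in range(rows))
dynamic_cost = sum(busiest_rank_rows(dynamic_stripes(rows, ranks, i), i)
                   for i in range(rows))
static_idle = [idle_ranks(static_stripes(rows, ranks), i) for i in range(rows)]

print(static_cost, dynamic_cost)  # → 38 32
print(static_idle)                # → [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3]
```

In this toy count the dynamic scheme shortens the critical path, but each of its steps also pays for an extra scatter and gather, which the count does not include.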
Hybrid Methods
• Bandwidth Reduction (Gibbs-Poole-Stockmeyer algorithm)
– [Link]
rithms/BOOK/BOOK3/[Link]
– [Link]
• Simpler solutions?
– Overlapping communications with computations
– Mix Static and Dynamic Methods
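To illustrate what bandwidth reduction aims at, the sketch below measures the bandwidth of a matrix (the largest distance of a nonzero entry from the diagonal) before and after a symmetric reordering. It is not the Gibbs-Poole-Stockmeyer algorithm: the permutation is picked by hand for this tiny example, and the helper names `bandwidth` and `permute` are hypothetical. A small bandwidth matters for elimination because, in a banded matrix, only the rows within the band below the pivot have a nonzero entry in the pivot column.

```python
def bandwidth(a):
    """Largest |i - j| over all nonzero entries a[i][j]."""
    return max((abs(i - j)
                for i, row in enumerate(a)
                for j, v in enumerate(row) if v != 0),
               default=0)


def permute(a, order):
    """Apply the same reordering to rows and columns."""
    return [[a[p][q] for q in order] for p in order]


# Hypothetical matrix: unknowns 0 and 3 are coupled, far from the diagonal
a = [[4, 0, 0, 1],
     [0, 4, 0, 0],
     [0, 0, 4, 0],
     [1, 0, 0, 4]]
reordered = permute(a, [0, 3, 1, 2])   # number the coupled unknowns consecutively

print(bandwidth(a), bandwidth(reordered))  # → 3 1
```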
Homework
Use Gaussian Elimination to solve AX=B
• A ← ‘[Link]’ (first NxN values)
– 250x250, 1000x1000, and 3000x3000
• B ← ‘[Link]’ (multiple right-hand sides)
• X ← matrix of solution vectors
• Use Dynamic Partitioning or a Hybrid Method
• Run with 24, 96, and 192 cores (nine tests in total)
• Turn in the code, the sum of the solution matrix, and a plot of flops per test (grouped bar chart)
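For the two reported quantities, a minimal Python sketch is given below. It rests on assumptions the assignment does not state: "flops" is read as floating-point operations per second, the operation count uses the usual leading-order estimate for Gaussian elimination with `nrhs` right-hand sides (about 2n³/3 for the elimination plus about 2n² per right-hand side for the row updates and back-substitution), and the function names are hypothetical. If the course defines the operation count differently, that definition should be used instead.

```python
def gaussian_elimination_flops(n, nrhs):
    """Leading-order operation count for solving A X = B.

    n    : order of the square matrix A
    nrhs : number of right-hand-side columns in B
    """
    return 2 * n**3 / 3 + 2 * n**2 * nrhs


def flops_per_second(n, nrhs, seconds):
    """Rate for one test; seconds is the measured wall-clock time."""
    return gaussian_elimination_flops(n, nrhs) / seconds


def solution_sum(x):
    """Sum of every entry of the solution matrix X (a list of rows)."""
    return sum(sum(row) for row in x)


# Operation counts for the three problem sizes, assuming one right-hand side
for n in (250, 1000, 3000):
    print(n, gaussian_elimination_flops(n, nrhs=1))
```

The rate for each of the nine tests is then this count divided by the time measured on that run.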