Lecture 03
Parallel Programming - I
Prof. Mamun, CSE, HSTU
Objectives
- Why parallelism?
- Parallel computer architectures
- Traditional models of parallel programming
- Examples of parallel processing
- Message Passing Interface (MPI)
- MapReduce
- Discussion on Programming Models
Amdahl’s Law
We parallelize our programs in order to run them faster
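Amdahl's Law makes this limit precise. If a fraction f of a program's execution time can be parallelized and the remaining (1 - f) must run serially, the best possible speedup on N processors is

S(N) = \frac{1}{(1 - f) + \frac{f}{N}}

so even with unlimited processors the speedup can never exceed 1/(1 - f).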
Amdahl’s Law: An Example
Suppose that 80% of your program can be parallelized and that you
use 4 processors to run the parallel version of the program
Although you use 4 processors, you cannot get a speedup of more than
2.5 times; equivalently, the parallel running time is at best 40% of the serial running time
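Plugging f = 0.8 and N = 4 into Amdahl's Law confirms this bound:

S(4) = \frac{1}{(1 - 0.8) + \frac{0.8}{4}} = \frac{1}{0.2 + 0.2} = 2.5

i.e., the parallel version needs at least 0.2 + 0.2 = 0.4 of the serial running time.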
Real Vs. Actual Cases
Amdahl’s argument is too simplified to be applied to real cases
[Figure: serial vs. parallel execution timelines for two cases. Each program has a 20-unit part that cannot be parallelized and an 80-unit part that can; in the parallel version the 80-unit part is split across four processes (20 units each), while the 20-unit serial part remains.]
Guidelines
In order to efficiently benefit from parallelization, we
ought to follow these guidelines:
Parallel Computer Architectures
Parallel computers fall into two classes: multi-chip multiprocessors and single-chip multiprocessors
Multi-Chip Multiprocessors
computers in terms of two aspects: whether the memory is centralized or
distributed, and whether the address space is shared or individual:
- Centralized memory, shared address space: SMP (Symmetric Multiprocessor) / UMA (Uniform Memory Access) architecture
- Centralized memory, individual address spaces: N/A
- Distributed memory, shared address space: Distributed Shared Memory (DSM) / NUMA (Non-Uniform Memory Access) architecture
- Distributed memory, individual address spaces: MPP (Massively Parallel Processors) architecture
Symmetric Multiprocessors
A system with a Symmetric Multiprocessor (SMP) architecture uses a
shared memory that can be accessed equally from all processors
[Figure: several processors and I/O modules connected to a shared memory through an interconnection network.]
Single-Chip Multiprocessors
Moore’s Law
As chip manufacturing technology improves, transistors are getting smaller
and smaller, making it possible to put more of them on a chip; this is what
enables placing multiple processor cores on a single chip
Notable examples of chip multiprocessors include Intel Core i-series, AMD
Ryzen, and ARM Cortex-A series processors
Models of Parallel Programming
What is a parallel programming model? It is an abstraction of the
underlying machine that determines how programmers express and
coordinate their parallel tasks
Traditional Parallel Programming Models
Parallel programming models traditionally fall into two categories:
the shared memory model and the message passing model
Shared Memory Model
In the shared memory programming model, the abstraction is that
parallel tasks can access any location in memory
Si = serial part, Pj = parallel part
[Figure: single-thread vs. multi-thread execution over time. Single thread: one process runs S1, then P1 through P4, then S2 sequentially. Multi-thread: after S1, the process spawns threads that execute P1 through P4 concurrently in a shared address space, joins them, and then runs S2.]
Shared Memory Example
begin parallel // spawn child threads
private int start_iter, end_iter, i;
shared int local_iter=4;
shared double sum=0.0, a[], b[], c[];
shared lock_type mylock;
start_iter = getid() * local_iter; end_iter = start_iter + local_iter;
for (i=start_iter; i<end_iter; i++) {
  a[i] = b[i] + c[i];
  if (a[i] > 0) { lock(mylock); sum = sum + a[i]; unlock(mylock); }
}
end parallel
Print sum;
Parallel
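As a minimal sketch of how this pseudocode maps onto real threads, here is a POSIX-threads version in C. The thread count, the sample data, and the worker bookkeeping are illustrative assumptions, not part of the lecture's pseudocode:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 2
#define N 8

double a[N], b[N], c[N];
double sum = 0.0;                                   /* shared accumulator */
pthread_mutex_t mylock = PTHREAD_MUTEX_INITIALIZER; /* plays the role of mylock */

/* Each thread handles a contiguous slice of the arrays. */
void *worker(void *arg) {
    int id = *(int *)arg;
    int local_iter = N / NTHREADS;
    int start_iter = id * local_iter;
    int end_iter   = start_iter + local_iter;

    for (int i = start_iter; i < end_iter; i++) {
        a[i] = b[i] + c[i];
        if (a[i] > 0) {
            pthread_mutex_lock(&mylock);   /* critical section on the shared sum */
            sum = sum + a[i];
            pthread_mutex_unlock(&mylock);
        }
    }
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    int ids[NTHREADS];

    for (int i = 0; i < N; i++) { b[i] = i; c[i] = 1.0; }  /* sample data */

    for (int t = 0; t < NTHREADS; t++) {
        ids[t] = t;
        pthread_create(&threads[t], NULL, worker, &ids[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);

    printf("sum = %f\n", sum);   /* prints 36.0 for the sample data */
    return 0;
}

Without the mutex, two threads could read and write sum concurrently and lose updates, which is exactly the synchronization problem discussed next.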
Why Locks?
Unfortunately, threads in a shared memory model need to synchronize
Mutual exclusion requires that, when there are multiple threads, only one
thread at a time is allowed to write to a shared memory location (the
critical section)
[Figure: two threads (Thread 0 and Thread 1) concurrently updating a shared memory location, motivating the need for locks.]
Peterson's Algorithm
To solve this problem, let us consider a software solution referred to
as Peterson's Algorithm [Tanenbaum, 1992]
int turn;
int interested[n]; // initialized to 0
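Only the declarations are shown above. A sketch of the complete two-thread algorithm, written to match the formulation used in the race examples below (the enter_region/leave_region names follow Tanenbaum's presentation), is:

#define FALSE 0
#define TRUE  1
#define N     2                        /* number of threads */

int turn;                              /* which thread must wait */
int interested[N];                     /* initialized to 0 (FALSE) */

void enter_region(int process) {       /* process is 0 or 1 */
    int other = 1 - process;           /* the other thread */
    interested[process] = TRUE;        /* announce interest */
    turn = process;                    /* the last thread to set turn waits */
    while (turn == process && interested[other] == TRUE)
        ;                              /* busy-wait until it is safe to enter */
}

void leave_region(int process) {
    interested[process] = FALSE;       /* we have left the critical section */
}

On modern hardware this busy-waiting scheme also needs sequentially consistent memory operations (e.g., C11 atomics or fences); like the lecture, the sketch assumes a sequentially consistent memory model.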
No Race
Thread 0 runs alone (Thread 1 has not expressed interest):

interested[0] = TRUE;
turn = 0;
while (turn == 0 && interested[1] == TRUE)
{} ;

Since interested[1] is FALSE, the loop exits immediately and Thread 0
enters the critical section: no synchronization problem
With Race
Thread 0:                                    Thread 1:
interested[0] = TRUE;                        interested[1] = TRUE;
turn = 0;
                                             turn = 1;
while (turn == 0 && interested[1] == TRUE)   while (turn == 1 && interested[0] == TRUE)
{} ;                                         {} ;

Both threads race on turn, but Thread 1 writes it last, so Thread 0's loop
condition fails and Thread 0 enters the critical section while Thread 1
spins: no synchronization problem
Message Passing Model
In message passing, parallel tasks have their own local memories and
communicate by explicitly sending and receiving messages
Message Passing Interface (MPI) programs are the best fit for the
message passing programming model
S = serial part, P = parallel part
[Figure: single-thread vs. message-passing execution over time. Single thread: one process runs S1, then P1 through P4, then S2. Message passing: four processes on four nodes each run S1, one of the parallel parts P1 through P4, and S2, exchanging data through messages.]
Message Passing Example
Sequential:

for (i=0; i<8; i++)
  a[i] = b[i] + c[i];
sum = 0;
for (i=0; i<8; i++)
  if (a[i] > 0)
    sum = sum + a[i];
Print sum;

Parallel (two processes):

id = getpid();
local_iter = 4;
start_iter = id * local_iter;
end_iter = start_iter + local_iter;
if (id == 0)
  send_msg (P1, b[4..7], c[4..7]);
else
  recv_msg (P0, b[4..7], c[4..7]);
for (i=start_iter; i<end_iter; i++)
  a[i] = b[i] + c[i];
local_sum = 0;
for (i=start_iter; i<end_iter; i++)
  if (a[i] > 0)
    local_sum = local_sum + a[i];   // no mutual exclusion is required!
if (id == 0) {
  recv_msg (P1, &local_sum1);
  sum = local_sum + local_sum1;
  Print sum;
}
else
  send_msg (P0, local_sum);
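The send_msg/recv_msg calls above are pseudocode. As a hedged illustration of the same computation in real MPI, the sketch below has every rank initialize its own copy of b and c instead of receiving them from process 0, and uses MPI_Reduce in place of the explicit send/receive of the partial sums:

#include <mpi.h>
#include <stdio.h>

#define N 8

int main(int argc, char *argv[]) {
    double a[N], b[N], c[N], local_sum = 0.0, sum = 0.0;
    int id, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int local_iter = N / nprocs;          /* 4 per process when run with 2 */
    int start_iter = id * local_iter;
    int end_iter   = start_iter + local_iter;

    for (int i = 0; i < N; i++) { b[i] = i; c[i] = 1.0; }  /* sample data on every rank */

    /* Each process works on its own slice; no locks are needed. */
    for (int i = start_iter; i < end_iter; i++) {
        a[i] = b[i] + c[i];
        if (a[i] > 0)
            local_sum += a[i];
    }

    /* Combine the partial sums at rank 0. */
    MPI_Reduce(&local_sum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (id == 0)
        printf("sum = %f\n", sum);

    MPI_Finalize();
    return 0;
}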
Shared Memory Vs. Message Passing
Comparison between the shared memory and message passing
programming models, as the two examples above illustrate:
- Communication: implicit through reads and writes of shared variables vs. explicit send and receive of messages
- Synchronization: explicit locks around shared data vs. no mutual exclusion needed, since each process owns its data
SPMD and MPMD
When we run multiple processes with message passing, we can further
categorize the parallel execution by how many different programs
cooperate in it
SPMD
In the SPMD model, there is only one program and each process
uses the same executable working on different sets of data
[Figure: several copies of the same executable, a.out, each running as a separate process on a different set of data.]
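In MPI, for instance, SPMD execution amounts to launching the same executable once per process; a typical launcher invocation (exact syntax varies by MPI implementation) is

mpirun -np 4 ./a.out

after which each of the 4 processes queries its rank (e.g., with MPI_Comm_rank) and uses it to select its share of the data.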
MPMD
The MPMD model uses different programs for different processes,
but the processes collaborate to solve the same problem
MPMD has two styles: master/worker and coupled analysis
An Example
Every process runs the same program (SPMD); only the rank-dependent bounds differ:

1. Read array a() from the input file
2. Get my rank
3. If rank==0 then is=1, ie=2
   If rank==1 then is=3, ie=4
   If rank==2 then is=5, ie=6
4. Process from a(is) to a(ie)
5. Gather the results to process 0
6. If rank==0 then write array a() to the output file

[Figure: the six-element array a(1..6) partitioned among the processes: process 0 handles elements 1-2, process 1 elements 3-4, process 2 elements 5-6.]
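A minimal MPI sketch of these six steps, assuming 3 processes (e.g., mpirun -np 3) and replacing the file I/O with in-memory initialization and printing so the example stays self-contained; the multiply-by-10 step is a placeholder computation:

#include <mpi.h>
#include <stdio.h>

#define N     6
#define CHUNK 2   /* elements per process; run with exactly 3 processes */

int main(int argc, char *argv[]) {
    double a[N], result[N];
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Step 1: every process "reads" the whole array (stand-in for the input file). */
    for (int i = 0; i < N; i++) a[i] = i + 1;

    /* Steps 2-3: derive this rank's slice (0-based: rank 0 -> 0..1, rank 1 -> 2..3, ...). */
    int is = rank * CHUNK, ie = is + CHUNK;

    /* Step 4: process a(is) to a(ie). */
    for (int i = is; i < ie; i++) a[i] = a[i] * 10.0;  /* placeholder computation */

    /* Step 5: gather the processed slices back to process 0. */
    MPI_Gather(&a[is], CHUNK, MPI_DOUBLE, result, CHUNK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Step 6: process 0 writes the output (printed instead of a file). */
    if (rank == 0)
        for (int i = 0; i < N; i++) printf("%f\n", result[i]);

    MPI_Finalize();
    return 0;
}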
Concluding Remarks
To summarize, keep the following 3 points in mind:
Next Class
Examples of parallel processing: Message Passing Interface (MPI) and MapReduce, followed by discussion on programming models