Lecture 4: Parallel Programming Models
Parallel Programming Models
Parallel Programming Models:
Data parallelism / Task parallelism
Explicit parallelism / Implicit parallelism
Shared memory / Distributed memory
Other programming paradigms
Object-oriented
Functional and logic
Parallel Programming Models
Data Parallelism
Parallel programs that emphasize concurrent execution of the
same task on different data elements (data-parallel programs)
Most programs for scalable parallel computers are data parallel in
nature.
Task Parallelism
Parallel programs that emphasize the concurrent execution of
different tasks on the same or different data
Used for modularity reasons.
Parallel programs structured as a task-parallel composition of data-parallel components are common.
Parallel Programming Models
[Figure: data parallelism vs. task parallelism]
Parallel Programming Models
Explicit Parallelism
The programmer directly specifies the activities of the multiple
concurrent threads of control that form a parallel computation.
Provides the programmer with more control over program behavior
and hence can be used to achieve higher performance.
Implicit Parallelism
The programmer provides high-level specification of program
behavior.
It is then the responsibility of the compiler or library to
implement this parallelism efficiently and correctly.
Parallel Programming Models
Shared Memory
The programmer's task is to specify the activities of a set of
processes that communicate by reading and writing shared memory.
Advantage: the programmer need not be concerned with data-distribution
issues.
Disadvantage: efficient implementation may be difficult on computers
that lack hardware support for shared memory, and race conditions tend to
arise more easily.
Distributed Memory
Processes have only local memory and must use some other
mechanism (e.g., message passing or remote procedure call) to
exchange information.
Advantage: programmers have explicit control over data distribution and
communication.
Shared vs Distributed Memory
[Figure: a shared-memory system (processors P sharing a single memory over a bus) vs. a distributed-memory system (processor-memory nodes connected by a network)]
Parallel Programming Models
Parallel Programming Tools:
Parallel Virtual Machine (PVM): distributed memory, explicit parallelism
Message-Passing Interface (MPI): distributed memory, explicit parallelism
PThreads: shared memory, explicit parallelism
OpenMP: shared memory, explicit parallelism
High-Performance Fortran (HPF): implicit parallelism
Parallelizing compilers: implicit parallelism
Parallel Programming Models
Message Passing Model
Used on Distributed memory MIMD architectures
Multiple processes execute in parallel
asynchronously
Process creation may be static or dynamic
Processes communicate by using send and
receive primitives
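To make the send/receive primitives concrete, here is a minimal sketch using MPI (one of the message-passing tools listed earlier); the value 42, the message tag 0, and the use of exactly two processes are choices made only for this illustration:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]){
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0){
        value = 42;                               /* data produced by process 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);                       /* send to process 1 */
    } else if (rank == 1){
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);    /* receive from process 0 */
        printf("process 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}

Run with at least two processes (e.g., mpirun -np 2).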
Parallel Programming Models
Blocking send: waits until all data is received
Non-blocking send: continues execution after
placing the data in the buffer
Blocking receive: if data is not ready, waits until it
arrives
Non-blocking receive: reserves a buffer and continues execution.
A later wait operation copies the data into memory once it has
arrived.
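A minimal sketch of the non-blocking variants using MPI_Isend and MPI_Irecv (the loop standing in for useful computation and the message contents are placeholders chosen for this example); the transfer is completed by a later MPI_Wait, and it should be run with exactly two processes:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]){
    int rank, value = 0, i;
    double work = 0.0;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0){
        value = 42;
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);   /* returns immediately */
    } else if (rank == 1){
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);   /* reserves the buffer, returns immediately */
    }
    for (i = 0; i < 1000; i++)          /* computation overlapped with the message transfer */
        work += i * 0.5;
    if (rank <= 1)
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* later wait: blocks only if the transfer is not yet complete */
    if (rank == 1)
        printf("received %d after doing local work (%f)\n", value, work);
    MPI_Finalize();
    return 0;
}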
Parallel Programming Models
Synchronous message-passing: Sender and
receiver processes are synchronized
Blocking send / blocking receive
Asynchronous message-passing: no
synchronization between sender and receiver
processes
Large buffers are required. As buffer size is finite, the
sender may eventually block.
Parallel Programming Models
Advantages of message-passing model
Programs are highly portable
Provides the programmer with explicit control over the
location of data in the memory
Disadvantage of message-passing model
The programmer is required to pay attention to such details as
the placement of data in memory and the ordering of
communication.
Parallel Programming Models
Factors that influence the performance of message-passing
model
Bandwidth
Latency
Ability to overlap communication with computation.
Parallel Programming Models
Example: Pi calculation
∫[0,1] f(x) dx = ∫[0,1] 4/(1+x^2) dx = π ≈ Σ(i=1..n) w·f(xi)
f(x) = 4/(1+x^2)
n = 10
w = 1/n
xi = w(i-0.5)
[Figure: midpoint-rule approximation of the integral; the i-th rectangle has width w and height f(xi)]
Parallel Programming Models
Sequential Code
#include <stdio.h>

#define f(x) (4.0/(1.0+(x)*(x)))

int main(void){
    int n, i;
    float w, x, sum, pi;
    printf("n?\n");
    scanf("%d", &n);
    w = 1.0/n;
    sum = 0.0;
    for (i = 1; i <= n; i++){
        x = w*(i-0.5);    /* midpoint of the i-th subinterval */
        sum += f(x);
    }
    pi = w*sum;
    printf("%f\n", pi);
    return 0;
}
Parallel Programming Models
Parallel PVM program
[Figure: the master process exchanging messages with worker processes W0, W1, W2, W3]
Master:
Creates workers
Sends initial values to workers
Receives local sums from
workers
Calculates and prints pi
Workers:
Receive initial values from master
Calculate local sums
Send local sums to Master
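A minimal sketch of the master side in PVM; the worker executable name pi_worker, the choice of four workers, and the message tags 1 and 2 are assumptions made for this illustration, and error checking is omitted:

#include <stdio.h>
#include "pvm3.h"

#define NW 4                              /* number of workers assumed for this sketch */

int main(void){
    int tids[NW], i, n = 10;
    float w = 1.0f/n, local, sum = 0.0f;

    pvm_spawn("pi_worker", (char **)0, PvmTaskDefault, "", NW, tids);   /* create workers */

    for (i = 0; i < NW; i++){             /* send initial values: n and the worker's index */
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&n, 1, 1);
        pvm_pkint(&i, 1, 1);
        pvm_send(tids[i], 1);
    }
    for (i = 0; i < NW; i++){             /* receive a local sum from each worker, in any order */
        pvm_recv(-1, 2);
        pvm_upkfloat(&local, 1, 1);
        sum += local;
    }
    printf("pi = %f\n", w * sum);         /* calculate and print pi */
    pvm_exit();
    return 0;
}

Each worker would receive n and its index, sum f(xi) over its share of the points, and send that sum back to the master with tag 2.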
Parallel Virtual Machine (PVM)
Data Distribution
[Figure: the evaluation points xi of f(x) are divided among the worker processes, each worker handling its own portion of the interval [0,1]]
Parallel Programming Models
SPMD Parallel PVM program
[Figure: the master process and workers W0, W1, W2, W3; local sums flow from the workers to W0, which returns pi to the master]
Master:
Creates workers
Sends initial values to workers
Receives pi from W0 and prints it
Workers:
Receive initial values from master
Calculate local sums
Workers other than W0:
Send local sums to W0
W0:
Receives local sums from
other workers
Calculates pi
Sends pi to Master
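The same SPMD structure can be written compactly in MPI (a sketch, not the lecture's PVM code; every process runs the same program, rank 0 plays the combined role of master and W0, and MPI_Reduce gathers the local sums):

#include <stdio.h>
#include <mpi.h>

#define f(x) (4.0/(1.0+(x)*(x)))

int main(int argc, char *argv[]){
    int rank, size, i, n = 10;
    double w, x, local = 0.0, pi;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);      /* initial value n reaches every process */
    w = 1.0/n;
    for (i = rank + 1; i <= n; i += size){             /* each process takes every size-th point */
        x = w*(i-0.5);
        local += f(x);
    }
    MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);   /* local sums combined at rank 0 */
    if (rank == 0)
        printf("pi = %f\n", w * pi);
    MPI_Finalize();
    return 0;
}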
Parallel Programming Models
Shared Memory Model
Used on Shared memory MIMD architectures
Program consists of many independent threads
Concurrently executing threads all share a single, common
address space.
Threads can exchange information by reading and writing to
memory using normal variable assignment operations
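For comparison with the message-passing versions of the pi example, a minimal shared-memory sketch using OpenMP (built with an OpenMP-capable compiler, e.g. with -fopenmp); the threads share the loop through the common address space, and the reduction clause supplies the necessary synchronization:

#include <stdio.h>
#include <omp.h>

#define f(x) (4.0/(1.0+(x)*(x)))

int main(void){
    int i, n = 10;
    double w = 1.0/n, x, sum = 0.0;

    #pragma omp parallel for private(x) reduction(+:sum)   /* loop iterations are divided among threads */
    for (i = 1; i <= n; i++){
        x = w*(i-0.5);
        sum += f(x);          /* each thread accumulates its own partial sum, combined at the end */
    }
    printf("pi = %f\n", w * sum);
    return 0;
}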
Parallel Programming Models
Memory Coherence Problem
To ensure that the latest value of a variable updated in one thread
is used when that same variable is accessed in another thread.
[Figure: Thread 1 and Thread 2 accessing the same shared variable, each through its own cached copy]
Hardware support and compiler support are required
Cache-coherency protocol
Parallel Programming Models
Distributed Shared Memory (DSM) Systems
Implement Shared memory model on Distributed memory MIMD
architectures
Concurrently executing threads all share a single, common address
space.
Threads can exchange information by reading and writing to
memory using normal variable assignment operations
Use a message-passing layer as the means for communicating
updated values throughout the system.
Parallel Programming Models
Synchronization operations in Shared Memory Model
Monitors
Locks (illustrated in the sketch after this list)
Critical sections
Condition variables
Semaphores
Barriers
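As an illustration of locks and critical sections, a minimal Pthreads sketch (the shared counter, the thread count, and the iteration count are chosen only for this example):

#include <stdio.h>
#include <pthread.h>

#define NTHREADS 4

int counter = 0;                                    /* shared variable */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;   /* lock protecting it */

void *work(void *arg){
    int i;
    for (i = 0; i < 1000; i++){
        pthread_mutex_lock(&lock);      /* enter the critical section */
        counter++;                      /* only one thread updates counter at a time */
        pthread_mutex_unlock(&lock);    /* leave the critical section */
    }
    return NULL;
}

int main(void){
    pthread_t t[NTHREADS];
    int i;
    for (i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, work, NULL);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("counter = %d\n", counter);  /* 4000 with the lock; unpredictable without it */
    return 0;
}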
PThreads
In the UNIX environment a thread:
Exists within a process and uses the process resources
Has its own independent flow of control
Duplicates only the essential resources it needs to be independently
schedulable
May share the process resources with other threads
Dies if the parent process dies
Is "lightweight" because most of the overhead has already been
accomplished through the creation of its process.
PThreads
Because threads within the same process share resources:
Changes made by one thread to shared system resources will be seen
by all other threads.
Two pointers having the same value point to the same data.
Reading and writing to the same memory locations is possible, and
therefore requires explicit synchronization by the programmer.
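A minimal sketch of these points (the array and thread names are chosen only for this example): both threads work on the same array in the single shared address space, so nothing is copied and the writes of each thread are visible to the main thread after the joins; because the threads write disjoint halves, no extra synchronization is needed here.

#include <stdio.h>
#include <pthread.h>

double data[8];                       /* one copy, visible to every thread in the process */

void *fill(void *arg){
    int start = *(int *)arg;          /* each thread fills its own half: disjoint writes, no race */
    int i;
    for (i = start; i < start + 4; i++)
        data[i] = 2.0 * i;
    return NULL;
}

int main(void){
    pthread_t t1, t2;
    int lo = 0, hi = 4, i;
    pthread_create(&t1, NULL, fill, &lo);   /* both threads see the same data[] */
    pthread_create(&t2, NULL, fill, &hi);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    for (i = 0; i < 8; i++)                 /* main thread reads what the workers wrote */
        printf("%g ", data[i]);
    printf("\n");
    return 0;
}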