Parallel Programming: Aaron Bloomfield CS 415 Fall 2005

This document discusses parallel programming and different approaches to parallelizing code. It describes how parallel programming can be used to model complex systems and speed up simulations. Shared memory and message passing architectures are introduced as common programming models for parallel computers. Programming interfaces such as OpenMP and MPI allow developers to write multi-threaded and distributed applications, respectively. The challenges of parallelizing serial code, such as identifying dependencies between loop iterations, are also covered.


Parallel Programming

Aaron Bloomfield
CS 415
Fall 2005
1

Why Parallel Programming?

Predict weather
Predict spread of SARS
Predict path of hurricanes
Predict oil slick propagation
Model growth of bio-plankton/fisheries
Structural simulations
Predict path of forest fires
Model formation of galaxies
Simulate nuclear explosions
2

Code that can be parallelized


do i = 1 to max
a[i] = b[i] + c[i] * d[i]
end do

Parallel Computers
Programming model types
Shared memory
Message passing

Distributed Memory Architecture

Each Processor has direct access only to its local memory


Processors are connected via high-speed interconnect
Data structures must be distributed
Data exchange is done via explicit processor-to-processor
communication: send/receive messages
Programming Models
Widely used standard: MPI
Others: PVM, Express, P4, Chameleon, PARMACS, ...
[Diagram: processors P0 ... Pn, each with its own local memory, connected via a communication interconnect]
5
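The slides contain no code, so as an illustrative sketch only, the minimal MPI program below (in C) shows the explicit send/receive communication described above: rank 0 sends one integer from its local memory to rank 1. The file name, message tag, and value are assumptions, not part of the original lecture.

/* Illustrative MPI point-to-point sketch (not from the slides).
   Compile with: mpicc send_recv.c    Run with: mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;   /* data lives only in rank 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}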

Message Passing Interface


MPI provides:
Point-to-point communication
Collective operations
Barrier synchronization
gather/scatter operations
Broadcast, reductions
Different communication modes
Synchronous/asynchronous
Blocking/non-blocking
Buffered/unbuffered
Predefined and derived datatypes
Virtual topologies
Parallel I/O (MPI 2)
C/C++ and Fortran bindings

http://www.mpi-forum.org
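To illustrate the collective operations listed above, here is another hedged sketch (not from the slides): rank 0 broadcasts a value to every process, each process computes a partial result, and a reduction sums the partials back on rank 0. The specific values are arbitrary.

/* Illustrative MPI collective-operation sketch: broadcast + reduction. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, n = 0, global_sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) n = 100;                           /* value known only to rank 0 */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);     /* now every rank has n */

    int partial = rank * n;                           /* each rank's partial result */
    MPI_Reduce(&partial, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("sum over %d ranks = %d\n", size, global_sum);
    MPI_Finalize();
    return 0;
}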

Shared Memory Architecture


Processors have direct access to global memory and I/O
through bus or fast switching network
Cache Coherency Protocol guarantees consistency
of memory and I/O accesses
Each processor also has its own memory (cache)
Data structures are shared in global address space
Concurrent access to shared memory must be coordinated
Programming Models
Multithreading (Thread Libraries)
OpenMP
[Diagram: processors P0 ... Pn, each with its own cache, connected by a shared bus to global shared memory]

OpenMP
OpenMP: portable shared memory parallelism
Higher-level API for writing portable multithreaded
applications
Provides a set of compiler directives and library routines
for parallel application programmers
API bindings for Fortran, C, and C++

http://www.OpenMP.org
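As an illustrative sketch (not from the original slides) of what the compiler directives look like, the loop from the earlier "Code that can be parallelized" slide can be parallelized with a single OpenMP pragma. The array size, input values, and file name are assumptions.

/* Illustrative OpenMP sketch: parallelize an independent loop.
   Compile with, e.g., gcc -fopenmp omp_loop.c */
#include <omp.h>
#include <stdio.h>
#define MAX 1000

int main(void) {
    static double a[MAX], b[MAX], c[MAX], d[MAX];

    for (int i = 0; i < MAX; i++) {        /* set up some input data */
        b[i] = i; c[i] = 2.0; d[i] = 0.5;
    }

    /* Each iteration is independent, so the iterations can be divided
       among the threads of the team. */
    #pragma omp parallel for
    for (int i = 0; i < MAX; i++)
        a[i] = b[i] + c[i] * d[i];

    printf("a[10] = %f (up to %d threads)\n", a[10], omp_get_max_threads());
    return 0;
}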

Approaches

Parallel Algorithms
Parallel Language
Message passing (low-level)
Parallelizing compilers

10

Parallel Languages
CSP - Hoare's notation for parallelism as a network of sequential processes exchanging messages.
Occam - A real language based on CSP. Used for the transputer, in Europe.

11

Fortran for parallelism


Fortran 90 - Array language. Triplet notation for array sections. Operations and intrinsic functions possible on array sections.
High Performance Fortran (HPF) - Similar to Fortran 90, but includes data layout specifications to help the compiler generate efficient code.
12

More parallel languages


ZPL - Array-based language from the University of Washington.
Compiles into C code (highly portable).
C* - C extended for parallelism

13

Object-Oriented
Concurrent Smalltalk
Threads in Java, Ada; thread libraries for use in C/C++
This uses a library of parallel routines

14

Functional
NESL, Multilisp
Id & Sisal (more dataflow)

15

Parallelizing Compilers
Automatically transform a sequential program into a parallel program.
1. Identify loops whose iterations can be executed in parallel.
2. Often done in stages.

Q: Which loops can be run in parallel?


Q: How should we distribute the work/data?
16

Data Dependences
Flow dependence - RAW (Read-After-Write). A "true" dependence: read a value after it has been written into a variable.
Anti-dependence - WAR (Write-After-Read). Write a new value into a variable after the old value has been read.
Output dependence - WAW (Write-After-Write). Write a new value into a variable and then later write another value into the same variable.
17

Example
1: A = 90;
2: B = A;
3: C = A + D;
4: A = 5;
Flow (RAW) dependences: statement 1 to statements 2 and 3 (on A). Anti (WAR) dependences: statements 2 and 3 to statement 4 (on A). Output (WAW) dependence: statement 1 to statement 4 (on A).

18

Dependencies
A parallelizing compiler must identify loops that do
not have dependences BETWEEN ITERATIONS
of the loop.
Example:
do I = 1, 1000
A(I) = B(I) + C(I)
D(I) = A(I)
end do
19

Example
Fork one thread for each processor
Each thread executes the loop:
do I = my_lo, my_hi
A(I) = B(I) + C(I)
D(I) = A(I)
end do
Wait for all threads to finish before
proceeding.
20
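A minimal sketch of this fork/join pattern using a thread library (POSIX threads in C) might look as follows. The thread count, file name, and chunking scheme are assumptions, not part of the original slides.

/* Illustrative fork/join sketch: one thread per processor, each handling
   its own contiguous range of iterations (my_lo .. my_hi).
   Compile with, e.g., gcc loop_threads.c -lpthread */
#include <pthread.h>
#include <stdio.h>

#define N 1000
#define NTHREADS 4                          /* assumed number of processors */

static double A[N], B[N], C[N], D[N];

typedef struct { int lo, hi; } range_t;     /* this thread's iteration range */

static void *worker(void *arg) {
    range_t *r = (range_t *)arg;
    for (int i = r->lo; i < r->hi; i++) {
        A[i] = B[i] + C[i];
        D[i] = A[i];
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    range_t ranges[NTHREADS];
    int chunk = N / NTHREADS;

    for (int t = 0; t < NTHREADS; t++) {    /* fork one thread per processor */
        ranges[t].lo = t * chunk;
        ranges[t].hi = (t == NTHREADS - 1) ? N : (t + 1) * chunk;
        pthread_create(&tid[t], NULL, worker, &ranges[t]);
    }
    for (int t = 0; t < NTHREADS; t++)      /* wait for all threads to finish */
        pthread_join(tid[t], NULL);

    printf("A[0] = %f, D[%d] = %f\n", A[0], N - 1, D[N - 1]);
    return 0;
}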

Another Example
do I = 1, 1000
A(I) = B(I) + C(I)
D(I) = A(I+1)
end do
Here iteration I reads A(I+1) before iteration I+1 writes it: an anti-dependence between iterations, so the iterations cannot simply be run in parallel.

21

Yet Another Example


do I = 1, 1000
A( X(I) ) = B(I) + C(I)
D(I) = A( X(I) )
end do

22

Parallel Compilers
Two concerns:
Parallelizing code
Compiler will move code around to uncover
parallel operations

Data locality
If a parallel operation has to get data from another processor's memory, that's bad

23

Distributed computing
Take a big task that has natural parallelism
Split it up among many different computers across a network
Examples: SETI@Home, prime number searches, Google Compute, etc.
Distributed computing is a form of parallel computing
24
