An Introduction to Parallel Programming
Peter Pacheco
Chapter 2
Parallel Hardware and Parallel Software
Copyright © 2010, Elsevier Inc. All rights Reserved 1
Roadmap
PERFORMANCE
Speedup
Number of cores = p
Serial run-time = Tserial
Parallel run-time = Tparallel
Speedup of a parallel program: S = Tserial / Tparallel
Linear speedup: Tparallel = Tserial / p, i.e. S = p
Effect of overhead
Tparallel = Tserial / p + Toverhead
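The overhead model can be explored numerically. This is a minimal sketch, assuming speedup is measured as S = Tserial / Tparallel; the function names and the sample timings (a 20-second serial run, 0.5 s of overhead) are hypothetical and chosen only for illustration.

```python
def parallel_time(t_serial, p, t_overhead=0.0):
    """Model from the slide: Tparallel = Tserial / p + Toverhead."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return t_serial / p + t_overhead


def speedup(t_serial, t_parallel):
    """Speedup S = Tserial / Tparallel."""
    return t_serial / t_parallel


# Hypothetical numbers: a 20-second serial run and 0.5 s of fixed overhead.
T_SERIAL = 20.0
for p in (1, 2, 4, 8, 16):
    linear = speedup(T_SERIAL, parallel_time(T_SERIAL, p))
    with_overhead = speedup(T_SERIAL, parallel_time(T_SERIAL, p, 0.5))
    print(f"p={p:2d}  linear S={linear:5.2f}  with overhead S={with_overhead:5.2f}")
```

With zero overhead the speedup equals p; the fixed 0.5 s overhead already pulls it down to about 11.4 at p = 16, and the gap widens as p grows because Tserial / p shrinks while Toverhead does not.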
Amdahl’s Law
Unless virtually all of a serial program is
parallelized, the possible speedup is going
to be very limited — regardless of the
number of cores available.
Example
We can parallelize 90% of a serial
program.
Parallelization is “perfect” regardless of the
number of cores p we use.
Tserial = 20 seconds
Runtime of parallelizable part is
0.9 x Tserial / p = 18 / p
Example (cont.)
Runtime of “unparallelizable” part is
0.1 x Tserial = 2
Overall parallel run-time is
Tparallel = 0.9 x Tserial / p + 0.1 x Tserial = 18 / p + 2
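The example's formula can be evaluated for increasing p to see Amdahl's limit. This is a minimal sketch under the example's own assumptions (Tserial = 20 s, 90% parallelizable, perfect parallelization); the function names are hypothetical.

```python
def amdahl_time(t_serial, frac_parallel, p):
    """Tparallel = frac * Tserial / p + (1 - frac) * Tserial (perfect parallelization)."""
    return frac_parallel * t_serial / p + (1.0 - frac_parallel) * t_serial


def amdahl_speedup(t_serial, frac_parallel, p):
    """S = Tserial / Tparallel under the model above."""
    return t_serial / amdahl_time(t_serial, frac_parallel, p)


# Values from the example: Tserial = 20 s, 90% of the program parallelizable.
for p in (1, 2, 4, 8, 16, 64, 1024):
    t = amdahl_time(20.0, 0.9, p)
    print(f"p={p:5d}  Tparallel={t:7.3f} s  S={amdahl_speedup(20.0, 0.9, p):5.2f}")
```

However large p becomes, Tparallel never drops below the 2 s spent in the unparallelizable part, so the speedup stays below 20 / 2 = 10.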
Taking Timings
What is time?
Start to finish?
A program segment of interest?
CPU time?
Wall clock time?
Taking Timings
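The difference between wall-clock time and CPU time shows up as soon as a program waits. In parallel programs we usually report wall-clock time, since CPU time leaves out time a process spends idle (for example, waiting for data). This minimal sketch uses Python's standard `time.perf_counter()` (wall clock) and `time.process_time()` (CPU time of the current process); `busy_work` is a hypothetical stand-in for the program segment of interest.

```python
import time


def busy_work(n):
    """Hypothetical segment of interest: burn CPU time with a simple loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


wall_start = time.perf_counter()   # wall-clock (elapsed) time
cpu_start = time.process_time()    # CPU time charged to this process

busy_work(200_000)   # computing: counts toward both wall-clock and CPU time
time.sleep(0.2)      # idle waiting: counts toward wall-clock time only

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print(f"wall-clock: {wall_elapsed:.3f} s, CPU: {cpu_elapsed:.3f} s")
```

Only the code between the two pairs of timer calls is measured, which answers the "program segment of interest" question rather than "start to finish".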
More on Performance
Any time data is transmitted, we’re
interested in how long it will take for the
data to reach its destination.
More on Performance
Latency
The time that elapses between the source’s
beginning to transmit the data and the
destination’s starting to receive the first byte.
Bandwidth
The rate at which a link can transmit data.
The rate at which the destination receives data
after it has started to receive the first byte.
Usually given in megabits or megabytes per
second
Message transmission time = l + n / b
l = latency (seconds)
n = length of message (bytes)
b = bandwidth (bytes per second)
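Plugging numbers into l + n / b shows which term dominates. This is a minimal sketch; the link parameters (2 microseconds latency, 1e9 bytes per second bandwidth) are hypothetical and do not describe any particular network.

```python
def transmission_time(latency_s, n_bytes, bandwidth_bytes_per_s):
    """Message transmission time = l + n / b."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


# Hypothetical link: l = 2e-6 s, b = 1e9 bytes per second.
L, B = 2e-6, 1e9
for n in (8, 1_000, 1_000_000, 100_000_000):
    t = transmission_time(L, n, B)
    print(f"n={n:>11,d} bytes  time={t:.6e} s  latency share={L / t:6.1%}")
```

For short messages the latency term accounts for almost all of the time; for long messages n / b dominates and the latency becomes negligible.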
PARALLEL PROGRAM DESIGN
Foster’s methodology
1. Partitioning: divide the computation to be
performed and the data operated on by
the computation into small tasks.
The focus here should be on identifying
tasks that can be executed in parallel.
Foster’s methodology
2. Communication: determine what
communication needs to be carried out
among the tasks identified in the previous
step.
Foster’s methodology
3. Agglomeration or aggregation: combine
tasks and communications identified in
the first step into larger tasks.
For example, if task A must be executed
before task B can be executed, it may
make sense to aggregate them into a
single composite task.
Foster’s methodology
4. Mapping: assign the composite tasks
identified in the previous step to
processes/threads.
This should be done so that
communication is minimized, and each
process/thread gets roughly the same
amount of work.
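One simple way to give each process/thread roughly the same amount of work is a block mapping of n composite tasks onto p processes. This is a minimal sketch, assuming the tasks have roughly equal cost and are numbered 0 to n - 1; block partitioning is one common choice, not the only possible mapping, and `block_range` is a hypothetical helper.

```python
def block_range(n_tasks, p, rank):
    """Return the half-open range [first, last) of tasks mapped to `rank`.

    The first n_tasks % p ranks each get one extra task, so block
    sizes differ by at most one.
    """
    base, extra = divmod(n_tasks, p)
    first = rank * base + min(rank, extra)
    last = first + base + (1 if rank < extra else 0)
    return first, last


# Map 20 tasks onto 3 processes.
for rank in range(3):
    print(rank, block_range(20, 3, rank))
```

Contiguous blocks also keep neighboring tasks on the same process, which tends to reduce communication when tasks mostly exchange data with their neighbors.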
Example - histogram
1.3, 2.9, 0.4, 0.3, 1.3, 4.4, 1.7, 0.4, 3.2, 0.3, 4.9, 2.4, 3.1, 4.4, 3.9, 0.4, 4.2, 4.5, 4.9, 0.9
Serial program - input
1. The number of measurements: data_count
2. An array of data_count floats: data
3. The minimum value for the bin containing the smallest values: min_meas
4. The maximum value for the bin containing the largest values: max_meas
5. The number of bins: bin_count
Serial program - output
1. bin_maxes: an array of bin_count floats holding the upper bound of each bin
2. bin_counts: an array of bin_count ints storing the number of data elements in each bin
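The serial program can be sketched directly from these inputs and outputs. This is a minimal sketch, assuming equal-width bins where bin b covers [bin_maxes[b-1], bin_maxes[b]) and bin 0 starts at min_meas; the values min_meas = 0.0, max_meas = 5.0 and bin_count = 5 are our choice for the example data, and `find_bin` is a hypothetical helper using a linear search.

```python
def find_bin(x, bin_maxes):
    """Return the index of the first bin whose upper bound exceeds x (linear search)."""
    for b, upper in enumerate(bin_maxes):
        if x < upper:
            return b
    return len(bin_maxes) - 1  # x == max_meas belongs to the last bin


def histogram(data, min_meas, max_meas, bin_count):
    bin_width = (max_meas - min_meas) / bin_count
    # Upper bound of each bin; bin 0 starts at min_meas.
    bin_maxes = [min_meas + bin_width * (b + 1) for b in range(bin_count)]
    bin_counts = [0] * bin_count
    for x in data:
        if not min_meas <= x <= max_meas:
            raise ValueError(f"measurement {x} outside [{min_meas}, {max_meas}]")
        bin_counts[find_bin(x, bin_maxes)] += 1
    return bin_maxes, bin_counts


data = [1.3, 2.9, 0.4, 0.3, 1.3, 4.4, 1.7, 0.4, 3.2, 0.3,
        4.9, 2.4, 3.1, 4.4, 3.9, 0.4, 4.2, 4.5, 4.9, 0.9]
bin_maxes, bin_counts = histogram(data, 0.0, 5.0, 5)
print(bin_maxes)   # → [1.0, 2.0, 3.0, 4.0, 5.0]
print(bin_counts)  # → [6, 3, 2, 3, 6]
```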
First two stages of Foster’s Methodology
Alternative definition of tasks and communication
Adding the local arrays
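The local-array approach can be sketched as follows: each worker counts its own block of the data into a private array, and the local arrays are then added element-wise into bin_counts. This is a minimal sketch, assuming the same bin layout and example parameters as the serial version; the name `loc_bin_cts` and the helper functions are hypothetical. It uses Python's `concurrent.futures.ThreadPoolExecutor` only to show the structure; under CPython's global interpreter lock, pure-Python threads like these will not deliver real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def find_bin(x, bin_maxes):
    """Linear search for the bin containing x."""
    for b, upper in enumerate(bin_maxes):
        if x < upper:
            return b
    return len(bin_maxes) - 1


def local_histogram(chunk, bin_maxes):
    """Each worker counts only its own chunk into a private array."""
    loc_bin_cts = [0] * len(bin_maxes)
    for x in chunk:
        loc_bin_cts[find_bin(x, bin_maxes)] += 1
    return loc_bin_cts


def parallel_histogram(data, min_meas, max_meas, bin_count, p):
    bin_width = (max_meas - min_meas) / bin_count
    bin_maxes = [min_meas + bin_width * (b + 1) for b in range(bin_count)]
    # Block partition: the first len(data) % p workers get one extra element.
    base, extra = divmod(len(data), p)
    chunks, first = [], 0
    for rank in range(p):
        last = first + base + (1 if rank < extra else 0)
        chunks.append(data[first:last])
        first = last
    # Workers never touch a shared counter, so no locking is needed.
    with ThreadPoolExecutor(max_workers=p) as pool:
        local_arrays = list(pool.map(local_histogram, chunks, [bin_maxes] * p))
    # Add the local arrays element-wise to form the global bin_counts.
    bin_counts = [sum(column) for column in zip(*local_arrays)]
    return local_arrays, bin_counts


data = [1.3, 2.9, 0.4, 0.3, 1.3, 4.4, 1.7, 0.4, 3.2, 0.3,
        4.9, 2.4, 3.1, 4.4, 3.9, 0.4, 4.2, 4.5, 4.9, 0.9]
local_arrays, bin_counts = parallel_histogram(data, 0.0, 5.0, 5, p=4)
print(local_arrays)
print(bin_counts)  # → [6, 3, 2, 3, 6]
```

Because each worker increments only its own array, the only communication is the final addition of the p local arrays.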