
An Introduction to Parallel Programming

Peter Pacheco

Chapter 2
Parallel Hardware and Parallel
Software

Copyright © 2010, Elsevier Inc. All rights Reserved


Roadmap



PERFORMANCE



Speedup
 Number of cores = p
 Serial run-time = Tserial
 Parallel run-time = Tparallel

 Linear speedup: Tparallel = Tserial / p
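The two quantities above can be combined into the usual definition of speedup, S = Tserial / Tparallel; under perfect (linear) speedup, S equals the number of cores p. A minimal sketch (helper names are ours, not the book's):

```c
#include <assert.h>

/* Predicted run-time under linear speedup: Tparallel = Tserial / p. */
double linear_tparallel(double tserial, int p) {
    return tserial / p;
}

/* Speedup is the ratio of serial to parallel run-time: S = Tserial / Tparallel. */
double speedup(double tserial, double tparallel) {
    return tserial / tparallel;
}
```

For example, with Tserial = 20 s and p = 4, linear speedup predicts Tparallel = 5 s and S = 4.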



Effect of overhead

Tparallel = Tserial / p + Toverhead



Amdahl’s Law
 Unless virtually all of a serial program is
parallelized, the possible speedup is going
to be very limited — regardless of the
number of cores available.



Example
 We can parallelize 90% of a serial
program.
 Parallelization is “perfect” regardless of the
number of cores p we use.
 Tserial = 20 seconds
 Runtime of parallelizable part is
0.9 x Tserial / p = 18 / p



Example (cont.)
 Runtime of “unparallelizable” part is

0.1 x Tserial = 2
 Overall parallel run-time is

Tparallel = 0.9 x Tserial / p + 0.1 x Tserial = 18 / p + 2
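Putting the two parts together, the speedup for this example is S(p) = 20 / (18 / p + 2), which approaches but can never exceed 20 / 2 = 10, no matter how many cores we add. A sketch of the calculation (function name and parametrization are ours):

```c
#include <assert.h>
#include <math.h>

/* Amdahl's Law: with fraction par_frac of the serial program
   parallelized perfectly, the parallel run-time is
       par_frac * tserial / p + (1 - par_frac) * tserial,
   and the speedup is tserial divided by that. */
double amdahl_speedup(double tserial, double par_frac, int p) {
    double tparallel = par_frac * tserial / p + (1.0 - par_frac) * tserial;
    return tserial / tparallel;
}
```

With tserial = 20 and par_frac = 0.9: p = 6 gives 20 / (3 + 2) = 4, while even p = 1,000,000 stays below 10.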



Taking Timings
 What is time?
 Start to finish?
 A program segment of interest?
 CPU time?
 Wall clock time?
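For parallel programs we usually want wall clock time for a segment of interest, not CPU time. The book's supplementary code wraps gettimeofday in a GET_TIME macro; the function below is a simplified, POSIX-only sketch of the same idea:

```c
#include <stddef.h>
#include <assert.h>
#include <sys/time.h>

/* Wall-clock time in seconds since the epoch, modeled on the
   book's GET_TIME macro (which wraps gettimeofday).  POSIX only. */
double get_wall_time(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1000000.0;
}
```

Typical use: record `start = get_wall_time();` before the segment, and report `get_wall_time() - start` after it.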





More on Performance
 Any time data is transmitted, we’re
interested in how long it will take for the
data to reach its destination.



More on Performance
 Latency
 The time that elapses between the source’s
beginning to transmit the data and the
destination’s starting to receive the first byte.
 Bandwidth
 The rate at which a link can transmit data.
 The rate at which the destination receives data
after it has started to receive the first byte.
 Usually given in megabits or megabytes per
second.



Message transmission time = l + n / b

 l = latency (seconds)
 n = length of message (bytes)
 b = bandwidth (bytes per second)



PARALLEL PROGRAM
DESIGN



Foster’s methodology
1. Partitioning: divide the computation to be
performed and the data operated on by
the computation into small tasks.

 The focus here should be on identifying
tasks that can be executed in parallel.



Foster’s methodology
2. Communication: determine what
communication needs to be carried out
among the tasks identified in the previous
step.



Foster’s methodology
3. Agglomeration or aggregation: combine
the tasks and communications identified in
the first two steps into larger tasks.

 For example, if task A must be executed
before task B can be executed, it may
make sense to aggregate them into a
single composite task.



Foster’s methodology
4. Mapping: assign the composite tasks
identified in the previous step to
processes/threads.

 This should be done so that
communication is minimized and each
process/thread gets roughly the same
amount of work.
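A common way to carry out this mapping is a block partition: give each of p threads a contiguous range of the n composite tasks, with sizes differing by at most one. The sketch below is illustrative, not the book's code:

```c
#include <assert.h>

/* Block partition of n tasks among p threads.  Writes the
   half-open range [*first, *last) owned by thread my_rank.
   The first (n % p) threads get one extra task each, so all
   ranges differ in size by at most one. */
void block_range(int n, int p, int my_rank, int *first, int *last) {
    int q = n / p;          /* base tasks per thread        */
    int r = n % p;          /* threads that get one extra   */
    *first = my_rank * q + (my_rank < r ? my_rank : r);
    *last  = *first + q + (my_rank < r ? 1 : 0);
}
```

For n = 10 tasks and p = 3 threads, the ranges are [0,4), [4,7), and [7,10): nearly equal work, and each thread touches only a contiguous block of data, which also tends to reduce communication.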



Example - histogram
 1.3,2.9,0.4,0.3,1.3,4.4,1.7,0.4,3.2,0.3,4.9,2
.4,3.1,4.4,3.9,0.4,4.2,4.5,4.9,0.9



Serial program - input
1. The number of measurements: data_count
2. An array of data_count floats: data
3. The minimum value for the bins: min_meas
4. The maximum value for the bins: max_meas
5. The number of bins: bin_count



Serial program - output
1. bin_maxes : an array of bin_count floats
storing the upper bound of each bin

2. bin_counts : an array of bin_count ints
storing the number of data elements in
each bin
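The inputs and outputs above determine the serial program almost completely. The book's version uses a Find_bin helper to locate each measurement's bin; the sketch below is a simplified stand-in that computes the bin index directly from the (equal) bin width:

```c
#include <assert.h>

/* Serial histogram sketch using the inputs/outputs named on the
   slides.  The range [min_meas, max_meas) is split into bin_count
   equal-width bins; bin_maxes[b] holds the upper bound of bin b.
   (Direct index computation is our simplification; the book's
   code searches bin_maxes with a Find_bin function.) */
void histogram(const float data[], int data_count,
               float min_meas, float max_meas, int bin_count,
               float bin_maxes[], int bin_counts[]) {
    float bin_width = (max_meas - min_meas) / bin_count;
    for (int b = 0; b < bin_count; b++) {
        bin_maxes[b]  = min_meas + bin_width * (b + 1);
        bin_counts[b] = 0;
    }
    for (int i = 0; i < data_count; i++) {
        int b = (int)((data[i] - min_meas) / bin_width);
        if (b >= bin_count) b = bin_count - 1;  /* clamp x == max_meas */
        bin_counts[b]++;
    }
}
```

Running this on the 20 measurements from the example, with min_meas = 0, max_meas = 5, and 5 bins, gives counts 6, 3, 2, 3, 6.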



First two stages of Foster’s
Methodology



Alternative definition of tasks
and communication



Adding the local arrays
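Under the alternative task definition, each thread fills its own local bin_counts array, and the local arrays must then be added into the global result. The final summation step can be sketched as follows (a simple sequential sum; the book's figures also show a tree-structured variant):

```c
#include <assert.h>

/* Sum p local histograms into one global histogram.
   loc_counts holds p rows of bin_count entries each,
   stored row-major in a flat array. */
void add_local_counts(int p, int bin_count,
                      const int *loc_counts, int bin_counts[]) {
    for (int b = 0; b < bin_count; b++)
        bin_counts[b] = 0;
    for (int t = 0; t < p; t++)
        for (int b = 0; b < bin_count; b++)
            bin_counts[b] += loc_counts[t * bin_count + b];
}
```

In a shared-memory implementation the additions into the global array would need synchronization; summing into private local arrays first is exactly what keeps that contention to a single final phase.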

