HW2 Solutions
1 Basic concepts
1. Performance. Suppose we have two computers A and B. Computer A has a clock cycle of
1 ns and performs 2 instructions per cycle. Computer B, instead, has a clock cycle of 600 ps
and performs 1.25 instructions per cycle. Assuming a program requires the execution of the
same number of instructions in both computers:
Solution.
Computer A performs (2 instructions / 1 cycle) × (1 cycle / 10^-9 s) = 2 × 10^9 instructions per second.
Computer B performs (1.25 instructions / 1 cycle) × (1 cycle / 600 × 10^-12 s) ≈ 2.08 × 10^9 instructions per second.
Computer B performs more instructions per second, thus it is the fastest for this program.
Now, let n be the number of instructions required by Computer A, and 1.1 × n the number
of instructions required by Computer B. The program will take n / (2 × 10^9) seconds on Computer A
and 1.1 × n / (2.08 × 10^9) ≈ n / (1.89 × 10^9) seconds on Computer B. Therefore, in this scenario, Computer A executes the
program faster.
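The same comparison can be verified with a short Python sketch (variable names are ours; the numbers come from the problem statement):

```python
# Sketch: instruction throughput of both machines (Problem 1).
cycle_A, ipc_A = 1e-9, 2.0       # Computer A: 1 ns cycle, 2 instructions/cycle
cycle_B, ipc_B = 600e-12, 1.25   # Computer B: 600 ps cycle, 1.25 instructions/cycle

rate_A = ipc_A / cycle_A         # instructions per second
rate_B = ipc_B / cycle_B
print(rate_A, rate_B)            # 2.0e9 vs. ~2.08e9 -> B has the higher throughput

# Second scenario: B needs 1.1x as many instructions as A (n cancels out).
n = 1.0
print(n / rate_A < 1.1 * n / rate_B)   # True -> Computer A finishes first
```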
2. Speedup.
Assume the runtime of an application for a problem of size 1 is 100 seconds. It
consists of an initialization phase, which lasts for 10 seconds and cannot be parallelized, and a
problem-solving phase, which can be perfectly parallelized and grows quadratically with the
problem size.
• What is the speedup for the given application as a function of the number of processors
p and the problem size n?
• What is the execution time and speedup of the application with problem size 1, if it is
parallelized and run on 4 processors?
• What is the execution time of the application if the problem size is increased to 4 and it is
run on 4 processors? And on 16 processors? What is the speedup of both measurements?
Solution.
The application has an inherently sequential part (c_s) that takes 10 seconds, and a parallelizable
part (c_p) that takes 90 seconds for problem size 1. Since the parallelizable part grows
quadratically with the problem size, we can model T(1, n) (the execution time on 1 processor) as
c_s + c_p × n^2.
S(p, n) := T(1, n) / T(p, n) = (c_s + c_p × n^2) / (c_s + (c_p × n^2)/p) ≡ (10 + 90 × n^2) / (10 + (90 × n^2)/p).
For problem size 1 (n = 1) and 4 processors (p = 4), the execution time is 32.5 seconds. The
achieved speedup is 3.08.
Finally, if the problem size is increased to 4, the execution time is T(4, 4) = 10 + (90 × 16)/4 = 370 seconds on 4 processors
and T(16, 4) = 10 + (90 × 16)/16 = 100 seconds on 16 processors. The corresponding speedups are
S(4, 4) = 1450/370 ≈ 3.92 and S(16, 4) = 1450/100 = 14.5.
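The model above is easy to evaluate programmatically; a minimal sketch (assuming the c_s = 10 s and c_p = 90 s values derived above) reproduces all three cases:

```python
# Sketch of the speedup model: T(p, n) = c_s + c_p * n^2 / p.
c_s, c_p = 10.0, 90.0

def T(p, n):
    """Execution time (seconds) on p processors for problem size n."""
    return c_s + c_p * n**2 / p

def S(p, n):
    """Speedup relative to the single-processor run."""
    return T(1, n) / T(p, n)

print(T(4, 1), S(4, 1))    # 32.5 s, ~3.08
print(T(4, 4), S(4, 4))    # 370.0 s, ~3.92
print(T(16, 4), S(16, 4))  # 100.0 s, 14.5
```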
• Improvement 1 (all fp instructions sped up by a factor of 1.5). Sequential part (β): 0.4;
p = 1.5. The application would observe a total speedup of:
S_p(n) := 1 / (β + (1 − β)/p) = 1 / (0.4 + (1 − 0.4)/1.5) = 1.25.
• Improvement 2 (square-root instructions sped up by a factor of 8). Sequential part (β):
0.4 + 0.45 = 0.85; p = 8. The application would observe a total speedup of:
S_p(n) := 1 / (β + (1 − β)/p) = 1 / (0.85 + (1 − 0.85)/8) ≈ 1.15.
Thus, the application would benefit the most from the first alternative.
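Both alternatives are direct applications of Amdahl's law; the sketch below reproduces the two speedups (the 60% and 15% time fractions for fp and square-root work are inferred from the β values used above):

```python
# Sketch: Amdahl's law, S = 1 / (beta + (1 - beta) / p).
def amdahl(beta, p):
    return 1.0 / (beta + (1.0 - beta) / p)

# Improvement 1: all fp work (assumed 60% of the runtime) sped up by 1.5x.
print(amdahl(beta=0.40, p=1.5))   # 1.25

# Improvement 2: only square-root work (assumed 15% of the runtime) sped up
# by 8x; the remaining fp work (45%) joins the sequential fraction.
print(amdahl(beta=0.85, p=8))     # ~1.15
```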
Parallelization of code. The speedup achieved on a 16-CPU system is:
S_p(n) := 1 / (β + (1 − β)/p) = 1 / (0.1 + (1 − 0.1)/16) = 6.4.
To attain a speedup of 10, 96% of the code would need to be perfectly parallelizable. This
value is obtained by solving the equation
10 = 1 / (β + (1 − β)/16)
for β, which gives β = 0.04.
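The same equation can be solved in closed form by rearranging Amdahl's law; a small sketch:

```python
# Sketch: solve 10 = 1 / (beta + (1 - beta)/16) for the sequential fraction beta.
# Rearranging Amdahl's law gives beta = (1/S - 1/p) / (1 - 1/p).
p, target_speedup = 16, 10
beta = (1.0 / target_speedup - 1.0 / p) / (1.0 - 1.0 / p)
print(beta)        # 0.04 -> 4% sequential
print(1 - beta)    # 0.96 -> 96% must be perfectly parallelizable
```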
4. Efficiency. Consider a computer that has a peak performance of 8 GFlops/s. An application
running on this computer executes 15 TFlops, and takes 1 hour to complete.
Solution.
The application attained 15 TFlop / 3600 s ≈ 4.26 GFlops/s.
The achieved efficiency is 4.26 GFlops/s / 8 GFlops/s ≈ 53%.
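A quick check of the arithmetic (a sketch; the 4.26 GFlops/s figure suggests the solution counts 1 TFlop as 1024 GFlop):

```python
# Sketch: achieved performance and efficiency for Problem 4.
total_gflop = 15 * 1024        # 15 TFlop of work, counted as 1024 GFlop per TFlop
runtime_s   = 3600.0           # 1 hour
peak_gflops = 8.0              # peak performance in GFlops/s

achieved = total_gflop / runtime_s
print(achieved)                 # ~4.27 GFlops/s
print(achieved / peak_gflops)   # ~0.53 -> roughly 53% efficiency
```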
5. Parallel efficiency. Given the data in Tab. 1, use your favorite plotting tool to plot the
scalability (speedup) and the parallel efficiency achieved. In both cases, plot also the ideal case, that is, scalability equal to the number of processors
and parallel efficiency equal to 1, respectively.
Solution.
Table 2 includes the speedup and parallel efficiency attained. Figure 1 gives an example of
the requested plots.
Figure 1: Speedup (left) and parallel efficiency (right) versus the number of processors (1, 2, 4, 8, 16), including the ideal cases.
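One possible way to produce the requested plots is sketched below with matplotlib; the runtimes are placeholders (Tab. 1 is not reproduced here) and should be replaced with the measured values:

```python
# Sketch: speedup and parallel-efficiency plots, including the ideal cases.
import matplotlib.pyplot as plt

procs = [1, 2, 4, 8, 16]
times = [100.0, 52.0, 27.0, 14.5, 8.0]   # placeholder runtimes in seconds

speedup    = [times[0] / t for t in times]
efficiency = [s / p for s, p in zip(speedup, procs)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(procs, speedup, marker="o", label="measured")
ax1.plot(procs, procs, linestyle="--", label="ideal")
ax1.set_xlabel("Number of processors"); ax1.set_ylabel("Speedup"); ax1.legend()

ax2.plot(procs, efficiency, marker="o", label="measured")
ax2.plot(procs, [1.0] * len(procs), linestyle="--", label="ideal")
ax2.set_xlabel("Number of processors"); ax2.set_ylabel("Parallel efficiency"); ax2.legend()

plt.tight_layout()
plt.show()
```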