Performance
Scalability
Informal definition: a program is scalable if a more powerful system yields more speedup
Formal definition:
We fix p and N, and find E
We increase p
If there is a new N so that E remains the same, the program is scalable
Performance
Scalability
We know: E = S / p, and S = Tserial/Tparallel, and Tparallel = Tserial / p + Toverhead
E = Tserial / (pTparallel) = Tserial / (Tserial + p Toverhead)
Tserial is a function of input size N
E = Tserial(N) / [Tserial(N) + p Toverhead]
Let’s consider Bubble Sort, for which Tserial(N) ≈ N²
E = N² / [N² + p Toverhead]
Performance
Scalability
Let’s increase p by a factor of k and N by a factor of x
Let’s assume Toverhead changes by a factor of m
Enew = (xN)² / [(xN)² + (kp)(mToverhead)] = x²N² / [x²N² + (km) p Toverhead]
If x² = km, then Enew is equal to E, and the program is scalable
If E remains the same without increasing N, the program is strongly scalable
If E remains the same by increasing N at the same rate as p, the program is weakly scalable
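A quick worked example (assuming, purely for illustration, that Toverhead is unchanged, i.e. m = 1): quadruple the core count, so k = 4. The condition x² = km = 4 gives x = 2, so doubling N restores the original efficiency:
Enew = (2N)² / [(2N)² + (4p) Toverhead] = 4N² / [4N² + 4p Toverhead] = N² / [N² + p Toverhead] = E
Since N has to grow with p to hold E constant, Bubble Sort under this model is weakly, not strongly, scalable.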
Timing Parallel Programs
To know the behavior during development (is there a bottleneck?)
To evaluate performance after development
We want to time the data-processing part of the code, not the I/O
We are generally not interested in CPU time
It includes library and system-call time as well
It does not include idle time (which could be a problem)
Timing Parallel Programs
APIs provide functions to time the code
MPI_Wtime, omp_get_wtime are two examples
Both return wall clock time and not the CPU time
Timer resolution is an important parameter
Linux provides timers with nanosecond resolution
Need to check resolution before using
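A minimal sketch of how this check might look on Linux (the choice of CLOCK_MONOTONIC and the surrounding program are illustrative assumptions):

#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec res, t0, t1;

    /* Ask the OS how fine-grained this clock actually is. */
    clock_getres(CLOCK_MONOTONIC, &res);
    printf("timer resolution: %ld s %ld ns\n", (long) res.tv_sec, res.tv_nsec);

    /* Wall-clock (not CPU) timing of a code region. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* ... code to be timed ... */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("elapsed: %e s\n", elapsed);
    return 0;
}

On older glibc versions this may need -lrt at link time.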
Timing Parallel Programs
In parallel programs, the code is run by multiple threads/processes
We want to time all threads/processes
In distributed-memory programs, nodes have independent clocks
Every run gives a different set of values (expected)
Report the minimum value
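With MPI_Wtime, one common pattern (a hedged sketch; do_work and the run count of 5 are placeholders) is to synchronize the processes, let the slowest process define each run's time, and report the best run:

#include <mpi.h>
#include <stdio.h>

/* Placeholder for the code being timed. */
static void do_work(void) {
    volatile double s = 0.0;
    for (int i = 0; i < 1000000; i++) s += i * 0.5;
}

int main(int argc, char *argv[]) {
    int rank;
    double best = 1e30;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int run = 0; run < 5; run++) {
        MPI_Barrier(MPI_COMM_WORLD);            /* start everyone together */
        double start = MPI_Wtime();
        do_work();
        double local = MPI_Wtime() - start;

        double slowest = 0.0;                   /* max over all processes  */
        MPI_Reduce(&local, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0,
                   MPI_COMM_WORLD);
        if (rank == 0 && slowest < best) best = slowest;
    }
    if (rank == 0) printf("best run: %e s\n", best);

    MPI_Finalize();
    return 0;
}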
Parallel Program Design
Foster’s Methodology
Partitioning (divide data or tasks)
Communication (what type is needed among tasks)
Aggregation or agglomeration (combine tasks into composite tasks)
Mapping (assign tasks to processors)
Parallel Program Design
Foster’s Methodology (an example)
We want histogram of a large array of floats
First focus on the serial solution (very simple and straightforward)
Input to the program:
1. Number of elements in the array
2. Array of floats
3. Minimum value
4. Maximum value
5. Number of bins
Output of the program is an array holding the number of items in each bin
Parallel Program Design
Foster’s Methodology (an example)
If the data items are 1.3, 2.9, 0.4, 0.3, 1.3, 4.4, 1.7, 0.4, 3.2, 0.3, 4.9, 2.4, 3.1, 4.4, 3.9, 0.4, 4.2, 4.5, 4.9, 0.9, the histogram looks like the bar chart on the slide (figure not reproduced here)
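For reference: assuming min_meas = 0, max_meas = 5, and bin_count = 5 (these parameter values are my assumption, not given above), the counts work out to 6, 3, 2, 3, and 6 items in the five bins.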
Parallel Program Design
Foster’s Methodology (an example)
Using the min_meas and max_meas values, find bin_width
bin_width = (max_meas – min_meas) / bin_count;
Initialize an array bin_maxes to hold upper limits (floats)
for (b = 0 ; b < bin_count ; ++b)
bin_maxes[b] = min_meas + bin_width * (b + 1);
Initialize an array bin_counts for the item count in each bin
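Putting these setup steps together, a minimal sketch (the function name Setup_bins and the malloc/calloc allocation are my assumptions; the variable names follow the slides):

#include <stdlib.h>

float  bin_width;
float *bin_maxes;
int   *bin_counts;

/* Compute bin_width, fill bin_maxes with each bin's upper limit,
   and create bin_counts with every count starting at zero. */
void Setup_bins(float min_meas, float max_meas, int bin_count) {
    bin_width  = (max_meas - min_meas) / bin_count;
    bin_maxes  = malloc(bin_count * sizeof(float));
    bin_counts = calloc(bin_count, sizeof(int));
    for (int b = 0; b < bin_count; b++)
        bin_maxes[b] = min_meas + bin_width * (b + 1);
}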
Parallel Program Design
Foster’s Methodology (an example)
Bin number b will hold all floats n that satisfy the inequality
bin_maxes[b - 1] <= n < bin_maxes[b]
If b = 0 then the inequality is
min_meas <= n < bin_maxes[0]
Find_bin is a function that tells which bin a data item belongs to
For a small number of bins, a linear search is good enough
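A possible Find_bin using that linear search (the signature matches the call on the next slide; the body is my sketch):

/* Return the bin b whose range contains data, i.e. the smallest b
   with data < bin_maxes[b]; anything at or beyond the last upper
   limit is put in the last bin. */
int Find_bin(float data, float bin_maxes[], int bin_count,
             float min_meas) {
    (void) min_meas;                 /* kept only to match the call */
    for (int b = 0; b < bin_count - 1; b++)
        if (data < bin_maxes[b])
            return b;
    return bin_count - 1;
}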
Parallel Program Design
Foster’s Methodology (an example)
for (i = 0; i < data_count; i++) {
    bin = Find_bin(data[i], bin_maxes, bin_count, min_meas);
    bin_counts[bin]++;
}
Parallel Program Design
Foster’s Methodology (an example)
Step 1: Two types of tasks (finding bin and updating count)
Step 2: Communication is through the variable bin
Step 3: Finding the bin and incrementing the count can be aggregated (they run sequentially)
Parallel Program Design
Foster’s Methodology (an example)
Step 4: Okay Houston . . . We have a problem here!
Two threads might try to execute bin_counts[b]++ at once
If bin_count or the thread count is not too big, make bin_counts local to each thread
With 1000 bins and 500 threads, merging the local counts only adds 500,000 integers
Parallel Program Design
Foster’s Methodology (an example)
We have added another task (increment local bin counts)
This is necessary to avoid a race condition and inaccurate bin counts
We will write the code for this problem soon using POSIX threads
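As a preview, a sketch under my own naming and partitioning assumptions (Histogram_thread, the loc_bin_cts table, and the block split of data are not necessarily what the posted code will use): each thread fills its own row of local counts, and the main thread adds them up after joining.

#include <pthread.h>

/* Shared globals (sketch). */
float *data, *bin_maxes, min_meas;
int    data_count, bin_count, thread_count;
int  **loc_bin_cts;   /* loc_bin_cts[t][b]: thread t's count for bin b */
int   *bin_counts;    /* final histogram, filled after the joins       */

int Find_bin(float data, float bin_maxes[], int bin_count, float min_meas);

/* Each thread histograms its own block of data into its own row of
   loc_bin_cts, so no two threads ever write the same location. */
void *Histogram_thread(void *rank) {
    long my_rank  = (long) rank;
    int  my_first = my_rank * data_count / thread_count;
    int  my_last  = (my_rank + 1) * data_count / thread_count;

    for (int i = my_first; i < my_last; i++)
        loc_bin_cts[my_rank][Find_bin(data[i], bin_maxes,
                                      bin_count, min_meas)]++;
    return NULL;
}

/* After main has joined all the threads, it merges the local counts:
   bin_count * thread_count additions in total. */
void Merge_counts(void) {
    for (int b = 0; b < bin_count; b++)
        for (int t = 0; t < thread_count; t++)
            bin_counts[b] += loc_bin_cts[t][b];
}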
Shared Memory Parallel Programming with POSIX Threads
POSIX Threads aka Pthreads is a widely available library
New features are added frequently
Sometimes called light-weight processes (not Solaris LWP)
We create a Pthread and give it a function to execute
If a Pthread completes its task, it can terminate itself
A Pthread can receive info from its parent
It can also return info to its parent
Shared Memory Parallel Programming with POSIX Threads
Creating a Pthread:
int pthread_create(pthread_t *restrict thread,
                   const pthread_attr_t *restrict attr,
                   void *(*start_routine)(void *),
                   void *restrict arg);
The man page of this function contains a wealth of information
Shared Memory Parallel Programming with POSIX Threads
A thread terminates by calling
void pthread_exit(void *retval);
retval points to data that is made available to a thread that joins this one
A thread can wait for another thread to terminate by calling
int pthread_join(pthread_t thread, void **retval);
If retval is not NULL, the value passed to pthread_exit is copied into the location that retval points to
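A minimal example tying the three calls together (the square function and the value 7 are only illustrative; compile with -pthread):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* The child computes a value and hands a pointer to it back to the
   parent via pthread_exit; the parent picks it up with pthread_join. */
void *square(void *arg) {
    long n = (long) arg;
    long *result = malloc(sizeof(long));
    *result = n * n;
    pthread_exit(result);            /* same effect as: return result; */
}

int main(void) {
    pthread_t tid;
    void *retval;

    pthread_create(&tid, NULL, square, (void *) 7L);
    pthread_join(tid, &retval);      /* retval now holds the pointer
                                        passed to pthread_exit          */
    printf("child returned %ld\n", *(long *) retval);
    free(retval);
    return 0;
}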
Shared Memory Parallel Programming with POSIX Threads
This is how the joining and joined threads can communicate
Not joining joinable threads creates zombie threads
Too many zombies can cause pthread_create to fail
Shared Memory Parallel Programming with POSIX Threads
Demo program
Shared Memory: Matrix-vector multiplication
A fairly large number of disciplines need it
Let A have m rows and n columns and x have n rows
The product A*x is a vector with m rows (components)
The i-th component of A*x is the dot product of the i-th row of A and x
y[i] = a[i][0]*x[0] + a[i][1]*x[1] + ... + a[i][n-1]*x[n-1]
Shared Memory: Matrix-vector multiplication
There are n products, which are added to get one component of y
A total of mn products (n² for a square matrix) are computed to get the final answer
Easy to parallelize
If A has m rows and there are p cores, give m/p rows to each core
Of course, if ((m % p) != 0), some cores get more rows than others
Shared Memory: Matrix-vector multiplication
Where to store A, x and y
For now make them global (see Exercise 4.2 for some issues)
Neither A nor x is modified by any thread
Each thread computes its assigned components of y
Multi-threaded matrix-vector multiplication code will be posted
We will measure execution time for various matrix sizes
We will also use this program to study cache effects
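Ahead of the posted code, a rough sketch of the thread function (the name Mat_vect_mult, the flat row-major storage of A, and the block-row split are my assumptions; the posted version may use a 2D array):

#include <pthread.h>

/* Globals, as discussed above (sketch). */
double *A, *x, *y;          /* A stored row-major: A[i*n + j] */
int m, n, thread_count;

/* Each thread computes a contiguous block of rows of y = A*x.
   A and x are only read and the threads write disjoint parts of y,
   so no synchronization is needed; the integer-division split also
   hands the m % p leftover rows to some of the threads. */
void *Mat_vect_mult(void *rank) {
    long my_rank      = (long) rank;
    int  my_first_row = my_rank * m / thread_count;
    int  my_last_row  = (my_rank + 1) * m / thread_count;

    for (int i = my_first_row; i < my_last_row; i++) {
        y[i] = 0.0;
        for (int j = 0; j < n; j++)
            y[i] += A[i * n + j] * x[j];
    }
    return NULL;
}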