
High Performance Computing

Lecture 3 & 4
Introduction to High Performance Computing

Dr. Mahmoud Gamal

Design Stages:
• The design of parallel algorithms can be structured into the
following four stages:
1) Decomposition.
2) Communication.
3) Agglomeration.
4) Scheduling.

Design Stages:
1- Decomposition (partitioning):
• Decompose the problem into small tasks that can be
executed concurrently.
• Task: an indivisible unit of sequential computation.
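As a minimal illustration (the array-sum problem and the number of chunks are my own choices, not from the lecture), decomposition turns one large job into small, independent tasks:

```python
# Illustrative sketch: decomposing an array sum into small, independent
# tasks. Each task is an indivisible piece of sequential work (summing
# one slice); the tasks share no state, so they could run concurrently.
data = list(range(1_000_000))
num_tasks = 8                                   # assumed number of tasks
chunk = (len(data) + num_tasks - 1) // num_tasks

task_inputs = [data[i * chunk:(i + 1) * chunk] for i in range(num_tasks)]

def task(slice_):
    return sum(slice_)

partial_sums = [task(s) for s in task_inputs]   # executed sequentially here
assert sum(partial_sums) == sum(data)
```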

Design Stages:
2- Communication:
• Determine communication required to coordinate task
execution.

Design Stages:
3- Agglomeration:
• Combine tasks into larger tasks to improve performance or
to reduce communication cost.
• Also determine whether it is worthwhile to replicate data
and/or computation.
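A small sketch of the idea (the data size and worker count are assumed, not from the slides): many fine-grained tasks are merged into a few coarser ones so that per-task overhead and communication shrink.

```python
# Illustrative sketch: agglomerating fine-grained tasks into coarser ones.
data = list(range(100_000))

# Fine-grained decomposition: one tiny task per element (high overhead).
fine_tasks = [[x] for x in data]

# Agglomerated: one larger task per worker.
num_workers = 4
chunk = (len(data) + num_workers - 1) // num_workers
coarse_tasks = [data[i * chunk:(i + 1) * chunk] for i in range(num_workers)]

print(len(fine_tasks), "fine tasks ->", len(coarse_tasks), "agglomerated tasks")
```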

Design Stages:
4- Scheduling (Mapping):
• Assign each task to a processor in a manner that
minimizes execution time (by minimizing communication
and idling).
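One possible mapping heuristic, shown only as a sketch (the task costs and the longest-processing-time rule are my assumptions, not prescribed by the lecture): always hand the next largest task to the least-loaded processor, which keeps the load balanced and idling low.

```python
# Illustrative sketch: greedy longest-processing-time mapping of tasks
# onto processors to balance load and reduce idle time.
import heapq

task_costs = [7, 3, 5, 2, 8, 4]               # assumed per-task runtimes
num_procs = 3

loads = [(0, p) for p in range(num_procs)]    # (current load, processor id)
heapq.heapify(loads)
assignment = {p: [] for p in range(num_procs)}

# Largest task first, always onto the currently least-loaded processor.
for i, cost in sorted(enumerate(task_costs), key=lambda t: -t[1]):
    load, p = heapq.heappop(loads)
    assignment[p].append(i)
    heapq.heappush(loads, (load + cost, p))

print(assignment)   # {0: [4, 3], 1: [0, 1], 2: [2, 5]} -> loads 10, 10, 9
```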

Design Stages:

(Figure: overview of the four design stages.)
Decomposition techniques:
1) Domain Decomposition.
2) Functional Decomposition.
3) Recursive Decomposition.
4) Hybrid Decomposition.

Decomposition techniques:
1- Domain Decomposition:
• Decompose the data associated with a problem (Block or
Cyclic distribution); each parallel task then works on a portion of
the data, as in the sketch below.
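A minimal sketch of the two distributions named above (the array size and process count are assumed): which indices each process owns under a block versus a cyclic decomposition.

```python
# Illustrative sketch: block vs. cyclic distribution of n data items
# over p processes.
n, p = 12, 3

# Block: each process owns one contiguous slice.
block = {rank: list(range(rank * n // p, (rank + 1) * n // p))
         for rank in range(p)}

# Cyclic: indices are dealt out round-robin, one at a time.
cyclic = {rank: list(range(rank, n, p)) for rank in range(p)}

print(block)   # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7], 2: [8, 9, 10, 11]}
print(cyclic)  # {0: [0, 3, 6, 9], 1: [1, 4, 7, 10], 2: [2, 5, 8, 11]}
```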

Decomposition techniques:
1- Domain Decomposition:
• Use the owner computes rule (the process assigned a
particular data item is responsible for all computation
associated with it).

Decomposition techniques:
2- Functional Decomposition:
• Decompose the problem according to the computation that
must be performed; each task then performs a
portion of the overall work.
• Consider data dependences (two memory accesses are
involved in a data dependence if they may refer to the
same memory location and one of the accesses is a
write).
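A small sketch of the idea (the chosen statistics and the thread pool are my own illustration): the problem is split by what is computed rather than by the data, so each task runs a different function over the same input.

```python
# Illustrative sketch: functional decomposition -- each task performs a
# different portion of the overall work on the same data.
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, stdev

data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
functions = [min, max, mean, stdev]           # one task per function

with ThreadPoolExecutor(max_workers=len(functions)) as pool:
    results = list(pool.map(lambda f: f(data), functions))

print(dict(zip([f.__name__ for f in functions], results)))
```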

Decomposition techniques:
3- Recursive Decomposition:
• A method for inducing concurrency in problems that
can be solved using the divide-and-conquer strategy.
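A minimal divide-and-conquer sketch (the sum problem and the cutoff are my assumptions): each recursive half is an independent task that a parallel runtime could place on a different processor.

```python
# Illustrative sketch: recursive decomposition of a sum.  The two halves
# are independent tasks, so they could be executed concurrently.
def recursive_sum(a, lo, hi, cutoff=1000):
    if hi - lo <= cutoff:            # small enough: solve sequentially
        return sum(a[lo:hi])
    mid = (lo + hi) // 2
    left = recursive_sum(a, lo, mid, cutoff)    # independent subtask
    right = recursive_sum(a, mid, hi, cutoff)   # independent subtask
    return left + right

data = list(range(100_000))
assert recursive_sum(data, 0, len(data)) == sum(data)
```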

Decomposition techniques:
4- Hybrid Decomposition:
• A mix of decomposition techniques.

Task Dependency Graph:
• A task-dependency graph is a directed acyclic graph in
which the nodes represent tasks, and the directed edges
indicate the dependencies (communication) amongst
them.
• The task corresponding to a node can be executed when
all tasks connected to this node by incoming edges have
completed.
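A small sketch (the graph below is invented for illustration, not the one in the lecture figures): a task-dependency graph stored as incoming-edge sets and executed level by level. A task becomes ready exactly when all of its predecessors have completed, and the widest level gives the maximum degree of concurrency.

```python
# Illustrative sketch: a task-dependency graph as a dict of incoming edges.
deps = {                     # task: set of tasks it depends on
    "A": set(), "B": set(),
    "C": {"A", "B"},
    "D": {"C"}, "E": {"C"},
    "F": {"D", "E"},
}

done, levels = set(), []
while len(done) < len(deps):
    # Every task whose dependencies are all finished can run simultaneously.
    ready = [t for t in deps if t not in done and deps[t] <= done]
    levels.append(ready)
    done |= set(ready)

print(levels)   # [['A', 'B'], ['C'], ['D', 'E'], ['F']]
print("maximum degree of concurrency:", max(len(level) for level in levels))  # 2
```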

Example:
Database query processing:

Decomposition (a):

(Figure: Data Decomposition and Recursive Decomposition.)
Decomposition (b):
Task dependency graphs:

Task dependency graphs:
• The maximum number of tasks (processors) that can be
executed simultaneously in a parallel program at any
given time is known as its maximum degree of
concurrency.
                        Decomposition (a)    Decomposition (b)
Shortest time           27 time units        34 time units
No. of processors       4                    4
Maximum concurrency     4                    4

Scheduling (Mapping):

Performance Analysis:
• To compare two or more parallel algorithms for the same
problem, performance analysis is needed.
• Usually the following metrics are used:
Parallel runtime (complexity) ($T_p$):
• The estimated execution time that elapses between the
algorithm's start and termination.
$T_p = T_{comp} + T_{comm}$
➢ $T_{comp}$: Computation time.
➢ $T_{comm}$: Communication time.
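As a purely illustrative example (the figures are assumed, not from the slides): a task that spends $T_{comp} = 8$ ms computing and $T_{comm} = 2$ ms exchanging messages has $T_p = 8 + 2 = 10$ ms.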

Performance Analysis:
Speedup (S):
• The ratio of the time taken to solve a problem on a
single processor ($T_s$) to the time required to solve the
same problem on a parallel computer with p identical
processing elements ($T_p$):
$S = \dfrac{T_s}{T_p}$
Efficiency:
• A measure of the fraction of time for which a
processing element is usefully employed.

$E = \dfrac{S}{p}$
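As an illustration with assumed numbers: if $T_s = 100$ s and $T_p = 25$ s on $p = 8$ processing elements, then $S = 100/25 = 4$ and $E = 4/8 = 0.5$, i.e. each processing element is usefully busy only half of the time.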
Performance Analysis:
Cost (C):
• The product of parallel runtime and the number of
processing elements used.
$C = p \cdot T_p$

❖ A parallel algorithm is said to be cost-optimal if its cost
is asymptotically identical to the serial cost (sequential
runtime).
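Continuing the illustrative numbers above: $C = p \cdot T_p = 8 \times 25 = 200$ s, which is larger than $T_s = 100$ s, so that hypothetical algorithm would not be cost-optimal.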

Example:
Adding n numbers by using a logical binary tree of n
processing elements.
Solution:
• $T_s = O(n)$
• $T_p = O(\log n)$
• $S = \dfrac{T_s}{T_p} = \dfrac{O(n)}{O(\log n)} = O\!\left(\dfrac{n}{\log n}\right)$
• $E = \dfrac{S}{p} = \dfrac{O(n/\log n)}{n} = O\!\left(\dfrac{1}{\log n}\right)$
• $C = p \cdot T_p = O(n \log n)$
• This parallel algorithm is not cost-optimal, since $C = O(n \log n) \neq O(n) = T_s$.
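A sequential simulation of this scheme, offered only as a sketch (the code is mine, not from the lecture): pairwise addition halves the number of active values in every round, so $n$ numbers are reduced in about $\log_2 n$ parallel steps.

```python
# Illustrative sketch: tree (pairwise) reduction of n numbers.  Each round
# corresponds to one parallel step on n processing elements.
import math

def tree_sum(values):
    vals = list(values)
    rounds = 0
    while len(vals) > 1:
        vals = [vals[i] + vals[i + 1] if i + 1 < len(vals) else vals[i]
                for i in range(0, len(vals), 2)]
        rounds += 1
    return vals[0], rounds

n = 1024
total, rounds = tree_sum(range(n))
assert total == sum(range(n))
print(rounds, "rounds, log2(n) =", int(math.log2(n)))   # 10 rounds, log2(n) = 10
```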
