05 - Lecture #5 - 6

The document discusses different types of static interconnection networks used in parallel computing platforms, including complete networks, star networks, linear arrays, rings, trees, and 2D and 3D meshes. It compares their properties such as diameter, bisection width, and cost, and explains why none are very practical on their own due to high cost or other limitations.

Uploaded by

Fatma mansour

High Performance Computing
LECTURE #5&6

Agenda
o Physical Organization of Parallel Platforms
o PRAM
o Interconnection Networks

Parallel Computing Platform

1- PRAM
o The RAM (random access machine) contains a single processor that operates under the control of a sequential algorithm, issuing one instruction at a time.

o The processor can load/store data from/to memory and can perform basic arithmetic and logical operations.

o The PRAM is the natural extension of this serial model (random access machine, RAM → PRAM).

Parallel Random Access Machine (PRAM)
o A group of p processors.
o Processors share a common clock but may execute different instructions in each cycle.
o The PRAM was designed so that the user could design and analyze parallel algorithms without concern for communication (either between processors and memory or within sets of processors).

❖ During the read phase,
o All p processors have the opportunity to simultaneously read a piece of data from a memory location.
o Each processor places the data item into one of its registers.

❖ During the write phase,
o Every processor can (simultaneously) write an item from one of its registers to the global memory.

Memory Access (Resolving Data Access Conflicts)
❖ If two processors try to read from the same memory location, should only one succeed? If so, which?

❖ If two processors try to write to the same memory location, which one succeeds? Is a processor notified if it does not succeed?

Depending on how simultaneous accesses are handled, PRAMs fall into four classes:
1. Exclusive-read, exclusive-write (EREW) PRAM.
2. Concurrent-read, exclusive-write (CREW) PRAM.
3. Exclusive-read, concurrent-write (ERCW) PRAM.
4. Concurrent-read, concurrent-write (CRCW) PRAM.

Exclusive Read (ER): Only one processor is allowed to read from a given memory location during a clock cycle.

Concurrent Read (CR): Multiple processors are allowed to read from the same memory location during a clock cycle.

Exclusive Write (EW): Only one processor is allowed to write to a given memory location during a clock cycle.

Concurrent Write (CW): Multiple processors are allowed to write to the same memory location simultaneously.

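To make the four access classes concrete, the following minimal Python sketch inspects the addresses touched in one clock cycle and reports which PRAM variant is needed to allow them. The function name `required_pram_model` and the list-of-addresses input format are assumptions made for illustration, not part of the lecture.

```python
from collections import Counter

def required_pram_model(reads, writes):
    """Name the PRAM variant needed to permit one cycle's accesses.

    reads / writes: lists of memory addresses, one entry per processor
    that reads or writes during this cycle (hypothetical input format).
    """
    # A concurrent access happens when some address appears more than once.
    concurrent_read = any(n > 1 for n in Counter(reads).values())
    concurrent_write = any(n > 1 for n in Counter(writes).values())
    read_part = "CR" if concurrent_read else "ER"
    write_part = "CW" if concurrent_write else "EW"
    return read_part + write_part

print(required_pram_model(reads=[0, 1, 2], writes=[3, 4]))  # all distinct
print(required_pram_model(reads=[0, 0, 2], writes=[3, 4]))  # two reads of cell 0
print(required_pram_model(reads=[0, 0, 2], writes=[5, 5]))  # also two writes to cell 5
```

The three calls correspond to EREW, CREW, and CRCW access patterns respectively.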
Memory Access (Resolving Data Access Conflicts)
A variety of policies have been used to resolve concurrent-write conflicts. Some of the popular ones:

Priority CW: Processors are organized into a predefined priority list. The processor with the highest priority succeeds and the rest fail.

Common CW: The concurrent write is allowed only if all processors attempting to write to the given memory location are writing the identical value.

Arbitrary CW: If multiple processors try to write simultaneously to a given memory location, one of them, chosen arbitrarily, succeeds and the rest fail.

Sum: The sum of all the data items is written (any associative operator can be used in place of the sum).

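The write-conflict policies can be sketched as a single function that decides what value ends up in one memory cell. This is an illustrative Python sketch only: `resolve_concurrent_write`, its parameters, and the `(processor_id, value)` request format are hypothetical names chosen here, and the request list is assumed to be non-empty.

```python
import operator
import random
from functools import reduce

def resolve_concurrent_write(requests, policy, priority=None, op=operator.add):
    """Return the value stored in ONE memory cell after a concurrent write.

    requests: non-empty list of (processor_id, value) pairs aimed at the cell.
    priority: processor ids ordered from highest to lowest priority
              (only used by the "priority" policy).
    op:       associative operator used by the "sum" policy.
    """
    values = [value for _, value in requests]
    if policy == "priority":
        rank = {pid: r for r, pid in enumerate(priority)}
        _, value = min(requests, key=lambda req: rank[req[0]])
        return value
    if policy == "common":
        if len(set(values)) != 1:
            raise ValueError("common CW requires all written values to be identical")
        return values[0]
    if policy == "arbitrary":
        return random.choice(values)
    if policy == "sum":
        return reduce(op, values)
    raise ValueError(f"unknown policy: {policy}")

requests = [(0, 5), (1, 7), (2, 5)]
print(resolve_concurrent_write(requests, "priority", priority=[1, 0, 2]))  # processor 1 wins
print(resolve_concurrent_write(requests, "sum"))                           # 5 + 7 + 5
```

Passing `op=max` or `op=operator.mul` to the "sum" policy shows the "any associative operator" variant.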
PRAM Model… Is it Practical??

❖ Consider an implementation of an EREW PRAM with p processors and a global memory of m words:
◦ Processors are connected to memory through a set of switches.
◦ These switches determine the memory word being accessed by each processor.
◦ To ensure such connectivity, the total number of switches must be (m × p).
◦ For a reasonable memory size, constructing a switching network of this complexity is very expensive.

Thus, PRAM models are impossible to realize in practice.

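A quick calculation shows how fast the m × p switch count grows. The sizes below (64 processors, 2^30 memory words) are hypothetical values picked only for illustration.

```python
def pram_switch_count(p, m):
    """Switches needed so that each of p processors can reach each of m words."""
    return p * m

p = 64        # hypothetical number of processors
m = 2 ** 30   # hypothetical memory size in words
print(f"{pram_switch_count(p, m):,} switches")  # 2**36 switches
```

Even for this modest processor count, the switch count runs into the tens of billions, which is why the slide calls such a network very expensive.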
Interconnection Networks
❖ Provide mechanisms for data transfer between processors and memory modules.

❖ Interconnection networks can be classified as static or dynamic.

❖ Static networks:
✓ Consist of point-to-point communication links among processing nodes.
✓ Are also referred to as direct networks.

❖ Dynamic networks:
✓ Are built using switches and communication links.
✓ Communication links are connected to one another dynamically by the switches to establish paths among processing nodes and memory banks.
✓ Are also referred to as indirect networks.

Network Topologies

❖ A variety of network topologies have been proposed and implemented.

❖ These topologies trade off performance for cost.

❖ Commercial machines often implement hybrids of multiple topologies for reasons of packaging, cost, and available components.

❖ A topology describes who is connected to whom.

A- Static Interconnection Networks
Evaluating Static Interconnection Networks
❖ Diameter: The maximum distance between any two processing nodes in the network, where distance is the number of hops a message is transferred through on the shortest path from one node to the other.

❖ Bisection Width: The minimum number of wires (links) you must cut to divide the network into two equal parts.

❖ Connectivity: The multiplicity of paths between any two processing nodes.

❖ Cost: The number of links or switches, together with factors such as the length of the wires, contributes to the cost.

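The diameter definition can be checked directly by running a breadth-first search from every node and taking the largest shortest-path distance. The sketch below is an illustrative Python version; the builder functions `complete`, `star`, `linear_array`, and `ring` are hypothetical helpers that return adjacency lists, and node 0 is assumed to be the center of the star.

```python
from collections import deque

def complete(p):
    return {i: [j for j in range(p) if j != i] for i in range(p)}

def star(p):
    adj = {0: list(range(1, p))}            # node 0 is the central node
    adj.update({i: [0] for i in range(1, p)})
    return adj

def linear_array(p):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < p] for i in range(p)}

def ring(p):
    return {i: sorted({(i - 1) % p, (i + 1) % p}) for i in range(p)}

def diameter(adj):
    """Largest shortest-path hop count over all node pairs (BFS from each node)."""
    def farthest(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())
    return max(farthest(s) for s in adj)

for name, build in [("complete", complete), ("star", star),
                    ("linear array", linear_array), ("ring", ring)]:
    print(name, diameter(build(8)))
```

For p = 8 this gives 1 for the complete network, 2 for the star, 7 for the linear array, and 4 for the ring.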
1. PRAM
2. Interconnection networks
❖ Static Network
▪ Topology
▪ Evaluation of networks
❖ Dynamic Network
▪ Topology
▪ Evaluation of networks
❖ Cache Coherence in Multiprocessor Systems

A- Static Interconnection Networks
1. Complete network (clique)
2. Star network
3. Linear array
4. Ring
5. Tree
6. 2D & 3D mesh/torus
7. Hypercube
8. Fat tree

A- Static Interconnection Networks

1- Completely Connected
❖ Each processor is connected to every other processor.

❖ While the performance scales very well, the hardware complexity is not realizable for large values of p.

❖ Completely connected networks are the static counterparts of crossbars.

A- Static Interconnection Networks

2- Star
❖ Every node is connected only to a common node at the center.

❖ The distance between any pair of nodes is O(1).

❖ The central node becomes a bottleneck.

❖ In this sense, star-connected networks are the static counterparts of buses.

A- Static Interconnection Networks

3- Linear array
❖ Each node has two neighbors, one to its left and one to its right (the two end nodes have only one neighbor each).

4- Ring (1D)
❖ A linear array whose two end nodes are also connected to each other.

A- Static Interconnection Networks

5- 2D & 3D mesh
❖ In a 2D mesh, each interior node has 4 neighbors: to the north, south, east, and west.

❖ Good match for discrete simulation and matrix operations.

❖ Easy to manufacture and extend.

❖ Examples: Cray T3D (3D torus), Intel Paragon (2D mesh).

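The difference between a 2D mesh and a 2D torus (a mesh with wraparound links) shows up at the boundary nodes. The following Python sketch lists a node's neighbors in both cases; `mesh_neighbors` and its `(row, col)` coordinates are hypothetical names for illustration, and the grid is assumed to have at least 3 rows and 3 columns so that wraparound neighbors are distinct.

```python
def mesh_neighbors(row, col, rows, cols, wraparound=False):
    """Neighbors of (row, col) in a rows x cols 2D mesh, or torus if wraparound."""
    candidates = [(row - 1, col),   # north
                  (row + 1, col),   # south
                  (row, col + 1),   # east
                  (row, col - 1)]   # west
    neighbors = []
    for r, c in candidates:
        if wraparound:
            neighbors.append((r % rows, c % cols))   # wrap around the edges
        elif 0 <= r < rows and 0 <= c < cols:
            neighbors.append((r, c))                 # drop off-grid positions
    return neighbors

print(mesh_neighbors(1, 1, 4, 4))                   # interior node: 4 neighbors
print(mesh_neighbors(0, 0, 4, 4))                   # corner of a mesh: 2 neighbors
print(mesh_neighbors(0, 0, 4, 4, wraparound=True))  # corner of a torus: 4 neighbors
```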
A- Static Interconnection Networks
6- Hypercubes
❖ A hypercube is a special case of a d-dimensional mesh with two nodes along each dimension. Here, d = log2 p, where p is the total number of nodes.

❖ Each node has log2 p neighbors.

❖ The distance between two nodes is given by the number of bit positions at which the binary labels of the two nodes differ.

❖ Costly and difficult to manufacture for high dimensions (large d); not so popular nowadays.

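Because hypercube nodes are labeled with d-bit binary numbers, both the neighbor set and the distance reduce to bit operations. A minimal Python sketch, with `hypercube_neighbors` and `hypercube_distance` as hypothetical helper names:

```python
def hypercube_neighbors(node, d):
    """Labels that differ from `node` in exactly one of the d bit positions."""
    return [node ^ (1 << k) for k in range(d)]

def hypercube_distance(a, b):
    """Number of bit positions where the two labels differ (Hamming distance)."""
    return bin(a ^ b).count("1")

d = 3  # 3-dimensional hypercube, p = 2**3 = 8 nodes
print(hypercube_neighbors(0b000, d))      # [1, 2, 4]
print(hypercube_distance(0b000, 0b111))   # 3
print(hypercube_distance(0b101, 0b100))   # 1
```

The largest possible distance is d = log2 p, reached between a label and its bitwise complement.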
A- Static Interconnection Networks
7- Tree
❖ The distance between any two nodes is no more than 2 log p.

❖ Trees can be laid out in 2D with no wire crossings. This is an attractive property of trees.

❖ Links higher up the tree potentially carry more traffic than those at the lower levels.

❖ Thus, a tree suffers from a communication bottleneck at its higher levels (especially when nodes in the right part of the tree send to nodes in the left part). [problem]

❖ Solution: a variant called a fat tree fattens the links as we go up the tree, i.e., it increases the number of communication links and switching nodes closer to the root.

A- Static Interconnection Networks
8- Fat Tree Network
❖ In the previous tree networks, there is only one path between any pair of nodes.

❖ To send a message, the source node sends the message up the tree until it reaches the node at the root of the smallest subtree containing both the source and the destination; the message is then routed down the tree to the destination.

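The up-then-down routing rule can be sketched for a complete binary tree. The Python example below assumes heap-style node numbering (root = 1, children of node i are 2i and 2i + 1); that numbering and the name `tree_route` are assumptions made for this illustration, not something the slides specify.

```python
def tree_route(src, dst):
    """Path from src to dst in a complete binary tree with heap numbering.

    The message climbs from src until it reaches the root of the smallest
    subtree containing both nodes, then descends to dst.
    """
    up, down = [src], [dst]
    a, b = src, dst
    while a != b:
        if a > b:            # a is at the same level or deeper: move a up
            a //= 2
            up.append(a)
        else:                # otherwise move b up
            b //= 2
            down.append(b)
    # `down` ends with the common ancestor, which `up` already contains.
    return up + down[-2::-1]

print(tree_route(4, 7))  # [4, 2, 1, 3, 7]: left subtree to right subtree via the root
print(tree_route(4, 5))  # [4, 2, 5]: siblings meet at their parent
```

Every path between the left and right halves passes through the root, which illustrates the bottleneck that the fat tree relieves by adding links near the top.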
Static Interconnection Networks
1. Complete network (clique)
2. Star network
3. Linear array
4. Ring
5. Tree
6. 2D & 3D mesh/torus
7. Hypercube
8. Fat tree

None of them is very practical on its own, for reasons such as cost.

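A- Static Interconnection Networks
Calculate it (the evaluation metrics defined earlier: diameter, bisection width, connectivity, and cost, for each static network).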

B- Dynamic Interconnection Networks

1. Bus based network

2. Crossbar network

3. Multistage networks

B- Dynamic Interconnection Network
1- Bus-based networks
❖ Some of the simplest and earliest parallel machines used buses.

❖ All processors access a common bus for exchanging data.

❖ The distance between any two nodes is O(1) in a bus.

❖ However, the bandwidth of the shared bus is a major bottleneck (buses are scalable in terms of cost but not scalable in terms of performance).

❖ Ex: Sun Enterprise servers and Intel Pentium-based shared-bus systems.

Figure: bus-based interconnects (a) with no local caches; (b) with local memory/caches.

B- Dynamic Interconnection Network

2- Crossbar networks
❖ A simple way to connect p processors to b memory banks.
❖ The total number of switching nodes required to implement such a network is O(pb).
❖ As the number of processing nodes becomes large, this switch complexity is difficult to realize at high data rates.
❖ Consequently, crossbar networks are not very scalable in terms of cost.

B- Dynamic Interconnection Network
3- Multistage networks
❖ Crossbars have excellent performance scalability but poor cost scalability.

❖ Buses have excellent cost scalability, but poor performance scalability.

❖ Multistage interconnects strike a compromise between these extremes.

❖ They are more scalable than the bus in terms of performance and more scalable than the crossbar in terms of cost.

B- Dynamic Interconnection Networks

❖ One of the most commonly used multistage interconnects is the Omega network.

❖ This network consists of log2 p stages, where p is the number of inputs (processors) and also the number of outputs (memory banks).

❖ Each stage consists of an interconnection pattern that connects p inputs to p outputs.

❖ At each stage, input i is connected to output j if the following is true (the perfect shuffle):
   j = 2i            for 0 <= i <= p/2 - 1
   j = 2i + 1 - p    for p/2 <= i <= p - 1

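The stage interconnection pattern of an Omega network is the perfect shuffle, which maps input i to output 2i when i < p/2 and to 2i + 1 - p otherwise. The Python sketch below implements that rule and checks that it matches a one-bit left rotation of the binary label. The function names are hypothetical, and p is assumed to be a power of two.

```python
def perfect_shuffle(i, p):
    """Output j that input i connects to in one perfect-shuffle stage."""
    if i < p // 2:
        return 2 * i
    return 2 * i + 1 - p

def rotate_left(i, p):
    """Left-rotate the log2(p)-bit label of i by one position."""
    n_bits = p.bit_length() - 1          # log2(p) for a power-of-two p
    return ((i << 1) | (i >> (n_bits - 1))) & (p - 1)

p = 8
print([perfect_shuffle(i, p) for i in range(p)])  # [0, 2, 4, 6, 1, 3, 5, 7]
print(all(perfect_shuffle(i, p) == rotate_left(i, p) for i in range(p)))
```

Viewing the shuffle as a bit rotation is what makes bit-by-bit routing through the stages possible.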
B- Dynamic Interconnection Network

→ Number of stages = log2 p

→ Number of switches at each stage = p/2
→ Total number of switches = (p/2) × log2 p

Figure: perfect shuffle interconnection for eight inputs and outputs.
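To see the cost compromise in numbers, the sketch below compares the switch count of a p × b crossbar with that of an Omega network for the same p. The function names and the sample size p = b = 1024 are assumptions for illustration; p is assumed to be a power of two.

```python
def crossbar_switches(p, b):
    """Switching nodes in a crossbar connecting p processors to b memory banks."""
    return p * b

def omega_switches(p):
    """(p/2) switches per stage times log2(p) stages."""
    stages = p.bit_length() - 1   # log2(p) for a power-of-two p
    return (p // 2) * stages

p = 1024  # hypothetical number of processors and memory banks
print(crossbar_switches(p, p))  # 1048576
print(omega_switches(p))        # 5120
```

The Omega network needs roughly 200 times fewer switches here, at the price of possible blocking when two messages need the same link.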

B- Dynamic Interconnection Network

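❖ Each switching node has two inputs and two outputs and operates in one of two modes:
   ◦ In one mode, the inputs are sent straight through to the outputs: the "pass-through connection".
   ◦ In the other mode, the inputs to the switching node are crossed over and then sent out: the "cross-over connection".

❖ A simple routing algorithm compares the bit-level representations of the source and destination addresses: at each stage, starting from the most significant bit, the switch is set to pass-through if the corresponding bits are the same and to cross-over if they differ.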
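The bit-comparison routing rule can be simulated end to end. The Python sketch below assumes an Omega network with p a power of two, a perfect shuffle (one-bit left rotation of the position label) in front of every stage, and 2 × 2 switches; `omega_route` and `rotate_left` are hypothetical names. At stage k the k-th most significant bits of source and destination are compared: equal bits give pass-through, differing bits give cross-over.

```python
def rotate_left(x, n_bits):
    """Perfect shuffle on an n_bits-wide label: rotate left by one bit."""
    mask = (1 << n_bits) - 1
    return ((x << 1) & mask) | (x >> (n_bits - 1))

def omega_route(src, dst, p):
    """Return (final_position, switch_settings) for a message src -> dst."""
    n_bits = p.bit_length() - 1              # number of stages, log2(p)
    pos = src
    settings = []
    for stage in range(n_bits):
        pos = rotate_left(pos, n_bits)       # shuffle wiring into this stage
        bit = n_bits - 1 - stage             # compare from the MSB downward
        if (src >> bit) & 1 == (dst >> bit) & 1:
            settings.append("pass-through")  # leave on the same switch port
        else:
            settings.append("cross-over")    # leave on the other switch port
            pos ^= 1
    return pos, settings

final, settings = omega_route(src=0b010, dst=0b111, p=8)
print(final, settings)  # 7 ['cross-over', 'pass-through', 'cross-over']
```

After log2 p stages the message position equals the destination label for every source and destination pair, which the simulation confirms for p = 8.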

B- Dynamic Interconnection Network

Evaluating Dynamic Interconnection Networks
