
05 - Lecture #5 - 6

The document discusses different types of static interconnection networks used in parallel computing platforms, including complete networks, star networks, linear arrays, rings, trees, and 2D and 3D meshes. It compares their properties such as diameter, bisection width, and cost, and explains why none are very practical on their own due to high cost or other limitations.

Uploaded by

Fatma mansour
Copyright
© © All Rights Reserved

High Performance Computing
LECTURE #5 & 6

1
Agenda
o Physical Organization of Parallel Platforms
o PRAM
o Interconnection Networks

2
Parallel Computing Platform


3
1- PRAM
o The RAM (random access machine) contains a single processor that operates under the control of a sequential algorithm. One instruction is issued at a time.

o The processor can load/store data from/to memory and can perform basic arithmetic and logical operations.

o The PRAM is the natural extension of this serial model (the random access machine, RAM) to parallel computation.

4
Parallel Random Access Machine (PRAM)
o Group of p processors
o Processors share a common clock but may execute different
instructions in each cycle.
o The PRAM was designed so that the user could design and analyze
parallel algorithms without concern for communication (either
between processors and memory or within sets of processors).

5
❖ During the read phase:
o All p processors have the opportunity to simultaneously read a piece of data from a memory location.
o Each processor places the data item into one of its registers.

❖ During the write phase:
o Every processor can (simultaneously) write an item from one of its registers to the global memory.

6
Memory Access (Resolving Data Access Conflicts)
❖ If two processors are trying to read from the same memory location, should only one succeed? If so, which?

❖ If two processors are trying to write to the same memory location, which one succeeds? Is a processor notified if it doesn't succeed?

Depending on how simultaneous accesses are handled, there are four PRAM variants:
1. Exclusive-read, exclusive-write (EREW) PRAM.
2. Concurrent-read, exclusive-write (CREW) PRAM.
3. Exclusive-read, concurrent-write (ERCW) PRAM.
4. Concurrent-read, concurrent-write (CRCW) PRAM.

7
Exclusive read (ER): only one processor is allowed to read from a given memory location during a clock cycle.

Concurrent read (CR): multiple processors are allowed to read from the same memory location during a clock cycle.

Exclusive write (EW): only one processor is allowed to write to a given memory location during a clock cycle.

Concurrent write (CW): multiple processors are allowed to write to the same memory location simultaneously.

8
Memory Access (Resolving Data Access Conflicts)
A variety of policies have been used to resolve such write conflicts. Some of the popular ones:

Priority CW: processors are organized into a predefined priority list. The processor with the highest priority succeeds and the rest fail.

Common CW: a concurrent write to a given memory location is allowed only if all the processors attempting it write the identical value.

Arbitrary CW: if multiple processors try to write simultaneously to a given memory location, then one of them, chosen arbitrarily, succeeds.

Sum: the sum (or the result of any other associative operator) of all the data items is written.
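
The four concurrent-write policies can be compared side by side in a few lines of Python. This is only an illustrative sketch: the function name `resolve_concurrent_write`, the `(processor_id, value)` request format, and the convention that a lower processor id means a higher priority are all assumptions made for the example, not part of the lecture.

```python
from functools import reduce
import operator
import random


def resolve_concurrent_write(requests, policy, rng=None):
    """Return the value stored after one concurrent write to a single cell.

    requests: list of (processor_id, value) pairs aimed at the same location.
    Assumption of this sketch: a lower processor_id means a higher priority.
    """
    if not requests:
        raise ValueError("no write requests")
    values = [value for _, value in requests]
    if policy == "priority":
        # The processor with the highest priority wins; the others fail.
        return min(requests, key=lambda req: req[0])[1]
    if policy == "common":
        # The write is legal only if every processor writes the same value.
        if any(v != values[0] for v in values):
            raise ValueError("common CW requires identical values")
        return values[0]
    if policy == "arbitrary":
        # Any one of the competing writers may succeed.
        return (rng or random).choice(values)
    if policy == "sum":
        # Combine all items with an associative operator (here: addition).
        return reduce(operator.add, values)
    raise ValueError(f"unknown policy: {policy}")


requests = [(3, 10), (1, 7), (2, 5)]
print(resolve_concurrent_write(requests, "priority"))  # 7
print(resolve_concurrent_write(requests, "sum"))       # 22
```

With the same three requests, the "common" policy raises an error because the values differ, and the "arbitrary" policy may return 10, 7, or 5.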

9
PRAM Model… Is it Practical??

❖ Consider an implementation of an EREW PRAM with p processors and a global memory of m words:
◦ Processors are connected to memory through a set of switches.
◦ These switches determine the memory word being accessed by each processor.
◦ To ensure such connectivity, the total number of switches must be on the order of m × p.
◦ For a reasonable memory size, constructing a switching network of this complexity is very expensive. For example, p = 1024 processors and m = 2^30 words would need about 2^40 (roughly 10^12) switches.

Thus the PRAM model is impractical to realize on real hardware.


10
Interconnection Networks
❖ Provide another mechanism for data transfer between processors
and memory modules.

11
❖ Interconnection networks can be classified as static or dynamic.

❖ Static networks:
✓ Consist of point-to-point communication links among processing nodes.
✓ Are also referred to as direct networks.

❖ Dynamic networks:
✓ Are built using switches and communication links.
✓ Communication links are connected to one another dynamically by the switches to establish paths among processing nodes and memory banks.
✓ Are also referred to as indirect networks.


12

13
14
Network Topologies

❖ A network topology describes who is connected to whom.

❖ A variety of network topologies have been proposed and implemented.

❖ These topologies trade off performance for cost.

❖ Commercial machines often implement hybrids of multiple topologies for reasons of packaging, cost, and available components.

15
A- Static Interconnection Networks
Evaluating Static Interconnection Networks
❖ Diameter: the maximum distance between any two processing nodes in the network, where the distance is the smallest number of hops (links) a message must traverse on its way from one node to the other.

❖ Bisection width: the minimum number of wires (links) you must cut to divide the network into two equal halves.

❖ Connectivity: the multiplicity of paths between any two processing nodes.

❖ Cost: the number of links or switches, together with factors such as the length of the wires, determines the cost.
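
The diameter definition can be checked mechanically on small networks. The sketch below builds a linear array and a ring as adjacency lists and measures the diameter with a breadth-first search from every node. The helper names (`ring`, `linear_array`, `hops_from`, `diameter`) are hypothetical and chosen only for this example.

```python
from collections import deque


def ring(p):
    """Adjacency list of a p-node ring (node i is linked to i-1 and i+1 mod p)."""
    return {i: {(i - 1) % p, (i + 1) % p} for i in range(p)}


def linear_array(p):
    """Adjacency list of a p-node linear array (no wraparound link)."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < p} for i in range(p)}


def hops_from(graph, source):
    """Breadth-first search: minimum hop count from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def diameter(graph):
    """Largest shortest-path distance over all pairs of nodes."""
    return max(max(hops_from(graph, s).values()) for s in graph)


print(diameter(linear_array(8)))  # 7
print(diameter(ring(8)))          # 4
```

Closing the linear array into a ring roughly halves the diameter, because a message can travel in whichever direction is shorter.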

16
1. PRAM
2.Interconnection networks
❖ Static Network
▪ Topology
▪ Evaluation of networks
❖ Dynamic Network
▪ Topology
▪ Evaluation of networks
❖Cache Coherence in Multiprocessor Systems

17
A- Static Interconnection Networks
1. Complete network (clique)
2. Star network
3. Linear array
4. Ring
5. Tree
6. 2D & 3D mesh/torus
7. Hypercube
8. Fat tree

18
A- Static Interconnection Networks

1- Completely Connected
❖ Each processor is connected to every other processor.

❖ While the performance scales very well, the hardware complexity is not realizable for large values of p, because the number of links grows as p(p − 1)/2.

❖ Completely connected networks are the static counterparts of crossbars.

19
A- Static Interconnection Networks

2- Star
❖ Every node is connected only to a common node at the center.

❖ The distance between any pair of nodes is O(1): at most two hops, through the center.

❖ The central node becomes a bottleneck.

❖ In this sense, star-connected networks are the static counterparts of buses.

20
A- Static Interconnection Networks

3- Linear array
❖ Each interior node has two neighbors, one to its left and one to its right; the two end nodes have only one neighbor.

4- Ring (1D)
❖ A linear array in which the nodes at either end are also connected to each other.

21
A- Static Interconnection Networks

5- 2D & 3D mesh
❖ In a 2D mesh, each interior node has 4 neighbors: to the north, south, east, and west. A 3D mesh adds neighbors above and below, for 6 in total.

❖ Good match for discrete simulation and matrix operations.

❖ Easy to manufacture and extend.

❖ Examples: Cray T3D (3D torus), Intel Paragon (2D mesh).
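
As a small illustration of the neighbor structure, the sketch below lists the neighbors of a node in a 2D mesh and, with wraparound links enabled, in a 2D torus. The function name `mesh_neighbours` and the (row, column) addressing are assumptions of this example, and the torus case assumes at least 3 rows and 3 columns so that the four neighbors are distinct.

```python
def mesh_neighbours(row, col, rows, cols, wraparound=False):
    """Neighbours of node (row, col) in a rows x cols 2D mesh.

    With wraparound=True the edges wrap around, which gives a 2D torus.
    """
    result = []
    # Offsets in the order north, south, east, west.
    for d_row, d_col in ((-1, 0), (1, 0), (0, 1), (0, -1)):
        r, c = row + d_row, col + d_col
        if wraparound:
            result.append((r % rows, c % cols))
        elif 0 <= r < rows and 0 <= c < cols:
            result.append((r, c))
    return result


print(mesh_neighbours(1, 1, 4, 4))                   # [(0, 1), (2, 1), (1, 2), (1, 0)]
print(mesh_neighbours(0, 0, 4, 4))                   # [(1, 0), (0, 1)]
print(mesh_neighbours(0, 0, 4, 4, wraparound=True))  # [(3, 0), (1, 0), (0, 1), (0, 3)]
```

A corner node of the plain mesh has only two neighbors, while in the torus every node has four.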

22
A- Static Interconnection Networks
6- Hypercube
❖ A hypercube is a special case of a d-dimensional mesh with two nodes along each dimension. Here, d = log2 p, where p is the total number of nodes.

❖ Each node has log2 p neighbors.

❖ The distance between two nodes is given by the number of bit positions at which the binary labels of the two nodes differ.

❖ Costly and difficult to manufacture for high d, so not so popular nowadays.
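
Both hypercube properties follow directly from the binary node labels, as the following sketch shows. The function names are hypothetical; nodes are assumed to be labeled 0 to p − 1.

```python
def hypercube_neighbours(node, d):
    """Labels reachable in one hop: flip each of the d address bits in turn."""
    return [node ^ (1 << bit) for bit in range(d)]


def hypercube_distance(a, b):
    """Number of bit positions in which the labels a and b differ."""
    return bin(a ^ b).count("1")


print(hypercube_neighbours(0b000, 3))    # [1, 2, 4]
print(hypercube_distance(0b000, 0b111))  # 3
```

In a 3-dimensional hypercube (p = 8), node 000 and node 111 are the farthest apart, which matches the diameter log2 p = 3.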
23
24
A- Static Interconnection Networks

7- Tree
❖ The distance between any two nodes is no more than 2 log p.
❖ Trees can be laid out in 2D with no wire crossings. This is an attractive property of trees.
❖ Links higher up the tree potentially carry more traffic than those at the lower levels.
❖ Problem: the tree therefore suffers from a communication bottleneck at its higher levels, especially when nodes in the right part of the tree send to nodes in the left part.
❖ Solution: a variant called a fat tree fattens the links as we go up the tree, that is, it increases the number of communication links and switching nodes closer to the root.

25
A- Static Interconnection Networks

8- Fat tree network
❖ In the previous (plain) tree network there is only one path between any pair of nodes.

❖ To send a message, the source node sends the message up the tree until it reaches the node at the root of the smallest subtree containing both the source and the destination; the message is then routed down the tree to the destination.
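
The up-then-down routing rule can be sketched for a binary tree. The heap-style numbering used here (root is node 1, parent of node n is n // 2) and the name `tree_route` are assumptions made for the example; the lecture does not fix a numbering scheme.

```python
def tree_route(source, destination):
    """Path between two nodes of a binary tree numbered heap-style.

    Assumption of this sketch: the root is node 1 and the parent of
    node n is n // 2, so a larger number is never an ancestor of a
    smaller one.
    """
    up, down = [source], [destination]
    a, b = source, destination
    while a != b:
        if a > b:
            a //= 2          # climb from the source side
            up.append(a)
        else:
            b //= 2          # climb from the destination side
            down.append(b)
    # a is now the root of the smallest subtree holding both nodes.
    # Append the destination-side climb in reverse, skipping that root.
    return up + down[-2::-1]


print(tree_route(4, 5))  # [4, 2, 5]
print(tree_route(4, 7))  # [4, 2, 1, 3, 7]
```

The second route crosses the root (node 1), which is exactly the traffic that makes the upper links a bottleneck in a plain tree.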

26
Static Interconnection Networks
1. Complete network (clique)
2. Star network
3. Linear array
4. Ring
5. Tree
6. 2D & 3D mesh/torus
7. Hypercube
8. Fat tree

None of them is very practical on its own, for reasons such as cost.

27

28

29
A- Static Interconnection Networks
Calculate it
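
The slide's own figures are not reproduced in this copy. As a rough aid for the exercise, the sketch below evaluates the usual textbook closed forms for diameter, bisection width, and number of links. The function name, the choice of topologies, and the restriction on p are assumptions of this example and are not taken from the slide.

```python
import math


def static_network_metrics(p):
    """Return {topology: (diameter, bisection width, number of links)}.

    Assumption of this sketch: p is at least 16 and is both a power of
    two and a perfect square (for example 16 or 64), so that every
    formula below gives an integer.
    """
    side = math.isqrt(p)      # nodes along one edge of the 2D mesh
    d = p.bit_length() - 1    # log2(p), the hypercube dimension
    if p < 16 or side * side != p or 2 ** d != p:
        raise ValueError("p must be >= 16, a power of two and a perfect square")
    return {
        "complete":     (1, p * p // 4, p * (p - 1) // 2),
        "star":         (2, 1, p - 1),
        "linear array": (p - 1, 1, p - 1),
        "ring":         (p // 2, 2, p),
        "2D mesh":      (2 * (side - 1), side, 2 * (p - side)),
        "2D torus":     (2 * (side // 2), 2 * side, 2 * p),
        "hypercube":    (d, p // 2, p * d // 2),
    }


for name, metrics in static_network_metrics(16).items():
    print(name, metrics)
```

For p = 16 the hypercube and the 2D torus both come out at diameter 4, bisection width 8, and 32 links, while the complete network needs 120 links for a diameter of 1.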

30

31
B- Dynamic Interconnection Networks

1. Bus based network

2. Crossbar network

3. Multistage networks

32
B- Dynamic Interconnection Network
1- Bus-based networks
❖ Some of the simplest and earliest parallel machines used buses.

❖ All processors access a common bus for exchanging data.

❖ The distance between any two nodes is O(1) in a bus.

❖ However, the bandwidth of the shared bus is a major bottleneck (a bus is scalable in terms of cost but not scalable in terms of performance).

❖ Examples: Sun Enterprise servers and Intel Pentium based shared-bus multiprocessors.

[Figure: bus-based interconnects (a) with no local caches; (b) with local memory/caches.]
33
B- Dynamic Interconnection Network

2- Crossbar networks
❖ A simple way to connect p processors to b memory banks.
❖ The total number of switching nodes required to implement such a network is O(pb).
❖ As the number of processing nodes becomes large, this switch complexity is difficult to realize at high data rates.
❖ Consequently, crossbar networks are not very scalable in terms of cost.

34
B- Dynamic Interconnection Network
3- Multistage networks
❖ Crossbars have excellent performance scalability but poor cost scalability.

❖ Buses have excellent cost scalability, but poor performance scalability.

❖ Multistage interconnects strike a compromise between these extremes.

❖ They are more scalable than the bus in terms of performance and more scalable
than the crossbar in terms of cost.

36
B- Dynamic Interconnection Networks

❖ One of the most commonly used multistage interconnects is the Omega network.

❖ This network consists of log2 p stages, where p is the number of inputs (processors) and also the number of outputs (memory banks).

❖ Every stage consists of an interconnection pattern that connects p inputs to p outputs.

❖ At each stage, input i is connected to output j if the following is true (the perfect shuffle):
   j = 2i           for 0 ≤ i ≤ p/2 − 1
   j = 2i + 1 − p   for p/2 ≤ i ≤ p − 1

37
B- Dynamic Interconnection Network

→ Number of stages = log2 p
→ Number of switches at each stage = p/2
→ Total number of switches = (p/2) * (log2 p)

[Figure: Perfect shuffle interconnection for eight inputs and outputs]

38
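
The wiring pattern and the switch count can be tabulated for the eight-input case. The sketch assumes the standard definition of the perfect shuffle (a one-bit left rotation of the binary input index) and that p is a power of two; the function names are hypothetical.

```python
def perfect_shuffle(i, p):
    """Output j that input i is wired to in one stage (p is a power of two)."""
    return 2 * i if i < p // 2 else 2 * i + 1 - p


def omega_switch_count(p):
    """Stages times switches per stage: log2(p) * (p / 2)."""
    stages = p.bit_length() - 1  # log2(p) for a power of two
    return stages * (p // 2)


print([perfect_shuffle(i, 8) for i in range(8)])  # [0, 2, 4, 6, 1, 3, 5, 7]
print(omega_switch_count(8))                      # 12
print(8 * 8)                                      # 64 switching nodes for an 8 x 8 crossbar
```

For p = 8 the Omega network needs 12 two-by-two switches, compared with 64 switching nodes for a crossbar connecting 8 processors to 8 memory banks.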


B- Dynamic Interconnection Network

❖ A simple routing algorithm is used: it compares the bit-level representations of the source and destination addresses, one bit per stage.

❖ Each switching node has two connection modes. In one mode, the inputs are sent straight through to the outputs ("pass-through connection").

❖ In the other mode, the inputs to the switching node are crossed over and then sent out ("cross-over connection").
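
The bit-comparison routing can be simulated for a single message. This is a sketch under stated assumptions: each stage is taken to be a perfect shuffle followed by a 2×2 switch, the most significant address bit is examined first, and a switch passes the message straight through when the current bit already matches the destination bit and crosses it over otherwise. The names `rotate_left` and `omega_route` are hypothetical.

```python
def rotate_left(i, n):
    """Perfect shuffle on an n-bit index: rotate the bits left by one."""
    return ((i << 1) | (i >> (n - 1))) & ((1 << n) - 1)


def omega_route(src, dst, n):
    """Follow one message through an Omega network with 2**n inputs.

    Returns the output reached and the switch setting used at each stage.
    """
    pos = src
    settings = []
    for stage in range(n):
        pos = rotate_left(pos, n)              # wiring between stages
        want = (dst >> (n - 1 - stage)) & 1    # destination bit for this stage
        if (pos & 1) == want:
            settings.append("pass-through")
        else:
            settings.append("cross-over")
            pos ^= 1                           # leave on the other switch output
    return pos, settings


print(omega_route(0b010, 0b111, 3))
# (7, ['cross-over', 'pass-through', 'cross-over'])
```

Source 010 and destination 111 differ in the first and last bits, so the message is crossed over at the first and third stages and passed straight through at the second.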

39

40
B- Dynamic Interconnection Network

Evaluating Dynamic Interconnection Networks

41

42
