Communication Costs
in Parallel Machines
• Along with idling and contention, communication is a
major overhead in parallel programs.
• The cost of communication depends on a variety of
features, including the programming model semantics,
the network topology, data handling and routing, and
associated software protocols.
Message Passing Costs in
Parallel Computers
• The total time to transfer a message over a network
comprises the following:
– Startup time (ts): Time spent at sending and receiving nodes
(executing the routing algorithm, programming routers, etc.).
– Per-hop time (th): This time is a function of the number of hops
and includes factors such as switch latencies, network delays, etc.
– Per-word transfer time (tw): This time includes all overheads
determined by the length of the message, such as link bandwidth,
error checking and correction, etc.
Store-and-Forward Routing
• A message traversing multiple hops is completely
received at an intermediate hop before being
forwarded to the next hop.
• The total communication cost for a message of size m
words to traverse l communication links is
$t_{comm} = t_s + (m t_w + t_h) l$
• In most platforms, th is small and the above expression
can be approximated by
$t_{comm} = t_s + m t_w l$
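As a concrete illustration, a minimal sketch of the store-and-forward model in Python (the parameter values are hypothetical, chosen only for illustration):

```python
def store_and_forward_time(t_s, t_h, t_w, m, l):
    """Store-and-forward cost: the full m-word message is received
    and retransmitted at each of the l links."""
    return t_s + (m * t_w + t_h) * l

def store_and_forward_approx(t_s, t_w, m, l):
    """Approximation for small per-hop time t_h."""
    return t_s + m * t_w * l

# Hypothetical parameters (e.g., microseconds and microseconds per word):
t_s, t_h, t_w = 100.0, 1.0, 0.1
print(store_and_forward_time(t_s, t_h, t_w, m=1000, l=10))  # 1110.0
print(store_and_forward_approx(t_s, t_w, m=1000, l=10))     # 1100.0
```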
Routing Techniques
Passing a message from node P0 to P3 (a) through a store-and-forward communication network; (b) and (c) extending the concept to cut-through routing. The shaded regions represent the time that the message is in transit. The startup time associated with this message transfer is assumed to be zero.
Packet Routing
• Store-and-forward makes poor use of communication
resources.
• Packet routing breaks messages into packets and
pipelines them through the network.
• Since packets may take different paths, each packet
must carry routing information, error checking,
sequencing, and other related header information.
• The total communication time for packet routing is
approximated by
$t_{comm} = t_s + t_h l + t_w m$
• The factor tw here accounts for overheads in packet headers.
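• For example, with hypothetical parameters ts = 100, th = 1, tw = 0.1 and a message of m = 1000 words crossing l = 10 links, store-and-forward takes 100 + (1000 × 0.1 + 1) × 10 = 1110 time units, whereas packet routing takes roughly 100 + 1 × 10 + 0.1 × 1000 = 210; the gain comes from pipelining packets over the links.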
Cut-Through Routing
• Takes the concept of packet routing to an extreme by
further dividing messages into basic units called flits.
• Since flits are typically small, the header information
must be minimized.
• This is done by forcing all flits to take the same path, in
sequence.
• A tracer message first programs all intermediate routers.
All flits then take the same route.
• Error checks are performed on the entire message, as
opposed to flits.
• No sequence numbers are needed.
Cut-Through Routing
• The total communication time for cut-through routing
is approximated by
$t_{comm} = t_s + l t_h + t_w m$
• This is identical to packet routing; however, tw is
typically much smaller.
Simplified Cost Model for Communicating
Messages
• The cost of communicating a message between two
nodes l hops away using cut-through routing is given
by
$t_{comm} = t_s + l t_h + t_w m$
• In this expression, th is typically smaller than ts and tw.
For this reason, the second term on the right-hand side
contributes little, particularly when m is large.
• Furthermore, it is often not possible to control routing
and placement of tasks.
• For these reasons, we can approximate the cost of
message transfer by
$t_{comm} = t_s + t_w m$
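A minimal sketch contrasting the full and simplified models in Python (hypothetical parameter values; note how little the per-hop term contributes for large m):

```python
def cut_through_time(t_s, t_h, t_w, m, l):
    """Full cut-through cost model: t_s + l*t_h + t_w*m."""
    return t_s + l * t_h + t_w * m

def simplified_time(t_s, t_w, m):
    """Simplified model: drops the l*t_h term, since t_h is small and
    routing/placement usually cannot be controlled anyway."""
    return t_s + t_w * m

# Hypothetical parameters; for a large message the per-hop term is ~1%:
t_s, t_h, t_w = 100.0, 1.0, 0.1
print(cut_through_time(t_s, t_h, t_w, m=10000, l=10))  # 1110.0
print(simplified_time(t_s, t_w, m=10000))              # 1100.0
```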
Simplified Cost Model for Communicating
Messages
• It is important to note that the original expression for
communication time is valid only for uncongested
networks.
• If a link carries multiple messages, the corresponding tw
term must be scaled up by the number of messages.
• Different communication patterns congest different
networks to varying extents.
• It is important to understand this and to account for it
when estimating communication time.
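• For example, if four messages must traverse the same link at the same time, the effective per-word time on that link becomes 4tw, and the cost estimate for each of those messages grows to roughly $t_s + 4 t_w m$.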
Cost Models for
Shared Address Space Machines
• While the basic messaging cost applies to these
machines as well, a number of other factors make
accurate cost modeling more difficult.
• Memory layout is typically determined by the system.
• Finite cache sizes can result in cache thrashing.
• Overheads associated with invalidate and update
operations are difficult to quantify.
• Spatial locality is difficult to model.
• Prefetching can play a role in reducing the overhead
associated with data access.
• False sharing and contention are difficult to model.
Routing Mechanisms
for Interconnection Networks
• How does one compute the route that a message takes
from source to destination?
– Routing must prevent deadlocks; for this reason, we use
dimension-ordered or E-cube routing.
– Routing must avoid hot spots; for this reason, two-step routing
is often used. In this case, a message from source s to
destination d is first sent to a randomly chosen intermediate
processor i and then forwarded to destination d.
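A minimal sketch of E-cube routing on a hypercube in Python (assuming the common lowest-dimension-first order):

```python
def e_cube_route(src, dst):
    """Dimension-ordered (E-cube) route in a hypercube: correct the
    differing address bits in a fixed (lowest-first) dimension order.
    Returns the list of nodes visited, including src and dst."""
    path = [src]
    node = src
    diff = src ^ dst               # bits still to be corrected
    dim = 0
    while diff:
        if diff & 1:               # this dimension differs: traverse it
            node ^= 1 << dim
            path.append(node)
        diff >>= 1
        dim += 1
    return path

# Route from Ps = 010 to Pd = 111 in a 3-D hypercube (as in the figure below):
print([format(n, '03b') for n in e_cube_route(0b010, 0b111)])
# ['010', '011', '111']
```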
Routing Mechanisms
for Interconnection Networks
Routing a message from node Ps (010) to node Pd (111) in a three-dimensional hypercube using E-cube routing.
Mapping Techniques for Graphs
• Often, we need to embed a known communication
pattern into a given interconnection topology.
• We may have an algorithm designed for one network,
which we are porting to another topology.
• For these reasons, it is useful to understand mappings
between graphs.
Mapping Techniques for Graphs: Metrics
• When mapping a graph G(V,E) into G’(V’,E’), the
following metrics are important:
• The maximum number of edges mapped onto any edge
in E’ is called the congestion of the mapping.
• The maximum number of links in E’ that any edge in E is
mapped onto is called the dilation of the mapping.
• The ratio of the number of nodes in the set V’ to that in
set V is called the expansion of the mapping.
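To make these metrics concrete, here is a small hypothetical Python helper that computes all three for a given embedding; the example maps a 4-node ring onto a 4-node linear array, where the wraparound edge forces congestion 2 and dilation 3:

```python
def mapping_metrics(edges_G, num_nodes_Gp, node_map, route):
    """Congestion, dilation, and expansion of an embedding of G into G'.

    edges_G      : list of edges (u, v) of G(V, E)
    num_nodes_Gp : number of nodes |V'| of G'(V', E')
    node_map     : dict mapping each node of G to a node of G'
    route        : function giving the path (list of G' nodes) used
                   between two mapped endpoints
    """
    link_load = {}
    dilation = 0
    for u, v in edges_G:
        path = route(node_map[u], node_map[v])
        dilation = max(dilation, len(path) - 1)
        for a, b in zip(path, path[1:]):       # every G' link on the path
            link = (min(a, b), max(a, b))
            link_load[link] = link_load.get(link, 0) + 1
    return max(link_load.values()), dilation, num_nodes_Gp / len(node_map)

# 4-node ring into a 4-node linear array, node i mapped to position i;
# the wraparound edge (3, 0) is routed along the entire array.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
line = lambda a, b: list(range(a, b + 1)) if a <= b else list(range(a, b - 1, -1))
print(mapping_metrics(ring, 4, {i: i for i in range(4)}, line))
# (2, 3, 1.0): congestion 2, dilation 3, expansion 1
```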
Embedding a Linear Array
into a Hypercube
• A linear array (or a ring) composed of $2^d$ nodes (labeled
0 through $2^d - 1$) can be embedded into a d-dimensional
hypercube by mapping node i of the linear array onto
node G(i, d) of the hypercube. The function G(i, x) is
defined as follows:

$G(0, 1) = 0$
$G(1, 1) = 1$
$G(i, x + 1) = \begin{cases} G(i, x), & i < 2^x \\ 2^x + G(2^{x+1} - 1 - i, x), & i \ge 2^x \end{cases}$
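A direct transcription of this definition into Python (a sketch; the assertion checks that consecutive codewords, including the wraparound pair, differ in exactly one bit):

```python
def G(i, x):
    """Binary reflected Gray code: the i-th codeword of the x-bit RGC,
    following the recursive definition above."""
    if x == 1:
        return i                      # G(0, 1) = 0, G(1, 1) = 1
    half = 1 << (x - 1)               # 2^(x-1)
    if i < half:
        return G(i, x - 1)
    return half + G(2 * half - 1 - i, x - 1)

# Consecutive codewords differ in exactly one bit, so linear-array
# (and ring) neighbors map to hypercube neighbors:
d = 3
codes = [G(i, d) for i in range(1 << d)]
print([format(c, '03b') for c in codes])
# ['000', '001', '011', '010', '110', '111', '101', '100']
assert all(bin(codes[i] ^ codes[(i + 1) % len(codes)]).count('1') == 1
           for i in range(len(codes)))
```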
Embedding a Linear Array
into a Hypercube
• The function G is called the binary reflected Gray
code (RGC).
• Since adjoining entries (G(i, d) and G(i + 1, d)) differ
from each other in only one bit position, corresponding
processors are mapped to neighbors in the hypercube.
• Therefore, the congestion, dilation, and expansion of the
mapping are all 1.
Embedding a Linear Array
into a Hypercube: Example
(a) A three-bit reflected Gray code ring; and (b) its embedding into
a three-dimensional hypercube.
Embedding a Mesh
into a Hypercube
• A $2^r \times 2^s$ wraparound mesh can be mapped to a $2^{r+s}$-node
hypercube by mapping node (i, j) of the mesh onto node
G(i, r) || G(j, s) of the hypercube (where || denotes
concatenation of the two Gray codes).
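A sketch of this mapping in Python, using the standard closed form i xor (i >> 1) for the RGC (equivalent to the recursive definition of G), with a check that mesh neighbors, including wraparound ones, land on hypercube neighbors:

```python
def gray(i):
    """Closed form of the reflected Gray code: gray(i) equals G(i, x)
    for any x with i < 2**x."""
    return i ^ (i >> 1)

def mesh_to_hypercube(i, j, r, s):
    """Map node (i, j) of a 2^r x 2^s wraparound mesh onto hypercube
    node G(i, r) || G(j, s); concatenation is a shift and an OR."""
    return (gray(i) << s) | gray(j)

# A 4 x 4 mesh (r = s = 2) in a four-dimensional hypercube:
one_bit = lambda a, b: bin(a ^ b).count('1') == 1
assert all(
    one_bit(mesh_to_hypercube(i, j, 2, 2),
            mesh_to_hypercube(i, (j + 1) % 4, 2, 2))
    and one_bit(mesh_to_hypercube(i, j, 2, 2),
                mesh_to_hypercube((i + 1) % 4, j, 2, 2))
    for i in range(4) for j in range(4))
```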
Embedding a Mesh into a Hypercube
(a) A 4 × 4 mesh illustrating the mapping of mesh nodes to the nodes
in a four-dimensional hypercube; and (b) a 2 × 4 mesh embedded into
a three-dimensional hypercube.
Once again, the congestion, dilation, and expansion of
the mapping are all 1.
Embedding a Mesh into a Linear Array
• Since a mesh has more edges than a linear array, no
mapping can keep both the congestion and the dilation at 1.
• We first examine the mapping of a linear array into a
mesh and then invert this mapping.
• This gives us an optimal mapping (in terms of
congestion).
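A sketch of the inverted mapping in Python, assuming the linear array is snaked through the mesh row by row in alternating directions (as in the figure that follows):

```python
def mesh_to_linear(i, j, cols):
    """Invert the snake-order embedding of a linear array into a mesh:
    even rows run left to right, odd rows right to left."""
    return i * cols + (j if i % 2 == 0 else cols - 1 - j)

# Positions of the nodes of a 4 x 4 mesh along the 16-node linear array:
for i in range(4):
    print([mesh_to_linear(i, j, 4) for j in range(4)])
# [0, 1, 2, 3]
# [7, 6, 5, 4]
# [8, 9, 10, 11]
# [15, 14, 13, 12]
```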
Embedding a Mesh into a Linear Array:
Example
(a) Embedding a 16 node linear array into a 2-D mesh; and (b) the
inverse of the mapping. Solid lines correspond to links in the linear
array and normal lines to links in the mesh.
Embedding a Hypercube into a 2-D Mesh
• Each $\sqrt{p}$-node subcube of the hypercube is mapped to
a $\sqrt{p}$-node row of the mesh.
• This is done by inverting the linear-array to hypercube
mapping.
• This can be shown to be an optimal mapping.
Embedding a Hypercube into a 2-D Mesh:
Example
Embedding a hypercube into a 2-D mesh.