
CHAPTER 1

Parallel and Distributed Computing


1.1 INTRODUCTION: BASIC CONCEPTS

The last two decades spawned a revolution in the world of computing: a
move away from central mainframe-based computing to network-based
computing. Today, servers are fast achieving the levels of CPU
performance, memory capacity, and I/O bandwidth once available only in
mainframes, at a cost orders of magnitude below that of a mainframe.
Servers are being used to solve computationally intensive problems in
science and engineering that once belonged exclusively to the domain of
supercomputers. A distributed computing system is the system
architecture that makes a collection of heterogeneous computers,
workstations, or servers act and behave as a single computing system. In
such a computing environment, users can uniformly access and name local
or remote resources, and run processes from anywhere in the system,
without being aware of which computers their processes are running on.
Distributed computing systems have been studied extensively by
researchers, and a great many claims and benefits have been made for
using such systems. In fact, it is hard to rule out any desirable
feature of a computing system that has not been claimed to be offered by
a distributed system [24]. However, current advances in processing and
networking technology and software tools make it feasible to achieve the
following advantages:


Increased performance. The existence of multiple computers in a
distributed system allows applications to be processed in parallel and
thus

Tools and Environments for Parallel and Distributed Computing, Edited by Salim Hariri
and Manish Parashar
ISBN 0-471-33288-7 Copyright © 2004 John Wiley & Sons, Inc.


improves application and system performance. For example, the
performance of a file system can be improved by replicating its
functions over several computers; file replication allows several
applications to access that file system in parallel. Furthermore, file
replication distributes the network traffic associated with file access
across the various sites and thus reduces network contention and queuing
delays.

Sharing of resources. Distributed systems are cost-effective and enable
efficient access to all system resources. Users can share
special-purpose and sometimes expensive hardware and software resources
such as database servers, compute servers, virtual reality servers,
multimedia information servers, and printer servers, to name just a few.

Increased extendibility. Distributed systems can be designed to be modular
and adaptive so that for certain computations, the system will
configure itself to include a large number of computers and resources, while
in other instances, it will just consist of a few resources. Furthermore,
limitations in file system capacity and computing power can be
overcome by adding more computers and file servers to the system
incrementally.

Increased reliability, availability, and fault tolerance. The existence
of multiple computing and storage resources in a system makes it
attractive and cost-effective to introduce fault tolerance into
distributed systems. The system can tolerate the failure of one computer
by allocating its tasks to another available computer. Furthermore, by
replicating system functions and/or resources, the system can tolerate
one or more component failures.

Cost-effectiveness. The performance of computers has been approximately
doubling every two years, while their cost has decreased by half every
year during the last decade [3]. Furthermore, emerging high-speed
network technologies [e.g., wave-division multiplexing, asynchronous
transfer mode (ATM)] will make the development of distributed systems
attractive in terms of the price/performance ratio compared to that of
parallel computers.
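The fault-tolerance idea above, reallocating the tasks of a failed computer to another available one, can be sketched roughly as follows. This is an illustrative sketch, not a real system: the worker functions, the task, and the simulated node failure are all invented for the example.

```python
# Illustrative sketch of task reallocation for fault tolerance.
# A task that fails on one worker is retried on the next available
# worker; only if every worker fails does the whole task fail.

def run_with_failover(task, workers):
    """Try the task on each worker in turn; return the first success."""
    last_error = None
    for worker in workers:
        try:
            return worker(task)
        except RuntimeError as err:   # treat a raised error as node failure
            last_error = err          # fall through to the next worker
    raise RuntimeError("all workers failed") from last_error

def failing_worker(task):
    """Stand-in for a crashed or unreachable computer."""
    raise RuntimeError("node down")

def healthy_worker(task):
    """Stand-in for a working computer; the computation is a toy."""
    return task * 2

print(run_with_failover(21, [failing_worker, healthy_worker]))  # prints 42
```

A real system would detect failures via timeouts or heartbeats rather than exceptions, but the control flow is the same: the task migrates until an available computer completes it.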

These advantages cannot be achieved easily, because designing a
general-purpose distributed computing system is several orders of
magnitude more difficult than designing centralized computing systems.
Designing a reliable general-purpose distributed system involves a large
number of options and decisions, such as the physical system
configuration, communication network and computing platform
characteristics, task scheduling and resource allocation policies and
mechanisms, consistency control, concurrency control, and security, to
name just a few. The difficulties can be attributed to many factors
related to the lack of maturity in the distributed computing field, the
asynchronous and independent behavior of the systems, and the geographic
dispersion of the system resources. These are summarized in the
following points:


There is a lack of a proper understanding of distributed computing
theory. The field is relatively new, and we need to design and
experiment with a large number of general-purpose reliable distributed
systems with different architectures before we can master the theory of
designing such computing systems. One interesting explanation for the
lack of understanding of the design process of distributed systems was
given by Mullender [2], who compared the design of a distributed system
to the design of a reliable national railway system, which took a
century and a half to be fully understood and mature. Similarly,
distributed systems (which have been around for approximately two
decades) need to evolve through several generations of different design
architectures before their designs, structures, and programming
techniques can be fully understood and mature.

The asynchronous and independent behavior of the system resources and/or
(hardware and software) components complicates the control software that
aims at making them operate as one centralized computing system. If the
computers are structured in a master–slave relationship, the control
software is easier to develop and system behavior is more predictable.
However, this structure conflicts with the distributed system property
that requires computers to operate independently and asynchronously.

The use of a communication network to interconnect the computers
introduces another level of complexity. Distributed system designers
must master not only the design of the computing systems and system
software and services, but also the design of reliable communication
networks, how to achieve synchronization and consistency, and how to
handle faults in a system composed of geographically dispersed
heterogeneous computers. The number of resources involved in a system
can vary from a few to hundreds, thousands, or even hundreds of
thousands of computing and storage resources.

Despite these difficulties, there has been limited success in designing
special-purpose distributed systems such as banking systems, online
transaction systems, and point-of-sale systems. However, the design of a
general-purpose reliable distributed system that has the advantages of
both centralized systems (accessibility, management, and coherence) and
networked systems (sharing, growth, cost, and autonomy) is still a
challenging task [27]. Kleinrock [7] makes an interesting analogy
between human-made computing systems and the brain. He points out that
the brain is organized and structured very differently from our present
computing machines. Nature has been extremely successful in implementing
distributed systems that are far more intelligent and impressive than
any computing machines humans have yet
devised. We have succeeded in manufacturing highly complex devices
capable of high-speed computation and massive accurate memory, but we
have not gained sufficient understanding of distributed systems; our
systems are still highly constrained and rigid in their construction and
behavior. The gap between natural and man-made systems is huge, and more
research is required to bridge this gap and to design better distributed
systems.

[Figure: a high-speed network interconnecting a vector supercomputer,
SM-MIMD and DM-MIMD machines, SIMD machines, workstations, and
special-purpose architectures.]

Fig. 1.1 High-performance distributed system.
In the next section we present a design framework to better understand
the architectural design issues involved in developing and implementing
high-performance distributed computing systems. A high-performance
distributed system (HPDS) (Figure 1.1) includes a wide range of
computing resources, such as workstations, PCs, minicomputers,
mainframes, supercomputers, and other special-purpose hardware units.
The underlying network interconnecting the system resources can span
LANs, MANs, and even WANs, can have different topologies (e.g., bus,
ring, full connectivity, random interconnect), and can support a wide
range of communication protocols.

1.2 PROMISES AND CHALLENGES OF PARALLEL AND DISTRIBUTED SYSTEMS

The proliferation of high-performance systems and the emergence of
high-speed networks (terabit networks) have attracted a lot of interest
in parallel and distributed computing. The driving forces toward this
end will be (1) the advances in processing technology, (2) the
availability of high-speed networks, and (3) the increasing research
efforts directed toward the development of software support and
programming environments for distributed computing.

Further, with the increasing requirements for computing power and the
diversity in the computing requirements, it is apparent that no single
computing platform will meet all these requirements. Consequently,
future computing environments need to capitalize on and effectively
utilize the existing heterogeneous computing resources. Only parallel
and distributed systems provide the potential of achieving such an
integration of resources and technologies in a feasible manner while
retaining desired usability and flexibility. Realization of this
potential, however, requires advances on a number of fronts: processing
technology, network technology, and software tools and environments.

1.2.1 Processing Technology


Distributed computing relies to a large extent on the processing power
of the individual nodes of the network. Microprocessor performance has
been growing at a rate of 35 to 70 percent during the last decade, and
this trend shows no indication of slowing down in the current decade.
The enormous power of future generations of microprocessors, however,
cannot be utilized without corresponding improvements in memory and I/O
systems. Research in main-memory technologies, high-performance disk
arrays, and high-speed I/O channels is therefore critical to efficiently
utilizing the advances in processing technology and to developing
cost-effective high-performance distributed computing.

1.2.2 Networking Technology


The performance of distributed algorithms depends to a large extent on
the bandwidth and latency of communication among the network nodes.
Achieving high bandwidth and low latency involves not only fast
hardware, but also efficient communication protocols that minimize the
software overhead. Developments in high-speed networks provide gigabit
bandwidths over local area networks as well as wide area networks at
moderate cost, thus increasing the geographical scope of
high-performance distributed systems.
The problem of providing the required communication bandwidth for
distributed computational algorithms is now relatively easy to solve,
given the mature state of fiber-optic and optoelectronic device
technologies. Achieving the necessary low latencies, however, remains a
challenge. Reducing latency requires progress on a number of fronts.
First, current communication protocols do not scale well to a high-speed
environment. To keep latencies low, it is desirable to execute the
entire protocol stack, up to the transport layer, in hardware. Second,
the communication interface of the operating system must be streamlined
to allow direct transfer of data from the network interface to the
memory space of the application program. Finally, the speed of light
(approximately 5 microseconds per kilometer) poses the ultimate limit to
latency. In general, achieving low latency requires a two-pronged
approach:

1. Latency reduction. Minimize protocol-processing overhead by using
streamlined protocols executed in hardware and by improving the network
interface of the operating system.

2. Latency hiding. Modify the computational algorithm to hide latency by
pipelining communication and computation.
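The latency-hiding idea can be illustrated with a small sketch. Everything below is invented for the example: the `fetch` delay stands in for a remote read, and the chunk layout is a toy. The point is the control flow: while the program computes on chunk i, the fetch of chunk i + 1 proceeds in the background, so communication latency overlaps useful work instead of adding to it.

```python
# Sketch of latency hiding: overlap communication (fetching the next
# data chunk) with computation on the current chunk, so network
# latency is pipelined behind useful work.
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(i):
    """Stand-in for a remote read; the sleep models network latency."""
    time.sleep(0.01)
    return list(range(i * 4, i * 4 + 4))

def compute(chunk):
    """Toy computation on one chunk."""
    return sum(x * x for x in chunk)

def pipelined(n_chunks):
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, 0)              # start first fetch
        for i in range(n_chunks):
            chunk = pending.result()                 # wait for chunk i
            if i + 1 < n_chunks:
                pending = pool.submit(fetch, i + 1)  # prefetch chunk i + 1
            total += compute(chunk)                  # overlaps the fetch
    return total

print(pipelined(4))  # prints 1240
```

With perfect overlap, n chunks cost roughly one fetch latency plus n compute steps, rather than n fetches plus n compute steps; MPI programs achieve the same effect with nonblocking sends and receives.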

These problems are now perhaps most fundamental to the success of
parallel and distributed computing, a fact that is increasingly being
recognized by the research community.

1.2.3 Software Tools and Environments


The development of parallel and distributed applications is a nontrivial
process and requires a thorough understanding of both the application
and the architecture. Although a parallel and distributed system
provides the user with enormous computing power and a great deal of
flexibility, this flexibility implies increased degrees of freedom that
have to be optimized in order to fully exploit the benefits of the
distributed system. For example, during software development, the
developer is required to select the optimal hardware configuration for
the particular application, the best decomposition of the problem on the
hardware configuration selected, the best communication and
synchronization strategy to be used, and so on. The set of reasonable
alternatives that have to be evaluated in such an environment is very
large, and selecting the best alternative among these is a nontrivial
task. Consequently, there is a need for a set of simple and portable
software development tools that can assist the developer in
appropriately distributing the application computations to make
efficient use of the underlying computing resources. Such a set of tools
should span the software life cycle and must support the developer
during each stage of application development, from the specification and
design formulation stages, through the programming, mapping,
distribution, scheduling, tuning, and debugging stages, up to the
evaluation and maintenance stages.

1.3 DISTRIBUTED SYSTEM DESIGN FRAMEWORK

The distributed system design framework (DSDF) highlights architectural
issues, services, and candidate technologies to implement the main
components of any distributed computing system. Generally speaking, the
design process of a distributed system involves three main activities:
(1) designing the communication system that enables the distributed
system resources and objects to exchange information, (2) defining the
system structure (architecture) and the system services that enable
multiple computers to act as a system rather than as a collection of
computers, and (3) defining the distributed computing programming
techniques to develop parallel and distributed applications. Based on
this notion of the design process, the distributed system design
framework can be described in terms of three layers (Figure 1.2): (1)
the network, protocol, and interface (NPI) layer, (2) the system
architecture and services (SAS) layer, and (3) the distributed computing
paradigms (DCP) layer. In what follows, we describe the main design
issues to be addressed in each layer.


Communication network, protocol, and interface layer. This layer
describes the main components of the communication system that will be
used for passing control and information among the distributed system
resources. This layer is decomposed into three sublayers: network type,
communication protocols, and network interfaces.

Distributed system architecture and services layer. This layer
represents the designer's and system manager's view of the system. The
SAS layer defines the structure and architecture and the system services
(distributed file system, concurrency control, redundancy management,
load sharing and balancing, security service, etc.) that must be
supported by the distributed system in order to provide a single-image
computing system.

Distributed computing paradigms layer. This layer represents the
programmer's (user's) perception of the distributed system. It focuses
on the programming paradigms that can be used to develop distributed
applications. Distributed computing paradigms can be broadly
characterized based on the computation and communication models.
Parallel and distributed computations can be described in terms of two
paradigms: the functional parallel and data parallel paradigms. In the
functional parallel paradigm, the computations are divided into distinct
functions, which are then assigned to different computers. In the data parallel
paradigm, all the computers run the same program (single program,
multiple data, or SPMD), but each computer operates on a different data
stream. One can also characterize parallel and distributed computing
based on the technique used for intertask communication into two main
models: the message-passing and distributed shared memory models. In
message passing, tasks communicate with each other by messages, while in
distributed shared memory, they communicate by reading and writing a
global shared address space.

[Figure: three layers. Distributed Computing Paradigms: computation
models (functional parallel, data parallel) and communication models
(message passing, shared memory). System Architecture and Services
(SAS): architecture models and system-level services. Computer Network
and Protocols: networks and communication protocols.]

Fig. 1.2 Distributed system design framework.
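The two communication models can be contrasted in a toy sketch using Python's multiprocessing module; the workload and names are illustrative, and real distributed systems would use sockets or MPI rather than local processes. One worker returns its result as a message over a queue (message passing), while the other writes into a value shared with its parent (the shared-memory model).

```python
# Sketch contrasting the two communication models with local
# processes: a Queue carries an explicit message, while a shared
# Value is read/written like a common address space.
from multiprocessing import Process, Queue, Value

def mp_worker(q, data):
    """Message passing: send the result to the parent as a message."""
    q.put(sum(data))

def sm_worker(total, data):
    """Shared memory: write the result into a shared location."""
    with total.get_lock():        # guard the concurrent update
        total.value += sum(data)

if __name__ == "__main__":
    data = [1, 2, 3, 4]

    q = Queue()
    p = Process(target=mp_worker, args=(q, data))
    p.start(); print(q.get()); p.join()      # prints 10

    total = Value("i", 0)
    p = Process(target=sm_worker, args=(total, data))
    p.start(); p.join(); print(total.value)  # prints 10
```

Both workers deliver the same result; the difference is the contract. Message passing makes communication explicit and needs no locks, whereas the shared-memory model hides communication behind ordinary reads and writes but requires synchronization, which is exactly the trade-off the two paradigms present at distributed-system scale.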
The primary objective of this book is to provide a comprehensive study
of the software tools and environments that have been used to support
parallel and distributed computing systems. We highlight the main software
tools and technologies proposed or being used to implement the
functionalities of the SAS and DCP layers.
