PARALLEL AND DISTRIBUTED COMPUTING
• Increased performance. The existence of multiple computers in a distributed system allows applications to be processed in parallel, thus improving application and system performance.
• There is a lack of a proper understanding of distributed computing theory. The field is relatively new, and we need to design and experiment with a large number of general-purpose, reliable distributed systems with different architectures before we can master the theory of designing such systems. One interesting explanation for this lack of understanding of the design process was given by Mullender [2], who compared the design of a distributed system to the design of a reliable national railway system, which took a century and a half to be fully understood and mature. Similarly, distributed systems (which have been around for approximately two decades) need to evolve through several generations of design architectures before their designs, structures, and programming techniques can be fully understood and mature.
• The asynchronous and independent behavior of the system resources and/or components (hardware and software) complicates the control software that aims to make them operate as one centralized computing system. If the computers are structured in a master–slave relationship, the control software is easier to develop and system behavior is more predictable (a minimal sketch of this structure follows this list). However, this structure conflicts with the distributed system property that requires computers to operate independently and asynchronously.
• The use of a communication network to interconnect the computers introduces another level of complexity. Distributed system designers must master not only the design of the computing systems and the system software and services, but also the design of reliable communication networks, how to achieve synchronization and consistency, and how to handle faults in a system composed of geographically dispersed heterogeneous computers. The number of resources involved in such a system can vary from a few to hundreds, thousands, or even hundreds of thousands of computing and storage resources.
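The trade-off between centralized control and independence noted above can be made concrete with a small master–worker program. The following is a minimal sketch, not taken from the text, using Python's standard multiprocessing module; the names worker, tasks, and results are illustrative:

from multiprocessing import Process, Queue

def worker(tasks, results):
    # Each slave/worker waits for the master to hand it a task.
    while True:
        item = tasks.get()
        if item is None:              # sentinel from the master: shut down
            break
        results.put(item * item)      # stand-in for the real computation

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for w in workers:
        w.start()
    for n in range(10):               # the master distributes all work items
        tasks.put(n)
    for _ in workers:                 # one shutdown sentinel per worker
        tasks.put(None)
    print(sorted(results.get() for _ in range(10)))
    for w in workers:
        w.join()

Because the master alone decides who computes what, coordination is simple and behavior is predictable; the price is that the workers no longer act independently, which is exactly the conflict described above.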
[Figure 1.1: A high-performance distributed system, in which vector machines, SM- and DM-MIMD machines, SIMD machines, supercomputers, workstations, and special-purpose architectures are interconnected by a high-speed network.]
Man-made computing systems, in contrast, remain constrained and rigid in their construction and behavior. The gap between natural and man-made systems is huge, and more research is required to bridge this gap and to design better distributed systems.
In the next section we present a design framework to better understand the architectural design issues involved in developing and implementing high-performance distributed computing systems. A high-performance distributed system (HPDS) (Figure 1.1) includes a wide range of computing resources, such as workstations, PCs, minicomputers, mainframes, supercomputers, and other special-purpose hardware units. The underlying network interconnecting the system resources can span LANs, MANs, and even WANs, can have different topologies (e.g., bus, ring, full connectivity, random interconnect), and can support a wide range of communication protocols.
Further, with the increasing requirements for computing power and the diversity of those requirements, it is apparent that no single computing platform will meet them all. Consequently, future computing environments need to capitalize on, and effectively utilize, existing heterogeneous computing resources. Only parallel and distributed systems offer the potential to integrate these resources and technologies in a feasible manner while retaining the desired usability and flexibility. Realizing this potential, however, requires advances on a number of fronts: processing technology, network technology, and software tools and environments.
These problems are now perhaps most fundamental to the success of parallel and distributed computing, a fact that is increasingly being recognized by the research community.
Based on this notion of the design process, the distributed system design framework can be described in terms of three layers (Figure 1.2): (1) the network, protocol, and interface (NPI) layer; (2) the system architecture and services (SAS) layer; and (3) the distributed computing paradigms (DCP) layer. In what follows, we describe the main design issues to be addressed in each layer.
• Communication network, protocol, and interface (NPI) layer. This layer describes the main components of the communication system used to pass control and information among the distributed system resources. It is decomposed into three sublayers: network type, communication protocols, and network interfaces.
• Distributed system architecture and services (SAS) layer. This layer represents the designer's and system manager's view of the system. The SAS layer defines the structure and architecture of the system and the services (distributed file system, concurrency control, redundancy management, load sharing and balancing, security service, etc.) that must be supported by the distributed system in order to provide a single-image computing system.
• Distributed computing paradigms (DCP) layer. This layer represents the programmer's (user's) perception of the distributed system and focuses on the programming paradigms that can be used to develop distributed applications. Distributed computing paradigms can be broadly characterized by their computation and communication models. In terms of computation, parallel and distributed computations follow two paradigms: functional parallelism and data parallelism. In the functional parallel paradigm, the computations are divided into distinct functions, which are then assigned to different computers. In the data parallel paradigm, all the computers run the same program, following the single-program multiple-data (SPMD) model, but each computer operates on a different data stream. In terms of intertask communication, parallel and distributed computing can be characterized by two main models: message passing and distributed shared memory. In message passing, tasks communicate with one another by exchanging messages, while in distributed shared memory they communicate by reading and writing a global shared address space. The sketches following this list illustrate both communication models.
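To make the data parallel (SPMD) paradigm and the message-passing model concrete, here is a minimal sketch, not from the text, using Python's standard multiprocessing module as a stand-in for a real message-passing library such as MPI; the names spmd_main, rank, and mailbox are illustrative:

from multiprocessing import Process, Queue

DATA = list(range(100))               # the global data set to be divided

def spmd_main(rank, size, mailbox):
    # The single program: every process runs this same function, and its
    # behavior differs only through its rank.
    chunk = DATA[rank::size]          # each rank's own data stream
    mailbox.put((rank, sum(chunk)))   # send the partial result as a message

if __name__ == "__main__":
    size, mailbox = 4, Queue()
    procs = [Process(target=spmd_main, args=(r, size, mailbox)) for r in range(size)]
    for p in procs:
        p.start()
    total = sum(mailbox.get()[1] for _ in range(size))   # combine the messages
    for p in procs:
        p.join()
    print(total)                      # 4950, the sum of 0..99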
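For contrast, here is a minimal shared-memory version of the same computation, again an illustrative sketch using Python's standard threading module: the tasks communicate implicitly by reading and writing the shared variable total rather than by exchanging messages:

import threading

DATA = list(range(100))
total = 0                             # the shared address space: one global word
lock = threading.Lock()

def partial_sum(lo, hi):
    global total
    s = sum(DATA[lo:hi])              # compute locally ...
    with lock:                        # ... then update the shared variable,
        total += s                    # with a lock for concurrency control

if __name__ == "__main__":
    threads = [threading.Thread(target=partial_sum, args=(i * 25, (i + 1) * 25))
               for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(total)                      # 4950 again, with no explicit messages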
The primary objective of this book is to provide a comprehensive study of the software tools and environments that have been used to support parallel and distributed computing systems. We highlight the main software tools and technologies proposed, or currently in use, to implement the functionalities of the SAS and DCP layers.