CS3551 Unit 1 Part 1
Introduction
Ajay Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms, and Systems
Definition
A distributed system is a collection of independent computers interconnected through a communication network (autonomous processors communicating over a communication network) that are capable of collaborating on a task.

"A collection of independent computers that appears to its users as a single system." (Tanenbaum)
Some features/characteristics:
- No common physical clock: the clock time of each system varies
- No shared memory
- Geographical separation: NoW (network of workstations), CoW (cluster of workstations); the Google search engine is based on a NoW
- Autonomy and heterogeneity
Distributed System Model
A typical distributed system is shown in Figure 1.1. Each computer has a memory-processing unit, and the computers are connected by a communication network.
[Figure 1.1: A distributed system connects processors by a communication network.]
Figure 1.2 schematically shows the interaction of the middleware software with
the system components. Various primitives and function calls defined in various
libraries of the middleware layer are used in the user program code.
There exist several libraries to invoke primitives for the more common functions
of the middleware layer such as reliable and ordered multicasting. Currently
deployed commercial versions of middleware use CORBA, DCOM (Distributed Component Object Model), Java, and RMI (Remote Method Invocation) technologies. The Message Passing Interface (MPI) is an example of an interface for various communication functions.
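For instance, with the mpi4py Python binding for MPI (the binding choice here is an assumption; the send/receive primitives themselves are standard MPI), a two-process message exchange looks like this:

    from mpi4py import MPI   # Python binding for the Message Passing Interface

    comm = MPI.COMM_WORLD    # communicator spanning all launched processes
    rank = comm.Get_rank()   # this process's identity within the communicator

    if rank == 0:
        # Send: parameters name the destination and the data to be sent.
        comm.send({"greeting": "hello"}, dest=1, tag=0)
    elif rank == 1:
        # Receive: parameters name the source; the result lands in a user buffer.
        msg = comm.recv(source=0, tag=0)
        print("process 1 received:", msg)

Launched as, e.g., mpiexec -n 2 python demo.py, the two ranks run the same program and communicate purely by messages.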
Shared-memory systems are multiprocessors; message-passing systems are multicomputer systems that do not have a shared address space and communicate by message passing.

Emulating message passing over shared memory (MP over SM):
- Partition the shared address space
- Send/Receive are emulated by writing to / reading from a special mailbox per pair of processes (a sketch of this emulation follows the two lists)
Emulating shared memory over message passing (SM over MP):
- Model each shared object as a process
- A write to a shared object is emulated by sending a message to the owner process for the object
- A read from a shared object is emulated by sending a query to the owner of the shared object
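A minimal sketch of the MP-over-SM emulation, using Python threads as stand-ins for processors sharing an address space (all names here are illustrative, not from the slides):

    import threading
    from collections import deque

    N = 2
    # The shared address space is partitioned into one mailbox per ordered
    # pair of processes.
    mailboxes = {(i, j): deque() for i in range(N) for j in range(N)}
    lock = threading.Lock()

    def send(src, dst, data):
        # Send is emulated by writing into the (src, dst) mailbox.
        with lock:
            mailboxes[(src, dst)].append(data)

    def receive(src, dst):
        # Receive is emulated by reading from the (src, dst) mailbox
        # (busy-waiting here purely to keep the sketch short).
        while True:
            with lock:
                if mailboxes[(src, dst)]:
                    return mailboxes[(src, dst)].popleft()

    threading.Thread(target=send, args=(0, 1, "hi")).start()
    print(receive(0, 1))   # process 1 reads what process 0 wrote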
A Send primitive has at least two parameters: the destination, and the buffer in the user space containing the data to be sent. Similarly, a Receive primitive has at least two parameters: the source from which the data is to be received, and the user buffer into which the data is to be received.
There are two ways of sending data when the Send primitive is invoked: the buffered option and the unbuffered option. The buffered option, which is the standard option, copies the data from the user buffer to the kernel buffer; the data is later copied from the kernel buffer onto the network. In the unbuffered option, the data is copied directly from the user buffer onto the network (the difference is simulated below).
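The difference can be simulated in a few lines (kernel_buffer and network below are stand-in objects, not real kernel state):

    from collections import deque

    kernel_buffer = deque()   # stand-in for the kernel's internal buffer
    network = deque()         # stand-in for the network

    def buffered_send(user_buffer):
        # Standard option: one copy into the kernel buffer, then return;
        # the kernel moves the data onto the network later.
        kernel_buffer.append(bytes(user_buffer))

    def kernel_flush():
        # The deferred second copy performed on behalf of buffered sends.
        while kernel_buffer:
            network.append(kernel_buffer.popleft())

    def unbuffered_send(user_buffer):
        # Unbuffered option: the data goes directly from the user buffer
        # onto the network; the caller is held up for that whole copy.
        network.append(bytes(user_buffer))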
Classification of Primitives

Synchronous (Send/Receive)
- Handshake between sender and receiver
- Send completes when Receive completes
- Receive completes when the data is copied into the buffer

Asynchronous (Send)
- Control returns to the process when the data is copied out of the user-specified buffer

Blocking (Send/Receive)
- Control returns to the invoking process after processing of the primitive (whether sync or async) completes

Non-blocking (Send/Receive)
- Control returns to the process immediately after invocation
- Send: even before the data is copied out of the user buffer
- Receive: even before the data may have arrived from the sender
Non-blocking Primitives

[Figure 1.7: A non-blocking send primitive. When the Wait call returns, at least one of its parameters is posted.]
The return parameter posts a system-generated handle, which is used later to check the status of completion of the call:
- Keep checking (in a loop, or periodically) whether the handle has been posted, or
- Issue a Wait(handle1, handle2, ...) call with a list of handles; the Wait call blocks until one of the stipulated handles is posted.
Blocking Receive: blocks until the expected data arrives and is written into the specified user buffer; then control is returned to the user process.

Non-blocking Receive: causes the kernel to register the call and return the handle of a location that the user process can later check for the completion of the non-blocking Receive operation. The user process can check for completion by invoking the Wait operation on the returned handle.
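MPI exposes exactly this pattern: isend/irecv (shown here through mpi4py, an assumed binding) return request handles, and wait/test play the roles of Wait and of polling the handle:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        # Non-blocking send: control returns at once with a handle, possibly
        # before the data has left the user buffer.
        req = comm.isend([1, 2, 3], dest=1, tag=7)
        req.wait()                  # Wait(handle): block until it is posted
    elif rank == 1:
        # Non-blocking receive: the call is registered and a handle returned.
        req = comm.irecv(source=0, tag=7)
        done, data = req.test()     # poll the handle without blocking
        if not done:
            data = req.wait()       # or block until the handle is posted
        print("process 1 received:", data)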
Blocking asynchronous Send: the user process that invokes the Send is blocked until the data is copied from the user's buffer to the kernel buffer (buffered option), or until the data is copied from the user's buffer onto the network (unbuffered option).
[Figure 1.10: An example of a synchronous execution in a message-passing system. All the messages sent in a round are received within that same round.]
Asynchronous execution
- No processor synchrony; no bound on the drift rate of clocks
- Message delays are finite but unbounded
- No bound on the time taken by a step at a process

Synchronous execution (a round-based simulation follows)
- Processors are synchronized; the clock drift rate is bounded
- Message delivery occurs in one logical step/round
- There is a known upper bound on the time to execute a step at a process
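The round structure of a synchronous execution can be mimicked with a barrier: all sends of round r complete before any receive of round r, and all receives complete before round r+1 starts (a sketch with illustrative names):

    import threading
    from collections import deque

    N = 3
    inbox = [deque() for _ in range(N)]   # one inbox per process
    barrier = threading.Barrier(N)        # lockstep round boundary

    def process(pid, rounds=2):
        for r in range(rounds):
            # Send phase of round r: message to the next process on a ring.
            inbox[(pid + 1) % N].append((pid, r))
            barrier.wait()                # everyone has sent for round r
            while inbox[pid]:
                sender, rnd = inbox[pid].popleft()
                assert rnd == r           # received within the same round
            barrier.wait()                # everyone has received for round r

    threads = [threading.Thread(target=process, args=(i,)) for i in range(N)]
    for t in threads: t.start()
    for t in threads: t.join()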
System Emulations

Using such emulations, any of the four classes of systems (synchronous or asynchronous, message passing or shared memory) can be emulated by any of the others.
Challenges: System Perspective
Communication mechanisms: e.g., remote procedure call (RPC), remote object invocation (ROI), message-oriented vs. stream-oriented communication

Processes: code migration, process/thread management at clients and servers, design of software and mobile agents

Naming: easy-to-use identifiers are needed to locate resources and processes transparently and scalably

Synchronization: e.g., mutual exclusion, leader election, synchronizing physical clocks, and devising logical clocks

Data storage and access
- Schemes for data storage, search, and lookup should be fast and scalable across the network
- File system design needs to be revisited

Consistency and replication
- Replication for fast access and scalability, and to avoid bottlenecks
Challenges: Algorithm/Design Perspective
Synchronization/coordination mechanisms (a logical clock sketch follows this list)
- Physical clock synchronization: hardware drift needs correction
- Leader election: select a distinguished process, which is nontrivial due to inherent symmetry
- Mutual exclusion: coordinate access to critical resources
- Distributed deadlock detection and resolution: requires observing the global state while avoiding duplicate detection and unnecessary aborts
- Termination detection: detecting a global state of quiescence, with no CPU processing and no in-transit messages
- Garbage collection: reclaim objects no longer pointed to by any process
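As a taste of "devising logical clocks", here is a minimal Lamport clock sketch: the clock ticks on local and send events, and on receipt it jumps past the sender's timestamp (the class and names are illustrative):

    class LamportClock:
        def __init__(self):
            self.time = 0

        def tick(self):
            # Local event or message send: advance the clock by one.
            self.time += 1
            return self.time

        def on_receive(self, msg_time):
            # Message receipt: move past the timestamp carried by the message.
            self.time = max(self.time, msg_time) + 1
            return self.time

    p, q = LamportClock(), LamportClock()
    ts = p.tick()                  # p sends a message stamped with ts
    print(q.on_receive(ts) > ts)   # True: the receive is ordered after the send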
Group communication, multicast, and ordered message delivery

Efficient algorithms for group communication and group management must be designed:
- Group: processes sharing a context and collaborating
- Membership churn: multiple joins, leaves, and failures
- Concurrent sends: the semantics of delivery order need to be specified (one approach is sketched after this list)
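One classic way to pin down a delivery order for concurrent sends is a sequencer: every multicast is stamped by a single process, and members deliver strictly in stamp order. The sketch below is one such approach under illustrative names, not a protocol prescribed by the slides:

    import itertools
    import heapq

    class Sequencer:
        # Central process that assigns a global order to concurrent sends.
        def __init__(self):
            self.counter = itertools.count()

        def stamp(self, msg):
            return (next(self.counter), msg)

    class Member:
        # Delivers messages strictly in sequence-number order.
        def __init__(self):
            self.next_seq = 0
            self.pending = []          # min-heap of out-of-order arrivals

        def on_arrival(self, seq, msg):
            heapq.heappush(self.pending, (seq, msg))
            delivered = []
            # Hold back arrivals until the gap is filled: this is what
            # yields the same total order at every member.
            while self.pending and self.pending[0][0] == self.next_seq:
                delivered.append(heapq.heappop(self.pending)[1])
                self.next_seq += 1
            return delivered

    seq, m = Sequencer(), Member()
    a, b = seq.stamp("a"), seq.stamp("b")
    print(m.on_arrival(*b))    # [] : message 1 arrived before message 0
    print(m.on_arrival(*a))    # ['a', 'b'] : delivered in total order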
Reliable and fault-tolerant distributed systems
- Consensus algorithms: processes reach agreement in spite of faults (under various fault models)
- Replication and replica management
- Voting and quorum systems
- Distributed databases and distributed commit: ACID properties
- Self-stabilizing systems: an "illegal" system state automatically changes to a "legal" state; this requires built-in redundancy
- Checkpointing and recovery algorithms: roll back and restart from an earlier "saved" state
- Failure detectors (sketched after this list):
  - It is difficult to distinguish a "slow" process or message from a failed process or a never-sent message
  - So one designs algorithms that "suspect" a process as having failed, and that converge on a determination of its up/down status
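A timeout-based failure detector makes the difficulty concrete: it can only ever suspect, because a slow process produces the same observation as a crashed one (a sketch; the timeout value and names are arbitrary):

    import time

    class HeartbeatDetector:
        def __init__(self, timeout=2.0):
            self.timeout = timeout
            self.last_seen = {}        # pid -> time of last heartbeat

        def heartbeat(self, pid):
            self.last_seen[pid] = time.monotonic()

        def suspects(self):
            # Processes silent for longer than the timeout are *suspected*,
            # not known, to have failed: a slow process or a delayed message
            # looks exactly like a crash.
            now = time.monotonic()
            return [pid for pid, t in self.last_seen.items()
                    if now - t > self.timeout]

    fd = HeartbeatDetector(timeout=0.1)
    fd.heartbeat("p1")
    time.sleep(0.2)
    print(fd.suspects())       # ['p1'] : suspected, possibly wrongly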
Mobile systems
- Wireless communication: unit disk model; broadcast medium (MAC); power management, etc.
- CS perspective: routing, location management, channel allocation, localization and position estimation, mobility management
- Base-station model (cellular model)
- Ad-hoc network model (rich in distributed graph theory problems)
Sensor networks
- Processors with an electro-mechanical interface

Ubiquitous or pervasive computing
- Processors embedded in and seamlessly pervading the environment
- Wireless sensor and actuator mechanisms; self-organizing; network-centric; resource-constrained
- E.g., intelligent home, smart workplace
Peer-to-peer computing
- No hierarchy; symmetric roles; self-organizing; efficient object storage and lookup; scalable; dynamic reconfiguration

Publish/subscribe, content distribution
- Filtering information to extract only that of interest
Distributed agents
- Processes that move and cooperate to perform specific tasks; issues include coordination, controlling mobility, and software design and interfaces

Distributed data mining
- Extract patterns/trends of interest
- The data is not available in a single repository
Grid computing
- A grid of shared computing resources; use idle CPU cycles
- Issues: scheduling, QoS guarantees, security of machines and jobs

Security
- Confidentiality, authentication, and availability in a distributed setting
- Managing wireless, peer-to-peer, and grid environments
  - Issues: e.g., lack of trust, broadcast media, resource constraints, lack of structure