21CS401 /DISTRIBUTED
SYSTEMS
Course Objective
• To explain the foundation and challenges of distributed systems.
• To infer the knowledge of message ordering and group
communication.
• To demonstrate the distributed mutual exclusion and deadlock
detection algorithms.
• To predict the significance of checkpointing and rollback recovery
algorithms.
• To summarize the characteristics of peer-to-peer and distributed
shared memory system
SYLLABUS
UNIT-1
• Introduction: Definition –Characteristics-Relation to computer system
components –Motivation – Message-passing systems versus shared
memory systems –Primitives for distributed communication –
Synchronous versus asynchronous executions –Challenges of
Distributed system: System Perspective. A model of distributed
computations: A distributed program –A model of distributed
executions –Models of communication networks –Global state – Cuts
of a distributed computation.
UNIT-2-MESSAGE ORDERING &
GROUP COMMUNICATION
• Message ordering and group communication: Message ordering
paradigms –Asynchronous execution with synchronous
communication –Synchronous program order on an asynchronous
system –Group communication – Causal order (CO) - Total order.
UNIT-3-DISTRIBUTED MUTEX &
DEADLOCK
• Distributed mutual exclusion algorithms: Introduction – Preliminaries
– Lamport's algorithm – Ricart–Agrawala algorithm – Maekawa's
algorithm – Suzuki–Kasami's broadcast algorithm. Deadlock detection
in distributed systems: Introduction – System model – Preliminaries –
Models of deadlocks – Knapp's classification.
UNIT-4-CHECKPOINTING AND
ROLLBACK RECOVERY
• Introduction – Background and definitions – Issues in failure recovery
– Checkpoint-based recovery – Log-based rollback recovery – Koo–Toueg
coordinated checkpointing algorithm – Juang–Venkatesan algorithm for
asynchronous checkpointing and recovery.
UNIT-5-P2P & DISTRIBUTED
SHARED MEMORY
• Peer-to-peer computing and overlay graphs: Introduction – Data
indexing and overlays – Content addressable networks – Tapestry.
Distributed shared memory: Abstraction and advantages – Types of
memory consistency models.
COURSE OUTCOMES
• CO1: Illustrate the models of communication in building a distributed
environment. (K2-Understand)
• CO2: Interpret the order of message in communication network for
synchronous and asynchronous system. (K2-Understand)
• CO3: Use the mutual exclusion and deadlock detection algorithms in
real-time applications. (K2-Understand)
• CO4: Discover the issues of checkpointing and rollback recovery mechanisms
in a distributed environment. (K2-Understand)
• CO5: Relate the features of peer-to-peer and memory consistency models for
a given application. (K2-Understand)
TEXT BOOK(S):
• T1: Kshemkalyani, Ajay D., and Mukesh Singhal. Distributed
computing: principles, algorithms, and systems. Cambridge University
Press, 2011.
• T2: George Coulouris, Jean Dollimore and Tim Kindberg, "Distributed
Systems: Concepts and Design", 5th Edition, Pearson Education, 2017.
• T3: Tanenbaum A.S., Van Steen M., "Distributed Systems: Principles and
Paradigms", 2nd Edition, Pearson Education, 2017.
UNIT I
Chapter 1,2
Introduction: Definition –Characteristics-Relation to computer system
components –Motivation – Message-passing systems versus shared
memory systems –Primitives for distributed communication –
Synchronous versus asynchronous executions –Challenges of Distributed
system: System Perspective. A model of distributed computations: A
distributed program –A model of distributed executions –Models of
communication networks –Global state – Cuts of a distributed
computation.
Chapter 1-Introduction
• A distributed system is a system in which components are located on
different networked computers, which can communicate and coordinate their
actions by passing messages to one another.
• A distributed system is a collection of autonomous computer systems that
are physically separated but connected by a computer network that is
equipped with distributed system software.
• A distributed system is a collection of independent (autonomous)
entities that cooperate to solve a problem that cannot be
solved individually.
Middleware
• Middleware refers to software that acts as an intermediary between
different applications, services, or components in a distributed
system.
Types of Distributed Systems
• Client-server systems: The most traditional and simple type of
distributed system, in which a multitude of networked computers
interact with a central server for data storage, processing, or
another common goal.
Types of Distributed Systems
• Peer-to-peer networks: They distribute workloads among hundreds
or thousands of computers all running the same software.
Characteristics of distributed system:
• No common physical clock
  • This introduces the element of "distribution" in the system and gives
rise to the inherent asynchrony among the processors.
• No shared memory
  • This requires message passing for communication.
  • A distributed system may still provide the abstraction of a common
address space via the distributed shared memory abstraction.
Characteristics of distributed
system
• Geographical separation
• The processors of a distributed system may be geographically far apart.
• They may be connected over a wide-area network (WAN), or via a
network/cluster of workstations (NOW/COW) configuration connecting
processors on a LAN.
• The NOW configuration is attractive because of its low-cost, high-speed,
off-the-shelf processors.
Characteristics of distributed
system
• Autonomy and heterogeneity
• The processors are "loosely coupled": they have different speeds and
may each run a different operating system, but they cooperate with one
another by offering services for solving a problem jointly.
Relation to computer system
components
• In a distributed system, each computer has a memory-processing unit,
and the computers are connected by a communication network.
• Figure 1.2 shows the relationships among the software components that
run on each computer, using the local operating system and network
protocol stack for functioning.
• The distributed software is also termed middleware.
Relation to computer system
components
• A distributed execution is the execution of processes across the
distributed system to collaboratively achieve a common goal which is
also termed a computation or a run.
• A distributed system follows a layered architecture that reduces the
complexity of the system design.
• Middleware hides the heterogeneity transparently at the platform
level
Relation to computer system
components
• It is assumed that the middleware layer does not contain the
application-layer functions such as HTTP, mail, FTP, and telnet.
• User program code includes calls to libraries of the middleware layer
to support, for example, reliable and ordered multicasting.
Relation to computer system
components
• Some of the commercial versions of middleware often in use are
CORBA, DCOM (distributed component object model), Java RMI
(remote method invocation), and the message-passing interface (MPI).
Motivation
• Inherently distributed computations
• Resource sharing
• Access to geographically remote data and resources
• Increased performance/cost ratio
• availability, integrity, fault-tolerance
• Enhanced reliability
• Scalability
Motivation
• Inherently distributed computations
• Resource sharing
• Resources cannot be fully replicated at all the sites because it is often neither practical
nor cost-effective.
• Access to geographically remote data and resources
• In many scenarios, the data cannot be replicated at every site participating in the
distributed execution because it may be too large or too sensitive to be replicated.
Enhanced reliability
• A distributed system provides increased reliability because of the
possibility of replicating resources and executions.
• Geographically distributed resources are not likely to crash or
malfunction at the same time under normal circumstances.
• Availability: resource should be accessible at all times;
• Integrity: value/state of the resource must be correct, in the face of concurrent access
from multiple processors,
• Fault-tolerance: ability to recover from system failures.
• Increased performance/cost ratio
• By resource sharing and accessing geographically remote data and resources, the
performance/cost ratio is increased.
• Scalability
• As the processors are usually connected by a wide-area network, adding more
processors does not pose a direct bottleneck for the communication network.
• Modularity and incremental expandability
• Heterogeneous processors may be easily added into the system without affecting the
performance, as long as those processors are running the same middleware
algorithms.
• Similarly, existing processors may be easily replaced by other processors.
Message-passing vs. Shared Memory
Shared memory systems are those in which there is a (common) shared address
space throughout the system.
• Communication among processors takes place via shared data variables
and control variables for synchronization among the processors.
• Examples of shared-memory synchronization constructs: semaphores and
monitors.
Message-passing vs. Shared
Memory
Message passing
• Multicomputer systems that do not have a shared address space
communicate by message passing.
• Programmers find it easier to program using shared memory than by
message passing; this leads to the development of an abstraction.
• The shared-memory abstraction is provided to simulate a shared address
space.
• For a distributed system, this abstraction is called distributed shared
memory.
Message-passing vs. Shared
Memory
• 1.5.1 Emulating message-passing on a shared memory system (MP →SM)
• The shared address space is partitioned into disjoint parts, one part being
assigned to each processor.
• “Send” and “receive” operations are implemented for writing to and
reading from the destination/sender processor’s address space,
respectively.
Message-passing vs. Shared
Memory
• Specifically, a separate location is reserved as a mailbox (assumed to
be unbounded in size) for each ordered pair of processes.
• A Pi–Pj message-passing can be emulated by a write by Pi to the mailbox and
then a read by Pj from the mailbox.
• The write and read operations are controlled using synchronization
primitives to inform the receiver/sender after the data has been
sent/received.
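The mailbox emulation above can be sketched in Python, with one queue standing in for the reserved shared-memory region of each ordered pair of processes. This is an illustrative sketch; the names, the number of processes, and the queue-based structure are assumptions, not part of the text.

```python
import queue

# One unbounded mailbox per ordered pair of processes (Pi -> Pj).
# queue.Queue stands in for a reserved region of the shared address space.
N = 3
mailbox = {(i, j): queue.Queue() for i in range(N) for j in range(N) if i != j}

def send(i, j, msg):
    # "Send" = write into the (Pi, Pj) mailbox in shared memory.
    mailbox[(i, j)].put(msg)

def receive(i, j):
    # "Receive" = read (and remove) from the (Pi, Pj) mailbox.
    # Blocks until data is available, standing in for the synchronization
    # primitive that informs the receiver the data has been sent.
    return mailbox[(i, j)].get()

send(0, 1, "hello")
print(receive(0, 1))  # -> hello
```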
Message-passing vs. Shared
Memory
• 1.5.2 Emulating shared memory on a message-passing system (SM
→MP)
• This involves use of “send” and “receive” operations for “write” and
“read” operations.
• Each shared location can be modeled as a separate process;
• “write” to a shared location is emulated by sending an update
message to the corresponding owner process and a “read” by
sending a query message.
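A minimal sketch of this SM→MP emulation, modeling each shared location as an owner process (a thread here) that serializes update and query messages. The class and method names are assumptions chosen for illustration.

```python
import threading
import queue

class SharedLocation:
    """One shared memory location, modeled as a separate owner process."""
    def __init__(self, initial=0):
        self.inbox = queue.Queue()
        self.value = initial
        threading.Thread(target=self._owner, daemon=True).start()

    def _owner(self):
        # The owner process serializes all accesses to the location.
        while True:
            op, payload, reply = self.inbox.get()
            if op == "write":          # update message
                self.value = payload
                reply.put("ack")
            elif op == "read":         # query message
                reply.put(self.value)

    def write(self, v):
        reply = queue.Queue()
        self.inbox.put(("write", v, reply))
        reply.get()                    # wait for acknowledgment

    def read(self):
        reply = queue.Queue()
        self.inbox.put(("read", None, reply))
        return reply.get()

x = SharedLocation()
x.write(7)
print(x.read())  # -> 7
```

Every read and write costs a round trip of messages to the owner, which is exactly why the text calls this emulation expensive.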
Message-passing vs. Shared
Memory
• As accessing another processor’s memory requires send and receive
operations, this emulation is expensive.
• In a MIMD message-passing multicomputer system, each “processor” may
be a tightly coupled multiprocessor system with shared memory. Within
the multiprocessor system, the processors communicate via shared
memory.
• Between computers, communication by message passing is better suited,
and is the norm for wide-area distributed systems.
Primitives for distributed
communication
• Blocking/non-blocking, synchronous/asynchronous primitives
• Processor synchrony
• Libraries and standards
Blocking/non-blocking, synchronous/asynchronous primitives
• Message send and message receive communication primitives are denoted
by Send() and Receive().
• The Send primitive has two parameters: the destination, and the user
buffer space containing the data to be sent.
• The Receive primitive has two parameters: the source from which data is
to be received, and the user buffer space into which the data is to be
received.
Blocking/non-blocking,
synchronous/asynchronous primitives
There are two ways of sending data: the buffered option and the unbuffered
option.
• The buffered option copies the data from the user buffer to the kernel
buffer; the data is later copied from the kernel buffer onto the network.
• With the unbuffered option, the data is copied directly from the user
buffer onto the network.
Blocking/non-blocking,
synchronous/asynchronous primitives
• The Send primitive may use either the buffered or the unbuffered option.
• For the Receive primitive, the buffered option is required because the
data may already have arrived when the primitive is invoked, and it
needs a storage place in the kernel.
Blocking/non-blocking,
synchronous/asynchronous primitives
Synchronous (send/receive)
• Send and Receive are synchronous when they establish a handshake between
the sender and the receiver
• The Send completes when the corresponding Receive completes
• The Receive completes when the data is copied into the receiver's buffer
Asynchronous (send)
• Control returns to the process as soon as the data is copied out of the
user-specified buffer
Blocking/non-blocking,
synchronous/asynchronous primitives
Blocking (send/receive)
• Control returns to invoking process after processing of primitive (whether sync
or async) completes
Nonblocking (send/receive)
• Control returns to the process immediately after invocation (even though
the operation has not completed)
• Send: control returns to the process even before the data is copied out
of the user buffer
• Receive: control returns to the process even before the data may have
arrived from the sender
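A hedged sketch of the blocking vs non-blocking distinction, using a local Python queue to stand in for the kernel buffer. The function names are illustrative, not standard primitives.

```python
import queue

# A local queue stands in for the kernel buffer of a channel.
channel = queue.Queue()

def blocking_receive():
    # Blocking: control returns only after data is copied into the
    # user buffer, i.e., the call waits until a message arrives.
    return channel.get()

def nonblocking_receive():
    # Non-blocking: control returns immediately, whether or not the
    # operation has completed; the caller must check or poll later.
    try:
        return channel.get_nowait()
    except queue.Empty:
        return None

print(nonblocking_receive())   # -> None (no data has arrived yet)
channel.put("data")
print(blocking_receive())      # -> data
```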
Processor synchrony
• Processor synchrony indicates that all the processors execute in
lock-step with their clocks synchronized.
• In lock-step operation, similar devices share the same timing and
triggering, essentially acting as a single device.
Libraries and standards
• Many commercial software products (banking, payroll, etc., applications) use
proprietary primitive libraries supplied with the software marketed by the vendors
(e.g., the IBM CICS software which has a very widely installed customer base
worldwide uses its own primitives).
• The message-passing interface (MPI) library and the PVM (parallel virtual machine)
library are used largely by the scientific community, but other alternative libraries
exist
Synchronous versus asynchronous executions
• In addition to the two classifications of processor synchrony/asynchrony and of
synchronous/asynchronous communication primitives, there is another classification,
namely that of synchronous/asynchronous executions.
Asynchronous execution
• No processor synchrony, no bound on the drift rate of clocks
• Message delays are finite but unbounded
• No bound on the time for a step at a process
Synchronous versus asynchronous executions
Synchronous execution
• Processors are synchronized; the clock drift rate is bounded
• Message delivery occurs in one logical step/round
• Known upper bound on the time to execute a step at a process
EMULATION:Synchronous versus asynchronous executions
• Already discussed how shared memory system could be emulated by a message-passing system,
and vice-versa.
• We now have four broad classes of programs, as shown in Figure 1.11.
• Using the emulations shown, any class can be emulated by any other.
• If system A can be emulated by system B, denoted A/B, and if a problem is not solvable in B, then
it is also not solvable in A.
• Likewise, if a problem is solvable in A, it is also solvable in B.
• Hence, in a sense, all four classes are equivalent in terms of
"computability" – what can and cannot be computed – in failure-free
systems.
Challenges of Distributed system: System Perspective
• Communication mechanisms: E.g., Remote Procedure Call (RPC),
remote object invocation (ROI), message-oriented vs. stream-
oriented communication
• Processes: Code migration, process/thread management at clients
and servers, design of software and mobile agents
Challenges of Distributed system:
System Perspective
• Naming: Easy to use identifiers needed to locate resources and
processes transparently and scalably
• Synchronization: synchronization or coordination among processes
are essential.
Mutual exclusion is the classical example of
synchronization
Challenges of Distributed system:
System Perspective
• Data storage and access
• Schemes for data storage, search, and lookup should be fast and
scalable across network
• Consistency and replication
• Replication for fast access, scalability, avoid bottlenecks
• Require consistency management among replicas
Challenges of Distributed
system: System Perspective
• Fault-tolerance: ability to recover despite link, node, and process
failures
• Distributed systems security
  • Secure channels, access control, key management (key generation and
key distribution), authorization, secure group management
Challenges of Distributed
system: System Perspective
•Scalability and modularity of algorithms, data,
services
The algorithms, data (objects), and services must
be as distributed as possible.
Challenges of Distributed system:
System Perspective
• API for communications, services: ease of use (for non-technical users)
• Transparency: hiding implementation policies from the user
  • Access: hide differences in data representation across systems;
provide uniform operations to access resources
  • Location: locations of resources are transparent
  • Migration: relocate resources without renaming
  • Relocation: relocate resources as they are being accessed
  • Replication: hide replication from the users
  • Concurrency: mask the use of shared resources
  • Failure: reliable and fault-tolerant operation
Chapter 2:
A Model of Distributed Computations
• A model of distributed computations: A distributed program –A model
of distributed executions –Models of communication networks –
Global state – Cuts of a distributed computation.
A Distributed Program
• A distributed program is composed of a set of n asynchronous
processes, p1, p2, ..., pi , ..., pn.
• The processes do not share a global memory and communicate
solely by passing messages.
A Distributed Program
• Process execution and message transfer are asynchronous.
• Assume that each process is running on a different processor.
• Let Cij denote the channel from process pi to process pj and let mij
denote a message sent by pi to pj .
• The message transmission delay is finite and unpredictable.
A Model of Distributed
Executions
• The execution of a process consists of a sequential execution of its
actions.
• The actions are atomic and the actions of a process are modeled as
three types of events.
• internal events
• message send events,
• message receive events.
A Model of Distributed
Executions
• For a message m, let send(m) & rec(m) denote send and receive
events, respectively.
• The occurrence of events changes the states of respective processes
and channels, thus causing transitions in the global system state.
• An internal event changes the state of the process at which it occurs.
A Model of Distributed
Executions
• A send event (or a receive event) changes
  • the state of the process that sends (or receives) the message, and
  • the state of the channel on which the message is sent (or received).
• An internal event affects only the process at which it occurs.
A Model of Distributed
Executions
• For every message m that is exchanged between two processes, we have
send(m) →msg rec(m).
• The relation →msg defines causal dependencies between send and
receive events.
• Figure 2.1 shows a space–time diagram of a distributed execution
involving three processes.
• A horizontal line represents the progress of a process; a dot indicates
an event; a slanted arrow indicates a message transfer.
A Model of Distributed
Executions
In this figure, for process p1, the 2nd event is a message send event, the 3rd
event is an internal event, and the 4th event is a message receive event.
Causal Precedence Relation
• The execution of a distributed application results in a set of
distributed events produced by the processes.
• Let H=∪i hi denote the set of events executed in a distributed
computation.
• Next, define a binary relation → on the set H, known as the causal
precedence (happened-before) relation, that expresses causal
dependencies between events in the distributed execution: e_i → e_j if
(i) e_i and e_j occur on the same process and e_i precedes e_j in
program order, or (ii) e_i = send(m) and e_j = rec(m) for some message
m, or (iii) there exists an event e_k such that e_i → e_k and
e_k → e_j (transitivity).
• Concurrent events: two events e_i and e_j are concurrent if neither
e_i → e_j nor e_j → e_i.
• Logical vs. physical concurrency
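As a hedged sketch (the event encoding and names are assumptions, not from the text), the causal precedence relation over a finite execution can be computed from the process-local orders and the send/receive pairs:

```python
# Events are (process, local_index); msgs maps each send event to its
# receive event. Illustrative encoding only.

def happened_before(events, msgs):
    """Return the set of ordered pairs (e, f) with e -> f."""
    hb = set()
    # (i) program order: earlier event on a process precedes a later one
    for e in events:
        for f in events:
            if e[0] == f[0] and e[1] < f[1]:
                hb.add((e, f))
    # (ii) send(m) -> rec(m)
    hb |= {(s, r) for s, r in msgs.items()}
    # (iii) transitive closure
    changed = True
    while changed:
        changed = False
        for (a, b) in list(hb):
            for (c, d) in list(hb):
                if b == c and (a, d) not in hb:
                    hb.add((a, d))
                    changed = True
    return hb

# p1's event 1 sends a message received as p2's event 1.
ev = [(1, 1), (1, 2), (2, 1), (2, 2)]
hb = happened_before(ev, {(1, 1): (2, 1)})
print(((1, 1), (2, 2)) in hb)   # -> True (via the message, transitively)
# (1,2) and (2,1) are related in neither direction: concurrent events.
print(((1, 2), (2, 1)) in hb or ((2, 1), (1, 2)) in hb)  # -> False
```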
Models of communication networks
Models of the service provided by communication networks are
• In the FIFO model, each channel acts as a first-in first-out message queue
and thus, message ordering is preserved by a channel.
• In the non-FIFO model, a channel acts like a set in which the sender
process adds messages and the receiver process removes messages from
it in a random order.
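The two channel models can be sketched as delivery policies over the set of in-transit messages. This is an illustrative sketch, not an implementation from the text; the class names are assumptions.

```python
import collections
import random

class FIFOChannel:
    """First-in first-out queue: message ordering is preserved."""
    def __init__(self):
        self.buf = collections.deque()
    def send(self, m):
        self.buf.append(m)
    def deliver(self):
        return self.buf.popleft()        # delivers in send order

class NonFIFOChannel:
    """Set-like channel: the receiver removes messages in random order."""
    def __init__(self):
        self.buf = []
    def send(self, m):
        self.buf.append(m)
    def deliver(self):
        i = random.randrange(len(self.buf))
        return self.buf.pop(i)           # arbitrary removal order

c = FIFOChannel()
for m in ("m1", "m2", "m3"):
    c.send(m)
print([c.deliver() for _ in range(3)])   # -> ['m1', 'm2', 'm3']
```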
Models of communication
networks
• The "causal ordering" (CO) model is based on Lamport's "happens before"
relation. A system that supports the causal ordering model satisfies
the following property: for any two messages m_ij and m_kj sent to the
same destination p_j, if send(m_ij) → send(m_kj), then
rec(m_ij) → rec(m_kj).
Models of communication
networks
• Causally ordered delivery of messages implies FIFO message delivery.
• Note that CO ⊂ FIFO ⊂ Non-FIFO.
• Causal ordering model is useful in developing distributed algorithms.
• Example: in replicated database systems, every process that updates a
replica must receive the updates in the same order to maintain database
consistency.
Global state of a distributed system
• The global state of a distributed system is a collection of the local
states of its components, namely, the processes and the communication
channels.
• The state of a process at any time is defined by the contents of
processor registers, stacks, local memory, etc., and depends on the
local context of the distributed application.
Global state of a distributed system
• The state of a channel is given by the set of messages in transit in the
channel.
• The occurrence of events changes the states of respective processes
and channels, thus causing transitions in global system state.
Global state of a distributed system
• For example, an internal event changes the state of the process at
which it occurs.
• A send event (or a receive event) changes the state of the process
that sends (or receives) the message and the state of the channel on
which the message is sent (or received).
Global state of a distributed system
• LS_i^0 denotes the initial state of process p_i.
• LS_i^x is the state of p_i resulting from the execution of all the
events executed by process p_i up to and including e_i^x.
Global state of a distributed system
• Let SC_ij^{x,y} denote the state of a channel C_ij: the set of all
messages that p_i sent up to event e_i^x and that process p_j had not
received until event e_j^y.
Global state
• The global state GS of a distributed system is the collection of the
local states of its processes and channels:
GS = { ∪_i LS_i^{x_i}, ∪_{j,k} SC_jk^{y_j,z_k} }
Global state
• For a global snapshot, the states of all the components of the
distributed system must be recorded at the same instant.
• This would be possible only if the local clocks at the processes were
perfectly synchronized.
• The basic idea is that a message cannot be received if it was not sent,
i.e., the state should not violate causality. Such states are called
consistent global states.
Global state
• Inconsistent global states are not meaningful in a distributed system.
Global state
The global state GS consisting of the local states
{LS_1^1, LS_2^3, LS_3^3, LS_4^2} is inconsistent;
{LS_1^2, LS_2^4, LS_3^4, LS_4^2} is consistent.
Cuts of a distributed
computation
• A consistent global state corresponds to a cut in which every message
received in the PAST of the cut was sent in the PAST of that cut. Such
a cut is known as a consistent cut.
• All messages that cross the cut from the PAST to the FUTURE are in
transit in the corresponding consistent global state.
• A cut is inconsistent if a message crosses the cut from the FUTURE to
the PAST.
• For example, the space–time diagram of Figure 2.3 shows two cuts,
C1 and C2.
• C1 is an inconsistent cut, whereas C2 is a consistent cut.
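A cut's consistency can be checked mechanically: it is inconsistent exactly when some message's receive lies in the PAST of the cut while its send lies in the FUTURE. A hedged sketch follows; the event encoding and the function name are assumptions for illustration.

```python
# Events are (process, local_index). A cut assigns each process the
# index of its last event in the PAST.

def is_consistent(cut, msgs):
    """cut: {process: last event index in the PAST of the cut}.
    msgs: list of (send_event, recv_event) pairs."""
    for (sp, si), (rp, ri) in msgs:
        # Inconsistent: message crosses from FUTURE (send) to PAST (receive).
        if ri <= cut[rp] and si > cut[sp]:
            return False
    return True

# One message: p1's event 2 is received as p2's event 3.
msgs = [((1, 2), (2, 3))]
print(is_consistent({1: 2, 2: 3}, msgs))  # -> True: send and receive in PAST
print(is_consistent({1: 1, 2: 3}, msgs))  # -> False: receive in PAST, send in FUTURE
```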
Past and Future Cones of an Event
Past cone of an event
• An event e_j could have been affected only by all events e_i such that
e_i → e_j.
• In this situation, all the information available at e_i could be made
accessible at e_j.
• All such events e_i belong to the past of e_j.
• Let Past(e_j) denote all events in the past of e_j in a computation
(H, →). Then,
Past(e_j) = {e_i | ∀e_i ∈ H, e_i → e_j}.
• Figure 2.4 (next slide) shows the past of an event e_j.
Past and Future Cones of an Event
[Figure 2.4: Illustration of the past and future cones of an event e_j,
marking PAST(e_j), FUTURE(e_j), max(Past_i(e_j)), and min(Future_i(e_j)).]
Past and Future Cones of an Event
• Let Past_i(e_j) be the set of all those events of Past(e_j) that are on
process p_i.
• Past_i(e_j) is a totally ordered set, ordered by the relation →_i,
whose maximal element is denoted by max(Past_i(e_j)).
• max(Past_i(e_j)) is the latest event at process p_i that affected event
e_j (Figure 2.4).
Past and Future Cones of an Event
• Let Max_Past(e_j) = ∪_(∀i) {max(Past_i(e_j))}.
• Max_Past(e_j) consists of the latest event at every process that
affected event e_j and is referred to as the surface of the past cone
of e_j.
• Past(e_j) represents all events on the past light cone that affect e_j.
Future cone of an event
• The future of an event e_j, denoted by Future(e_j), contains all events
e_i that are causally affected by e_j (see Figure 2.4).
• In a computation (H, →), Future(e_j) is defined as:
Future(e_j) = {e_i | ∀e_i ∈ H, e_j → e_i}.
Past and Future Cones of an Event
• Define Future_i(e_j) as the set of those events of Future(e_j) that are
on process p_i.
• Define min(Future_i(e_j)) as the first event on process p_i that is
affected by e_j.
• Define Min_Future(e_j) = ∪_(∀i) {min(Future_i(e_j))}, which consists of
the first event at every process that is causally affected by event
e_j. Min_Future(e_j) is referred to as the surface of the future cone
of e_j.
• All events at a process p_i that occurred after max(Past_i(e_j)) but
before min(Future_i(e_j)) are concurrent with e_j.
• Therefore, all and only those events of computation H that belong to
the set H − Past(e_j) − Future(e_j) are concurrent with event e_j.
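This characterization can be checked mechanically: given the (transitively closed) happened-before pairs over an event set H, the events concurrent with e_j are exactly H − Past(e_j) − Future(e_j) − {e_j}. The sketch below is illustrative; the event encoding and variable names are assumptions, not from the text.

```python
# hb is a set of ordered pairs (e, f) meaning e -> f, assumed to be
# transitively closed. Events are (process, local_index) tuples.

def past(e, H, hb):
    # All events that causally affect e.
    return {x for x in H if (x, e) in hb}

def future(e, H, hb):
    # All events causally affected by e.
    return {x for x in H if (e, x) in hb}

def concurrent_with(e, H, hb):
    # H - Past(e) - Future(e), excluding e itself.
    return H - past(e, H, hb) - future(e, H, hb) - {e}

H = {(1, 1), (1, 2), (2, 1), (2, 2)}
hb = {((1, 1), (1, 2)), ((2, 1), (2, 2)),
      ((1, 1), (2, 1)), ((1, 1), (2, 2))}   # (1,1) sends a message to (2,1)
print(sorted(concurrent_with((1, 2), H, hb)))  # -> [(2, 1), (2, 2)]
```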