KGiSL Institute of Technology
(Approved by AICTE, New Delhi; Affiliated to Anna University, Chennai)
Recognized by UGC, Accredited by NBA (IT)
365, KGiSL Campus, Thudiyalur Road, Saravanampatti, Coimbatore – 641035.
Department of Artificial Intelligence & Data Science
Name of the Faculty : Ms.T.Suganya
Subject Name & Code : CS3551 / DISTRIBUTED COMPUTING
Branch & Department : B.Tech & AI&DS
Year & Semester : III / V
Academic Year : 2022-23 (ODD)
UNIT I INTRODUCTION 8
Introduction: Definition – Relation to Computer System Components – Motivation – Message-Passing Systems versus Shared
Memory Systems – Primitives for Distributed Communication – Synchronous versus Asynchronous Executions – Design Issues and
Challenges; A Model of Distributed Computations: A Distributed Program – A Model of Distributed Executions – Models of
Communication Networks – Global State of a Distributed System.
UNIT II LOGICAL TIME AND GLOBAL STATE 10
Logical Time: Physical Clock Synchronization: NTP – A Framework for a System of Logical Clocks – Scalar Time – Vector Time;
Message Ordering and Group Communication: Message Ordering Paradigms – Asynchronous Execution with Synchronous
Communication – Synchronous Program Order on Asynchronous System – Group Communication – Causal Order – Total Order;
Global State and Snapshot Recording Algorithms: Introduction – System Model and Definitions – Snapshot Algorithms for FIFO
Channels.
UNIT III DISTRIBUTED MUTEX AND DEADLOCK 10
Distributed Mutual Exclusion Algorithms: Introduction – Preliminaries – Lamport’s Algorithm – Ricart-Agrawala’s Algorithm – Token-Based
Algorithms – Suzuki-Kasami’s Broadcast Algorithm; Deadlock Detection in Distributed Systems: Introduction – System Model
– Preliminaries – Models of Deadlocks – Chandy-Misra-Haas Algorithm for the AND model and OR Model.
UNIT IV CONSENSUS AND RECOVERY 10
Consensus and Agreement Algorithms: Problem Definition – Overview of Results – Agreement in a Failure-Free
System (Synchronous and Asynchronous) – Agreement in Synchronous Systems with Failures; Checkpointing and Rollback
Recovery: Introduction – Background and Definitions – Issues in Failure Recovery – Checkpoint-based Recovery – Coordinated
Checkpointing Algorithm – Algorithm for Asynchronous Checkpointing and Recovery.
UNIT V CLOUD COMPUTING 7
Definition of Cloud Computing – Characteristics of Cloud – Cloud Deployment Models – Cloud Service Models – Driving Factors and
Challenges of Cloud – Virtualization – Load Balancing – Scalability and Elasticity – Replication – Monitoring
Course Outcomes
Upon the completion of this course, the student will be able to
CO1: Explain the foundations of distributed systems (K2)
CO2: Solve synchronization and state consistency problems (K3)
CO3: Use resource sharing techniques in distributed systems (K3)
CO4: Apply working model of consensus and reliability of distributed systems (K3)
CO5: Explain the fundamentals of cloud computing (K2)
GLOBAL STATE AND SNAPSHOT RECORDING ALGORITHMS
Global state and snapshot recording algorithms are used in
distributed systems to capture a consistent snapshot of the
system's state at a particular point in time.
This allows for various distributed algorithms and protocols to be
implemented effectively.
Recording the global state of a distributed system on the fly is an
important paradigm for analyzing, testing, and verifying properties associated
with distributed executions.
However, it poses challenges due to the absence of a globally shared
memory and a global clock in distributed systems, coupled with the
unpredictability of message transfer delays.
To address these challenges, the concept of consistent global states,
also known as consistent snapshots, is defined.
A consistent global state is a snapshot of a distributed system
in which the recorded states of the individual processes and of the
communication channels between them satisfy certain consistency
properties.
Snapshot algorithms differ according to the properties of the
communication channels they assume: FIFO, non-FIFO, or causal
delivery.
GLOBAL STATE AND SNAPSHOT RECORDING ALGORITHMS- INTRODUCTION
In a distributed computing (DC) system, the processes are spatially separated and lack a common memory.
Communication between these processes occurs asynchronously through message
passing over communication channels.
Each component (a process or a communication channel) has its own state.
Local State: The local state of a process represents its current data, variables, program counter,
and any other relevant information that describes its internal state. This local state is specific to
each individual process and is not directly accessible by other processes in the system.
Channel State: Communication channels in a distributed system also have a state. It represents the
messages that are currently "in transit" between processes. The channel state is dynamic and
changes as messages are sent and received.
Global State: The global state of a distributed system is a collection of the local states of all its
components, including processes and communication channels. It represents the overall snapshot
of the system at a particular point in time.
In a distributed system, the absence of shared memory and of a global clock makes it
challenging to record the global state. Nevertheless, recording the global state is crucial
for various applications in distributed system design. Here are some examples:
Detection of Stable Properties: It is possible to identify if the system has reached a deadlock or has
terminated its execution.
Failure Recovery: If a processor fails, the system can be restored to the last saved global state,
ensuring that the execution can resume from a known consistent state.
Debugging Distributed Software: By capturing a snapshot, developers can examine the state of the
system, identify bugs, and analyze the behavior of distributed components.
Monitoring Distributed Events: By capturing snapshots of the global state at specific intervals, it
becomes possible to analyze the behavior of the system and ensure its proper functioning.
Protocol Specification and Verification: By capturing consistent global snapshots, it becomes possible
to analyze the execution of protocols, verify properties, and ensure correctness of the system's
behavior.
Discarding Obsolete Information: Recording the global state helps in discarding obsolete
Example: Let S1 and S2 be two distinct sites of a distributed system which maintain
bank accounts A and B, respectively. A site refers to a process in this example. Let
the communication channels from site S1 to site S2 and from site S2 to site S1 be
denoted by C12 and C21, respectively. Consider the following sequence of actions,
which are also illustrated in the timing diagram of Figure 4.1:
Based on the given sequence of actions, let's track the state of the
distributed system at each time step:
Time t0:
• Account A at site S1: $600
• Account B at site S2: $200
• Communication channel C12 (from S1 to S2): $0
• Communication channel C21 (from S2 to S1): $0
Time t1:
• Site S1 initiates a transfer of $50 from Account A to Account B.
• Account A at site S1: $550 (decremented by $50)
• Account B at site S2: $200
• Communication channel C12 (from S1 to S2): $50 (request for $50 credit
to Account B)
Note that the state of the system at time t1 reflects only the changes
made by site S1 when it initiated the transfer; site S2 has not yet
processed the request or updated its state.
The provided example demonstrates a basic transfer operation between two sites in
a distributed system. The transfer is initiated by site S1, and the request to credit
Account B is sent to site S2 over communication channel C12. The system's global
state is distributed across the local states of S1 and S2, as well as the
communication channels between them.
The example shows the state only up to time t1. Site S2 must still receive the
request and credit Account B before the transfer is complete.
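The need for a coordinated recording can be illustrated with a little arithmetic on the values above. The following is a minimal sketch, not part of the original example: the dictionary layout and the `total` helper are hypothetical, and the "mixed" recording (A read at t0, everything else read at t1) is an assumed scenario chosen to show what goes wrong.

```python
# Values taken from the example: accounts A and B, channels C12 and C21.
state_t0 = {"A": 600, "B": 200, "C12": 0, "C21": 0}
state_t1 = {"A": 550, "B": 200, "C12": 50, "C21": 0}

def total(state):
    """Total money visible in a recorded global state."""
    return sum(state.values())

# Hypothetical uncoordinated recording: A is read at t0, the rest at t1.
mixed = {"A": state_t0["A"], "B": state_t1["B"],
         "C12": state_t1["C12"], "C21": state_t1["C21"]}

print(total(state_t0))  # 800
print(total(state_t1))  # 800
print(total(mixed))     # 850: the $50 is counted twice
```

Both states that actually occurred conserve the $800 in the system. The mixed recording shows $850 because it sees the $50 both in Account A and in channel C12, which is the kind of inconsistency a snapshot algorithm has to rule out.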
SYSTEM MODEL AND DEFINITIONS
SYSTEM MODEL:
In the system model, there are n processes, namely p1, p2, ..., pn. These processes are connected to each
other through channels, denoted as Cij, where i represents the sender process and j represents the receiver
process. The channels facilitate the communication between processes, allowing them to exchange messages.
Each process has its own local memory and does not have direct access to the memory of other processes.
The system operates in an asynchronous manner, which means that there is no physical global clock. Each
process has its own local clock, and there is no guarantee that the clocks of different processes are perfectly
synchronized. As a result, the notion of time in the system is relative and can vary across processes.
Message send and receive operations are also asynchronous. When a process sends a message, there is no
immediate guarantee of when the message will be delivered to the receiving process.
However, the system assumes that messages are delivered reliably, meaning that once a message is sent, it
will eventually be received by the intended recipient. The delivery of messages may be subject to finite but
arbitrary time delays, which can vary based on factors such as network congestion or system load.
The actions performed by a process are modeled as three types of events, namely,
internal events, message send events, and message receive events.
For a message mij that is sent by process pi to process pj, let send(mij) and rec(mij)
denote its send and receive events, respectively.
Occurrence of events changes the states of respective processes and channels, thus
causing transitions in the global system state.
For example, an internal event changes the state of the process at which it occurs. A
send event (or a receive event) changes the state of the process that sends (or receives)
the message and the state of the channel on which the message is sent (or received).
The events at a process are linearly ordered by their order of occurrence.
At any instant, the state of process pi, denoted by LSi, is a result of the sequence of all the
events executed by pi up to that instant.
For an event e and a process state LSi, e ∈ LSi iff e belongs to the sequence of events
that have taken process pi to state LSi.
For an event e and a process state LSi, e ∉ LSi iff e does not belong to the sequence of
events that have taken process pi to state LSi.
A channel’s state depends on the local states of the processes on which it is incident. If a
snapshot recording algorithm records the state of processes pi and pj as LSi and LSj,
respectively, then it must record the state of channel Cij as transit(LSi, LSj), where
transit(LSi, LSj) = { mij | send(mij) ∈ LSi ∧ rec(mij) ∉ LSj }.
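Reading transit(LSi, LSj) as the set of messages whose send event is in LSi but whose receive event is not in LSj, the channel state can be computed directly from the two local states. The sketch below is illustrative only: representing a local state as a list of `(kind, message)` pairs is an assumption, and it assumes every send in `ls_i` is on channel Cij.

```python
def transit(ls_i, ls_j):
    """Messages sent according to LSi but not yet received according to LSj."""
    sent = {m for kind, m in ls_i if kind == "send"}
    received = {m for kind, m in ls_j if kind == "rec"}
    return sent - received

# Hypothetical local states: p1 has sent m1 and m2, p2 has received only m1.
ls_1 = [("send", "m1"), ("internal", None), ("send", "m2")]
ls_2 = [("rec", "m1")]

print(transit(ls_1, ls_2))  # {'m2'}
```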
A CONSISTENT GLOBAL STATE :
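The global state of a distributed system is a collection of the local states of the processes and the
channels. Notationally, the global state GS is defined as
GS = { ∪i LSi , ∪i,j SCij }
where SCij denotes the recorded state of channel Cij.
A global state GS is a consistent global state iff it satisfies the following two conditions:
C1: send(mij) ∈ LSi ⇒ mij ∈ SCij ⊕ rec(mij) ∈ LSj (⊕ is the exclusive-or operator)
C2: send(mij) ∉ LSi ⇒ mij ∉ SCij ∧ rec(mij) ∉ LSj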
Condition C1 states the law of conservation of messages. Every message mij that is recorded as sent in
the local state of a process pi must be captured in the state of the channel Cij or in the collected local state
of the receiver process pj.
Condition C2 states that in the collected global state, for every effect, its cause must be present. If a
message mij is not recorded as sent in the local state of process pi, then it must neither be present in the
state of the channel Cij nor in the collected local state of the receiver process pj.
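The two conditions can be turned into a small checker for a single channel Cij. This is a sketch under stated assumptions rather than a reference implementation: the function name and the use of Python sets are hypothetical, and C1 is read as "exactly one of the two places" (an exclusive or), so that a message is neither lost nor counted twice.

```python
def is_consistent(sent, received, channel):
    """Check C1 and C2 for one channel Cij.

    sent     -- messages recorded as sent in LSi
    received -- messages recorded as received in LSj
    channel  -- messages recorded in the state of Cij
    """
    for m in sent | received | channel:
        in_channel, in_receiver = m in channel, m in received
        if m in sent:
            # C1: a sent message is in the channel or at the receiver, not both.
            if in_channel == in_receiver:
                return False
        elif in_channel or in_receiver:
            # C2: a message not recorded as sent must appear nowhere.
            return False
    return True

print(is_consistent({"m1"}, set(), {"m1"}))  # True: m1 is in transit
print(is_consistent({"m1"}, set(), set()))   # False: m1 was lost (violates C1)
print(is_consistent(set(), {"m1"}, set()))   # False: effect without cause (violates C2)
```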
INTERPRETATION IN TERMS OF CUTS
Cuts in a space–time diagram provide a powerful graphical aid in representing and reasoning
about the global states of a computation.
A cut is a line joining an arbitrary point on each process line that slices the space–time diagram
into a PAST and a FUTURE.
The PAST region of a cut represents the events and states that have occurred before the cut.
The FUTURE region of a cut represents the events and states that have not yet occurred at the
time of the cut.
A consistent global state corresponds to a cut in which every message received in the PAST of
the cut has been sent in the PAST of that cut. Such a cut is known as a consistent cut.
All the messages that cross the cut from the PAST to the FUTURE are captured in the
corresponding channel state.
In Figure 4.2, Cut C1 is inconsistent because message m1 is flowing from the
FUTURE to the PAST. Cut C2 is consistent and message m4 must be captured in
the state of channel C21.
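The cut rule can also be checked mechanically. The sketch below does not reproduce Figure 4.2; it uses a hypothetical two-process execution, and the data layout is an assumption: a cut maps each process to the number of its events in the PAST, and each message carries the positions of its send and receive events, counted from 1 on the respective process line.

```python
def check_cut(cut, messages):
    """Return (is_consistent, messages_in_transit) for a cut.

    cut      -- maps process name to the number of its events in the PAST
    messages -- list of (name, sender, send_pos, receiver, recv_pos)
    """
    in_transit = []
    for name, sender, send_pos, receiver, recv_pos in messages:
        sent_in_past = send_pos <= cut[sender]
        received_in_past = recv_pos <= cut[receiver]
        if received_in_past and not sent_in_past:
            return False, []           # message flows from FUTURE to PAST
        if sent_in_past and not received_in_past:
            in_transit.append(name)    # belongs to the channel state
    return True, in_transit

# Hypothetical execution: p1 sends m1 (its 2nd event), p2 receives it (1st event),
# then p2 sends m2 (2nd event) and p1 receives it (3rd event).
messages = [("m1", "p1", 2, "p2", 1), ("m2", "p2", 2, "p1", 3)]

print(check_cut({"p1": 1, "p2": 1}, messages))  # (False, [])
print(check_cut({"p1": 2, "p2": 2}, messages))  # (True, ['m2'])
```

The first cut is inconsistent because m1 is received in the PAST but sent in the FUTURE. The second is consistent, and m2 crosses it from PAST to FUTURE, so it is reported as part of the channel state.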
ISSUES IN RECORDING A GLOBAL STATE :
In the absence of a global physical clock, recording a consistent global snapshot requires
addressing two key issues:
I1: Distinguishing Messages to be Recorded: To determine which messages should be
recorded in the global snapshot (either in a channel state or a process state), conditions C1
and C2 come into play:
From C1: Any message that is sent by a process before recording its snapshot must be
recorded in the global snapshot, so that no message sent prior to the snapshot is lost.
From C2: Any message that is sent by a process after recording its snapshot must not be
recorded in the global snapshot, so that the snapshot contains no message whose send
event lies outside it.
I2: Determining the Snapshot Instant: The instant at which a process takes its
snapshot follows from condition C2:
From C2: A process pj must record its snapshot before processing a message mij
that was sent by process pi after pi recorded its snapshot.
Otherwise the local state recorded by pj would reflect the receipt of a message
whose send event is not part of the snapshot, and the recorded global state
would be inconsistent.
SNAPSHOT ALGORITHMS FOR FIFO CHANNELS
Snapshot algorithms for FIFO (First-In-First-Out) channels in distributed
systems aim to capture a consistent global snapshot of the system's state.
The best-known algorithm for recording global snapshots in systems with FIFO
channels is the Chandy-Lamport algorithm. The Lai-Yang algorithm is often
discussed alongside it, but it is designed for non-FIFO channels.
The Chandy-Lamport algorithm records consistent global snapshots in distributed
systems.
It was proposed by K. Mani Chandy and Leslie Lamport in 1985 and is designed
for systems that have FIFO channels between processes.
Chandy-Lamport algorithm:
The Chandy-Lamport algorithm uses a control message called a marker.
A marker carries no application data; it marks the point in a channel's
message stream at which the sender recorded its snapshot.
After a process has recorded its local snapshot, it sends a marker message along all
its outgoing channels before sending any further regular messages.
The purpose of these markers is to act as delimiters or separators within the channels.
By introducing markers, the algorithm distinguishes between the messages that
should be included in the snapshot (channel state or process state) and those that
should not.
Markers act as boundaries, allowing the receiving process to identify
the messages that were sent before the snapshot and must be included in the snapshot.
Messages sent after the snapshot are still delivered and processed normally, but they are left out of the snapshot.
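Chandy-Lamport algorithm (rules):
Marker sending rule for process pi:
1. Process pi records its state.
2. For each outgoing channel C on which a marker has not been sent, pi sends a marker
along C before pi sends further messages along C.
Marker receiving rule for process pj, on receiving a marker along channel C:
• If pj has not recorded its state, it records the state of C as the empty set and
executes the marker sending rule.
• Otherwise, it records the state of C as the set of messages received along C after
pj's state was recorded and before pj received the marker along C.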
These rules ensure that each process correctly records the state of channels by using
markers as delimiters.
When a process receives a marker, it determines whether it has already recorded its
state or not.
If it has not recorded its state, it records its own state, records the empty set as the
state of the channel on which the marker arrived, and sends markers along its outgoing channels.
If the state has already been recorded, it records the set of messages received after
its state was recorded but before receiving the marker.
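The marker handling described above can be tried out on the two-site bank example. What follows is a minimal single-threaded sketch, not the authors' implementation: the `Process` class, the `deliver` helper, the use of `deque` objects as FIFO channels, and the extra $80 transfer are all assumptions made for illustration. Termination detection and collection of the recorded pieces are not modelled.

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, name, balance, peers):
        self.name, self.balance, self.peers = name, balance, peers
        self.recorded_state = None               # local snapshot, once taken
        self.channel_state = {}                  # incoming channel -> recorded messages
        self.recording = {}                      # incoming channel -> still recording?
        self.out = {p: deque() for p in peers}   # FIFO outgoing channels

    def send(self, dest, amount):
        self.balance -= amount
        self.out[dest].append(amount)

    def record(self):
        """Record the local state, then send a marker on every outgoing channel."""
        self.recorded_state = self.balance
        for p in self.peers:
            self.channel_state[p] = []
            self.recording[p] = True
            self.out[p].append(MARKER)

    def receive(self, src, msg):
        if msg == MARKER:
            if self.recorded_state is None:
                self.record()                    # first marker: record state now
            self.recording[src] = False          # stop recording this channel
        else:
            if self.recorded_state is not None and self.recording[src]:
                self.channel_state[src].append(msg)   # was in transit at snapshot
            self.balance += msg

def deliver(src, dst):
    """Deliver the oldest message on the FIFO channel src -> dst."""
    dst.receive(src.name, src.out[dst.name].popleft())

s1 = Process("S1", 600, ["S2"])
s2 = Process("S2", 200, ["S1"])

s1.send("S2", 50)    # $50 is now in transit on C12
s2.record()          # S2 initiates the snapshot; a marker goes onto C21
s2.send("S1", 80)    # sent after the marker, so it must not be recorded
deliver(s2, s1)      # S1 gets the marker, records 550, sends its own marker
deliver(s1, s2)      # S2 gets the $50 and records it as the state of C12
deliver(s1, s2)      # S2 gets S1's marker and stops recording C12
deliver(s2, s1)      # S1 gets the $80 after the marker: not recorded

print(s1.recorded_state, s2.recorded_state,
      s2.channel_state["S1"], s1.channel_state["S2"])  # 550 200 [50] []
```

The recorded snapshot is A = $550, B = $200, C12 = {$50}, C21 = empty, which adds up to the $800 in the system even though the $80 transfer was already under way when S1 recorded its state.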
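Correctness: To prove the correctness of the Chandy-Lamport algorithm,
we need to show that a recorded snapshot satisfies conditions C1 and C2.
Condition C1: when a process pj receives a message mij that precedes the
marker on channel Cij, the message is captured exactly once. If pj has not yet
taken its snapshot, the effect of mij is included in pj's recorded local state;
otherwise pj records mij in the state of channel Cij.
Condition C2: a process records its state on receiving its first marker and
stops recording a channel when the marker arrives on that channel. Because the
channels are FIFO, no message sent after a marker on a channel is recorded
in the snapshot.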
Complexity:
The recording part of a single instance of the Chandy-Lamport algorithm requires
O(e) messages and O(d) time, where e is the number of edges in the network and d is
the diameter of the network.
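The O(e) message bound follows from each process sending exactly one marker on each of its outgoing channels. A small sketch, assuming channels are represented as directed `(sender, receiver)` pairs and that e counts those directed channels (both are assumptions for illustration):

```python
def marker_count(channels):
    """One marker travels along each directed channel during recording."""
    return len(set(channels))

# Hypothetical 4-process topologies.
ring = [(i, (i + 1) % 4) for i in range(4)]
complete = [(i, j) for i in range(4) for j in range(4) if i != j]

print(marker_count(ring))      # 4
print(marker_count(complete))  # 12
```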
In both these possible runs of the algorithm, the recorded global states never occurred in the execution.
This happens because a process can change its state asynchronously before the markers it sent are received by other sites and the other sites record their states.
Even so, the recorded global state is a valid state in an equivalent execution, and if a stable property holds in the system before the snapshot algorithm begins, it holds in the recorded global snapshot.