Module 2
Lesson Plan
L1: Logical time – A framework for a system of logical clocks, Scalar time
L2: Vector time
L3: Leader election algorithm – Bully Algorithm, Ring Algorithm.
L4: Global state and snapshot recording algorithms – System model and
definitions.
L5: Snapshot algorithm for FIFO channels – Chandy Lamport algorithm.
L6: Termination detection – System model of a distributed computation.
L7: Termination detection using distributed snapshots.
L8: Termination detection by weight throwing, Spanning tree-based algorithm.
Logical time
In distributed systems, it is not possible to have global physical time;
only an approximation of it can be realized.
As asynchronous distributed computations make progress in spurts, it turns
out that logical time, which advances in jumps, is sufficient to capture the
fundamental monotonicity property (order) associated with causality in
distributed systems.
Causality (or the causal precedence relation) among events in a distributed
system is a powerful concept in reasoning, analysing, and drawing inferences
about a computation.
The knowledge of the causal precedence relation among the events of
processes helps solve a variety of problems in distributed systems.
Examples of such problems are as follows:
Distributed algorithms design
Tracking of dependent events
Knowledge about the progress
In a system of logical clocks, every process has a logical clock that is advanced
using a set of rules.
Intuitively, the causal precedence (happened-before) relation is analogous to the
earlier-than relation provided by physical time.
In scalar time, the logical local clock of a process pi and its local view of the
global time are squashed into one integer variable Ci.
2. Total Ordering
Scalar clocks can be used to totally order events in a distributed
system.
Process identifiers are linearly ordered, and a tie among events with
identical scalar timestamps is broken on the basis of their process
identifiers.
The lower the process identifier in the ranking, the higher the priority.
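The tie-breaking rule above can be sketched as follows. This is a minimal illustration, not taken from the slides: the class name `ScalarClock`, the increment `d = 1`, and the update rules (increment before every event, take the maximum with the piggybacked time on receive) follow the usual scalar-clock rules and are assumptions here. Timestamps are `(Ci, pid)` pairs, so ordinary tuple comparison breaks ties in favour of the lower process identifier.

```python
class ScalarClock:
    """Scalar logical clock for one process (illustrative names)."""

    def __init__(self, pid, d=1):
        self.pid = pid   # process identifier, used only to break ties
        self.d = d       # assumed increment applied before every event
        self.c = 0       # the single integer variable Ci

    def tick(self):
        """Internal or send event: advance the clock and timestamp the event."""
        self.c += self.d
        return (self.c, self.pid)

    def receive(self, msg_time):
        """Receive event: merge the piggybacked scalar time, then advance."""
        self.c = max(self.c, msg_time)
        return self.tick()


p1, p2 = ScalarClock(1), ScalarClock(2)
a = p1.tick()            # event at p1, timestamp (1, 1)
b = p2.tick()            # event at p2, same scalar value: (1, 2)
c = p2.receive(a[0])     # p2 receives p1's message: (2, 2)

# Tuple comparison orders by scalar time first, then by process identifier.
ordered = sorted([c, b, a])
print(ordered)           # [(1, 1), (1, 2), (2, 2)]
```

Events `a` and `b` carry the same scalar value, so the process identifier decides their position in the total order.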
Basic properties
1. Isomorphism
The relation “→” induces a partial order on the set of events that are produced by a
distributed execution.
If events in a distributed system are time stamped using a system of vector clocks, we
have the following property.
Vector Time
2. Strong consistency
The system of vector clocks is strongly consistent; thus, by examining the vector
timestamps of two events, we can determine whether the events are causally related.
3. Event counting
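The properties listed above can be illustrated with a small sketch. It assumes the usual vector clock rules with an increment of 1 (a process increments its own component before each event and takes the component-wise maximum on receive); the names `VectorClock`, `happened_before` and `concurrent` are hypothetical. With an increment of 1, the sum of the components of an event's timestamp minus 1 gives the number of events that causally precede it.

```python
def happened_before(u, v):
    """u -> v iff u <= v component-wise and u differs from v."""
    return all(a <= b for a, b in zip(u, v)) and u != v


def concurrent(u, v):
    """Two events are concurrent when neither causally precedes the other."""
    return not happened_before(u, v) and not happened_before(v, u)


class VectorClock:
    """Vector clock of process i in a system of n processes (increment 1)."""

    def __init__(self, i, n):
        self.i = i
        self.vt = [0] * n

    def tick(self):
        """Internal or send event: advance own component, return timestamp."""
        self.vt[self.i] += 1
        return tuple(self.vt)

    def receive(self, other):
        """Receive event: component-wise maximum, then advance own component."""
        self.vt = [max(a, b) for a, b in zip(self.vt, other)]
        return self.tick()


p0, p1, p2 = (VectorClock(i, 3) for i in range(3))
e1 = p0.tick()           # send event at p0: (1, 0, 0)
e2 = p1.receive(e1)      # matching receive at p1: (1, 1, 0)
e3 = p2.tick()           # unrelated event at p2: (0, 0, 1)

print(happened_before(e1, e2))   # True: e1 causally precedes e2
print(concurrent(e2, e3))        # True: no causal path either way
print(sum(e2) - 1)               # 1: exactly one event (e1) precedes e2
```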
Applications
Since vector time tracks causal dependencies exactly, it finds a wide variety of
applications.
distributed debugging,
implementations of causal ordering communication,
causal distributed shared memory,
establishment of global breakpoints,
determining the consistency of checkpoints in optimistic recovery.
Leader election algorithm
An algorithm for choosing a unique process to play a particular role
(coordinator) is called an election algorithm.
Afterwards, if the process that plays the role of server wishes to retire
then another election is required to choose a replacement.
We say that a process calls the election if it takes an action that initiates
a particular run of the election algorithm.
● Ring algorithm
● Bully algorithm
1. The ring algorithm
If the arrived identifier is greater, then it forwards the message to its neighbour.
If the arrived identifier is smaller and the receiver is not a participant, then it
substitutes its own identifier in the message and forwards it; but it does not
forward the message if it is already a participant.
If, however, the received identifier is that of the receiver itself, then this process’s
identifier must be the greatest, and it becomes the coordinator.
The coordinator marks itself as a non-participant once more and sends an elected
message to its neighbour, announcing its election and enclosing its identity.
1. Initially, every process is marked as non-participant. Any process can begin an election.
2. The starting process marks itself as participant and places its identifier in a message to its neighbour.
3. A process receives a message and compares the identifier in it with its own. If the arrived identifier is larger, it passes on the message.
4. If the arrived identifier is smaller and the receiver is not a participant, it substitutes its own identifier in the message and forwards it. It does not forward the message if it is already a participant.
5. On forwarding in any case, the process marks itself as a participant.
6. If the received identifier is that of the receiver itself, then this process’s identifier must be the greatest, and it becomes the coordinator.
7. The coordinator marks itself as non-participant, sets electedi and sends an elected message to its neighbour enclosing its ID.
8. When a process receives an elected message, it marks itself as a non-participant, sets its variable electedi and forwards the message.
[Figure: A ring-based election in progress. Labels shown in the ring: 3, 17, 4, 24, 9, 1, 15, 28, 24.]
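The steps above can be sketched as a small simulation for a single initiator. The function name `ring_election`, the example ring and the message counting are assumptions made for illustration; in particular, the sketch assumes the elected message stops when it returns to the coordinator, so it is counted once per process.

```python
def ring_election(ring, starter):
    """Simulate one ring election started by `starter`.

    `ring` lists the process identifiers in the order messages travel.
    Returns (coordinator, election_messages, elected_messages), or None if
    the message is discarded by a participant (step 4).
    """
    n = len(ring)
    participant = [False] * n
    pos = ring.index(starter)
    participant[pos] = True          # step 2: starter becomes a participant
    carried = starter                # identifier currently in the message
    election_msgs = 0
    while True:
        pos = (pos + 1) % n          # deliver the message to the neighbour
        election_msgs += 1
        me = ring[pos]
        if carried == me:            # step 6: own identifier came back
            break
        if carried > me:             # step 3: pass the message on unchanged
            participant[pos] = True
        elif participant[pos]:       # step 4: a participant discards it;
            return None              # not reached with a single initiator
        else:                        # step 4: substitute own identifier
            carried = me
            participant[pos] = True
    # Steps 7 and 8: the elected message visits every process once
    # (assumption: it is not forwarded again by the coordinator).
    return ring[pos], election_msgs, n


print(ring_election([5, 12, 3, 9], starter=3))   # (12, 7, 4)
```

In this hypothetical ring of four processes, the highest identifier sits just before the starter, so the election message needs almost two full circuits before process 12 sees its own identifier again.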
2. The bully algorithm
The process with the highest ID becomes the coordinator.
There are three types of message in this algorithm: election, answer and coordinator.
The process that knows it has the highest identifier can elect itself as
the coordinator simply by sending a coordinator message to all
processes with lower identifiers.
● Eventually, all processes give up but one, and that one is the new
coordinator.
● It holds an election.
1. The process begins an election by sending an election message to those processes that have a higher ID and awaits an answer in response.
2. If none arrives within time T, the process considers itself the coordinator and sends a coordinator message to all processes with lower identifiers.
3. Otherwise, it waits a further time T’ for a coordinator message to arrive. If none arrives, it begins another election.
4. If a process receives a coordinator message, it sets its variable electedi to be the coordinator ID.
5. If a process receives an election message, it sends back an answer message and begins another election unless it has begun one already.
[Figure: The bully algorithm. Stages 1 to 4 with processes p1 to p4 exchanging election, answer and coordinator messages; a timeout occurs in stage 3, and C marks the coordinator.]
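The five steps above can be sketched as a sequential simulation that counts messages. This is a simplification: the real algorithm is asynchronous and relies on the timeouts T and T’, whereas here a crashed process is simply one that never answers. The function name `bully_election` and the counting conventions (election messages are also counted when sent to crashed processes) are assumptions.

```python
def bully_election(pids, crashed, initiator):
    """Simulate the bully algorithm and count the messages of each type.

    Returns (coordinator, counts) where counts maps the message type
    ("election", "answer", "coordinator") to the number sent.
    """
    counts = {"election": 0, "answer": 0, "coordinator": 0}
    started = set()              # processes that have already begun an election
    pending = [initiator]
    coordinator = None
    while pending:
        p = pending.pop(0)
        if p in started:         # step 5: do not begin a second election
            continue
        started.add(p)
        higher = [q for q in pids if q > p]
        counts["election"] += len(higher)          # step 1
        responders = [q for q in higher if q not in crashed]
        counts["answer"] += len(responders)        # step 5: answer messages
        if responders:
            pending.extend(responders)             # each begins an election
        else:
            coordinator = p                        # step 2: no answer in time T
            counts["coordinator"] += sum(1 for q in pids if q < p)
    return coordinator, counts


# Hypothetical run: four processes, p4 has crashed, p1 calls the election.
print(bully_election([1, 2, 3, 4], crashed={4}, initiator=1))
# (3, {'election': 6, 'answer': 3, 'coordinator': 2})
```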
Ring algorithm – work out
• In a ring topology, 7 processes are connected with different
IDs as shown: P20->P5->P10->P18->P3->P16->P9. If process
P10 initiates an election, after how many message passes will the
coordinator be elected and known to all the processes? What
modification will take place to the election message as it
passes through all the processes? Calculate the total number of
election messages and coordinator messages.
[Figure: ring of processes P20, P5, P10, P18, P3, P16, P9]
Bully Algorithm – Work out
• The process IDs are 0, 4, 2, 1, 5, 6, 3, 7. P7 was the initial coordinator and
has crashed. Illustrate the bully algorithm if P4 initiates the election, and
calculate the total number of election messages and coordinator
messages.
Global state and snapshot recording algorithms
Recording the global state of a distributed system on-the-fly is an important
paradigm when one is interested in analyzing, testing, or verifying properties
associated with distributed execution
Unfortunately, the lack of both a globally shared memory and a global clock in a
distributed system, added to the fact that message transfer delays in these systems are
finite but unpredictable, makes this problem non-trivial.
The state of a process is characterized by the state of its local memory and a history
of its activity.
The state of a channel is characterized by the set of messages sent along the channel
The global state of a distributed system is a collection of the local states of its
components
For failure recovery, a global state of the distributed system (called a checkpoint)
is periodically saved, and recovery from a processor failure is done by restoring
the system to the last saved global state.
If shared memory were available, an up-to-date state of the entire system would
be available to the processes sharing the memory.
System model
The system consists of a collection of n processes, p1, p2, ..., pn, that are
connected by channels.
There is no globally shared memory and processes communicate solely by
passing messages.
There is no physical global clock in the system. Message send and receive are
asynchronous.
Messages are delivered reliably with finite but arbitrary time delay.
The actions performed by a process are modeled as three types of events, namely,
internal events, message send events, and message receive events.
For a message mij that is sent by process pi to process pj, let send(mij) and rec(mij)
denote its send and receive events, respectively.
Occurrence of events changes the states of respective processes and channels, thus
causing transitions in the global system state
For example, an internal event changes the state of the process at which it
occurs.
A send event (or a receive event) changes the state of the process that sends
(or receives) the message and the state of the channel on which the message
is sent (or received).
At any instant, the state of process pi, denoted by LSi, is a result of the
sequence of all the events executed by pi up to that instant
A consistent global state
The global state of a distributed system is a collection of the local states of the
processes and the channels. Notationally, global state GS is defined as
GS = {∪i LSi, ∪i,j SCij}, where SCij denotes the state of the channel from pi to pj.
A cut is a line joining an arbitrary point on each process line that slices the
space–time diagram into a PAST and a FUTURE.
All the messages that cross the cut from the PAST to the FUTURE are captured
in the corresponding channel state.
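The cut interpretation can be made concrete with a short sketch. It relies on the standard consistency condition, which is assumed here rather than quoted from the slides: a cut is consistent if no message is received in the PAST but sent in the FUTURE. The function names, the dictionary layout of a cut and the 1-based event indices are all hypothetical.

```python
def is_consistent(cut, messages):
    """Check a cut for consistency.

    cut[p] is the number of events of process p that lie in the PAST.
    Each message is (sender, send_index, receiver, recv_index), where the
    indices are 1-based positions of the events on their process lines.
    """
    for sender, s_idx, receiver, r_idx in messages:
        received_in_past = r_idx <= cut[receiver]
        sent_in_past = s_idx <= cut[sender]
        if received_in_past and not sent_in_past:
            return False         # message crosses from FUTURE to PAST
    return True


def in_transit(cut, messages):
    """Messages crossing from PAST to FUTURE: they form the channel state."""
    return [m for m in messages
            if m[1] <= cut[m[0]] and m[3] > cut[m[2]]]


# One message: sent as the 2nd event of p1, received as the 3rd event of p2.
msgs = [("p1", 2, "p2", 3)]

good_cut = {"p1": 2, "p2": 2}    # send is in the PAST, receive in the FUTURE
bad_cut = {"p1": 1, "p2": 3}     # receive is in the PAST, send in the FUTURE

print(is_consistent(good_cut, msgs), in_transit(good_cut, msgs))
print(is_consistent(bad_cut, msgs))
```

The first cut is consistent and records the message as part of the channel state; the second records a receive without its send, so it is not a consistent global state.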
Interpretation in terms of cuts
If a global physical clock were available, the following simple procedure could
be used to record a consistent global snapshot of a distributed system.
In this, the initiator of the snapshot collection decides a future time at which
the snapshot is to be taken and broadcasts this time to every process.
All processes take their local snapshots at that instant in the global time.
After a site has recorded its snapshot, it sends a marker along all of its outgoing
channels before sending out any more messages.
Since channels are FIFO, a marker separates the messages in the channel into
those to be included in the snapshot (i.e., channel state or process state) from
those not to be recorded in the snapshot.
The algorithm
If the process has not yet recorded its local state, it records the state of the
channel on which the marker is received as empty and executes the marker
sending rule to record its local state. Otherwise, the state of the incoming channel
on which the marker is received is recorded as the set of computation messages
received on that channel after the process recorded its local state but before it
received the marker.
The algorithm can be initiated by any process by executing the marker sending
rule.
The algorithm terminates after each process has received a marker on all of its
incoming channels.
The recorded local snapshots can be put together to create the global snapshot
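The marker sending and marker receiving rules can be sketched with two processes that transfer money over FIFO channels. Everything specific here is an assumption made for illustration: the `Process` class, the account balances used as local state, and the hand-chosen delivery order. FIFO channels are modelled as `deque` objects.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance   # hypothetical application state
        self.out = {}            # peer name -> outgoing FIFO channel
        self.inc = {}            # peer name -> incoming FIFO channel
        self.recorded = None     # recorded local state
        self.chan_state = {}     # peer name -> recorded incoming messages
        self.open = set()        # incoming channels still being recorded

    def send(self, peer, amount):
        self.balance -= amount
        self.out[peer].append(amount)

    def record(self):
        """Marker sending rule: record state, then send a marker on every
        outgoing channel before any further message."""
        self.recorded = self.balance
        self.open = set(self.inc)
        self.chan_state = {p: [] for p in self.inc}
        for channel in self.out.values():
            channel.append(MARKER)

    def deliver(self, peer):
        """Receive the next message on the channel from `peer`."""
        msg = self.inc[peer].popleft()
        if msg == MARKER:
            if self.recorded is None:
                self.record()            # this channel is recorded as empty
            self.open.discard(peer)      # stop recording this channel
        else:
            self.balance += msg
            if self.recorded is not None and peer in self.open:
                self.chan_state[peer].append(msg)


def connect(a, b):
    """Create one FIFO channel in each direction between a and b."""
    ab, ba = deque(), deque()
    a.out[b.name], b.inc[a.name] = ab, ab
    b.out[a.name], a.inc[b.name] = ba, ba


p, q = Process("p", 100), Process("q", 50)
connect(p, q)
q.send("p", 20)      # in transit when the snapshot starts
p.record()           # p initiates the snapshot and records 100
p.send("q", 10)      # sent after the marker, so outside the snapshot
q.deliver("p")       # marker arrives: q records 30, channel p->q is empty
q.deliver("p")       # the 10 arrives after the marker, not recorded
p.deliver("q")       # the 20 arrives before q's marker, recorded
p.deliver("q")       # marker arrives: p stops recording channel q->p
total = (p.recorded + q.recorded
         + sum(p.chan_state["q"]) + sum(q.chan_state["p"]))
print(p.recorded, q.recorded, p.chan_state, q.chan_state, total)
```

The recorded states add up to 150, the amount of money in the system, even though no single instant of the run had balances of 100 and 30 with 20 in transit. Both processes have received a marker on every incoming channel, so the algorithm has terminated.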
Snapshot algorithms for FIFO channels
Chandy–Lamport algorithm
Termination Detection
In distributed processing systems, a problem is typically solved in a distributed
manner with the cooperation of a number of processes.
All messages are received correctly after an arbitrary but finite delay.
Messages sent over the same communication channel may not obey the FIFO
ordering.
3. An idle process can become active only on the receipt of a message from
another process
When a computation terminates, there must exist a unique process which became
idle last.
When a process goes from active to idle, it issues a request to all other processes to
take a local snapshot, and also requests itself to take a local snapshot.
When a process receives the request, if it agrees that the requester became idle
before itself, it grants the request by taking a local snapshot for the request.
A request is said to be successful if all processes have taken a local snapshot for it.
The requester or any external agent may collect all the local snapshots of a request.
Termination detection using distributed snapshots
Informal description
In the recorded snapshot, all the processes are idle and there is no message in
transit to any of the processes.
2. Termination detection by weight throwing
In termination detection by weight throwing, a process called controlling
agent monitors the computation.
The weight at each process is zero and the weight at the controlling agent is 1.
The computation starts when the controlling agent sends a basic message to
one of the processes.
Thus, the sum of weights on all the processes and on all the messages in
transit is always 1.
When a process becomes passive, it sends its weight to the controlling agent
in a control message, which the controlling agent adds to its weight.
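The weight bookkeeping described above can be sketched with exact fractions. The names `Node`, `send_basic`, `receive_basic` and `go_passive` are hypothetical, and splitting the weight in half on every send is only one possible choice; the sketch also assumes that the controlling agent concludes termination when its weight is back to 1.

```python
from fractions import Fraction


class Node:
    """A process or the controlling agent, holding a weight."""

    def __init__(self, weight=Fraction(0)):
        self.weight = weight


def send_basic(sender):
    """Split the sender's weight; the returned part travels with the message."""
    half = sender.weight / 2
    sender.weight -= half
    return half


def receive_basic(receiver, w):
    """On receipt of a basic message, add its weight to the receiver."""
    receiver.weight += w


def go_passive(proc, agent):
    """Send all weight back to the controlling agent in a control message."""
    agent.weight += proc.weight
    proc.weight = Fraction(0)


agent = Node(Fraction(1))
p1, p2 = Node(), Node()

receive_basic(p1, send_basic(agent))   # agent keeps 1/2, p1 holds 1/2
receive_basic(p2, send_basic(p1))      # p1 keeps 1/4, p2 holds 1/4
total = agent.weight + p1.weight + p2.weight   # invariant: always 1

go_passive(p2, agent)
still_running = agent.weight != 1      # True: p1 still holds weight
go_passive(p1, agent)
terminated = agent.weight == 1         # True: all weight has returned

print(total, still_running, terminated)   # 1 True True
```

Using `Fraction` avoids the rounding errors that floating-point halving would eventually introduce into the invariant.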
The edges of the graph represent the communication channels, through which a
process sends messages to neighbouring processes in the graph.
The algorithm uses a fixed spanning tree of the graph with process P0 at its root which
is responsible for termination detection
Process P0 communicates with other processes to determine their states and the
messages used for this purpose are called signals.
A parent node will similarly report to its parent when it has completed processing and
all of its immediate children have terminated, and so on.
The root concludes that termination has occurred, if it has terminated and all of its
immediate children have also terminated
3. A spanning-tree-based termination detection algorithm
The termination detection algorithm generates two waves of signals moving
inward and outward through the spanning tree.
If this token wave reaches the root without discovering that termination has
occurred, the root initiates a second outward wave of repeat signals.
As this repeat wave reaches leaves, the token wave gradually forms and starts
moving inward again.
• Each leaf process, after it has terminated, sends its token to its parent.
• When a parent process terminates and after it has received a token from
each of its children, it sends a token to its parent.
• This way, each process indicates to its parent process that the subtree
below it has become idle.
• The root of the tree concludes that termination has occurred, after it has
become idle and has received a token from each of its children.
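The inward token wave described in the bullets above can be sketched as a recursive walk over the tree. The function name `inward_wave`, the dictionary representation of the tree and the example tree are assumptions, and the sketch covers only the inward wave: it does not model repeat signals or messages that reactivate a process after it has sent its token.

```python
def inward_wave(children, idle, root=0):
    """Run one inward token wave over a fixed spanning tree.

    children maps a node to the list of its children; idle maps a node to
    True if it has terminated. Returns (terminated, senders), where senders
    lists the nodes that sent a token to their parent, in sending order.
    """
    senders = []

    def collect(node):
        # Visit every child (a list, so no subtree is skipped).
        got_all = all([collect(c) for c in children.get(node, [])])
        if idle[node] and got_all:
            if node != root:
                senders.append(node)   # token goes to the parent
            return True
        return False

    return collect(root), senders


# Hypothetical tree: 0 is the root, 1 and 2 are internal, 3 to 6 are leaves.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}

all_idle = {n: True for n in range(7)}
print(inward_wave(tree, all_idle))     # (True, [3, 4, 1, 5, 6, 2])

node2_busy = dict(all_idle)
node2_busy[2] = False
print(inward_wave(tree, node2_busy))   # (False, [3, 4, 1, 5, 6])
```

In the second run node 2 is still active, so it holds back its token and the root cannot conclude that termination has occurred.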
Spanning Tree Workout
• Apply the spanning-tree-based termination
detection algorithm in the following scenario.
The nodes are processes 0 to 6. Leaf nodes 3,
4, 5, and 6 are each given tokens T3, T4, T5
and T6 respectively. Leaf nodes 3, 4, 5 and 6
terminate in that order, but before terminating,
node 5 sends a message to node 1.