Module 2

The document summarizes logical time and distributed computing algorithms, including: 1. Logical time provides a framework to capture causality in distributed systems using logical clocks instead of physical time. Vector clocks can precisely determine causality between events. 2. Leader election algorithms like the Bully algorithm and ring-based algorithm elect a coordinator process. The ring-based algorithm passes a message clockwise around a ring to elect the process with the highest ID. 3. Algorithms like Chandy-Lamport are used to generate global snapshots of a distributed system to detect termination of distributed computations.


CST 402 - DISTRIBUTED COMPUTING

Module – II
Lesson Plan
 L1: Logical time – A framework for a system of logical clocks, Scalar time
 L2: Vector time
 L3: Leader election algorithm – Bully Algorithm, Ring Algorithm.
 L4: Global state and snapshot recording algorithms – System model and
definitions.
 L5: Snapshot algorithm for FIFO channels – Chandy Lamport algorithm.
 L6: Termination detection – System model of a distributed computation
 L7: Termination detection using distributed snapshots.
 L8: Termination detection by weight throwing, Spanning tree-based algorithm
Logical time
 In distributed systems, it is not possible to have global physical time; it is possible to realize only an approximation of it.
 As asynchronous distributed computations make progress in spurts, it turns
out that logical time, which advances in jumps, is sufficient to capture the
fundamental monotonicity property (order) associated with causality in
distributed systems.
 Causality (or the causal precedence relation) among events in a distributed
system is a powerful concept in reasoning, analysing, and drawing inferences
about a computation.
 The knowledge of the causal precedence relation among the events of
processes helps solve a variety of problems in distributed systems.
Logical time
 Examples of some of these problems are as follows:
Distributed algorithms design
Tracking of dependent events
Knowledge about the progress

 The concept of causality is widely used by human beings, often
unconsciously, in planning, scheduling, and execution.

 In day-to-day life, the global time used to deduce the causality relation is
obtained from loosely synchronized clocks (i.e., wrist watches, wall clocks).

 However, in distributed computing systems, the rate of occurrence of events
is several magnitudes higher and the event execution time is several
magnitudes smaller.

 Consequently, if the physical clocks are not precisely synchronized, the
causality relation between events may not be accurately captured.
Logical time
 Network Time Protocol, which can maintain time accurate to a few tens of
milliseconds on the Internet, is not adequate to capture the causality
relation in distributed systems.

 However, in a distributed computation, progress is generally made in spurts
and the interaction between processes occurs in spurts.

 In a system of logical clocks, every process has a logical clock that is advanced
using a set of rules.

 Every event is assigned a timestamp, and the causality relation between
events can generally be inferred from their timestamps.

 The timestamps assigned to events obey the fundamental monotonicity
property; that is, if an event a causally affects an event b, then the timestamp
of a is smaller than the timestamp of b.
A framework for a system of logical clocks
Definition:
A system of logical clocks consists of a time domain T and a logical clock C.
Elements of T form a partially ordered set over a relation <.

This relation is usually called the happened before or causal precedence relation.

Intuitively, this relation is analogous to the earlier than relation provided by
physical time.

The logical clock C is a function that maps an event e in a distributed system to
an element in the time domain T, denoted as C(e) and called the timestamp of e,
and is defined as follows: for two events e1 and e2,
e1 → e2 ⟹ C(e1) < C(e2) (the consistency condition).
A framework for a system of logical clocks
Implementing logical clocks
Implementation of logical clocks requires addressing two issues:

1. data structures local to every process to represent logical time;

2. a protocol (set of rules) to update the data structures to ensure the
consistency condition.
Scalar Time
 The scalar time representation was proposed by Lamport in 1978 as an attempt
to totally order events in a distributed system.

 The time domain in this representation is the set of non-negative integers.

 The logical local clock of a process pi and its local view of the global time are
squashed into one integer variable Ci.

 Rules R1 and R2 to update the clocks are as follows:
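The rules themselves appear on the slide as a figure; the standard Lamport formulation is: R1 — before executing any event (send, receive, or internal), process pi updates Ci := Ci + d (d > 0, usually d = 1); R2 — on receiving a message with timestamp Cmsg, pi sets Ci := max(Ci, Cmsg) and then executes R1 before delivering the message. A minimal sketch of these two rules (class and method names are illustrative, not from the slides):

```python
class ScalarClock:
    """Lamport scalar clock with increment d = 1 (names are illustrative)."""
    def __init__(self):
        self.c = 0

    def tick(self):             # R1: executed before any event
        self.c += 1
        return self.c

    def send(self):             # the timestamp travels with the message
        return self.tick()

    def receive(self, c_msg):   # R2: take the max, then apply R1
        self.c = max(self.c, c_msg)
        return self.tick()

p1, p2 = ScalarClock(), ScalarClock()
t_send = p1.send()           # p1's send event gets timestamp 1
t_recv = p2.receive(t_send)  # p2's receive event gets timestamp 2 > 1
```

Because R2 runs before the receive event is timestamped, a receive always gets a larger timestamp than the corresponding send, which is exactly the monotonicity property described above.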
Scalar Time
Basic properties
1. Consistency property

2. Total ordering
 Scalar clocks can be used to totally order events in a distributed system.

 The main problem in totally ordering events is that two or more events at
different processes may have an identical timestamp.
Scalar Time
 For two events with identical scalar timestamps, a tie-breaking mechanism is
needed to order them.

 A tie is broken as follows: process identifiers are linearly ordered, and a tie
among events with identical scalar timestamps is broken on the basis of their
process identifiers.

 The lower the process identifier in the ranking, the higher the priority.

 The timestamp of an event is denoted by a tuple (t, i), where t is its time of
occurrence and i is the identity of the process where it occurred.
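The resulting total order can be sketched directly: (t, i) tuples are compared lexicographically, so the scalar time decides first and the process identifier breaks ties (a sketch; the function name is illustrative):

```python
# Total order on events: timestamps are (t, i) tuples; compare the scalar
# time first, then break ties by process identifier (lower id = higher priority).
def precedes(ts_a, ts_b):
    return ts_a < ts_b      # Python compares tuples lexicographically

# (5, 2) and (5, 3) have the same scalar time; process 2 is ordered first.
```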
Scalar Time
3. Event counting
 If the increment value d is always 1, scalar time has the event-counting
property: the timestamp of an event e gives the number of events on the
longest causal chain ending at e.
4. No strong consistency
 The system of scalar clocks is not strongly consistent; that is, for two events
ei and ej, C(ei) < C(ej) does not imply ei → ej.
Vector Time
 The system of vector clocks was developed independently by Fidge,
Mattern, and Schmuck.

 In the system of vector clocks, the time domain is represented by a set of
n-dimensional non-negative integer vectors.

 A vector clock is a data structure used for determining the partial
ordering of events in a distributed system and detecting causality
violations.

 Vector clocks are used in distributed systems to determine whether
pairs of events are causally related.
Vector Time

Basic properties

1. Isomorphism

The relation “→” induces a partial order on the set of events that are produced by a
distributed execution.

If events in a distributed system are time stamped using a system of vector clocks, we
have the following property.
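The standard vector clock rules (each process increments its own component on every event, piggybacks its vector on sends, and takes a component-wise maximum on receives) and the induced partial order can be sketched as follows (names are illustrative):

```python
class VectorClock:
    """n-dimensional vector clock for process i (names are illustrative)."""
    def __init__(self, n, i):
        self.v = [0] * n
        self.i = i

    def event(self):                 # internal event: bump own component
        self.v[self.i] += 1
        return list(self.v)

    def send(self):                  # vector timestamp piggybacked on message
        return self.event()

    def receive(self, v_msg):        # component-wise max, then bump own entry
        self.v = [max(a, b) for a, b in zip(self.v, v_msg)]
        return self.event()

def happened_before(u, v):
    """u -> v iff u <= v component-wise and u != v (strong consistency)."""
    return all(a <= b for a, b in zip(u, v)) and u != v

def concurrent(u, v):
    return not happened_before(u, v) and not happened_before(v, u)

# p0 sends to p1, while an unrelated event happens at p2.
p0, p1, p2 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
ts_send = p0.send()            # [1, 0, 0]
ts_recv = p1.receive(ts_send)  # [1, 1, 0]
ts_other = p2.event()          # [0, 0, 1]
```

Unlike scalar clocks, comparing two vector timestamps is enough to decide whether the events are ordered or concurrent, which is the strong consistency property stated next.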
Vector Time

2. Strong consistency

The system of vector clocks is strongly consistent; thus, by examining the vector
timestamps of two events, we can determine if the events are causally related.

3. Event counting
If d is always 1, then the ith component of the vector clock at process pi, vti[i],
denotes the number of events that have occurred at pi up to that instant.
Vector Time
Applications

Since vector time tracks causal dependencies exactly, it finds a wide variety of
applications.

 distributed debugging,
 implementations of causal ordering communication
 causal distributed shared memory,
 establishment of global breakpoints
 determining the consistency of checkpoints in optimistic recovery
Leader election algorithm
 An algorithm for choosing a unique process to play a particular role
(coordinator) is called an election algorithm.

 It is essential that all the processes agree on the choice.

 Afterwards, if the process that plays the role of server wishes to retire,
then another election is required to choose a replacement.

 We say that a process calls the election if it takes an action that initiates
a particular run of the election algorithm.

 At any point in time, a process Pi is either a participant, meaning that it
is engaged in some run of the election algorithm, or a non-participant,
meaning that it is not currently engaged in any election.
Two algorithms:
● A ring-based election algorithm

● Bully algorithm

1. A ring-based election algorithm
Each process pi has a communication channel to the next process in
the ring, p(i+1) mod N, and all messages are sent clockwise around the ring.

The goal of this algorithm is to elect a single process called the
coordinator.

Initially, every process is marked as a non-participant in an election.

 Any process can begin an election. It proceeds by marking itself as a participant,
placing its identifier in an election message and sending it to its clockwise
neighbour.

 When a process receives an election message, it compares the identifier in the
message with its own.

 If the arrived identifier is greater, then it forwards the message to its neighbour.

 If the arrived identifier is smaller and the receiver is not a participant, then it
substitutes its own identifier in the message and forwards it; but it does not
forward the message if it is already a participant.

 On forwarding an election message in any case, the process marks itself as a
participant.

 If, however, the received identifier is that of the receiver itself, then this process’s
identifier must be the greatest, and it becomes the coordinator.

 The coordinator marks itself as a non-participant once more and sends an elected
message to its neighbour, announcing its election and enclosing its identity.
A ring-based election in progress
[Figure: a ring of processes with identifiers 3, 17, 4, 24, 9, 1, 15 and 28; an
election message carrying identifier 24 is in transit.]
1. Initially, every process is marked as non-participant. Any process can begin
an election.
2. The starting process marks itself as participant and places its identifier in a
message to its neighbour.
3. A process receives a message and compares it with its own identifier. If the
arrived identifier is larger, it passes on the message.
4. If the arrived identifier is smaller and the receiver is not a participant, it
substitutes its own identifier in the message and forwards it. It does not
forward the message if it is already a participant.
5. On forwarding in any case, the process marks itself as a participant.
6. If the received identifier is that of the receiver itself, then this process’s
identifier must be the greatest, and it becomes the coordinator.
7. The coordinator marks itself as non-participant, sets electedi and sends an
elected message to its neighbour enclosing its ID.
8. When a process receives an elected message, it marks itself as a
non-participant, sets its variable electedi and forwards the message.
2. The bully algorithm
The process with the highest identifier will be the coordinator.
There are three types of message in this algorithm:

an election message is sent to announce an election;

an answer message is sent in response to an election message;

a coordinator message is sent to announce the identity of the elected
process.

The process that knows it has the highest identifier can elect itself as
the coordinator simply by sending a coordinator message to all
processes with lower identifiers.

On the other hand, a process with a lower identifier can begin an
election by sending an election message to those processes that have a
higher identifier, and awaiting answer messages in response.

If none arrives within time T, the process considers itself the
coordinator and sends a coordinator message to all processes with
lower identifiers announcing this.

Otherwise, the process waits a further period T for a coordinator
message to arrive from the new coordinator.

If a process pi receives a coordinator message, it sets its variable
electedi to the identifier of the coordinator contained within it and
treats that process as the coordinator.

If a process receives an election message, it sends back an answer
message and begins another election, unless it has begun one already.
When a process P notices that the coordinator is no longer
responding to requests, it initiates an election.

● P sends an ELECTION message to all processes with higher numbers.

● If no one responds, P wins the election and becomes the coordinator.

● If one of the higher-ups answers, it takes over; P’s job is done.

When a process gets an ELECTION message from one of its
lower-numbered colleagues:

● The receiver sends an OK message back to the sender to indicate that
it is alive and will take over.

● The receiver holds an election, unless it is already holding one.

● Eventually, all processes give up but one, and that one is the new
coordinator.

● The new coordinator announces its victory by sending all processes a
message telling them that starting immediately it is the new
coordinator.

If a process that was previously down comes back:

● It holds an election.

● If it happens to be the highest process currently running, it will win
the election and take over the coordinator’s job.
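A rough way to check the message counts in a run is to simulate the exchange. The sketch below is a simplification under stated assumptions: a single initiator, crashed processes never answer, and the coordinator message goes to the alive lower-numbered processes; it only counts messages and does not model timeouts:

```python
def bully_election(all_ids, alive, initiator):
    """Count messages in one bully-algorithm run (single initiator;
    crashed processes never answer). Returns (coordinator, n_election,
    n_answer, n_coordinator)."""
    n_election = n_answer = 0
    started, frontier = set(), [initiator]
    while frontier:
        p = frontier.pop()
        if p in started:
            continue
        started.add(p)
        higher = [q for q in all_ids if q > p]
        n_election += len(higher)        # ELECTION sent to every higher id
        for q in higher:
            if q in alive:
                n_answer += 1            # alive higher process answers (OK)
                frontier.append(q)       # ...and holds its own election
    coordinator = max(alive)
    # COORDINATOR announcement to the alive lower-numbered processes.
    n_coordinator = sum(1 for q in alive if q < coordinator)
    return coordinator, n_election, n_answer, n_coordinator

# Identifiers 0..7, process 7 has crashed, process 4 initiates.
result = bully_election(set(range(8)), set(range(7)), 4)
```

With these assumptions the run elects process 6 after 6 election messages and 3 answers, followed by 6 coordinator messages; exact counts in the literature vary with how answers to crashed processes and duplicate elections are counted.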
The bully algorithm
[Figure: four stages of a bully election among processes p1–p4, showing
election, answer and coordinator messages and a timeout.]
1. A process begins an election by sending an election message to those
processes that have a higher ID and awaits an answer in response.
2. If none arrives within time T, the process considers itself the coordinator
and sends a coordinator message to all processes with lower identifiers.
3. Otherwise, it waits a further time T′ for a coordinator message to arrive.
If none arrives, it begins another election.
4. If a process receives a coordinator message, it sets its variable electedi to
be the coordinator ID.
5. If a process receives an election message, it sends back an answer
message and begins another election, unless it has begun one already.
Ring algorithm – work out
• In a ring topology, 7 processes are connected with different
IDs as shown: P20 -> P5 -> P10 -> P18 -> P3 -> P16 -> P9. If process
P10 initiates an election, after how many message passes will the
coordinator be elected and known to all the processes? What
modification will take place to the election message as it
passes through all the processes? Calculate the total number of
election messages and coordinator messages.
[Figure: the ring P20 -> P5 -> P10 -> P18 -> P3 -> P16 -> P9.]
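One way to check the counts by hand is to simulate the ring. The sketch below assumes a single initiator and counts one message per hop; the elected message then makes one full round of the ring (function and variable names are illustrative):

```python
def ring_election(ring, starter):
    """Simulate the ring algorithm. `ring` lists identifiers in clockwise
    order; one message is counted per hop; a single initiator is assumed.
    Returns (coordinator, election_messages, elected_messages)."""
    n = len(ring)
    pos = (ring.index(starter) + 1) % n
    token, hops = starter, 0
    while True:
        hops += 1                     # election message arrives at ring[pos]
        if token == ring[pos]:        # own id came back: highest in the ring
            return token, hops, n     # elected message makes one full round
        if token < ring[pos]:
            token = ring[pos]         # substitute the larger identifier
        pos = (pos + 1) % n

# Workout ring, clockwise: P20 -> P5 -> P10 -> P18 -> P3 -> P16 -> P9.
coord, e_msgs, c_msgs = ring_election([20, 5, 10, 18, 3, 16, 9], 10)
```

With this counting, the election message starts out carrying 10, is replaced by 18 and then by 20 as it passes larger identifiers, and P20's own identifier returns to it after 12 hops; 7 elected (coordinator) messages then inform every process, for 19 messages in total.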
Bully Algorithm – Work out
• Pids 0, 4, 2, 1, 5, 6, 3, 7; P7 was the initial coordinator and
crashed. Illustrate the bully algorithm if P4 initiates an election.
Calculate the total number of election messages and coordinator
messages.
Global state and snapshot recording algorithms
 Recording the global state of a distributed system on-the-fly is an important
paradigm when one is interested in analyzing, testing, or verifying properties
associated with distributed execution.

 Unfortunately, the lack of both a globally shared memory and a global clock in a
distributed system, added to the fact that message transfer delays in these systems are
finite but unpredictable, makes this problem non-trivial.

 A distributed computing system consists of spatially separated processes that do not
share a common memory and communicate asynchronously with each other by
message passing over communication channels.

 Each component of a distributed system has a local state.

 The state of a process is characterized by the state of its local memory and a history
of its activity.

 The state of a channel is characterized by the set of messages sent along the channel
less the messages received along the channel.
Global state and snapshot recording algorithms
 The global state of a distributed system is a collection of the local states of its
components.

 Recording the global state of a distributed system is an important paradigm and
it finds applications in several aspects of distributed system design.

 For example, in detection of stable properties such as deadlocks and
termination, the global state of the system is examined for certain properties;

 for failure recovery, a global state of the distributed system (called a checkpoint)
is periodically saved and recovery from a processor failure is done by restoring
the system to the last saved global state.

 If shared memory were available, an up-to-date state of the entire system would
be available to the processes sharing the memory.

 The absence of shared memory necessitates ways of getting a coherent and
complete view of the system based on the local states of individual processes.
Global state and snapshot recording algorithms
 A meaningful global snapshot can be obtained if the components of the
distributed system record their local states at the same time

System model
 The system consists of a collection of n processes, p1, p2, …, pn, that are
connected by channels.
 There is no globally shared memory and processes communicate solely by
passing messages.
 There is no physical global clock in the system. Message send and receive is
asynchronous.
 Messages are delivered reliably with finite but arbitrary time delay.

 The system can be described as a directed graph in which vertices represent


the processes and edges represent unidirectional communication channels.
Global state and snapshot recording algorithms
System model
 Let Cij denote the channel from process pi to process pj.

 Processes and channels have states associated with them.

 The state of a process at any time is defined by the contents of processor
registers, stacks, and local memory.

 The state of channel Cij, denoted by SCij, is given by the set of messages in
transit in the channel.

 The actions performed by a process are modeled as three types of events, namely,
internal events, message send events, and message receive events.

 For a message mij that is sent by process pi to process pj, let send(mij) and rec(mij)
denote its send and receive events, respectively.

 The occurrence of events changes the states of respective processes and channels, thus
causing transitions in the global system state.
Global state and snapshot recording algorithms
 For example, an internal event changes the state of the process at which it
occurs.

 A send event (or a receive event) changes the state of the process that sends
(or receives) the message and the state of the channel on which the message
is sent (or received).

 The events at a process are linearly ordered by their order of occurrence.

 At any instant, the state of process pi, denoted by LSi, is a result of the
sequence of all the events executed by pi up to that instant
Global state and snapshot recording algorithms
A consistent global state
The global state of a distributed system is a collection of the local states of the
processes and the channels. Notationally, global state GS is defined as

GS = { ⋃i LSi, ⋃i,j SCij }

A global state GS is a consistent global state if it satisfies the following two
conditions:

C1: send(mij) ∈ LSi ⇒ mij ∈ SCij ⊕ rec(mij) ∈ LSj (⊕ is exclusive-or)

C2: send(mij) ∉ LSi ⇒ mij ∉ SCij ∧ rec(mij) ∉ LSj
Global state and snapshot recording algorithms
Interpretation in terms of cuts

 Cuts in a space–time diagram provide a powerful graphical aid in representing
and reasoning about the global states of a computation.

 A cut is a line joining an arbitrary point on each process line that slices the
space–time diagram into a PAST and a FUTURE.

 A consistent global state corresponds to a cut in which every message
received in the PAST of the cut has been sent in the PAST of that cut.

 Such a cut is known as a consistent cut.

 All the messages that cross the cut from the PAST to the FUTURE are captured
in the corresponding channel state.
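The consistency condition on cuts can be stated operationally: a cut (a set of events placed in the PAST) is consistent iff no message is received in the PAST but sent in the FUTURE. A tiny checker (event names are illustrative):

```python
def consistent_cut(past, messages):
    """past: set of events in the PAST of the cut.
    messages: (send_event, receive_event) pairs.
    Consistent iff every message received in the PAST was also sent in it."""
    return all(send in past for send, recv in messages if recv in past)

msgs = [("s1", "r1")]
# Placing r1 in the PAST without s1 is the inconsistent case (FUTURE -> PAST);
# the reverse (s1 in the PAST, r1 in the FUTURE) is a message in transit,
# which is consistent and goes into the channel state.
```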
Global state and snapshot recording algorithms
Interpretation in terms of cuts

[Figure: a space–time diagram showing cut C1.]

Cut C1 is inconsistent because message m1 is flowing from the FUTURE to the
PAST.
Global state and snapshot recording algorithms
Issues in recording a global state

 If a global physical clock were available, the following simple procedure could
be used to record a consistent global snapshot of a distributed system.

 In this, the initiator of the snapshot collection decides a future time at which
the snapshot is to be taken and broadcasts this time to every process.

 All processes take their local snapshots at that instant in the global time.

 However, a global physical clock is not available in a distributed system, and
the following two issues need to be addressed in recording a consistent
global snapshot of a distributed system:

 Issue 1: how to distinguish the messages to be recorded in the snapshot
(either in a channel state or a process state) from those not to be recorded.

 Issue 2: how to determine the instant when a process takes its snapshot.
Snapshot algorithms for FIFO channels
Chandy–Lamport algorithm
 The Chandy–Lamport algorithm uses a control message, called a marker.

 After a site has recorded its snapshot, it sends a marker along all of its outgoing
channels before sending out any more messages.

 Since channels are FIFO, a marker separates the messages in the channel into
those to be included in the snapshot (i.e., channel state or process state) from
those not to be recorded in the snapshot.

The algorithm
 A process initiates snapshot collection by executing the marker sending rule, by
which it records its local state and sends a marker on each outgoing channel.
Snapshot algorithms for FIFO channels
Chandy–Lamport algorithm
 A process executes the marker receiving rule on receiving a marker.

 If the process has not yet recorded its local state, it records the state of the
channel on which the marker is received as empty and executes the marker
sending rule to record its local state. Otherwise, the state of the incoming channel
on which the marker is received is recorded as the set of messages received on
that channel after the process recorded its own state and before it received the
marker.

 The algorithm can be initiated by any process by executing the marker sending
rule.

 The algorithm terminates after each process has received a marker on all of its
incoming channels.

 The recorded local snapshots can be put together to create the global snapshot.
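The marker rules can be exercised on a two-process example. The sketch below is not a full implementation: it models two processes p and q with FIFO channels as queues, delivers messages by hand, and shows a basic message in flight from q to p being captured in the recorded channel state (all names are illustrative):

```python
from collections import deque

# Two processes "p" and "q"; a basic message (value 3) is in flight q -> p.
state = {"p": 5, "q": 7}
chan = {("q", "p"): deque([("BASIC", 3)]), ("p", "q"): deque()}
recorded = {}        # pid -> recorded local state
chan_rec = {}        # (src, dst) -> messages recorded for that channel
recording = set()    # channels currently being recorded

def take_snapshot(pid, other):
    """Marker sending rule: record local state, send a marker on each
    outgoing channel, start recording each incoming channel."""
    recorded[pid] = state[pid]
    chan[(pid, other)].append(("MARKER", None))
    recording.add((other, pid))

def deliver(src, dst):
    kind, payload = chan[(src, dst)].popleft()
    if kind == "MARKER":
        if dst not in recorded:          # marker receiving rule, first marker:
            chan_rec[(src, dst)] = []    # that channel is recorded as empty
            take_snapshot(dst, src)      # (two processes: the peer is src)
            recording.discard((src, dst))
        else:                            # later marker: stop recording channel
            chan_rec.setdefault((src, dst), [])
            recording.discard((src, dst))
    else:
        state[dst] += payload            # basic message changes the local state
        if (src, dst) in recording:
            chan_rec.setdefault((src, dst), []).append(payload)

take_snapshot("p", "q")   # p initiates: records state 5, sends marker to q
deliver("q", "p")         # p receives the basic message: goes into channel state
deliver("p", "q")         # q receives the marker: records state 7, marker back
deliver("q", "p")         # p receives q's marker: stops recording q -> p
```

The recorded snapshot (p = 5, q = 7, message 3 in channel q → p, channel p → q empty) is consistent even though p's live state has already moved on, which is exactly the point of the algorithm.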
Termination Detection
 In distributed processing systems, a problem is typically solved in a distributed
manner with the cooperation of a number of processes.

 In such an environment, inferring whether a distributed computation has ended is
essential so that the results produced by the computation can be used.

 Messages used for the purpose of termination detection (by a termination
detection algorithm) are called control messages.

A termination detection (TD) algorithm must ensure the following:

1. Execution of a TD algorithm cannot indefinitely delay the underlying
computation; that is, execution of the termination detection algorithm must
not freeze the underlying computation.

2. The termination detection algorithm must not require the addition of new
communication channels between processes.
Termination Detection
System model of a distributed computation

 A distributed computation consists of a fixed set of processes that
communicate solely by message passing.

 All messages are received correctly after an arbitrary but finite delay.

 Communication is asynchronous, i.e., a process never waits for the receiver
to be ready before sending a message.

 Messages sent over the same communication channel may not obey the FIFO
ordering.

 A distributed computation has the following characteristics:
Termination Detection
System model of a distributed computation

1. At any given time during execution of the distributed computation, a process
can be in only one of two states: active or idle.

2. An active process can become idle at any time.

3. An idle process can become active only on the receipt of a message from
another process.

4. Only active processes can send messages.

5. A message can be received by a process when the process is in either of the
two states.

6. The sending of a message and the receipt of a message occur as atomic
actions.
Termination detection using distributed snapshots
 The algorithm uses the fact that a consistent snapshot of a distributed system
captures stable properties.

 Termination of a distributed computation is a stable property.

 Thus, if a consistent snapshot of a distributed computation is taken after the
distributed computation has terminated, the snapshot will capture the
termination of the computation.

 The algorithm assumes that there is a logical bidirectional communication
channel between every pair of processes.

 Communication channels are reliable but non-FIFO. Message delay is
arbitrary but finite.
1. Termination detection using distributed snapshots
Informal description

 The main idea behind the algorithm is as follows:

 when a computation terminates, there must exist a unique process which became
idle last.

 When a process goes from active to idle, it issues a request to all other processes to
take a local snapshot, and also requests itself to take a local snapshot.

 When a process receives the request, if it agrees that the requester became idle
before itself, it grants the request by taking a local snapshot for the request.

 A request is said to be successful if all processes have taken a local snapshot for it.

 The requester or any external agent may collect all the local snapshots of a request.
Termination detection using distributed snapshots
Informal description

 If a request is successful, a global snapshot of the request can thus be
obtained, and the recorded state will indicate termination of the computation,
viz.,

 in the recorded snapshot, all the processes are idle and there is no message in
transit to any of the processes.
2. Termination detection by weight throwing
 In termination detection by weight throwing, a process called the controlling
agent monitors the computation.

 A communication channel exists between each of the processes and the
controlling agent, and also between every pair of processes.

 Initially, all processes are in the idle state.

 The weight at each process is zero and the weight at the controlling agent is 1.

 The computation starts when the controlling agent sends a basic message to
one of the processes.

 That process becomes active and the computation starts.

 A non-zero weight W (0 < W ≤ 1) is assigned to each process in the active state
and to each message in transit, in the following manner:
2. Termination detection by weight throwing
 When a process sends a message, it sends a part of its weight in the message.

 When a process receives a message, it adds the weight received in the
message to its own weight.

 Thus, the sum of the weights on all the processes and on all the messages in
transit is always 1.

 When a process becomes passive, it sends its weight to the controlling agent
in a control message, which the controlling agent adds to its weight.

 The controlling agent concludes termination if its weight becomes 1.
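The rules above can be sketched directly; exact fractions stand in for the real-valued weights so the bookkeeping stays visible (class and method names are illustrative):

```python
from fractions import Fraction

class Controller:
    """Controlling agent; starts with weight 1 (names are illustrative)."""
    def __init__(self):
        self.weight = Fraction(1)
        self.terminated = False

    def start(self, proc):
        w = self.weight / 2          # basic message carries part of the weight
        self.weight -= w
        proc.receive(w)

    def collect(self, w):            # control message from a passive process
        self.weight += w
        self.terminated = (self.weight == 1)

class Proc:
    def __init__(self, ctrl):
        self.ctrl, self.weight = ctrl, Fraction(0)

    def receive(self, w):            # receiver adds the message's weight
        self.weight += w

    def send(self, other):           # sender gives away part of its weight
        w = self.weight / 2
        self.weight -= w
        other.receive(w)

    def go_passive(self):            # return all weight to the controller
        w, self.weight = self.weight, Fraction(0)
        self.ctrl.collect(w)

ctrl = Controller()
p1, p2 = Proc(ctrl), Proc(ctrl)
ctrl.start(p1)       # p1 active with weight 1/2; controller holds 1/2
p1.send(p2)          # p2 activated with weight 1/4
p1.go_passive()      # controller: 1/2 + 1/4 = 3/4, no termination yet
p2.go_passive()      # controller back to 1 -> termination detected
```

At every step the weights on the controller, the processes and the in-transit messages sum to 1, so the controller reaching weight 1 certifies that no activity remains anywhere.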


3. A spanning-tree-based termination detection algorithm
 The algorithm assumes there are N processes Pi, 0 ≤ i ≤ N, which are modelled as the
nodes i, 0 ≤ i ≤ N, of a fixed connected undirected graph.

 The edges of the graph represent the communication channels, through which a
process sends messages to neighbouring processes in the graph.

 The algorithm uses a fixed spanning tree of the graph with process P0 at its root,
which is responsible for termination detection.

 Process P0 communicates with other processes to determine their states, and the
messages used for this purpose are called signals.

 All leaf nodes report to their parents if they have terminated.

 A parent node will similarly report to its parent when it has completed processing and
all of its immediate children have terminated, and so on.

 The root concludes that termination has occurred if it has terminated and all of its
immediate children have also terminated.
3. A spanning-tree-based termination detection algorithm
 The termination detection algorithm generates two waves of signals moving
inward and outward through the spanning tree.

 Initially, a contracting wave of signals, called tokens, moves inward from
leaves to the root.

 If this token wave reaches the root without discovering that termination has
occurred, the root initiates a second outward wave of repeat signals.

 As this repeat wave reaches the leaves, the token wave gradually forms and
starts moving inward again.

 This sequence of events is repeated until termination is detected.
• Initially, each leaf process is given a token.

• Each leaf process, after it has terminated, sends its token to its parent.

• When a parent process terminates and after it has received a token from
each of its children, it sends a token to its parent.

• This way, each process indicates to its parent process that the subtree
below it has become idle.

• In a similar manner, the tokens get propagated to the root.

• The root of the tree concludes that termination has occurred, after it has
become idle and has received a token from each of its children.
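The simple token version above can be sketched for the quiescent case, i.e., when no node is reactivated after sending its token. The tree shape used here (root 0 with children 1 and 2, leaves 3–6) is an assumption, and the repeat-wave mechanism that handles reactivation is not modelled:

```python
class Node:
    """One node of the fixed spanning tree (token version, quiescent case)."""
    def __init__(self, nid, parent=None):
        self.nid, self.parent = nid, parent
        self.children = []
        self.idle = False
        self.tokens = 0                   # tokens received from children
        if parent is not None:
            parent.children.append(self)

    def maybe_report(self):
        # Send the token to the parent once this node is idle and every
        # child's subtree has reported idle.
        if self.idle and self.tokens == len(self.children) and self.parent:
            self.parent.tokens += 1
            self.parent.maybe_report()

    def terminate(self):
        self.idle = True
        self.maybe_report()

def detected(root):
    """Root concludes termination: idle, with a token from every child."""
    return root.idle and root.tokens == len(root.children)

# Assumed tree: 0 is the root; 1 and 2 are internal; 3..6 are leaves.
n0 = Node(0)
n1, n2 = Node(1, n0), Node(2, n0)
n3, n4 = Node(3, n1), Node(4, n1)
n5, n6 = Node(5, n2), Node(6, n2)
for leaf in (n3, n4, n5, n6):
    leaf.terminate()                      # leaves send their tokens upward
n1.terminate(); n2.terminate()            # parents report after their children
n0.terminate()                            # root becomes idle last
```

If a token-holding subtree were reactivated by a basic message (as in the workout below, where node 5 messages node 1), this simple version would falsely announce termination; that is precisely the flaw the repeat-signal wave in the full algorithm corrects.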
Spanning Tree Workout
• Apply the spanning-tree-based termination detection
algorithm in the following scenario. The nodes are processes
0 to 6. Leaf nodes 3, 4, 5 and 6 are each given tokens T3, T4,
T5 and T6 respectively. Leaf nodes 3, 4, 5 and 6 terminate in
that order, but before terminating, node 5 sends a message
to node 1.
