Mod 3 Part 1

Synchronization-related issues in distributed systems
▪ Clock synchronization
▪ Event Ordering
▪ Mutual Exclusion
▪ Election algorithm
▪ Every computer needs a timer (computer clock) to keep track of the current time.
▪ In a distributed system an application may have processes that run concurrently on multiple nodes of the system.
▪ For correct results it is required that the clocks of the nodes are synchronized with each other.
▪ Eg : to find the time taken for a message to travel from one node to another.
▪ Correct results are impossible if the clocks are not synchronized.
▪ Assume case 1 : Clocks are synchronized
▪ Problem : to find the time taken to transmit a message from sender to receiver.
▪ Let the current time on the sender be 10:15.
▪ Since the clocks are perfectly synchronized, the current time on the receiver is also 10:15.
▪ The message is transmitted from the sender with timestamp 10:15.
▪ The message is received at the receiver’s node at 10:17 (according to the receiver’s clock).
▪ Time taken = 2 min.
▪ Assume case 2 : Clocks are not synchronized
▪ Also assume the same sender and the same receiver.
▪ Let the current time on the sender be 10:15.
▪ Since the clocks are not synchronized, the current time on the receiver is 10:16 (the receiver is 1 min. ahead of the sender).
▪ The message is transmitted from the sender with timestamp 10:15.
▪ The message is received at the receiver’s node (after 2 min.) at 10:18 (according to the receiver’s clock).
▪ The time taken is incorrectly computed to be 3 min.
Reason : CLOCKS ARE NOT SYNCHRONIZED
How are computer clocks implemented?
A computer clock consists of 3 components :
1. A quartz crystal that oscillates at a well-defined frequency. The frequency depends on the kind of crystal, how it is cut, etc.
2. A constant register – used to store a constant value.
3. A counter register – used to keep track of the oscillations of the quartz crystal.
▪ Each oscillation of the crystal decrements the counter register by one.
▪ When the counter reaches zero, 2 events are generated :
1. An interrupt is generated and one time unit is added to the current time.
2. The counter register is reloaded with the value in the constant register.
▪ Within a single-CPU system it does not matter much if this clock is off by a small amount. Since all processes use the same clock, they will be internally consistent.
▪ In a distributed system with n CPUs there are n crystals, and each one may oscillate at a different frequency.
▪ Crystals running at different rates will result in clocks gradually getting out of sync and giving different values when read out.
▪ This difference in the time values of two clocks is called clock skew.
▪ External synchronization: clocks are synchronized with
an authoritative, external clock called Coordinated
Universal Time (UTC).
▪ UTC is an international standard.
▪ Time zones around the world are expressed as positive or negative offsets from UTC.
▪ To provide UTC, the National Institute of Standards and Technology (NIST) operates a radio station, WWV, which broadcasts time on several short-wave frequencies.
▪ If one machine has a WWV receiver, all other machines can be synchronized to it.
▪ Hence a computer clock must be periodically resynchronized with the real time to keep it non-faulty.
• Assume that when the UTC time is t, the time value of the clock on machine p is Cp(t).
• If all clocks in the world were perfectly synchronized, we would have Cp(t) = t for all p and all t.
• In the ideal case dC/dt = 1, where dC/dt is the drift rate.
• If the maximum allowable drift rate is ρ, then a clock is said to be non-faulty if the following condition holds :
1 − ρ ≤ dC/dt ≤ 1 + ρ
Figure : the relation between clock time and UTC when clocks tick at different rates.
▪ Let the maximum allowable drift rate ρ be 0.5,
▪ i.e. in 1 min a drift of 30 sec is allowed,
▪ i.e. after 1 min of UTC time a slow clock may show only 30 sec, while a fast clock may show 1 min 30 sec.
▪ Suppose the current UTC time is 10:15, and assume that at this point all clocks in the distributed system are synchronized.
▪ After synchronization all clocks drift away from UTC : some become slower than UTC and some become faster.
▪ In a distributed system, if one clock is slow and one is fast, then at a time Δt after they were synchronized the maximum deviation between the time values of the two clocks will be 2ρΔt.
Justification :
Let the current UTC time be 10:15, and so are all clocks in the system, assuming they are synchronized.
Let ρ = 0.5.
After 2 min, UTC = 10:17.
But a slow clock will show 10:16 and a fast clock will show 10:18.
Hence the difference in time values is 2 min (2 × 0.5 × 2).
▪ Hence, to guarantee that no 2 clocks in a set of clocks ever differ by more than δ, the clocks in the set must be resynchronized periodically, with the time interval Δt between two synchronizations less than or equal to δ/2ρ :
Δt = δ / 2ρ
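A minimal sketch in Python of these two bounds, assuming a drift rate rho and a maximum tolerated skew delta in the same time units; the function names are illustrative, not from the slides:

    def max_deviation(rho, dt):
        # Worst-case skew between a slow and a fast clock, dt time
        # units after they were last synchronized: 2 * rho * dt.
        return 2 * rho * dt

    def resync_interval(rho, delta):
        # Longest interval between resynchronizations that still
        # guarantees no two clocks differ by more than delta.
        return delta / (2 * rho)

    # The slide example: rho = 0.5, clocks checked 2 minutes later.
    print(max_deviation(0.5, 2))        # -> 2.0 min of skew
    print(resync_interval(0.5, 2.0))    # -> resync every 2.0 min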
1) Physical clocks
▪ Centralized algorithms
◦ Cristian’s algorithm
◦ Berkeley algorithm
▪ Decentralized algorithms
◦ Averaging algorithms
2) Logical clocks
◦ Lamport’s clock
◦ Vector clocks
▪ Assume one machine (the time server) has a WWV receiver and all other machines are to stay synchronized with it.
▪ Every δ/2ρ seconds, each machine sends a message to the time server asking for the current time.
▪ The time server responds with a message containing its current time, C_UTC.
▪ When the sender gets the reply, it can just set its clock to C_UTC.
Drawback of Cristian’s algorithm
Propagation delay
• It takes a non-zero amount of time for the server’s reply (C_UTC) to get back.
• By the time the client receives the reply, the time at the server has moved ahead.
Solution – measure the propagation delay.
• Record accurately the interval between sending the request to the time server and the arrival of the reply.
• Let T0 be the time at which the request is sent.
• Let T1 be the time at which the reply is received.
[Figure : the client sends a request at T0 and receives the server’s reply at T1.]
Hence the one-way message propagation time is (T1 − T0)/2.
When the reply comes, the value in the message can be increased by this amount to give an estimate of the server’s current time :
T = C_UTC + (T1 − T0)/2
where T is the time at the client.
Improvement
The estimate can be improved if it is known approximately how long it takes the time server to handle the interrupt; call this time I.
Hence the one-way propagation time = (T1 − T0 − I)/2
T = C_UTC + (T1 − T0 − I)/2
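A minimal client-side sketch of this estimate, assuming a get_server_time() callable that performs the request/reply exchange and returns the server's clock value C_UTC; the transport details are illustrative:

    import time

    def cristian_estimate(get_server_time, interrupt_time=0.0):
        # interrupt_time is I above, the server's known handling time.
        t0 = time.monotonic()            # T0: request sent
        c_utc = get_server_time()
        t1 = time.monotonic()            # T1: reply received
        one_way = (t1 - t0 - interrupt_time) / 2
        return c_utc + one_way           # T = C_UTC + (T1 - T0 - I)/2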
2. Berkeley’s Algorithm
• In Cristian’s algorithm the time server is passive.
• In Berkeley’s algorithm the time server is active.
• The time server periodically sends a message (time = ?) to all the machines in a group.
• In response, each computer sends back its clock value to the time server.
• The time server takes an average of all the clock values (including its own).
• The calculated time is the current time to which all clocks should be readjusted.
• However, instead of sending the average back to the other computers, the time server sends each computer the amount by which its clock requires adjustment.
• The value can be (+) or (−).
Figure : (a) The time daemon asks all the other machines for their clock values. (b) The machines answer. (c) The time daemon tells everyone how to adjust their clock.
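A minimal sketch of one such round on the time daemon's side, assuming the clock values have already been collected; message passing is omitted and the numbers are illustrative:

    def berkeley_round(server_time, client_times):
        # Average all clocks, the daemon's own included, and return
        # the signed adjustment each machine should apply.
        clocks = [server_time] + list(client_times)
        avg = sum(clocks) / len(clocks)
        return [avg - t for t in clocks]

    # Illustrative values (minutes): daemon 180, clients 170 and 205.
    print(berkeley_round(180, [170, 205]))   # -> [5.0, 15.0, -20.0]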
• The clock process at each node broadcasts its local clock time in the form of a ‘resync’ message when its local time equals T0 + iR,
where i = an integer,
T0 = a fixed time agreed upon by all nodes,
R = a system parameter that depends on factors like the total number of nodes in the system, etc.
• A resync message is broadcast from each node at the beginning of each fixed-length resynchronization interval.
• However, since the clocks of different nodes run at slightly different rates, these broadcasts will not happen simultaneously at all nodes.
• After broadcasting its clock value, the clock process of a node waits for a time T.
• During this waiting period, the clock process collects the resync messages broadcast by the other nodes.
• For each resync message the clock process records the time, according to its own clock, at which the message was received.
• At the end of the waiting period each clock process estimates the skew of its clock w.r.t. each of the other nodes, on the basis of the times at which it received the resync messages.
• It then computes a fault-tolerant average of the skews and uses it to correct the local clock before the start of the next resynchronization interval.
Eg : Let the agreed-upon time (T0 + iR) be 10:15.

Node              a       b       c       d
Current time    10:10   10:05   10:00   10:15
Msg sent by d    (-5)   (-10)   (-15)    (00)

After 5 min.
Current time    10:15   10:10   10:05   10:20
Msg sent by a    (00)    (-5)   (-10)   (+05)

After 5 min.
Current time    10:20   10:15   10:10   10:25
Msg sent by b   (+05)    (00)   (-05)   (+10)

After 5 min.
Current time    10:25   10:20   10:15   10:30
Msg sent by c   (+10)   (+05)    (00)   (+15)

▪ Every processor computes an average of the skews it recorded :
a = (-5 + 0 + 5 + 10)/4 = 10/4 = +2.5
b = (-10 - 5 + 0 + 5)/4 = -10/4 = -2.5
c = (-15 - 10 - 5 + 0)/4 = -30/4 = -7.5
d = (0 + 5 + 10 + 15)/4 = 30/4 = +7.5
a is currently 10:25, skew computed = +2.5,
i.e. a is ahead by 2.5 min.
Hence it decreases its clock by 2.5 min to get --→ 10:22:30.
b is currently 10:20, skew computed = -2.5,
i.e. b is behind by 2.5 min.
Hence it increases its clock by 2.5 min to get --→ 10:22:30.

Similarly
c increases its clock by 7.5 min.
Hence 10:15 becomes 10:22:30.

And
d decreases its clock by 7.5 min.
Hence 10:30 becomes 10:22:30.
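A sketch of the per-node averaging using the slide's numbers; a genuinely fault-tolerant version would first discard skews from clocks that have drifted too far, which is omitted here:

    def average_skew(skews):
        # Average the skews a node recorded from the resync
        # broadcasts; its own entry is 0.
        return sum(skews) / len(skews)

    print(average_skew([-5, 0, 5, 10]))     # a -> +2.5 (ahead)
    print(average_skew([-10, -5, 0, 5]))    # b -> -2.5 (behind)
    print(average_skew([-15, -10, -5, 0]))  # c -> -7.5
    print(average_skew([0, 5, 10, 15]))     # d -> +7.5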
▪ The last 3 algorithms rely on absolute (physical) time to find out which event occurred before which event.
▪ Alternative --→
▪ Lamport showed that clock synchronization need not be absolute. What is important is that all processes agree on the order in which events occur.
▪ Eg :
In Unix, large programs are split into multiple source files, so that a change to one source file only requires that file to be recompiled, not all the files.
If a program consists of 100 files, not having to recompile all of them because of a change in one leads to considerable time saving.
Consider the Unix command called make.
When the programmer has finished changing all the source files, he runs the make command. Make examines the times at which the source and object files were last modified. If the source file input.c has time 2152 and the corresponding object file input.o has time 2150, make knows that input.c has been changed since input.o was created and must be recompiled.
▪ On the other hand, if output.c has time 2144 and output.o has time 2145, no compilation is needed.
Thus make goes through all the source files to find out which ones need recompilation and which do not.
▪ In a distributed system the editor runs on one machine and the compiler runs on another machine.
▪ Assume there is no synchronization of time.
▪ Suppose output.o has time 2144.
▪ Shortly thereafter output.c is modified, but is assigned time 2143, since the clock on its machine is slightly behind.
▪ Make will not call the compiler although it should have.
▪ In this example what counts is whether output.c is older than output.o (no recompilation) or output.c is newer than output.o (recompilation is needed),
i.e. which event took place first.
▪ Use logical clocks.
▪ To implement logical clocks Lamport defined a relation called happens-before.
▪ a → b reads ‘a happens before b’.
▪ Happens-before is observable in two situations :
1. If a and b are events in the same process, and a occurs before b, then a → b is true.
2. If a is the event of a message being sent by one process, and b is the event of that message being received by another process, then a → b is also true.
3. If a → b and b → c, then a → c, i.e. happens-before is transitive.
Points to note :
1. If a → b, then event a causally affects event b.
2. If the 2 events a and b happen in 2 different processes that do not exchange messages (communicate), then neither a → b nor b → a is true.
These events are said to be concurrent.
• Under the happens-before relationship, a time value C(a) is assigned to every event a, on which all processes agree.
• These time values must have the property that if a → b then C(a) < C(b).
Lamport’s algorithm using Physical Clocks
• Consider 3 processes running on different machines, each with its own clock, running at its own speed.
• When the clock has ticked 6 times in process 0, it has ticked 8 times in process 1 and 10 times in process 2.
• Each clock runs at a constant rate, but the rates are different due to differences in the crystals.
[Figure : the three process timelines (0, 1, 2), before and after correction using Lamport’s algorithm.]
▪ For a message-receiving event, a check is made to see whether the current time on the receiver’s clock is less than or equal to the timestamp in the message. If so, the receiver’s physical clock is corrected by fast-forwarding it to be 1 more than the timestamp in the message.
To summarize –
• If a happens before b in the same process
then C(a)<C(b)
• If a and b represent the sending and receiving
of a message respectively, then C(a)<C(b)
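A minimal sketch of a scalar Lamport clock implementing these rules; the class and method names are illustrative:

    class LamportClock:
        def __init__(self):
            self.time = 0

        def tick(self):
            # Local event, including a send: advance the clock.
            self.time += 1
            return self.time

        def receive(self, msg_time):
            # On receipt, jump past the message's timestamp so that
            # C(send) < C(receive) always holds.
            self.time = max(self.time, msg_time) + 1
            return self.time

    sender, receiver = LamportClock(), LamportClock()
    ts = sender.tick()               # send event a, C(a) = 1
    print(receiver.receive(ts))      # receive event b -> 2, C(a) < C(b)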
Lamport’s algorithm using Logical Clocks
Total ordering of events
The happens-before relationship gives only a partial ordering of the events in a distributed system, since two events not related by happens-before may have the same timestamps associated with them, e.g. e11 and e21 both have timestamp 1.
Hence we need to totally order the events in a distributed system,
i.e. for all events a and b, C(a) ≠ C(b).
Hence we concatenate the logical clock value with a distinct process id number.
[Figure : three process timelines with concatenated timestamps — process 1 : 1.1, 2.1, 3.1, 4.1; process 2 : 1.2, 2.2, 3.2, 5.2; process 3 : 1.3.]
Eg :
[Figure : events e11–e15 on process 1, e21–e23 on process 2 and e31–e33 on process 3.]
Eg :
[Figure : the same events with Lamport timestamps — e11–e15 : 1, 2, 3, 4, 5; e21–e23 : 1, 2, 6; e31–e33 : 1, 2, 3.]

C(e22) < C(e13)
does not mean that e22 happened before e13.
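The total ordering can be sketched by comparing (timestamp, process id) pairs lexicographically; a minimal illustration under that assumption:

    def before(event_a, event_b):
        # Events are (lamport_time, process_id) pairs; ties on the
        # clock value are broken by the unique process id, so no two
        # distinct events ever compare equal.
        return event_a < event_b     # tuple comparison is lexicographic

    # e11 and e21 both have clock value 1; the process id decides:
    print(before((1, 1), (1, 2)))    # -> True: 1.1 ordered before 1.2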
◦ Let n be the number of processes in the distributed system.
◦ Each process Pi is equipped with a clock Ci, which is an integer vector of length n.
◦ The clock Ci can be thought of as a function that assigns a vector Ci(a) to any event a.
◦ Ci(a) is referred to as the timestamp of event a at Pi.
◦ Ci[i], the ith entry of Ci, corresponds to Pi’s own logical time.
◦ Ci[j] is Pi’s best guess of the logical time at Pj.
The time Ci(a) assigned to an event a has the following 2 properties :
1. Ci[i] is the number of events that have occurred so far at Pi.
2. If Ci[j] = k, then Pi knows that k events have occurred at Pj.
Rules on vector clocks in a system with n computers :
1. Each computer starts with a local clock set to [0, 0, …, 0].
2. When there is a sending event on computer i, increment the ith component of the clock by 1, leaving the other components unchanged. This is the timestamp of the event. Then tag the message with the timestamp and send it on to the receiver process.
3. When there is a receiving event on computer i, form a new local clock value by taking the component-wise maximum of the local clock and the timestamp on the arriving message. Then increment the ith component by 1. Finally, tag the event with this value.
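A minimal vector clock sketch following rules 1–3; the names are illustrative:

    class VectorClock:
        def __init__(self, pid, n):
            self.pid = pid
            self.clock = [0] * n               # rule 1: [0, 0, ..., 0]

        def send(self):
            # Rule 2: bump own component, then tag the message.
            self.clock[self.pid] += 1
            return list(self.clock)

        def receive(self, msg_clock):
            # Rule 3: component-wise max, then bump own component.
            self.clock = [max(a, b)
                          for a, b in zip(self.clock, msg_clock)]
            self.clock[self.pid] += 1
            return list(self.clock)

    p1, p2 = VectorClock(0, 3), VectorClock(1, 3)
    ts = p1.send()                   # -> (1, 0, 0)
    print(p2.receive(ts))            # -> [1, 1, 0]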
▪ Eg :
[Figure : processes P1 (events e11, e12, e13), P2 (events e21, e22, e23, e24) and P3 (events e31, e32) along a global time axis.]
[Figure : the same events with vector timestamps — P1 : e11 (1,0,0), e12 (2,0,0), e13 (3,4,1); P2 : e21 (0,1,0), e22 (2,2,0), e23 (2,3,1), e24 (2,4,1); P3 : e31 (0,0,1), e32 (0,0,2).]
[Figure : a second example — P1 : (1,0,0), (2,0,2); P2 : (0,1,1), (1,2,1), (1,3,1); P3 : (0,0,1), (0,0,2), (1,3,3).]
Vector clocks capture causality.
If v and w are the timestamps of two events --→
▪ If each element of timestamp v is less than or equal to the corresponding element of timestamp w, then v causally precedes w.
▪ If each element of timestamp v is greater than or equal to the corresponding element of timestamp w, then w causally precedes v.
▪ If neither holds (some elements greater and some less), then v and w are concurrent.
Hence a system of vector clocks allows us to order events and decide whether 2 events are causally related or not by simply looking at the timestamps of the events.
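These three cases can be checked directly; a minimal sketch using timestamps from the example figure:

    def compare(v, w):
        # Classify two vector timestamps: v causally precedes w,
        # w causally precedes v, or the events are concurrent.
        if all(a <= b for a, b in zip(v, w)) and v != w:
            return "v precedes w"
        if all(a >= b for a, b in zip(v, w)) and v != w:
            return "w precedes v"
        return "concurrent"

    print(compare([2, 0, 0], [2, 2, 0]))   # e12 vs e22 -> v precedes w
    print(compare([3, 4, 1], [0, 0, 2]))   # e13 vs e32 -> concurrent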
▪ Many distributed systems require a single process to act as a coordinator process,
eg : the time server in the Berkeley algorithm.
▪ If the coordinator fails, due to failure of the site on which it is located, a new coordinator process must be elected to take over the job of the failed coordinator.
▪ Election algorithms are meant for electing a coordinator process from among the currently running processes.
Assumptions :
1. Every process has a unique number, e.g. its network address (IP address).
2. Every process knows the process number of every other process; what a process does not know is which ones are currently active and which are dead.
3. Election algorithms locate the process with the highest number and designate it as the coordinator.
1. Bully Algorithm
Bully: “the biggest guy in town wins”.
• When a process P sends a request message to the
coordinator and does not receive a reply within a fixed
timeout period, it assumes that the coordinator has failed.
• P holds an election --→
▪ P sends an ELECTION message to all processes with higher id numbers.
▪ If no one responds, P wins the election and becomes coordinator.
▪ If a higher-numbered process responds, it takes over. Process P’s job is done.
▪ At any moment, a process can receive an ELECTION message from one of its lower-numbered colleagues.
▪ The receiver sends an OK back to the sender (to indicate that it is alive) and conducts its own election.
▪ Eventually only the bully process (the highest-numbered process) remains. The bully announces victory to all processes in the distributed group.
Figure : process 4 holds an election; processes 5 and 6 respond, telling 4 to stop; now 5 and 6 each hold an election; process 6 tells 5 to stop; process 6 wins and tells everyone.
If a process that was previously down comes back, it holds an election. If it happens to be the highest-numbered process currently running, it will win the election and take over the coordinator’s job.
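A minimal, single-round sketch of the bully election, with the network abstracted behind an alive() predicate; real implementations need timeouts and concurrent message handling:

    def bully_election(my_id, all_ids, alive):
        # alive(pid) is assumed to tell whether a process responds.
        higher = [p for p in all_ids if p > my_id and alive(p)]
        if not higher:
            return my_id       # no one bigger answered: I win
        # Otherwise a higher process takes over; ultimately the
        # highest alive process announces itself as coordinator.
        return max(higher)

    # Matching the figure: process 7 is down, process 4 starts.
    alive = lambda p: p != 7
    print(bully_election(4, range(8), alive))   # -> 6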
2. Ring Algorithm
Assumptions :
1. Processes in the system are organized into a logical ring.
2. The ring is unidirectional, i.e. all messages are passed only in one direction (clockwise/anticlockwise).
3. Every process in the system knows the structure of the ring, so that while circulating a message over the ring, if the successor of the sender process is down, the sender can skip the successor (and so on) until an active member is located.
▪ In the ring algorithm, if any process Pi notices that the current coordinator has failed, it starts an election by sending an election message to its first neighbor (the first active successor) on the ring.
▪ The election message contains the node’s process identifier and is forwarded on around the ring.
▪ On receiving the election message, the successor adds its own process number and passes it on to the next active member in the ring.
▪ Eventually the election message returns back to process Pi.
▪ Pi recognizes the message as its own election message by seeing that the first number in the list of process numbers is its own.
▪ Pi now has a list of the process numbers that are currently active.
▪ Pi elects the process having the highest process number as the new coordinator.
▪ Pi then circulates a coordinator message over the ring to inform all the active processes who the new coordinator is.
▪ When a process Pj recovers from failure, it creates an inquiry message and sends it to its successor. The message contains the identity of Pj.
▪ If the successor is not the coordinator, it forwards the message to its successor. In this way the inquiry message reaches the current coordinator. The current coordinator sends a reply to Pj informing it that it is the current coordinator.
Initiation :
1. Process 4 sends an ELECTION message to its successor (or the next alive process) with its ID.
2. Each process adds its own ID and forwards the ELECTION message.
Leader election :
1. The message comes back to the initiator; here the initiator is 4.
2. The initiator announces the winner by sending another message around the ring.
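A minimal sketch of one election pass around the ring, assuming the ring is given as an ordered list and dead processes are simply skipped:

    def ring_election(ring, initiator, alive):
        # Circulate an ELECTION message once around the logical
        # ring, collecting the ids of active processes, then pick
        # the highest as coordinator. alive(pid) abstracts failure
        # detection; a COORDINATOR message would then circulate.
        n = len(ring)
        start = ring.index(initiator)
        ids = [p for step in range(n)
               for p in [ring[(start + step) % n]] if alive(p)]
        return max(ids), ids

    alive = lambda p: p != 7                  # 7 has crashed
    print(ring_election([2, 4, 7, 5, 6, 3], 4, alive))
    # -> (6, [4, 5, 6, 3, 2])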
In a distributed system a shared resource could be concurrently accessed
by multiple sites.
Hence it is necessary that access to a shared resource should be mutually
exclusive, i.e. at a time only one site should access the shared resource.
Centralized algorithm
▪ One process is elected as the coordinator.
▪ Whenever a process wants to enter a critical region (CR), it sends a request message to the coordinator stating which critical region it wants to enter and asking for permission.
▪ If no other process is currently in that critical region, the coordinator sends back a reply granting permission.
▪ When the reply arrives, the requesting process enters the critical region.
▪ If a process is already in the CR and another process asks permission to enter the same CR, then the coordinator either
1. refrains from replying, or
2. sends a reply saying “permission denied”.
Either way, the request is queued by the coordinator.
Figure : process 1 asks the coordinator for permission to enter a critical region; permission is granted. Process 2 then asks permission to enter the same critical region; the coordinator does not reply. When process 1 exits the critical region, it tells the coordinator, which then replies to 2.
Advantages:
1. Guarantees mutual exclusion.
2. Requires only 3 messages per use of the CR (request, grant, release).
Disadvantages:
1. Single point of failure.
2. A process cannot distinguish a dead
coordinator from permission denied.
3. Single coordinator becomes a performance
bottleneck
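A minimal sketch of the coordinator's state, with the request queue made explicit; message delivery is abstracted away:

    from collections import deque

    class Coordinator:
        def __init__(self):
            self.holder = None          # process currently in the CR
            self.queue = deque()        # waiting requesters

        def request(self, pid):
            if self.holder is None:
                self.holder = pid
                return "GRANT"
            self.queue.append(pid)      # refrain from replying
            return None                 # (or send "permission denied")

        def release(self, pid):
            assert pid == self.holder
            self.holder = self.queue.popleft() if self.queue else None
            return self.holder          # next process to grant, if any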
Classification of mutual exclusion algorithms
1. Non-token-based :
▪ These algorithms require 2 or more successive rounds of message exchanges among the sites.
▪ These algorithms are assertion-based, because a site can enter its critical section when an assertion defined over its local variables becomes true.
▪ Mutual exclusion is achieved because the assertion becomes true at only one site at any given time.
2. Token-based :
▪ A unique token (also called the privilege message) is shared among the sites. A site is allowed to enter its CS if it possesses the token, and it continues to hold the token until the execution of the CS is over.
▪ These algorithms essentially differ in the way a site carries out the search for the token.
Requirements of mutual exclusion algorithms
▪ Freedom from Deadlocks.
Two or more sites should not endlessly wait for messages that will
never arrive.
▪ Freedom from starvation.
A site should not be forced to wait indefinitely to execute CS while
other sites are repeatedly executing CS. That is, every requesting site
should get an opportunity to execute CS in a finite time.
▪ Fairness.
Fairness dictates that requests must be executed in the order in which they are made (or the order in which they arrive in the system). Since a physical global clock does not exist, time is determined by logical clocks.
▪ Fault Tolerance.
A mutual exclusion algorithm is fault-tolerant if in the wake of a
failure, it can reorganize itself so that it continues to function without
any (prolonged) disruptions
➢ The performance of mutual exclusion algorithms is generally measured by the following four metrics :
• The number of messages necessary per CS invocation.
• The synchronization delay : the time required after a site leaves the CS and before the next site enters the CS.
• The response time : the time interval a request waits for its CS execution to be over after its request messages have been sent out.
• The system throughput : the rate at which the system executes requests for the CS :
system throughput = 1 / (sd + E)
where sd is the synchronization delay and E is the average critical-section execution time.
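For instance, with an illustrative synchronization delay sd = 2 time units and an average execution time E = 8 time units, throughput = 1/(2 + 8) = 0.1 CS executions per time unit.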

[Figure : synchronization delay — the interval between the last site exiting the CS and the next site entering it.]
[Figure : response time — measured from the request message being sent out until the site exits the CS; it includes the CS execution time.]
▪ Here, a site communicates with a set of other sites to arbitrate
who should execute the CS next.
▪ For a site Si, request set Ri contains ids of all those sites from
which site Si must acquire permission before entering the CS.
▪ These algorithms use timestamps to order requests for the CS
and to resolve conflicts between simultaneous requests for the
CS.
▪ Logical clocks are maintained and updated according to
Lamport’s scheme.
▪ Requests with smaller timestamps have priority over requests with larger timestamps.

▪ Lamport proposed a distributed mutual exclusion algorithm based on his clock synchronization scheme.
▪ In Lamport’s algorithm
• every site Si keeps a queue, request_queue_i, which contains mutual exclusion requests ordered by their timestamps;
• the algorithm requires messages to be delivered in FIFO order between every pair of sites.
▪ Requesting the CS
1. When a site Si wants to enter the CS, it sends a REQUEST(ts_i, i) message to all the sites in its request set Ri and places the request on request_queue_i ((ts_i, i) is the timestamp of the request).
2. When a site Sj receives the REQUEST(ts_i, i) message from site Si, it returns a timestamped REPLY message to Si and places site Si’s request on request_queue_j.
Executing the CS
Site Si enters the CS when the two following conditions hold :
• Si has received a message with timestamp larger than (ts_i, i) from all other sites.
• Si’s request is at the top of request_queue_i.
Releasing the CS
3. Site Si, upon exiting the CS, removes its request from the top of its request queue and sends a timestamped RELEASE message to all the sites in its request set.
4. When a site Sj receives a RELEASE message from site Si, it removes Si’s request from its request queue.
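A minimal sketch of the per-site state in Lamport's algorithm, with message transport abstracted away; requests are (timestamp, site id) pairs and all names are illustrative:

    import heapq

    class LamportMutexSite:
        def __init__(self, site_id):
            self.id = site_id
            self.queue = []          # request_queue, kept ordered
            self.last_msg = {}       # site -> latest timestamp seen

        def on_request(self, ts, site):
            heapq.heappush(self.queue, (ts, site))   # then send REPLY

        def on_message(self, ts, site):
            # Any REPLY (or later REQUEST) counts as "a message with
            # a larger timestamp" from that site.
            self.last_msg[site] = max(self.last_msg.get(site, -1), ts)

        def on_release(self, site):
            self.queue = [r for r in self.queue if r[1] != site]
            heapq.heapify(self.queue)

        def can_enter(self, my_ts, others):
            # Enter the CS when our request heads the queue and every
            # other site has sent a message timestamped after it.
            head_ok = bool(self.queue) and self.queue[0] == (my_ts, self.id)
            return head_ok and all(self.last_msg.get(s, -1) > my_ts
                                   for s in others)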
[Figure : sites S1 and S2 are making requests for the CS, with timestamps (2,1) and (1,2) respectively.]
[Figure : every site’s request queue now holds (1,2)(2,1); S2’s request has the smaller timestamp, so S2 enters the CS.]
[Figure : site S2 exits the CS and sends a RELEASE message.]
[Figure : S2’s request is removed, leaving (2,1) in every queue; site S1 enters the CS.]

➢ Performance
• Requires 3(N−1) messages per CS invocation :
(N−1) REQUEST, (N−1) REPLY, and (N−1) RELEASE messages.
• The synchronization delay is T (i.e. one message propagation time).
➢ Optimization
• The algorithm can be optimized to require between 2(N−1) and 3(N−1) messages per CS execution by suppressing REPLY messages in certain cases.
• E.g. suppose site Sj receives a REQUEST message from site Si after it has sent its own REQUEST message with a timestamp lower than the timestamp of site Si’s request.
• Then site Sj need not send a REPLY message to site Si.