
CS3551 - DISTRIBUTED COMPUTING

UNIT I INTRODUCTION

Introduction: Definition – Relation to Computer System Components – Motivation – Message-Passing
Systems versus Shared Memory Systems – Primitives for Distributed Communication – Synchronous versus
Asynchronous Executions – Design Issues and Challenges; A Model of Distributed Computations: A
Distributed Program – A Model of Distributed Executions – Models of Communication Networks –
Global State of a Distributed System.

UNIT II LOGICAL TIME AND GLOBAL STATE

Logical Time: Physical Clock Synchronization: NTP – A Framework for a System of Logical Clocks – Scalar
Time – Vector Time; Message Ordering and Group Communication: Message Ordering Paradigms –
Asynchronous Execution with Synchronous Communication – Synchronous Program Order on
Asynchronous System – Group Communication – Causal Order – Total Order; Global State and Snapshot
Recording Algorithms: Introduction – System Model and Definitions – Snapshot Algorithms for FIFO
Channels

UNIT III DISTRIBUTED MUTEX AND DEADLOCK

Distributed Mutual Exclusion Algorithms: Introduction – Preliminaries – Lamport's Algorithm –
Ricart–Agrawala's Algorithm – Token-Based Algorithms – Suzuki–Kasami's Broadcast Algorithm;
Deadlock Detection in Distributed Systems: Introduction – System Model – Preliminaries – Models of
Deadlocks – Chandy–Misra–Haas Algorithm for the AND Model and OR Model.

UNIT IV CONSENSUS AND RECOVERY

Consensus and Agreement Algorithms: Problem Definition – Overview of Results – Agreement in a
Failure-Free System (Synchronous and Asynchronous) – Agreement in Synchronous Systems with
Failures; Checkpointing and Rollback Recovery: Introduction – Background and Definitions – Issues
in Failure Recovery – Checkpoint-Based Recovery – Coordinated Checkpointing Algorithm –
Algorithm for Asynchronous Checkpointing and Recovery

UNIT V CLOUD COMPUTING

Definition of Cloud Computing – Characteristics of Cloud – Cloud Deployment Models – Cloud
Service Models – Driving Factors and Challenges of Cloud – Virtualization – Load Balancing –
Scalability and Elasticity – Replication – Monitoring – Cloud Services and Platforms: Compute
Services – Storage Services – Application Services
UNIT I
INTRODUCTION

Computation began on single processors; such uni-processor computing is termed centralized
computing.
A distributed system is a collection of independent computers, interconnected via a network,
capable of collaborating on a task. Distributed computing is computing performed in a
distributed system.

A distributed system is a collection of independent entities that cooperate to solve a problem
that cannot be solved individually. Distributed computing is widely used thanks to advances in
machines and to faster, cheaper networks. In a distributed system, the entire network is viewed
as one computer: the multiple systems connected to the network appear as a single system to
the user.
Features of Distributed Systems:
No common physical clock – This introduces the element of "distribution" into the system and
gives rise to the inherent asynchrony among the processors.
No shared memory – A key feature that requires message passing for communication. This
feature implies the absence of a common physical clock.
Geographical separation – The more geographically apart the processors are, the more
representative the system is of a distributed system.
Autonomy and heterogeneity – The processors are "loosely coupled" in that they have
different speeds and each can run a different operating system.

Issues in distributed systems
Heterogeneity
Openness
Security
Scalability
Failure handling
Concurrency
Transparency
Quality of service

Relation to Computer System Components

Fig 1.1: Example of a Distributed System


As shown in Fig 1.1, each computer has a memory-processing unit, and the computers are
connected by a communication network. Each system connected to the distributed network
hosts distributed software, a middleware technology, which drives the distributed system (DS)
while preserving its heterogeneity. A computation, or run, in a distributed system is the
execution of processes to achieve a common goal.

Fig 1.2: Interaction of layers of network

The interaction of the network layers with the operating system and middleware is shown in
Fig 1.2. The middleware contains important library functions for facilitating the operations of
the DS.
A distributed system uses a layered architecture to break down the complexity of system
design. The middleware is the distributed software that drives the distributed system while
providing transparency of heterogeneity at the platform level.

Examples of middleware: the Object Management Group's (OMG) Common Object Request
Broker Architecture (CORBA) [36], Remote Procedure Call (RPC), Message Passing
Interface (MPI)

Motivation

The following key points act as driving forces behind DS:

Inherently distributed computations: DS can process computations at geographically remote
locations.
Resource sharing: Hardware, databases, and special libraries can be shared between systems
without each owning a dedicated copy or replica. This is cost-effective and reliable.
Access to geographically remote data and resources: Resources such as centralized servers
can also be accessed from distant locations.
Enhanced reliability: DS provides enhanced reliability, since it runs on multiple copies of
resources.
The term reliability comprises:
1. Availability: the resource/service provided should be accessible at all times.
2. Integrity: the value/state of the resource should be correct and consistent.
3. Fault-tolerance: the ability to recover from system failures.
Increased performance/cost ratio: The resource sharing and remote access features of DS
naturally increase the performance/cost ratio.
Scalability: The number of systems operating in a distributed environment can be increased
as demand increases.

MESSAGE-PASSING SYSTEMS VERSUS SHARED MEMORY SYSTEMS
The communication between the tasks in multiprocessor systems takes place through two
main modes:

Message passing systems:
 This allows multiple processes to read and write data to the message queue without
being connected to each other.
 Messages are stored in the queue until their recipient retrieves them.

Shared memory systems:
 Shared memory is memory that can be simultaneously accessed by multiple
processes, so that the processes can communicate with each other.
 Communication among processors takes place through shared data variables, and
control variables for synchronization among the processors.
 Semaphores and monitors are common synchronization mechanisms on shared
memory systems.
 When the shared memory model is implemented in a distributed environment, it is
termed distributed shared memory.

Emulating message passing on a shared memory system (MP→SM)
 A shared memory system can be made to act as a message passing system. The
shared address space is partitioned into disjoint parts, one part assigned to each
processor.
 Send and receive operations are implemented by writing to and reading from the
destination/sender processor's address space. The read and write operations are
synchronized.
 Specifically, a separate location can be reserved as the mailbox for each ordered
pair of processes.
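The mailbox idea above can be sketched in Python. This is a minimal illustration, not a real shared-memory implementation: a dict of per-ordered-pair lists stands in for the reserved shared locations, and a condition variable stands in for the read/write synchronization. All names are illustrative.

```python
import threading

class SharedMemoryMessaging:
    """Emulate message passing on shared memory: one mailbox (a list
    guarded by a condition variable) is reserved for each ordered
    (sender, receiver) pair of processes."""

    def __init__(self, n_procs):
        self.mailboxes = {(i, j): [] for i in range(n_procs)
                                     for j in range(n_procs)}
        self.cond = threading.Condition()

    def send(self, src, dst, data):
        # "send" = write into the reserved shared location
        with self.cond:
            self.mailboxes[(src, dst)].append(data)
            self.cond.notify_all()

    def receive(self, src, dst):
        # "receive" = read from the reserved shared location (blocking)
        with self.cond:
            while not self.mailboxes[(src, dst)]:
                self.cond.wait()
            return self.mailboxes[(src, dst)].pop(0)
```

Because each ordered pair has its own mailbox, two processes never contend for the same location in opposite directions, which keeps the synchronization simple.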

Emulating shared memory on a message passing system (SM→MP)
 This is also implemented through read and write operations. Each shared location is
modeled as a separate process. A write to a shared location is emulated by sending
an update message to the corresponding owner process, and a read of a shared
location is emulated by sending a query message to the owner process.
 This emulation is expensive because the processes have to gain access to other
processes' memory locations. The latencies involved in read and write operations
may be high even when using shared memory emulation, because the read and write
operations are implemented by network-wide communication.
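The owner-process scheme above can be sketched as follows; local queues stand in for network channels, and a thread stands in for the owner process of one shared location (names are illustrative):

```python
import threading
import queue

def register_owner(requests):
    """Owner process for one shared location: serves update (write)
    and query (read) messages arriving on its request queue."""
    value = None
    while True:
        op, payload, reply = requests.get()
        if op == "write":        # update message
            value = payload
            reply.put("ack")
        elif op == "read":       # query message
            reply.put(value)
        elif op == "stop":
            break

requests = queue.Queue()
threading.Thread(target=register_owner, args=(requests,), daemon=True).start()

def write(v):
    """Emulated write: send an update message, wait for the ack."""
    reply = queue.Queue()
    requests.put(("write", v, reply))
    reply.get()

def read():
    """Emulated read: send a query message, wait for the value."""
    reply = queue.Queue()
    requests.put(("read", None, reply))
    return reply.get()
```

Note that even this toy version makes the cost visible: every read and write is a full request-reply round trip to the owner, which is exactly why the latencies mentioned above can be high.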

PRIMITIVES FOR DISTRIBUTED COMMUNICATION

Blocking / Non-blocking / Synchronous / Asynchronous
 Message send and message receive communication primitives are invoked through
Send() and Receive(), respectively.
 A Send primitive has two parameters: the destination, and the buffer in the user
space that holds the data to be sent.
 The Receive primitive also has two parameters: the source from which the data is to
be received, and the user buffer into which the data is to be received.
There are two ways of sending data when the Send primitive is called:

 Buffered: The standard option copies the data from the user buffer to the kernel
buffer. The data later gets copied from the kernel buffer onto the network. For the
Receive primitive, the buffered option is usually required because the data may
already have arrived when the primitive is invoked, and needs a storage place in
the kernel.
 Unbuffered: The data gets copied directly from the user buffer onto the network.

Blocking primitives
 The primitive commands wait for the message to be delivered; the execution of
the process is blocked.
 The sending process must wait after a send until an acknowledgement is made by
the receiver.
 The receiving process must wait for the expected message from the sending
process.
 A primitive is blocking if control returns to the invoking process only after the
processing for the primitive completes.
Non-blocking primitives
 If send is non-blocking, it returns control to the caller immediately, before the
message is sent.
 The advantage of this scheme is that the sending process can continue computing
in parallel with the message transmission, instead of having the CPU go idle.
 This is a form of asynchronous communication.
 A primitive is non-blocking if control returns to the invoking process immediately
after invocation, even though the operation has not completed.
 For a non-blocking Send, control returns to the process even before the data is
copied out of the user buffer.

 For a non-blocking Receive, control returns to the process even before the data may
have arrived from the sender.
Synchronous
 A Send or a Receive primitive is synchronous if both the Send() and Receive()
handshake with each other.
 The processing for the Send primitive completes only after the invoking processor
learns that the corresponding Receive primitive has also been invoked and the
receive operation has completed.
 The processing for the Receive primitive completes when the data to be received is
copied into the receiver's user buffer.
Asynchronous
 A Send primitive is said to be asynchronous if control returns to the invoking
process after the data item to be sent has been copied out of the user-specified
buffer.
 For non-blocking primitives, a return parameter on the primitive call returns a
system-generated handle which can later be used to check the status of completion
of the call.
 The process can check for completion by:
o checking if the handle has been flagged or posted
o issuing a Wait with a list of handles as parameters: this usually blocks until
one of the parameter handles is posted.
The send and receive primitives can be implemented in four modes:
 Blocking synchronous
 Non-blocking synchronous
 Blocking asynchronous
 Non-blocking asynchronous

Four modes of send operation
Blocking synchronous Send:
 The data gets copied from the user buffer to the kernel buffer and is then sent over
the network.
 After the data is copied to the receiver's system buffer and a Receive call has been
issued, an acknowledgement back to the sender causes control to return to the process
that invoked the Send operation and completes the Send.
Non-blocking synchronous Send:
 Control returns to the invoking process as soon as the copy of data from the user
buffer to the kernel buffer is initiated.
 A parameter in the non-blocking call also gets set with the handle of a location that
the user process can later check for the completion of the synchronous send operation.
 The location gets posted after an acknowledgement returns from the receiver.
 The user process can keep checking for the completion of the non-blocking
synchronous Send by testing the returned handle, or it can invoke the blocking Wait
operation on the returned handle.
Blocking asynchronous Send:
 The user process that invokes the Send is blocked until the data is copied from the
user's buffer to the kernel buffer.
Non-blocking asynchronous Send:
 The user process that invokes the Send is blocked only until the transfer of the data
from the user's buffer to the kernel buffer is initiated.
 Control returns to the user process as soon as this transfer is initiated, and a parameter
in the non-blocking call also gets set with the handle of a location that the user
process can check later, using the Wait operation, for the completion of the
asynchronous Send.
 The asynchronous Send completes when the data has been copied out of the user's
buffer. Checking for completion may be necessary if the user wants to reuse the
buffer from which the data was sent.
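The handle-and-Wait pattern of a non-blocking asynchronous Send can be sketched with Python's standard library, where a `Future` plays the role of the system-generated handle, `done()` is the "has the handle been posted" check, and `result()` is the blocking Wait. The queue standing in for the kernel buffer/network and the sleep simulating the copy are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
import queue
import time

network = queue.Queue()            # stands in for the kernel buffer / network
pool = ThreadPoolExecutor(max_workers=1)

def nb_send(data):
    """Non-blocking asynchronous Send: control returns at once; the
    returned future plays the role of the system-generated handle."""
    def copy_out():
        time.sleep(0.01)           # simulate copying data out of the user buffer
        network.put(data)
        return "posted"            # handle gets posted once the copy completes
    return pool.submit(copy_out)

handle = nb_send("msg1")           # returns immediately, before the copy completes
# ... the sender is free to compute here, in parallel with the transfer ...
status = handle.result()           # Wait on the handle: blocks until posted
```

Instead of `handle.result()`, the sender could poll `handle.done()` inside its computation loop, which corresponds to checking whether the handle has been flagged.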
Modes of receive operation
Blocking Receive:
 The Receive call blocks until the expected data arrives and is written into the
specified user buffer. Then control is returned to the user process.
Non-blocking Receive:
 The Receive call causes the kernel to register the call and return the handle
of a location that the user process can later check for the completion of the
non-blocking Receive operation.
 This location gets posted by the kernel after the expected data arrives and is
copied into the user-specified buffer. The user process can check for the
completion of the non-blocking Receive by invoking the Wait operation on the
returned handle.

Processor Synchrony

Processor synchrony indicates that all the processors execute in lock-step with their clocks
synchronized.

This ensures that no processor begins executing the next step of code until all the processors
have completed executing the previous step of code assigned to each of them.
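Lock-step execution is exactly what a barrier provides: no thread starts step k+1 until every thread has finished step k. A minimal sketch using Python threads to stand in for processors:

```python
import threading

N_PROCS, N_STEPS = 3, 2
barrier = threading.Barrier(N_PROCS)   # all N "processors" meet here each step
log = []
lock = threading.Lock()

def processor(pid):
    for step in range(N_STEPS):
        with lock:
            log.append((step, pid))    # this step's work
        barrier.wait()                 # nobody enters step+1 until all finish step

threads = [threading.Thread(target=processor, args=(p,)) for p in range(N_PROCS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because of the barrier, every step-0 entry in `log` precedes every step-1 entry, regardless of how the scheduler interleaves the threads.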

Libraries and standards
A wide range of primitives exists for message passing. The message passing interface (MPI)
library and the PVM (parallel virtual machine) library are used largely by the scientific
community.
 Message Passing Interface (MPI): This is a standardized and portable message-
passing system to function on a wide variety of parallel computers. MPI primarily
addresses the message-passing parallel programming model: data is moved from the
address space of one process to that of another process through cooperative
operations on each process.
 Parallel Virtual Machine (PVM): It is a software tool for parallel networking of
computers. It is designed to allow a network of heterogeneous Unix and/or Windows
machines to be used as a single distributed parallel processor.
 Remote Procedure Call (RPC): RPC is a common request-reply protocol and a
powerful technique for constructing distributed, client-server based applications. In
RPC, the called procedure need not exist in the same address space as the calling
procedure; the two processes may be on the same system, or they may be on
different systems with a network connecting them.
 Remote Method Invocation (RMI): RMI is a way for a programmer to write
object-oriented programs in which objects on different computers can interact in a
distributed network.

 Common Object Request Broker Architecture (CORBA): CORBA describes a
messaging mechanism by which objects distributed over a network can communicate with
each other irrespective of the platform and language used to develop those objects.
SYNCHRONOUS VS ASYNCHRONOUS EXECUTIONS
The execution of processes in distributed systems may be synchronous or asynchronous.

Asynchronous Execution:
Communication among processes is considered asynchronous when every communicating
process can have a different observation of the order of the messages being exchanged. In an
asynchronous execution:
 there is no processor synchrony and there is no bound on the drift rate of processor clocks
 message delays are finite but unbounded
 there is no upper bound on the time taken by a process to execute a step
Fig: Asynchronous execution in a message passing system

Synchronous Execution:
Communication among processes is considered synchronous when every process observes
the same order of messages within the system. In a synchronous execution:
 processors are synchronized and the clock drift rate between any two processors is
bounded
 message delivery times are such that delivery occurs in one logical step or round
 there is a known upper bound on the time taken by a process to execute a step

Emulating an asynchronous system by a synchronous system (A→S)
An asynchronous program can be emulated on a synchronous system fairly trivially, as the
synchronous system is a special case of an asynchronous system: all communication finishes
within the same round in which it is initiated.

Emulating a synchronous system by an asynchronous system (S→A)
A synchronous program can be emulated on an asynchronous system using a tool called a
synchronizer.

Emulation for a fault-free system
Fig 1.15: Emulations in a failure-free message passing system
If system A can be emulated by system B, denoted A/B, and if a problem is not solvable in
B, then it is also not solvable in A. If a problem is solvable in A, it is also solvable in B.
Hence, in a sense, all four classes are equivalent in terms of computability in failure-free
systems.

DESIGN ISSUES AND CHALLENGES IN DISTRIBUTED SYSTEMS
The design of distributed systems poses numerous challenges. They can be categorized
into:
 issues related to system and operating system design
 issues related to algorithm design
 issues arising due to emerging technologies
The above three classes are not mutually exclusive.

Issues related to system and operating system design
The following are some of the common challenges to be addressed in designing a distributed
system from the system perspective:
 Communication: This task involves designing suitable communication mechanisms
among the various processes in the network.
Examples: RPC, RMI

 Processes: The main challenges involved are process and thread management at both
client and server environments, migration of code between systems, and the design of
software and mobile agents.
 Naming: Devising easy-to-use and robust schemes for names, identifiers, and
addresses is essential for locating resources and processes in a transparent and scalable
manner. The remote and highly varied geographical locations make this task difficult.
 Synchronization: Mutual exclusion, leader election, deploying physical clocks, and
global state recording are some synchronization mechanisms.
 Data storage and access schemes: Designing file systems for easy and efficient data
storage with implicit accessing mechanisms is essential for distributed operation.
 Consistency and replication: The notion of distributed systems goes hand in hand
with replication of data, to provide a high degree of scalability. The replicas should be
handled with care, since data consistency is a prime issue.
 Fault tolerance: This requires maintenance of fail-proof links, nodes, and processes.
Some of the common fault tolerance techniques are resilience, reliable communication,
distributed commit, checkpointing and recovery, agreement and consensus, failure detection,
and self-stabilization.
 Security: Cryptography, secure channels, access control, key management
(generation and distribution), authorization, and secure group management are some of the
security measures imposed on distributed systems.
 Application Programming Interface (API) and transparency: User friendliness
and ease of use are very important for distributed services to be used by a wide community.
Transparency, which is hiding the inner implementation policy from users, is of the following
types:
 Access transparency: hides differences in data representation.
 Location transparency: hides differences in location by providing uniform access to
data located at remote locations.
 Migration transparency: allows relocating resources without changing names.
 Replication transparency: makes the user unaware of whether they are working on
original or replicated data.
 Concurrency transparency: masks the concurrent use of shared resources from the
user.
 Failure transparency: the system remains reliable and fault-tolerant despite failures.
 Scalability and modularity: The algorithms, data, and services must be as distributed
as possible. Various techniques such as replication, caching and cache management, and
asynchronous processing help achieve scalability.
Algorithmic challenges in distributed computing
 Designing useful execution models and frameworks
The interleaving model, partial order model, input/output automata model, and the Temporal
Logic of Actions (TLA) are some examples of models that provide different degrees of
infrastructure.
 Dynamic distributed graph algorithms and distributed routing algorithms
 The distributed system is generally modeled as a distributed graph.
 Hence graph algorithms are the base for a large number of higher-level
communication, data dissemination, object location, and object search functions.
 These algorithms must have the capacity to deal with highly dynamic graph
characteristics. They are expected to function like routing algorithms.
 The performance of these algorithms has a direct impact on user-perceived latency,
data traffic, and load in the network.
 Time and global state in a distributed system
 Geographically remote resources demand synchronization based on logical time.
 Logical time is relative and eliminates the overhead of providing physical time for
applications. Logical time can
(i) capture the logic and inter-process dependencies
(ii) track the relative progress at each process
 Maintaining the global state of the system across space involves the role of the time
dimension for consistency. This can be done with extra effort in a coordinated manner.
 Deriving appropriate measures of concurrency also involves the time dimension, as
the execution and communication speeds of threads may vary a lot.
 Synchronization/coordination mechanisms
 Synchronization is essential for the distributed processes to facilitate concurrent
execution without affecting other processes.
 The synchronization mechanisms also involve resource management and
concurrency management mechanisms.
 Some techniques for providing synchronization are:
 Physical clock synchronization: Physical clocks usually diverge in their values due
to hardware limitations. Keeping them synchronized is a fundamental challenge in
maintaining common time.
 Leader election: All the processes need to agree on which process will play the role
of a distinguished process, or leader. A leader is necessary even for many distributed
algorithms because there is often some asymmetry.
 Mutual exclusion: Access to the critical resource(s) has to be coordinated.
 Deadlock detection and resolution: Detection is coordinated to avoid duplicate
work, and resolution is coordinated to avoid unnecessary aborts of processes.
 Termination detection: requires cooperation among the processes to detect the
specific global state of quiescence.
 Garbage collection: Detecting garbage requires coordination among the processes.
 Group communication, multicast, and ordered message delivery
 A group is a collection of processes that share a common context and collaborate on a
common task within an application domain. Group management protocols are needed for
group communication wherein processes can join and leave groups dynamically, or fail.
 Monitoring distributed events and predicates
 Predicates defined on program variables that are local to different processes are used
for specifying conditions on the global system state.
 On-line algorithms for monitoring such predicates are hence important.
 The specification of such predicates uses physical or logical time relationships.
 Distributed program design and verification tools
Methodically designed and verifiably correct programs can greatly reduce the overhead of
software design, debugging, and engineering. Designing these is a big challenge.
 Debugging distributed programs
Debugging distributed programs is much harder because of the concurrency and replication
involved. Adequate debugging mechanisms and tools are the need of the hour.
 Data replication, consistency models, and caching
 Fast access to data and other resources is important in distributed systems.
Managing replicas and their updates faces concurrency problems.
 Placement of the replicas in the system is also a challenge, because resources usually
cannot be freely replicated.
 World Wide Web design – caching, searching, scheduling
 The WWW is a commonly known distributed system.
 The issues of object replication and caching, and prefetching of objects, arise on the
WWW as well.
 Object search and navigation on the web are important functions in the operation of
the web.
 Distributed shared memory abstraction
 A shared memory abstraction is easier to program against, since the application
does not have to manage the communication tasks.
 The communication is done by the middleware via message passing.
 The overhead of shared memory is dealt with by the middleware technology.
 Some of the methodologies that perform the task of communication in shared
memory distributed systems are:
 Wait-free algorithms: A wait-free algorithm gives a process the ability to complete
its execution irrespective of the actions of other processes. Such algorithms control access to
shared resources in the shared memory abstraction, but they are expensive.
 Mutual exclusion: Concurrent access by processes to a shared resource or data is
executed in a mutually exclusive manner; only one process is allowed to execute the critical
section at any given time. In a distributed system, shared variables or a local kernel cannot
be used to implement mutual exclusion. Message passing is the sole means for implementing
distributed mutual exclusion.
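To illustrate mutual exclusion achieved purely by message passing, here is a minimal sketch of the simplest such scheme, a central coordinator that grants the critical section to one requester at a time (this is just one possible scheme; the distributed algorithms of Unit III avoid the central coordinator). Queues stand in for message channels, and all names are illustrative.

```python
import threading
import queue

request_q = queue.Queue()       # channel to the coordinator

def coordinator():
    """Grants the critical section to one requester at a time,
    purely via REQUEST / GRANT / RELEASE messages."""
    while True:
        msg = request_q.get()
        if msg == "stop":
            break
        grant_q, release_q = msg
        grant_q.put("grant")    # GRANT message to the requester
        release_q.get()         # wait for RELEASE before the next grant

threading.Thread(target=coordinator, daemon=True).start()

counter = 0                     # the "shared resource"

def worker():
    global counter
    grant_q, release_q = queue.Queue(), queue.Queue()
    request_q.put((grant_q, release_q))   # REQUEST message
    grant_q.get()                         # block until granted
    counter += 1                          # critical section
    release_q.put("release")              # RELEASE message

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that no worker ever touches a shared lock: exclusion follows entirely from the message protocol, since the coordinator issues at most one outstanding GRANT.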

 Register constructions: Architectures must be designed in such a way that registers
allow concurrent access without any restrictions on the concurrency permitted.
 Reliable and fault-tolerant distributed systems
The following are some of the fault tolerance strategies:
 Consensus algorithms: Consensus algorithms allow correctly functioning processes
to reach agreement among themselves in spite of the existence of malicious processes. The
goal of the malicious processes is to prevent the correctly functioning processes from
reaching agreement; they operate by sending messages with misleading information to
confuse the correctly functioning processes.
 Replication and replica management: The Triple Modular Redundancy (TMR)
technique is used in software and hardware implementations. TMR is a fault-tolerant form of
N-modular redundancy, in which three systems perform a process and the results are
processed by a majority-voting system to produce a single output.
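The TMR idea can be sketched in a few lines: run three replicas of the same computation and let a majority voter pick the output, which masks any single faulty replica. The replica functions below are illustrative stand-ins for three redundant systems.

```python
from collections import Counter

def tmr(replicas, x):
    """Triple modular redundancy: run three replicas of the same
    computation and return the majority output."""
    outputs = [f(x) for f in replicas]
    value, votes = Counter(outputs).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority: more than one replica failed")
    return value

square = lambda x: x * x          # two correct replicas
faulty = lambda x: x * x + 1      # one replica with an injected fault

result = tmr([square, square, faulty], 4)   # the voter masks the faulty replica
```

TMR tolerates one arbitrary replica failure; if two replicas fail in different ways, no majority exists and the voter can only signal an error.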
 Voting and quorum systems: Providing redundancy in the active or passive
components in the system and then performing voting based on some quorum criterion is a
classical way of dealing with fault tolerance. Designing efficient algorithms for this purpose
is the challenge.
 Distributed databases and distributed commit: Distributed databases should
also follow the atomicity, consistency, isolation, and durability (ACID) properties.
 Self-stabilizing systems: A self-stabilizing algorithm guarantees to take the system to
a good state even if a bad state were to arise due to some error. Self-stabilizing algorithms
require some in-built redundancy to track additional variables of the state and do extra work.
 Checkpointing and recovery algorithms: Checkpointing is periodically recording
the current state on secondary storage so that, in case of a failure, the entire computation is
not lost but can be recovered from one of the recently taken checkpoints. Checkpointing in a
distributed environment is difficult: if the checkpoints at the different processes are not
coordinated, the local checkpoints may become useless because they are inconsistent with
the checkpoints at other processes.
 Failure detectors: Asynchronous distributed systems do not have a bound on
message transmission time. This makes message passing very difficult, since the receiver
does not know how long to wait. Failure detectors probabilistically suspect another process
as having failed and then converge on a determination of the up/down status of the suspected
process.
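A common practical realization of this idea is a timeout-based heartbeat detector: a process is suspected if no heartbeat arrives within a timeout. The sketch below is a minimal single-machine illustration (class name and timeout value are illustrative); as the text notes, suspicion is probabilistic, so a merely slow process may be wrongly suspected.

```python
import time

class HeartbeatDetector:
    """Timeout-based failure detector: suspect a process if no
    heartbeat has arrived within `timeout` seconds. Suspicion is a
    guess, not proof - a slow process may be wrongly suspected."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, pid):
        # record the arrival of a heartbeat message from pid
        self.last_seen[pid] = time.monotonic()

    def suspected(self, pid):
        # suspect pid if its last heartbeat is older than the timeout
        return time.monotonic() - self.last_seen[pid] > self.timeout

d = HeartbeatDetector(timeout=0.05)
d.heartbeat("p1")
d.heartbeat("p2")
time.sleep(0.1)                 # p1 falls silent ...
d.heartbeat("p2")               # ... while p2 keeps sending heartbeats
```

Choosing the timeout is the whole difficulty: too short and correct-but-slow processes are suspected; too long and real failures go undetected for a long time.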
 Load balancing
The objective of load balancing is to gain higher throughput and reduce user-perceived
latency. Load balancing may be necessary because of a variety of factors, such as high
network traffic or a high request rate causing the network connection to be a bottleneck, or
high computational load. The following are some forms of load balancing:
 Data migration: the ability to move data around in the system, based on the access
pattern of the users.
 Computation migration: the ability to relocate processes in order to perform a
redistribution of the workload.
 Distributed scheduling: this achieves a better turnaround time for the users by using
idle processing power in the system more efficiently.
 Real-time scheduling
Real-time scheduling becomes more challenging when a global view of the system state is
absent and on-line or dynamic changes are more frequent. The message propagation delays,
which are network-dependent, are hard to control or predict. This is a hindrance to meeting
the QoS requirements of the network.

 Performance
User-perceived latency in distributed systems must be reduced. The common issues in
performance are:
 Metrics: Appropriate metrics must be defined for measuring the performance of
theoretical distributed algorithms and their implementations.
 Measurement methods/tools: The distributed system is a complex entity; appropriate
methodologies and tools must be developed for measuring the performance metrics.
Applications of distributed computing and newer challenges
The deployment environments of distributed systems range from mobile systems to
cloud storage. All these environments have their own challenges:
 Mobile systems
o Mobile systems, which use wireless communication in a shared broadcast
medium, have issues related to the physical layer such as transmission range,
power, battery power consumption, interfacing with the wired internet, signal
processing, and interference.
o The issues pertaining to the higher layers include routing, location
management, channel allocation, localization and position estimation, and
mobility management.
o Apart from the above-mentioned common challenges, the architectural
differences of mobile networks demand varied treatment. The two
architectures are:
 Base-station approach (cellular approach): The geographical region is divided into
hexagonal physical locations called cells. A powerful base station transmits signals to all
other nodes in its range.

 Ad-hoc network approach: This is an infrastructure-less approach which does not
have any base station to transmit signals. Instead, all the responsibility is distributed among
the mobile nodes.
 It is evident that the two approaches work in different environments with different
principles of communication. Designing a distributed system to cater to the varied needs is a
great challenge.

 Sensor networks
o A sensor is a processor with an electro-mechanical interface that is capable of
sensing physical parameters.
o Sensors are low-cost equipment with limited computational power and battery
life. They are designed to handle streaming data and route it to external
computer networks and processes.
o They are susceptible to faults and have to reconfigure themselves.
o These features introduce a whole new set of challenges, such as position
estimation and time estimation, when designing a distributed system.
 Ubiquitous or pervasive computing
o In ubiquitous systems the processors are embedded in the environment to
perform application functions in the background.
o Examples: intelligent devices, smart homes, etc.
o They are distributed systems with recent advancements operating in wireless
environments through actuator mechanisms.
o They can be self-organizing and network-centric, with limited resources.
 Peer-to-peer computing
o Peer-to-peer (P2P) computing is computing over an application layer network where
all interactions among the processors are at the same level.
o This is a form of symmetric computation, in contrast to the client-server paradigm.
o They are self-organizing, with or without a regular structure to the network.
Some of the key challenges include: object storage mechanisms; efficient object lookup and retrieval in a
scalable manner; dynamic reconfiguration with nodes as well as objects joining and leaving the network
randomly; replication strategies to expedite object search; tradeoffs between object size latency and table
sizes; anonymity, privacy, and security.
 Publish-subscribe, content distribution, and multimedia
o Present-day users require only the information of interest.
o In a dynamic environment where the information constantly fluctuates, there
is great demand for:
o Publish: an efficient mechanism for distributing this information.
o Subscribe: an efficient mechanism to allow end users to indicate interest in
receiving specific kinds of information.
o An efficient mechanism for aggregating large volumes of
published information and filtering it as per the user's subscription filter.
o Content distribution refers to a mechanism that categorizes the information based on
parameters.
o Publish-subscribe and content distribution overlap each other.
o Multimedia data introduces special issues because of its large size.
 Distributed agents
o Agents are software processes, or sometimes robots, that move around the system to
do specific tasks for which they are programmed.
o Agents collect and process information and can exchange such
information with other agents.
o Challenges in distributed agent systems include coordination mechanisms
among the agents, controlling the mobility of the agents, and their software design
and interfaces.
 Distributed data mining
o Data mining algorithms process large amounts of data to detect patterns and
trends in the data, to mine or extract useful information.
o The mining can be done by applying database and artificial intelligence
techniques to a data repository.
 Grid computing
 Grid computing is deployed to manage resources. For instance, idle CPU
cycles of machines connected to the network will be available to others.
 The challenges include: scheduling jobs, a framework for implementing quality
of service, real-time guarantees, and security.
 Security in distributed systems
The challenges of security in a distributed setting include confidentiality,
authentication and availability. These can be addressed using efficient and scalable solutions.

A MODEL OF DISTRIBUTED COMPUTATIONS: DISTRIBUTED PROGRAM
 A distributed program is composed of a set of asynchronous processes that
communicate by message passing over the communication network. Each process
may run on a different processor.
 The processes do not share a global memory and communicate solely by passing
messages. These processes do not share a global clock that is instantaneously
accessible to these processes.
 Process execution and message transfer are asynchronous – a process may execute an
action spontaneously and a process sending a message does not wait for the delivery
of the message to be complete.
 The global state of a distributed computation is composed of the states of the processes
and the communication channels. The state of a process is characterized by the state
of its local memory and depends upon the context.
 The state of a channel is characterized by the set of messages in transit in the channel.
A MODEL OF DISTRIBUTED EXECUTIONS

 The execution of a process consists of a sequential execution of its actions.
 The actions are atomic and the actions of a process are modeled as three types of events:
internal events, message send events, and message receive events.
 Aninternaleventchangesthestateoftheprocessatwhichit occurs.
 A send event changes the state of the process that sends the message and the state of
the channel on which the message is sent.
 The execution of process pi produces a sequence of events ei1, ei2, ei3, …, and it is
denoted by Hi: Hi = (hi, →i). Here hi is the set of events produced by pi, and the binary
relation →i defines a linear order on these events (the causal order within pi).
 →msg indicates the causal dependency that exists due to message passing between two events.

Fig: Space-time distribution of distributed systems

 A receive event changes the state of the process that receives the message and the
state of the channel on which the message is received.
Causal Precedence Relations
Causal message ordering is a partial ordering of messages in a distributed computing
environment. It is the delivery of messages to a process in the order in which they were
transmitted to that process.

It places a restriction on communication between processes by requiring that if the
transmission of message mi to process pk necessarily preceded the transmission of message mj to
the same process, then the delivery of these messages to that process must be ordered such that
mi is delivered before mj.
Happens-Before Relation
The partial ordering obtained by generalizing the relationship between two processes is called
the happened-before relation, causal ordering, or potential causal ordering. This term
was coined by Lamport. Happens-before defines a partial order of events in a distributed
system; some events cannot be placed in the order. We write A→B if A happens before B.
A→B is defined using the following rules:
 Local ordering: A and B occur on the same process and A occurs before B.
 Messages: send(m) → receive(m) for any message m.
 Transitivity: e → e'' if e → e' and e' → e''.
 Ordering can be based on two situations:
1. If two events occur in the same process then they occurred in the order observed.
2. During message passing, the event of sending a message occurred before the event of receiving it.

Lamport's ordering is the happens-before relation, denoted by →:

 a→b, if a and b are events in the same process and a occurred before b.
 a→b, if a is the event of sending a message m in a process and b is the event of the
same message m being received by another process.
 If a→b and b→c, then a→c. Lamport's ordering follows the transitivity property.

When any of the above conditions is satisfied, it can be concluded that a and b are causally
related. Consider two events c and d; if both c→d and d→c are false (i.e., they are not causally
related), then c and d are said to be concurrent events, denoted as c||d.

Fig: Communication between processes
Fig 1.22 shows the communication of messages m1 and m2 between three processes p1, p2
and p3. a, b, c, d, e and f are events. It can be inferred from the diagram that: a→b; c→d;
e→f; b→c; d→f; a→d; a→f; b→d; b→f. Also a||e and c||e.
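The inferences drawn from Fig 1.22 can be checked mechanically. The sketch below uses a hypothetical encoding of the figure (assumed from the text: a, b on p1; c, d on p2; e, f on p3; message m1 sent at b and received at c; message m2 sent at d and received at f), builds the happens-before relation from the local-order and message rules, and closes it under transitivity:

```python
from itertools import product

# Local orders and message edges inferred from Fig 1.22 (assumed layout:
# a, b on p1; c, d on p2; e, f on p3; b sends m1 received at c; d sends
# m2 received at f).
edges = {("a", "b"), ("c", "d"), ("e", "f"),   # local ordering
         ("b", "c"), ("d", "f")}               # send(m) -> receive(m)

# Transitivity: repeatedly add (x, z) whenever (x, y) and (y, z) exist.
closure = set(edges)
changed = True
while changed:
    changed = False
    for (x, y), (y2, z) in product(list(closure), repeat=2):
        if y == y2 and (x, z) not in closure:
            closure.add((x, z))
            changed = True

def happens_before(x, y):
    return (x, y) in closure

def concurrent(x, y):
    return not happens_before(x, y) and not happens_before(y, x)

# Matches the figure's inferences: a→f and b→d hold, while a||e and c||e.
```

Running the checks reproduces exactly the relations listed above, including the concurrent pairs a||e and c||e.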

Logical vs physical concurrency
Physical and logical concurrency are two notions that create confusion in distributed
systems.
Physical concurrency: Several program units of the same program execute at the same time
on different processors.
Logical concurrency: Several program units of the same program execute in interleaved
fashion on a single processor, giving the illusion of simultaneous execution.

Differences between logical and physical concurrency
 Logical concurrency: Several units of the same program execute simultaneously on the
same processor, giving an illusion to the programmer that they are executing on multiple
processors. They are implemented through interleaving.
 Physical concurrency: Several program units of the same program execute at the same
time on different processors. They are implemented with I/O channels, multiple CPUs, and
networks of uni- or multi-CPU machines.
MODELS OF COMMUNICATION NETWORK
The three main types of communication models in distributed systems are:
FIFO (first-in, first-out): each channel acts as a FIFO message queue.
Non-FIFO (N-FIFO): a channel acts like a set in which a sender process adds messages and the
receiver removes messages in random order.
Causal Ordering (CO): it follows Lamport's law.
o The relation between the three models is given by CO ⊂ FIFO ⊂ N-FIFO.

A system that supports the causal ordering model satisfies the following property:
CO: for any two messages mij and mkj sent to the same process pj,
if send(mij) → send(mkj) then rec(mij) → rec(mkj).
GLOBAL STATE

A distributed snapshot represents a state in which the distributed system might have been. A snapshot of the
system is a single configuration of the system.

• The global state of a distributed system is a collection of the local states of its components, namely,
the processes and the communication channels.
• The state of a process at any time is defined by the contents of processor
registers, stacks, local memory, etc., and depends on the local context of the distributed
application.
• The state of a channel is given by the set of messages in transit in the channel.
UNIT II

LOGICAL TIME & GLOBAL STATE

Logical clocks are based on capturing chronological and causal relationships of processes and
ordering events based on these relationships.
Threetypesof logicalclockaremaintainedindistributed systems:
 Scalarclock
 Vectorclock
 Matrixclock

In a system of logical clocks, every process has a logical clock that is advanced using a set of
rules. Every event is assigned a timestamp and the causality relation between events can be
generally inferred from their timestamps.
The timestamps assigned to events obey the fundamental monotonicity property; that is, if an
event a causally affects an event b, then the timestamp of a is smaller than the timestamp of
b.
A Framework for a System of Logical Clocks

A system of logical clocks consists of a time domain T and a logical clock C. Elements of T form a
partially ordered set over a relation <. This relation is usually called the happened-before or causal
precedence relation.
The logical clock C is a function that maps an event e in a distributed system to an element in the
time domain T, denoted as C(e), such that
for any two events ei and ej: ei → ej ⟹ C(ei) < C(ej).
This monotonicity property is called the clock consistency condition. When T and C satisfy the
following condition,
for any two events ei and ej: ei → ej ⟺ C(ei) < C(ej),
then the system of clocks is strongly consistent.

Implementing logical clocks
The two major issues in implementing logical clocks are:
Data structures: representation of each process.
Protocols: rules for updating the data structures to ensure consistency conditions.

Data structures:
Each process pi maintains data structures with the given capabilities:

A local logical clock (lci), that helps process pi measure its own progress.

A logical global clock (gci), that is a representation of process pi's local view of the logical global
time. It allows this process to assign consistent timestamps to its local events.
Protocol:
The protocol ensures that a process's logical clock, and thus its view of the global time, is managed
consistently with the following rules:
Rule 1: Decides the updates of the logical clock by a process. It controls send, receive and
other operations.
Rule 2: Decides how a process updates its global logical clock to update its view of the
global time and global progress. It dictates what information about the logical time is
piggybacked in a message and how this information is used by the receiving process to
update its view of the global time.

SCALAR TIME
Scalar time was designed by Lamport to synchronize all the events in distributed
systems. A Lamport logical clock is an incrementing counter maintained in each process.
When a process receives a message, it resynchronizes its logical clock with the sender,
maintaining the causal relationship.
The algorithm of Lamport timestamps can be captured in a few rules:
 All the process counters start with value 0.
 A process increments its counter for each event (internal event, message sending,
message receiving) in that process.
 When a process sends a message, it includes its (incremented) counter value with the
message.
 On receiving a message, the counter of the recipient is updated to the greater of its
current counter and the timestamp in the received message, and then incremented by one.

 If Ci is the local clock for process Pi then:

 if a and b are two successive events in Pi, then Ci(b) = Ci(a) + d1, where d1 > 0
 if a is the sending of message m by Pi, then m is assigned timestamp tm = Ci(a)
 if b is the receipt of m by Pj, then Cj(b) = max{Cj(b), tm + d2}, where d2 > 0

Rules of Lamport's clock
Rule 1: Before executing an event, Pi updates its local clock: Ci = Ci + d1, where d1 > 0.
Rule 2: The following actions are implemented when Pi receives a message m with timestamp Cm:
a) Ci = max(Ci, Cm)
b) execute Rule 1
c) deliver the message

Fig 1.20: Evolution of scalar time
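The two rules can be sketched as a minimal Python class, assuming the common choice d1 = d2 = 1 (the class and method names are illustrative, not from the text):

```python
class LamportClock:
    """Minimal sketch of Lamport's scalar clock (assuming d1 = d2 = 1)."""

    def __init__(self):
        self.c = 0                  # every process counter starts at 0

    def internal_event(self):
        self.c += 1                 # Rule 1: Ci = Ci + d1
        return self.c

    def send_event(self):
        self.c += 1                 # Rule 1 is applied before the send
        return self.c               # timestamp tm piggybacked on the message

    def receive_event(self, tm):
        self.c = max(self.c, tm)    # Rule 2a: Ci = max(Ci, Cm)
        self.c += 1                 # Rule 2b: execute Rule 1, then deliver
        return self.c

p1, p2 = LamportClock(), LamportClock()
t = p1.send_event()         # p1 sends m with timestamp 1
p1.internal_event()         # p1 advances to 2
r = p2.receive_event(t)     # p2: max(0, 1) + 1 = 2
```

Note that the receive timestamp is always strictly greater than the piggybacked send timestamp, which is exactly the clock consistency condition applied to a send/receive pair.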
Basic properties of scalar time:
1. Consistency property: Scalar clocks always satisfy monotonicity. A monotonic clock
only increments its timestamp and never jumps. Hence it is consistent.

2. Total Ordering: Scalar clocks order the events in distributed systems, but all the events
do not get distinct timestamps. Hence a tie-breaking mechanism is essential to totally
order the events. The tie-breaking is done through:
 Linearly ordering process identifiers.
 A process with a lower identifier value is given higher priority.

The term (t, i) indicates the timestamp of an event, where t is its time of occurrence and i is
the identity of the process where it occurred.
The total order relation ≺ over two events x and y with timestamps (h, i) and (k, j) is given by:
x ≺ y ⟺ (h < k) or (h = k and i < j)

A total order is generally used to ensure liveness properties in distributed algorithms.
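The tie-breaking rule can be written as a small comparison function (the function name is illustrative):

```python
def total_order_lt(x, y):
    """x and y are (timestamp, process_id) pairs. x precedes y iff
    h < k, or h == k and i < j (lower process id gets higher priority)."""
    (h, i), (k, j) = x, y
    return h < k or (h == k and i < j)

# Concurrent events with equal scalar timestamps are ordered by process id:
assert total_order_lt((3, 1), (3, 2))
assert not total_order_lt((4, 1), (3, 2))
```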

3. Event Counting
If event e has a timestamp h, then h−1 represents the minimum logical duration,
counted in units of events, required before producing the event e. This is called the height of the
event e: h−1 events have been produced sequentially before the event e, regardless of the
processes that produced these events.

4. No strong consistency
The scalar clocks are not strongly consistent because the logical local clock and logical
global clock of a process are squashed into one, resulting in the loss of causal dependency
information among events at different processes.

VECTOR TIME
The ordering from Lamport's clocks is not enough to guarantee that if two events
precede one another in the ordering relation they are also causally related. Vector clocks use
a vector counter instead of an integer counter. The vector clock of a system with N processes is a
vector of N counters, one counter per process. Vector counters have to follow the following
update rules:
 Initially,allcountersarezero.
 Each time a process experiences an event, it increments its own counter in the vector
by one.
 Each time a process sends a message, it includes a copy of its own (incremented)
vector in the message.
 Each time a process receives a message, it increments its own counter in the vector by
one and updates each element in its vector by taking the maximum of the value in its
own vector counter and the value in the vector in the received message.
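The update rules above can be sketched as a minimal illustration with increment d = 1 (class and function names are assumptions, not from the text):

```python
class VectorClock:
    """Minimal sketch of a vector clock for a system of n processes."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n                 # initially, all counters are zero

    def tick(self):                      # internal event: own counter + 1
        self.v[self.pid] += 1

    def send(self):                      # tick, then piggyback a copy
        self.tick()
        return list(self.v)

    def receive(self, vm):               # tick, then component-wise max
        self.tick()
        self.v = [max(a, b) for a, b in zip(self.v, vm)]

def happened_before(vh, vk):
    """vh < vk: every component is <= and the vectors differ."""
    return all(a <= b for a, b in zip(vh, vk)) and vh != vk

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
m = p0.send()                  # p0's clock becomes [1, 0]
p1.receive(m)                  # p1's clock becomes [1, 1]
```

Comparing the resulting timestamps with `happened_before` recovers exactly the causal precedence between the send and receive events.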

The time domain is represented by a set of n-dimensional non-negative integer vectors in vector time.

Rules of Vector Time
Rule 1: Before executing an event, process pi updates its local logical time as follows:
vti[i] = vti[i] + d, where d > 0
Rule 2: Each message m is piggybacked with the vector clock vt of the sender process at
sending time. On the receipt of such a message (m, vt), process pi executes the following
sequence of actions:
1. update its global logical time: vti[k] = max(vti[k], vt[k]), for 1 ≤ k ≤ n
2. execute Rule 1
3. deliver the message m

Fig 1.21: Evolution of vector time

Basic properties of vector time
1. Isomorphism:
 "→" induces a partial order on the set of events that are produced by a distributed
execution.
 If events x and y are timestamped as vh and vk, then:
x → y ⟺ vh < vk, and x ∥ y ⟺ ¬(vh < vk) and ¬(vk < vh).
 If the process pi at which event x occurred is known, the test to compare two
timestamps can be simplified as:
x → y ⟺ vh[i] ≤ vk[i]

2. Strong consistency
The system of vector clocks is strongly consistent; thus, by examining the vector timestamps
of two events, we can determine if the events are causally related.
3. Event counting
If an event e has timestamp vh, then vh[j] denotes the number of events executed by process pj
that causally precede e.

PHYSICAL CLOCK SYNCHRONIZATION: NETWORK TIME PROTOCOL (NTP)
Centralized systems do not need clock synchronization, as they work under a common
clock. But distributed systems do not follow a common clock: each system functions based on its
own internal clock and its own notion of time. The time in distributed systems is measured
in the following contexts:
 The time of the day at which an event happened on a specific machine in the network.
 The time interval between two events that happened on different machines in the network.
 The relative ordering of events that happened on different machines in the network.

Clock synchronization is the process of ensuring that physically distributed processors have a
common notion of time.

Due to different clock rates, the clocks at various sites may diverge with time, and
periodically a clock synchronization must be performed to correct this clock skew in
distributed systems. Clocks are synchronized to an accurate real-time standard like UTC
(Universal Coordinated Time). Clocks that must not only be synchronized with each other
but also have to adhere to physical time are termed physical clocks. This degree of
synchronization additionally enables the coordination and scheduling of actions between
multiple computers connected to a common network.

Basic terminologies:
If Ca and Cb are two different clocks, then:
 Time: The time of a clock in a machine p is given by the function Cp(t), where Cp(t) = t
for a perfect clock.
 Frequency: Frequency is the rate at which a clock progresses. The frequency at time t of
clock Ca is Ca′(t).
 Offset: Clock offset is the difference between the time reported by a clock and the real
time. The offset of the clock Ca is given by Ca(t) − t. The offset of clock Ca relative
to Cb at time t ≥ 0 is given by Ca(t) − Cb(t).
 Skew: The skew of a clock is the difference in the frequencies of the clock and the
perfect clock. The skew of a clock Ca relative to clock Cb at time t is Ca′(t) − Cb′(t).
 Drift (rate): The drift of clock Ca is the second derivative of the clock value with
respect to time, Ca′′(t). The drift of clock Ca relative to clock Cb at time t is
Ca′′(t) − Cb′′(t).
Clocking Inaccuracies
Physical clocks are synchronized to an accurate real-time standard like UTC
(Universal Coordinated Time). Due to the clock inaccuracies discussed above, a timer (clock)
is said to be working within its specification if
1 − ρ ≤ dC/dt ≤ 1 + ρ,
where the constant ρ is the maximum skew rate specified by the manufacturer.

1. Offset delay estimation
NTP is a time service for the Internet: it synchronizes clients to UTC, with reliability from
redundant paths, scalability, and authenticated time sources. The design of NTP
involves a hierarchical tree of time servers, with the primary server at the root synchronizing
with the UTC. The next level contains secondary servers, which act as a backup to the primary
server. At the lowest level is the synchronization subnet which has the clients.

2. Clock offset and delay estimation
A source node cannot accurately estimate the local time on the target node due to varying
message or network delays between the nodes. This protocol employs a very common
practice of performing several trials and chooses the trial with the minimum delay.
Fig: Behavior of clocks

Fig a) Offset and delay estimation Fig b) Offset and delay estimation
between processes from same server between processes from different servers

Let T1, T2, T3, T4 be the values of the four most recent timestamps. The clocks A and B are
stable and running at the same speed. Let a = T1 − T3 and b = T2 − T4. If the network
delay difference from A to B and from B to A, called the differential delay, is small, the
clock offset θ and round-trip delay δ of B relative to A at time T4 are approximately given by:
θ = (a + b)/2,  δ = a − b

Each NTP message includes the latest three timestamps T1, T2, and T3,
while T4 is determined upon arrival.
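The offset and delay estimates can be computed directly from the four timestamps. The timestamp roles below (T3: A's clock when A sends, T1: B's clock when that message arrives, T2: B's clock when B replies, T4: A's clock when the reply arrives) are an assumption based on the figure; only the a, b formulas come from the text:

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay of B relative to A.

    Assumed timestamp roles:
      t3: A's clock when A sent its message to B
      t1: B's clock when that message arrived at B
      t2: B's clock when B sent its reply
      t4: A's clock when the reply arrived back at A
    """
    a = t1 - t3
    b = t2 - t4
    offset = (a + b) / 2      # estimated clock offset of B relative to A
    delay = a - b             # estimated round-trip delay
    return offset, delay

# Example: B's clock runs 5 units ahead of A's; each one-way delay is 2.
# A sends at 10 (A's clock); B receives at 17 and replies at 17 (B's
# clock); A receives the reply at 14 (A's clock).
offset, delay = ntp_offset_delay(t1=17, t2=17, t3=10, t4=14)
# offset == 5.0, delay == 4
```

With symmetric delays the estimate recovers the true offset exactly; asymmetric (differential) delays introduce an error of half the delay difference.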

MESSAGE ORDERING AND GROUP COMMUNICATION
As distributed systems are a network of systems at various physical locations, the
coordination between them should always be preserved. Message ordering means the
order of delivering the messages to the intended recipients. The common message order
schemes are First-In First-Out (FIFO), non-FIFO, causal order and synchronous order. In case
of group communication with multicasting, the causal and total ordering schemes are followed.
It is also essential to define the behaviour of the system in case of failures. The following are
the notations that are widely used in this chapter:
 Distributed systems are denoted by a graph (N, L).
 The set of events is represented by the event set {E, ≺}.
 A message is denoted as mi: its send and receive events are si and ri respectively.
 Send(M) and receive(M) indicate that the message M is sent and received.
 a ∼ b denotes that a and b occur at the same process.
 The send-receive pairs T = {(s, r) ∈ Ei × Ej | s corresponds to r}.
MESSAGE ORDERING PARADIGMS
The message orderings are:

(i) non-FIFO
(ii) FIFO
(iii) causal order
(iv) synchronous order

There is always a trade-off between concurrency and ease of use and implementation.

Asynchronous Executions
 An asynchronous execution (or A-execution) is an execution (E, ≺) for which the
causality relation ≺ is a partial order.
 There cannot be any causal relationship between events in an asynchronous execution.
 The messages can be delivered in any order, even non-FIFO.
 Though there is a physical link that delivers the messages sent on it in FIFO order due to the
physical properties of the medium, a logical link may be formed as a composite of physical
links and multiple paths may exist between the two end points of the logical link.
Fig 2.1: a) FIFO executions b) non-FIFO executions

FIFO executions

A FIFO execution is an A-execution in which, for all (s, r) and (s′, r′) ∈ T,
(s ∼ s′ and r ∼ r′ and s ≺ s′) ⟹ r ≺ r′.

 Although the logical link may inherently be non-FIFO, a FIFO logical channel can be
created over a non-FIFO channel by using a separate numbering scheme to sequence the
messages on each logical channel.
 The sender assigns and appends a <sequence_num, connection_id> tuple to each
message.
 The receiver uses a buffer to order the incoming messages as per the sender's
sequence numbers, and accepts only the "next" message in sequence.
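The receiver-side buffering described above can be sketched as follows (a minimal illustration; the class name is assumed):

```python
class FifoReceiver:
    """Sketch of a receiver buffer that delivers messages in sender
    sequence order over a non-FIFO channel (one logical channel)."""

    def __init__(self):
        self.next_seq = 0     # next expected sequence number
        self.buffer = {}      # out-of-order messages keyed by seq number

    def on_arrival(self, seq, msg):
        """Buffer the message; deliver everything that is now in order."""
        self.buffer[seq] = msg
        delivered = []
        while self.next_seq in self.buffer:
            delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return delivered

r = FifoReceiver()
r.on_arrival(1, "m1")        # arrives early: buffered, nothing delivered
r.on_arrival(0, "m0")        # delivers "m0" and then the buffered "m1"
```

A full implementation would keep one such buffer per <connection_id>, matching the tuple the sender appends.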

Causally Ordered (CO) executions

A CO execution is an A-execution in which, for all (s, r) and (s′, r′) ∈ T,
(r ∼ r′ and s ≺ s′) ⟹ r ≺ r′.

Fig: CO Execution
 If two send events s and s′ are related by causality ordering (not physical time ordering),
then a causally ordered execution requires that their corresponding receive events r
and r′ occur in the same order at all common destinations.
Applications of causal order:
Applications that require updates to shared data, implementations of distributed shared
memory, and fair resource allocation in distributed mutual exclusion.

Causal Order (CO) for Implementations:

If send(m1) ≺ send(m2) then for each common destination d of messages m1 and m2,
deliverd(m1) ≺ deliverd(m2) must be satisfied.
Other properties of causal ordering
1. Message Order (MO): An MO execution is an A-execution in which, for all (s, r) and
(s′, r′) ∈ T, s ≺ s′ ⟹ ¬(r′ ≺ r).

Fig: Not a CO execution

2. Empty-Interval (EI) Execution: An execution (E, ≺) is an empty-interval execution if, for
each pair of events (s, r) ∈ T, the open interval set {x ∈ E | s ≺ x ≺ r} in the partial order
is empty.
3. An execution (E, ≺) is CO if and only if for each pair of events (s, r) ∈ T and each event
e ∈ E:
 weak common past: e ≺ r ⟹ ¬(s ≺ e)
 weak common future: s ≺ e ⟹ ¬(e ≺ r)

Synchronous Execution

 When all the communication between pairs of processes uses synchronous send and
receives primitives, the resulting order is the synchronous order.
 The synchronous communication always involves a handshake between the receiver
and the sender, the handshake events may appear to be occurring instantaneously and
atomically.

Causality in a synchronous execution
The synchronous causality relation << on E is the smallest transitive relation that satisfies
the following:
S1: If x occurs before y at the same process, then x << y.
S2: If (s, r) ∈ T, then for all x ∈ E, [(x << s ⟺ x << r) and (s << x ⟺ r << x)].
S3: If x << y and y << z, then x << z.


Synchronous Execution:
A synchronous execution (or S-execution) is an execution (E, <<) for which the causality
relation << is a partial order.

Fig) Execution in an asynchronous system  Fig) Execution in a synchronous system

Timestamping a synchronous execution

An execution (E, ≺) is synchronous if and only if there exists a mapping from E to T (scalar
timestamps) such that
• for any message M, T(s(M)) = T(r(M));
• for each process Pi, if ei ≺ ei′ then T(ei) < T(ei′).

ASYNCHRONOUS EXECUTION WITH SYNCHRONOUS COMMUNICATION
When all the communication between pairs of processes is by using synchronous send
and receive primitives, the resulting order is the synchronous order. If a program written for
an asynchronous system, say a FIFO system, is run with communication done by synchronous
primitives, there is a possibility that the program may deadlock.

Fig) A communication program for an asynchronous system deadlocks when using
synchronous primitives

Fig) Illustrations of asynchronous executions and of crowns: crown of size 2 and crown of size 3
Realizable Synchronous Communication (RSC)
An A-execution that can be realized under synchronous communication is called a realizable
with synchronous communication (RSC) execution.
An execution can be modeled to give a total order that extends the partial order (E, ≺).
In an A-execution, the messages can be made to appear instantaneous if there exists a linear
extension of the execution such that each send event is immediately followed by its
corresponding receive event in this linear extension.
A non-separated linear extension of (E, ≺) is a linear extension of (E, ≺) such that for each
pair (s, r) ∈ T, the interval {x ∈ E | s ≺ x ≺ r} is empty.
An A-execution (E, ≺) is an RSC execution if and only if there exists a non-separated linear
extension of the partial order (E, ≺).
In the non-separated linear extension, if the adjacent send event and its corresponding receive
event are viewed atomically, then that pair of events shares a common past and a common
future with each other.
Crown
Let E be an execution. A crown of size k in E is a sequence <(si, ri), i ∈ {0, …, k−1}> of pairs
of corresponding send and receive events such that: s0 ≺ r1, s1 ≺ r2, …, sk−2 ≺ rk−1, sk−1 ≺ r0.
In the example, the crown is <(s1, r1), (s2, r2)> as we have s1 ≺ r2 and s2 ≺ r1. Cyclic
dependencies may exist in a crown. The crown criterion states that an A-computation is RSC,
i.e., it can be realized on a system with synchronous communication, if and only if it contains no crown.
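The crown criterion can be checked by viewing the messages as a graph with an edge from message i to message j whenever si ≺ rj: the execution is RSC exactly when this graph is acyclic. A sketch (the edge relation is assumed to be given as input):

```python
# Messages are indexed 0..n-1. edges[i] lists the messages j (j != i)
# with s_i ≺ r_j. A crown is exactly a cycle in this graph, so the
# execution is RSC iff the graph is acyclic.
def has_crown(n, edges):
    """Cycle detection by depth-first search over the message graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = [WHITE] * n

    def dfs(u):
        color[u] = GREY
        for v in edges.get(u, []):
            if color[v] == GREY:          # back edge: cycle, i.e. a crown
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u) for u in range(n))

# Two messages with s0 ≺ r1 and s1 ≺ r0 form a crown of size 2:
assert has_crown(2, {0: [1], 1: [0]})
# A single chain s0 ≺ r1 has no crown, so that execution is RSC:
assert not has_crown(2, {0: [1]})
```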
Timestamp criterion for RSC execution
An execution (E, ≺) is RSC if and only if there exists a mapping from E to T (scalar
timestamps) such that
• for any message M, T(s(M)) = T(r(M));
• for each (a, b) in (E × E) \ T, a ≺ b ⟹ T(a) < T(b).

Synchronous programs on asynchronous systems
 A (valid) S-execution can be trivially realized on an asynchronous system by scheduling
the messages in the order in which they appear in the S-execution.
 The partial order of the S-execution remains unchanged, but the communication occurs
on an asynchronous system that uses asynchronous communication primitives.
 Once a message send event is scheduled, the middleware layer waits for an
acknowledgment; after the ack is received, the synchronous send primitive completes.

SYNCHRONOUS PROGRAM ORDER ON AN ASYNCHRONOUS SYSTEM

Non-deterministic programs
The partial ordering of messages in distributed systems means that repeated runs of
the same program will produce the same partial order, thus preserving the deterministic
nature. But sometimes distributed systems exhibit non-determinism:
 A receive call can receive a message from any sender who has sent a message, if the
expected sender is not specified.
 Multiple send and receive calls which are enabled at a process can be executed in an
interchangeable order.
 If i sends to j, and j sends to i concurrently using blocking synchronous calls, a
deadlock results.
 There is no semantic dependency between the send and the immediately following
receive at each of the processes. If the receive call at one of the processes can be
scheduled before the send call, then there is no deadlock.

Rendezvous
Rendezvous systems are a form of synchronous communication among an arbitrary
number of asynchronous processes. All the processes involved meet with each other, i.e.,
communicate synchronously with each other at one time. Two types of rendezvous systems
are possible:
 Binary rendezvous: When two processes agree to synchronize.
 Multi-way rendezvous: When more than two processes agree to synchronize.

Features of binary rendezvous:
 For the receive command, the sender must be specified. However, multiple receive
commands can exist. A type check on the data is implicitly performed.

 Send and receive commands may be individually disabled or enabled. A command is
disabled if it is guarded and the guard evaluates to false. The guard would likely
contain an expression on some local variables.
 Synchronous communication is implemented by scheduling messages under the
covers using asynchronous communication.

 Scheduling involves pairing of matching send and receive commands that are both
enabled. The communication events for the control messages under the covers do not
alter the partial order of the execution.

2.3.2 Binary rendezvous algorithm
If multiple interactions are enabled, a process chooses one of them and tries to
synchronize with the partner process. The problem reduces to one of scheduling messages
satisfying the following constraints:
 Schedule on-line, atomically, and in a distributed manner.
 Schedule in a deadlock-free manner (i.e., crown-free).
 Schedule to satisfy the progress property in addition to the safety property.

Steps in Bagrodia's algorithm
1. Receive commands are forever enabled from all processes.
2. A send command, once enabled, remains enabled until it completes, i.e., it is not
possible that a send command gets disabled before the send is executed.
3. To prevent deadlock, process identifiers are used to introduce asymmetry to break
potential crowns that arise.
4. Each process attempts to schedule only one send event at any time.
The message types used are: M, ack(M), request(M), and permission(M). Execution
events in the synchronous execution are only the send of the message M and the receive of the
message M. The send and receive events for the other message types – ack(M), request(M),
and permission(M) – are control messages. The messages request(M), ack(M), and
permission(M) use M's unique tag; the message M is not included in these messages.

(message types)
M, ack(M), request(M), permission(M)

(1) Pi wants to execute SEND(M) to a lower priority process Pj:
Pi executes send(M) and blocks until it receives ack(M) from Pj. The send event SEND(M)
now completes.
Any M′ message (from a higher priority process) and request(M′) request for
synchronization (from a lower priority process) received during the blocking period are
queued.

(2) Pi wants to execute SEND(M) to a higher priority process Pj:
(2a) Pi seeks permission from Pj by executing send(request(M)).
(2b) While Pi is waiting for permission, it remains unblocked.
(i) If a message M′ arrives from a higher priority process Pk, Pi accepts M′ by scheduling a
RECEIVE(M′) event and then executes send(ack(M′)) to Pk.
(ii) If a request(M′) arrives from a lower priority process Pk, Pi executes
send(permission(M′)) to Pk and blocks waiting for the message M′. When M′ arrives, the
RECEIVE(M′) event is executed.
(2c) When the permission(M) arrives, Pi knows partner Pj is synchronized and Pi executes
send(M). The SEND(M) now completes.

(3) request(M) arrival at Pi from a lower priority process Pj:
At the time a request(M) is processed by Pi, process Pi executes send(permission(M)) to Pj
and blocks waiting for the message M. When M arrives, the RECEIVE(M) event is executed
and the process unblocks.

(4) Message M arrival at Pi from a higher priority process Pj:
At the time a message M is processed by Pi, process Pi executes RECEIVE(M) (which is
assumed to be always enabled) and then send(ack(M)) to Pj.

(5) Processing when Pi is unblocked:
When Pi is unblocked, it dequeues the next (if any) message from the queue and processes it
as a message arrival (as per rules 3 or 4).

Fig 2.5: Bagrodia's Algorithm

GROUP COMMUNICATION
Group communication is done by broadcasting of messages. A message broadcast is
the sending of a message to all members in the distributed system. The communication may
be:
 Multicast: A message is sent to a certain subset or a group.
 Unicast: A point-to-point message communication.

The network layer protocol cannot provide the following functionalities:

 Application-specific ordering semantics on the order of delivery of messages.
 Adapting groups to dynamically changing membership.
 Sending multicasts to an arbitrary set of processes at each send event.
 Providing various fault-tolerance semantics.
 The multicast algorithms can be open or closed group.

CAUSAL ORDER (CO)
In the context of group communication, there are two modes of communication:
causal order and total order. Given a system with FIFO channels, causal order needs to be
explicitly enforced by a protocol. The following two criteria must be met by a causal
ordering protocol:
 Safety: In order to prevent causal order from being violated, a message M that
arrives at a process may need to be buffered until all system-wide messages sent in the
causal past of the send(M) event to that same destination have already arrived. The
arrival of a message is transparent to the application process. The delivery event
corresponds to the receive event in the execution model.
 Liveness: A message that arrives at a process must eventually be delivered to the
process.
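The safety rule can be illustrated with a vector-clock delivery condition in the style of generic causal broadcast protocols (this is a sketch for illustration, not the specific algorithm of this text; all names are assumed):

```python
class CausalReceiver:
    """Sketch of the safety rule: buffer a broadcast message until every
    message in its causal past has been delivered. vc is the sender's
    vector clock of broadcast messages, piggybacked on the message."""

    def __init__(self, n):
        self.delivered = [0] * n     # delivered[j]: messages delivered from Pj
        self.pending = []            # arrived but not yet deliverable

    def deliverable(self, sender, vc):
        # Next message from the sender, and nothing from its causal past missing.
        return (vc[sender] == self.delivered[sender] + 1 and
                all(vc[k] <= self.delivered[k]
                    for k in range(len(vc)) if k != sender))

    def on_arrival(self, sender, vc, msg):
        self.pending.append((sender, vc, msg))
        delivered = []
        progress = True
        while progress:              # liveness: keep draining the buffer
            progress = False
            for item in list(self.pending):
                s, v, m = item
                if self.deliverable(s, v):
                    self.pending.remove(item)
                    self.delivered[s] += 1
                    delivered.append(m)
                    progress = True
        return delivered

r = CausalReceiver(3)
# m2 from P1 causally follows m1 from P0, but arrives first: buffered.
r.on_arrival(1, [1, 1, 0], "m2")     # nothing delivered yet
r.on_arrival(0, [1, 0, 0], "m1")     # delivers "m1" and then "m2"
```

The arrival of m2 is transparent to the application; it is delivered only after m1, which matches the safety criterion above, and the drain loop gives the liveness guarantee.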

The Raynal–Schiper–Toueg algorithm
 Each message M should carry a log of all other messages sent causally before M’s
send event, and sent to the same destination dest(M).
 This log can then be examined to ensure whether it is safe to deliver a message.
 The canonical Raynal–Schiper–Toueg algorithm is representative of several
algorithms that reduce the size of the local space and message space overhead by
various techniques.
 All algorithms aim to reduce this log overhead, and the space and time overhead of
maintaining the log information at the processes.
 To distribute this log information, broadcast and multicast communication is used.
 The hardware-assisted or network layer protocol assisted multicast cannot efficiently
provide the following features:
 Application-specific ordering semantics on the order of delivery of messages.
 Adapting groups to dynamically changing membership.
 Sending multicasts to an arbitrary set of processes at each send event.
 Providing various fault-tolerance semantics.
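As a concrete illustration of the Raynal–Schiper–Toueg idea, the sketch below simulates it in a single Python program. The SENT matrix (SENT[j][k] counts messages Pj has sent to Pk, as far as this process knows) stands in for the per-message log; the class and field names are invented for this example and the delivery rule is a common textbook formulation, not a production implementation.

```python
class Process:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.sent = [[0] * n for _ in range(n)]   # SENT[j][k]: msgs Pj -> Pk
        self.delivered = [0] * n                  # msgs delivered from each peer
        self.buffer = []                          # packets awaiting delivery
        self.log = []                             # delivery order, for inspection

    def send(self, dst, msg):
        snapshot = [row[:] for row in self.sent]  # log summary as of send(M)
        self.sent[self.pid][dst] += 1
        return (self.pid, msg, snapshot)          # the "network packet"

    def receive(self, packet):
        self.buffer.append(packet)
        self._try_deliver()

    def _try_deliver(self):
        progress = True
        while progress:
            progress = False
            for pkt in list(self.buffer):
                src, msg, mat = pkt
                # Safety: every message sent to us causally before M is here.
                if all(self.delivered[k] >= mat[k][self.pid]
                       for k in range(self.n)):
                    self.buffer.remove(pkt)
                    self.delivered[src] += 1
                    for j in range(self.n):       # merge the piggybacked log
                        for k in range(self.n):
                            self.sent[j][k] = max(self.sent[j][k], mat[j][k])
                    self.sent[src][self.pid] = max(self.sent[src][self.pid],
                                                   mat[src][self.pid] + 1)
                    self.log.append(msg)
                    progress = True

procs = [Process(i, 3) for i in range(3)]
m1 = procs[0].send(2, "m1")            # P0 -> P2
m2 = procs[0].send(1, "m2")            # P0 -> P1, causally after send(m1)
procs[1].receive(m2)
m3 = procs[1].send(2, "m3")            # P1 -> P2, causally after m1
procs[2].receive(m3)                   # out of order: buffered, m1 missing
procs[2].receive(m1)
print(procs[2].log)                    # ['m1', 'm3']: causal order preserved
```

Note how m3 is buffered at P2 until m1, which lies in its causal past, has arrived; the piggybacked matrix is exactly the "log" the bullet points above describe, compressed to counts.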

TheKshemKalyani –SinghalOptimalAlgorithm
An optimal CO algorithm stores in local message logs and propagates on messages,
information of the form d is a destination of M about a message M sent in the causal past, as
long as and only as long as:

PropagationConstraintI:it is notknownthatthemessageM isdeliveredto d.

Propagation Constraint II: it is not known that a message has been sent to d in the causal
future of Send(M), and hence it is not guaranteed using a reasoning based on transitivity that
the message M will be delivered to d in CO.

Fig: Conditions for causal ordering

The Propagation Constraints also imply that if either (I) or (II) is false, the information
“d ∈ M.Dests” must not be stored or propagated, even to remember that (I) or (II) has been
falsified:
 not in the causal future of Deliverd(Mi,a)
 not in the causal future of ek,c, where d ∈ Mk,c.Dests and there is no
other message sent causally between Mi,a and Mk,c to the same
destination d.

Information about messages:
(i) not known to be delivered
(ii) not guaranteed to be delivered in CO, is explicitly tracked by the algorithm using (source,
timestamp, destination) information.
Information about messages already delivered and messages guaranteed to be delivered in
CO is implicitly tracked without storing or propagating it, and is derived from the explicit
information. The algorithm for the send and receive operations is given in Fig. 2.7 a) and b).
Procedure SND is executed atomically. Procedure RCV is executed atomically except for a
possible interruption in line 2a where a non-blocking wait is required to meet the Delivery
Condition.
Fig 2.7 a) Send algorithm by Kshemkalyani–Singhal to optimally implement causal
ordering

Fig b) Receive algorithm by Kshemkalyani–Singhal to optimally implement causal ordering

The data structures maintained are sorted row–major and then column–major:

1. Explicit tracking:
 Tracking of (source, timestamp, destination) information for messages (i) not known to be
delivered and (ii) not guaranteed to be delivered in CO, is done explicitly using the l.Dests
field of entries in local logs at nodes and the o.Dests field of entries in messages.
 Sets li,a.Dests and oi,a.Dests contain explicit information of destinations to which Mi,a is
not guaranteed to be delivered in CO and is not known to be delivered.
 The information about d ∈ Mi,a.Dests is propagated up to the earliest events on all causal
paths from (i, a) at which it is known that Mi,a is delivered to d or is guaranteed to be
delivered to d in CO.

2. Implicit tracking:
 Tracking of messages that are either (i) already delivered, or (ii) guaranteed to be
delivered in CO, is performed implicitly.
 The information about messages (i) already delivered or (ii) guaranteed to be delivered
in CO is deleted and not propagated because it is redundant as far as enforcing CO is
concerned.
 It is useful in determining what information that is being carried in other messages and
is being stored in logs at other nodes has become redundant and thus can be purged.
 The semantics are implicitly stored and propagated. This information about messages
that are (i) already delivered or (ii) guaranteed to be delivered in CO is tracked without
explicitly storing it.
 The algorithm derives it from the existing explicit information about messages (i) not
known to be delivered and (ii) not guaranteed to be delivered in CO, by examining
only oi,a.Dests or li,a.Dests, which is a part of the explicit information.

Fig 2.8: Illustration of propagation constraints

Multicasts M5,1 and M4,1
Message M5,1 sent to processes P4 and P6 contains the piggybacked information
M5,1.Dests = {P4, P6}. Additionally, at the send event (5, 1), the information
M5,1.Dests = {P4, P6} is also inserted in the local log Log5. When M5,1 is delivered to P6,
the (new) piggybacked information P4 ∈ M5,1.Dests is stored in Log6 as M5,1.Dests = {P4};
the information about P6 ∈ M5,1.Dests, which was needed for routing, must not be stored in
Log6 because of constraint I. In the same way, when M5,1 is delivered to process P4 at event
(4, 1), only the new piggybacked information P6 ∈ M5,1.Dests is inserted in Log4 as
M5,1.Dests = {P6}, which is later propagated during multicast M4,2.

Multicast M4,3
At event (4, 3), the information P6 ∈ M5,1.Dests in Log4 is propagated on multicast M4,3
only to process P6 to ensure causal delivery using the Delivery Condition. The piggybacked
information on message M4,3 sent to process P3 must not contain this information because of
constraint II. As long as any future message sent to P6 is delivered in causal order w.r.t.
M4,3 sent to P6, it will also be delivered in causal order w.r.t. M5,1. And as M5,1 is already
delivered to P4, the information M5,1.Dests = ∅ is piggybacked on M4,3 sent to P3.
Similarly, the information P6 ∈ M5,1.Dests must be deleted from Log4 as it will no longer be
needed, because of constraint II. M5,1.Dests = ∅ is stored in Log4 to remember that M5,1 has
been delivered or is guaranteed to be delivered in causal order to all its destinations.
Learning implicit information at P2 and P3
When message M4,2 is received by processes P2 and P3, they insert the (new) piggybacked
information in their local logs, as information M5,1.Dests = {P6}. They both continue to store
this in Log2 and Log3 and propagate this information on multicasts until they learn at events
(2, 4) and (3, 2), on receipt of messages M3,3 and M4,3, respectively, that any future message
is expected to be delivered in causal order to process P6, w.r.t. M5,1 sent to P6. Hence by
constraint II, this information must be deleted from Log2 and Log3. The flow of events is
given by:
 When M4,3 with piggybacked information M5,1.Dests = ∅ is received by P3 at (3, 2),
this is inferred to be valid current implicit information about multicast M5,1 because
the log Log3 already contains explicit information P6 ∈ M5,1.Dests about that
multicast. Therefore, the explicit information in Log3 is inferred to be old and must be
deleted to achieve optimality. M5,1.Dests is set to ∅ in Log3.

 The logic by which P2 learns this implicit knowledge on the arrival of M3,3 is
identical.

Processing at P6
 When message M5,1 is delivered to P6, only M5,1.Dests = {P4} is added to Log6.
Further, P6 propagates only M5,1.Dests = {P4} on message M6,2, and this conveys
the current implicit information that M5,1 has been delivered to P6 by its very
absence in the explicit information.
 When the information P6 ∈ M5,1.Dests arrives on M4,3, piggybacked as
M5,1.Dests = {P6}, it is used only to ensure causal delivery of M4,3 using the
Delivery Condition, and is not inserted in Log6 (constraint I). Further, the presence of
M5,1.Dests = {P4} in Log6 implies the implicit information that M5,1 has already
been delivered to P6. Also, the absence of P4 in M5,1.Dests in the explicit
piggybacked information implies the implicit information that M5,1 has been
delivered or is guaranteed to be delivered in causal order to P4, and, therefore,
M5,1.Dests is set to ∅ in Log6.
 When the information P6 ∈ M5,1.Dests arrives on M5,2, piggybacked as
M5,1.Dests = {P4, P6}, it is used only to ensure causal delivery of M5,2 using the
Delivery Condition, and is not inserted in Log6 because Log6 contains
M5,1.Dests = ∅, which gives the implicit information that M5,1 has been delivered or
is guaranteed to be delivered in causal order to both P4 and P6.

Processing at P1
 When M2,2 arrives carrying piggybacked information M5,1.Dests = {P6}, this (new)
information is inserted in Log1.
 When M6,2 arrives with piggybacked information M5,1.Dests = {P4}, P1 learns the
implicit information that M5,1 has been delivered to P6 by the very absence of
explicit information P6 ∈ M5,1.Dests in the piggybacked information, and hence
marks the information P6 ∈ M5,1.Dests for deletion from Log1.
 The information “P6 ∈ M5,1.Dests” piggybacked on M2,3, which arrives at P1, is
inferred to be outdated using the implicit knowledge derived from M5,1.Dests = ∅
in Log1.
TOTAL ORDER

For each pair of processes Pi and Pj and for each pair of messages Mx and My that are
delivered to both the processes, Pi is delivered Mx before My if and only if Pj is delivered
Mx before My.

Centralized algorithm for total ordering
 Each process sends the message it wants to broadcast to a centralized process, which
relays all the messages it receives to every other process over FIFO channels.

Complexity: Each message transmission takes two message hops and exactly n messages in a
system of n processes.

Drawbacks: A centralized algorithm has a single point of failure and congestion, and is
not an elegant solution.
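The centralized scheme can be sketched in a few lines of Python. The Sequencer name and channel representation are invented for this example; channels are modelled as in-memory FIFO queues.

```python
from collections import deque

class Sequencer:
    """Single relay process: the serialization point that yields total order."""
    def __init__(self, members):
        self.channels = {m: deque() for m in members}  # FIFO channel per member

    def broadcast(self, sender, msg):
        # Hop 1: sender -> sequencer (modelled as this call).
        # Hop 2: sequencer relays to every member: n messages in total.
        for chan in self.channels.values():
            chan.append((sender, msg))

members = ["P1", "P2", "P3"]
seq = Sequencer(members)
seq.broadcast("P1", "a")
seq.broadcast("P3", "b")
# Every member's FIFO channel holds the identical sequence.
print(list(seq.channels["P2"]))   # [('P1', 'a'), ('P3', 'b')]
```

Because all broadcasts funnel through one process and the relay channels are FIFO, every member observes the same delivery order; the same single point is also the failure and congestion bottleneck noted above.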

Three phase distributed algorithm

Three phases can be seen on both the sender and receiver sides.

Sender

Phase 1
 In the first phase, a process multicasts the message M with a locally unique tag and the
local timestamp to the group members.

Phase 2
 The sender process awaits a reply from all the group members, who respond with a
tentative proposal for a revised timestamp for that message M.
 The await call is non-blocking.

Phase 3
 The process multicasts the final timestamp to the group.

Fig) Sender side of three phase distributed algorithm

Receiver Side
Phase 1
 The receiver receives the message with a tentative timestamp. It updates the variable
priority that tracks the highest proposed timestamp, then revises the proposed
timestamp to the priority, and places the message with its tag and the revised
timestamp at the tail of the queue temp_Q. In the queue, the entry is marked as
undeliverable.

Phase 2
 The receiver sends the revised timestamp back to the sender. The receiver then waits in a
non-blocking manner for the final timestamp.

Phase 3
 The final timestamp is received from the multicaster. The corresponding message
entry in temp_Q is identified using the tag, and is marked as deliverable after the
revised timestamp is overwritten by the final timestamp.
 The queue is then resorted using the timestamp field of the entries as the key. As the
queue is already sorted except for the modified entry for the message under
consideration, that message entry has to be placed in its sorted position in the queue.
 If the message entry is at the head of temp_Q, that entry, and all consecutive
subsequent entries that are also marked as deliverable, are dequeued from temp_Q and
enqueued in deliver_Q.

Complexity
This algorithm uses three phases, and, to send a message to n − 1 processes, it uses 3(n − 1)
messages and incurs a delay of three message hops.
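The sender and receiver phases above can be sketched as a single-address-space simulation. This is a hedged sketch, not a production implementation: the revision rule priority = max(priority + 1, ts) is an assumption chosen to match the behavior in the worked example that follows, and all names are invented for this illustration.

```python
class Receiver:
    def __init__(self):
        self.priority = 0        # highest timestamp proposed so far
        self.temp_q = []         # entries: [timestamp, tag, deliverable]
        self.deliver_q = []

    def on_revise_ts(self, tag, ts):
        # Phases 1-2: propose a revised timestamp no smaller than any seen.
        self.priority = max(self.priority + 1, ts)
        self.temp_q.append([self.priority, tag, False])
        return self.priority                       # the PROPOSED_TS reply

    def on_final_ts(self, tag, final):
        # Phase 3: overwrite with the final timestamp, mark deliverable,
        # resort, and dequeue the deliverable prefix into deliver_q.
        for entry in self.temp_q:
            if entry[1] == tag:
                entry[0], entry[2] = final, True
        self.temp_q.sort()
        while self.temp_q and self.temp_q[0][2]:
            self.deliver_q.append(self.temp_q.pop(0)[1])

c, d = Receiver(), Receiver()
pa = [c.on_revise_ts("A", 7), d.on_revise_ts("A", 7)]   # phase 1 of A
pb = [d.on_revise_ts("B", 9), c.on_revise_ts("B", 9)]   # phase 1 of B
fa, fb = max(pa), max(pb)          # each sender's phase-3 final timestamp
for r in (c, d):                   # B's FINAL_TS happens to arrive first
    r.on_final_ts("B", fb)
for r in (c, d):
    r.on_final_ts("A", fa)
print(c.deliver_q, d.deliver_q)    # identical order at both: ['A', 'B']
```

Even though B's final timestamp arrives first, A (with the smaller final timestamp) still blocks B at the head of temp_Q until its own FINAL_TS arrives, so both receivers deliver in the same total order.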
Example
An example execution to illustrate the algorithm is given in Figure 6.14. Here, A and B
multicast to a set of destinations and C and D are the common destinations for both
multicasts.
Figure (a) The main sequence of steps is as follows:
1. A sends a REVISE_TS(7) message, having timestamp 7. B sends a REVISE_TS(9)
message, having timestamp 9.
2. C receives A’s REVISE_TS(7), enters the corresponding message in temp_Q, and marks it
as undeliverable; priority = 7. C then sends PROPOSED_TS(7) message to A.
3. D receives B’s REVISE_TS(9), enters the corresponding message in temp_Q, and marks it
as undeliverable; priority = 9. D then sends PROPOSED_TS(9) message to B.
4. C receives B’s REVISE_TS(9), enters the corresponding message in temp_Q, and marks it
as undeliverable; priority = 9. C then sends PROPOSED_TS(9) message to B.
5. D receives A’s REVISE_TS(7), enters the corresponding message in temp_Q, and marks it
as undeliverable; priority = 10. D assigns a tentative timestamp value of 10, which is greater
than all of the timestamps on REVISE_TSs seen so far, and then sends
PROPOSED_TS(10) message to A.
The state of the system is as shown in the figure.

Fig) An example to illustrate the three-phase total ordering algorithm. (a) A snapshot for
PROPOSED_TS and REVISE_TS messages. The dashed lines show the further execution
after the snapshot. (b) The FINAL_TS messages in the example.
Figure (b) The continuing sequence of main steps is as follows:
6. When A receives PROPOSED_TS(7) from C and PROPOSED_TS(10) from D, it
computes the final timestamp as max(7, 10) = 10, and sends FINAL_TS(10) to C and D.
7. When B receives PROPOSED_TS(9) from C and PROPOSED_TS(9) from D, it
computes the final timestamp as max(9, 9) = 9, and sends FINAL_TS(9) to C and D.
8. C receives FINAL_TS(10) from A, updates the corresponding entry in temp_Q with the
timestamp, resorts the queue, and marks the message as deliverable. As the message is not
at the head of the queue, and some entry ahead of it is still undeliverable, the message is not
moved to delivery_Q.
9. D receives FINAL_TS(9) from B, updates the corresponding entry in temp_Q by
marking the corresponding message as deliverable, and resorts the queue. As the message is
at the head of the queue, it is moved to delivery_Q. This is the system snapshot shown in
Figure (b).
The following further steps will occur:
10. When C receives FINAL_TS(9) from B, it will update the corresponding entry in
temp_Q by marking the corresponding message as deliverable. As the message is at the
head of the queue, it is moved to the delivery_Q, and the next message (of A), which is also
deliverable, is also moved to the delivery_Q.
11. When D receives FINAL_TS(10) from A, it will update the corresponding entry in
temp_Q by marking the corresponding message as deliverable. As the message is at the
head of the queue, it is moved to the delivery_Q.

GLOBALSTATEANDSNAPSHOTRECORDINGALGORITHMS
 A distributed computing system consists of processes that do not share a common
memory and communicate asynchronously with each other by message passing.
 Each component has a local state. The state of a process is its local memory and
a history of its activity.
 The state of a channel is characterized by the set of messages sent along the channel
less the messages received along the channel. The global state of a distributed system
is a collection of the local states of its components.
 If shared memory were available, an up-to-date state of the entire system would be
available to the processes sharing the memory.
 The absence of shared memory necessitates ways of getting a coherent and complete
view of the system based on the local states of individual processes.
 A meaningful global snapshot can be obtained if the components of the distributed
system record their local states at the same time.
 This would be possible if the local clocks at processes were perfectly synchronized or
if there were a global system clock that could be instantaneously read by the processes.
 If processes read time from a single common clock, various indeterminate
transmission delays during the read operation will cause the processes to identify
various physical instants as the same time.

SystemModel
 The system consists of a collection of n processes, p1, p2, …, pn, that are connected by
channels.
 Let Cij denote the channel from process pi to process pj.

Processes and channels have states associated with them.

The state of a process at any time is defined by the contents of processor registers,
stacks, local memory, etc., and may be highly dependent on the local context of the
distributed application.
 The state of channel Cij, denoted by SCij, is given by the set of messages in transit in
the channel.
 The events that may happen are: internal events, send (send(mij)) and receive
(rec(mij)) events.
 The occurrence of events causes changes in the process state.
 A channel is a distributed entity and its state depends on the local states of the
processes on which it is incident.
 The transit function records the state of the channel Cij:
transit(LSi, LSj) = { mij | send(mij) ∈ LSi and rec(mij) ∉ LSj }
 In the FIFO model, each channel acts as a first-in first-out message queue and,
thus, message ordering is preserved by a channel.
 In the non-FIFO model, a channel acts like a set in which the sender process adds
messages and the receiver process removes messages from it in a random order.

A consistent global state
The global state of a distributed system is a collection of the local states of the processes and
the channels. The global state is given by:

GS = { ∪i LSi, ∪i,j SCij }

The two conditions for a consistent global state are:

C1: send(mij) ∈ LSi ⇒ mij ∈ SCij ⊕ rec(mij) ∈ LSj (⊕ is the exclusive OR)
C2: send(mij) ∉ LSi ⇒ mij ∉ SCij ∧ rec(mij) ∉ LSj

Condition C1 preserves the law of conservation of messages. Condition C2 states that in the
collected global state, for every effect, its cause must be present.

Law of conservation of messages: Every message mij that is recorded as sent in the local
state of a process pi must be captured in the state of the channel Cij or in the collected local
state of the receiver process pj.
 In a consistent global state, every message that is recorded as received is also recorded
as sent. Such a global state captures the notion of causality that a message cannot be
received if it was not sent.
 Consistent global states are meaningful global states and inconsistent global states are
not meaningful in the sense that a distributed system can never be in an inconsistent
state.

Interpretationof cuts

Cuts in a space–time diagram provide a powerful graphical aid in representing and
reasoning about the global states of a computation. A cut is a line joining an arbitrary
point on each process line that slices the space–time diagram into a PAST and a
FUTURE.

A consistent global state corresponds to a cut in which every message received in the
PAST of the cut has been sent in the PAST of that cut. Such a cut is known as a
consistent cut.

In a consistent snapshot, all the recorded local states of processes are concurrent; that
is, the recorded local state of no process causally affects the recorded local state of
any other process.
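The consistent-cut definition can be checked mechanically. In the hedged sketch below, a cut is represented as a map from each process to the index of its last PAST event, and messages as (sender, send index, receiver, receive index) tuples; this representation is chosen just for the example.

```python
def consistent(cut, messages):
    """A cut is consistent iff no message received in its PAST
    was sent in its FUTURE (effect without cause)."""
    for snd, s_idx, rcv, r_idx in messages:
        received_in_past = r_idx <= cut[rcv]
        sent_in_past = s_idx <= cut[snd]
        if received_in_past and not sent_in_past:
            return False    # effect recorded without its cause
    return True

# One message: p1's 2nd event sends to p2, received at p2's 1st event.
msgs = [("p1", 2, "p2", 1)]
print(consistent({"p1": 2, "p2": 1}, msgs))   # True: send and receive in PAST
print(consistent({"p1": 1, "p2": 1}, msgs))   # False: receive in PAST, send in FUTURE
```

The second cut slices the space-time diagram so that the receive event lies in the PAST while the send lies in the FUTURE, which is exactly the inconsistency the text describes.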

Issuesin recordingglobalstate
The non-availability of a global clock in a distributed system raises the following issues:
Issue 1:
How to distinguish between the messages to be recorded in the snapshot from those not
to be recorded?
Answer:

Any message that is sent by a process before recording its snapshot must be recorded
in the global snapshot (from C1).

Any message that is sent by a process after recording its snapshot must not be
recorded in the global snapshot (from C2).

Issue 2:
How to determine the instant when a process takes its snapshot?
Answer:
A process pj must record its snapshot before processing a message mij that was sent by
process pi after recording its snapshot.

SNAPSHOTALGORITHMSFORFIFOCHANNELS
Each distributed application has a number of processes running on different physical servers.
These processes communicate with each other through messaging channels.

A snapshot captures the local states of each process along with the state of each
communication channel.

Snapshots are required for:

Checkpointing

Collecting garbage

Detecting deadlocks

Debugging

Chandy–Lamport algorithm

The algorithm records a global snapshot consisting of the state of each process and each channel.

The Chandy–Lamport algorithm uses a control message, called a marker.

After a site has recorded its snapshot, it sends a marker along all of its outgoing channels
before sending out any more messages.

Since channels are FIFO, a marker separates the messages in the channel into those to be
included in the snapshot from those not to be recorded in the snapshot.

This addresses issue I1. The role of markers in a FIFO system is to act as delimiters for the
messages in the channels so that the channel state recorded by the process at the receiving
end of the channel satisfies the condition C2.

Fig 2.10: Chandy–Lamport algorithm

Initiating a snapshot
 Process Pi initiates the snapshot.
 Pi records its own state and prepares a special marker message.
 Send the marker message to all other processes.
 Start recording all incoming messages from channels Cji for j not equal to i.

Propagating a snapshot
 For all processes Pj, consider a message on channel Ckj.
 If the marker message is seen for the first time:
 Pj records its own state and marks Ckj as empty.
 Send the marker message to all other processes.
 Record all incoming messages from channels Clj for l not equal to j or k.
 Else add all messages from inbound channels to the recorded channel state.

Terminating a snapshot
 All processes have received a marker.
 All processes have received a marker on all the N − 1 incoming channels.
 A central server can gather the partial states to build a global snapshot.
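A minimal, single-threaded Python sketch of the marker rules above, using a banking scenario like the one in the next subsection. The class and field names are invented for this illustration, and "delivery" is simulated by draining in-memory FIFO deques rather than real channels.

```python
from collections import deque

MARKER = "MARKER"

class Snapshotter:
    def __init__(self, pid, peers, balance):
        self.pid, self.peers, self.state = pid, peers, balance
        self.recorded_state = None
        self.channel_state = {}
        self.recording = set()

    def initiate(self, network):
        self.recorded_state = self.state                 # record own state first
        self.channel_state = {q: [] for q in self.peers}
        self.recording = set(self.peers)                 # record all incoming channels
        for q in self.peers:                             # marker before any new messages
            network[(self.pid, q)].append(MARKER)

    def on_message(self, src, msg, network):
        if msg == MARKER:
            if self.recorded_state is None:              # first marker seen:
                self.initiate(network)                   # record state, relay markers
            self.recording.discard(src)                  # channel src -> me is done
        else:
            if self.recorded_state is not None and src in self.recording:
                self.channel_state[src].append(msg)      # in-transit message
            self.state += msg                            # normal processing continues

network = {("p1", "p2"): deque(), ("p2", "p1"): deque()}
p1, p2 = Snapshotter("p1", ["p2"], 550), Snapshotter("p2", ["p1"], 200)
p2.state -= 30; network[("p2", "p1")].append(30)   # $30 in transit p2 -> p1
p1.initiate(network)
for (src, dst), chan in list(network.items()):     # drain FIFO channels in order
    proc = p1 if dst == "p1" else p2
    while chan:
        proc.on_message(src, chan.popleft(), network)
total = p1.recorded_state + p2.recorded_state + sum(p1.channel_state["p2"])
print(total)   # 750: the recorded global state conserves the money
```

The $30 sent before p1's marker is captured as the state of channel C21, so the recorded snapshot (550 + 170 + 30) conserves the initial $750 even though neither process was ever stopped.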

Correctness of the algorithm

Since a process records its snapshot when it receives the first marker on any incoming
channel, no messages that follow markers on the channels incoming to it are recorded in
the process’s snapshot.

A process stops recording the state of an incoming channel when a marker is received
on that channel.

Due to the FIFO property of channels, it follows that no message sent after the marker on that
channel is recorded in the channel state. Thus, condition C2 is satisfied.
 When a process pj receives message mij that precedes the marker on channel Cij, it acts
as follows: if process pj has not taken its snapshot yet, then it includes mij in its recorded
snapshot. Otherwise, it records mij in the state of the channel Cij. Thus, condition C1 is
satisfied.

Complexity
The recording part of a single instance of the algorithm requires O(e) messages
and O(d) time, where e is the number of edges in the network and d is the diameter of
the network.

Propertiesoftherecordedglobalstate

Fig) Timing diagram of two possible executions of the banking example


1. (Markers shown using dashed-and-dotted arrows.) Let site S1 initiate the algorithm just after t1.
Site S1 records its local state (account A = $550) and sends a marker to site S2. The marker is
received by site S2 after t4. When site S2 receives the marker, it records its local state
(account B = $170), the state of channel C12 as $0, and sends a marker along channel C21.
When site S1 receives this marker, it records the state of channel C21 as $80. The $800 amount
in the system is conserved in the recorded global state:
A = $550, B = $170, C12 = $0, C21 = $80

2. (Markers shown using dotted arrows.) Let site S1 initiate the algorithm just after t0 and before
sending the $50 for S2. Site S1 records its local state (account A = $600) and sends a marker to
S2. The marker is received by site S2 between t2 and t3. When site S2 receives the marker, it
records its local state (account B = $120), the state of channel C12 as $0, and sends a marker
along channel C21. When site S1 receives this marker, it records the state of channel C21 as $80.
The $800 amount in the system is conserved in the recorded global state:
A = $600, B = $120, C12 = $0, C21 = $80

The recorded global state may not correspond to any of the global states that occurred during
the computation.
This happens because a process can change its state asynchronously before the markers it sent
are received by other sites and the other sites record their states.
But the system could have passed through the recorded global states in some equivalent
executions.
The recorded global state is a valid state in an equivalent execution, and if a stable property
(i.e., a property that persists) holds in the system before the snapshot algorithm begins, it holds
in the recorded global snapshot.
Therefore, a recorded global state is useful in detecting stable properties.

UNIT III

DISTRIBUTED MUTEX AND DEADLOCK



Distributed mutual exclusion algorithms: Introduction – Preliminaries – Lamport's algorithm –
Ricart–Agrawala algorithm – Token-based algorithms – Suzuki–Kasami's broadcast
algorithm; Deadlock detection in distributed systems: Introduction – System model –
Preliminaries – Models of deadlocks – Chandy–Misra–Haas algorithms for the AND model
and OR model.

DISTRIBUTED MUTUAL EXCLUSION ALGORITHMS
 Mutual exclusion is a concurrency control property which is introduced to prevent
race conditions.
 It is the requirement that a process cannot access a shared resource while another
concurrent process is currently present or executing the same resource.

Mutual exclusion in a distributed system states that only one process is allowed to execute
the critical section (CS) at any given time.
 Message passing is the sole means for implementing distributed mutual exclusion.
There are three basic approaches for implementing distributed mutual exclusion:
1. Token-based approach:
 A unique token (also known as the privilege message) is shared among the sites.
 A site is allowed to enter its CS if it possesses the token.
 Mutual exclusion is ensured because the token is unique.
 Eg: Suzuki–Kasami's broadcast algorithm, Raymond's tree-based algorithm,
etc.
2. Non-token-based approach:
 Two or more successive rounds of messages are exchanged among the sites to
determine which site will enter the CS next.
 Eg: Lamport's algorithm, Ricart–Agrawala algorithm

3. Quorum-based approach:

 Each site requests permission to execute the CS from a subset of sites (called a
quorum).
 Any two subsets of sites or quorums contain a common site.
 This common site is responsible to make sure that only one request executes the
CS at any time.
 Eg: Maekawa's algorithm

Preliminaries
 The system consists of N sites, S1, S2, S3, …, SN.
 Assume that a single process is running on each site.
 The process at site Si is denoted by pi.
 All these processes communicate asynchronously over an underlying
communication network.
 A site can be in one of the following three states: requesting the CS, executing the CS,
or neither requesting nor executing the CS.
 In the requesting the CS state, the site is blocked and cannot make further requests for
the CS.
 In the idle state, the site is executing outside the CS.
 In the token-based algorithms, a site can also be in a state where a site holding the
token is executing outside the CS. Such a state is referred to as the idle token state.
 At any instant, a site may have several pending requests for the CS. A site queues up
these requests and serves them one at a time.
 N denotes the number of processes or sites involved in invoking the critical section, T
denotes the average message delay, and E denotes the average critical section
execution time.

Requirements of mutual exclusion algorithms

Safety property:
At any instant, only one process can execute the critical section. This is an
essential property of a mutual exclusion algorithm.

Liveness property:
This property states the absence of deadlock and starvation. Two or more sites
should not endlessly wait for messages that will never arrive. This is an
important property of a mutual exclusion algorithm.

Fairness:
Fairness in the context of mutual exclusion means that each process gets a fair
chance to execute the CS. In mutual exclusion algorithms, the fairness property
generally means that the CS execution requests are executed in order of their arrival
in the system.

Performance metrics

 Message complexity: This is the number of messages that are required per CS
execution by a site.
 Synchronization delay: After a site leaves the CS, it is the time required before
the next site enters the CS.
 Response time: This is the time interval a request waits for its CS execution to be
over after its request messages have been sent out. Thus, response time does not
include the time a request waits at a site before its request messages have been sent out.
 System throughput: This is the rate at which the system executes requests for the
CS. If SD is the synchronization delay and E is the average critical section
execution time, the throughput is given by 1/(SD + E).

Figure: Synchronization delay

Figure: Response Time
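A quick arithmetic check of these metrics, under hypothetical values for T, E and SD (the units are arbitrary, e.g. milliseconds):

```python
# T = average message delay, E = average CS execution time,
# SD = synchronization delay (all values assumed for illustration).
T, E, SD = 2.0, 5.0, 2.0

best_response_time = 2 * T + E    # round-trip request/reply delay plus CS time
throughput = 1 / (SD + E)         # CS requests completed per unit time

print(best_response_time, round(throughput, 3))   # 9.0 0.143
```

The 2T term is the round trip of the request and its replies; lowering the synchronization delay SD directly raises throughput, which is why SD is a key comparison point between algorithms.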


Low and high load performance:
 The performance of mutual exclusion algorithms is classified under two special loading
conditions, viz., "low load" and "high load".
 The load is determined by the arrival rate of CS execution requests.
 Under low load conditions, there is seldom more than one request for the critical
section present in the system simultaneously.
 Under heavy load conditions, there is always a pending request for the critical section
at a site.

Best and worst case performance
 In the best case, prevailing conditions are such that a performance metric attains the
best possible value. For example, the best value of the response time is a round-trip
message delay plus the CS execution time, 2T + E.
 For example, the best and worst values of the response time are achieved when load
is, respectively, low and high;
 the best and the worst message traffic is generated at low and heavy load conditions,
respectively.
LAMPORT'S ALGORITHM
 Requests for CS are executed in the increasing order of timestamps and time is
determined by logical clocks.
 Every site Si keeps a queue, request_queuei, which contains mutual exclusion requests
ordered by their timestamps.
 This algorithm requires communication channels to deliver messages in FIFO
order. Three types of messages are used: REQUEST, REPLY and RELEASE. These
messages with timestamps also update the logical clock.

Fig: Lamport's distributed mutual exclusion algorithm
To enter the critical section:
When a site Si wants to enter the critical section, it sends a request message
REQUEST(tsi, i) to all other sites and places the request on request_queuei. Here, tsi
denotes the timestamp of site Si.
When a site Sj receives the request message REQUEST(tsi, i) from site Si, it returns a
timestamped REPLY message to site Si and places the request of site Si on
request_queuej.

To execute the critical section:
 A site Si can enter the critical section if it has received a message with timestamp
larger than (tsi, i) from all other sites and its own request is at the top of
request_queuei.

To release the critical section:
When a site Si exits the critical section, it removes its own request from the top of its request
queue and sends a timestamped RELEASE message to all other sites. When a site Sj receives
the timestamped RELEASE message from site Si, it removes the request of Si from its request
queue.
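The three rules above can be sketched as a single-address-space simulation. This is a hedged sketch with invented names: message passing is modelled as direct method calls, and condition L2 (a larger-timestamp message received from every other site) is simplified to a count of REPLYs.

```python
import heapq

class Site:
    """Simplified simulation of one site in Lamport's DME algorithm."""
    def __init__(self, sid, n):
        self.sid, self.n = sid, n
        self.clock = 0
        self.queue = []              # heap of (timestamp, site_id) requests
        self.replies = set()         # sites that answered our pending request

    def tick(self, other=0):
        self.clock = max(self.clock, other) + 1

    def request(self, sites):
        self.tick()
        ts = (self.clock, self.sid)
        heapq.heappush(self.queue, ts)       # place own request in the queue
        self.replies = set()
        for s in sites:                      # REQUEST to all other sites
            if s is not self:
                s.on_request(ts, self)

    def on_request(self, ts, src):
        self.tick(ts[0])
        heapq.heappush(self.queue, ts)       # queue the remote request
        self.tick()
        src.replies.add(self.sid)            # timestamped REPLY back

    def can_enter(self):
        # L1: own request at the head; L2: heard from every other site.
        return (self.queue and self.queue[0][1] == self.sid
                and len(self.replies) == self.n - 1)

    def release(self, sites):
        heapq.heappop(self.queue)            # remove own request from the top
        for s in sites:                      # RELEASE to all other sites
            if s is not self:
                s.queue = [t for t in s.queue if t[1] != self.sid]
                heapq.heapify(s.queue)

sites = [Site(i, 2) for i in range(2)]
s0, s1 = sites
s0.request(sites); s1.request(sites)
print(s0.can_enter(), s1.can_enter())  # only the earlier timestamp may enter
s0.release(sites)
print(s1.can_enter())                  # now S1's request is at the head
```

Ties on the timestamp are broken by the site id in the tuple comparison, so the total order on requests that the correctness proof relies on is well defined.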
Fig) S1 and S2 are making requests for the CS

Fig) Site S1 enters the CS

Fig) Site S1 exits the CS and sends RELEASE messages

Fig) Site S2 enters the CS

Correctness
Theorem: Lamport's algorithm achieves mutual exclusion.
Proof: Proof is by contradiction.
Suppose two sites Si and Sj are executing the CS concurrently. For this to happen
conditions L1 and L2 must hold at both the sites concurrently.
This implies that at some instant in time, say t, both Si and Sj have their own requests at the
top of their request queues and condition L1 holds at them. Without loss of generality,
assume that Si's request has a smaller timestamp than the request of Sj.
From condition L1 and the FIFO property of the communication channels, it is clear that
at instant t the request of Si must be present in request_queuej when Sj was executing its
CS. This implies that Sj's own request is at the top of its own request_queuej when a
smaller timestamp request, Si's request, is present in the request_queuej – a
contradiction!

Theorem: Lamport's algorithm is fair.
Proof: The proof is by contradiction.
Suppose a site Si's request has a smaller timestamp than the request of another site Sj and
Sj is able to execute the CS before Si.
For Sj to execute the CS, it has to satisfy the conditions L1 and L2. This implies that at
some instant in time, say t, Sj has its own request at the top of its queue and it has also
received a message with timestamp larger than the timestamp of its request from all
other sites.
But the request queue at a site is ordered by timestamp, and according to our assumption
Si has the lower timestamp. So Si's request must be placed ahead of Sj's request in
request_queuej. This is a contradiction!

Message complexity:
Lamport's algorithm requires invocation of 3(N – 1) messages per critical section execution.
These 3(N – 1) messages involve
 (N – 1) request messages
 (N – 1) reply messages
 (N – 1) release messages
Drawbacks of Lamport's algorithm:
 Unreliable approach: failure of any one of the processes will halt the progress of
the entire system.
 High message complexity: the algorithm requires 3(N – 1) messages per critical
section invocation.

ToenterCriticalsection:
 When a site Si wants to enter the critical section, it send a timestamped
REQUESTmessage to all other sites.
 When a site Sj receives a REQUEST message from site S i, It sends a REPLY
messageto site Si if and only if Site Sj is neither requesting nor currently executing
the critical section.

 In case Site Sj is requesting, the timestamp of Site Si‘s request is smaller than its
ownrequest.
 Otherwisetherequestisdeferredbysite Sj.

To execute the critical section:
 Site Si enters the critical section if it has received a REPLY message from all other
sites.

To release the critical section:
 Upon exiting the critical section, site Si sends a REPLY message to all the deferred requests.

Performance:
Synchronization delay is equal to the maximum message transmission time. The algorithm requires
3(N – 1) messages per CS execution, and can be optimized to 2(N – 1) messages by
omitting the RELEASE message in some situations.

RICART–AGRAWALA ALGORITHM
 The Ricart–Agrawala algorithm is an algorithm for mutual exclusion in a
distributed system proposed by Glenn Ricart and Ashok Agrawala.
 This algorithm is an extension and optimization of Lamport's distributed
mutual exclusion algorithm.
 It follows a permission-based approach to ensure mutual exclusion.
 Two types of messages (REQUEST and REPLY) are used, and communication channels are
assumed to follow FIFO order.
 A site sends a REQUEST message to all other sites to get their permission to
enter the critical section.
 A site sends a REPLY message to another site to give its permission to enter the
critical section.
 A timestamp is given to each critical section request using Lamport's logical clock.
 The timestamp is used to determine the priority of critical section requests:
a smaller timestamp gets higher priority over a larger timestamp.
 The execution of critical section requests is always in the order of their timestamps.

Fig: Ricart–Agrawala algorithm

Fig a) Sites S1 and S2 each make a request for the CS
Fig b) Site S1 enters the CS
Fig c) Site S1 exits the CS and sends a reply message to S2's deferred request
Fig d) Site S2 enters the CS
Theorem: The Ricart–Agrawala algorithm achieves mutual exclusion.
Proof: The proof is by contradiction.
 Suppose two sites Si and Sj are executing the CS concurrently and Si's request has
higher priority than the request of Sj. Clearly, Si received Sj's request after it had
made its own request.
 Thus, Sj can concurrently execute the CS with Si only if Si returns a REPLY to Sj (in
response to Sj's request) before Si exits the CS.
 However, this is impossible because Sj's request has lower priority, so Si defers the
reply until it exits the CS. Therefore, the Ricart–Agrawala algorithm achieves mutual exclusion.

Message Complexity:
The Ricart–Agrawala algorithm requires invocation of 2(N – 1) messages per critical section
execution. These 2(N – 1) messages involve:
 (N – 1) REQUEST messages
 (N – 1) REPLY messages

Drawbacks of the Ricart–Agrawala algorithm:
 Unreliable approach: failure of any one node in the system can halt the progress
of the system. In this situation, the process will starve forever. The problem of
node failure can be solved by detecting the failure after some timeout.

Performance:
Synchronization delay is equal to the maximum message transmission time. The algorithm requires
2(N – 1) messages per critical section execution.
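The deferral rule described above can be sketched as follows (an illustrative Python sketch, not from the notes: the class and message names and the `send` callback are assumptions, and delivery of messages between sites is left to the caller):

```python
class RicartAgrawalaSite:
    """One site's decision logic in the Ricart-Agrawala algorithm (sketch)."""

    def __init__(self, site_id, n_sites):
        self.id = site_id
        self.n = n_sites
        self.clock = 0
        self.requesting = False
        self.my_request = None   # (timestamp, site id) of outstanding request
        self.replies = set()
        self.deferred = []       # sites whose REPLY we postponed

    def request_cs(self, send):
        self.clock += 1
        self.requesting = True
        self.my_request = (self.clock, self.id)
        self.replies.clear()
        for p in range(self.n):          # (N - 1) REQUEST messages
            if p != self.id:
                send(p, ("REQUEST", self.clock, self.id))

    def on_request(self, ts, sender, send):
        self.clock = max(self.clock, ts) + 1
        # Reply at once unless we hold an older (higher-priority) request;
        # ties are broken by site id, as with Lamport timestamps.
        if self.requesting and self.my_request < (ts, sender):
            self.deferred.append(sender)   # defer: our request wins
        else:
            send(sender, ("REPLY", self.id))

    def on_reply(self, sender):
        self.replies.add(sender)

    def can_enter_cs(self):
        return self.requesting and len(self.replies) == self.n - 1

    def release_cs(self, send):
        self.requesting = False
        for p in self.deferred:            # answer deferred requests on exit
            send(p, ("REPLY", self.id))
        self.deferred.clear()
```

Only REQUEST and REPLY messages appear, matching the 2(N – 1) message count above.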

SUZUKI–KASAMI's BROADCAST ALGORITHM
 The Suzuki–Kasami algorithm is a token-based algorithm for achieving mutual
exclusion in distributed systems.
 It is a modification of the Ricart–Agrawala algorithm, a permission-based (non-token-based)
algorithm which uses REQUEST and REPLY messages to ensure mutual exclusion.
 In token-based algorithms, a site is allowed to enter its critical section if it possesses
the unique token.
 Non-token-based algorithms use timestamps to order requests for the critical section,
whereas a sequence number is used in token-based algorithms.
 Each request for the critical section contains a sequence number. This sequence number is
used to distinguish old and current requests.
To enter the critical section:
 When a site Si wants to enter the critical section and it does not have the token, it
increments its sequence number RNi[i] and sends a request message REQUEST(i, sn)
to all other sites in order to request the token.
 Here sn is the updated value of RNi[i].
 When a site Sj receives the request message REQUEST(i, sn) from site Si, it
sets RNj[i] to the maximum of RNj[i] and sn, i.e., RNj[i] = max(RNj[i], sn).
After updating RNj[i], site Sj sends the token to site Si if it has the token and RNj[i]
= LN[i] + 1.

Fig: Suzuki–Kasami's broadcast algorithm

To execute the critical section:
 Site Si executes the critical section if it has acquired the token.

To release the critical section:
After finishing the execution, site Si exits the critical section and does the following:
 sets LN[i] = RNi[i] to indicate that its critical section request RNi[i] has been executed;
 for every site Sj whose ID is not present in the token queue Q, it appends that ID to Q
if RNi[j] = LN[j] + 1, to indicate that site Sj has an outstanding request;
 after the above update, if the queue Q is non-empty, it pops a site ID from Q and sends the
token to the site indicated by the popped ID;
 if the queue Q is empty, it keeps the token.
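The RN/LN bookkeeping above can be sketched as follows (an illustrative Python sketch, not from the notes: the token is modelled as the pair (LN, Q), names are assumptions, and message delivery is left to a `send` callback; a real site would also refuse to pass the token while it is inside the CS):

```python
from collections import deque

class SuzukiKasamiSite:
    """RN/LN bookkeeping of the Suzuki-Kasami algorithm (sketch)."""

    def __init__(self, site_id, n_sites, has_token=False):
        self.id = site_id
        self.n = n_sites
        self.RN = [0] * n_sites     # highest request sequence number seen
        # token = (LN, Q): LN[i] is the seq. number of Si's last served request
        self.token = ([0] * n_sites, deque()) if has_token else None

    def request_cs(self, send):
        if self.token is None:      # already holding the token: enter directly
            self.RN[self.id] += 1
            for p in range(self.n):
                if p != self.id:
                    send(p, ("REQUEST", self.id, self.RN[self.id]))

    def on_request(self, i, sn, send):
        self.RN[i] = max(self.RN[i], sn)       # ignore outdated requests
        if self.token is not None:
            LN, Q = self.token
            if self.RN[i] == LN[i] + 1:        # Si has an outstanding request
                tok, self.token = self.token, None
                send(i, ("TOKEN", tok))

    def on_token(self, tok):
        self.token = tok            # the holder may now enter the CS

    def release_cs(self, send):
        LN, Q = self.token
        LN[self.id] = self.RN[self.id]         # own request has been served
        for j in range(self.n):                # enqueue outstanding requesters
            if j != self.id and j not in Q and self.RN[j] == LN[j] + 1:
                Q.append(j)
        if Q:                                  # pass the token on, else keep it
            nxt = Q.popleft()
            tok, self.token = self.token, None
            send(nxt, ("TOKEN", tok))
```

This mirrors the steps above: at most (N – 1) REQUEST messages plus one token message per CS entry.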

Correctness
Mutual exclusion is guaranteed because there is only one token in the system and a site holds
the token during its CS execution.
Theorem: A requesting site enters the CS in finite time.
Proof: Token request messages of a site Si reach other sites in finite time.
Since one of these sites will have the token in finite time, site Si's request will be placed in the
token queue in finite time.
Since there can be at most N − 1 requests in front of this request in the token queue, site Si
will get the token and execute the CS in finite time.

Message Complexity:
The algorithm requires 0 messages if the site already holds the idle token at the
time of its critical section request, or a maximum of N messages per critical section execution.
These N messages involve:
 (N – 1) request messages
 1 token message

Drawbacks of the Suzuki–Kasami algorithm:
 Non-symmetric algorithm: a site retains the token even if it has not requested
the critical section.

Performance:
Synchronization delay is 0 and no message is needed if the site holds the idle token at the
time of its request. If the site does not hold the idle token, the maximum
synchronization delay is equal to the maximum message transmission time, and a maximum of
N messages are required per critical section invocation.

DEADLOCK DETECTION IN DISTRIBUTED SYSTEMS
Deadlock can neither be prevented nor avoided in a distributed system, as the system is
so vast that it is impossible to do so. Therefore, only deadlock detection can be
implemented. The techniques of deadlock detection in a distributed system require the
following:
 Progress: The method should be able to detect all the deadlocks in the system.
 Safety: The method should not detect false or phantom deadlocks.

There are three approaches to detect deadlocks in distributed systems.
Centralized approach:
 Here only one node is responsible for detecting deadlock.
 The advantage of this approach is that it is simple and easy to implement, while the
drawbacks include excessive workload at one node and a single point of failure, which in turn
makes the system less reliable.

Distributed approach:
 In the distributed approach, different nodes work together to detect deadlocks.
There is no single point of failure, as the workload is equally divided among all nodes.
 The speed of deadlock detection also increases.
Hierarchical approach:
 This approach is the most advantageous.
 It is the combination of both the centralized and distributed approaches to
deadlock detection in a distributed system.
 In this approach, some selected nodes or clusters of nodes are responsible for
deadlock detection, and these selected nodes are controlled by a single node.

System Model

 A distributed program is composed of a set of n asynchronous processes p1, p2, ...,
pi, ..., pn that communicate by message passing over the communication network.
 Without loss of generality, we assume that each process is running on a different
processor.
 The processors do not share a common global memory and communicate solely by
passing messages over the communication network.
 There is no physical global clock in the system to which processes have
instantaneous access.
 The communication medium may deliver messages out of order; messages may be lost,
garbled, or duplicated due to timeout and retransmission; processors may fail; and
communication links may go down.
We make the following assumptions:
 The systems have only reusable resources.
 Processes are allowed to make only exclusive access to resources.
 There is only one copy of each resource.
 A process can be in two states: running or blocked.
 In the running state (also called the active state), a process has all the needed resources
and is either executing or is ready for execution.
 In the blocked state, a process is waiting to acquire some resource.
Wait-for graph (WFG)
This is used for deadlock detection. A graph is drawn based on the request and
acquisition of resources. If the graph contains a closed loop or cycle, then there is
a deadlock.
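For a wait-for graph held in one place, the cycle test can be sketched with a depth-first search (an illustrative Python sketch; in a real distributed system the WFG is spread across sites, which is what the algorithms later in this section address):

```python
def has_cycle(wfg):
    """Detect a cycle in a wait-for graph given as
    {process: set of processes it waits for}."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited / on current path / done
    color = {p: WHITE for p in wfg}

    def visit(p):
        color[p] = GREY
        for q in wfg.get(p, ()):
            if color.get(q, WHITE) == GREY:    # back edge: cycle found
                return True
            if color.get(q, WHITE) == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in wfg)
```

A cycle found this way signals a deadlock in the single-resource and AND models; the OR model, described later, instead requires a knot.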
Preliminaries

Deadlock Handling Strategies
Handling of deadlocks becomes highly complicated in distributed systems because
no site has accurate knowledge of the current state of the system and because every inter-site
communication involves a finite and unpredictable delay. There are three strategies for
handling deadlocks:
 Deadlock prevention:
 This is achieved either by having a process acquire all the needed resources
simultaneously before it begins executing, or by preempting a process
which holds the needed resource.
 This approach is highly inefficient and impractical in distributed systems.
 Deadlock avoidance:
 A resource is granted to a process if the resulting global system state is
safe. This is impractical in distributed systems.
 Deadlock detection:
 This requires examination of the status of process–resource interactions
for the presence of cyclic wait.
 Deadlock detection seems to be the best approach to handle deadlocks
in distributed systems.

Issues in Deadlock Detection
Deadlock handling faces two major issues:
1. Detection of existing deadlocks
2. Resolution of detected deadlocks

Deadlock Detection

 Detection of deadlocks involves addressing two issues, namely maintenance of
the WFG and searching the WFG for the presence of cycles or knots.
 In distributed systems, a cycle or knot may involve several sites, so the search
for cycles greatly depends upon how the WFG of the system is represented across the
system.
 Depending upon the way WFG information is maintained and the search for cycles is
carried out, there are centralized, distributed, and hierarchical algorithms for
deadlock detection in distributed systems.

Correctness criteria
A deadlock detection algorithm must satisfy the following two conditions:
1. Progress – no undetected deadlocks:
The algorithm must detect all existing deadlocks in finite time. In other words,
after all wait-for dependencies for a deadlock have formed, the algorithm should not wait for any
more events to occur to detect the deadlock.
2. Safety – no false deadlocks:
The algorithm should not report deadlocks which do not exist. These are also called
phantom or false deadlocks.

Resolution of a Detected Deadlock
 Deadlock resolution involves breaking existing wait-for dependencies between
the processes to resolve the deadlock.
 It involves rolling back one or more deadlocked processes and assigning
their resources to blocked processes so that they can resume execution.
 The deadlock detection algorithms propagate information regarding wait-for
dependencies along the edges of the wait-for graph.
 When a wait-for dependency is broken, the corresponding information
should be immediately cleaned from the system.
 If this information is not cleaned in a timely manner, it may result in detection
of phantom deadlocks.

MODELS OF DEADLOCKS
The models of deadlocks are explained based on their hierarchy. The diagrams illustrate the
working of the deadlock models. Pa, Pb, Pc, Pd are passive processes that have already
acquired resources. Pe is an active process that is requesting a resource.

Single-Resource Model
 A process can have at most one outstanding request for only one unit of a resource.
 Since the maximum out-degree of a node in a WFG for the single-resource model is 1,
the presence of a cycle in the WFG indicates that there is a deadlock.

Fig: Deadlock in single-resource model

AND Model
 In the AND model, a passive process becomes active (i.e., its activation condition is
fulfilled) only after a message from each process in its dependent set has arrived.
 In the AND model, a process can request more than one resource simultaneously,
and the request is satisfied only after all the requested resources are granted to the
process.
 The requested resources may exist at different locations.
 The out-degree of a node in the WFG for the AND model can be more than 1.
 The presence of a cycle in the WFG indicates a deadlock in the AND model.
 Each node of the WFG in such a model is called an AND node.
 In the AND model, if a cycle is detected in the WFG, it implies a deadlock, but not vice
versa. That is, even if a process is not part of a cycle, it can still be deadlocked.

Fig: Deadlock in AND model
OR Model

 A process can make a request for numerous resources simultaneously, and the request is
satisfied if any one of the requested resources is granted.
 The presence of a cycle in the WFG of an OR model does not imply a deadlock in the
OR model.
 In the OR model, the presence of a knot indicates a deadlock.

Deadlock in the OR model: a process Pi is blocked if it has a pending OR request to be satisfied.
 With every blocked process, there is an associated set of processes called the
dependent set.
 A process moves from an idle to an active state on receiving a grant message from any
of the processes in its dependent set.
 A process is permanently blocked if it never receives a grant message from any of the
processes in its dependent set.
 A set of processes S is deadlocked if all the processes in S are permanently blocked.
 In short, a process is deadlocked or permanently blocked if the following conditions are
met:
1. Each of the processes in the set S is blocked.
2. The dependent set for each process in S is a subset of S.
3. No grant message is in transit between any two processes in set S.
 A blocked process P in the set S becomes active only after receiving a grant message from a
process in its dependent set, which is a subset of S.
Fig: OR Model

The (p out of q) Model
 This is a variation of the AND-OR model.
 It allows a request to obtain any p available resources from a pool of q resources. Both
models have the same expressive power.
 This favours a more compact formulation of a request.
 Every request in this model can be expressed in the AND-OR model and vice versa.
 Note that an AND request for p resources can be stated as a (p out of p) request, and an
OR request for p resources can be stated as a (1 out of p) request.

Fig: (p out of q) Model

Unrestricted model
 No assumptions are made regarding the underlying structure of resource requests.
 In this model, only one assumption is made – that the deadlock is stable – and hence it
is the most general model.
 This model helps separate concerns: concerns about properties of the problem
(stability and deadlock) are separated from concerns about the underlying distributed
systems computations (e.g., message passing versus synchronous communication).
CHANDY–MISRA–HAAS ALGORITHM FOR THE AND MODEL
This is considered an edge-chasing, probe-based algorithm.
It is also considered one of the best deadlock detection algorithms for distributed
systems.
If a process makes a request for a resource which fails or times out, the process generates a
probe message and sends it to each of the processes holding one or more of its requested
resources.
This algorithm uses a special message called a probe, which is a triplet (i, j, k), denoting that it
belongs to a deadlock detection initiated for process Pi and it is being sent by the home site of
process Pj to the home site of process Pk.
Each probe message contains the following information:
 the id of the process that is blocked (the one that initiates the probe message);
 the id of the process sending this particular version of the probe message;
 the id of the process that should receive this probe message.
A probe message travels along the edges of the global WFG, and a deadlock is
detected when a probe message returns to the process that initiated it.
A process Pj is said to be dependent on another process Pk if there exists a sequence of
processes Pj, Pi1, Pi2, ..., Pim, Pk such that each process except Pk in the sequence is
blocked and each process, except Pj, holds a resource for which the previous process in
the sequence is waiting.
Process Pj is said to be locally dependent upon process Pk if Pj is dependent upon Pk and both
processes are on the same site.
When a process receives a probe message, it checks to see if it is also waiting for resources.
If not, it is currently using the needed resource and will eventually finish and release the resource.
If it is waiting for resources, it passes on the probe message to all processes it knows to be holding
resources it has itself requested; the process first modifies the probe message, changing the sender
and receiver ids.
If a process receives a probe message that it recognizes as having initiated, it knows there
is a cycle in the system and thus a deadlock.

Data structures
Each process Pi maintains a boolean array, dependenti, where dependenti(j) is true only if
Pi knows that Pj is dependent on it. Initially, dependenti(j) is false for all i and j.

Fig: Chandy–Misra–Haas algorithm for the AND model

Performance analysis
In the algorithm, one probe message is sent on every edge of the WFG which connects
processes on two sites.
The algorithm exchanges at most m(n − 1)/2 messages to detect a deadlock that involves m
processes and spans over n sites.
The size of messages is fixed and is very small (only three integer words).
The delay in detecting a deadlock is O(n).

Advantages:
It is easy to implement.
Each probe message is of fixed length.
There is very little computation and very little overhead.
There is no need to construct a graph, nor to pass graph information to other sites.
This algorithm does not find false (phantom) deadlocks.
There is no need for special data structures.

CHANDY–MISRA–HAAS ALGORITHM FOR THE OR MODEL

A blocked process determines if it is deadlocked by initiating a diffusion computation.
Two types of messages are used in a diffusion computation:
 query(i, j, k)
 reply(i, j, k)
denoting that they belong to a diffusion computation initiated by a process Pi and are being sent
from process Pj to process Pk.
A blocked process initiates deadlock detection by sending query messages to all processes in
its dependent set.
If an active process receives a query or reply message, it discards it. When a blocked process
Pk receives a query(i, j, k) message, it takes the following actions:
1. If this is the first query message received by Pk for the deadlock detection
initiated by Pi (the engaging query), then it propagates the query to all the processes in its
dependent set and sets a local variable numk(i) to the number of query messages sent.
2. If this is not the engaging query, then Pk returns a reply message to it
immediately, provided Pk has been continuously blocked since it received the
corresponding engaging query. Otherwise, it discards the query.
 Process Pk maintains a boolean variable waitk(i) that denotes the fact that it
has been continuously blocked since it received the last engaging query
from process Pi.

When a blocked process Pk receives a reply(i, j, k) message, it decrements numk(i)
only if waitk(i) holds.

A process sends a reply message in response to an engaging query only after
it has received a reply to every query message it has sent out for this engaging
query.

The initiator process detects a deadlock when it has received reply messages
to all the query messages it has sent out.
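The diffusion computation can likewise be illustrated with a centralized simulation (a Python sketch under simplifying assumptions, not the distributed protocol: blocked/active status is static, so waitk(i) always holds for blocked processes, and messages are delivered from one FIFO queue):

```python
from collections import deque

def chandy_misra_haas_or(initiator, dependent_set, blocked):
    """Centralized simulation of the OR-model diffusion computation.

    dependent_set[p]: set of processes a blocked p waits on (OR request);
    blocked[p]: whether p is (and stays) blocked during the computation."""
    engaged = {initiator}    # processes that have seen their engaging query
    parent = {}              # sender of each process's engaging query
    num = {initiator: len(dependent_set[initiator])}   # outstanding queries
    msgs = deque(("query", initiator, k) for k in dependent_set[initiator])
    while msgs:
        kind, j, k = msgs.popleft()          # message from P_j to P_k
        if kind == "query":
            if not blocked[k]:
                continue                     # active process discards query
            if k not in engaged:             # engaging query: propagate it
                engaged.add(k)
                parent[k] = j
                num[k] = len(dependent_set[k])
                msgs.extend(("query", k, m) for m in dependent_set[k])
            else:                            # non-engaging: reply at once
                msgs.append(("reply", k, j))
        else:                                # a reply arrives at P_k
            num[k] -= 1
            if num[k] == 0:                  # all of P_k's queries answered
                if k == initiator:
                    return True              # initiator fully answered: deadlock
                msgs.append(("reply", k, parent[k]))
    return False
```

If any process on a query path is active, the query dies there, the initiator never collects all replies, and no deadlock is reported.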

Fig: Chandy–Misra–Haas algorithm for the OR model

Performance analysis
For every deadlock detection, the algorithm exchanges e query messages and e reply
messages, where e = n(n − 1) is the number of edges.

UNIT IV CONSENSUS AND RECOVERY

Consensus and Agreement Algorithms: Problem Definition – Overview of Results – Agreement in a
Failure-Free System (Synchronous and Asynchronous) – Agreement in Synchronous Systems with
Failures; Checkpointing and Rollback Recovery: Introduction – Background and Definitions – Issues
in Failure Recovery – Checkpoint-based Recovery – Coordinated Checkpointing Algorithm –
Algorithm for Asynchronous Checkpointing and Recovery

Problem definition
Agreement among the processes in a distributed system is a fundamental requirement for a
wide range of applications. Many forms of coordination require the processes to exchange
information to negotiate with one another and eventually reach a common understanding or
agreement, before taking application-specific actions. A classical example is that of the
commit decision in database systems, wherein the processes collectively decide whether to
commit or abort a transaction that they participate in.
We first state some assumptions underlying our study of agreement algorithms:
• Failure models Among the n processes in the system, at most f processes can be faulty. A
faulty process can behave in any manner allowed by the failure model assumed. The various
failure models – fail-stop, send omission and receive omission, and Byzantine failures.
• Synchronous/asynchronous communication If a failure-prone process chooses to send a
message to process Pi but fails, then Pi cannot detect the non-arrival of the message in an
asynchronous system. In a synchronous system, however, the scenario in which a messagehas
not been sent can be recognized by the intended recipient, at the end of the round.
• Network connectivity The system has full logical connectivity, i.e., each process can
communicate with any other by direct message passing.
• Sender identification A process that receives a message always knows the identity of the
sender process.
• Channel reliability The channels are reliable, and only the processes may fail (under one of the
various failure models).
• Authenticated vs. non-authenticated messages With unauthenticated messages, when a
faulty process relays a message to other processes, (i) it can forge the message and claim that it
was received from another process, and (ii) it can also tamper with the contents of a received
message before relaying it. When a process receives a message, it has no way to verify its
authenticity. An unauthenticated message is also called an oral message or an unsigned
message. Using authentication via techniques such as digital signatures, it is easier to solve the
agreement problem because, if some process forges a message or tampers with the contents of a
received message before relaying it, the recipient can detect the forgery or tampering. Thus,
faulty processes can inflict less damage.
• Agreement variable The agreement variable may be boolean or multivalued, and need not
be an integer.

The Byzantine agreement problem
The Byzantine agreement problem requires a designated process, called the source
process, with an initial value, to reach agreement with the other processes about its initial value,
subject to the following conditions:
• Agreement All non-faulty processes must agree on the same value.
• Validity If the source process is non-faulty, then the agreed upon value by all the non-faulty
processes must be the same as the initial value of the source.
• Termination Each non-faulty process must eventually decide on a value.
The validity condition rules out trivial solutions, such as one in which the agreed upon value is a
constant.
The consensus problem
The consensus problem differs from the Byzantine agreement problem in that each process
has an initial value and all the correct processes must agree on a single value:
• Agreement All non-faulty processes must agree on the same (single) value.
• Validity If all the non-faulty processes have the same initial value, then the agreed upon value
by all the non-faulty processes must be that same value.
• Termination Each non-faulty process must eventually decide on a value.
The interactive consistency problem
The interactive consistency problem differs from the Byzantine agreement problem in that
each process has an initial value, and all the correct processes must agree upon a set of values,
with one value for each process:
• Agreement All non-faulty processes must agree on the same array of values A[v1, ..., vn].
• Validity If process i is non-faulty and its initial value is vi, then all non-faulty processes
agree on vi as the ith element of the array A. If process j is faulty, then the non-faulty processes
can agree on any value for A[j].
• Termination Each non-faulty process must eventually decide on the array A.
Overview of results:
 No failure: agreement is attainable in both synchronous and asynchronous systems
(message-passing and shared memory); common knowledge is attainable in the synchronous
case, and concurrent common knowledge in the asynchronous case.
 Crash failure: agreement is attainable in a synchronous system with f < n crash-prone
processes; agreement is not attainable in an asynchronous system.
 Byzantine failure: agreement is attainable in a synchronous system with f ≤ ⌊(n − 1)/3⌋
Byzantine processes; agreement is not attainable in an asynchronous system.

AGREEMENT IN A FAILURE-FREE SYSTEM (SYNCHRONOUS OR ASYNCHRONOUS)
In a failure-free system, consensus can be reached by collecting information from the different
processes, arriving at a "decision," and distributing this decision in the system.
A distributed mechanism would have each process broadcast its values to others, and each
process computes the same function on the values received.
The decision can be reached by using an application-specific function – some simple examples
being the majority, max, and min functions. Algorithms to collect the initial values and then
distribute the decision may be based on token circulation on a logical ring, on the three-phase
tree-based broadcast–convergecast–broadcast, or on direct communication with all nodes.
AGREEMENT IN (MESSAGE-PASSING) SYNCHRONOUS SYSTEMS WITH FAILURES
CONSENSUS ALGORITHM FOR CRASH FAILURES (SYNCHRONOUS SYSTEM)
• The consensus algorithm works for n processes, of which up to f processes, where f < n, may
fail in the fail-stop failure model.
• Here the consensus variable x is an integer value; each process has an initial value xi. If up to f
failures are to be tolerated, the algorithm has f + 1 rounds. In each round, a process i sends the
value of its variable xi to all other processes if that value has not been sent before.
• Of all the values received within that round and its own value xi at the start of the round, the
process takes the minimum and updates xi. After f + 1 rounds, the local value xi is guaranteed to be
the consensus value.
• If one process among three processes is faulty, then f = 1, so agreement requires f + 1 = 2
rounds.
• For example, if the faulty process sends 0 to one process and 1 to another before crashing,
the process receiving 0 broadcasts 0 and the process receiving 1 broadcasts 1; the extra round
lets the surviving processes exchange these values and converge on the same minimum.

• The agreement condition is satisfied because in the f + 1 rounds, there must be at least one round
in which no process failed.
• In this round, say round r, all the processes that have not failed so far succeed in broadcasting their
values, and all these processes take the minimum of the values broadcast and received in that round.
• Thus, the local values at the end of the round are the same, say xri, for all non-failed processes.
• In further rounds, only this value may be sent by each process at most once, and no process i will
update its value xri.
• The validity condition is satisfied because processes do not send fictitious values in this failure model.
• For all i, if the initial value is identical, then the only value sent by any process is the value that has been
agreed upon as per the agreement condition.
• The termination condition is seen to be satisfied.
Complexity: The algorithm requires f + 1 rounds, where f < n. The number of messages is O(n²)
in each round, and each message has one integer; hence the total number of messages is
O((f + 1) · n²).
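The round structure above can be illustrated with a small simulation (a Python sketch under a coarse failure model, not from the notes: a process that crashes in a given round is assumed to send nothing from that round on, ignoring partial sends within a round):

```python
def consensus_with_crashes(initial, f, fails=()):
    """Simulate the (f+1)-round min-consensus for crash (fail-stop) failures.

    initial[i] is process i's starting value; fails is a set of
    (process, round) pairs marking when a process crashes."""
    n = len(initial)
    x = list(initial)
    sent = [set() for _ in range(n)]      # values each process already broadcast
    crashed = set()
    for rnd in range(f + 1):
        crashed |= {p for p, r in fails if r == rnd}
        inbox = [[] for _ in range(n)]
        for p in range(n):
            if p in crashed or x[p] in sent[p]:
                continue                   # crashed, or value was sent before
            sent[p].add(x[p])
            for q in range(n):             # broadcast x[p] to everyone
                inbox[q].append(x[p])
        for p in range(n):
            if p not in crashed and inbox[p]:
                x[p] = min(x[p], *inbox[p])   # keep the minimum seen so far
    return [x[p] for p in range(n) if p not in crashed]
```

With no failures all processes settle on the global minimum; if the process holding the minimum crashes before sending it, the survivors still agree, on the smallest surviving value.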

CONSENSUS ALGORITHMS FOR BYZANTINE FAILURES (SYNCHRONOUS SYSTEM)
STEPS FOR BYZANTINE GENERALS (ITERATIVE FORMULATION),
SYNCHRONOUS, MESSAGE-PASSING
STEPS FOR BYZANTINE GENERALS (RECURSIVE FORMULATION),
SYNCHRONOUS, MESSAGE-PASSING

CODE FOR THE PHASE KING ALGORITHM:

Each phase has a unique "phase king" derived, say, from the PID. Each phase has two rounds:
 1. In the 1st round, each process sends its estimate to all other processes.
 2. In the 2nd round, the "phase king" process arrives at an estimate based on the values it
received in the 1st round, and broadcasts its new estimate to all others.
Fig: Message pattern for the phase-king algorithm.

PHASE KING ALGORITHM CODE:

The algorithm has (f + 1) phases and (f + 1)[(n − 1)(n + 1)] messages, and can tolerate up to
f < ⌈n/4⌉ malicious processes.

Correctness Argument

 Among the f + 1 phases, there is at least one phase k in which the phase king is non-malicious.

 In phase k, all non-malicious processes Pi and Pj will have the same
estimate of the consensus value as Pk does. Three cases arise:

 Pi and Pj both use their own majority values (Pi's mult > n/2 + f).

 Pi uses its majority value; Pj uses the phase king's tie-breaker value (Pi's mult >
n/2 + f, and Pj's mult > n/2 for the same value).

 Pi and Pj both use the phase king's tie-breaker value (in the phase in which Pk is non-
malicious, it sends the same value to Pi and Pj).

In all three cases, Pi and Pj end up with the same value as their estimate.

If all non-malicious processes have the value x at the start of a phase, they
will continue to have x as the consensus value at the end of the phase.
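The two-round phase structure can be illustrated with a fault-free simulation (a Python sketch, not from the notes: with honest processes every process receives the same multiset of estimates, so a single tally stands in for all of them; tolerating f malicious processes additionally requires n > 4f and per-process tallies):

```python
from collections import Counter

def phase_king(values, f):
    """Simulate the phase-king algorithm with no actual faults
    (n processes, f + 1 phases; phase k's king is process k)."""
    n = len(values)
    est = list(values)
    for k in range(f + 1):
        # Round 1: every process broadcasts its estimate; each process
        # tallies what it received (identical tallies when all are honest).
        counts = Counter(est)
        majority, mult = counts.most_common(1)[0]
        # Round 2: the phase king broadcasts its own majority as tie-breaker.
        king_tiebreak = majority          # an honest king computes the same tally
        for i in range(n):
            # Keep the majority only if it is safely beyond any f faulty votes;
            # otherwise adopt the king's tie-breaker value.
            est[i] = majority if mult > n // 2 + f else king_tiebreak
    return est
```

With honest processes the estimates converge in the first phase and, as the correctness argument states, persist unchanged through the remaining phases.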
Checkpointing and rollback recovery: Introduction
 Rollback recovery protocols restore the system back to a consistent state after a failure.
 They achieve fault tolerance by periodically saving the state of a process during failure-
free execution.
 They treat a distributed system application as a collection of processes that communicate over a
network.
Checkpoints
The saved state is called a checkpoint, and the procedure of restarting from a previously check-
pointed state is called rollback recovery. A checkpoint can be saved on either stable storage or
volatile storage.
Why is rollback recovery of distributed systems complicated?
Messages induce inter-process dependencies during failure-free operation.
Rollback propagation
The dependencies among messages may force some of the processes that did not fail to roll back.
This phenomenon of cascaded rollback is called the domino effect.
Uncoordinated checkpointing
If each process takes its checkpoints independently, then the system cannot avoid the domino
effect – this scheme is called independent or uncoordinated checkpointing.
Techniques that avoid the domino effect
1. Coordinated checkpointing rollback recovery – processes coordinate their checkpoints to
form a system-wide consistent state.
2. Communication-induced checkpointing rollback recovery – forces each process to take
checkpoints based on information piggybacked on the application messages.
3. Log-based rollback recovery – combines checkpointing with logging of non-deterministic
events;
• relies on the piecewise deterministic (PWD) assumption.
Background and definitions
System model
 A distributed system consists of a fixed number of processes, P1, P2, ..., PN, which communicate
only through messages.
 Processes cooperate to execute a distributed application and interact with the outside
world by receiving and sending input and output messages, respectively.
 Rollback-recovery protocols generally make assumptions about the reliability of the inter-
process communication.
 Some protocols assume that the communication uses first-in-first-out (FIFO) order, while
other protocols assume that the communication subsystem can lose, duplicate, or reorder
messages.
 Rollback-recovery protocols therefore must maintain information about the internal
interactions among processes and also the external interactions with the outside world.

An example of a distributed system with three processes.

A local checkpoint
 All processes save their local states at certain instants of time.
 A local checkpoint is a snapshot of the state of the process at a given instance.
 Assumptions:
– A process stores all local checkpoints on stable storage.
– A process is able to roll back to any of its existing local checkpoints.

 Ci,k – the kth local checkpoint at process Pi.

 Ci,0 – a checkpoint that process Pi takes before it starts execution.
Consistentstates
 Aglobalstateofadistributedsystemisacollectionoftheindividualstatesofallparticipating
processes and the states of the communication channels
 Consistentglobal state
– a global state that may occur during a failure-free execution of distribution
ofdistributed computation
– ifaprocess‟sstatereflectsamessagereceipt,thenthestateofthecorresponding
sender must reflect the sending of the message
 Aglobalcheckpointisasetoflocalcheckpoints, onefromeach process
 A consistent global checkpoint is a global checkpoint such that no message is sent by a
process after taking its local point that is received by another process before taking its
checkpoint.
 For instance, the figure shows two examples of global states.
 The state in Figure (a) is consistent and the state in Figure (b) is inconsistent.
 Note that the consistent state in Figure (a) shows message m1 to have been sent but not yet received; that is acceptable.
 The state in Figure (a) is consistent because it represents a situation in which, for every message that has been received, there is a corresponding message send event.
 The state in Figure (b) is inconsistent because process P2 is shown to have received m2 but the state of process P1 does not reflect having sent it.
 Such a state is impossible in any failure-free, correct computation. Inconsistent states occur because of failures.
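The consistency condition above can be sketched as a small check: a global checkpoint is consistent exactly when every recorded receive has a matching recorded send. The set-of-message-IDs representation below is an illustrative assumption, not part of the text.

```python
# Minimal sketch (assumed representation): each local checkpoint records the
# IDs of the messages the process has sent and received up to that point.
def is_consistent(global_checkpoint):
    """global_checkpoint: list of (sent_ids, received_ids), one pair per process."""
    all_sent = set()
    for sent, _ in global_checkpoint:
        all_sent |= sent
    # Consistent iff every recorded receive has a matching recorded send;
    # sent-but-not-received messages (in-transit) are allowed.
    return all(recv <= all_sent for _, recv in global_checkpoint)

# Figure (a): m1 sent by P1 but not yet received -- consistent.
state_a = [({"m1"}, set()), (set(), set())]
# Figure (b): P2 received m2 but P1's state does not record sending it -- inconsistent.
state_b = [(set(), set()), (set(), {"m2"})]
print(is_consistent(state_a))  # True
print(is_consistent(state_b))  # False
```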
Interactions with the outside world
A distributed system often interacts with the outside world to receive input data or deliver the outcome of a computation. If a failure occurs, the outside world cannot be expected to roll back. For example, a printer cannot roll back the effects of printing a character.
Outside World Process (OWP)
 It is a special process that interacts with the rest of the system through message passing.
 It is therefore necessary that the outside world see a consistent behavior of the system despite failures.
 Thus, before sending output to the OWP, the system must ensure that the state from which the output is sent will be recovered despite any future failure.
A common approach is to save each input message on stable storage before allowing the application program to process it.
An interaction with the outside world to deliver the outcome of a computation is shown on the process-line by the symbol "||".
Different types of messages

1. In-transit messages
 messages that have been sent but not yet received
2. Lost messages
 messages whose "send" is done but whose "receive" is undone due to rollback
3. Delayed messages
 messages whose "receive" is not recorded because the receiving process was either down or the message arrived after rollback
4. Orphan messages
 messages with "receive" recorded but "send" not recorded
 do not arise if processes roll back to a consistent global state
5. Duplicate messages
 arise due to message logging and replaying during process recovery
In-transit messages
In the figure, the global state {C1,8, C2,9, C3,8, C4,8} shows that message m1 has been sent but not yet received. We call such a message an in-transit message. Message m2 is also an in-transit message.
Delayed messages
Messages whose receive is not recorded because the receiving process was either down or the message arrived after the rollback of the receiving process are called delayed messages. For example, messages m2 and m5 in the figure are delayed messages.

Lost messages
Messages whose send is not undone but whose receive is undone due to rollback are called lost messages. This type of message occurs when the process rolls back to a checkpoint prior to the reception of the message while the sender does not roll back beyond the send operation of the message. In the figure, message m1 is a lost message.
Duplicate messages
 Duplicate messages arise due to message logging and replaying during process recovery. For example, in the figure, message m4 was sent and received before the rollback. However, due to the rollback of process P4 to C4,8 and process P3 to C3,8, both the send and the receipt of message m4 are undone.
 When process P3 restarts from C3,8, it will resend message m4.
 Therefore, P4 should not replay message m4 from its log.
 If P4 replays message m4, then message m4 is called a duplicate message.
Issues in failure recovery
In a failure recovery, we must not only restore the system to a consistent state, but also appropriately handle messages that are left in an abnormal state due to the failure and recovery.

• The computation comprises three processes Pi, Pj, and Pk, connected through a communication network. The processes communicate solely by exchanging messages over fault-free, FIFO communication channels.
• Processes Pi, Pj, and Pk have taken checkpoints {Ci,0, Ci,1}, {Cj,0, Cj,1, Cj,2}, and {Ck,0, Ck,1}, respectively, and these processes have exchanged messages A to J.
Suppose process Pi fails at the instant indicated in the figure. All the contents of the volatile memory of Pi are lost and, after Pi has recovered from the failure, the system needs to be restored to a consistent global state from where the processes can resume their execution.
• Process Pi's state is restored to a valid state by rolling it back to its most recent checkpoint Ci,1. To restore the system to a consistent state, process Pj rolls back to checkpoint Cj,1 because the rollback of process Pi to checkpoint Ci,1 created an orphan message H (the receive event of H is recorded at process Pj while the send event of H has been undone at process Pi).
• Pj does not roll back to checkpoint Cj,2 but to checkpoint Cj,1. An orphan message I is created due to the rollback of process Pj to checkpoint Cj,1. To eliminate this orphan message, process Pk rolls back to checkpoint Ck,1.
• Messages C, D, E, and F are potentially problematic. Message C is in transit during the failure and it is a delayed message. The delayed message C has several possibilities: C might arrive at process Pi before it recovers, it might arrive while Pi is recovering, or it might arrive after Pi has completed recovery. Each of these cases must be dealt with correctly.
• Message D is a lost message since the send event for D is recorded in the restored state of process Pj, but the receive event has been undone at process Pi. Process Pj will not resend D without an additional mechanism.
• Messages E and F are delayed orphan messages and pose perhaps the most serious problem of all the messages. When messages E and F arrive at their respective destinations, they must be discarded since their send events have been undone. Processes, after resuming execution from their checkpoints, will regenerate both of these messages.
• Lost messages like D can be handled by having processes keep a message log of all sent messages. When a process restores to a checkpoint, it replays the messages from its log to handle the lost-message problem.
• Overlapping failures further complicate the recovery process. If overlapping failures are to be tolerated, a mechanism must be introduced to deal with amnesia and the resulting inconsistencies.
Checkpoint-based recovery
Checkpoint-based rollback-recovery techniques can be classified into three categories:
1. Uncoordinated checkpointing
2. Coordinated checkpointing
3. Communication-induced checkpointing

1. Uncoordinated Checkpointing
 Each process has autonomy in deciding when to take checkpoints.
 Advantages
Lower runtime overhead during normal execution.
 Disadvantages
1. Domino effect during a recovery
2. Recovery from a failure is slow because processes need to iterate to find a consistent set of checkpoints
3. Each process maintains multiple checkpoints and must periodically invoke a garbage collection algorithm
4. Not suitable for applications with frequent output commits
 The processes record the dependencies among their checkpoints caused by message exchange during failure-free operation.

 The following direct dependency tracking technique is commonly used in uncoordinated checkpointing.
Direct dependency tracking technique
 Assume each process 𝑃𝑖 starts its execution with an initial checkpoint 𝐶𝑖,0.
 𝐼𝑖,𝑥: checkpoint interval, the interval between 𝐶𝑖,𝑥−1 and 𝐶𝑖,𝑥.
 When 𝑃𝑗 receives a message m during 𝐼𝑗,𝑦 that was sent during 𝐼𝑖,𝑥, it records the dependency from 𝐼𝑖,𝑥 to 𝐼𝑗,𝑦, which is later saved onto stable storage when 𝑃𝑗 takes checkpoint 𝐶𝑗,𝑦.
 When a failure occurs, the recovering process initiates rollback by broadcasting a dependency request message to collect all the dependency information maintained by each process.
 When a process receives this message, it stops its execution and replies with the dependency information saved on stable storage as well as with the dependency information, if any, associated with its current state.
 The initiator then calculates the recovery line based on the global dependency information and broadcasts a rollback request message containing the recovery line.
 Upon receiving this message, a process whose current state belongs to the recovery line simply resumes execution; otherwise, it rolls back to an earlier checkpoint as indicated by the recovery line.
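The recovery-line calculation can be sketched as rollback propagation over the collected dependencies: keep discarding checkpoints that record the receipt of a message whose send has been undone, until no orphan messages remain. The representation below (process names, dependency tuples) is an illustrative assumption.

```python
# Hedged sketch of recovery-line calculation from direct dependency information.
# Interval I_{i,x} lies between C_{i,x-1} and C_{i,x}; a dependency
# (i, x, j, y) means a message sent during I_{i,x} was received during I_{j,y}
# and recorded at checkpoint C_{j,y}.
def recovery_line(latest, deps):
    """latest: {process: index of its latest surviving checkpoint}.
    Roll processes back until no kept checkpoint records the receipt of a
    message whose send has been undone (no orphan messages remain)."""
    line = dict(latest)
    changed = True
    while changed:
        changed = False
        for sender, x, receiver, y in deps:
            # The receive is recorded iff the receiver keeps C_{receiver,y} or
            # later; the send is undone if the sender restores below C_{sender,x}.
            if line[receiver] >= y and line[sender] < x:
                line[receiver] = y - 1  # roll back below the recording checkpoint
                changed = True
    return line

# P0 lost its latest checkpoint and restored C_{0,1}; P1 must follow.
print(recovery_line({"P0": 1, "P1": 2}, [("P0", 2, "P1", 2)]))  # {'P0': 1, 'P1': 1}
```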
2. Coordinated Checkpointing
In coordinated checkpointing, processes orchestrate their checkpointing activities so that all local checkpoints form a consistent global state.
Types
1. Blocking checkpointing: After a process takes a local checkpoint, to prevent orphan messages, it remains blocked until the entire checkpointing activity is complete.
Disadvantage: the computation is blocked during checkpointing.
2. Non-blocking checkpointing: The processes need not stop their execution while taking checkpoints. A fundamental problem in coordinated checkpointing is to prevent a process from receiving application messages that could make the checkpoint inconsistent.
Example (a): checkpoint inconsistency
 Message m is sent by 𝑃0 after receiving a checkpoint request from the checkpoint coordinator.
 Assume m reaches 𝑃1 before the checkpoint request.
 This situation results in an inconsistent checkpoint, since checkpoint 𝐶1,𝑥 shows the receipt of message m from 𝑃0, while checkpoint 𝐶0,𝑥 does not show m being sent from 𝑃0.
Example (b): a solution with FIFO channels
 If channels are FIFO, this problem can be avoided by preceding the first post-checkpoint message on each channel by a checkpoint request, forcing each process to take a checkpoint before receiving the first post-checkpoint message.
Impossibility of min-process non-blocking checkpointing
 A min-process, non-blocking checkpointing algorithm is one that forces only a minimum number of processes to take a new checkpoint and, at the same time, does not force any process to suspend its computation.

Algorithm
 The algorithm consists of two phases. During the first phase, the checkpoint initiator identifies all processes with which it has communicated since the last checkpoint and sends them a request.
 Upon receiving the request, each process in turn identifies all processes it has communicated with since the last checkpoint and sends them a request, and so on, until no more processes can be identified.
 During the second phase, all processes identified in the first phase take a checkpoint. The result is a consistent checkpoint that involves only the participating processes.
 In this protocol, after a process takes a checkpoint, it cannot send any message until the second phase terminates successfully, although receiving a message after the checkpoint has been taken is allowed.
3. Communication-induced Checkpointing
Communication-induced checkpointing is another way to avoid the domino effect, while allowing processes to take some of their checkpoints independently. Processes may be forced to take additional checkpoints.
Two types of checkpoints
1. Autonomous checkpoints
2. Forced checkpoints
The checkpoints that a process takes independently are called local checkpoints, while those that a process is forced to take are called forced checkpoints.
 Communication-induced checkpointing piggybacks protocol-related information on each application message.
 The receiver of each application message uses the piggybacked information to determine if it has to take a forced checkpoint to advance the global recovery line.
 The forced checkpoint must be taken before the application may process the contents of the message.
 In contrast with coordinated checkpointing, no special coordination messages are exchanged.
Two types of communication-induced checkpointing
1. Model-based checkpointing
2. Index-based checkpointing
Model-based checkpointing
 Model-based checkpointing prevents patterns of communications and checkpoints that could result in inconsistent states among the existing checkpoints.
 No control messages are exchanged among the processes during normal operation. All information necessary to execute the protocol is piggybacked on application messages.
 There are several domino-effect-free checkpoint and communication models.
 The MRS (mark, send, and receive) model of Russell avoids the domino effect by ensuring that within every checkpoint interval all message-receiving events precede all message-sending events.
Index-based checkpointing
 Index-based communication-induced checkpointing assigns monotonically increasing indexes to checkpoints, such that the checkpoints having the same index at different processes form a consistent state.
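A simple index-based forced-checkpoint rule can be sketched as follows. The class and method names are illustrative assumptions: every message piggybacks the sender's current checkpoint index, and a receiver whose own index lags takes a forced checkpoint before delivering the message, so that checkpoints with equal indexes form a consistent global state.

```python
# Hedged sketch of an index-based communication-induced checkpointing rule.
class Process:
    def __init__(self):
        self.index = 0                 # index of the latest local checkpoint

    def take_checkpoint(self):         # autonomous (local) checkpoint
        self.index += 1
        # ...save local state to stable storage here...

    def send(self):
        return self.index              # index piggybacked on the outgoing message

    def on_message(self, piggybacked_index):
        if piggybacked_index > self.index:
            # Forced checkpoint: catch up to the sender's index before the
            # application processes the contents of the message.
            self.index = piggybacked_index
            # ...save local state to stable storage here...
        # ...deliver the message to the application afterwards...

p, q = Process(), Process()
p.take_checkpoint()        # p's checkpoint index becomes 1
q.on_message(p.send())     # q is forced to checkpoint; its index becomes 1
print(q.index)             # 1
```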

KOO AND TOUEG COORDINATED CHECKPOINTING AND RECOVERY TECHNIQUE
 The Koo and Toueg coordinated checkpointing and recovery technique takes a consistent set of checkpoints and avoids the domino effect and livelock problems during recovery.
• It includes two parts: the checkpointing algorithm and the recovery algorithm.

A. The Checkpointing Algorithm
The checkpointing algorithm makes the following assumptions about the distributed system:
 Processes communicate by exchanging messages through communication channels.
 Communication channels are FIFO.
 End-to-end protocols (such as the sliding window protocol) are assumed to exist to handle message loss due to rollback recovery and communication failure.
 Communication failures do not partition the network.
The checkpointing algorithm takes two kinds of checkpoints on stable storage: permanent and tentative.
A permanent checkpoint is a local checkpoint at a process and is a part of a consistent global checkpoint.
A tentative checkpoint is a temporary checkpoint that is made a permanent checkpoint on the successful termination of the checkpointing algorithm.
The algorithm consists of two phases.
First phase
1. An initiating process Pi takes a tentative checkpoint and requests all other processes to take tentative checkpoints. Each process informs Pi whether it succeeded in taking a tentative checkpoint.
2. A process says "no" to a request if it fails to take a tentative checkpoint.
3. If Pi learns that all the processes have successfully taken tentative checkpoints, Pi decides that all tentative checkpoints should be made permanent; otherwise, Pi decides that all the tentative checkpoints should be thrown away.
Second phase
1. Pi informs all the processes of the decision it reached at the end of the first phase.
2. A process, on receiving the message from Pi, will act accordingly.
3. Either all or none of the processes advance the checkpoint by taking permanent checkpoints.
4. The algorithm requires that after a process has taken a tentative checkpoint, it cannot send messages related to the basic computation until it is informed of Pi's decision.
Correctness: for two reasons
i. Either all or none of the processes take a permanent checkpoint.
ii. No process sends a message after taking a permanent checkpoint.
An optimization
The above protocol may cause a process to take a checkpoint even when it is not necessary for consistency. Since taking a checkpoint is an expensive operation, such unnecessary checkpoints should be avoided.
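The all-or-nothing, two-phase structure of the checkpointing algorithm can be sketched as below. The `Proc` class is a stand-in for real processes, and the coordinator-side loop stands in for the request/reply messages; both are illustrative assumptions.

```python
# Minimal sketch of the two-phase structure of Koo-Toueg checkpointing.
class Proc:
    def __init__(self, can_checkpoint=True):
        self.can_checkpoint = can_checkpoint
        self.permanent = 0              # number of permanent checkpoints taken

    def take_tentative(self):           # phase 1: attempt a tentative checkpoint
        return self.can_checkpoint      # the "yes"/"no" reply to the initiator

    def make_permanent(self):           # phase 2: commit the tentative checkpoint
        self.permanent += 1

    def discard_tentative(self):        # phase 2: throw the tentative one away
        pass

def koo_toueg_checkpoint(processes):
    # Phase 1: the initiator asks every process for a tentative checkpoint
    # and collects the replies; a single "no" aborts the whole round.
    decision = all(p.take_tentative() for p in processes)
    # Phase 2: broadcast the decision; either all or none commit.
    for p in processes:
        p.make_permanent() if decision else p.discard_tentative()
    return decision

ok = [Proc(), Proc()]
print(koo_toueg_checkpoint(ok))                      # True
bad = [Proc(), Proc(can_checkpoint=False)]
print(koo_toueg_checkpoint(bad))                     # False
```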

B. The Rollback Recovery Algorithm
The rollback recovery algorithm restores the system state to a consistent state after a failure. It assumes that a single process invokes the algorithm and that the checkpointing and rollback recovery algorithms are not invoked concurrently. The rollback recovery algorithm has two phases.
First phase
1. An initiating process Pi sends a message to all other processes to check if they all are willing to restart from their previous checkpoints.
2. A process may reply "no" to a restart request for any reason (e.g., it is already participating in a checkpointing or a recovery process initiated by some other process).
3. If Pi learns that all processes are willing to restart from their previous checkpoints, Pi decides that all processes should roll back to their previous checkpoints.
4. Otherwise, Pi aborts the rollback attempt and may attempt a recovery at a later time.
Second phase
1. Pi propagates its decision to all the processes.
2. On receiving Pi's decision, a process acts accordingly.
3. During the execution of the recovery algorithm, a process cannot send messages related to the underlying computation while it is waiting for Pi's decision.
Correctness: the system resumes from a consistent state.
Optimization: not all processes may need to recover, since some of the processes did not change anything.
In the event of failure of process X, the above protocol will require processes X, Y, and Z to restart from checkpoints x2, y2, and z2, respectively.
Process Z need not roll back, however, because there has been no interaction between process Z and the other two processes since the last checkpoint at Z.

ALGORITHM FOR ASYNCHRONOUS CHECKPOINTING AND RECOVERY
This is the algorithm of Juang and Venkatesan for recovery in a system that uses asynchronous checkpointing.
A. System Model and Assumptions
The algorithm makes the following assumptions about the underlying system:
 The communication channels are reliable, deliver the messages in FIFO order, and have infinite buffers.
 The message transmission delay is arbitrary, but finite.
 The underlying computation/application is event-driven: a process P is at state s, receives message m, processes the message, moves to state s', and sends messages out. So the triplet (s, m, msgs_sent) represents the state of P.
Two types of log storage are maintained:
– Volatile log: short access time, but lost if the processor crashes. Moved to the stable log periodically.
– Stable log: longer access time, but survives crashes.
Asynchronous checkpointing
– After executing an event, the triplet is recorded without any synchronization with other processes.
– A local checkpoint consists of a set of records, first stored in the volatile log and then moved to the stable log.
B. The Recovery Algorithm
Notations and data structures
The following notations and data structures are used by the algorithm:
• RCVDi←j(CkPti) represents the number of messages received by processor pi from processor pj, from the beginning of the computation till the checkpoint CkPti.
• SENTi→j(CkPti) represents the number of messages sent by processor pi to processor pj, from the beginning of the computation till the checkpoint CkPti.
Basic idea
 Since the algorithm is based on asynchronous checkpointing, the main issue in the recovery is to find a consistent set of checkpoints to which the system can be restored.
 The recovery algorithm achieves this by making each processor keep track of both the number of messages it has sent to other processors and the number of messages it has received from other processors.
 Whenever a processor rolls back, it is necessary for all other processors to find out if any message has become an orphan message. Orphan messages are discovered by comparing the number of messages sent to and received from neighboring processors.
For example, if RCVDi←j(CkPti) > SENTj→i(CkPtj) (that is, the number of messages received by processor pi from processor pj is greater than the number of messages sent by processor pj to processor pi, according to the current states of the processors), then one or more messages at processor pj are orphan messages.

The Algorithm
When a processor restarts after a failure, it broadcasts a ROLLBACK message announcing that it has failed.
Procedure RollBack_Recovery
Processor pi executes the following:
STEP (a)
if processor pi is recovering after a failure then
  CkPti := latest event logged in the stable storage
else
  CkPti := latest event that took place in pi {The latest event at pi can be either in stable or in volatile storage.}
end if
STEP (b)
for k = 1 to N {N is the number of processors in the system} do
  for each neighboring processor pj do
    compute SENTi→j(CkPti)
    send a ROLLBACK(i, SENTi→j(CkPti)) message to pj
  end for
  for every ROLLBACK(j, c) message received from a neighbor j do
    if RCVDi←j(CkPti) > c {implies the presence of orphan messages} then
      find the latest event e such that RCVDi←j(e) = c {such an event e may be in the volatile or the stable storage}
      CkPti := e
    end if
  end for
end for {for k}
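One round of STEP (b) can be sketched as below. The data layout is an illustrative assumption: each logged event k at processor p carries the cumulative per-neighbor SENT and RCVD counters up to that event, and a ROLLBACK(j, c) with c < RCVD forces a further rollback to the latest event whose receive count does not exceed c.

```python
# Hedged single-round sketch of the Juang-Venkatesan rollback step.
def rollback_round(ckpt, logs, procs):
    """ckpt: {p: index of p's current recovery event in logs[p]}.
    logs[p][k] = {"sent": {q: n}, "rcvd": {q: n}} (cumulative up to event k).
    Returns True if any processor rolled back during this round."""
    changed = False
    for i in procs:
        for j in procs:
            if i == j:
                continue
            c = logs[j][ckpt[j]]["sent"].get(i, 0)         # SENT_{j->i}(CkPt_j)
            while logs[i][ckpt[i]]["rcvd"].get(j, 0) > c:  # orphan messages exist
                ckpt[i] -= 1                               # roll back one event
                changed = True
    return changed

# Two processors: P1 recorded two receives from P0, but P0's restored state
# shows only one send -- P1 must roll back one event.
logs = {
    0: [{"sent": {1: 0}, "rcvd": {1: 0}}, {"sent": {1: 1}, "rcvd": {1: 0}}],
    1: [{"sent": {0: 0}, "rcvd": {0: 0}}, {"sent": {0: 0}, "rcvd": {0: 1}},
        {"sent": {0: 0}, "rcvd": {0: 2}}],
}
ckpt = {0: 1, 1: 2}
while rollback_round(ckpt, logs, [0, 1]):   # iterate until stable (at most N rounds)
    pass
print(ckpt)  # {0: 1, 1: 1}
```

As in the worked example below, the rollbacks stabilize after a bounded number of iterations because each counter comparison only ever moves a recovery point earlier.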
D. An Example
Consider the example shown in Figure 2, consisting of three processors. Suppose processor Y fails and restarts. If event ey2 is the latest checkpointed event at Y, then Y will restart from the state corresponding to ey2.

Figure 2: An example of the Juang-Venkatesan algorithm.
 Because of the broadcast nature of ROLLBACK messages, the recovery algorithm is initiated at processors X and Z.
 Initially, X, Y, and Z set CkPtX ← ex3, CkPtY ← ey2 and CkPtZ ← ez2, respectively, and X, Y, and Z send the following messages during the first iteration:
 Y sends ROLLBACK(Y,2) to X and ROLLBACK(Y,1) to Z;
 X sends ROLLBACK(X,2) to Y and ROLLBACK(X,0) to Z;
 Z sends ROLLBACK(Z,0) to X and ROLLBACK(Z,1) to Y.
Since RCVDX←Y(CkPtX) = 3 > 2 (2 is the value received in the ROLLBACK(Y,2) message from Y), X will set CkPtX to ex2, satisfying RCVDX←Y(ex2) = 1 ≤ 2.
Since RCVDZ←Y(CkPtZ) = 2 > 1, Z will set CkPtZ to ez1, satisfying RCVDZ←Y(ez1) = 1 ≤ 1.
At Y, RCVDY←X(CkPtY) = 1 < 2 and RCVDY←Z(CkPtY) = 1 = SENTZ→Y(CkPtZ). Y need not roll back further.
In the second iteration, Y sends ROLLBACK(Y,2) to X and ROLLBACK(Y,1) to Z; Z sends ROLLBACK(Z,1) to Y and ROLLBACK(Z,0) to X; X sends ROLLBACK(X,0) to Z and ROLLBACK(X,1) to Y.
If Y rolls back beyond ey3 and loses the message from X that caused ey3, X can resend this message to Y because ex2 is logged at X and this message is available in the log. The second and third iterations will progress in the same manner. The set of recovery points chosen at the end of the first iteration, {ex2, ey2, ez1}, is consistent, and no further rollback occurs.
You might also like