
CS8603-Distributed System

UNIT I
INTRODUCTION
Introduction: Definition –Relation to computer system components –Motivation –Relation to
parallel systems – Message-passing systems versus shared memory systems –Primitives for
distributed communication –Synchronous versus asynchronous executions –Design issues and
challenges. A model of distributed computations: A distributed program –A model of distributed
executions –Models of communication networks –Global state – Cuts –Past and future cones of an
event –Models of process communications.

1.1. Introduction:
1.1.1. Definition:
A distributed system is one in which components located at networked computers
communicate and coordinate their actions only by passing messages.
A distributed system consists of a collection of autonomous computers, connected through a
network and distribution middleware, which enables computers to coordinate their activities and to
share the resources of the system, so that users perceive the system as a single, integrated
computing facility.
Centralised System Characteristics
 One component with non-autonomous parts.
 Component shared by users all the time.
 All resources accessible.
 Software runs in a single process.
 Single Point of control.
 Single Point of failure.
Distributed System Characteristics
 Multiple autonomous components.
 Components are shared by all the users.
 Resources may not be accessible.
 Multiple points of control and failure.
 Software runs in concurrent processes on different processors.
Common Characteristics: Certain common characteristics can be used to assess distributed
systems:
 Resource Sharing.
 Openness.
 Concurrency.
 Scalability.
 Fault Tolerance.
 Transparency
Issues in distributed systems
 Concurrency
 Distributed systems function in a heterogeneous environment, so adaptability is a major
issue.
 Latency
 Memory Considerations: The distributed system works on both local and shared memory.

 Synchronization issues.
 Since they are widespread, security is a major issue.
 Limits imposed on scalability.
 They are less transparent.
QOS parameters
 Performance
 Reliability
 Availability
 Security.
Features and Consequences
⚫ No Common Physical Clock:
When programs need to cooperate they coordinate their actions by exchanging
messages. Close coordination often depends on a shared idea of the time at which the
programs’ actions occur. But it turns out that there are limits to the accuracy with which the
computers in a network can synchronize their clocks – there is no single global notion of the
correct time. This is a direct consequence of the fact that the only communication is by
sending messages through a network.

⚫ No shared Memory:
⚫ Distributed systems can provide the abstraction of a common address space via the
distributed shared memory abstraction.
⚫ Autonomy and Heterogeneity
⚫ Processors are loosely coupled.
⚫ They may have different speeds and each can be running a different OS.
⚫ They are not part of a dedicated system.
⚫ They cooperate with one another by offering services or by solving a problem
jointly.
⚫ Concurrency
In a network of computers, concurrent program execution is the norm. The capacity
of the system to handle shared resources can be increased by adding more resources (for
example, computers) to the network. The coordination of concurrently executing programs
that share resources is also an important issue.

⚫ Independent Failure
All computer systems can fail, and it is the responsibility of system designers to plan
for the consequences of possible failures. Distributed systems can fail in new ways. Faults in
the network result in the isolation of the computers that are connected to it, but that doesn’t
mean that they stop running. In fact, the programs on them may not be able to detect
whether the network has failed or has become unusually slow. Similarly, the failure of a
computer, or the unexpected termination of a program somewhere in the system (a crash), is
not immediately made known to the other components with which it communicates. Each
component of the system can fail independently, leaving the others still running.
⚫ Geographical Separation:
⚫ The entities are geographically distributed.
⚫ It is not necessary for the processors to be connected by a wide-area network (WAN).
⚫ Network of Workstations (NOW) or Cluster of Workstations are also considered to
be distributed systems.

⚫ The NOW configuration has become popular because low-cost, high-speed off-the-shelf
processors are available.
⚫ The Google search engine is based on the NOW architecture.
1.1.2. Relation to computer system components
The following figure (Figure 1) shows a typical distributed system model. Each computer has a
processor (CPU), local memory and a network interface. All computers are connected by a communication
network. Communication between two or more nodes is only by passing messages; there is no
common memory available. The distributed system uses a layered architecture to break down the
complexity of system design. Each computer has a memory and a processing unit, and the computers are
connected by a communication network, often through a LAN or a WAN. A distributed system is an
information-processing system that contains a number of independent computers that cooperate with
one another over a communication network in order to achieve a specific objective.

Usually, distributed systems are asynchronous, i.e., they do not use a common clock and do
not impose any bounds on relative processor speeds or message transfer times. Differences between
the various computers and the ways in which they communicate are mostly hidden from users.

Figure 1: A typical distributed system model

Figure 2 shows the relationship of the software components running on each computer
and the use of the local operating system and network protocol stack for functioning. Distributed
software is also known as middleware. Users and applications can interact with a distributed
system in a consistent and uniform way, regardless of where and when the interaction takes place. Each
host executes components and operates a distribution middleware. Distributed execution is the
execution of processes across the distributed system to collaboratively achieve a common goal.
Middleware enables the components to coordinate their activities. Users perceive the system as a
single, integrated computing facility. A distributed system can consist of any number of possible
configurations, such as mainframes, personal computers, workstations, minicomputers and so on.

Figure 2: Relationship between software components, local OS and Network
protocol stack

Several libraries exist to choose from to invoke primitives for the most common functions
of the middleware layer, such as reliable and ordered multicasting. Several standard middleware
technologies exist:
⚫ Object Management Group’s (OMG) Common Object Request Broker Architecture (CORBA)
⚫ Remote Procedure Call (RPC)
⚫ Some commercial standard middleware:
⚫ CORBA
⚫ Distributed Component Object Model (DCOM)
⚫ RMI (Remote Method Invocation)
⚫ Message Passing Interface (MPI)

1.1.3. Motivation
⚫ Inherently distributed computation
⚫ In many applications the entities involved are geographically distant, so the computation is
inherently distributed.
⚫ Examples
⚫ Money transfer in banking
⚫ Reaching consensus among parties.
⚫ Resource sharing
⚫ Resources cannot be fully replicated at all sites
⚫ because doing so is often neither practical nor cost-effective.
⚫ Further, they cannot be placed at a single site because access to that site might prove to
be a bottleneck.
⚫ Hence, such resources are distributed across the system.
⚫ Examples of such resources are peripherals, complete data sets in databases, special
libraries, etc.
⚫ Access to geographically remote data and resources
⚫ In many scenarios, the data cannot be replicated at every site,
⚫ because it may be too large or too sensitive to be replicated.
⚫ For example,
⚫ payroll data within a multinational corporation is both too large and too
sensitive to be replicated at every branch office/site.
⚫ Hence, such data is stored in a central database.
⚫ Similarly, special resources such as supercomputers exist only in certain locations.
⚫ Enhanced reliability
⚫ DS has inherent potential to provide increased reliability
⚫ Because of the possibility of replicating resources and execution
⚫ In reality, geographically distributed resources are not likely to crash/malfunction at
the same time under normal circumstances.
⚫ It has several aspects:
⚫ Availability – resources should be accessible at all times
⚫ Integrity – the value/ state of the resource should be correct
⚫ Fault tolerance – ability to recover from system failure.
⚫ Increased Performance/Cost ratio
⚫ The performance/cost ratio is increased.
⚫ Even though higher throughput has not necessarily been the main objective behind
using a DS, any task can be partitioned across the various computers in the DS.
⚫ This provides a better performance/cost ratio than using special parallel machines.
⚫ This is particularly true of the NOW configuration.
⚫ Scalability
⚫ Adding more processors does not pose a direct bottleneck for the communication
network.
⚫ Distributed system can be extended through the addition of components, thereby
providing better scalability compared to centralized systems.
⚫ Speed
⚫ A distributed system may have more total computing power than a mainframe.
⚫ Reliability
⚫ If one machine crashes, the system as a whole can still survive. This gives higher
availability and improved reliability.
⚫ Economics
⚫ A collection of microprocessors offers a better price/performance ratio than a mainframe.
⚫ A low price/performance ratio is a cost-effective way to increase computing power.
⚫ Incremental growth
⚫ Computing power can be added in small increments.
⚫ Modularity and incremental expandability
⚫ Heterogeneous processors may be easily added into the system without affecting the
performance.
⚫ Constraint: those processors must run the same middleware algorithms.
⚫ Similarly, existing processors may be easily replaced by other processors.
1.1.4. Relation to Parallel Systems
Characteristics of Parallel Systems

Multiprocessor Systems:

It is a parallel system with direct access to a shared memory that forms a common address
space. The processors do not have a common clock. Figure 3 shows the architecture of a
multiprocessor system (UMA – Uniform Memory Access). Multiprocessor systems correspond
to the UMA architecture: the access latency, i.e., the waiting time to complete an access to any memory
location from any processor, is the same. The processors are in close physical proximity and are
connected by an interconnection network. Inter-process communication across processors occurs
through the shared memory (via read and write operations), although message passing (MPI) is also
possible. Usually a bus interconnection topology is used to access the memory; for greater efficiency, a
multistage switch with a symmetric and regular design is used. Two popular
interconnection networks are:

⚫ Omega network
⚫ Butterfly network

Figure 3: Architecture of UMA Multi Processor Systems

Omega Networks

The figure 4 shows the 3-stage omega network interconnection. They are multi-staged network
formed of 2 x 2 switching elements. Each 2 x 2 switch allows data on either of the 2 input wires to
be switched to the upper or lower output wires. Only one data unit can be sent on an output wire.
Collision will occurs if data from both the input wires is to be routed to the same output wire in a
single step. Collision can be addressed by buffering.Each 2 x 2 switches is represented as a
rectangle. n-input and n-output network uses log n stages and log n bits for addressing. Routing in 2
x 2 switch at stage k uses only k th bit. The multi-stage networks can be expressed using an iterative
or recursive generating function.

Figure 4. Three Stage Omega Network

Omega Interconnection Function

It connects n processors to n memory units. It has (n/2) log2 n switching elements of size
2 × 2 arranged in log2 n stages. The iterative generation function connects output i of a stage to
input j of the next stage as follows:

j = 2i            for 0 ≤ i ≤ n/2 − 1
j = 2i + 1 − n    for n/2 ≤ i ≤ n − 1

For the example omega network above, n = 8. In any stage, for the outputs i, where
0 ≤ i ≤ 3, the output i is connected to input 2i of the next stage. For 4 ≤ i ≤ 7, the output i of
any stage is connected to input 2i + 1 − n of the next stage.

Routing Function:

In any stage s, a switch routes data toward destination j as follows: if the (s+1)th MSB of j
is 0, then route on the upper output wire; else (the (s+1)th MSB of j is 1), route on the lower
output wire.
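The generation and routing rules above can be sketched in code. The following Python fragment is only an illustration of these rules; the function names and the sample values are assumptions, not part of the notes.

import math

def omega_next_input(i, n):
    # Perfect-shuffle generation function: output i of a stage feeds input j of the next stage.
    return 2 * i if i <= n // 2 - 1 else 2 * i + 1 - n

def omega_route(dest, n):
    # At stage s, inspect the (s+1)th most significant bit of the destination address.
    stages = int(math.log2(n))
    decisions = []
    for s in range(stages):
        bit = (dest >> (stages - 1 - s)) & 1
        decisions.append("lower" if bit else "upper")
    return decisions

print(omega_next_input(3, 8))    # 6: for n = 8, output 3 connects to input 2*3 of the next stage
print(omega_route(0b110, 8))     # ['lower', 'lower', 'upper'] for destination 110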

Butterfly Networks

Figure 5 shows a 3-stage butterfly network.

Figure 5. Three Stage Butterfly Network
Butterfly Interconnection Function

Here, the interconnection pattern between a pair of adjacent stages depends not only on
n but also on the stage s, whereas in the omega network it depends only on n. The recursive
generation function is as follows:

⚫ Let M = n/2 be the number of switches per stage.

⚫ Let the switches be denoted by the tuple <x, s>, where x ∈ [0, M−1] and the stage s ∈
[0, log2 n − 1].

⚫ There is an edge from switch <x, s> to switch <y, s+1> if (i) x = y, or (ii) x XOR y has
exactly one 1 bit, namely in the (s+1)th MSB.

Butterfly Routing Function

In any stage s, if the (s+1)th MSB of the destination j is 0, the data is routed to the upper
output wire; otherwise it is routed to the lower output wire.

Multi Computer Systems:

Figure 6 shows the architecture of a Non-Uniform Memory Access (NUMA) multiprocessor system. A
multicomputer system is a parallel system that does not have direct access to a shared memory; it may
or may not form a common address space, and it does not have a common clock. The processors are in
close physical proximity and are usually tightly coupled. Processors can communicate either via the
common address space or by message passing.

Figure 6 .Architecture of NUMA Multiprocessor System

Multicomputer systems correspond to the NUMA architecture: the latency to access various

shared memory locations from different processors varies. Examples of parallel multicomputers are:

⚫ the NYU Ultracomputer,

⚫ the Sequent shared memory machines and the CM* Connection Machine,

⚫ and processors configured in regular and symmetrical topologies such as an array or mesh, ring,
torus, cube, or hypercube.

Figure 7 shows a wrap-around 4×4 mesh. A k×k wrap-around mesh contains k^2
processors. The maximum path length between any two processors is 2(k/2 − 1). Routing can be done
along the Manhattan grid.

Figure 7 .Wrap-around 4 x 4 Mesh Interconnection

Figure 8 shows a four-dimensional hypercube. A k-dimensional hypercube has 2^k processor-and-memory
units. Each such unit is a node in the hypercube, and has a unique k-bit label.
Each of the k dimensions is associated with a bit position in the label. The labels of any two
adjacent nodes are identical except for the bit position corresponding to the dimension in which the
two nodes differ. Thus, the processors are labelled such that the shortest path between any two
processors is the Hamming distance between the processor labels.

Figure 8. 4D Hypercube Interconnection

The Hamming distance is defined as the number of bit positions in which two equal-sized bit
strings differ; it is clearly bounded by k. For example, nodes 0101 and 1100 have a Hamming distance of
2, and the shortest path between them has length 2. Routing in the hypercube is done hop by hop. At any
hop, the message can be sent along any dimension corresponding to a bit position in which the
current node’s address and the destination address differ.
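A minimal sketch of these two ideas, with illustrative function names that are not from the notes:

def hamming_distance(a, b):
    # Number of bit positions in which the two node labels differ.
    return bin(a ^ b).count("1")

def hypercube_route(src, dst, k):
    # Hop-by-hop routing: at each hop, flip one bit in which the current node and dst differ.
    path = [src]
    current = src
    for bit in range(k):
        if (current ^ dst) & (1 << bit):
            current ^= (1 << bit)
            path.append(current)
    return path

print(hamming_distance(0b0101, 0b1100))    # 2, as in the example above
print(hypercube_route(0b0101, 0b1100, 4))  # [5, 4, 12]: a shortest path of length 2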

Array Processors:

Array processors belong to a class of parallel computers that are physically co-located,
very tightly coupled, and have a common system clock, but that may not share memory and
communicate by passing data using messages. Array processors and systolic arrays that perform
tightly synchronized processing and data exchange in lock-step, for applications such as DSP and
image processing, belong to this category. These applications usually involve a large number of
iterations on the data. This class of parallel systems has a very niche market.

Flynn’s Taxonomy

Based on the number of instruction streams and data streams used, Flynn classified architectures
into four categories:

⚫ Single instruction stream, single data stream (SISD)

⚫ Single instruction stream, multiple data stream (SIMD)


⚫ Multiple instruction stream, single data stream (MISD)

⚫ Multiple instruction stream, multiple data stream (MIMD)

Single instruction stream, single data stream (SISD)

It corresponds to conventional processing in the von Neumann paradigm: a single CPU and a
single memory unit connected by a system bus.

Single instruction stream, multiple data stream (SIMD)

This mode corresponds to processing by multiple homogeneous processors that
execute in lock-step on different data items. Applications that involve operations on large
arrays and matrices, such as scientific applications, can exploit the SIMD mode of operation
because the data sets can be partitioned easily. Several of the earliest parallel computers:

⚫ Illiac-IV, MPP, CM2, and MasPar MP-1 were SIMD machines.

⚫ Vector processors, array processors, and systolic arrays also belong to the
SIMD class of processing.

⚫ Recent SIMD architectures include co-processing units such as the MMX
units in Intel processors (e.g., Pentium with the streaming SIMD extensions
(SSE) option), and

⚫ DSP chips such as the Sharc.

Multiple instruction stream, single data stream (MISD)

This mode corresponds to the execution of different operations in parallel on the
same data. It is a specialized mode of operation with limited but niche applications, e.g.,
visualization.

Multiple instruction stream, multiple data stream (MIMD)

In this mode, the various processors execute different code on different data. This is
the mode of operation in distributed systems as well as in the vast majority of parallel
systems. There is no common clock among the system processors. Examples:

⚫ Sun Ultra servers


⚫ multicomputer PCs

⚫ IBM SP machines

Figure 9 .Flynn’s Taxonomy

Difference between SIMD and MIMD :

SIMD stands for Single Instruction stream, Multiple Data stream; MIMD stands for Multiple
Instruction stream, Multiple Data stream.
The SIMD architecture is simple; the MIMD architecture is complex.
SIMD has low cost; MIMD has medium cost.
SIMD size and performance are scalable; MIMD has complex size and good performance.
SIMD provides automatic synchronization of all send and receive operations; MIMD needs
explicit synchronization and identification protocols.

Coupling, parallelism, concurrency, and granularity

Coupling:

The degree of coupling among a set of modules (hardware or software) is measured in terms of the
interdependency, binding, and/or homogeneity among the modules. When the degree of coupling is low,
the modules are said to be loosely coupled; when the degree of coupling is high, the modules are said to
be tightly coupled. Examples:
⚫ Tightly coupled multiprocessors with UMA shared memory: NYU Ultracomputer, Sequent, Encore.
⚫ Tightly coupled multiprocessors with NUMA shared memory: SGI Origin 2000, Sun Ultra HPC
servers, hypercubes and the torus.
⚫ Loosely coupled multicomputers without shared memory: NOW, e.g., connected by a Myrinet card.

Parallelism or Speedup ratio:

It is a measure of the relative speedup of a specific program on a given machine. It depends on the
number of processors and the mapping of the code to the processors. It is expressed as the ratio of the
time T1 with a single processor to the time Tn with n processors, i.e., speedup = T1 / Tn.

Concurrency of a Program:

This is a broader term that means roughly the same as parallelism of a program. But,
it is used in the context of distributed programs.The parallelism/ concurrency in a
parallel/distributed program can be measured by the ratio of the number of local (non-
communication and non-shared memory access) operations to the total number of operations,
including the communication or shared memory access operations.

Granularity:

Granularity is the amount of computation and manipulation involved in a software process. The
simplest way to measure it is to count the number of instructions in a given program segment. The grain
size decides the basic program segment chosen for parallel processing. Grain sizes are usually classified
as fine, medium, or coarse, depending on the processing levels involved. Latency is the communication
overhead incurred between subsystems; the time required for two processes to synchronize with each
other is called synchronization latency. Computational granularity and communication latency are
closely related.

1.1.5. Message-passing systems versus shared memory systems


Message Passing:
 Two processes communicate with each other by passing messages.
 Message passing may be direct or indirect communication.
 Indirect communication uses a mailbox for sending and receiving messages from
other processes.
 A message passing system requires synchronization and communication between the two
processes.
 Message passing is used as a method of communication in microkernels.
 Message passing systems come in many forms.
 Messages sent by a process can be either fixed or variable size.
 The actual function of message passing is normally provided in the
form of a pair of primitives.
o Send(destination_name, message)
o Receive(source_name,message)
 Send primitive is used for sending a message to destination
 Process sends information in the form of a message to another
process designated by a destination.
 A process receives information by executing the receive primitive,
which indicates the source of sending process and the message.
Shared Memory:
 In shared memory systems, there is a common shared address space throughout the system.
 Communication among processors takes place via shared data variables, and control
variables are used for synchronization among the processors.
 To achieve synchronization in shared memory systems, semaphores and monitors were
designed.
 All multicomputer (NUMA as well as message-passing) systems that do not have a
shared address space provided by the underlying architecture and hardware
necessarily communicate by message passing.
 For such systems, a shared memory abstraction can be provided to simulate a shared address
space.
 This abstraction is called distributed shared memory.
 Implementing this abstraction has a certain cost
 But it simplifies the task of the application programmer.
 Shared memory is faster than message passing, as message passing systems are
typically implemented using system calls.
 Thus, message passing requires the more time-consuming task of kernel intervention.

Emulating message-passing on a shared memory system (MP →SM)


 The shared address space can be partitioned into disjoint parts, one part being
assigned to each processor.
 “Send” and “receive” operations can be implemented by writing to and reading from
the destination/sender processor’s address space, respectively.
 Specifically, a separate location can be reserved as the mailbox for each ordered pair
of processes.

⚫ A Pi–Pj message-passing can be emulated by a write by Pi to the mailbox and then a
read by Pj from the mailbox.
⚫ In the simplest case, these mailboxes can be assumed to have unbounded size.
⚫ The write and read operations need to be controlled using synchronization primitives
to inform the receiver/sender after the data has been sent/received.
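The following Python sketch illustrates this emulation under the assumptions above (one unbounded mailbox per ordered pair of processes, synchronization primitives to inform the receiver). The class and function names are illustrative, not from the notes; threads stand in for processes sharing an address space.

import threading
from collections import deque

class Mailbox:
    # One mailbox per ordered pair (Pi, Pj), kept in the shared address space.
    def __init__(self):
        self.buffer = deque()                 # assumed unbounded, the simplest case
        self.cond = threading.Condition()     # synchronization primitive

mailboxes = {}                                # (sender, receiver) -> Mailbox

def send(src, dst, msg):
    box = mailboxes.setdefault((src, dst), Mailbox())
    with box.cond:
        box.buffer.append(msg)                # "send" = write into the mailbox
        box.cond.notify()                     # inform the receiver that data is available

def receive(src, dst):
    box = mailboxes.setdefault((src, dst), Mailbox())
    with box.cond:
        while not box.buffer:
            box.cond.wait()                   # block until something has been written
        return box.buffer.popleft()           # "receive" = read from the mailbox

For example, a Pi–Pj message passing is emulated by Pi calling send("Pi", "Pj", data) and Pj calling receive("Pi", "Pj").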

Emulating shared memory on a message-passing system (SM →MP)


⚫ This involves the use of “send” and “receive” operations for “write” and “read”
operations.
⚫ Each shared location can be modeled as a separate process
⚫ “write” to a shared location is emulated by sending an update message to the
corresponding owner process
⚫ A “read” to a shared location is emulated by sending a query message to the owner
process.
⚫ As accessing another processor’s memory requires send and receive operations, this
emulation is expensive.
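A corresponding sketch of the reverse emulation, again with illustrative names: each shared location is modelled as an owner (here a thread) that serves write/update and read/query messages arriving on a queue, which is why every access costs a send and, for a read, a receive as well.

import threading, queue

class SharedLocation(threading.Thread):
    # Owner process for one shared memory location.
    def __init__(self):
        super().__init__(daemon=True)
        self.requests = queue.Queue()         # incoming messages to the owner
        self.value = None

    def run(self):
        while True:
            op, arg, reply = self.requests.get()
            if op == "write":                 # update message
                self.value = arg
            elif op == "read":                # query message: reply with the value
                reply.put(self.value)

def write(loc, value):
    loc.requests.put(("write", value, None))  # a "write" is emulated by a send

def read(loc):
    reply = queue.Queue()
    loc.requests.put(("read", None, reply))   # a "read" needs a send and a receive,
    return reply.get()                        # which is why this emulation is expensive

x = SharedLocation()
x.start()
write(x, 42)
print(read(x))                                # 42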

1.1.6. Primitives for distributed communication


 Send() – Message send communication primitives
 Receive() – Message receive communication primitives
 These primitives have two arguments:
 Send(msg, dst)
 Receive(source, buffer)
 There are two options in the send primitive:
 Buffered – the user data is copied into the kernel buffer.
 Unbuffered – the data gets copied directly from the user buffer onto the network.
 Communication of a message between two processes implies some level of
synchronization between the processes.
 Synchronous primitives :
o A Send or a Receive primitive is synchronous if both the Send() and
Receive() handshake with each other.
 Asynchronous primitives :
o A Send primitive is said to be asynchronous if control returns back to the
invoking process after the data item to be sent has been copied out of the
user-specified buffer.
 Blocking primitives :
o A primitive is blocking if control returns to the invoking process only after the
processing for the primitive completes.
 Non-blocking primitives:
o A primitive is non-blocking if control returns back to the invoking process
immediately after invocation, even though the operation has not completed
 The sender and receiver primitives can each be blocking or non-blocking.
 Three combinations are commonly used:
o Blocking send, blocking receive
o Non-blocking send, blocking receive
o Non-blocking send, non-blocking receive
 Blocking send, blocking receive

o Both the sender and the receiver are blocked until the message is delivered.
o This is called a rendezvous.
o It allows for tight synchronization between processes.

 Non blocking send, blocking receive:


o The sender may continue on; the receiver is blocked until the requested
message arrives.
o A process that must receive a message before it can do useful work needs to
be blocked until such message arrives
o An example
 A server process that exists to provide a service or resource to other
processes.
 Non-blocking send, non-blocking receive:
o Sending process sends the message and resumes the operation
o Receiver receives either a valid message or null
o i.e. neither party is required to wait.
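As an illustration only (the queue-based channel and the function names below are assumptions, not part of the notes), the difference between blocking and non-blocking primitives can be sketched as follows:

import queue

channel = queue.Queue(maxsize=4)              # stands in for the kernel buffer of a channel

def send(msg, block=True):
    # Blocking send waits while the buffer is full; non-blocking returns control at once.
    try:
        channel.put(msg, block=block)
        return True
    except queue.Full:
        return False                          # message not sent, but the caller is not blocked

def receive(block=True):
    # Blocking receive waits for a message; non-blocking returns either a message or null.
    try:
        return channel.get(block=block)
    except queue.Empty:
        return None

send("hello")
print(receive())                              # 'hello'
print(receive(block=False))                   # None: no message available, caller not blocked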

1.1.7. Synchronous versus asynchronous executions
Synchronous execution – features:
 Lower and upper bounds on the execution time of processes can be set.
 Transmitted messages are received within a known bounded time.
 Drift rates between local clocks have a known bound.
Synchronous execution – Important Consequences:
⚫ There is a notion of global physical time with a known relative precision depending
on the drift rate.
⚫ The system has a predictable behaviour in terms of timing.
⚫ Only such systems can be used for hard real-time applications
⚫ It is possible and safe to use timeouts in order to detect failures of process or
communication link.

An example of synchronous execution

Asynchronous execution:

⚫ Many distributed systems are asynchronous


⚫ No bound on process execution time, i.e., nothing can be assumed about the speed, load,
and reliability of computers.
⚫ No bound on message transmission delays, i.e., nothing can be assumed about the speed,
load, and reliability of interconnections.
⚫ No bounds on drift rates between local clocks.
Asynchronous execution - Important consequences:

⚫ There is no global physical time.


⚫ Asynchronous distributed systems are unpredictable in terms of timing.
⚫ No timeouts can be used.

1.1.8. Design issues and challenges
Challenges from System Perspectives:
⚫ Communication mechanisms
⚫ Processes
⚫ Naming
⚫ Synchronization
⚫ Data storage and access
⚫ Consistency and replication
⚫ Distributed systems security
⚫ Communication mechanisms
⚫ It involves designing appropriate mechanisms for communication among the processes
in the network.
⚫ Example: RPC, Remote Object Invocation (ROI), Message-oriented vs stream-
oriented communication
⚫ Processes
⚫ Issues involved are code migration, process/thread management at clients and servers,
and the design of software and mobile agents.
⚫ Naming:
⚫ Easy-to-use identifiers are needed to locate resources and processes transparently and
scalably.
⚫ Synchronization :
⚫ Mechanisms for synchronization or coordination among the processes are essential.
⚫ Mutual exclusion is the classical example of synchronization
⚫ Data storage and access:
⚫ Various schemes for data storage, searching and lookup should be fast and scalable
across network
⚫ Consistency and replication:
⚫ Replication of data is needed to avoid bottlenecks, to provide fast access to data, and for
scalability.
⚫ Require consistency management among replicas.
⚫ Distributed system security:
⚫ Secure channels, access control, key management, authorization, and secure group
management are the various methods used to provide security.
⚫ Designing the distributed system does not come for free.
⚫ Some challenges need to be overcome.
⚫ Transparency
⚫ Openness
⚫ Heterogeneity
⚫ Scalability
⚫ Security
⚫ Failure handling
⚫ Concurrency

Figure: Challenges surrounding an ideal distributed system – scalability, security, failure handling,
openness, concurrency, heterogeneity and transparency.

Transparency
 Transparency is defined as the hiding of the separation of components in a
distributed system from the user and the application programmer.
 With transparency the system is perceived as a whole rather than a collection of
independent components.
 Transparency is an important goal.
Transparency – Description
Access – Hide differences in data representation and how a resource is accessed.
Location – Hide where a resource is located.
Migration – Hide that a resource may move to another location.
Relocation – Hide that a resource may be moved to another location while in use.
Replication – Hide that a resource is replicated.
Concurrency – Hide that a resource may be shared by several competitive users.
Failure – Hide the failure and recovery of a resource.
Mobility – Movement of resources and clients within a system without affecting the operation of
users and programs, e.g., mobile phones.
Performance – Allows the system to be reconfigured to improve performance as loads vary.
Scaling – Allows the system and applications to expand in scale without change to the system
structure or the application algorithms.

 Goal of a distributed systems:
o To connect users and resources in a transparent, open and scalable way.
 Advantages of Transparency:
o Easier for the user
o Doesn’t have to bother with the system topology
o Doesn’t have to know about changes
o Easier to understand
o Easier for the programmer
 Disadvantages of Transparency
o Optimization cannot be done by programmer or user
o Strange behavior when the underlying system fails
o The underlying system can be very complex.

Openness
 Openness is the characteristic that determines whether the system can be
extended.
 It refers to the ability of plug and play.
 Open DS: offers services according to standard rules that describe the syntax and
semantics of those services.
 In DSs, services are generally specified through interfaces, which are often
described in an Interface Definition Language (IDL).
 Here, interface should be open.
 i.e. it should be standardized
 E.g., Internet protocols are documented in RFCs (Requests for Comments).
Heterogeneity
 Heterogeneity, among components that must be able to interoperate, applies to all of the
following:
o Networks
o Hardware architectures
o Operating systems
o Programming language

 There may be many different representations of data in the system.

 Most of the data can be marshalled from one system to another without losing
significance.
 Attempts to provide a universal canonical form of information are lagging.
 Heterogeneity is unavoidable in Distributed systems.

 Examples of approaches that mask differences in networks, operating systems, hardware and
software, in order to deal with heterogeneity, are:
o Middleware
o Mobile code
o Virtual Machine
 Middleware
 Middleware applies to a software layer.
 Middleware provides a programming abstraction.
 Middleware masks the heterogeneity of the underlying networks, hardware,
operating systems and programming languages.
 The Common Object Request Broker (CORBA) is a middleware example.
 Mobile code
 Mobile code is the code that can be sent from one computer to another and
run at the destination.
 Java applets are the example of mobile codes.
 Virtual machine
 Virtual machine provides a way of making code executable on any hardware.

Scalability
 A system is described as scalable if it will remain effective when there is a
significant increase (or decrease) in the number of resources and the number of
users.
 Scalability presents a number of challenges such as
o Controlling the cost of physical resources
o Controlling the performance loss – set of data whose size is proportional
to the number of users or resources in the system
o Preventing software resources from running out, e.g., IPv4 addresses.
o Avoiding performance bottlenecks – e.g., DNS.
 Caching and Replication in web are examples of providing Scalability.

Security
 Security of a computer system is the characteristic that the resources are
accessible to authorized users and used in the way they are intended.
 Security for information resources has three components:

 Confidentiality
o Protection against disclosure to unauthorized individuals.
 Integrity
o Protection against alteration or corruption.

 Availability
o Protection against interference with the means to access the resources.

Failure Handling
 Failures in distributed systems are partial, that is some components fail while
others continue to function.
 Techniques for dealing with failures:
 Detecting failures
o E.g. Checksums
 Masking failures
o E.g. Retransmission of corrupt messages
o E.g. File redundancy

 Tolerating failures
o E.g. Exception handling
o E.g. Timeouts
 Recovery from Failure
o E.g. Rollback mechanisms
 Redundancy
o E.g. Redundant components

Concurrency
 Concurrency is the ability of different parts or units of a program, algorithm, or
problem to be executed out-of-order or in partial order, without affecting the
final outcome.
 With concurrency, services and applications can be shared by clients in a
distributed system.
 For an object to be safe in a concurrent environment, its operations must be
synchronized in such a way that its data remains consistent.
 Concurrency can be achieved by standard techniques such as semaphores, which
are used in most operating systems.
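A minimal sketch of this last point, using a semaphore to keep a shared object consistent; the Counter class is an illustrative example, not something defined in the notes:

import threading

class Counter:
    # A shared object whose operations are synchronized so that its data stays consistent.
    def __init__(self):
        self.value = 0
        self.sem = threading.Semaphore(1)     # binary semaphore guarding the data

    def increment(self):
        with self.sem:
            self.value += 1

counter = Counter()
workers = [threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter.value)                          # 4000, regardless of the interleaving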

1.2.A model of distributed computations


1.2.1. A Distributed Program:
⚫ A distributed system can be modelled by a directed graph.
⚫ Vertices in the directed graph represent processors, whereas edges represent unidirectional
communication channels, as shown in the following figure.
⚫ A distributed system is a set of processors connected by communication channels.
⚫ The main use of the communication network in a distributed system is exchanging
information among processors.
⚫ In a distributed system, processors do not share a common memory.
⚫ Each processor communicates with the others by passing messages.
⚫ There is no physical global clock in a distributed system.
⚫ Messages may be delivered out of order, may be lost, garbled or duplicated because of
timeouts and retransmission; processors may fail and communication links may go down.
⚫ A distributed application runs as a collection of processes.

Figure: A distributed system modelled as a directed graph, with processes P1–P5 as vertices and
unidirectional communication channels as edges.

⚫ A distributed program is composed of a set of n asynchronous processes p1, p2, ..., pi, ..., pn.


⚫ The processes do not share a global memory and communicate solely by passing messages.
⚫ The processes do not share a global clock that is instantaneously accessible to these
processes.
⚫ Process execution and message transfer are asynchronous.
⚫ Without loss of generality, we assume that each process is running on a different processor.
⚫ Let Cij denote the channel from process pi to process pj and let mij denote a message sent
by pi to pj.
⚫ The message transmission delay is finite and unpredictable.

1.2.2. A Model of Distributed Execution


⚫ The execution of a process consists of a sequential execution of its actions.
⚫ The actions are atomic.
⚫ The actions of a process are modeled as three types of events, namely:
⚫ Internal events
⚫ Message send events

⚫ Message receive events.
⚫ Let ei^x denote the xth event at process pi.
⚫ The occurrence of events changes the states of the respective processes and channels.
⚫ An internal event changes the state of the process at which it occurs.
⚫ A send event changes the state of the process that sends the message and the state of the
channel on which the message is sent.
⚫ A receive event changes the state of the process that receives the message and the state of the
channel on which the message is received.
⚫ The events at a process are linearly ordered by their order of occurrence.
⚫ The execution of process pi produces a sequence of events ei^1, ei^2, ..., ei^x, ei^(x+1), ... and is denoted
by Hi where,
⚫ Hi = (hi , →i )
⚫ hi is the set of events produced by pi and
⚫ binary relation →i defines a linear order on these events.
⚫ Relation →i expresses causal dependencies among the events of pi .
⚫ The send and the receive events signify the flow of information between processes and establish
causal dependency from the sender process to the receiver process.
⚫ A relation →msg that captures the causal dependency due to message exchange, is defined as
follows.
⚫ For every message m that is exchanged between two processes, we have,
⚫ send (m) →msg rec (m)

⚫ Relation →msg defines causal dependencies between the pairs of corresponding send and
receive events.
⚫ The evolution of a distributed execution is depicted by a space-time diagram.
⚫ Horizontal line represents progress of the process
⚫ Dot represents event
⚫ Slant arrow represents message transfer
⚫ Since we assume that an event execution is atomic (hence, indivisible and instantaneous), it is
justified to denote it as a dot on a process line.
⚫ In the Figure, for process p1, the second event is a message send event, the third event is an
internal event, and the fourth event is a message receive event.
Figure: Space-time diagram of the distributed execution of processes p1, p2 and p3; horizontal lines
show the progress of each process, dots represent events, and slanted arrows represent message
transfers.

Causal Precedence Relation


⚫ The execution of a distributed application results in a set of distributed events produced by the
processes.
⚫ Let H = ∪i hi denote the set of events executed in a distributed computation.
⚫ A binary relation → on the set H, which expresses the causal dependencies between events in the
distributed execution, is defined as follows: for two events ei^x and ej^y, ei^x → ej^y if and only if
⚫ ei^x and ej^y are events of the same process (i = j) and ei^x occurred before ej^y (x < y), or
⚫ ei^x →msg ej^y (ei^x is the send and ej^y the corresponding receive of a message), or
⚫ there exists an event ek^z such that ei^x → ek^z and ek^z → ej^y (transitivity).
⚫ The causal precedence relation induces an irreflexive partial order on the events of a distributed
computation that is denoted as H=(H, →).
⚫ Note that the relation → is nothing but Lamport’s “happens before” relation.
⚫ For any two events ei and ej, if ei → ej, then event ej is directly or transitively dependent on event
ei.
⚫ The relation → denotes the flow of information in a distributed computation:
⚫ ei → ej means that all the information available at ei is potentially accessible at ej.
⚫ Example: e1^1 → e3^3 and e3^3 → e2^6.
⚫ Event e2^6 therefore has knowledge of the information available at e1^1 and e3^3.
⚫ If ei ↛ ej and ej ↛ ei, then event ej is not directly or transitively dependent on event ei (and vice
versa); such events are concurrent, denoted ei || ej.
⚫ Example: e1^3 || e3^3 and e2^4 || e3^1.
⚫ That is, event ei does not causally affect ej.
⚫ In this case, event ej is not aware of the execution of ei or of any event executed after ei on the same
process.
⚫ Two Rules

⚫ The relation || is not transitive; that is, (ei || ej) ∧ (ej || ek) does not imply ei || ek.
⚫ For any two events ei and ej in a distributed execution, exactly one of ei → ej, ej → ei, or ei || ej
holds.
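The causal precedence relation can be computed mechanically from the per-process orders and the send/receive pairs. The small Python sketch below (the event names and data layout are illustrative assumptions, not notation from the notes) builds → as a transitive closure and then tests concurrency:

from itertools import product

# Events of each process in their order of occurrence, plus send -> receive message edges.
process_events = {"p1": ["e1^1", "e1^2"], "p2": ["e2^1", "e2^2"]}
message_edges = [("e1^2", "e2^2")]            # e1^2 = send(m), e2^2 = rec(m)

edges = set(message_edges)
for events in process_events.values():        # process order: earlier -> later on the same process
    edges |= {(events[i], events[j])
              for i in range(len(events)) for j in range(i + 1, len(events))}

all_events = [e for evs in process_events.values() for e in evs]
happens_before = set(edges)
for k, i, j in product(all_events, repeat=3): # transitive closure (Warshall-style, k outermost)
    if (i, k) in happens_before and (k, j) in happens_before:
        happens_before.add((i, j))

def concurrent(e, f):
    return (e, f) not in happens_before and (f, e) not in happens_before

print(("e1^1", "e2^2") in happens_before)     # True: causal path through the message m
print(concurrent("e1^1", "e2^1"))             # True: no causal path in either direction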
Logical vs. Physical Concurrency
⚫ In a distributed computation, two events are logically concurrent if and only if they do not
causally affect each other.
⚫ Physical concurrency, on the other hand, has a connotation that the events occur at the same
instant in physical time.
⚫ Two or more events may be logically concurrent even though they do not occur at the same
instant in physical time.
⚫ However, if processor speed and message delays would have been different, the execution of
these events could have very well coincided in physical time.
⚫ Whether a set of logically concurrent events coincide in the physical time
or not, does not change the outcome of the computation.
⚫ Therefore, even though a set of logically concurrent events may not have occurred at the same
instant in physical time, we can assume that these events occurred at the same instant in
physical time.

1.2.3. A Model of Communication Network

⚫ There are several communication networks models:


⚫ FIFO
⚫ Non-FIFO
⚫ Causal Ordering
⚫ In FIFO, each channel acts as a first-in-first out message queue. Here, message ordering is
preserved by a channel.
⚫ In Non FIFO, a channel acts like a set in which the sender process adds messages and the
receiver process removes messages from it in a random order.
⚫ The “causal ordering” model is based on Lamport’s “happens before” relation.
⚫ A system that supports the causal ordering model satisfies the following property
⚫ CO: for any two messages mij and mkj, if send(mij) → send(mkj), then rec(mij) → rec(mkj).
⚫ This property ensures that causally related messages destined to the same destination are
delivered in an order that is consistent with their causality relation.
⚫ Causally ordered delivery of messages implies FIFO message delivery. (Note that CO ⊂ FIFO
⊂ Non-FIFO.)
⚫ Causal ordering model considerably simplifies the design of distributed algorithms because it
provides a built-in synchronization.

1.2.4. Global State


⚫ “A collection of the local states of its components, namely, the processes and the
communication channels.”
⚫ The state of a process is defined by the contents of processor registers, stacks, local memory,
etc. and depends on the local context of the distributed application.
⚫ The state of channel is given by the set of messages in transit in the channel.
⚫ The occurrence of events changes the states of respective processes and channels.
⚫ An internal event changes the state of the process at which it occurs.
⚫ A send event changes the state of the process that sends the message and the state of the
channel on which the message is sent.

⚫ A receive event changes the state of the process that receives the message and the state of
the channel on which the message is received.
⚫ Global state of distributed computation is the set of local state of all individual processes
involved in the computation plus the state of the communication channel.
 Requirements of Global States:
 Distributed Garbage Collection
 Distributed Deadlock Detection
 Distributed termination Detection
 Distributed Debugging
Requirements of Global States - Distributed Garbage Collection:

⚫ An object is considered to be garbage if there are no longer any references to it anywhere in the
distributed system.
⚫ The memory taken up by that object can be reclaimed once it is known to be garbage.
⚫ To check that an object is garbage, we must verify that there are no references to it anywhere in
the system.
⚫ Consider the fig.,
⚫ Process p1 has two objects that both have references – one has a reference within p1
itself, and p2 has a reference to the other.
⚫ Process p2 has one garbage object, with no
references to it anywhere in the system.

⚫ It also has an object for which neither p1 nor p2 has a reference, but there is a reference to it
in a message that is in transit between the processes.
⚫ This shows that when we consider properties of a system, we must include the state of
communication channels as well as the state of the processes.

Requirements of Global States - Distributed deadlock detection:


⚫ A distributed deadlock occurs when each of a collection of processes waits for another process to
send it a message
⚫ There is a cycle in the graph of this ‘waits-for’ relationship.
⚫ Processes p1 and p2 are each waiting for a message from the other, so this system will never make
progress.

Requirements of Global States - Distributed termination detection:

⚫ The problem here is:


⚫ how to detect that a distributed algorithm has terminated.
⚫ Detecting termination sounds deceptively easy to solve:
⚫ at first it seems only necessary to test whether each process has halted.
⚫ Two processes p1 and p2 , each of which may request values from the others.
⚫ Instantaneously, we may find that a process is either active or passive
⚫ A passive process is not engaged in any activity of its own
⚫ But it is prepared to respond with a value requested by the other.
⚫ Suppose we discover that p1 is passive and that p2 is passive. We cannot conclude that the algorithm
has terminated, because a request message from one process to the other may still be in transit.
⚫ The phenomena of termination and deadlock are similar.
⚫ But they are different problems.
⚫ First, a deadlock may affect only a subset of the processes in a system, whereas for
termination all processes must have terminated.
⚫ Second, process passivity is not the same as waiting in a deadlock cycle: a deadlocked process
is attempting to perform a further action, for which another process waits, whereas a passive process
is not engaged in any activity.

Requirements of Global States - Distributed debugging:

⚫ Distributed systems are complex to debug


⚫ Care needs to be taken in establishing what occurred during the execution.

Notations of Global State


⚫ LSi^x denotes the state of process pi after the occurrence of event ei^x and before the event ei^(x+1).
⚫ LSi^0 denotes the initial state of process pi.
⚫ LSi^x is a result of the execution of all the events executed by process pi till ei^x.

⚫ Let send(m) ≤ LSi^x denote the fact that ∃y : 1 ≤ y ≤ x :: ei^y = send(m).

⚫ Let rec(m) ≰ LSi^x denote the fact that ∀y : 1 ≤ y ≤ x :: ei^y ≠ rec(m).




⚫ A Channel State
⚫ The state of a channel depends upon the states of the processes it connects.
⚫ Let SCij^(x,y) denote the state of the channel Cij.
⚫ The state of the channel is defined as follows:
⚫ SCij^(x,y) = { mij | send(mij) ≤ LSi^x ∧ rec(mij) ≰ LSj^y }
⚫ Thus, the channel state SCij^(x,y) denotes all messages that pi sent up to event ei^x and which process
pj had not received until event ej^y.
⚫ The global state of a distributed system is a collection of the local states of the processes and
the channels.
⚫ Notationally, the global state GS is defined as
⚫ GS = { ∪i LSi^xi , ∪j,k SCjk^(xj,yk) }
⚫ For a global state to be meaningful, the states of all the components of the distributed system
must be recorded at the same instant.

⚫ This will be possible if the local clocks at processes were perfectly synchronized or if there
were a global system clock that can be instantaneously read by the processes. (However,
both are impossible.)

Consistent Global State


⚫ Even if the states of all the components are not recorded at the same instant, such a state will be
meaningful provided every message that is recorded as received is also recorded as sent.
⚫ The basic idea is that a state should not violate causality – an effect should not be present without
its cause. A message cannot be received if it was not sent.
⚫ Such states are called consistent global states and are meaningful global states.
⚫ Inconsistent global states are not meaningful, in the sense that a distributed system can never
be in an inconsistent state.
⚫ For example, consider the following figure:
⚫ A global state GS1 = {LS1^1, LS2^3, LS3^3, LS4^2} is inconsistent because the state of p2
has recorded the receipt of message m12, whereas the state of p1 has not recorded its
send.
⚫ A global state GS2 consisting of the local states {LS1^2, LS2^4, LS3^4, LS4^2} is consistent;
⚫ all the channels are empty except C21, which contains message m21.
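The consistency condition can be checked mechanically. The sketch below is only an illustration of the idea (the dictionary layout and the two-process examples are assumptions, simplified from the GS1/GS2 example above): a recorded global state is consistent when every message recorded as received is also recorded as sent, and every message recorded in a channel was sent but not yet received.

def is_consistent(local_states, channel_states):
    # local_states: {process: {"sent": set of messages, "received": set of messages}}
    # channel_states: {(i, j): set of messages recorded as in transit on channel Cij}
    sent = set().union(*(ls["sent"] for ls in local_states.values()))
    received = set().union(*(ls["received"] for ls in local_states.values()))
    in_transit = set().union(*channel_states.values()) if channel_states else set()
    # No effect without a cause: nothing received or in transit that was never sent.
    return received <= sent and in_transit <= (sent - received)

# In the spirit of GS1 above (simplified): the receipt of m12 is recorded, its send is not.
gs1 = {"p1": {"sent": set(), "received": set()},
       "p2": {"sent": set(), "received": {"m12"}}}
print(is_consistent(gs1, {}))                         # False: inconsistent global state

# In the spirit of GS2 above: the send of m21 is recorded and m21 sits in channel C21.
gs2 = {"p1": {"sent": set(), "received": set()},
       "p2": {"sent": {"m21"}, "received": set()}}
print(is_consistent(gs2, {("p2", "p1"): {"m21"}}))    # True: consistent global state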

1.2.5. Cuts of Distributed Computation


⚫ Global state of the distributed system is a collection of the local state of the processes and
the channels.
⚫ The OS cannot know the current state of all processes in the distributed system.
⚫ A process can only know the current state of all processes on the local system.
⚫ Remote processes only know state information that is received by message.
⚫ The essential problem is the absence of global time.
⚫ If all processes had perfectly synchronized clocks, then we could agree on a time at which
each process would record its state
⚫ The result would be an actual global state of the system.
⚫ From the collection of process state, we can say whether the processes were deadlocked.
⚫ But we can not achieve perfect clock synchronization.
Cuts of Distributed Computation (contd.) – Need for Global State
⚫ The figure shows the stages of a computation in which $100 is transferred from account A to
account B.
⚫ Communication channels C1 and C2 are assumed to be FIFO.
⚫ The bank account is distributed over two branches (branch A and branch B).
⚫ The total amount in the account is the sum at each branch.
⚫ At 3.00 PM the account balance is to be determined.
⚫ Messages are sent to request the information.
⚫ If, at the time of the balance determination, an amount from branch A is in transit to branch B,
⚫ the result is a false reading.
⚫ All messages in transit must be examined at the time of observation:
⚫ the total consists of the balance at both branches plus the amounts in the messages.
⚫ Each amount sent and just received must be added only once.

⚫ If the clocks at the two branches are not perfectly synchronized, then an amount transferred at 3.01
from branch A
⚫ may arrive at branch B at 2.59 by B’s clock;
⚫ at 3.00 the amount is then counted twice.

⚫ Channel: exists between two processes if they exchange messages.

⚫ State: the sequence of messages that have been sent and received along channels incident with
the process.
⚫ Snapshot: records the state of a process.
⚫ Distributed snapshot: a collection of snapshots, one for each process.
⚫ Cut: the notion of a global state can be graphically represented by what is called a cut.
⚫ History(pi) = hi = <ei^0, ei^1, ei^2, ...>
⚫ The history is the execution of each process.
⚫ Finite prefix: hi^k = <ei^0, ei^1, ei^2, ..., ei^k>
⚫ Each event is either an internal action of the process or the sending or receipt of a message
over a communication channel.

⚫ The global history of the system is the union of the individual process histories:
⚫ H = h0 ∪ h1 ∪ ... ∪ hN-1
⚫ A cut of the system’s execution is a subset of its global history and is a union of prefixes of the
process histories:
⚫ C = h1^c1 ∪ h2^c2 ∪ ... ∪ hN^cN
⚫ The set of events {ei^ci : i = 1, 2, ..., N} is called the frontier of the cut.
⚫ Inconsistent cut
⚫ An inconsistent cut can violate temporal causality.
⚫ For example, at p2 the cut includes the receipt of the message m1, but at p1 it does not include
the sending of that message.
⚫ Consistent cut
⚫ A consistent cut cannot violate temporal causality.

Cuts of Distributed Computation -Chandy-Lamport Algorithm


⚫ It records a set of process and channel states such that the combination is a consistent global
state.
⚫ Communication channel is assumed to be FIFO
⚫ Assumption
⚫ No failure, all messages arrive intact, exactly once.
⚫ Communication channels are unidirectional and FIFO ordered
⚫ There is a communication channel between each pair of processes.
⚫ Any process may initiate the snapshot
⚫ Snapshot does not interfere with normal execution.
⚫ It uses a control message,
⚫ i.e., a marker; the role of the marker in a FIFO system is to separate the messages in the
channels.
⚫ After a site has recorded its snapshot, it sends a marker.
⚫ A marker separates the message in the channel into those to be included in the snapshot
from those not to be recorded in the snapshot.
⚫ A process must record its snapshot no later than when it receives a marker on any of its
incoming channels.
Algorithm:
Steps:
1. Initiator process P0 records its state locally.
2. Marker sending rule for process Pi:
a. After Pi has recorded its state, for each outgoing
channel Chij, Pi sends one marker message over a Chij
3. Marker Receiving rule for Process Pi:
a. Process Pi on the receipt of a marker over channel chji
b. If(Pi has not yet recorded its state)
i. Record its process state now;
ii. Records the state of Chji as empty set;
iii. Starts recording messages arriving over other
incoming channels

4. Else (Pi has already recorded its state)
a. Pi records the state of Chji as the set of all messages it
has received over Chji since it saved its state.
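A compact sketch of the marker rules above, for one process with FIFO channels. The class layout and names (Process, on_message, and so on) are illustrative assumptions; the point is only to show where the marker sending and marker receiving rules fit.

MARKER = "MARKER"

class Process:
    def __init__(self, pid, out_channels, in_channels):
        self.pid = pid
        self.state = "initial"
        self.recorded_state = None            # local snapshot, once taken
        self.out_channels = out_channels      # {j: function that sends on channel Ch_ij}
        self.channel_state = {}               # {j: messages recorded for channel Ch_ji}
        self.recording = {j: False for j in in_channels}

    def record_snapshot(self):
        # Marker sending rule: record the local state, then send a marker on every
        # outgoing channel and start recording on the incoming channels.
        self.recorded_state = self.state
        for send in self.out_channels.values():
            send(MARKER)
        for j in self.recording:
            self.recording[j] = True

    def on_message(self, j, msg):
        # Marker receiving rule for a message arriving on incoming channel Ch_ji.
        if msg == MARKER:
            if self.recorded_state is None:   # first marker seen: record state now,
                self.record_snapshot()        # and Ch_ji will be recorded as empty
            self.channel_state.setdefault(j, [])
            self.recording[j] = False         # stop recording this channel
        else:
            if self.recorded_state is not None and self.recording.get(j, False):
                self.channel_state.setdefault(j, []).append(msg)
            # ... normal application processing of msg would go here ...

The initiator simply calls record_snapshot() spontaneously; every other process takes its snapshot when the first marker reaches it.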

1.2.6. Past and Future Cones of an Event
⚫ An event ej could have been affected only by all events ei such that ei → ej.
⚫ In this situation, all the information available at ei could be made accessible at ej.
⚫ All such events ei belong to the past of ej.
⚫ Let Past(ej) denote all events in the past of ej in a computation (H, →). Then,
⚫ Past(ej) = {ei | ∀ei ∈ H, ei → ej}.
⚫ Symmetrically, the future of ej, denoted Future(ej), is the set of all events ei that can be causally
affected by ej, i.e., Future(ej) = {ei | ei ∈ H, ej → ei}.
⚫ The figure shows the past and future cones of an event ej.

1.2.7. A Model of Process Communications


⚫ There are two basic models of process communications – synchronous and asynchronous.
⚫ The synchronous communication model is a blocking type:
⚫ here, once a message is sent, the sender process blocks until the message has been received
by the receiver process.
⚫ The sender process resumes execution only after it learns that the receiver process has
accepted the message.
⚫ Thus, the sender and the receiver processes must synchronize to exchange a message.
⚫ Asynchronous communication model is a non-blocking type
⚫ The sender and the receiver do not synchronize to exchange a message.
⚫ After having sent a message, the sender process does not wait for the message to be
delivered to the receiver process.
⚫ The message is buffered by the system and is delivered to the receiver process when it is
ready to accept the message
⚫ Neither of the communication models is superior to the other.
⚫ Asynchronous communication provides higher parallelism
⚫ Because the sender process can execute while the message is in transit to the receiver.
⚫ However, a buffer overflow may occur if a process sends a large number of messages in a
burst to another process.
⚫ Thus, an implementation of asynchronous communication requires more complex buffer
management.
⚫ It is much more difficult to design, verify, and implement distributed algorithms for
asynchronous communication
⚫ because of the higher degree of parallelism and non-determinism.
⚫ Synchronous communication is simpler to handle and implement.
⚫ However, due to frequent blocking, it is likely to have poor performance and is likely to be
more prone to deadlocks.

1.3. Logical Time
1.3.1. Introduction
We require computers around the world to timestamp electronic commerce transactions
consistently. Time is also an important theoretical construct in understanding how distributed
executions unfold. But time is problematic in distributed systems. Each computer may have its own
physical clock, but the clocks typically deviate, and we cannot synchronize them perfectly. The
absence of global physical time makes it difficult to find out the state of our distributed programs as
they execute. We often need to know what state process A is in when process B is in a certain state,
but we cannot rely on physical clocks to know what is true at the same time.
Time is an important and interesting issue in distributed systems, for several reasons. First,
time is a quantity we often want to measure accurately. In order to know at what time of day a
particular event occurred at a particular computer it is necessary to synchronize its clock with an
authoritative, external source of time.
Algorithms that depend upon clock synchronization have been developed for several
problems in distribution. These include maintaining the consistency of distributed data, checking
the authenticity of a request sent to a server.
Measuring time can be problematic due to the existence of multiple frames of reference. The
relative order of two events can even be reversed for two different observers. But this cannot
happen if one event causes the other to occur: the physical effect follows the physical cause for all
observers, although the time elapsed between cause and effect can vary. The timing of physical
events was thus proved to be relative to the observer.

1.3.2. Clocks, Events and Process State


 A distributed system consists of a collection P of N processes pi, i = 1, 2, ..., N. Each
process pi has a state si consisting of its variables (which it transforms as it executes).
 Processes communicate only by messages (via a network)
 Actions of processes: Send, Receive, change own state
 Event: the occurrence of a single action that a process carries out as it executes.
 Events at a single process pi, can be placed in a total ordering denoted by the relation →i
between the events. i.e. – e →i e’ if and only if the event e occurs before event e’ at process
pi .
 A history of process pi is a series of events ordered by the relation →i:
history(pi) = hi = <ei^0, ei^1, ei^2, ...>
Clocks:
 To timestamp events, the computer’s clock is used: at real time t, the OS reads the time on the
computer’s hardware clock Hi(t).
 It calculates the time on its software clock as Ci(t) = αHi(t) + β,
 e.g., a 64-bit value giving nanoseconds since some base time.
 In general, the clock is not completely accurate – but if Ci behaves well enough, it can be
used to timestamp events at pi.
 Clock resolution – the period between updates of the clock value – must be smaller than the time
interval between successive events if those events are to receive distinct timestamps. The rate at
which events occur depends on such factors as the length of the processor instruction cycle.
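A tiny sketch of the software clock formula Ci(t) = αHi(t) + β; the constants and the use of a monotonic counter as the "hardware clock" are illustrative assumptions only:

import time

ALPHA = 1.000001                 # illustrative scaling factor compensating for drift
BETA = -120.0                    # illustrative offset, in nanoseconds

def hardware_clock_ns():
    return time.monotonic_ns()   # stands in for the hardware clock Hi(t)

def software_clock_ns():
    return ALPHA * hardware_clock_ns() + BETA   # Ci(t) = alpha * Hi(t) + beta

print(int(software_clock_ns())) # a 64-bit-style count of nanoseconds since some base time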
Clock skew and clock drift:
Computer clocks are not generally in perfect agreement
Clock skew: the difference between the times on two clocks (at any instant)
Computer clocks are based on crystal oscillators that are subject to physical variations.
Clock drift: they count time at different rates and so diverge (frequencies of oscillation differ).
Clock drift rate: the difference per unit of time from some ideal reference clock
– Ordinary quartz clocks drift by about 1 sec in 11–12 days (≈10⁻⁶ secs/sec).
– High-precision quartz clocks have a drift rate of about 10⁻⁷ or 10⁻⁸ secs/sec.

Coordinated Universal Time:
 UTC is an international standard for time keeping – It is based on atomic time, but
occasionally adjusted to astronomical time – International Atomic Time is based on very
accurate physical clocks (drift rate 10⁻¹³).
 It is broadcast from radio stations on land and satellite (e.g. GPS).
 Computers with receivers can synchronize their clocks with these timing signals (by
requesting time from GPS/UTC source).
 Signals from land-based stations are accurate to about 0.1–10 milliseconds; signals from
GPS are accurate to about 1 microsecond.

1.3.3. A framework for a system of logical clocks.
 Causality between events is fundamental to the design and analysis of parallel and
distributed computing and operating systems.
 Causality among events is usually tracked using physical time.
 However, it is not possible to have global physical time in a distributed system; only an
approximation of it is possible.
 The fundamental monotonicity property associated with causality in distributed systems is
captured by logical time.
 There are 3 ways to implement logical time:
 Scalar time
 Vector time
 Matrix time
 The causal precedence relation among the events of processes helps solve a variety of problems in distributed
systems:
 Distributed algorithm design
 Tracking of dependent events
 Knowledge about the progress of computation
 Concurrency measures.

⚫ Definition
⚫ A system of logical clocks consists of:
⚫ a time domain T
⚫ a logical clock C.
⚫ Elements of T form a partially ordered set over a relation <.
⚫ The relation < is called the happened-before or causal precedence relation.
⚫ This relation is analogous to the "earlier than" relation of physical time.
⚫ The logical clock C is a function.
⚫ It maps an event e in a distributed system to an element in the time domain T.
⚫ It is denoted by C(e), called the timestamp of e, and is defined as follows:
⚫ C : H → T
⚫ such that the following property is satisfied:
⚫ for two events ei and ej, ei → ej ⟹ C(ei) < C(ej).
⚫ This monotonicity property is called the clock consistency condition.

⚫ When T and C satisfy the following condition, the system of clocks is said to be
strongly consistent:
⚫ for two events ei and ej, ei → ej ⇔ C(ei) < C(ej)
⚫ Implementing Logical Clocks
⚫ Implementing logical clocks requires addressing two issues:
⚫ Local data structures to represent logical time in every process
⚫ Protocol to update the data structures to ensure the consistency condition.
⚫ Each process pi maintains data structures that allow it the following two
capabilities:
⚫ A local logical clock(lci )
⚫ It helps process pi measure its own progress.
⚫ A logical global clock(gci )
⚫ It is a representation of process pi ’s local view of the logical global
time.
⚫ Typically, lci is a part of gci .
⚫ The protocol ensures that a process’s logical clock, and its view of the global time, is
managed consistently.
⚫ The protocol consists of the following two rules:
⚫ R1: This rule governs how the local logical clock is updated by a process when it
executes an event.
⚫ R2: This rule governs how a process updates its global logical clock to update its
view of the global time and global progress.
⚫ Systems of logical clocks differ in their representation of logical time and also in the
protocol to update the logical clocks.

1.3.4. Scalar Time
 Proposed by Lamport in 1978 as an attempt to totally order events in a distributed system.
 Time domain is the set of non-negative integers.
 The logical local clock of a process pi and its local view of the global time are packed in
into one integer variable Ci .
 Rules R1 and R2 to update the clocks are as follows:
 R1: Before executing an event (send, receive, or internal), process pi executes the
following:
 Ci := Ci + d (d > 0)
 Every time R1 is executed, d can have a different value; however, typically d is kept at 1.
 R2: Each message piggybacks the clock value of its sender at sending time.
 When a process pi receives a message with timestamp Cmsg, it executes the following
actions:
 Ci := max (Ci , Cmsg )
 Execute R1.
 Deliver the message.
 The figure below shows the evolution of scalar time; a code sketch of rules R1 and R2 follows the figure.

Figure : The space-time diagram of a distributed execution.
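A minimal sketch of rules R1 and R2 for scalar clocks, assuming d = 1; the class and method names are illustrative, and the network layer that actually carries the piggybacked timestamp is assumed rather than shown:

```python
# Minimal sketch of a scalar (Lamport) clock with d = 1.

class LamportClock:
    def __init__(self):
        self.c = 0                      # Ci: the single integer clock

    def tick(self):                     # R1: before executing any event
        self.c += 1                     # Ci := Ci + d, with d = 1
        return self.c

    def on_send(self):
        # Execute R1; the returned value is piggybacked on the message (R2).
        return self.tick()

    def on_receive(self, c_msg):
        # R2: Ci := max(Ci, Cmsg), then execute R1, then deliver the message.
        self.c = max(self.c, c_msg)
        return self.tick()

# Usage: p1 sends a message to p2.
p1, p2 = LamportClock(), LamportClock()
ts = p1.on_send()          # timestamp carried by the message (here 1)
p2.on_receive(ts)          # p2's clock jumps past the piggybacked value
```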
Lamport Timestamp
⚫ Consider three processes.
⚫ Each runs on a different machine with its own clock and speed.
⚫ When the clock has ticked 6 times in process p1, it has ticked 8 times in process p2 and 10 times
in process p3.
⚫ Each clock runs at a constant rate, but the rates differ because the crystals differ.
⚫ At time 6, process p1 sends message m1 to process p2.
⚫ At time 16, process p2 receives message m1.
⚫ Process p2 concludes that it took 10 ticks for the message to travel from p1 to p2.
⚫ According to this reasoning, message m2 from process p2 to process p3 takes 16 ticks.
⚫ Message m3 from process p3 to process p2 leaves at 60 and arrives at 56.
⚫ Message m4 from process p2 leaves at 64 and arrives at 54.
⚫ But these values are not possible.

Figure: Each process runs with its own clock at a different rate.
⚫ Lamport's solution is shown in the figure.
⚫ It uses the happened-before relation.
⚫ Since message m3 left at 60, it must arrive at 61 or later.
⚫ So each message carries its sending time according to the sender's clock.
⚫ When a message arrives and the receiver's clock shows a value earlier than the time the message was
sent, the receiver fast-forwards its clock to one more than the sending time.
Lamport’s Algorithm

Basic Properties
Consistency Property
⚫ Scalar clocks satisfy the monotonicity and hence the consistency property: for two
events ei and ej, ei → ej ⟹ C(ei) < C(ej).
Total Ordering
⚫ Scalar clocks can be used to totally order events.
⚫ The main problem in totally ordering events is that two or more events at different
processes may have identical timestamp.
⚫ For example in the previous Figure , the third event of process P1 and the second
event of process P2 have identical scalar timestamp.
⚫ A tie-breaking mechanism is needed to order such events.
⚫ A tie is broken as follows:
⚫ Process identifiers are linearly ordered.
⚫ Tie among events with identical scalar timestamp is broken on the basis of
their process identifiers.
⚫ The lower the process identifier in the ranking, the higher the priority.
⚫ The timestamp of an event is denoted by a tuple (t, i), where
⚫ t – the time of occurrence
⚫ i – the identity of the process where it occurred.
⚫ The total order relation ≺ on two events x and y with timestamps (h, i) and (k, j),
respectively, is defined as follows:
⚫ x ≺ y ⇔ (h < k) or (h = k and i < j)
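A small sketch of this tie-breaking rule, with events represented as hypothetical (timestamp, process id) pairs:

```python
# Total order: x -< y  iff  h < k, or h = k and i < j,
# where x = (h, i) and y = (k, j).

def totally_before(x, y):
    h, i = x        # scalar timestamp and process id of event x
    k, j = y        # scalar timestamp and process id of event y
    return h < k or (h == k and i < j)

assert totally_before((3, 1), (3, 2))        # tie broken by lower process id
assert not totally_before((5, 1), (3, 2))    # larger timestamp comes later
```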

⚫ Event counting
⚫ If the increment value d is always 1, the scalar time has the following interesting
property:
⚫ if event e has a timestamp h, then h-1 represents the minimum logical
duration, counted in units of events, required before producing the event e;
⚫ We call it the height of the event e.
⚫ In other words, h-1 events have been produced sequentially before the event e
regardless of the processes that produced these events.
⚫ For example, in previous figure, five events precede event b on the longest causal
path ending at b.
⚫ No Strong Consistency
⚫ The system of scalar clocks is not strongly consistent.
⚫ i.e., for two events ei and ej, C(ei) < C(ej) ⇏ ei → ej.
⚫ For example, in above figure, the third event of process P1 has smaller scalar
timestamp than the third event of process P2.
⚫ However, the former did not happen before the latter.
⚫ The reason that scalar clocks are not strongly consistent is that the logical local
clock and logical global clock of a process are squashed into one, resulting in the
loss of causal dependency information among events at different processes.
⚫ For example, in the above Figure, when process P2 receives the first message from
process P1, it updates its clock to 3, forgetting that the timestamp of the latest event
at P1 on which it depends is 2.

1.3.5. Vector Time
 The system of vector clocks was developed by Fidge, Mattern and Schmuck.
 Here, the time domain is represented by a set of n-dimensional non-negative integer vectors.
 Each process pi maintains a vector vti [1..n].
 vti [i ] is the local logical clock of pi and describes the logical time progress at process pi .
 vti [j] represents process pi's latest knowledge of process pj's local time.
 If vti [j] = x, then process pi knows that the local time at process pj has progressed up to x.
 The entire vector vti constitutes pi ’s view of the global logical time and is used to timestamp
events.
 Process pi uses the following two rules R1 and R2 to update its clock:
 R1: Before executing an event, process pi updates its local logical time as follows:
 vti [i ] := vti [i ] + d (d > 0)
 R2: Each message m is piggybacked with the vector clock vt of the sender process
at sending time.
 On the receipt of such a message (m,vt), process pi executes the following sequence of actions:
 Update its global logical time as follows:
 1 ≤ k ≤ n : vti [k ] := max (vti [k ], vt[k ])
 Execute R1.
 Deliver the message m.
 The timestamp of an event is the value of the vector clock of its process when the event is
executed.
 Figure shows an example of vector clocks progress with the increment value d=1.
 Initially, a vector clock is [0, 0, 0,...., 0].

Figure : Evolution of vector time.
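A minimal sketch of rules R1 and R2 for vector clocks with d = 1; the process index i and the number of processes n are assumed to be known in advance, and the message transport is not shown:

```python
# Minimal sketch of a vector clock for n processes, with d = 1.

class VectorClock:
    def __init__(self, i, n):
        self.i = i                  # index of this process
        self.vt = [0] * n           # vti[1..n], initially all zeros

    def tick(self):                 # R1: before executing any event
        self.vt[self.i] += 1        # vti[i] := vti[i] + d, d = 1
        return list(self.vt)

    def on_send(self):
        # R1, then piggyback the whole vector on the message (R2).
        return self.tick()

    def on_receive(self, vt_msg):
        # R2: component-wise max with the piggybacked vector, then R1, then deliver.
        self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
        return self.tick()

# Usage: p0 sends to p1 in a three-process system.
p0, p1 = VectorClock(0, 3), VectorClock(1, 3)
vt_msg = p0.on_send()       # e.g. [1, 0, 0]
p1.on_receive(vt_msg)       # p1's clock becomes [1, 1, 0]
```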
⚫ The following relations are defined to compare two vector timestamps, vh and vk:
⚫ vh ≤ vk ⇔ ∀x : vh[x] ≤ vk[x]
⚫ vh < vk ⇔ vh ≤ vk and ∃x : vh[x] < vk[x]
⚫ vh ǁ vk ⇔ ¬(vh < vk) ∧ ¬(vk < vh)
⚫ If the process at which an event occurred is known, the test to compare two timestamps can
be simplified as follows:
⚫ If events x and y respectively occurred at processes pi and pj and are assigned timestamps
vh and vk, respectively, then
⚫ x → y ⇔ vh[i] ≤ vk[i]
⚫ x ǁ y ⇔ vh[i] > vk[i] ∧ vh[j] < vk[j]
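These comparisons can be sketched directly on vector timestamps stored as equal-length lists; the helper names below are illustrative only:

```python
# Comparison of two vector timestamps vh and vk.

def vt_leq(vh, vk):
    # vh <= vk  iff  every component of vh is <= the matching component of vk
    return all(a <= b for a, b in zip(vh, vk))

def vt_less(vh, vk):
    # vh < vk  iff  vh <= vk and the two vectors differ somewhere
    return vt_leq(vh, vk) and vh != vk

def vt_concurrent(vh, vk):
    # vh || vk  iff  neither vh < vk nor vk < vh
    return not vt_less(vh, vk) and not vt_less(vk, vh)

assert vt_less([2, 1, 0], [2, 2, 0])         # causally related
assert vt_concurrent([2, 0, 0], [0, 1, 0])   # concurrent events
```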
⚫ Isomorphism
⚫ If events in a distributed system are timestamped using a system of vector clocks,
we have the following property.
⚫ If two events x and y have timestamps vh and vk, respectively, then
⚫ x → y ⇔ vh < vk
⚫ x ǁ y ⇔ vh ǁ vk
⚫ Thus, there is an isomorphism between the set of partially ordered events produced
by a distributed computation and their vector timestamps.
⚫ Strong Consistency
⚫ The system of vector clocks is strongly consistent; thus, by examining the vector
timestamp of two events, we can determine if the events are causally related.
⚫ However, Charron-Bost showed that the dimension of vector clocks cannot be less
than n, the total number of processes in the distributed computation, for this
property to hold.
⚫ Event Counting
⚫ If d=1 (in rule R1), then the i th component of vector clock at process pi , vti [i ],
denotes the number of events that have occurred at pi until that instant.
⚫ So, if an event e has timestamp vh, vh[j] denotes the number of events executed by
process pj that causally precede e. Clearly, Σ vh[j] − 1 represents the total number of
events that causally precede e in the distributed computation.
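Under the assumption d = 1, this count can be read straight off the timestamp, as the following one-line sketch shows:

```python
# With d = 1, the number of events that causally precede an event whose
# vector timestamp is vh (excluding the event itself) is sum(vh) - 1.

def causally_preceding(vh):
    return sum(vh) - 1

assert causally_preceding([2, 3, 1]) == 5
```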
⚫ Efficient Implementations of Vector Clocks
⚫ If the number of processes in a distributed computation is large, then vector clocks
will require piggybacking of huge amount of information in messages.
⚫ The message overhead grows linearly with the number of processors in the system
and when there are thousands of processors in the system, the message size becomes
huge even if there are only a few events occurring in few processors.
⚫ We discuss an efficient way to maintain vector clocks.
⚫ Charron-Bost showed that if vector clocks have to satisfy the strong consistency
property, then in general vector timestamps must be at least of size n, the total
number of processes.
⚫ However, optimizations are possible, and a technique to implement vector clocks
efficiently is discussed next.

1.3.6. Physical Clock Synchronization: NTP
 Motivation
o Time is important for most applications and algorithms that run in a distributed
system.
o The following contexts require knowing the time:
 The time of the day at which an event happened on a specific machine in the
network.
 The time interval between two events that happened on different machines in
the network.
 The relative ordering of events that happened on different machines in the
network.
 Unless the clocks in each machine have a common notion of time, time-based queries
cannot be answered.
 Clock synchronization has a significant effect on many problems:
o Secure systems
o Fault diagnosis and recovery
o Scheduled operations
o Database systems
o Real-world clock values.
 Clock synchronization is the process of ensuring that physically distributed processors have
a common notion of time.
 Because clock rates differ, the clocks at various sites may diverge with time; clock
synchronization must therefore be performed periodically to correct this clock skew.
 Clocks are synchronized to an accurate real-time standard like UTC (Universal Coordinated
Time).
 Clocks that must not only be synchronized with each other but also have to adhere to
physical time are termed physical clocks.
 Coordinated Universal Time (UTC):
o UTC is an international standard for time keeping
o It is based on atomic time, but occasionally adjusted to astronomical time
o International Atomic Time is based on very accurate physical clocks (drift rate 10⁻¹³).
o It is broadcast from radio stations on land and satellite (e.g. GPS).
o Computers with receivers can synchronize their clocks with these timing signals (by
requesting time from GPS/UTC source).
o Signals from land-based stations are accurate to about 0.1–10 milliseconds; signals
from GPS are accurate to about 1 microsecond.
⚫ Definitions and Terminology
⚫ Let Ca and Cb be any two clocks.
⚫ Time: The time of a clock in a machine p is given by the function Cp (t),
where Cp (t) = t for a perfect clock.
⚫ Frequency: Frequency is the rate at which a clock progresses. The
frequency at time t of clock Ca is C′a(t).
⚫ Offset: Clock offset is the difference between the time reported by a clock
and the real time. The offset of the clock Ca is given by Ca (t) − t. The offset
of clock Ca relative to Cb at time t ≥ 0 is given by Ca(t) − Cb (t).
⚫ Skew: The skew of a clock is the difference in the frequencies of the clock
and the perfect clock. The skew of a clock Ca relative to clock Cb at time t
is C′a(t) − C′b(t).
⚫ Clock Inaccuracies
⚫ Physical clocks are synchronized to an accurate real-time standard like UTC
(Universal Coordinated Time).
⚫ However, due to the clock inaccuracy discussed above, a timer (clock) is said to be
working within its specification if

1 − ρ ≤ dC/dt ≤ 1 + ρ

where the constant ρ is the maximum skew rate specified by the manufacturer.

Figure: The behavior of fast, slow, and perfect clocks with respect to UTC.
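A small worked consequence of the bound 1 − ρ ≤ dC/dt ≤ 1 + ρ: after τ seconds of real time a clock can drift at most ρτ from real time, so two such clocks can be up to 2ρτ apart, which bounds how often they must be resynchronized. The numbers below are illustrative:

```python
# If each clock obeys 1 - rho <= dC/dt <= 1 + rho, two clocks can drift
# apart by at most 2*rho*tau over tau seconds of real time.

rho = 1e-6       # maximum drift rate (ordinary quartz, ~10^-6 s/s)
delta = 1e-3     # maximum skew we are willing to tolerate (1 ms)

tau = delta / (2 * rho)      # longest allowed resynchronization interval
print(f"resynchronize at least every {tau:.0f} s ({tau / 60:.1f} min)")
# -> resynchronize at least every 500 s (8.3 min)
```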
⚫ Offset delay estimation method
⚫ The Network Time Protocol (NTP) which is widely used for clock synchronization
on the Internet uses the Offset Delay Estimation method.
⚫ The design of NTP involves a hierarchical tree of time servers.
⚫ The primary server at the root synchronizes with the UTC.
⚫ The next level contains secondary servers, which act as a backup to the
primary server.
⚫ At the lowest level is the synchronization subnet which has the clients.
⚫ strata: the hierarchy level

⚫ Clock offset and delay estimation:
⚫ In practice, a source node cannot accurately estimate the local time on the
target node due to varying message or network delays between the nodes.
⚫ This protocol employs a common practice of performing several trials and chooses
the trial with the minimum delay.
⚫ Figure shows how NTP timestamps are numbered and exchanged between peers A
and B.
⚫ Let T1, T2, T3, T4 be the values of the four most recent timestamps as shown.
⚫ Assume clocks A and B are stable and running at the same speed.

Figure: Offset and delay estimation.
⚫ Let a = T1 − T3 and b = T2 − T4.
⚫ If the network delay difference from A to B and from B to A, called the differential delay,
is small, the clock offset θ and roundtrip delay δ of B relative to A at time T4 are
approximately given by the following.
⚫ Each NTP message includes the latest three timestamps T1, T2 and T3, while T4 is
determined upon arrival.

θ = (a + b)/2, δ = a − b
⚫ Thus, both peers A and B can independently calculate delay and offset using a single
bidirectional message stream as shown in Figure .
⚫ A pair of servers in symmetric mode exchange pairs of timing messages.
⚫ A store of data is then built up about the relationship between the two servers (pairs of
offset and delay).
⚫ Specifically, assume that each peer maintains pairs (Oi ,Di ), where
⚫ Oi - measure of offset (θ)
⚫ Di - transmission delay of two messages (δ).
⚫ The offset corresponding to the minimum delay is chosen.
⚫ Specifically, the delay and offset are calculated as follows.
⚫ Assume that message m takes time t to transfer and m′ takes time t′ to transfer.
⚫ The offset between A's clock and B's clock is O. If A's local clock time is
A(t) and B's local clock time is B(t), we have A(t) = B(t) + O.
⚫ Then,
⚫ Ti−2 = Ti−3 + t + O
⚫ Ti = Ti−1 − O + t′
⚫ Assuming t = t′, the offset Oi can be estimated as:
⚫ Oi = (Ti−2 − Ti−3 + Ti−1 − Ti)/2
⚫ The round-trip delay is estimated as:
⚫ Di = (Ti − Ti −3) − (Ti −1 − Ti −2)
⚫ The eight most recent pairs of (Oi , Di ) are retained.
⚫ The value of Oi that corresponds to minimum Di is chosen to estimate O.
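A sketch of this estimation, assuming the four timestamps Ti−3, Ti−2, Ti−1, Ti are available as plain numbers (the sample values below are purely illustrative):

```python
# NTP-style offset/delay estimation from one pair of timing messages:
#   A sends at T_{i-3}, B receives at T_{i-2}, B replies at T_{i-1},
#   A receives the reply at T_i.

def offset_and_delay(t_im3, t_im2, t_im1, t_i):
    o = ((t_im2 - t_im3) + (t_im1 - t_i)) / 2    # Oi = (Ti-2 - Ti-3 + Ti-1 - Ti)/2
    d = (t_i - t_im3) - (t_im1 - t_im2)          # Di = (Ti - Ti-3) - (Ti-1 - Ti-2)
    return o, d

# Retain recent (Oi, Di) pairs and pick the offset with the minimum delay.
samples = [offset_and_delay(*ts) for ts in [
    (10.000, 10.052, 10.053, 10.105),
    (20.000, 20.048, 20.049, 20.120),
]]
best_offset, min_delay = min(samples, key=lambda od: od[1])
```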