Introduction To Distributed Systems: CSE 380 Computer Operating Systems

This document provides an introduction to distributed systems. It defines a distributed system as a collection of independent computers that appear to users as a single system. Examples include networks of workstations, manufacturing systems, and branch office networks. Distributed systems provide advantages like availability, reliability, scalability, and the ability to incrementally increase computing power. However, they also present challenges for developing software and achieving performance, reliability, transparency and coordination across independent systems. The document discusses concepts like logical clocks, event ordering, and algorithms for clock synchronization and reaching agreement in unreliable distributed environments.


CSE 380
Computer Operating Systems
Instructor: Insup Lee
University of Pennsylvania
Fall 2003
Lecture Note: Distributed Systems

Introduction to Distributed Systems

Why do we develop distributed systems?
availability of powerful yet cheap microprocessors (PCs, workstations)
continuing advances in communication technology

What is a distributed system?
A distributed system is a collection of independent computers that appear to the users of the system as a single system.
Examples:
Network of workstations
Distributed manufacturing system (e.g., automated assembly line)
Network of branch office computers

Distributed Systems
Comparison of three kinds of multiple CPU systems (table not reproduced here)

Advantages of Distributed Systems over Centralized Systems
Economics: a collection of microprocessors offers a better price/performance than mainframes. Low price/performance ratio: a cost-effective way to increase computing power.
Speed: a distributed system may have more total computing power than a mainframe. Ex. 10,000 CPU chips, each running at 50 MIPS. Not possible to build a 500,000 MIPS single processor, since it would require a 0.002 nsec instruction cycle. Enhanced performance through load distribution.
Inherent distribution: Some applications are inherently distributed. Ex. a supermarket chain.
Reliability: If one machine crashes, the system as a whole can still survive. Higher availability and improved reliability.
Incremental growth: Computing power can be added in small increments. Modular expandability.
Another driving force: the existence of a large number of personal computers, and the need for people to collaborate and share information.

Advantages of Distributed Systems over Independent PCs
Data sharing: allow many users to access a common database
Resource sharing: expensive peripherals like color printers
Communication: enhance human-to-human communication, e.g., email, chat
Flexibility: spread the workload over the available machines

Disadvantages of Distributed Systems
Software: difficult to develop software for distributed systems
Network: saturation, lossy transmissions
Security: easy access also applies to secret data

Software Concepts
Software more important for users
Two types:
1. Network Operating Systems
2. (True) Distributed Systems

Network Operating Systems
loosely-coupled software on loosely-coupled hardware
A network of workstations connected by a LAN
each machine has a high degree of autonomy
a few system-wide requirements: format and meaning of all the messages exchanged
o rlogin machine
o rcp machine1:file1 machine2:file2
File servers: client and server model
Clients mount directories on file servers
Best known network OS:
o Sun's NFS (Network File System) for shared file systems (Fig. 9-11)

NFS (Network File System)
NFS Architecture
Server exports directories
Clients mount exported directories
NFS Protocols
For handling mounting
For read/write: no open/close, stateless
NFS Implementation

(True) Distributed Systems
tightly-coupled software on loosely-coupled hardware
provide a single-system image or a virtual uniprocessor
a single, global interprocess communication mechanism, process management, and file system; the same system call interface everywhere
Ideal definition:
A distributed system runs on a collection of computers that do not have shared memory, yet looks like a single computer to its users.

Design Issues of Distributed Systems
Transparency
Flexibility
Reliability
Performance
Scalability

1. Transparency
How to achieve the single-system image, i.e., how to make a collection of computers appear as a single computer.
Hiding all the distribution from the users as well as the application programs can be achieved at two levels:
1) hide the distribution from users
2) at a lower level, make the system look transparent to programs.
1) and 2) require uniform interfaces such as access to files and communication.

2. Flexibility
Make it easier to change
Monolithic kernel: system calls are trapped and executed by the kernel. All system calls are served by the kernel, e.g., UNIX.
Microkernel: provides minimal services.
IPC
some memory management
some low-level process management and scheduling
low-level I/O (e.g., Mach can support multiple file systems, multiple system interfaces.)

3. Reliability
A distributed system should be more reliable than a single system. Example: 3 machines, each up with probability .95; the probability that at least one is up is 1 - .05**3 = 0.999875.
Availability: fraction of time the system is usable. Redundancy improves it.
Need to maintain consistency
Need to be secure
Fault tolerance: need to mask failures, recover from errors.

4. Performance
Without a gain here, why bother with distributed systems?
Performance loss due to communication delays:
fine-grain parallelism: high degree of interaction
coarse-grain parallelism
Performance loss due to making the system fault tolerant.

5. Scalability
Systems grow with time or become obsolete.
Techniques that require resources linearly in the size of the system are not scalable. (e.g., a broadcast-based query won't work for large distributed systems.)
Examples of bottlenecks
o Centralized components: a single mail server
o Centralized tables: a single URL address book
o Centralized algorithms: routing based on complete information

Distributed Coordination
Communication between processes in a distributed system can have unpredictable delays, processes can fail, messages may be lost
Synchronization in distributed systems is harder than in centralized systems because of the need for distributed algorithms.
Properties of distributed algorithms:
1 The relevant information is scattered among multiple machines.
2 Processes make decisions based only on locally available information.
3 A single point of failure in the system should be avoided.
4 No common clock or other precise global time source exists.
Challenge: How to design schemes so that multiple systems can coordinate/synchronize to solve problems efficiently?

Why need to synchronize clocks?
(Figure: foo.o is created on the computer used for compiling, and foo.c is modified later on the computer used for editing; because the editing machine's local clock (2142-2145) runs behind the compiling machine's (2144-2147), the newer foo.c gets an earlier timestamp than foo.o.)

Logical and physical clocks
How does a computer timer work?
A counter register and a holding register.
The counter is decremented by a quartz crystal oscillator.
When it reaches zero, an interrupt is generated and the counter is reloaded from the holding register.
E.g., interrupt 60 times per second.
clock skew problem
logical clocks -- to provide consistent event ordering
physical clocks -- clocks whose values must not deviate from the real time by more than a certain amount.

Event Ordering
Since there is no common memory or clock, it is sometimes impossible to say which of two events occurred first.
The happened-before relation is a partial ordering of events in distributed systems such that
1 If A and B are events in the same process, and A was executed before B, then A → B.
2 If A is the event of sending a message by one process and B is the event of receiving that message by another process, then A → B.
3 If A → B and B → C, then A → C.
If two events A and B are not related by the → relation, then they are executed concurrently (no causal relationship).
To obtain a global ordering of all the events, each event can be time-stamped satisfying the requirement: for every pair of events A and B, if A → B then the time stamp of A is less than the time stamp of B. (Note that the converse need not be true.)

Global ordering
How do we enforce the global ordering requirement in a distributed environment (without a common clock)?
1 For each process Pi, a logical clock LCi assigns a unique value to every event in that process.
2 If process Pi receives a message (event B) with time stamp t and LCi(B) < t, then advance its clock so that LCi(B) = t+1.
3 Use processor ids to break ties to create a total ordering.
(A small sketch of these rules appears below.)

Example of Event Ordering (figure not reproduced here)

Example: Lamport's Algorithm
Three processes, each with its own clock. The clocks run at different rates. Lamport's algorithm corrects the clocks.

Example of Global Timestamps
(Figure: the events labeled with (logical clock value, processor id) pairs such as (1,0), (2,1), (4,2), ..., (7,1), giving a total order.)
Note: ts(A) < ts(B) does not imply A happened before B.
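The rules above fit in a few lines. A minimal Python sketch (not from the notes): it uses the common variant that always advances the clock on receive, and (time, id) pairs stand in for tie-breaking by processor id.

class LamportClock:
    def __init__(self, pid):
        self.pid = pid      # processor id, used only to break ties
        self.time = 0       # logical clock LC_i

    def tick(self):
        # Local event: advance the logical clock.
        self.time += 1
        return (self.time, self.pid)

    def send(self):
        # Sending a message is an event; attach the timestamp.
        return self.tick()

    def receive(self, msg_time):
        # On receipt, advance LC_i past the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return (self.time, self.pid)

# (time, pid) pairs give the total order: compare time first, break ties by pid.
p0, p1 = LamportClock(0), LamportClock(1)
t_send = p0.send()               # e.g., (1, 0)
t_recv = p1.receive(t_send[0])   # e.g., (2, 1)
assert t_send < t_recv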


Physical clock synchronization algorithms
Not all clocks tick precisely at the correct rate.
Maximum drift rate: one can determine how often clocks should be synchronized.
Cristian's algorithm
Getting the current time from a time server
need to consider msg delays: estimate the one-way delay as (T1 - T0 - I)/2, where T0 is when the request was sent, T1 when the reply arrived, and I the server's handling time
need to change time gradually
(a client-side sketch follows)
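A rough client-side sketch of Cristian's algorithm under the stated assumptions; get_server_time() is a hypothetical RPC returning the server's clock value and its handling time I, and the network delay is assumed symmetric.

import time

def cristian_sync(get_server_time):
    T0 = time.monotonic()              # time the request was sent
    server_utc, I = get_server_time()  # server's clock value and handling time
    T1 = time.monotonic()              # time the reply arrived
    one_way_delay = (T1 - T0 - I) / 2  # assume symmetric network delay
    return server_utc + one_way_delay  # estimate of "now" on the server

The returned estimate would then be applied gradually (slewing the local clock) rather than as a backward jump, as noted above.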

Physical clock synchronization algorithms
Multiple external time sources
UTC (Universal Coordinated Time)
NIST broadcasts the WWV signal at every UTC second from Colorado.
Computing UTC from multiple time sources, each of which gives a time interval in which UTC falls.

The Berkeley algorithm
Averaging algorithm
The time daemon asks all the other machines for their clock values.
The machines answer.
The time daemon tells everyone how to adjust their clock.
(a toy averaging example follows)
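A toy version of the Berkeley averaging step; the daemon role is simulated and the clock readings below are an illustrative example, not from the notes.

def berkeley_round(daemon_time, machine_times):
    # machine_times: clock values reported to the time daemon.
    # Returns the offset each clock (daemon first) should apply.
    all_times = [daemon_time] + list(machine_times)
    target = sum(all_times) / len(all_times)   # average of all clocks
    return [target - t for t in all_times]     # how much each must adjust

# Daemon reads 180 min past the hour; machines report 170 and 205.
print(berkeley_round(180, [170, 205]))   # -> [5.0, 15.0, -20.0]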

Reaching Agreement
How can processes reach consensus in a distributed system?
Unreliable communication:
Messages can be delayed
Messages can be lost
Messages can be corrupted
Processes can fail (or even behave maliciously)
Each process starts with a bit (0 or 1) and non-faulty processes should eventually agree on a common value
No solution is possible
Note: solutions such as computing the majority do not work. Why?
Two generals problem (unreliable communication)
Byzantine generals problem (faulty processes)

Two generals problem
Two generals on opposite sides of a valley have to agree on whether to attack or not (at a pre-agreed time)
Goal: Each must be sure that the other one has made the same decision
They communicate by sending messengers who may get captured
Can never be sure whether the last messenger reached the other side (every message needs an ack), so there is no perfect solution
Impossibility of consensus is as fundamental as undecidability of the halting problem!
In practice: the probability of losing a repeatedly sent message decreases (so agreement with high probability is possible)

Impossibility Proof
Theorem. If any message can be lost, it is not possible for two processes to agree on a non-trivial outcome using only messages for communication.
Proof. Suppose it is possible. Let m[1], ..., m[k] be a finite sequence of messages that allowed them to decide. Furthermore, let's assume that it is a minimal sequence, that is, it has the least number of messages among all such sequences. However, since any message can be lost, the last message m[k] could have been lost. So, the sender of m[k] must be able to decide without having to send it (since the sender knows that it may not be delivered), and the receiver of m[k] must be able to decide without receiving it. That is, m[k] is not necessary for reaching agreement, so m[1], ..., m[k-1] should have been enough for the agreement. This contradicts the assumption that the sequence m[1], ..., m[k] was minimal.

Mutual Exclusion and Synchronization
To solve synchronization problems in a distributed system, we need to provide distributed semaphores.
Schemes for implementation:
1 A Centralized Algorithm
2 A Distributed Algorithm
3 A Token Ring Algorithm

A Centralized Algorithm
Use a coordinator which enforces mutual exclusion.
Two operations: request and release.
Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
When process 1 exits the critical region, it tells the coordinator, which then replies to 2.

A Centralized Algorithm

Coordinator:
  loop
    receive(msg);
    case msg of
      REQUEST: if nobody in CS
               then reply GRANTED
               else queue the REQ; reply DENIED
      RELEASE: if queue not empty
               then remove 1st on the queue; reply GRANTED
    end case
  end loop

Client:
  send(REQUEST);
  receive(msg);
  if msg != GRANTED then receive(msg);   -- a DENIED reply means: wait for a later GRANTED
  enter CS;
  send(RELEASE)

A Centralized Algorithm (continued)
Algorithm properties
guarantees mutual exclusion
fair (First Come First Served)
a single point of failure (the Coordinator)
if no explicit DENIED message, then cannot distinguish permission denied from a dead coordinator

A Decentralized Algorithm
Decision making is distributed across the entire system
a) Two processes want to enter the same critical region at the same moment.
b) Both send request messages to all processes
c) All events are time-stamped by the global ordering algorithm
d) The process whose request event has the smaller time-stamp wins
e) Every process must respond to request messages

A Decentralized Algorithm
(Figure:) a) Two processes want to enter the same critical region at the same moment. b) Process 0 has the lowest timestamp, so it wins. c) When process 0 is done, it sends an OK also, so 2 can now enter the critical region.

Decentralized Algorithm (continued)
1 When a process wants to enter its critical section, it generates a new time stamp, TS, and sends the msg request(p,TS) to all other processes in the system (recall the algorithm for global ordering of events)
2 A process which has received reply msgs from all other processes can enter its critical section.
3 When a process receives a request message,
(A) if it is in its CS, it defers its answer;
(B) if it does not want to enter its CS, it replies immediately;
(C) if it also wants to enter its CS, it maintains a queue of requests (including its own request) and sends a reply to the request with the minimum time-stamp
(a sketch of this scheme appears after the correctness argument below)

Correctness
Theorem. The algorithm achieves mutual exclusion.
Proof:
By contradiction.
Suppose two processes Pi and Pj are in the CS concurrently.
WLOG, assume that Pi's request has an earlier timestamp than Pj's. That is, Pi received Pj's request after Pi made its own request.
Thus, Pj can concurrently execute the CS with Pi only if Pi returns a REPLY to Pj before Pi exits the CS.
But this is impossible, since Pj has a later timestamp than Pi.
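A single-threaded sketch of the decentralized algorithm just described (essentially the Ricart-Agrawala scheme). Message passing is simulated by direct method calls, and names such as want_cs and on_request are illustrative assumptions, not from the notes.

class Process:
    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers        # all other Process objects
        self.clock = 0            # Lamport clock for timestamps
        self.requesting = False
        self.my_ts = None         # (clock, pid) of my outstanding request
        self.replies = 0
        self.deferred = []        # requests to answer after leaving the CS

    def want_cs(self):
        self.clock += 1
        self.my_ts = (self.clock, self.pid)
        self.requesting = True
        self.replies = 0
        for p in self.peers:
            p.on_request(self, self.my_ts)

    def on_request(self, sender, ts):
        self.clock = max(self.clock, ts[0]) + 1
        # Defer if we are in (or waiting for) the CS with a smaller timestamp.
        if self.requesting and self.my_ts < ts:
            self.deferred.append(sender)
        else:
            sender.on_reply()

    def on_reply(self):
        self.replies += 1
        # May enter the CS once replies from all other processes have arrived.

    def release_cs(self):
        self.requesting = False
        for p in self.deferred:   # now answer the deferred requests
            p.on_reply()
        self.deferred.clear()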

Properties (of the decentralized algorithm)
1 mutual exclusion is guaranteed
2 deadlock free
3 no starvation, assuming total ordering on msgs
4 2(N-1) msgs per entry: (N-1) request and (N-1) reply msgs
5 n points of failure (i.e., each process becomes a point of failure); can use explicit ack and timeout to detect failed processes
6 each process needs to maintain group membership (i.e., IDs of all active processes); non-trivial for large and/or dynamically changing memberships
7 n bottlenecks since all processes are involved in all decisions
8 may use majority votes to improve the performance

A Token Passing Algorithm
A token is circulated in a logical ring.
A process enters its CS if it has the token.
Issues:
If the token is lost, it needs to be regenerated.
Detection of the lost token is difficult since there is no bound on how long a process should wait for the token.
If a process can fail, it needs to be detected and then bypassed.
When nobody wants to enter, processes keep on exchanging messages to circulate the token.
(a toy simulation follows)
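A toy simulation of token passing on a logical ring, under the assumption that the token is never lost and no process fails (the hard cases listed above); the Node class and ring wiring are illustrative.

class Node:
    def __init__(self, pid):
        self.pid = pid
        self.wants_cs = False

def circulate(nodes, rounds=2):
    # Pass the token around the ring a fixed number of times.
    for _ in range(rounds):
        for node in nodes:            # ring order
            if node.wants_cs:
                print(f"process {node.pid} enters its critical section")
                node.wants_cs = False  # leave the CS before passing the token on
            # otherwise just forward the token to the next node

nodes = [Node(i) for i in range(5)]
nodes[2].wants_cs = True
circulate(nodes)   # -> process 2 enters its critical section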

Comparison
A comparison of three mutual exclusion algorithms (table not reproduced here)

Leader Election
In many distributed applications, particularly the centralized solutions, some process needs to be declared the central coordinator
Electing a leader also may be necessary when the central coordinator crashes
Election algorithms allow processes to elect a unique leader in a decentralized manner

Bully Algorithm
Goal: Figure out the active process with the max ID
1. Suppose a process P detects a failure of the current leader
P sends an election message to all processes with higher IDs
If nobody responds within interval T, P sends a coordinator message to all processes with lower IDs
If someone responds with an OK message, P waits for a coordinator message (if none is received, restart the algorithm)
2. If P receives an election message from a process with a lower ID, it responds with an OK message and starts its own leader election algorithm (as in step 1)
3. If P receives a coordinator message, it records the ID of the leader
(a simplified sketch follows the figure description)

(Figure:) (a) Process 4 holds an election. (b) Processes 5 and 6 respond, telling 4 to stop. (c) Now 5 and 6 each hold an election. (d) Process 6 tells 5 to stop. (e) Process 6 wins and tells everyone.
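A simplified, synchronous sketch of the bully election; the alive() predicate stands in for the OK/timeout message exchange and is an illustrative assumption.

def bully_election(my_id, all_ids, alive):
    # Return the elected coordinator as seen by process my_id.
    # all_ids: ids of every process; alive: predicate telling whether a
    # process currently responds (stands in for the OK/timeout exchange).
    higher = [p for p in all_ids if p > my_id and alive(p)]
    if not higher:
        # Nobody with a higher id answered: announce myself as coordinator.
        return my_id
    # Otherwise each responding higher process runs its own election;
    # ultimately the highest responding id wins and broadcasts "coordinator".
    return max(higher)

# Example: processes 0..6, process 6 has crashed, process 4 detects it.
alive = lambda p: p != 6
print(bully_election(4, range(7), alive))   # -> 5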

Leader Election in a Ring
(Figure: processes with IDs ID1, ID2, ID3, ID4, ID5 arranged in a logical ring.)

Each process has a unique ID; it can receive messages from its left neighbor and send messages to its right neighbor
Goal: agree on who is the leader (initially everyone knows only its own ID)
Idea:
initially send your own ID to the right.
When you receive an ID from the left, if it is higher than any you have seen so far, send it to the right.
If your own ID is received from the left, you have the highest ID and are the leader
(a small simulation follows)
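A small simulation of the ring election idea just described (a Chang-Roberts style "forward the larger ID" rule); the ring order and IDs below are illustrative, and IDs are assumed unique.

def ring_election(ids):
    # ids: process ids listed in ring order; returns the elected leader.
    n = len(ids)
    leader = None
    for start in range(n):
        msg = ids[start]                  # each process sends its own id right
        for step in range(1, n + 1):
            holder = ids[(start + step) % n]
            if msg == holder:
                leader = msg              # my own id came back: I am the leader
                break
            if msg < holder:
                break                     # a larger id swallows the message
    return leader

print(ring_election([3, 17, 5, 12, 9]))   # -> 17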

Distributed Deadlock
A deadlock occurs when a set of processes in a system are blocked waiting for requests that can never be satisfied.
Approaches:
Detection (& Recovery)
Prevention
Avoidance - not practical in a distributed setting
Difficulties:
resource allocation information is distributed
gathering information requires messages. Since messages have non-zero delays, it is difficult to have an accurate and current view of resource allocation.

Deadlock Detection Recall
Suppose the following information is available:
For each process, the resources it currently holds
For each process, the request that it is waiting for
Then one can check whether the current system state is deadlocked or not
In single-processor systems, the OS can maintain this information and periodically execute a deadlock detection algorithm
What to do if a deadlock is detected?
Kill a process involved in the deadlocked set
Inform the users, etc.

Wait For Graph (WFG)
Definition. A resource graph is a bipartite directed graph (N,E), where
N = P U R,
P = {p1, ..., pn}, R = {r1, ..., rn},
(r1, ..., rn) is the available unit vector,
an edge (pi, rj) is a request edge, and
an edge (ri, pj) is an allocation edge.
Definition: A Wait For Graph (WFG) is a directed graph where nodes are processes and a directed edge P → Q represents that P is blocked waiting for Q to release a resource. So, there is an edge from process P to process Q if P needs a resource currently held by Q.
(a small cycle-check sketch follows)
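Since (in the single-unit request model) a deadlock corresponds to a cycle in the WFG, a detector only needs a cycle check. A minimal sketch, assuming the WFG is encoded as a dictionary of adjacency lists keyed by process name:

def has_cycle(wfg):
    # wfg: {process: [processes it waits for]}; True iff some cycle exists.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {p: WHITE for p in wfg}

    def dfs(p):
        color[p] = GREY
        for q in wfg.get(p, []):
            if color.get(q, WHITE) == GREY:       # back edge: cycle found
                return True
            if color.get(q, WHITE) == WHITE and dfs(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and dfs(p) for p in wfg)

# P1 waits for P2, P2 waits for P3, P3 waits for P1: deadlocked.
print(has_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))   # True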

Definitions
Def: A node Y is reachable from a node X, written X → Y, if there is a path (i.e., a sequence of directed edges) from node X to node Y.
Def: A cycle in a graph is a path that starts and ends on the same node. If a set C of nodes is a cycle, then for all X in C: X → X.
Def: A knot K in a graph is a non-empty set of nodes such that, for each X in K, all nodes in K and only the nodes in K are reachable from X. That is,
(for every X in K and every Y in K, X → Y) and
(for every X in K and every Z, X → Z implies Z is in K).

Sufficient Conditions for Deadlock
Resource Model
1 reusable resources
2 exclusive access
Three Request Models
1 Single-unit request model:
a cycle in the WFG
2 AND request model: simultaneous requests, blocked until all of them are granted
a cycle in the WFG
a process can be in more than one cycle
3 OR request model: any one suffices, e.g., reading a replicated data object
a cycle in the WFG is not a sufficient condition (but it is necessary)
a knot in the WFG is a sufficient condition (but not necessary)

Deadlock Detection Algorithms

Wait-for Graph for Detection
Assume only one instance of each resource
Nodes are processes
Recall the Resource Allocation Graph: it had nodes for resources as well as processes (basically the same idea)
Edges represent waiting: if P is waiting to acquire a resource that is currently held by Q, then there is an edge from P to Q
A deadlock exists if and only if the global wait-for graph has a cycle
Each process maintains a local wait-for graph based on the information it has
The global wait-for graph can be obtained as the union of the edges in all the local copies

Centralized Deadlock Detection
false deadlock
(Figure:) (a) Initial resource graph for machine 0. (b) Initial resource graph for machine 1. (c) The coordinator's view of the world. (d) The situation after the delayed message.

Distributed Cycle Detection
Each site looks for potential cycles
Suppose site S1 has processes P1, P2, P3, P4.
S1 knows that P7 (on a different site) is waiting for P1, P1 is waiting for P4, P4 is waiting for P2, and P2 is waiting for P9 (on a different site S3)
This can be a potential cycle
S1 sends a message to S3 giving the chain P7, P1, P4, P2, P9
Site S3 knows the local dependencies, and can extend the chain and pass it on to a different site
Eventually, some site will detect a deadlock, or will stop forwarding the chain
(a rough probe-forwarding sketch follows)

Distributed Deadlock Detection: An Edge-Chasing Algorithm
Chandy, Misra, and Haas distributed deadlock detection algorithm.
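A rough probe-forwarding sketch in the spirit of the Chandy-Misra-Haas scheme mentioned above; the wait-for edges and probe handling are illustrative assumptions, and the inter-site messages are collapsed into a local traversal.

def detect_deadlock(initiator, waits_for):
    # waits_for: {process: [processes it waits for]}, spread across sites.
    # The initiator sends a probe along every wait-for edge; if a probe ever
    # comes back to the initiator, the initiator is part of a cycle.
    frontier = [initiator]
    visited = set()
    while frontier:
        p = frontier.pop()
        for q in waits_for.get(p, []):
            if q == initiator:
                return True          # probe returned to the initiator: deadlock
            if q not in visited:
                visited.add(q)
                frontier.append(q)   # forward the probe to q's site
    return False

print(detect_deadlock("P7", {"P7": ["P1"], "P1": ["P4"], "P4": ["P2"],
                             "P2": ["P9"], "P9": ["P7"]}))   # True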

Deadlock Prevention
Hierarchical ordering of resources avoids cycles
Time-stamp ordering approach:
Prevent the circular waiting condition by preempting resources if necessary.
The basic idea is to assign a unique priority to each process and use these priorities to decide whether process P should wait for process Q.
Let P wait for Q if P has a higher priority than Q; otherwise, P is rolled back.
This prevents deadlocks, since for every edge (P, Q) in the wait-for graph, P has a higher priority than Q. Thus, a cycle cannot exist.

Two commonly used schemes
Wait-Die (WD): Non-preemptive
When P requests a resource currently held by Q, P is allowed to wait only if it is older than Q. Otherwise, P is rolled back (i.e., dies).
Wound-Wait (WW): Preemptive
When P requests a resource currently held by Q, P is allowed to wait only if P is younger than Q. Otherwise, Q is rolled back (releasing its resource). That is, P wounds Q.
Note:
Both favor old jobs (1) to avoid starvation, and (2) because older jobs might have done more work and are expensive to roll back.
Unnecessary rollbacks may occur.
(a sketch of both rules follows)
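A compact sketch of the two rules, using creation timestamps as priorities (a smaller timestamp means an older, higher-priority process); the returned labels are illustrative.

def wait_die(requester_ts, holder_ts):
    # Non-preemptive: an older requester may wait; a younger one dies.
    return "requester waits" if requester_ts < holder_ts else "requester rolls back"

def wound_wait(requester_ts, holder_ts):
    # Preemptive: an older requester wounds (preempts) the holder;
    # a younger requester waits.
    return "holder rolls back" if requester_ts < holder_ts else "requester waits"

# Scenario from the notes: P(5), Q(10), R(20); Q holds the resource.
print(wait_die(20, 10))    # R requests -> requester rolls back
print(wound_wait(20, 10))  # R requests -> requester waits
print(wait_die(5, 10))     # P requests -> requester waits
print(wound_wait(5, 10))   # P requests -> holder (Q) rolls back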

Sample Scenario: WD versus WW
Processes P, Q, R are executing at 3 distributed sites
Suppose the time-stamps assigned to them (at the time of their creation) are 5, 10, 20, respectively
Q acquires a shared resource
Later, R requests the same resource
WD would roll back R
WW would make R wait
Later, P requests the same resource
WD would make P wait
WW would roll back Q, and give the resource to P

Example
Let P1 (5), P2 (10), P3 (15), and suppose P2 holds a resource.
Wait-Die (WD):
(1) P1 requests the resource held by P2. P1 waits.
(2) P3 requests the resource held by P2. P3 rolls back.
Wound-Wait (WW):
(1) P1 requests the resource held by P2. P1 gets the resource and P2 is rolled back.
(2) P3 requests the resource held by P2. P3 waits.

Differences between WD and WW
In WD, older waits for younger to release resources. In WW, older never waits for younger.
WD has more rollbacks than WW.
In WD, P3 requests and dies because P2 is older in the above example. If P3 restarts and again asks for the same resource, it rolls back again if P2 is still using the resource.
However, in WW, P2 is rolled back by P1. If it requests the resource again, it waits for P1 to release it.
When more than one process is waiting for a resource held by P, which process should be given the resource when P finishes? In WD, the youngest among the waiting ones. In WW, the oldest.

Layers of distributed systems
Computer networks
Local area networks such as Ethernet
Wide area networks such as the Internet
Network services
Connection-oriented services
Connectionless services
Datagrams
Network protocols
Internet Protocol (IP)
Transmission Control Protocol (TCP)
Middleware

Middleware for Distributed Systems
Middleware is a layer of software between applications and the OS that gives a uniform interface
Central to developing distributed applications
Different types
Document based (the world-wide web)
File-system based (e.g., NFS)
Shared object-based (CORBA)
Coordination based (Linda, publish-subscribe, Jini)

Summary
Distributed coordination problems
Event ordering
Agreement
Mutual exclusion
Leader election
Deadlock detection
Middleware for distributed application support
Starting next week: Chapter 9 (Security)
