Global Snapshots
Distributed Snapshot
A distributed computing system consists of spatially separated
processes that do not share a common memory and communicate
asynchronously with each other by message passing over
communication channels.
A computation is a sequence of atomic actions that transform a
given initial state to the final state. While such actions are totally
ordered in a sequential process, they are only partially ordered in
a distributed system.
Introduction
Each component of a distributed system has a local state.
The state of a process is characterized by the state of its
local memory and a history of its activity.
The state of a channel is characterized by the set of
messages sent along the channel but not yet received.
The global state of a distributed system is a collection
of the local states of its components.
Recording the global state of a distributed system on the fly is
an important paradigm.
If A is recorded at t0 and B, C12, and C21 are recorded at t2, then the
global state shows 850 in the system. An extra 50 appears in the system!
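The slide's figure did not survive, so the following is only a minimal sketch of how such a discrepancy can arise. It assumes a bank-transfer setup that the text does not spell out: hypothetical starting balances A = 600 and B = 200, a transfer of 50 from A to B at t1 (over channel C12), and a transfer of 80 from B to A at t2 (over channel C21). These values are assumptions chosen so that the recorded total matches the 850 mentioned above.

```python
# Hypothetical starting values (not given in the text): A = 600, B = 200,
# channels C12 (A -> B) and C21 (B -> A) initially empty.
def simulate():
    """Return the true system state at times t0, t1, and t2."""
    state = {"A": 600, "B": 200, "C12": 0, "C21": 0}
    timeline = [dict(state)]          # t0: nothing in transit

    # t1: A sends 50 to B; the money is now in channel C12.
    state["A"] -= 50
    state["C12"] += 50
    timeline.append(dict(state))

    # t2: B sends 80 to A; the money is now in channel C21.
    state["B"] -= 80
    state["C21"] += 80
    timeline.append(dict(state))
    return timeline


timeline = simulate()

# Every instantaneous state holds the same total amount.
true_totals = [sum(s.values()) for s in timeline]

# Inconsistent recording: A is taken from t0, everything else from t2.
recorded = {
    "A": timeline[0]["A"],
    "B": timeline[2]["B"],
    "C12": timeline[2]["C12"],
    "C21": timeline[2]["C21"],
}
recorded_total = sum(recorded.values())

print(true_totals)     # [800, 800, 800]
print(recorded_total)  # 850
```

Under these assumptions the 50 sent at t1 is counted twice: once in A's stale balance from t0 and once in channel C12 at t2.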
Why take a global snapshot?
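a. Garbage collection: detecting a (distributed) garbage object
   [Figure: processes p1 and p2; labels: object, reference, message]
b. Deadlock detection: detecting (distributed) wait-for relationships
   [Figure: processes p1 and p2 connected by wait-for edges]
c. Termination detection
   [Figure: processes p1 and p2, both passive; label: activate]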
Difficulties in global calculation
Consider a system of three processes numbered 0, 1, and 2 connected by
FIFO channels, and assume that an unknown number of
indistinguishable tokens are circulating indefinitely through this network.
We want the processes to cooperate with one another to count the exact
number of tokens circulating in the system (without ever stopping the
system).
The task has to be initiated by an initiator process (say process 0) that will
send query messages to the other processes to record the number of
tokens sighted by them.
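To see why this is hard, here is a rough sketch under simplifying assumptions that are not part of the text: a single token moves one hop per time tick around the ring 0 → 1 → 2 → 0, hops are instantaneous (the token is never inside a channel), and each process records whether it holds the token at the moment the query reaches it. The function name `naive_count` and the `record_times` parameter are hypothetical.

```python
def naive_count(token_start, record_times, n=3):
    """Sum the tokens sighted when process i records at time record_times[i].

    Assumes exactly one token, which sits at process
    (token_start + t) % n at time t.
    """
    count = 0
    for process, t in enumerate(record_times):
        token_position = (token_start + t) % n
        if token_position == process:
            count += 1
    return count


# All processes record at the same instant: the count is exact.
simultaneous = naive_count(token_start=0, record_times=[0, 0, 0])

# The query travels with the token: every process sights it.
overcount = naive_count(token_start=0, record_times=[0, 1, 2])

# The query trails just behind the token: nobody sights it.
undercount = naive_count(token_start=1, record_times=[0, 1, 2])

print(simultaneous, overcount, undercount)  # 1 3 0
```

Because the processes cannot record at the same instant without stopping the system, the same single token can be reported as one, three, or zero tokens depending on how the query and the token interleave.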
Possibilities
PROPERTIES OF CONSISTENT SNAPSHOTS
A snapshot state (SSS) consists of a set of local
states, where each local state is the outcome of a
recording event that follows a send, or a receive, or an
internal action.
System Model
Process history
For a process $P_i$, where events $e_i^0, e_i^1, \dots$ occur:
history($P_i$) = $h_i = \langle e_i^0, e_i^1, \dots \rangle$
prefix history($P_i^k$) = $h_i^k = \langle e_i^0, e_i^1, \dots, e_i^k \rangle$
$S_i^k$: $P_i$'s state immediately before the $k$th event
[Figure: event timelines for three processes. P1: $e_1^0, e_1^1, e_1^2, e_1^3$; P2: $e_2^0, e_2^1, e_2^2$; P3: $e_3^0, e_3^1, e_3^2$.]
Dept. of IT, Jadavpur University 11
Models of communication
Global history and cuts
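For a set of processes $P_1, \dots, P_i, \dots$:
global history: $H = \bigcup_i h_i$
global state: $S = \bigcup_i S_i^{k_i}$
a cut: $C \subseteq H$, with $C = h_1^{c_1} \cup h_2^{c_2} \cup \dots \cup h_n^{c_n}$
the frontier of $C$ = $\{ e_i^{c_i},\ i = 1, 2, \dots, n \}$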
[Figure: the same event timelines for P1, P2, and P3.]
Note: Global state does not record the state of the channels separately
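The definitions of prefix history, cut, and frontier above can be sketched in a few lines. This is only an illustration: the event names are taken from the figure, while the helper names (`prefix`, `cut`, `frontier`) and the particular choice of cut indices `c` are hypothetical.

```python
# Process histories, using the event names from the figure.
histories = {
    1: ["e10", "e11", "e12", "e13"],
    2: ["e20", "e21", "e22"],
    3: ["e30", "e31", "e32"],
}


def prefix(history, k):
    """Prefix history h_i^k: events 0 through k of one process."""
    return history[: k + 1]


def cut(histories, c):
    """A cut is a union of per-process prefixes, one index c[i] per process."""
    return {i: prefix(h, c[i]) for i, h in histories.items()}


def frontier(histories, c):
    """The frontier is the last event of each process included in the cut."""
    return {histories[i][c[i]] for i in histories}


# Hypothetical cut indices: c1 = 2, c2 = 1, c3 = 0.
c = {1: 2, 2: 1, 3: 0}
example_cut = cut(histories, c)
example_frontier = frontier(histories, c)

print(example_cut)
# {1: ['e10', 'e11', 'e12'], 2: ['e20', 'e21'], 3: ['e30']}
print(sorted(example_frontier))
# ['e12', 'e21', 'e30']
```

Representing a cut by one index per process makes it clear that any choice of indices yields a cut; whether that cut is consistent is a separate question that depends on the messages exchanged.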
A cut is called consistent, if for each event that it contains, it also
includes all events causally ordered before it. Let a, b be two events
in a distributed system. Then
(a ∈ consistent cut C) ∧ (b ≺ a) ⇒ b ∈ C
Thus, for a message m, if the state following receive (m) belongs
to a consistent cut, then the state following send (m) also must
belong to that cut.
Cut 1 = {a, b, c, m, k} is consistent, but Cut 2 = {a, b, c, d, g, m, e,
k, i} is not, since (g ∈ Cut 2) ∧ (h ≺ g), but h does not belong to
Cut 2. As processes make progress, new events update the
consistent cut.
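The consistency condition can be checked mechanically. Since the figure with events a through k is not available, the sketch below uses a small hypothetical execution instead: process 1 performs a1 and then a2 (which sends message m), and process 2 performs b1 and then b2 (which receives m). The function name `is_consistent` and the `direct_preds` table are assumptions made for this illustration.

```python
# Direct causal predecessors of each event in the hypothetical execution:
# a1 precedes a2 on process 1; b1 precedes b2 on process 2;
# a2 = send(m) precedes b2 = receive(m).
direct_preds = {
    "a2": {"a1"},
    "b2": {"b1", "a2"},
}


def is_consistent(cut, direct_preds):
    """Return True if every event in the cut has all its predecessors in the cut.

    Checking direct predecessors is enough: if the cut is closed under
    direct predecessors, it is closed under the full causal order too.
    """
    return all(direct_preds.get(event, set()) <= cut for event in cut)


before_message = {"a1", "b1"}
receive_without_send = {"a1", "b1", "b2"}
after_message = {"a1", "a2", "b1", "b2"}

print(is_consistent(before_message, direct_preds))        # True
print(is_consistent(receive_without_send, direct_preds))  # False
print(is_consistent(after_message, direct_preds))         # True
```

The second cut fails for the same reason Cut 2 does in the text: it contains a receive event whose corresponding send is missing.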
The set of local states following the most recent events (an event a is
a most recent event, if there is no other event b such that a ≺ b)
of a cut defines a snapshot.
Here {c, d, f, g, h} is a cut.