Lecture 1
Presented by
Dr. Alshaimaa Mostafa Mohammed
Lecture Rules
1. Arrive on time.
2. Turn off cell phones or set them to silent.
3. If you have a question, ask for help.
4. Do not hold private conversations.
Evaluation
• Total marks: 150
• Midterm: 15
• Practical: 30
• Oral: 15
• Final exam: 90
Recommended Course Textbooks
• Distributed Systems: Concepts and Design, Fifth Edition, George Coulouris
1.1 What is a distributed system?
1.2 Design goals
1.3 Types of distributed systems
Definition
A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.
Characteristic features
► Autonomous computing elements, also referred to as nodes, be they hardware devices or software processes.
► Single coherent system: users or applications perceive a single system ⇒ nodes need to collaborate.
Collection of autonomous nodes
Independent behavior
Each node is autonomous and will thus have its own
notion of time: there is no global clock.
Leads to fundamental synchronization and
coordination problems.
Collection of autonomous nodes
Organization
Overlay network: a node is typically a software process equipped with a list of other processes it can directly send messages to. It may also be the case that a neighbor first needs to be looked up. Message passing is then done through TCP/IP or UDP channels.
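Example (illustrative sketch)
A minimal Python sketch of such an overlay node, not taken from the slides: each node knows only a list of neighbor addresses and exchanges messages with them over UDP. The node names, ports, and message format are made up.

import json
import socket

class OverlayNode:
    def __init__(self, name, port, neighbors):
        self.name = name                      # e.g., "A"
        self.neighbors = neighbors            # {"B": ("127.0.0.1", 9002), ...}
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind(("127.0.0.1", port))

    def send(self, neighbor, payload):
        # A node can only talk directly to processes on its neighbor list.
        addr = self.neighbors[neighbor]
        msg = json.dumps({"from": self.name, "data": payload}).encode()
        self.sock.sendto(msg, addr)

    def receive(self):
        data, _ = self.sock.recvfrom(4096)
        return json.loads(data.decode())

# Two nodes on one machine; A sends a message to its neighbor B.
a = OverlayNode("A", 9001, {"B": ("127.0.0.1", 9002)})
b = OverlayNode("B", 9002, {"A": ("127.0.0.1", 9001)})
a.send("B", "hello")
print(b.receive())                            # {'from': 'A', 'data': 'hello'}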
Coherent system
Essence
The collection of nodes as a whole operates the same, no matter
where, when, and how interaction between a user and the system
takes place.
Examples
► An end user cannot tell where a computation is taking place
► Where data is exactly stored should be irrelevant to an
application
► Whether or not data has been replicated is completely hidden
The keyword is distribution transparency
Middleware: the OS of distributed systems
In a sense, middleware is to a distributed system what an operating system is to a computer: a manager of resources that offers its applications the means to efficiently share and deploy those resources across a network.
Next to resource management, it offers services that can also be found in most operating systems, including:
• Facilities for interapplication communication.
• Security services.
• Accounting services.
What do we want to achieve?
► Support sharing of resources
► Distribution transparency
► Openness
► Scalability
Sharing resources
Canonical examples
► Cloud-based shared storage and files
► Peer-to-peer assisted multimedia streaming
► Shared mail services (think of outsourced mail systems)
► Shared Web hosting (think of content distribution networks)
Observation
“The network is the computer”
Distribution transparency
An important goal of a distributed system is to hide the fact that its processes and resources are physically distributed across multiple computers, possibly separated by large distances. In other words, it tries to make the distribution of processes and resources transparent, that is, invisible, to end users and applications.
Types of distribution transparency
Transparency   Description
Access         Hide differences in data representation and how an object is accessed
Location       Hide where an object is located
Relocation     Hide that an object may be moved to another location while in use
Migration      Hide that an object may move to another location
Replication    Hide that an object is replicated
Concurrency    Hide that an object may be shared by several independent users
Failure        Hide the failure and recovery of an object
Degree of transparency
Observation
Aiming at full distribution transparency may be too much:
► There are communication latencies that cannot be hidden
► Completely hiding failures of networks and nodes is (theoretically
and practically) impossible
► You cannot distinguish a slow computer from a failing one
► You can never be sure that a server actually performed an
operation before a crash
► Full transparency will cost performance, exposing distribution of
the system
► Keeping replicas exactly up-to-date with the master takes
time
► Immediately flushing write operations to disk for fault
tolerance
Degree of transparency
Conclusion
Distribution transparency is a nice goal, but achieving it is a different story, and it should often not even be aimed at.
Openness of distributed systems
What are we talking about?
Be able to interact with services from other open systems, irrespective
of the underlying environment:
► Systems should conform to well-defined interfaces
► Systems should easily interoperate
► Systems should support portability of applications
► Systems should be easily extensible
Policies versus mechanisms
On strict separation
Observation
The stricter the separation between policy and mechanism, the more we need to make sure that we offer the proper mechanisms, potentially leading to many configuration parameters and complex management.
Finding a balance
Hard-coding policies often simplifies management and reduces complexity, at the price of less flexibility. There is no obvious solution.
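Example (illustrative sketch)
A small Python sketch of the separation, not from the slides: a tiny cache in which the mechanism (store items, evict one when full) is fixed, while the policy that chooses the victim is a pluggable function. All names and the two policies are made up for the example.

class Cache:
    """Mechanism: store up to `capacity` items and evict one when full.
    Policy: *which* item to evict is delegated to a pluggable function."""
    def __init__(self, capacity, eviction_policy):
        self.capacity = capacity
        self.policy = eviction_policy
        self.items = {}            # key -> value
        self.last_used = {}        # key -> logical time of last access
        self.clock = 0

    def _touch(self, key):
        self.clock += 1
        self.last_used[key] = self.clock

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            victim = self.policy(self.last_used)   # policy decides what to evict
            del self.items[victim]                 # mechanism performs the eviction
            del self.last_used[victim]
        self.items[key] = value
        self._touch(key)

    def get(self, key):
        if key in self.items:
            self._touch(key)
        return self.items.get(key)

# Two interchangeable policies working on the same mechanism.
def lru(last_used):     # evict the least recently used key
    return min(last_used, key=last_used.get)

def mru(last_used):     # evict the most recently used key
    return max(last_used, key=last_used.get)

cache = Cache(capacity=2, eviction_policy=lru)
cache.put("a", 1); cache.put("b", 2); cache.get("a"); cache.put("c", 3)
print(sorted(cache.items))   # ['a', 'c']  -- 'b' was least recently used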
Scale in distributed systems
Observation
Many developers of modern distributed systems easily use the adjective “scalable” without making clear why their system actually scales.
Scalability has at least three components:
► Size scalability: the number of users and/or processes
► Geographical scalability: the maximum distance between nodes
► Administrative scalability: the number of administrative domains
Observation
Most systems account only, to a certain extent, for size scalability. Often a solution: multiple powerful servers operating independently in parallel.
Size scalability
Formal analysis
A centralized service can be modeled as a simple queuing system: requests arrive, wait in a queue, are processed one at a time, and a response is returned.
Let λ be the arrival rate of requests, µ the processing capacity of the service (requests per second), and U = λ/µ the utilization of the server; the fraction of time there are k requests in the system is then pk = (1 − U)·U^k.
Average number of requests in the system:
N = Σ_{k≥0} k·pk = Σ_{k≥0} k·(1 − U)·U^k = (1 − U)·Σ_{k≥0} k·U^k = (1 − U)·U/(1 − U)^2 = U/(1 − U)
Average throughput (the server is at work a fraction U of the time, idle otherwise):
X = U·µ + (1 − U)·0 = (λ/µ)·µ = λ
With S = 1/µ the service time of a request, the response time is R = N/X = S/(1 − U), so the response-to-service-time ratio is R/S = 1/(1 − U).
Formal analysis
Observations
► If U is small, response-to-service time is close to 1: a request is
immediately processed
► If U goes up to 1, the system comes to a grinding halt. Solution:
decrease S.
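Example (numeric sketch)
A quick check of the formulas above; the utilization values are arbitrary, chosen only to show how the response-to-service-time ratio R/S = 1/(1 − U) explodes as U approaches 1.

def avg_requests(U):
    return U / (1 - U)            # N = U / (1 - U)

def response_over_service(U):
    return 1 / (1 - U)            # R/S = 1 / (1 - U)

for U in (0.1, 0.5, 0.9, 0.99):
    print(f"U={U:4}  N={avg_requests(U):7.2f}  R/S={response_over_service(U):7.2f}")
# U= 0.1  N=   0.11  R/S=   1.11   <- lightly loaded: response ~ service time
# U=0.99  N=  99.00  R/S= 100.00   <- near saturation: the "grinding halt"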
Problems with geographical scalability
Problems with administrative scalability
Essence
Conflicting policies concerning usage (and thus payment),
management, and security
Examples
► Computational grids: share expensive resources between
different domains.
► Shared equipment: how to control, manage, and use a shared radio telescope constructed as a large-scale shared sensor network?
Techniques for scaling
Facilitate solution by moving computations to client
[Figure: checking a form at the client as it is being filled in — fields such as first name, last name, and e-mail are checked locally, and only the completed form is shipped to the server]
Scaling: The problem with replication
Observation
If we can tolerate inconsistencies, we may reduce the need for global
synchronization, but tolerating inconsistencies is application
dependent.
Developing distributed systems: Pitfalls
Observation
Many distributed systems are needlessly complex because of mistakes that required patching later on. Many false assumptions are made.
False (and often hidden) assumptions
► The network is reliable
► The network is secure
► The network is homogeneous
► The topology does not change
► Latency is zero
► Bandwidth is infinite
► Transport cost is zero
► There is one administrator
Three types of distributed systems
► High-performance distributed computing
► Distributed information systems (application integration)
► Distributed pervasive systems
Parallel computing
Observation
High-performance distributed computing started with parallel
computing
[Figure: a shared-memory multiprocessor (processors connected to a shared memory through an interconnect) versus a multicomputer (processors, each with private memory, communicating over an interconnect)]
Distributed shared memory systems
Observation
Multiprocessors are relatively easy to program in comparison to
multicomputers, yet have problems when increasing the number of
processors (or cores). Solution: Try to implement a shared-memory
model on top of a multicomputer.
Problem
Performance of distributed shared memory could never compete with
that of multiprocessors, and failed to meet the expectations of
programmers. It has been widely abandoned by now.
Cluster computing
Grid computing
Note
To allow for collaborations, grids generally use virtual organizations. In
essence, this is a grouping of users (or better: their IDs) that will allow
for authorization on resource allocation.
Architecture for grid computing
[Figure: layered grid architecture — applications on top of a collective layer, built on connectivity and fabric layers]
The layers
► Fabric: provides interfaces to local resources (for querying state and capabilities, locking, etc.)
► Connectivity: communication/transaction protocols, e.g., for moving data between resources; also various authentication protocols.
Cloud computing
[Figure: cloud service layers]
► Software as a Service: e.g., Google Docs
► Platform as a Service
► Infrastructure as a Service: computation (VM) and storage (block, file), e.g., Amazon EC2 and Amazon S3
► Hardware
Is cloud computing cost-effective?
Observation
An important reason for the success of cloud computing is that it allows
organizations to outsource their IT infrastructure: hardware and
software. Essential question: is outsourcing also cheaper?
Approach
► Consider enterprise applications, modeled as a collection of
components, each component Ci requiring Ni servers.
► The application now becomes a directed graph, with a vertex representing a component and a directed arc (i, j) representing data flowing from Ci to Cj.
► Two associated weights per arc:
► Ti,j is the number of transactions per time unit that causes a
data flow from Ci to Cj .
► Si,j is the total amount of data associated with Ti,j .
Is cloud computing cost-effective?
Migration plan
Figure out, for each component Ci, how many ni of its Ni servers should migrate, such that the monetary benefits, reduced by the additional costs for Internet communication, are maximal.
Computing benefits
Monetary savings
► Bc : benefits of migrating a compute-intensive component
► Mc : total number of migrated compute-intensive components
► Bs : benefits of migrating a storage-intensive component
►Ms : total number of migrated storage-intensive components
Obviously, total benefits are: Bc · Mc + Bs · Ms
Internet costs
Rate of transactions after migration
Some notation
► Ci,local: the set of servers of Ci that continue to run locally.
► Ci,cloud: the set of servers of Ci that are placed in the cloud.
► Assume the traffic distribution is the same for local and cloud servers.
Note that |Ci,cloud| = ni. Let fi = ni/Ni, and let si be a server of Ci. The transaction rate after migration is:
T*i,j = (1 − fi)·(1 − fj)·Ti,j   when si ∈ Ci,local and sj ∈ Cj,local
T*i,j = (1 − fi)·fj·Ti,j         when si ∈ Ci,local and sj ∈ Cj,cloud
T*i,j = fi·(1 − fj)·Ti,j         when si ∈ Ci,cloud and sj ∈ Cj,local
T*i,j = fi·fj·Ti,j               when si ∈ Ci,cloud and sj ∈ Cj,cloud
Overall Internet costs
Notation
► costlocal,inet: per-unit Internet costs for the local part
► costcloud,inet: per-unit Internet costs to/from the cloud
Internet traffic after migration:
Tr*local,inet = Σ over Ci,local, Cj,local of (T*i,j·S*i,j + T*j,i·S*j,i) + Σ over Cj,local of (T*user,j·S*user,j + T*j,user·S*j,user)
Tr*cloud,inet = Σ over Ci,cloud, Cj,cloud of (T*i,j·S*i,j + T*j,i·S*j,i) + Σ over Cj,cloud of (T*user,j·S*user,j + T*j,user·S*j,user)
Resulting costs (with Trlocal,inet the local Internet traffic before migration):
costs = costlocal,inet·(Tr*local,inet − Trlocal,inet) + costcloud,inet·Tr*cloud,inet
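Example (illustrative sketch)
A deliberately simplified Python sketch of the model above, not from the lecture: two hypothetical components C1 (compute-intensive) and C2 (storage-intensive), one arc C1 → C2, and made-up benefit and cost figures. Only traffic that crosses between the local part and the cloud is charged here.

N = {1: 10, 2: 4}                 # total servers per component
n = {1: 6, 2: 2}                  # servers migrated to the cloud
f = {i: n[i] / N[i] for i in N}   # migrated fraction f_i = n_i / N_i

T = {(1, 2): 1000.0}              # transactions per time unit on arc C1 -> C2
S = {(1, 2): 0.5}                 # MB of data associated with T[(1, 2)]

def crossing_traffic(i, j):
    # Share of transactions whose endpoints end up on different sides
    # (local <-> cloud); these now travel over the Internet.
    cross = (1 - f[i]) * f[j] + f[i] * (1 - f[j])
    return cross * T[(i, j)] * S[(i, j)]

Bc, Mc = 500.0, 1                 # benefit per migrated compute-intensive component, count
Bs, Ms = 300.0, 1                 # benefit per migrated storage-intensive component, count
benefits = Bc * Mc + Bs * Ms      # total benefits: Bc*Mc + Bs*Ms

cost_cloud_inet = 0.2             # hypothetical per-MB cost for Internet traffic to the cloud
extra_costs = crossing_traffic(1, 2) * cost_cloud_inet

print(f"benefits={benefits}  extra Internet costs={extra_costs}  "
      f"net gain={benefits - extra_costs}")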
Integrating applications
Situation
Organizations were confronted with many networked applications, but achieving interoperability was painful.
Basic approach
A networked application is one that runs on a server making its
services available to remote clients. Simple integration: clients
combine requests for (different) applications; send that off; collect
responses, and present a coherent result to the user.
Next step
Allow direct application-to-application communication, leading to
Enterprise Application Integration.
Example EAI: (nested) transactions
Transaction
Primitive           Description
BEGIN TRANSACTION   Mark the start of a transaction
END TRANSACTION     Terminate the transaction and try to commit
ABORT TRANSACTION   Kill the transaction and restore the old values
READ                Read data from a file, a table, or otherwise
WRITE               Write data to a file, a table, or otherwise
Issue: all-or-nothing
Nested transaction
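Example (illustrative sketch)
The all-or-nothing behaviour of these primitives can be shown by mapping them onto SQLite transactions via Python's built-in sqlite3 module; the accounts table and the transfer amounts are made up.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 20)")
conn.commit()

def transfer(amount):
    try:
        # The first WRITE implicitly starts the transaction (BEGIN TRANSACTION).
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name='alice'", (amount,))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name='bob'", (amount,))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name='alice'").fetchone()  # READ
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.commit()        # END TRANSACTION: try to commit
    except Exception:
        conn.rollback()      # ABORT TRANSACTION: restore the old values
        raise

transfer(30)             # succeeds: both updates become visible together
try:
    transfer(1000)       # fails: neither update survives (all-or-nothing)
except ValueError:
    pass
print(dict(conn.execute("SELECT * FROM accounts")))   # {'alice': 70, 'bob': 50}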
TPM: Transaction Processing Monitor
[Figure: a client application sends the requests of a transaction to a TP monitor, which forwards the individual requests to several servers and combines their replies into a single reply for the client]
Observation
In many cases, the data involved in a transaction is distributed across
several servers. A TP Monitor is responsible for coordinating the
execution of a transaction.
Middleware and EAI
[Figure: client applications communicating through a layer of communication middleware]
Distributed pervasive systems
Observation
An emerging next generation of distributed systems in which nodes are small, mobile, and often embedded in a larger system, characterized by the fact that the system naturally blends into the user's environment.
Ubiquitous systems
Core elements
1. (Distribution) Devices are networked, distributed, and accessible in a transparent manner
2. (Interaction) Interaction between users and devices is highly unobtrusive
3. (Context awareness) The system is aware of a user's context in order to optimize interaction
4. (Autonomy) Devices operate autonomously without human intervention, and are thus highly self-managed
5. (Intelligence) The system as a whole can handle a wide range of dynamic actions and interactions
Distinctive features
► A myriad of different mobile devices (smartphones, tablets, GPS devices, remote controls, active badges, etc.).
► Mobile implies that a device’s location is expected to change over
time ⇒ change of local services, reachability, etc. Keyword:
discovery.
► Communication may become more difficult: no stable route, but
also perhaps no guaranteed connectivity ⇒ disruption-tolerant
networking.
Mobility patterns
Issue
What is the relationship between information dissemination and human
mobility? Basic idea: an encounter allows for the exchange of
information (pocket-switched networks).
A successful strategy
► Alice’s world consists of friends and strangers.
► If Alice wants to get a message to Bob: hand it out to all her
friends
► Friend passes message to Bob at first encounter
Observation
This strategy works because (apparently) there are relatively closed
communities of friends.
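Example (illustrative sketch)
A toy Python simulation of this friend-relay strategy; the people, friendships, and random encounters are made up for illustration.

import random

random.seed(1)
people = ["alice", "bob", "carol", "dave", "eve"]
friends_of_alice = {"carol", "dave"}          # Bob is reached via Alice's friends
carrying = {"alice"}                          # who currently holds the message

def encounter():
    return tuple(random.sample(people, 2))    # two people bump into each other

for step in range(1, 2000):
    a, b = encounter()
    if "alice" in (a, b):
        other = b if a == "alice" else a
        if other in friends_of_alice:
            carrying.add(other)               # Alice hands the message to a friend
    if "bob" in (a, b):
        other = b if a == "bob" else a
        if other in carrying:                 # first encounter with a carrier
            print(f"delivered to Bob at encounter {step} via {other}")
            break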
Community detection
Issue
How to detect your community without having global knowledge?
How mobile are people?
Experimental results
Tracing 100,000 cell-phone users during six months leads to:
[Plot: probability, on a logarithmic scale from 1 down to 10^-6, of observed travel distances]
Moreover: people tend to return to the same place after 24, 48, or 72
hours ⇒ we’re not that mobile.
Sensor networks
Characteristics
The nodes to which sensors are attached are:
► Many (10s-1000s)
► Simple (small memory/compute/communication capacity)
► Often battery-powered (or even battery-less)
Sensor networks as distributed databases
Two extremes
► Sensor data is sent directly to the operator's site, where it is processed and stored.
► Each sensor can process and store data: the operator sends a query into the sensor network, and the sensors send only the answers back.
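Example (illustrative sketch)
A small Python sketch, not from the lecture, contrasting the two extremes for an "average temperature" style query with made-up readings: shipping every value to the operator versus letting each sensor return only a small partial aggregate.

import random

random.seed(0)
readings = {f"sensor{i}": [random.uniform(15, 25) for _ in range(100)]
            for i in range(50)}

# Extreme 1: every reading travels to the operator's site.
raw_messages = sum(len(r) for r in readings.values())
total = sum(sum(r) for r in readings.values())
count = sum(len(r) for r in readings.values())
print("raw shipping:", raw_messages, "values sent, avg =", round(total / count, 2))

# Extreme 2: each sensor answers the query locally with (sum, count);
# the operator only combines 50 small answers.
partials = [(sum(r), len(r)) for r in readings.values()]
agg_sum = sum(s for s, _ in partials)
agg_cnt = sum(c for _, c in partials)
print("in-network:", len(partials), "answers sent, avg =", round(agg_sum / agg_cnt, 2))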
Duty-cycled networks
Issue
Many sensor networks need to operate on a strict energy budget:
introduce duty cycles
Definition
A node is active during Tactive time units, and then suspended for
Tsuspended units, to become active again. Duty cycle τ:
τ = Tactive / (Tactive + Tsuspended)
Typical duty cycles are 10–30%, but can also be lower than 1%.
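Example (made-up timings)
A node that is awake for 300 ms out of every second runs at a 30% duty cycle; one awake for 30 ms and then asleep for 2970 ms runs at 1%.

def duty_cycle(t_active, t_suspended):
    return t_active / (t_active + t_suspended)

print(duty_cycle(300, 700))     # 0.3  -> 30% duty cycle
print(duty_cycle(30, 2970))     # 0.01 -> 1% duty cycle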
Keeping duty-cycled networks in sync
Issue
If duty cycles are low, sensor nodes may not wake up at the same time
anymore and become permanently disconnected: they are active
during different, nonoverlapping time slots.
Solution
► Each node A adopts a cluster ID CA , being a number.
► Let a node send a join message during its suspended period.
► When A receives a join message from B and CA < CB , it sends a
join message to its neighbors (in cluster CA ) before joining B.
► When CA > CB it sends a join message to B during B’s active
period.
Note
Once a join message reaches a whole cluster, merging two clusters is
very fast. Merging means: re-adjust clocks.
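Example (illustrative sketch)
A toy Python sketch of one possible reading of this merging rule (an interpretation, not the full MAC protocol): each node carries a cluster ID, join messages carry the sender's ID, and on contact the smaller ID gives way to the larger one; "re-adjusting clocks" is modeled as simply adopting the other cluster's ID.

import random

random.seed(2)
cluster = {f"n{i}": random.randint(1, 5) for i in range(10)}   # node -> cluster ID

def handle_join(a, b):
    # Node a receives a join message from node b (and vice versa on reply).
    if cluster[a] < cluster[b]:
        cluster[a] = cluster[b]        # a joins b's cluster
    elif cluster[a] > cluster[b]:
        cluster[b] = cluster[a]        # b joins a's cluster

rounds = 0
while len(set(cluster.values())) > 1:
    a, b = random.sample(list(cluster), 2)   # two nodes come into contact
    handle_join(a, b)
    rounds += 1
print("converged to cluster", set(cluster.values()), "after", rounds, "contacts")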