Distributed and Cloud Computing
K. Hwang, G. Fox and J. Dongarra
Chapter 1: Enabling Technologies
and Distributed System Models
Copyright © 2012, Elsevier Inc. All rights reserved.
Cloud Computing
Cloud computing gives us a means by which
we can access applications as utilities over
the Internet. It allows us to create, configure,
and customize applications online.
What is Cloud?
The term cloud refers to a network or the Internet. In other
words, a cloud is something that is present at a remote
location. A cloud can provide services over public or
private networks, i.e., over a WAN, LAN, or VPN.
Applications such as e-mail, web conferencing, and
customer relationship management (CRM) all run in
the cloud.
What is Cloud Computing?
Cloud computing refers to manipulating, configuring,
and accessing applications online. It offers online
data storage, infrastructure, and applications.
Data Deluge Enabling New Challenges
(Courtesy of Judy Qiu, Indiana University, 2011)
1.1 Scalable Computing over the Internet
The Age of Internet Computing
Scalable Computing Trends and New Paradigms
The Internet of Things and Cyber-Physical Systems
The Age of Internet Computing
The Platform Evolution
High-Performance Computing
High-Throughput Computing
Three New Computing Paradigms
Computing Paradigm Distinctions
Distributed System Families
From Desktop/HPC/Grids to
Internet Clouds in 30 Years
HPC has moved from centralized supercomputers
to geographically distributed desktops, desksides,
clusters, and grids, and on to clouds, over the last 30 years
R&D efforts on HPC, clusters, grids, P2P, and virtual
machines have laid the foundation of cloud computing,
which has been greatly advocated since 2007
Computing infrastructure is located in areas with
lower costs in hardware, software, datasets,
space, and power requirements, moving from
desktop computing to datacenter-based clouds
Interactions among 4 technical challenges:
Data Deluge, Cloud Technology, eScience,
and Multicore/Parallel Computing
(Courtesy of Judy Qiu, Indiana University, 2011)
Clouds and Internet of Things
HPC: High-Performance Computing
HTC: High-Throughput Computing
P2P: Peer to Peer
MPP: Massively Parallel Processors
Source: K. Hwang, G. Fox, and J. Dongarra,
Distributed and Cloud Computing,
Morgan Kaufmann, 2012.
Three New Computing Paradigms
Radio-frequency identification (RFID)
Radio-frequency identification (RFID) uses electromagnetic fields to
automatically identify and track tags attached to objects. The tags contain
electronically stored information.
Global Positioning System (GPS)
The Global Positioning System (GPS), originally Navstar GPS, is a
space-based radionavigation system owned by the United States
government and operated by the United States Air Force. It is a
global navigation satellite system that provides geolocation and time
information to a GPS receiver anywhere on or near the Earth where there
is an unobstructed line of sight to four or more GPS satellites.
Internet of Things (IoT)
The Internet of things (IoT) is the network of physical devices, vehicles,
home appliances, and other items embedded with electronics, software,
sensors, actuators, and network connectivity which enable these objects to
connect and exchange data.
Computing Paradigm Distinctions
Centralized Computing
All computer resources are centralized in one physical system.
Parallel Computing
All processors are either tightly coupled with central shared
memory or loosely coupled with distributed memory.
Distributed Computing
Field of CS/CE that studies distributed systems. A distributed
system consists of multiple autonomous computers, each with
its own private memory, communicating over a network.
Cloud Computing
An Internet cloud of resources that may be either centralized or
decentralized. The cloud applies to parallel or distributed
computing or both. Clouds may be built from physical or
virtualized resources.
Distributed System Families
Networks and networks of clusters have been consolidated into
many national projects designed to establish wide area computing
infrastructures, known as computational grids or data grids.
In the future, both HPC and HTC systems will demand multicore or
many-core processors that can handle large numbers of computing
threads per core. Both HPC and HTC systems emphasize
parallelism and distributed computing. Future HPC and HTC
systems must be able to satisfy this huge demand in computing
power in terms of throughput, efficiency, scalability, and reliability.
Meeting these goals requires the following design
objectives:
Efficiency
Dependability
Adaptation in the programming model
Flexibility in application deployment
Scalable Computing Trends and New Paradigms
Degrees of Parallelism
Bit-level parallelism (BLP)
Instruction-level parallelism (ILP)
VLIW (very long instruction word)
Data-level parallelism (DLP)
Task-level parallelism (TLP)
Job-level parallelism (JLP)
Innovative Applications
The Trend toward Utility Computing
The Hype Cycle of New Technologies
Innovative Applications
Technology Convergence toward HPC for Science
and HTC for Business: Utility Computing
2011 Gartner “IT Hype Cycle” for Emerging Technologies
The Internet of Things and Cyber-Physical Systems
•The Internet of Things
•Cyber-Physical Systems (CPS)
A cyber-physical system (CPS) is the result of interaction between computational
processes and the physical world. A CPS integrates “cyber” (heterogeneous, asynchronous)
with “physical” (concurrent and information-dense) objects. A CPS merges the “3C”
technologies of computation, communication, and control.
1.2 TECHNOLOGIES FOR NETWORK-BASED SYSTEMS
1.2.1 Multicore CPUs and Multithreading Technologies
•Advances in CPU Processors
•Multicore CPU and Many-Core GPU Architectures
•Multithreading Technology
Advances in CPU Processors
33-Year Improvement in Processor and Network
Technologies
Modern Multi-core CPU Chip
Multi-threading Processors
Four-issue superscalar processor (e.g., Sun UltraSPARC I)
Implements instruction-level parallelism (ILP) within a single
processor.
Executes more than one instruction during a clock cycle by
sending multiple instructions to redundant functional units.
Fine-grain multithreaded processor
Switches threads after each cycle
Interleaves instruction execution
If one thread stalls, others are executed
Coarse-grain multithreaded processor
Executes a single thread until it reaches certain situations
(e.g., a long-latency stall)
Simultaneous multithreading (SMT) processor
Instructions from more than one thread can execute in any
given pipeline stage at a time.
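The difference between fine-grain and coarse-grain switching can be sketched with a toy scheduler. This is only an illustration under simplifying assumptions: one instruction issues per cycle, the instruction names and the "MISS" marker (standing in for a long-latency event) are hypothetical, and stall durations and pipeline details are ignored.

```python
def fine_grain_schedule(threads):
    """Issue one instruction per cycle, rotating to the next thread every cycle."""
    queues = [list(t) for t in threads]  # copy so the inputs are not modified
    schedule = []
    current = 0
    while any(queues):
        # Skip threads that have no instructions left.
        while not queues[current]:
            current = (current + 1) % len(queues)
        schedule.append((current, queues[current].pop(0)))
        current = (current + 1) % len(queues)
    return schedule


def coarse_grain_schedule(threads, switch_on="MISS"):
    """Stay on one thread until it issues the switch_on event, then move on."""
    queues = [list(t) for t in threads]
    schedule = []
    current = 0
    while any(queues):
        while not queues[current]:
            current = (current + 1) % len(queues)
        op = queues[current].pop(0)
        schedule.append((current, op))
        if op == switch_on:  # assumed long-latency event triggers the switch
            current = (current + 1) % len(queues)
    return schedule


if __name__ == "__main__":
    demo = [["a1", "MISS", "a2"], ["b1", "b2"]]
    print(fine_grain_schedule(demo))
    print(coarse_grain_schedule(demo))
```

Each tuple in the result is (thread index, instruction), listed in issue order.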
5 Micro-architectures of CPUs
Each row represents the issue slots for a single execution cycle:
•A filled box indicates that the processor found an instruction to execute in that
issue slot on that cycle;
•An empty box denotes an unused slot.
1.2.2 GPU Computing to Exascale and Beyond
•How GPUs Work
•GPU Programming Model
•Power Efficiency of the GPU
GPU Programming Model
Architecture of a Many-Core Multiprocessor GPU
Interacting with a CPU Processor
NVIDIA Fermi GPU
GPU Performance
Bottom: CPU, 0.8 Gflops/W/core (2011)
Middle: GPU, 5 Gflops/W/core (2011)
Top: EF, exascale computing (10^18 flops)
1.2.3 Memory, Storage, and Wide-Area Networking
•Memory Technology
•Disks and Storage Technology
•System-Area Interconnects
•Wide-Area Networking
33-Year Improvement in Memory and Disk
Technologies
Interconnection Networks
• SAN (storage area network) - connects servers with disk arrays
• LAN (local area network) – connects clients, hosts, and servers
• NAS (network attached storage) – connects clients with large storage
systems
Virtual Machines and Virtualization Middleware
•Virtual Machines
•VM Primitive Operations
•Virtual Infrastructures
Initial Hardware Model
All applications access hardware resources (e.g.,
memory, I/O) through system calls to the operating
system (privileged instructions)
Advantages
Design is decoupled (i.e., OS developers can work
separately from the people developing the
hardware)
Hardware and software can be upgraded without
notifying the application programs
Disadvantages
An application compiled for one ISA will not run on
another ISA.
Applications compiled for Mac use different
operating system calls than applications designed
for Windows.
ISAs must support old software
This can often be inhibiting in terms of performance
Since software is developed separately from
hardware, software is not necessarily optimized
for the hardware.
Virtual Machines
A conventional computer has a single OS image. This offers
a rigid architecture that tightly couples application software
to a specific hardware platform. Some software running well
on one machine may not be executable on another platform
with a different instruction set under a fixed OS.
Virtual machines (VMs) offer novel solutions to
underutilized resources, application inflexibility, software
manageability, and security concerns in existing physical
machines.
Virtual Machines
Eliminate real machine constraint
Increases portability and flexibility
Virtual machine adds software to a physical
machine to give it the appearance of a different
platform or multiple platforms.
Benefits
Cross platform compatibility
Increase Security
Enhance Performance
Simplify software migration
Virtual Machine Basics
Virtual software placed
between underlying machine
and conventional software
Conventional software sees
different ISA from the one
supported by the hardware
Virtualization process
involves:
Mapping of virtual resources
(registers and memory) to
real hardware resources
Using real machine
instructions to carry out the
actions specified by the
virtual machine instructions
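The two steps above (mapping virtual resources to real ones, and carrying out virtual instructions with real ones) can be sketched as a tiny interpreter. The three-instruction virtual ISA below (LOADI, ADD, STORE) is made up for illustration and does not correspond to any real virtual machine; Python lists stand in for the host's real registers and memory.

```python
class ToyVM:
    """Interprets a hypothetical three-instruction virtual ISA on the host."""

    def __init__(self, num_registers=4, memory_size=16):
        # Mapping step: virtual registers and memory live in host data structures.
        self.registers = [0] * num_registers
        self.memory = [0] * memory_size

    def run(self, program):
        # Emulation step: each virtual instruction is carried out by host operations.
        for instruction in program:
            opcode, *operands = instruction
            if opcode == "LOADI":    # LOADI rd, constant
                rd, value = operands
                self.registers[rd] = value
            elif opcode == "ADD":    # ADD rd, rs1, rs2
                rd, rs1, rs2 = operands
                self.registers[rd] = self.registers[rs1] + self.registers[rs2]
            elif opcode == "STORE":  # STORE address, rs
                address, rs = operands
                self.memory[address] = self.registers[rs]
            else:
                raise ValueError(f"unknown opcode: {opcode}")
        return self


if __name__ == "__main__":
    vm = ToyVM().run([
        ("LOADI", 0, 2),
        ("LOADI", 1, 40),
        ("ADD", 2, 0, 1),
        ("STORE", 5, 2),
    ])
    print(vm.registers, vm.memory[5])
```

Real virtualization layers typically use faster techniques than instruction-by-instruction interpretation; the sketch only shows the mapping idea.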
VM Primitive Operations
Virtual Infrastructures
Virtual infrastructure is what connects resources to
distributed applications. It is a dynamic mapping of
system resources to specific applications. The result
is decreased costs and increased efficiency and
responsiveness.
Data Center Virtualization for Cloud Computing
Data Center Growth and Cost Breakdown
Low-Cost Design Philosophy
Convergence of Technologies
Convergence of Technologies
Cloud computing is enabled by the convergence of technologies
in four areas:
(1) hardware virtualization and multi-core chips,
(2) utility and grid computing,
(3) SOA, Web 2.0, and WS mashups, and
(4) autonomic computing and data center automation
System Models for Distributed and Cloud
Computing
•Clusters of Cooperative Computers
•Grid Computing Infrastructures
•Peer-to-Peer Network Families
•Cloud Computing over the Internet
Clusters of Cooperative Computers
•Cluster Architecture
•Single-System Image
•Hardware, Software, and Middleware Support
•Major Cluster Design Issues
A Typical Cluster Architecture
Major Cluster Design Issues
Grid Computing Infrastructures
Computational Grids
Grid Families
Computational or Data Grid
Peer-to-Peer Network Families
•P2P Systems
•Overlay Networks
•P2P Application Families
•P2P Computing Challenges
Peer-to-Peer (P2P) Network
A distributed system architecture
Each computer in the network can act as a client or server for
other network computers.
No centralized control
Typically many nodes, but unreliable and heterogeneous
Nodes are symmetric in function
Take advantage of distributed, shared resources (bandwidth,
CPU, storage) on peer-nodes
Fault-tolerant, self-organizing
Operate in dynamic environment, frequent join and leave is
the norm
Peer-to-Peer (P2P) Network
Overlay network - computer network built on top of another network.
•Nodes in the overlay can be thought of as being connected by virtual or logical links,
each of which corresponds to a path, perhaps through many physical links, in the
underlying network.
•For example, distributed systems such as cloud computing, peer-to-peer networks,
and client-server applications are overlay networks because their nodes run on top of
the Internet.
There are two types of overlay networks:
unstructured and structured.
An unstructured overlay network is characterized by a
random graph. There is no fixed route to send messages or
files among the nodes. Often, flooding is applied to send a
query to all nodes in an unstructured overlay, thus resulting
in heavy network traffic and nondeterministic search results.
Structured overlay networks follow certain connectivity
topology and rules for inserting and removing nodes (peer
IDs) from the overlay graph. Routing mechanisms are
developed to take advantage of the structured overlays.
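Flooding in an unstructured overlay can be sketched as a hop-limited breadth-first broadcast. The overlay graph, node names, and the `ttl` (time-to-live) hop limit below are illustrative assumptions, and the sketch counts every forwarded copy as one message, including duplicates sent to nodes that have already seen the query.

```python
from collections import deque


def flood_query(graph, start, has_item, ttl):
    """Flood a query from start for at most ttl hops.

    Returns (nodes holding the item that were reached, total messages sent).
    """
    visited = {start}
    hits = [start] if has_item(start) else []
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        node, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # hop budget exhausted; do not forward further
        for neighbor in graph[node]:
            messages += 1  # duplicates still cost network traffic
            if neighbor not in visited:
                visited.add(neighbor)
                if has_item(neighbor):
                    hits.append(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return hits, messages


if __name__ == "__main__":
    # Hypothetical five-node overlay; only node "E" holds the requested file.
    overlay = {
        "A": ["B", "C"],
        "B": ["A", "C", "D"],
        "C": ["A", "B", "E"],
        "D": ["B", "E"],
        "E": ["C", "D"],
    }
    holders = {"E"}
    for ttl in (1, 2):
        print(ttl, flood_query(overlay, "A", holders.__contains__, ttl))
```

With a small hop limit the query may miss a node that does hold the item, and the message count grows quickly as the limit increases, which is the traffic cost that structured overlays try to avoid with deterministic routing.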
P2P Application Families
Cloud Computing over the Internet
Internet Clouds
The Cloud Landscape
The Cloud
Historical roots in today’s
Internet apps
Search, email, social networks
File storage (Live Mesh, MobileMe,
Flickr, …)
A cloud infrastructure provides a
framework to manage scalable, reliable,
on-demand access to applications
A cloud is the “invisible” backend to
many of our mobile applications
A model of computation and data
storage based on “pay as you go”
access to “unlimited” remote data center
capabilities
Basic Concept of Internet Clouds
• Cloud computing is the use of computing resources (hardware and
software) that are delivered as a service over a network (typically the
Internet).
• The name comes from the use of a cloud-shaped symbol as an
abstraction for the complex infrastructure it contains in system
diagrams.
• Cloud computing entrusts remote services with a user's data,
software and computation.
The Cloud Landscape
The Next Revolution in IT: Cloud Computing
Classical Computing (every 18 months?)
Buy & own hardware, system software, and
applications, often to meet peak needs
Install, configure, test, verify, evaluate
Manage
..
Finally, use it
$$$$....$ (high CapEx)
Cloud Computing
Subscribe
Use
$ - pay for what you use, based on QoS
(Courtesy of Raj Buyya, 2012)
Cloud Service Models (1)
Infrastructure as a service (IaaS)
Most basic cloud service model
Cloud providers offer computers, as physical or more often as virtual
machines, and other resources.
Virtual machines are run as guests by a hypervisor, such as Xen or
KVM.
Cloud users deploy their applications by then installing operating
system images on the machines as well as their application software.
Cloud providers typically bill IaaS services on a utility computing basis,
that is, cost will reflect the amount of resources allocated and consumed.
Examples of IaaS include: Amazon CloudFormation (and underlying
services such as Amazon EC2), Rackspace Cloud, Terremark, and
Google Compute Engine.
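Utility-style billing can be sketched as a sum of metered usage times a unit rate. The resource categories and the rates below are hypothetical placeholders for illustration, not any provider's actual pricing.

```python
def iaas_bill(vm_hours, storage_gb_months, rate_per_vm_hour, rate_per_gb_month):
    """Return the charge for metered VM time plus metered storage."""
    if min(vm_hours, storage_gb_months, rate_per_vm_hour, rate_per_gb_month) < 0:
        raise ValueError("usage and rates must be non-negative")
    return vm_hours * rate_per_vm_hour + storage_gb_months * rate_per_gb_month


if __name__ == "__main__":
    # Hypothetical usage: 100 VM-hours and 50 GB-months of storage.
    print(round(iaas_bill(100, 50, rate_per_vm_hour=0.10, rate_per_gb_month=0.02), 2))
```

The point of the sketch is that the charge follows usage: a user who consumes nothing in a billing period pays nothing, in contrast to buying hardware up front.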
Cloud Service Models (2)
Platform as a service (PaaS)
Cloud providers deliver a computing platform typically
including operating system, programming language
execution environment, database, and web server.
Application developers develop and run their software on a
cloud platform without the cost and complexity of buying and
managing the underlying hardware and software layers.
Examples of PaaS include: Amazon Elastic Beanstalk,
Cloud Foundry, Heroku, [Link], EngineYard, Mendix,
Google App Engine, Microsoft Azure and OrangeScape.
Cloud Service Models (3)
Software as a service (SaaS)
Cloud providers install and operate application software in
the cloud and cloud users access the software from cloud
clients.
The pricing model for SaaS applications is typically a
monthly or yearly flat fee per user, so price is scalable and
adjustable if users are added or removed at any point.
Examples of SaaS include: Google Apps, innkeypos,
Quickbooks Online, Limelight Video Platform,
[Link], and Microsoft Office 365.
SOFTWARE ENVIRONMENTS FOR
DISTRIBUTED SYSTEMS AND CLOUDS
Service-Oriented Architecture (SOA)
Trends toward Distributed Operating Systems
Parallel and Distributed Programming Models
Service-Oriented Architecture (SOA)
Layered Architecture for Web Services and Grids
Web Services and Tools
The Evolution of SOA
Grids versus Clouds
Service-oriented architecture (SOA)
SOA is an evolution of distributed computing based on
the request/reply design paradigm for synchronous and
asynchronous applications.
An application's business logic or individual functions
are modularized and presented as services for
consumer/client applications.
Key to these services - their loosely coupled nature;
i.e., the service interface is independent of the implementation.
Application developers or system integrators can build
applications by composing one or more services without
knowing the services' underlying implementations.
For example, a service can be implemented either in .Net or
J2EE, and the application consuming the service can be on a
different platform or language.
SOA key characteristics:
SOA services have self-describing interfaces in platform-independent XML
documents.
Web Services Description Language (WSDL) is the standard used to describe
the services.
SOA services communicate with messages formally defined via XML
Schema (also called XSD).
Communication among consumers and providers or services typically happens
in heterogeneous environments, with little or no knowledge about the provider.
Messages between services can be viewed as key business documents
processed in an enterprise.
SOA services are maintained in the enterprise by a registry that acts as a
directory listing.
Applications can look up the services in the registry and invoke the service.
Universal Description, Discovery, and Integration (UDDI) is the standard used
for service registry.
Each SOA service has a quality of service (QoS) associated with it.
Some of the key QoS elements are security requirements, such as
authentication and authorization, reliable messaging, and policies regarding
who can invoke services.
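The registry-and-lookup pattern can be sketched in a few lines. This is a toy in-process model, not the UDDI or WSDL APIs: the class name, the service name, and the dictionary message format are all assumptions made for illustration. The consumer depends only on the service name and the message shape, not on how the provider is implemented.

```python
from typing import Callable, Dict

Message = Dict[str, float]
Service = Callable[[Message], Message]


class ServiceRegistry:
    """Toy directory that maps service names to callable providers."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def publish(self, name: str, service: Service) -> None:
        self._services[name] = service

    def lookup(self, name: str) -> Service:
        try:
            return self._services[name]
        except KeyError:
            raise LookupError(f"no service registered as {name!r}") from None


def doubling_service(request: Message) -> Message:
    # Hypothetical provider; its implementation is hidden from consumers.
    return {"amount": request["amount"] * 2}


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.publish("double", doubling_service)
    # The consumer looks the service up by name and exchanges messages with it.
    reply = registry.lookup("double")({"amount": 21})
    print(reply)
```

Swapping `doubling_service` for a different implementation with the same message format requires no change on the consumer side, which is the loose coupling described above.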
Layered Architecture for Web Services
Web Services and Tools
REST (Representational State Transfer) systems
SOAP (Simple Object Access Protocol)
The Evolution of SOA
Trends toward Distributed Operating
Systems
•Distributed Operating Systems
•Amoeba versus DCE
•MOSIX2 for Linux Clusters
•Transparency in Programming Environments
Transparent Cloud Computing Environment
Separates user data, application, OS, and space – good for
cloud computing.
Parallel and Distributed Programming
Models
Message-Passing Interface (MPI)
MapReduce
Hadoop Library
Open Grid Services Architecture (OGSA)
Globus Toolkits and Extensions
Parallel and Distributed Programming
Grid Standards and Middleware:
PERFORMANCE, SECURITY, AND ENERGY
EFFICIENCY
Performance Metrics and Scalability Analysis
Fault Tolerance and System Availability
Network Threats and Data Integrity
Energy Efficiency in Distributed Computing
Performance Metrics and Scalability Analysis
Performance Metrics
Dimensions of Scalability
Scalability versus OS Image Count
Amdahl’s Law
Problem with Fixed Workload
Gustafson’s Law
Dimensions of Scalability
Size – increasing performance by increasing
machine size
Software – upgrade to OS, libraries, new apps.
Application – matching problem size with
machine size
Technology – adapting system to new
technologies
System Scalability vs. OS Multiplicity
Amdahl’s Law
Consider the execution of a given program on a uniprocessor
workstation with a total execution time of T minutes. Now, let’s say
the program has been parallelized or partitioned for parallel
execution on a cluster of many processing nodes. Assume that a
fraction α of the code must be executed sequentially, called the
sequential bottleneck. Therefore, (1 − α) of the code can be
compiled for parallel execution by n processors. The total execution
time of the program is calculated by α T + (1− α)T/n, where the first
term is the sequential execution time on a single processor and the
second term is the parallel execution time on n processing nodes. All
system or communication overhead is ignored here. The I/O time or
exception handling time is also not included in the following speedup
analysis.
Amdahl’s Law states that the speedup factor of using the n-
processor system over the use of a single processor is expressed
by:
Speedup = S = T/[αT + (1 − α)T/n] = 1/[α + (1 − α)/n]
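As a minimal sketch, the speedup formula can be evaluated directly. The function name and the sample value α = 0.25 are illustrative assumptions, not taken from the slides.

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Speedup S = 1 / (alpha + (1 - alpha) / n) for sequential fraction alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (alpha + (1.0 - alpha) / n)


if __name__ == "__main__":
    # Hypothetical program in which 25% of the code is sequential.
    for n in (1, 4, 16, 256):
        print(n, round(amdahl_speedup(0.25, n), 2))
```

As n grows, the speedup approaches 1/α, so with α = 0.25 it can never reach 4 no matter how many processors are added.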
Gustafson’s Law
Let W be the workload in a given program. When using
an n-processor system, the user scales the workload to
W′ = αW + (1 − α)nW
Note that only the parallelizable portion of the workload is
scaled n times in the second term. This scaled workload
W′ is essentially the sequential execution time on a single
processor. The parallel execution time of a scaled
workload W′ on n processors is defined by a scaled-
workload speedup as follows:
S′ = W′/W = [αW + (1 − α)nW]/W = α + (1 − α)n
This speedup is known as Gustafson’s law. By fixing the
parallel execution time at level W, the following efficiency
expression is obtained:
E′ = S′/n = α/n + (1 − α)
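A short sketch of the scaled-workload speedup α + (1 − α)n and the efficiency α/n + (1 − α) follows. The function names and the sample values (α = 0.25, n = 256) are illustrative assumptions.

```python
def gustafson_speedup(alpha: float, n: int) -> float:
    """Scaled-workload speedup S' = alpha + (1 - alpha) * n."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return alpha + (1.0 - alpha) * n


def gustafson_efficiency(alpha: float, n: int) -> float:
    """Efficiency E' = S' / n = alpha / n + (1 - alpha)."""
    return gustafson_speedup(alpha, n) / n


if __name__ == "__main__":
    # Hypothetical workload with a 25% sequential fraction on 256 processors.
    print(gustafson_speedup(0.25, 256))
    print(round(gustafson_efficiency(0.25, 256), 4))
```

Because the parallel portion of the workload grows with n, the scaled speedup keeps increasing linearly in n, and the efficiency tends toward 1 − α rather than toward zero.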
Fault Tolerance and System Availability
HA (high availability) is desired in all clusters, grids, P2P
networks, and cloud systems. A system is highly
available if it has a long mean time to failure (MTTF) and
a short mean time to repair (MTTR). System availability
is formally defined as follows:
System Availability = MTTF/(MTTF + MTTR)
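The availability formula is easy to check numerically. In this sketch the function name and the sample figures (an MTTF of 1,000 hours and an MTTR of 2 hours) are hypothetical.

```python
def availability(mttf: float, mttr: float) -> float:
    """System availability = MTTF / (MTTF + MTTR), with both in the same time unit."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    # Hypothetical system: fails every 1,000 hours on average, repaired in 2 hours.
    print(f"{availability(1000, 2):.4%}")
```

Halving the repair time raises availability just as lengthening the time to failure does, which is why both a long MTTF and a short MTTR are listed as goals.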
System Availability vs. Configuration Size:
Network Threats and Data Integrity
Operational Layers of Distributed Computing System
Four Reference Books:
1. K. Hwang, G. Fox, and J. Dongarra, Distributed and Cloud
Computing: From Parallel Processing to the Internet of Things,
Morgan Kaufmann Publishers, 2011.
2. R. Buyya, J. Broberg, and A. Goscinski (eds.), Cloud Computing:
Principles and Paradigms, ISBN-13: 978-0470887998, Wiley Press,
USA, February 2011.
3. T. Chou, Introduction to Cloud Computing: Business and
Technology, Lecture Notes at Stanford University and at Tsinghua
University, Active Book Press, 2010.
4. T. Hey, S. Tansley, and K. Tolle (eds.), The Fourth Paradigm:
Data-Intensive Scientific Discovery, Microsoft Research, 2009.