CS6065
Cloud Computing and Applications
Cloud Infrastructure
Google Cloud
• Google is a major player in cloud computing. It has:
Pioneered applications of Artificial Intelligence (AI) and
Machine Learning (ML)
Developed TPU (Tensor Processing Units) and populated
some of its instances with TPUs.
• Unlike AWS, Google often publishes details of new
developments in the hardware and software of its cloud
infrastructure.
Papers published by Google's research division contribute
significantly to advancements in cloud computing.
• Google Cloud infrastructure consists of a large
number of clusters in multiple geographical
locations.
A typical cluster has ~10,000 servers.
Workload is a mix of CPU-intensive batch computations and
in-memory databases for latency-sensitive applications.
Google Cloud
• Google's effort is concentrated in several areas:
• Infrastructure-as-a-Service (IaaS)
• Software-as-a-Service (SaaS)
• Platform-as-a-Service (PaaS)
• Services such as Gmail, Google Drive, Google
Calendar, Picasa, and Google Groups are
• Free of charge for individual use.
• Paid for organizations.
• Services run on Google clouds and can be invoked
from a broad spectrum of devices.
• The data for these services is stored in data centers
on the cloud.
Google SaaS services
• Gmail
Hosts email on Google servers and provides a web interface to
access it.
• Google Docs
Web-based software for building text documents, spreadsheets, and
presentations.
• Google Calendar
Browser-based scheduler; supports multiple user calendars, calendar
sharing, event search, display of daily/weekly/monthly views, and so on.
• Google Maps
Web mapping service; offers street maps, a route planner, and an urban
business locator for numerous countries around the world.
• Google Co-op
Allows users to create customized search engines based on a set of
facets/categories.
• Google Drive
An online service for data storage.
• Google Base
Allows users to load structured data from different sources into a central
repository: a very large, self-describing, semi-structured, heterogeneous
database.
Google AppEngine (AE)
• An infrastructure for building web and mobile applications
that run on Google servers.
o Initially supported Python; Java was added later.
o The application datastore can be accessed with
GQL (Google Query Language), which has a SQL-like syntax.
• AE is an ensemble of compute, storage, search, and
networking services.
• The Compute Engine (CE) supports the creation of VMs
with resources tailored to application needs. CE
configurations range from:
o Micro instances to instances with 32 vCPUs and 208 GB of
memory.
o Up to 64 TB of network storage can be attached to a VM.
o Always-encrypted local solid-state drive (SSD) block
storage and automatic scaling are also supported.
o Several operating systems are supported, including
Debian, CentOS, CoreOS, SUSE, Ubuntu, Red Hat, FreeBSD, and
Windows Server 2008 R2 and 2012 R2.
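For illustration, a GQL query over the datastore might look like the following sketch; the kind (Person) and property (age) are hypothetical names, not part of any real schema:

```sql
SELECT * FROM Person
WHERE age >= 18
ORDER BY age DESC
LIMIT 10
```

The SQL-like surface hides a key difference: GQL queries always run against a single datastore kind and its indexes rather than joining arbitrary tables.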
Google Container Engine (CntE)
• CntE is a cluster manager and orchestration system
for Docker containers, built on the Kubernetes system.
• The Container Registry stores private Docker images.
• CntE schedules and manages containers
automatically according to user specifications.
• JSON config files are used to specify the amount of
CPU/memory, the number of replicas, and other
relevant information.
• The Container Engine SLA commitment is a
monthly uptime of at least 99.5%.
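Such a JSON specification might look like the following Kubernetes-style sketch; the names, image path, replica count, and resource figures are all assumptions for illustration:

```json
{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": { "name": "web-frontend" },
  "spec": {
    "replicas": 3,
    "selector": { "matchLabels": { "app": "web-frontend" } },
    "template": {
      "metadata": { "labels": { "app": "web-frontend" } },
      "spec": {
        "containers": [
          {
            "name": "web",
            "image": "gcr.io/example-project/web:1.0",
            "resources": {
              "requests": { "cpu": "250m", "memory": "64Mi" }
            }
          }
        ]
      }
    }
  }
}
```

Given such a spec, the orchestrator keeps three replicas running and schedules each one on a node with at least the requested CPU and memory available.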
Google Functions
• Cloud Functions (CF) is a lightweight, event-based,
asynchronous system to create single-purpose
functions that respond to cloud events.
• CFs are written in JavaScript and executed in a
Node.js runtime environment.
• Cloud Load Balancing supports scalable load
balancing on the Google Cloud Platform.
Microsoft PaaS and SaaS services
• Azure is a PaaS cloud platform.
• Online Services is a SaaS cloud platform.
• Windows Azure
An operating system with 3 components:
o Compute - provides a computation environment.
o Storage - for scalable storage.
o Fabric Controller - deploys, manages, and monitors
applications.
• SQL Azure
A cloud-based version of SQL Server.
• Azure AppFabric, formerly .NET Services
A collection of services for cloud applications.
Azure
[Figure: Azure services: Compute; Storage (Blobs, Tables, Queues);
the Fabric Controller; the CDN; and Connect.]
Azure components:
Compute—runs cloud applications;
Storage—uses blobs, tables, and queues to store data;
Fabric Controller—deploys, manages, and monitors applications;
CDN—maintains cache copies of data; and
Connect—allows IP connections between the user systems and
applications running on Windows Azure.
Open-source platforms for private clouds
• Eucalyptus
Can be regarded as an open-source counterpart of Amazon's EC2.
• OpenNebula
A private cloud in which users actually log into the head node to
access cloud functions. The system is centralized and its default
configuration uses the NFS file system.
• Nimbus
A cloud solution for scientific applications based on Globus
software; inherits from Globus:
o The image storage.
o The credentials for user authentication.
o The requirement that a running Nimbus process can ssh into all
compute nodes.
Eucalyptus
• Virtual Machines
Run under several VMMs including Xen, KVM, and VMware.
• Node Controller
Runs on server nodes hosting a VM and controls the activities of
the node.
• Cluster Controller
Controls a number of servers.
• Cloud Controller
Provides the cloud access to end-users, developers, and
administrators.
• Storage Controller
Provides persistent virtual hard drives to applications; it is the
counterpart of Amazon EBS.
• Storage Service (Walrus)
Provides persistent storage; similar to S3, it allows users to store
objects in buckets.
Side by Side Comparison
[Tables comparing Eucalyptus, OpenNebula, and Nimbus side by side.]
Conclusion
• Eucalyptus is best suited for a large corporation with its
own private cloud
As it ensures a degree of protection from user malice and
mistakes
• OpenNebula is best suited for a testing environment with a
few servers
• Nimbus is more adequate for the scientific community
As the scientific community is less interested in the internals of
the system but has broad customization requirements.
Cloud storage diversity and vendor lock-in
• Risks when a large organization relies on a single
cloud service provider:
Cloud services may be unavailable for a short or an extended
period of time.
Permanent data loss in case of a catastrophic system failure.
The provider may increase the prices for service.
• Switching to another provider could be very costly
due to the large volume of data to be transferred
from the old to the new provider.
• A solution is to replicate the data to multiple cloud
service providers, similar to data replication in
RAID.
Data Replication in Cloud
• RAID-5 uses block-level striping
with distributed parity.
• The disk controller distributes the
sequential blocks of data to the
physical disks and computes a
parity block
By bit-wise XORing of the data blocks.
• The parity block is written on a
different disk for each file.
• This technique allows data
recovery in case of a disk loss.
E.g., if disk 2 is lost,
o we still have all the blocks of the third
file (c1, c2, and c3) and can recover the
missing blocks of the other files by XORing
the surviving blocks of each stripe with
its parity block.
[Figure: (a) a RAID-5 controller striping blocks a1-a3, b1-b3,
c1-c3, d1-d3 and parity blocks aP, bP, cP, dP across Disks 1-4;
(b) a client proxy replicating the same blocks across Clouds 1-4.]
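The parity scheme above can be sketched with bit-wise XOR; the block contents below are made-up values for illustration:

```javascript
// Sketch of RAID-5-style parity: the parity block is the bit-wise XOR
// of the data blocks in a stripe, so any single lost block can be
// reconstructed by XORing the surviving blocks with the parity block.
const xorBlocks = (a, b) => a.map((x, i) => x ^ b[i]);

const c1 = [0x12, 0x34], c2 = [0x56, 0x78], c3 = [0x9a, 0xbc]; // data blocks
const cP = xorBlocks(xorBlocks(c1, c2), c3);                   // parity block

// The disk (or cloud) holding c2 is lost; rebuild it from the
// surviving data blocks and the parity block.
const rebuilt = xorBlocks(xorBlocks(c1, c3), cP);
```

Replicating blocks across several cloud providers in the same pattern gives the same single-failure tolerance, with a provider outage playing the role of a disk loss.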
Cloud interoperability – The Intercloud
• An Intercloud
A federation of clouds that cooperate to provide
a better user experience.
• Is an Intercloud feasible?
• Not likely at this time:
No standards for either storage or processing.
Clouds are based on different delivery models.
o Set of services supported by these delivery models is large and open; new
services are offered every few months.
CSPs (Cloud Service Providers) believe that they have a
competitive advantage due to the uniqueness of the
added value of their services.
Security is a major concern for cloud users and an
Intercloud could only create new threats.
Energy use and ecological impact
• Energy consumption of large-scale DCs and their
costs for energy and for cooling are significant.
• In 2006,
6,000 data centers in the U.S. consumed 61×10⁹ kWh of energy,
1.5% of all U.S. electricity consumption, at a cost of $4.5 billion.
Energy consumed by DCs was expected to double from 2006 to
2011, and peak instantaneous demand to increase from 7 GW to
12 GW.
• Greenhouse gas emissions due to DCs are estimated to
increase from 116×10⁶ tonnes of CO2 in 2007 to 257×10⁶ tonnes
in 2020, due to increased consumer demand.
• The effort to reduce energy use is focused on
computing, networking, and storage activities of a
data center.
Energy use and ecological impact
• The operating efficiency of a system is captured
by its performance per Watt of power.
• The performance of supercomputers has
increased 3.5 times faster than their
operating efficiency
(7,000% versus 2,000% during 1998 – 2007).
• A typical Google cluster spends most of its
time within the 10-50% CPU utilization range
There is a mismatch between server workload
profile and server energy efficiency.
Energy-proportional systems
• An energy-proportional system consumes
No power when idle
Very little power under a light load
Gradually more power as the load increases.
• By definition, an ideal energy-proportional
system is always operating at 100%
efficiency.
Energy-proportional systems
• Humans are a good approximation of an ideal
energy-proportional system:
About 70 W at rest,
120 W on average on a daily basis,
As high as 1,000 – 2,000 W during a short, exhausting
effort.
• Even when power requirements scale linearly
with the load, the energy efficiency of a
computing system is not a linear function of
the load.
Even when idle, a system may use 50% of the
power it draws under full load.
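The last point can be illustrated with a simple linear power model; the Watt figures are assumed for illustration, with idle power set to 50% of peak as in the example above:

```javascript
// Assumed linear power model: an idle server draws 50% of peak power.
const power = (u, pIdle = 100, pPeak = 200) =>
  pIdle + (pPeak - pIdle) * u; // Watts at utilization u in [0, 1]

// Energy efficiency as useful work per Watt (arbitrary work units).
const efficiency = (u) => u / power(u);
```

Under this model a server at 50% utilization delivers 0.5/150 ≈ 0.0033 work units per Watt versus 1/200 = 0.005 at full load, i.e. only about two-thirds of the full-load efficiency, even though power itself scales linearly.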
Power Consumption vs. System Utilization
Service Level Agreement (SLA)
• An SLA is a contract between the customer and the CSP.
• It might be
A legally binding contract, or
An informal statement of services.
• Objectives of the agreement:
Identify and define the customer’s needs and constraints including the level
of resources, security, timing, and QoS.
Provide a framework for understanding; a critical aspect of this framework is
a clear definition of classes of service and the costs.
Simplify complex issues; clarify the boundaries between the responsibilities
of clients and CSP in case of failures.
Reduce areas of conflict.
Encourage dialog in the event of disputes.
Eliminate unrealistic expectations.
• Specifies the services that the customer receives, rather
than how the cloud service provider delivers the services.
Responsibility sharing between user and CSP
[Figure: responsibility sharing between the cloud user and the cloud
service provider. The layers of the stack, top to bottom, are:
Interface, Application, Operating system, Hypervisor, Computing
service, Storage service, Network, and Local infrastructure. Under
SaaS the user is responsible only for the Interface; under PaaS, for
the Interface and the Application; under IaaS, for the Interface, the
Application, and the Operating system. All lower layers are the
responsibility of the cloud service provider.]
User security concerns
• Potential loss of control/ownership of data.
• Data integration, privacy enforcement, data
encryption.
• Data remanence after de-provisioning.
• Multi-tenant data isolation.
• Data location requirements within national borders.
• Hypervisor security.
• Audit data integrity protection.
• Verification of subscriber policies through provider
controls.
• Certification/Accreditation requirements for a given
cloud service.